I like to think I was an early adopter of AI and its practical applications to my domain of interest, cybersecurity. Back in 2017, I remember being intrigued by a company called Darktrace, and since then I have eagerly followed the promise of AI in aiding the cause of cyber practitioners. I have largely kept up with the field's progress, but my interest was recently reignited by the promise of AI agents and their impact. This is the first in a series of articles journaling my journey in learning how I can best utilize AI for the purposes of cybersecurity.
I've spent the last several months going deep on AI and large language models — not from an AI researcher standpoint, but as a cybersecurity practitioner trying to understand what's actually under the hood. If you're going to defend against something or leverage it to achieve your outcomes, you need to understand how it works. This is me sharing what I learned in plain terms. I am by no means an expert — just sharing what I have learned thus far. Writing it down also helps my own learning: the better I can distill this material into simpler terms, the more I refine my understanding of the topic.
One of the best ways I found to frame this: think of modern AI as a city. The AI models (the "engines" doing the thinking) are the buildings. The training methods that shaped them are the construction crews. And the hardware and software that keeps everything running are the power grid and the roads.
You can admire a skyline without knowing how a building was constructed. But if your job is security, you need to know the basics of the city: the layout, the construction methods, and where the chokepoints and vulnerabilities are.
There are a few key model types you'll hear about. Here's what they actually do:
CNNs (Convolutional Neural Networks) — These are the image guys. They're really good at recognizing patterns in pictures. Think facial recognition, spotting anomalies in medical scans, or identifying objects in video feeds. They look at small pieces of an image at a time and build up understanding layer by layer — edges become shapes, shapes become objects. From a security standpoint, this is the tech behind a lot of surveillance and biometric systems. For the past few months, I have been using Sora and Veo, along with Stable Diffusion (whose U-Net backbone is a type of CNN), to experiment with deepfakes and image generation.
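To make "looking at small pieces of an image at a time" concrete, here's a toy convolution in plain Python. The image and the edge-detection kernel are made-up illustrative values; a real CNN learns many kernels like this one during training rather than having them hand-written.

```python
# A toy 2D convolution: slide a 3x3 "vertical edge" kernel across a tiny
# grayscale image. Each output value sums the element-wise products of the
# kernel and the patch of image under it.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# 6x6 image: dark left half (0), bright right half (1) -> a vertical edge
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]  # responds strongly where brightness jumps left-to-right

feature_map = conv2d(image, edge_kernel)
# The feature map lights up (value 3) only at the columns where the edge sits.
```

Stacking layers of these filters is what lets a CNN go from edges to shapes to whole objects.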
RNNs (Recurrent Neural Networks) — These were built to handle sequences — things that happen in order, like words in a sentence or events in a log file. They work by passing a kind of "memory" from step to step. The problem is they struggle to remember things from far back in the sequence, which limits how useful they are for longer content. They were typically used in sequence modeling and natural language processing (NLP).
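The "memory passed from step to step" is literally a number (or vector) carried through a loop. Here's a minimal sketch with made-up scalar weights; a real RNN uses learned weight matrices, but the update rule has this shape.

```python
import math

# One step of a minimal RNN: the new hidden state mixes the previous hidden
# state with the current input, squashed through tanh. The weights here are
# fixed illustrative values, not trained ones.
def rnn_step(x, h_prev, w_x, w_h):
    return math.tanh(w_x * x + w_h * h_prev)

# Process a short sequence; the hidden state h carries "memory" forward.
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h, w_x=0.8, w_h=0.5)
```

Because each step's influence passes through repeated multiplications and tanh squashing, information from early in the sequence fades — the vanishing-memory problem that motivated LSTMs.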
LSTMs (Long Short-Term Memory) — Think of LSTMs as the upgraded version of RNNs. They were designed with gates — basically internal switches — that control what the model remembers, forgets, and passes forward. Before Transformers came along, LSTMs were the backbone of most language AI systems. They're still relevant in specific scenarios (see the references below).
Transformers — This is where the game changed. Transformers don't process sequences step by step. Instead, they look at everything at once and figure out which parts of the input matter most to each other. That's called "attention." It's why they scale so well and why they power nearly every major AI tool you're using today — ChatGPT, Claude, Gemini, all of them.
LLMs (Large Language Models) — An LLM is just a very large Transformer that's been trained on massive amounts of text. When you ask it a question, it's predicting what tokens (word fragments) should come next based on patterns it learned during training. The "large" part matters — researchers found that performance improves significantly as you add more data, more parameters, and more compute.
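"Predicting what tokens come next based on patterns learned during training" can be shown with a deliberately tiny stand-in: a bigram count table. An LLM replaces the count table with billions of learned parameters and a Transformer, but the objective — predict the next token — is the same.

```python
from collections import Counter, defaultdict

# A toy "language model": count which token tends to follow which,
# then predict the most frequent follower.
corpus = "the model predicts the next token and the next token after that".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    # Pick the most common token seen after this one in training data.
    return follows[token].most_common(1)[0][0]

# "the" was followed by "next" twice and "model" once, so:
print(predict_next("the"))  # -> next
```

The "large" part is what turns this from a parlor trick into something useful: with enough data, parameters, and compute, next-token prediction starts capturing grammar, facts, and reasoning patterns.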
Here's something that gets glossed over in most AI literature — the hardware and infrastructure behind all this. You can't just throw a model onto a laptop and expect it to work (I tried early on and quickly found the limitations). These systems run on specialized chips (GPUs and TPUs) that are purpose-built for the kind of massive math operations AI requires. I personally purchased an RTX 5090 to try out open-source models and quickly ran into VRAM limits. That's how I learned about quantization, and how local models can degrade when you try to fit a model that is too large for your hardware.
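Quantization is the trick that makes oversized models fit: store weights as small integers instead of full-precision floats. Here's a naive round-trip with made-up weight values showing both the memory win and the precision loss — the same kind of degradation you feel when squeezing a big model into limited VRAM.

```python
# Naive 8-bit quantization: map floats into integers in [-127, 127] using
# one scale factor, then map back. The round trip loses precision.
def quantize(weights):
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.3012, -1.27, 0.0004, 0.98]   # pretend model weights (float32)
q, scale = quantize(weights)               # now each fits in a single byte
restored = dequantize(q, scale)
errors = [abs(a - b) for a, b in zip(weights, restored)]
# errors are small but nonzero -- multiplied across billions of weights,
# that rounding is where quantized models lose quality.
```

Real schemes (like the 4-bit formats common in local-model tooling) use per-block scales and smarter rounding, but the trade-off is the same: less memory, less fidelity.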
The models are often so large they don't even fit on a single chip. So they get split up across dozens or hundreds of devices, each handling a piece of the puzzle. The coordination required to make that work is fascinating and can be quite complex.
When you're using one of these tools in your browser, there's an entire serving infrastructure behind it — managing memory, optimizing how responses get generated token by token, and handling hundreds of requests at once. Tools like vLLM do this by managing the model's "working memory" (called a KV cache) more efficiently, so more users can be served without blowing up costs.
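To see why the KV cache matters: when generating token N, attention needs the keys and values of all N-1 previous tokens. Without a cache you'd recompute them every step; with one, each step only computes and appends the newest entry. This sketch uses a fake projection function as a stand-in for the real learned key/value projections.

```python
# Sketch of KV caching during generation. `project` is a made-up stand-in
# for the model's real key/value projection of a token.
def project(token):
    return (hash(token) % 97, hash(token) % 89)  # fake (key, value) pair

kv_cache = []

def generate_step(token):
    # Compute K,V only for the NEW token; everything older is reused.
    kv_cache.append(project(token))
    # Real attention would now score the query against every cached key.
    return len(kv_cache)

for t in ["The", "quick", "brown", "fox"]:
    generate_step(t)
# After 4 steps the cache holds 4 entries; step 5 does one projection,
# not five.
```

Serving systems like vLLM get their efficiency largely from managing this cache cleverly (paging it like virtual memory), so many users' caches share GPU memory without waste.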
If there's one concept worth actually understanding, it's attention. Imagine you're reading a sentence and trying to figure out what the word "it" refers to. Your brain doesn't re-read the whole paragraph — it focuses on the most relevant parts. That's what the attention mechanism does, mathematically.
Every Transformer uses this. For each word (or token) it processes, it will ask: what else in this sequence should I be paying attention to right now? It generates scores for every other token, then uses those scores to decide how much weight to give each one when forming a response.
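The scoring-and-weighting described above is scaled dot-product attention, and it fits in a few lines. The vectors below are toy values chosen for readability; real models use hundreds of dimensions and learn the projections that produce queries, keys, and values.

```python
import math

# Scaled dot-product attention for one query over a few tokens.
# scores = q . k / sqrt(d); softmax turns scores into weights that sum
# to 1; the output is the weighted sum of the value vectors.
def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# One query attending over three tokens (2-dim toy vectors)
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(q, keys, values)
# Tokens whose keys align with the query get more weight in the output.
```

In a Transformer, every token runs this against every other token, which is exactly why cost grows quadratically with input length — and why FlashAttention-style optimizations exist.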
This is powerful. It's also expensive. The longer your input, the more comparisons the model has to make. That's why there's been a lot of engineering work to make attention faster (FlashAttention is one example — an approach that cuts down memory usage without changing the math).
Raw training is just predicting the next word. That alone doesn't make a model useful or safe. The modern pipeline looks more like this:

1. Pretraining — the model learns language patterns by predicting the next token across massive text corpora.
2. Supervised fine-tuning — it's then trained on curated examples of helpful, instruction-following behavior.
3. Alignment with human feedback (RLHF) — humans rank candidate outputs, and the model is tuned to prefer the responses people rate highly.
From a security standpoint, this is important. The model's behavior is shaped by training data and human feedback — which means it can also be manipulated. Prompt injection, jailbreaks, and adversarial inputs are all real attack surfaces targeting this layer.
One of the most practical concepts for anyone building on top of AI: transfer learning. Instead of training a model from zero, you take a model that already learned a ton from massive data, and you fine-tune it for your specific use case.
Think of it like hiring someone with 20 years of general experience and spending a few weeks onboarding them to your specific environment. They bring all their existing knowledge — you just customize the edges.
A technique called LoRA takes this further by making fine-tuning efficient even for massive models. Instead of retraining all the parameters, it adds small trainable layers that adapt the model's behavior without touching the original weights. This is how many organizations are deploying custom AI without paying to retrain from scratch. I have used LoRA specifically for video generation.
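The parameter math behind LoRA is easy to show in miniature. Instead of updating a large weight matrix W (d x d), you train two small matrices B (d x r) and A (r x d) with rank r much smaller than d; the effective weight is W + BA. The dimensions and values below are toy illustrations.

```python
# LoRA in miniature: the frozen weight matrix W gets a low-rank update
# B @ A, and only B and A hold trainable numbers.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d, r = 4, 1                        # tiny dims; real models use d in the thousands
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.1] for _ in range(d)]      # d x r, trainable
A = [[0.2] * d]                    # r x d, trainable

delta = matmul(B, A)               # full d x d update built from 2*d*r numbers
W_adapted = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

frozen_params = d * d              # 16 numbers never touched
trainable_params = 2 * d * r      # only 8 numbers actually trained
```

At realistic sizes the savings are dramatic: for d = 4096 and r = 8, you train about 65K numbers per matrix instead of nearly 17 million — which is why fine-tuning a large model on a single GPU becomes feasible.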
One of the most common complaints about LLMs is that they get things wrong and make things up (hallucination). One solid mitigation for this is RAG — Retrieval-Augmented Generation.
Instead of relying entirely on what the model memorized during training, RAG connects it to a live database or document store. When you ask a question, it first retrieves the most relevant documents, then uses those to inform the answer. It's the difference between a model that guesses from memory and one that looks things up first.
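The retrieval half of RAG can be sketched in a few lines. This toy version scores documents by word overlap with the question; production systems use vector embeddings and a vector database, but the shape of the flow — retrieve, then prepend to the prompt — is the same. The documents and question are invented examples.

```python
# Minimal RAG retrieval: score each document by word overlap with the
# question, take the best match, and build a grounded prompt from it.
docs = [
    "Password resets require approval from the identity team.",
    "Incident severity 1 means active data exfiltration.",
    "VPN access is reviewed quarterly by security operations.",
]

def retrieve(question, documents):
    q_words = set(question.lower().split())
    # Real systems rank by embedding similarity; word overlap stands in here.
    return max(documents,
               key=lambda d: len(q_words & set(d.lower().rstrip(".").split())))

question = "what does severity 1 mean"
context = retrieve(question, docs)
prompt = f"Context: {context}\nQuestion: {question}"
# The model now answers from the retrieved policy text instead of guessing
# from whatever it memorized in training.
```

This grounding step is also why RAG answers can cite their sources — a property worth a lot in security workflows where "trust me" isn't good enough.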
For enterprise security use cases — threat intelligence querying, policy interpretation, incident response support — RAG is a key architecture pattern worth knowing.
I'm not writing this as an AI researcher. I'm writing this as someone who has spent over two decades in information security and is watching AI become a permanent part of the threat landscape — and the defense toolkit.
Adversaries are already using LLMs to write more convincing phishing emails, generate malware variants, and automate reconnaissance. The barrier to entry for attacks has dropped. At the same time, defenders are using the same technology for anomaly detection, log analysis, and threat hunting at scale.
Understanding how these engines work isn't optional anymore. If you don't know what an LLM can and can't do, you can't evaluate the risk of deploying one — or the risk of having one used against you.
The technology is not magic. It's math, scale, and engineering. And once you understand that, it becomes a lot less mysterious and a lot more manageable.
If this was helpful, feel free to share it with your team or anyone trying to get their arms around this space. Happy to connect with anyone working through these same questions.
https://cloud.google.com/discover/what-are-convolutional-neural-networks
https://www.ibm.com/think/topics/recurrent-neural-networks
https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
https://aws.amazon.com/what-is/large-language-model/
https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad
https://medium.com/@techresearchspace/what-is-quantization-in-llm-01ba61968a51
https://cloud.google.com/discover/what-is-an-ai-model
https://comfyui-wiki.com/en/comfyui-nodes/loaders/lora-loader-model-only
https://www.datacamp.com/blog/what-is-transfer-learning-in-ai-an-introductory-guide
https://www.splunk.com/en_us/blog/learn/retrieval-augmented-generation-rag.html