The AI and LLM field grows new terminology fast, and the same concept often goes by many different names. This article is a reference that organizes the terms you run into most often in practice into tables grouped by category. Use it to quickly look up an unfamiliar term, or to align your team's shared vocabulary. Each entry aims for a "one-line definition," and topics that deserve a deeper treatment link out to other in-house articles.
This glossary is a living document. As the field shifts, we keep it updated. Suggestions for missing terms or more accurate definitions are always welcome.
Term Map
First, the big picture. Follow the categories below to jump straight to the area you need.
1. Fundamentals
| Term | English / Abbr. | One-line description |
|---|---|---|
| Artificial Intelligence | AI (Artificial Intelligence) | The umbrella term for technology that lets machines perform human cognitive tasks |
| Machine Learning | ML (Machine Learning) | A branch of AI that learns rules from data |
| Deep Learning | DL (Deep Learning) | Machine learning based on multi-layer neural networks |
| Neural Network | Neural Network | A model that maps inputs to outputs through layers of neurons connected by weights |
| Parameter | Parameter / Weight | The numeric values inside a model (weights and biases) adjusted during training |
| Hyperparameter | Hyperparameter | Settings chosen by a human before training, such as learning rate and batch size |
| Training / Inference | Training / Inference | The process of adjusting parameters / producing results with a trained model |
| Supervised / Unsupervised / RL | Supervised / Unsupervised / RL | Learning paradigms: using labeled answers / without labels / learning from rewards |
| Overfitting / Generalization | Overfitting / Generalization | Fitting the training data so closely that performance on new data suffers / performing well on unseen data too |
| Loss Function | Loss Function | A training objective that quantifies the gap between predictions and ground truth |
| Gradient Descent | Gradient Descent | An optimization technique that updates parameters in the direction that reduces loss |
| Backpropagation | Backpropagation | An algorithm that applies the chain rule backward through the network to compute the loss gradient for every weight, which the optimizer then uses to update them |
| Epoch / Batch | Epoch / Batch | One full pass over the data / a group of samples processed at once |
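
To make several of the terms above concrete, here is a minimal sketch of mini-batch gradient descent on a one-parameter model. The toy data, the model `y = w * x`, and the values chosen for `learning_rate`, `batch_size`, and `epochs` are illustrative assumptions, not recommendations.

```python
# Toy dataset that follows y = 2x (an assumption made for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse_loss(w, batch_x, batch_y):
    """Loss function: mean squared error between predictions and targets."""
    return sum((w * x - y) ** 2 for x, y in zip(batch_x, batch_y)) / len(batch_x)


def mse_grad(w, batch_x, batch_y):
    """Gradient of the MSE loss with respect to the single parameter w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(batch_x, batch_y)) / len(batch_x)


def train(w, learning_rate=0.01, batch_size=2, epochs=50):
    """Hyperparameters are set by hand; the parameter w is learned."""
    for _ in range(epochs):                       # one epoch = one full pass over the data
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]     # one batch of samples
            by = ys[start:start + batch_size]
            w -= learning_rate * mse_grad(w, bx, by)  # step against the gradient
    return w


w_trained = train(w=0.0)
print(w_trained, mse_loss(w_trained, xs, ys))
```

The learned `w_trained` should land very close to 2, the slope used to build the toy data. With a single parameter the gradient can be written by hand; backpropagation is what computes the same quantity for every weight in a multi-layer network.
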
2. Language Models & Architecture
| Term | English / Abbr. | One-line description |
|---|---|---|
| Large Language Model | LLM (Large Language Model) | A large-scale model trained on vast text to understand and generate language |
| Transformer | Transformer | The attention-based architecture underlying modern LLMs |
| Attention | Attention | A mechanism that computes the mutual importance of input tokens as weights |
| Self-Attention | Self-Attention | Attention that computes relationships among tokens within a single sequence |
| Multi-Head Attention | Multi-Head Attention | Running multiple attentions in parallel to capture diverse relationships |
| Encoder / Decoder | Encoder / Decoder | The input-understanding part / the output-generation part. GPT-family models are decoder-only |
| Foundation Model | Foundation Model | A generally pretrained model that serves as the basis for many tasks |
| Context Window | Context Window | The maximum number of tokens a model can handle at once |
| Positional Encoding | Positional Encoding | A way to inject token order information into vectors |
| Model Size | Model Size (e.g. 7B, 70B) | A measure of model scale. B denotes one billion parameters |
| Instruct / Chat Model | Instruct / Chat Model | A model variant post-trained (aligned) for instruction following and dialogue |
3. Tokens, Embeddings & Representations
| Term | English / Abbr. | One-line description |
|---|---|---|
| Token | Token | The smallest unit of text a model processes (word, subword, or character) |
| Tokenizer | Tokenizer | The module that splits text into tokens and converts them to IDs |
| Tokenization | Tokenization | The process of converting text into a token sequence (e.g. BPE, SentencePiece) |
| Vocabulary | Vocabulary | The full set of tokens the tokenizer knows |
| Embedding | Embedding | A representation that converts a token, sentence, or document into a high-dimensional real-valued vector carrying meaning |
| Embedding Model | Embedding Model | A dedicated model that turns text into vectors (the core of retrieval and RAG) |
| Dimension | Dimension | The length of an embedding vector (e.g. 768, 1536, 3072 dimensions) |
| Normalization | Normalization | Scaling a vector to unit length (L2 norm of 1), so that cosine similarity reduces to a plain dot product |
| Latent / Vector Space | Latent / Vector Space | A representation space where semantically similar items sit close together |
| Logits | Logits | The unnormalized scores for each token, just before softmax |
| Probability Distribution | Probability Distribution | The next-token probabilities obtained by normalizing the logits |
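
The relationship between normalization and cosine similarity is easy to check numerically. The sketch below uses two made-up 2-dimensional vectors; real embeddings have hundreds or thousands of dimensions, but the arithmetic is the same.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector so that its L2 norm (magnitude) is 1."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, independent of their magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors (not real embeddings).
a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

print(cosine_similarity(a, b))   # similarity on the raw vectors
print(dot(a_unit, b_unit))       # same value, computed as a dot product of unit vectors
```

This is one reason embedding vectors are often stored already normalized: similarity search then only needs dot products.
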
4. Training Stages
| Term | English / Abbr. | One-line description |
|---|---|---|
| Pretraining | Pretraining | The first stage of training, learning next-token prediction on a large corpus |
| Fine-tuning | Fine-tuning | Further training a pretrained model on task- or domain-specific data |
| Supervised Fine-Tuning | SFT (Supervised Fine-Tuning) | Learning instruction-following ability from input-answer pairs |
| Instruction Tuning | Instruction Tuning | Training the model to "follow instructions" from diverse instruction-response data |
| RLHF | RLHF | Aligning a model via reinforcement learning, using human preferences turned into a reward model |
| Direct Preference Optimization | DPO (Direct Preference Optimization) | A lightweight technique that aligns directly from preference pairs without a reward model |
| RLAIF | RLAIF | Aligning a model where AI, rather than humans, generates the preference labels |
| Alignment | Alignment | The process of bringing model outputs in line with human intent and values |
| Reward Model | Reward Model | A model trained to score the quality and preference of responses |
| Continual / Continued Pretraining | Continual / Continued Pretraining | Continuing pretraining on data from a new domain |
| Catastrophic Forgetting | Catastrophic Forgetting | The phenomenon of losing previously learned knowledge through new training |
| Synthetic Data | Synthetic Data | Training data generated by models or rules |
5. Parameter-Efficient Fine-Tuning (PEFT)
| Term | English / Abbr. | One-line description |
|---|---|---|
| Parameter-Efficient Fine-Tuning | PEFT (Parameter-Efficient Fine-Tuning) | A family of techniques that train only a small subset of parameters, rather than all of them, to cut cost |
| LoRA | Low-Rank Adaptation | Approximating weight changes with two low-rank matrices, training only a small number of parameters |
| QLoRA | Quantized LoRA | Training LoRA while the base model is quantized to 4 bits |
| DoRA | Weight-Decomposed LoRA | A variant that decomposes weights into magnitude and direction to push LoRA's performance further |
| Adapter | Adapter | A PEFT approach that inserts small trainable modules between layers |
| Prompt Tuning | Prompt Tuning | Prepending a trainable "soft prompt" vector to the input |
| Prefix Tuning | Prefix Tuning | Adding a trainable prefix in front of the keys/values at each layer |
| Rank | Rank (r) | The size of LoRA's low-rank matrices. A trade-off between expressiveness and cost |
| Merge | Merge | Folding trained LoRA weights into the base to produce a single model |
| Full Fine-Tuning | Full Fine-Tuning | The traditional approach of updating all parameters (costly) |
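
The LoRA, Rank, and Merge entries can be illustrated with a few lines of arithmetic. This is a simplified sketch: the matrix names `B` and `A`, the tiny 3x3 base weight, and the 4096x4096 layer size are assumptions for illustration, and the scaling factor that LoRA implementations usually apply to the update is omitted.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x r) and b (r x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. the two LoRA matrices."""
    full = d_out * d_in              # every entry of the weight matrix
    lora = rank * (d_out + d_in)     # B is d_out x rank, A is rank x d_in
    return full, lora


def merge(base, B, A):
    """Fold the low-rank update B @ A into the base weights."""
    delta = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(base, delta)]


# Tiny rank-1 example.
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]    # 3 x 1
A = [[0.5, 0.0, 1.0]]        # 1 x 3
merged = merge(base, B, A)

full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(merged)
print(full, lora, full // lora)
```

For the assumed 4096x4096 layer, rank 8 trains 65,536 values instead of 16,777,216, which is the cost side of the rank trade-off. After merging, the result is a single weight matrix with no extra inference overhead.
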
6. Quantization, Compression & Serving Optimization
| Term | English / Abbr. | One-line description |
|---|---|---|
| Quantization | Quantization | Representing weights and activations in lower bits (e.g. INT4/INT8/FP8) for compression |
| Post-Training Quantization | PTQ (Post-Training Quantization) | Quantizing after training, with no additional training |
| Quantization-Aware Training | QAT (Quantization-Aware Training) | Training with quantization in mind to reduce accuracy loss |
| GPTQ / AWQ | GPTQ / AWQ | Leading post-training quantization algorithms (accuracy-preserving) |
| GGUF | GGUF | The quantized model file format used by the llama.cpp family |
| Knowledge Distillation | Knowledge Distillation | Transferring knowledge from a large teacher model to a small student model |
| Pruning | Pruning | Removing less important weights and connections to compress the model |
| KV Cache | KV Cache | Storing already-computed keys/values to speed up token generation |
| PagedAttention | PagedAttention | A memory technique that manages the KV cache in pages (vLLM) |
| Continuous Batching | Continuous Batching | A serving technique that dynamically groups requests to increase GPU utilization |
| Speculative Decoding | Speculative Decoding | Accelerating generation by having a small draft model propose several tokens ahead, which the large model then verifies in a single pass |
| Throughput / Latency | Throughput / Latency | Items processed per second / time taken until a response |
| Time To First Token | TTFT (Time To First Token) | The time from a request until the first token appears |
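
As a rough illustration of what quantization does to weights, the sketch below implements symmetric per-tensor INT8 quantization. It is a deliberately simple scheme chosen for clarity, not the algorithm used by GPTQ or AWQ, and the sample weights are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights (not taken from a real model).
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q, scale)
print(max_error)
```

Each weight now fits in one byte instead of two or four, at the price of a rounding error of at most half the scale. Lower bit widths shrink the model further but increase that error, which is what the accuracy-preserving algorithms in the table try to control.
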
7. Inference & Decoding Parameters
| Term | English / Abbr. | One-line description |
|---|---|---|
| Temperature | Temperature | Controls output randomness. Low is conservative, high is more varied |
| Top-k | Top-k Sampling | Sampling only from the k highest-probability tokens |
| Top-p | Nucleus Sampling | Sampling only from the smallest set of highest-probability tokens whose cumulative probability reaches p |
| Greedy Decoding | Greedy Decoding | Picking only the highest-probability token at each step |
| Beam Search | Beam Search | Tracking multiple candidate paths at once to find a better sequence |
| Repetition / Frequency Penalty | Repetition / Frequency Penalty | A correction that discourages repeating the same tokens |
| Max Tokens | Max Tokens | A limit on the maximum number of tokens to generate |
| Stop Sequence | Stop Sequence | A condition that halts generation when a specific string appears |
| Perplexity | Perplexity | A measure of how well a model predicts text (lower is better) |
| Structured Output | Structured Output / JSON Mode | Forcing output to conform to a schema (such as JSON) |
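
The decoding parameters above are all transformations applied to the next-token distribution before a token is chosen. The following sketch shows temperature, top-k, top-p, and greedy decoding on a made-up four-token vocabulary. The order in which the filters are chained here is an assumption; inference libraries differ in how they combine them.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    total = sum(probs)
    return [p / total for p in probs]


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize([q if i in keep else 0.0 for i, q in enumerate(probs)])


def greedy(probs):
    """Greedy decoding: always take the most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]                # illustrative logits for 4 tokens
probs = softmax(logits, temperature=0.7)
filtered = top_p_filter(top_k_filter(probs, k=3), p=0.9)
token_id = sample(filtered, random.Random(0))  # seeded for repeatability
print(probs, filtered, token_id, greedy(probs))
```
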
8. RAG, Retrieval & Vectors
| Term | English / Abbr. | One-line description |
|---|---|---|
| Retrieval-Augmented Generation | RAG (Retrieval-Augmented Generation) | An architecture that retrieves external documents and uses them as grounding to generate answers |
| Chunking | Chunking | Preprocessing that splits long documents into smaller units for retrieval and embedding |
| Chunk Overlap | Chunk Overlap | Overlapping part of each chunk at the boundaries to reduce loss of context |
| Vector DB | Vector DB | A database that stores and searches embedding vectors (e.g. pgvector, Milvus) |
| Similarity Search | Similarity Search | Search that finds vectors close to a query vector |
| Cosine Similarity | Cosine Similarity | A common metric that measures the directional similarity of two vectors |
| Approximate Nearest Neighbor | ANN (Approximate Nearest Neighbor) | Search that finds nearby vectors quickly by trading off a little accuracy |
| HNSW | HNSW | A leading graph-based ANN index |
| Dense / Sparse Retrieval | Dense / Sparse Retrieval | Embedding-based / keyword-based (BM25) retrieval |
| Hybrid Search | Hybrid Search | Search that combines dense and sparse retrieval to improve accuracy |
| Reranking | Reranking | Re-ordering first-stage retrieval results with a more precise model |
| Recall / Precision | Recall / Precision | How completely / how accurately relevant documents were retrieved |
| GraphRAG | GraphRAG | RAG augmented with a knowledge graph to strengthen relationships and summarization |
| Context Injection / Grounding | Context Injection / Grounding | Putting retrieved evidence into the prompt to tie answers to facts |
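
Two of the entries above, chunk overlap and recall/precision, are simple enough to show directly. The sketch below chunks by character count for brevity (production pipelines usually chunk by tokens or by document structure), and the document IDs in the evaluation example are hypothetical.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):   # the last chunk reached the end
            break
    return chunks


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)

# Hypothetical retrieval result vs. hypothetical ground truth.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision, recall)
```

In the chunking example each chunk begins with the last character of the previous one, so a sentence that straddles a boundary is not lost entirely to either side.
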
9. Prompt Engineering
| Term | English / Abbr. | One-line description |
|---|---|---|
| Prompt | Prompt | The input instruction given to a model |
| System Prompt | System Prompt | A higher-level instruction that defines the model's role and rules |
| Zero-shot / Few-shot | Zero-shot / Few-shot | Performing a task with no examples / with a few examples |
| In-Context Learning | In-Context Learning | The ability to pick up a task from examples in the prompt alone, without any weight updates |
| Chain-of-Thought | CoT (Chain-of-Thought) | A technique that writes out intermediate reasoning steps to improve accuracy |
| ReAct | Reasoning + Acting | A pattern that alternates between reasoning and tool use (acting) |
| Prompt Template | Prompt Template | A reusable prompt form into which variables are inserted |
| Prompt Caching | Prompt Caching | Caching the repeated prefix of a prompt to reduce cost and latency |
| Context Engineering | Context Engineering | The work of designing and managing the information (retrieval, memory, tools) fed to a model |
| Jailbreak | Jailbreak | A prompt attack that induces a model to bypass its safety measures |
| Prompt Injection | Prompt Injection | An attack that overrides instructions through external input to make a model misbehave |
10. Agents, Tools & Protocols
| Term | English / Abbr. | One-line description |
|---|---|---|
| Agent | Agent | A system that plans, uses tools, and iterates on its own toward a goal |
| Tool Use / Function Calling | Tool Use / Function Calling | A model calling external functions or APIs to extend its capabilities |
| Multi-Agent | Multi-Agent | An architecture where multiple agents divide roles and collaborate |
| Orchestration | Orchestration | Coordinating the execution flow across multiple steps, agents, and tools |
| Memory | Memory | The ability to store and recall conversational and task context (short-term/long-term) |
| MCP | Model Context Protocol | A standard protocol that connects models with external tools and data |
| A2A | Agent-to-Agent | A protocol for communication and collaboration between agents |
| Guardrails | Guardrails | Mechanisms that validate and filter inputs/outputs to enforce safety and policy |
| HITL | HITL (Human-in-the-Loop) | A design that inserts human approval or intervention into critical decisions |
| Autonomy Level | Autonomy Level | The degree to which an agent acts without human intervention |
| Agentic Workflow | Agentic Workflow | An agent-driven workflow that iterates plan, execute, and verify |
| Planning | Planning | An agent's ability to break a goal into sub-steps and order their execution |
| Task Decomposition | Task Decomposition | Splitting a large task into manageable subtasks |
| Reflection | Reflection / Self-Critique | A loop where the agent reviews and revises its own output to improve quality |
| Subagent | Subagent | A child agent to which a parent delegates a specific subtask |
| Handoff | Handoff | One agent passing work and context to another |
| Router | Router | Branching a request to the right tool, agent, or model |
| Trajectory | Trajectory | The full record of an agent's observation-action steps |
| Tool Schema | Tool Schema | A spec defining a tool's name, arguments, and types (JSON Schema) |
| Parallel Tool Calls | Parallel Tool Calls | Calling multiple tools at once to reduce latency |
| Computer Use | Computer Use | The ability to directly operate a GUI by controlling screen, mouse, and keyboard |
| Code Interpreter | Code Interpreter | A tool that runs code in a sandbox to compute and analyze |
| MCP Server | MCP Server | An external process that exposes tools, resources, and prompts over MCP |
| MCP Client | MCP Client | The model/host side that connects to an MCP server to use its tools and data |
| MCP Host | MCP Host | The application that embeds the MCP client and connects it to the model (e.g., IDE, chat app) |
| MCP Transport | MCP Transport (stdio/HTTP) | The MCP communication channel: local stdio or remote HTTP (Streamable HTTP in current spec versions; earlier versions used HTTP with SSE) |
| MCP Resource/Tool/Prompt | MCP Resource / Tool / Prompt | The three primitives an MCP server exposes (data to read / function to run / prompt template) |
| A2A Agent Card | A2A Agent Card | Metadata describing an agent's capabilities and endpoints to aid discovery |
| A2A Task | A2A Task | The unit of work exchanged between agents in A2A |
| Capability Discovery | Capability Discovery | Dynamically finding the capabilities of other agents and tools |
| Interoperability | Interoperability | The property of agents/tools from different vendors collaborating via standards |
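
To show how a tool schema and function calling fit together, here is a minimal sketch of a tool registry and dispatcher. The `get_weather` tool, its stubbed result, and the exact layout of the schema and of the tool-call JSON are hypothetical; each model provider and protocol defines its own wire format, so treat this as the general shape rather than any specific API.

```python
import json

# Hypothetical tool schema: name, description, and JSON Schema-style parameters.
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Return a canned weather report for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city):
    """Stub implementation; a real tool would call an external API."""
    return {"city": city, "forecast": "sunny"}


# Registry that maps tool names to (schema, implementation).
TOOLS = {"get_weather": (GET_WEATHER_SCHEMA, get_weather)}


def dispatch(tool_call_json):
    """Route a model-produced tool call to the matching function."""
    call = json.loads(tool_call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    schema, fn = TOOLS[name]
    missing = [key for key in schema["parameters"]["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return fn(**args)


# Pretend the model emitted this tool call.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Seoul"}}')
print(result)
```

The model only ever sees the schema; the host application owns the registry and decides whether and how a requested call is executed.
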
11. Multimodal & Generation
| Term | English / Abbr. | One-line description |
|---|---|---|
| Multimodal | Multimodal | A model that handles multiple modalities together, such as text, image, and audio |
| Vision-Language Model | VLM (Vision-Language Model) | A model that understands images and text together |
| Diffusion Model | Diffusion Model | An approach that generates images by progressively removing noise |
| Text-to-Image | Text-to-Image | Generating images from text descriptions (e.g. image generation models) |
| CLIP | CLIP | A model that links images and text by embedding them into the same space |
| Speech Recognition | ASR (Automatic Speech Recognition) / STT (Speech-to-Text) | Converting speech into text |
| Speech Synthesis | TTS (Text-to-Speech) | Converting text into speech |
| OCR | OCR | Recognizing characters within an image as text |
| Generative AI | Generative AI | AI that generates new text, images, audio, and code |
| Latent Diffusion | Latent Diffusion | Generation that runs diffusion in a compressed latent space for efficiency |
12. Evaluation, Safety & Operations
| Term | English / Abbr. | One-line description |
|---|---|---|
| Hallucination | Hallucination | The phenomenon of plausibly generating content that is not factual |
| Grounding | Grounding | Tying outputs to verifiable evidence (documents, data) |
| Benchmark | Benchmark | An evaluation that compares model performance on a standard dataset |
| Eval | Eval | The work of measuring the quality of a model or system quantitatively and qualitatively |
| LLM-as-a-Judge | LLM-as-a-Judge | An evaluation method that scores output quality with another LLM |
| Red Teaming | Red Teaming | A safety check that finds vulnerabilities and risks through deliberate attacks |
| Safety / Alignment Tax | Safety / Alignment Tax | Suppressing harmful output / the performance loss accepted for the sake of safety |
| Bias | Bias | Unfair tendencies inherent in data or a model |
| Observability | Observability | Tracking and monitoring tokens, latency, cost, and quality |
| LLMOps | LLMOps | The overall deployment, evaluation, monitoring, and operation of LLM apps |
| Token Cost | Token Cost | The API usage cost proportional to the number of input and output tokens |
| Guardrail / Policy Eval | Guardrail / Policy Eval | An evaluation that checks compliance with safety and policy |
13. Advanced & Emerging Topics
| Term | English / Abbr. | One-line description |
|---|---|---|
| Mixture of Experts | MoE (Mixture of Experts) | Activating only some expert networks per input to scale efficiently |
| Grouped-Query Attention | GQA (Grouped-Query Attention) | Attention in which each group of query heads shares one key/value head, shrinking the KV cache and memory use |
| Rotary Position Embedding | RoPE (Rotary Position Embedding) | Encoding position with rotation transforms, favorable for long contexts |
| FlashAttention | FlashAttention | Computing attention faster by optimizing memory access |
| State Space Model | SSM / Mamba | A family that processes long sequences efficiently with state spaces instead of attention |
| Long Context | Long Context | The ability to handle long contexts of hundreds of thousands to millions of tokens |
| Reasoning Model | Reasoning Model | A family of models trained to "think" longer before answering |
| Test-Time Compute | Test-Time Compute | A strategy that spends more compute at inference to improve accuracy |
| Thinking / Reasoning Tokens | Thinking / Reasoning Tokens | Internal reasoning tokens generated before the final answer |
| Context Caching | Context Caching | Caching a long shared context to cut the cost of repeated requests |
| Scaling Laws | Scaling Laws | Empirical rules describing the relationship between model, data, and compute scale and performance |
| Emergent Ability | Emergent Ability | An ability that appears suddenly as scale grows |
| Model Merging | Model Merging | A technique that combines the weights of multiple models to create a new one |
| Agentic / Long-term Memory | Agentic / Long-term Memory | Memory that lets an agent store and use information across sessions |
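
The memory benefit of grouped-query attention is straightforward to estimate. The sketch below computes KV cache size for one sequence; the model shape (32 layers, head dimension 128, 4096 tokens, 2 bytes per value) is an assumed example, not a description of any particular model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV cache size: keys and values for every layer and token."""
    keys_and_values = 2   # one tensor for K, one for V
    return (keys_and_values * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Assumed shapes: same model, 32 KV heads (multi-head) vs. 8 KV heads (grouped-query).
mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)

print(mha / 2**30, "GiB with 32 KV heads")
print(gqa / 2**30, "GiB with 8 KV heads")
```

Under these assumptions, cutting KV heads from 32 to 8 reduces the cache from 2 GiB to 0.5 GiB per sequence. Because the size also grows linearly with `seq_len`, the same formula shows why long-context serving puts so much pressure on memory.
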
14. MLOps & LLMOps
| Term | English / Abbr. | One-line description |
|---|---|---|
| MLOps | MLOps | The practice of automating and standardizing ML model training, deployment, monitoring, and operations |
| LLMOps | LLMOps | Operations specialized for LLM apps (prompts, evals, cost, safety, serving) |
| Model Registry | Model Registry | A store that manages model versions, metadata, and stages |
| Feature Store | Feature Store | A system that stores and serves features shared across training and serving |
| Experiment Tracking | Experiment Tracking | Recording and comparing hyperparameters, metrics, and artifacts (e.g., MLflow) |
| Model Serving | Model Serving | Exposing a trained model as an API or endpoint |
| Data/Model Versioning | Data / Model Versioning | Versioning datasets and models to ensure reproducibility |
| Reproducibility | Reproducibility | The property of getting the same result again from the same input and code |
| Drift | Drift (Data / Concept) | When the data distribution or input-output relationship changes over time |
| Model Monitoring | Model Monitoring | Tracking accuracy, drift, and anomalies in production |
| Canary / Shadow Deployment | Canary / Shadow Deployment | Validating a new model on a fraction of traffic / with no impact on live service |
| Champion-Challenger | Champion-Challenger | Running the production model alongside a candidate for comparison |
| A/B Test | A/B Test | Comparing two versions via traffic splitting |
| ML Pipeline | ML Pipeline | A workflow that automates data → training → evaluation → deployment |
| Model Card | Model Card | A spec documenting a model's purpose, limits, and evaluation |
| Prompt Versioning | Prompt Versioning | Version-controlling prompts and running regression evals |
| Golden Dataset | Golden Dataset | A vetted ground-truth dataset used as the baseline for regression evals |
| Tracing | Tracing (OpenTelemetry) | Tracking the steps (spans) a request passes through for debugging and analysis |
| LLM Gateway | LLM Gateway | Centralized routing, key management, logging, and limiting for model calls |
| Semantic Cache | Semantic Cache | Caching results of semantically similar requests to cut cost and latency |
| Fallback / Routing | Fallback / Routing | Rerouting to another model or branching on failure or overload |
| Token Budgeting | Token Budgeting | Managing token-usage caps per request or session |
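
Fallback routing, as typically implemented inside an LLM gateway, amounts to trying models in priority order. In this sketch the two model functions are stand-ins that simulate a timeout and a healthy backup; a real gateway would wrap actual API clients and would likely add retries and logging.

```python
def flaky_primary(prompt):
    """Stand-in for a primary model that is currently overloaded."""
    raise TimeoutError("primary model timed out")


def stable_backup(prompt):
    """Stand-in for a backup model that responds normally."""
    return f"backup answer to: {prompt}"


def call_with_fallback(prompt, models):
    """Try each (name, callable) in order; return the first success."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:          # record the failure and move on
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


used_model, answer = call_with_fallback(
    "hello",
    [("primary", flaky_primary), ("backup", stable_backup)],
)
print(used_model, answer)
```
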
15. AI Security
| Term | English / Abbr. | One-line description |
|---|---|---|
| OWASP LLM Top 10 | OWASP LLM Top 10 | A standard list of the top 10 security threats for LLM apps |
| Prompt Injection | Prompt Injection | An attack that overrides instructions via external input to make a model misbehave |
| Indirect Prompt Injection | Indirect Prompt Injection | Hiding malicious instructions in external data such as documents or web pages |
| Jailbreak | Jailbreak | An attack that coaxes the model into bypassing its safety guards |
| Insecure Output Handling | Insecure Output Handling | Vulnerabilities (XSS, code execution, etc.) from executing/rendering model output without validation |
| Excessive Agency | Excessive Agency | The risk of giving an agent too many tools/permissions, amplifying potential harm |
| Data Exfiltration / Leakage | Data Exfiltration / Leakage | Sensitive information leaking out through outputs or external calls |
| PII Exposure | PII Exposure | The risk of personally identifiable information being exposed in training or output |
| Data Poisoning | Data Poisoning | Contaminating a model by mixing malicious samples into training data |
| Model Extraction | Model Extraction / Stealing | Replicating or stealing a model by repeated querying |
| Membership Inference | Membership Inference | An attack to determine whether specific data was used in training |
| Model Inversion | Model Inversion | An attack that tries to reconstruct training data from outputs |
| Adversarial Example | Adversarial Example | An input that fools a model via tiny perturbations |
| Backdoor / Trojan | Backdoor / Trojan | A trap planted to act maliciously only on a specific trigger |
| Supply Chain Security | Supply Chain Security | Verifying the provenance and integrity of models, data, and dependencies |
| Model Provenance | Model Provenance | Traceability of where and how a model was built |
| Sandboxing | Sandboxing | Isolating tool/code execution to limit the blast radius |
| Least Privilege | Least Privilege | Granting tools and agents only the permissions they actually need |
| Tool Allowlist / Denylist | Tool Allowlist / Denylist | Explicitly restricting which tools can be called |
| Content Moderation | Content Moderation | Detecting and blocking harmful or policy-violating inputs/outputs |
| DLP | Data Loss Prevention | Controls that detect and block leakage of sensitive information |
| Prompt Firewall | Prompt Firewall / Guardrails | A security layer that inspects inputs/outputs to block attacks and violations |
| Audit Log | Audit Log | Recording who did what for traceability and post-incident analysis |
| Watermarking | Watermarking | Embedding an identifying signal in generated content to mark its origin |
| Human Approval | Human Approval (HITL) | A control that requires human sign-off before risky actions |
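
Several of the controls above (tool allowlist, least privilege, human approval, audit log) often live in a single authorization check that runs before any tool call. The sketch below shows one possible shape; the tool names and the three-way `allow` / `needs_approval` / `deny` result are assumptions for illustration.

```python
# Hypothetical policy: which tools exist for this agent, and which are risky.
ALLOWED_TOOLS = {"search_docs", "read_file", "send_email"}
REQUIRES_APPROVAL = {"send_email"}

audit_log = []   # in practice this would be durable, append-only storage


def authorize(agent_id, tool_name, approved_by_human=False):
    """Decide whether a tool call may run, and record the decision."""
    if tool_name not in ALLOWED_TOOLS:
        decision = "deny"                   # not on the allowlist
    elif tool_name in REQUIRES_APPROVAL and not approved_by_human:
        decision = "needs_approval"         # pause for human sign-off
    else:
        decision = "allow"
    audit_log.append({"agent": agent_id, "tool": tool_name, "decision": decision})
    return decision


print(authorize("agent-1", "search_docs"))
print(authorize("agent-1", "drop_database"))
print(authorize("agent-1", "send_email"))
print(authorize("agent-1", "send_email", approved_by_human=True))
```

Denying by default and logging every decision keeps the blast radius small even if a prompt injection persuades the model to request a tool it should not use.
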
Wrapping Up
That covers the AI and LLM terms most commonly used in practice today. Terminology keeps growing, so we plan to update this article periodically. If you want to go deeper, we recommend reading the in-house blog articles dedicated to RAG, embeddings, fine-tuning, and agents.
What matters more than memorizing terms is grasping the relationships between concepts. Use the "Term Map" above as a starting point, and get comfortable with one category at a time.