AI Glossary — LLM, RAG, Fine-Tuning, and Agent Terms in One Place

From LoRA, QLoRA, embeddings, and chunking to RAG, inference optimization, agents, and multimodal — the essential AI terms you need in practice, organized by category.

Data Dynamics · June 24, 2026 · 24 min read

The AI and LLM field grows new terminology fast, and the same concept often goes by many different names. This article is a reference that organizes the terms you run into most often in practice into tables grouped by category. Use it to quickly look up an unfamiliar term, or to align your team's shared vocabulary. Each entry aims for a "one-line definition," and topics that deserve a deeper treatment link out to other in-house articles.

This glossary is a living document. As the field shifts, we keep it updated. Suggestions for missing terms or more accurate definitions are always welcome.

Term Map

First, the big picture. Follow the categories below to jump straight to the area you need.

1. Fundamentals

Term	English / Abbr.	One-line description
Artificial Intelligence	AI (Artificial Intelligence)	The umbrella term for technology that lets machines perform human cognitive tasks
Machine Learning	ML (Machine Learning)	A branch of AI that learns rules from data
Deep Learning	DL (Deep Learning)	Machine learning based on multi-layer neural networks
Neural Network	Neural Network	A model that maps inputs to outputs through layers of neurons connected by weights
Parameter	Parameter / Weight	The numeric values inside a model (weights and biases) adjusted during training
Hyperparameter	Hyperparameter	Settings chosen by a human before training, such as learning rate and batch size
Training / Inference	Training / Inference	The process of adjusting parameters / producing results with a trained model
Supervised / Unsupervised / RL	Supervised / Unsupervised / RL	Learning paradigms: using labeled answers / without labels / learning from rewards
Overfitting / Generalization	Overfitting / Generalization	Fitting only the training data / performing well on unseen data too
Loss Function	Loss Function	A training objective that quantifies the gap between predictions and ground truth
Gradient Descent	Gradient Descent	An optimization technique that updates parameters in the direction that reduces loss
Backpropagation	Backpropagation	An algorithm that propagates the loss gradient backward to update weights
Epoch / Batch	Epoch / Batch	One full pass over the data / a group of samples processed at once
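
To make the loss function, gradient descent, epoch, and hyperparameter entries concrete, here is a minimal sketch in plain Python that fits a line y = w*x + b with mean squared error. The toy data, the learning rate, and the epoch count are illustrative assumptions, not recommendations.

```python
# Toy data generated from y = 2x + 1 (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_loss(w: float, b: float) -> float:
    """Loss function: mean squared gap between predictions and ground truth."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w: float, b: float) -> tuple[float, float]:
    """Analytic gradients of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def train(learning_rate: float = 0.05, epochs: int = 2000) -> tuple[float, float]:
    """Full-batch gradient descent: every epoch is one pass over all samples."""
    w, b = 0.0, 0.0  # parameters, adjusted during training
    for _ in range(epochs):
        dw, db = gradients(w, b)
        w -= learning_rate * dw  # step against the gradient to reduce the loss
        b -= learning_rate * db
    return w, b


w, b = train()
print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b):.6f}")
```

The learning rate and epoch count are hyperparameters (chosen before training), while w and b are parameters (learned during training). In a neural network, backpropagation plays the role of the hand-written `gradients` function.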

2. LLM & Transformer Architecture

Term	English / Abbr.	One-line description
Large Language Model	LLM (Large Language Model)	A large-scale model trained on vast text to understand and generate language
Transformer	Transformer	The attention-based architecture underlying modern LLMs
Attention	Attention	A mechanism that computes the mutual importance of input tokens as weights
Self-Attention	Self-Attention	Attention that computes relationships among tokens within a single sequence
Multi-Head Attention	Multi-Head Attention	Running multiple attentions in parallel to capture diverse relationships
Encoder / Decoder	Encoder / Decoder	The input-understanding part / the output-generation part. GPT-family models are decoder-only
Foundation Model	Foundation Model	A generally pretrained model that serves as the basis for many tasks
Context Window	Context Window	The maximum number of tokens a model can handle at once
Positional Encoding	Positional Encoding	A way to inject token order information into vectors
Model Size	Model Size (e.g. 7B, 70B)	A measure of model scale by parameter count. B stands for billion (7B = 7 billion parameters)
Instruct / Chat Model	Instruct / Chat Model	A model variant post-trained (aligned) for instruction following and dialogue
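
The attention and self-attention entries can be sketched as single-head scaled dot-product attention in plain Python. This is a simplified illustration: the query, key, and value vectors are passed in directly rather than produced by learned projections, and the toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values, causal=True):
    """Single-head scaled dot-product attention over one sequence.

    Each argument is a list with one vector per token. In a real Transformer
    these come from learned projections of the token representations; here
    they are passed in directly (an assumption of this sketch).
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Decoder-only (GPT-style) models hide future positions from token i.
        visible = list(range(i + 1)) if causal else list(range(len(keys)))
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the visible values.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, visible))
            for c in range(len(values[0]))
        ])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional token vectors
attended = self_attention(tokens, tokens, tokens)
```

Multi-head attention runs several such computations in parallel on different projections and concatenates the results.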

3. Tokens, Embeddings & Representations

Term	English / Abbr.	One-line description
Token	Token	The smallest unit of text a model processes (word, subword, or character)
Tokenizer	Tokenizer	The module that splits text into tokens and converts them to IDs
Tokenization	Tokenization	The process of converting text into a token sequence (e.g. BPE, SentencePiece)
Vocabulary	Vocabulary	The full set of tokens the tokenizer knows
Embedding	Embedding	A representation that converts a token, sentence, or document into a high-dimensional real-valued vector carrying meaning
Embedding Model	Embedding Model	A dedicated model that turns text into vectors (the core of retrieval and RAG)
Dimension	Dimension	The length of an embedding vector (e.g. 768, 1536, 3072 dimensions)
Normalization	Normalization	Scaling a vector to unit length (L2 norm of 1) so that its dot product with another unit vector equals their cosine similarity
Latent / Vector Space	Latent / Vector Space	A representation space where semantically similar items sit close together
Logits	Logits	The unnormalized scores for each token, just before softmax
Probability Distribution	Probability Distribution	The next-token probabilities obtained by normalizing the logits
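
Two of the relationships in this table are easy to check numerically: logits become a probability distribution through softmax, and normalizing embeddings to unit length makes a plain dot product equal to cosine similarity. The sketch below uses made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.

```python
import math


def softmax(logits):
    """Turn unnormalized scores (logits) into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Directional similarity in [-1, 1], independent of vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


doc = [0.2, 0.9, 0.4]    # hypothetical embedding of a document
query = [0.1, 0.8, 0.5]  # hypothetical embedding of a query

probs = softmax([2.0, 1.0, 0.1])
similarity = cosine_similarity(doc, query)
similarity_from_unit_vectors = dot(l2_normalize(doc), l2_normalize(query))
```

Because the two similarity values agree, many retrieval systems normalize embeddings once at indexing time and then use the cheaper dot product at query time.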

4. Training Stages

Term	English / Abbr.	One-line description
Pretraining	Pretraining	The first stage of training, learning next-token prediction on a large corpus
Fine-tuning	Fine-tuning	Further training a pretrained model on task- or domain-specific data
Supervised Fine-Tuning	SFT (Supervised Fine-Tuning)	Learning instruction-following ability from input-answer pairs
Instruction Tuning	Instruction Tuning	Training the model to "follow instructions" from diverse instruction-response data
RLHF	RLHF (Reinforcement Learning from Human Feedback)	Aligning a model with reinforcement learning against a reward model trained on human preference data
Direct Preference Optimization	DPO (Direct Preference Optimization)	A lightweight technique that aligns directly from preference pairs without a reward model
RLAIF	RLAIF (Reinforcement Learning from AI Feedback)	An RLHF variant in which an AI model, rather than humans, generates the preference labels
Alignment	Alignment	The process of bringing model outputs in line with human intent and values
Reward Model	Reward Model	A model trained to score the quality and preference of responses
Continual / Continued Pretraining	Continual / Continued Pretraining	Continuing pretraining on data from a new domain
Catastrophic Forgetting	Catastrophic Forgetting	The phenomenon of losing previously learned knowledge through new training
Synthetic Data	Synthetic Data	Training data generated by models or rules
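
To show why DPO needs no separate reward model, the sketch below computes the DPO loss for a single preference pair from four log-probabilities: the chosen and rejected responses scored by the model being trained (the policy) and by a frozen reference model. The function name, the default `beta`, and the example numbers are assumptions for illustration; in practice each log-probability is summed over the tokens of the response.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen response: the loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

Minimizing this loss pushes the policy to raise the chosen response's likelihood relative to the rejected one, while `beta` controls how far it may drift from the reference model.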

5. Parameter-Efficient Fine-Tuning (PEFT)

Term	English / Abbr.	One-line description
Parameter-Efficient Fine-Tuning	PEFT (Parameter-Efficient Fine-Tuning)	A family of techniques that train only a small subset of parameters, rather than all of them, to cut cost
LoRA	Low-Rank Adaptation	Approximating weight changes with two low-rank matrices, training only a small number of parameters
QLoRA	Quantized LoRA	Training LoRA while the base model is quantized to 4 bits
DoRA	Weight-Decomposed LoRA	A variant that decomposes weights into magnitude and direction to push LoRA's performance further
Adapter	Adapter	A PEFT approach that inserts small trainable modules between layers
Prompt Tuning	Prompt Tuning	Prepending a trainable "soft prompt" vector to the input
Prefix Tuning	Prefix Tuning	Adding a trainable prefix in front of the keys/values at each layer
Rank	Rank (r)	The size of LoRA's low-rank matrices. A trade-off between expressiveness and cost
Merge	Merge	Folding trained LoRA weights into the base to produce a single model
Full Fine-Tuning	Full Fine-Tuning	The traditional approach of updating all parameters (costly)
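
The LoRA, Rank, and Merge entries fit together as follows: the frozen weight matrix W is adjusted by a product of two small matrices, B (d_out × r) and A (r × d_in), scaled by alpha / r. The sketch below uses plain Python lists and tiny made-up matrices; `alpha` is the LoRA scaling hyperparameter, which the table does not mention.

```python
def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    return d_out * d_in, r * (d_out + d_in)


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [y + (alpha / r) * u for y, u in zip(base, update)]


def merge(W, A, B, alpha, r):
    """Fold the adapter into the base weights: W + (alpha / r) * B @ A."""
    delta = matmul(B, A)
    return [[w + (alpha / r) * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # frozen base weights (2 x 3)
A = [[0.1, 0.2, 0.3]]                   # trainable, r x d_in with r = 1
B = [[1.0], [-1.0]]                     # trainable, d_out x r
x = [1.0, 0.0, -1.0]

with_adapter = lora_forward(W, A, B, x, alpha=2.0, r=1)
after_merge = matvec(merge(W, A, B, alpha=2.0, r=1), x)
full_params, lora_params = lora_param_counts(4096, 4096, r=8)
```

For a 4096 × 4096 layer at rank 8, the adapter trains 65,536 values instead of 16,777,216. After merging, inference uses a single matrix and the adapter adds no extra latency.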

6. Quantization, Compression & Serving Optimization

Term	English / Abbr.	One-line description
Quantization	Quantization	Representing weights and activations in lower bits (e.g. INT4/INT8/FP8) for compression
Post-Training Quantization	PTQ (Post-Training Quantization)	Quantizing after training, with no additional training
Quantization-Aware Training	QAT (Quantization-Aware Training)	Training with quantization in mind to reduce accuracy loss
GPTQ / AWQ	GPTQ / AWQ	Leading post-training quantization algorithms (accuracy-preserving)
GGUF	GGUF	The single-file model format used by llama.cpp and related tools, most often to distribute quantized weights
Knowledge Distillation	Knowledge Distillation	Transferring knowledge from a large teacher model to a small student model
Pruning	Pruning	Removing less important weights and connections to compress the model
KV Cache	KV Cache	Storing already-computed keys/values to speed up token generation
PagedAttention	PagedAttention	A memory technique that manages the KV cache in pages (vLLM)
Continuous Batching	Continuous Batching	A serving technique that dynamically groups requests to increase GPU utilization
Speculative Decoding	Speculative Decoding	Accelerating generation by having a small draft model propose several tokens that the large model then verifies in parallel
Throughput / Latency	Throughput / Latency	Items processed per second / time taken until a response
Time To First Token	TTFT (Time To First Token)	The time from a request until the first token appears
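
As a rough illustration of post-training quantization, the sketch below applies symmetric "absmax" INT8 quantization to a handful of weights: one scale factor maps the largest absolute weight to 127, and every weight is rounded to the nearest integer step. Methods such as GPTQ and AWQ are considerably more sophisticated; this shows only the basic idea, and the sample weights are made up.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: floats -> integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]  # made-up FP32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs 8 bits instead of 32, at the cost of a rounding error of at most half a quantization step (`scale / 2`).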

7. Inference & Decoding Parameters

Term	English / Abbr.	One-line description
Temperature	Temperature	Controls output randomness. Low is conservative, high is more varied
Top-k	Top-k Sampling	Sampling only from the k highest-probability tokens
Top-p	Nucleus Sampling	Sampling from the smallest set of highest-probability tokens whose cumulative probability reaches p
Greedy Decoding	Greedy Decoding	Picking only the highest-probability token at each step
Beam Search	Beam Search	Tracking multiple candidate paths at once to find a better sequence
Repetition / Frequency Penalty	Repetition / Frequency Penalty	A correction that discourages repeating the same tokens
Max Tokens	Max Tokens	A limit on the maximum number of tokens to generate
Stop Sequence	Stop Sequence	A condition that halts generation when a specific string appears
Perplexity	Perplexity	A measure of how well a model predicts text (lower is better)
Structured Output	Structured Output / JSON Mode	Forcing output to conform to a schema (such as JSON)
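
The interaction between temperature, top-k, top-p, and greedy decoding is easier to see in code. The sketch below operates on a toy list of four logits (indices stand in for token IDs); the logit values and the fixed random seed are assumptions for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution; higher flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(kept):
    total = sum(kept.values())
    return {token_id: p / total for token_id, p in kept.items()}


def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize({i: probs[i] for i in ranked[:k]})


def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = {}, 0.0
    for i in ranked:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(kept)


def greedy(logits):
    """Always pick the single most likely token (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(distribution, rng):
    """Draw one token ID from a {token_id: probability} mapping."""
    token_ids = list(distribution)
    weights = [distribution[i] for i in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax_with_temperature(logits, temperature=1.0)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
next_token = sample(top_p_filter(probs, p=0.8), rng)
```

With these logits, top-p at 0.8 keeps only the two most likely tokens, so the low-probability tail can never be sampled.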

8. RAG, Retrieval & Vectors

Term	English / Abbr.	One-line description
Retrieval-Augmented Generation	RAG (Retrieval-Augmented Generation)	An architecture that retrieves external documents and uses them as grounding to generate answers
Chunking	Chunking	Preprocessing that splits long documents into smaller units for retrieval and embedding
Chunk Overlap	Chunk Overlap	Overlapping part of each chunk at the boundaries to reduce loss of context
Vector DB	Vector DB	A database that stores and searches embedding vectors (e.g. pgvector, Milvus)
Similarity Search	Similarity Search	Search that finds vectors close to a query vector
Cosine Similarity	Cosine Similarity	A common metric that measures the directional similarity of two vectors
Approximate Nearest Neighbor	ANN (Approximate Nearest Neighbor)	Search that finds nearby vectors quickly by trading off a little accuracy
HNSW	HNSW	A leading graph-based ANN index
Dense / Sparse Retrieval	Dense / Sparse Retrieval	Embedding-based / keyword-based (BM25) retrieval
Hybrid Search	Hybrid Search	Search that combines dense and sparse retrieval to improve accuracy
Reranking	Reranking	Re-ordering first-stage retrieval results with a more precise model
Recall / Precision	Recall / Precision	The share of relevant documents that were retrieved / the share of retrieved documents that are relevant
GraphRAG	GraphRAG	RAG augmented with a knowledge graph to strengthen relationships and summarization
Context Injection / Grounding	Context Injection / Grounding	Putting retrieved evidence into the prompt to tie answers to facts
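
Chunking with overlap and the recall/precision metrics can both be sketched in a few lines. The chunker below splits on raw character counts, which is an assumption made for brevity; production pipelines more often split on tokens, sentences, or document structure. The document IDs are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def recall_precision(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Recall: relevant docs found. Precision: retrieved docs that are relevant."""
    hits = len(set(retrieved) & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
recall, precision = recall_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
```

Here each chunk repeats the last two characters of the previous one, so content that straddles a boundary appears intact in at least one chunk. In the retrieval example, two of the three relevant documents were found (recall 2/3) and two of the four results were relevant (precision 1/2).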

9. Prompt Engineering

Term	English / Abbr.	One-line description
Prompt	Prompt	The input instruction given to a model
System Prompt	System Prompt	A higher-level instruction that defines the model's role and rules
Zero-shot / Few-shot	Zero-shot / Few-shot	Performing a task with no examples / with a few examples
In-Context Learning	In-Context Learning	The ability to learn a task from examples in the prompt alone, without training
Chain-of-Thought	CoT (Chain-of-Thought)	A technique that writes out intermediate reasoning steps to improve accuracy
ReAct	Reasoning + Acting	A pattern that alternates between reasoning and tool use (acting)
Prompt Template	Prompt Template	A reusable prompt form into which variables are inserted
Prompt Caching	Prompt Caching	Caching the repeated prefix of a prompt to reduce cost and latency
Context Engineering	Context Engineering	The work of designing and managing the information (retrieval, memory, tools) fed to a model
Jailbreak	Jailbreak	A prompt attack that induces a model to bypass its safety measures
Prompt Injection	Prompt Injection	An attack that overrides instructions through external input to make a model misbehave
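
A prompt template with zero-shot and few-shot variants might look like the sketch below, which uses only `string.Template` from the standard library. The "Input:/Output:" layout and the sentiment task are assumptions for illustration. Chat APIs usually take the system prompt as a separate message; it is flattened into one string here to keep the example short.

```python
from string import Template

# Reusable form: variables are filled in per example and per query.
SHOT = Template("Input: $input\nOutput: $output")


def build_prompt(system_prompt: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a prompt; an empty `examples` list yields a zero-shot prompt."""
    parts = [system_prompt]
    parts += [SHOT.substitute(input=i, output=o) for i, o in examples]
    # The final shot leaves the output blank for the model to complete.
    parts.append(SHOT.substitute(input=query, output="").rstrip())
    return "\n\n".join(parts)


zero_shot = build_prompt("Classify the sentiment.", [], "great")
few_shot = build_prompt(
    "Classify the sentiment.",
    [("I loved it", "positive"), ("Never again", "negative")],
    "great",
)
```

The few-shot version relies on in-context learning: the model infers the task format from the two worked examples without any weight updates.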

10. Agents, Tools & Protocols

Term	English / Abbr.	One-line description
Agent	Agent	A system that plans, uses tools, and iterates on its own toward a goal
Tool Use / Function Calling	Tool Use / Function Calling	A model calling external functions or APIs to extend its capabilities
Multi-Agent	Multi-Agent	An architecture where multiple agents divide roles and collaborate
Orchestration	Orchestration	Coordinating the execution flow across multiple steps, agents, and tools
Memory	Memory	The ability to store and recall conversational and task context (short-term/long-term)
MCP	Model Context Protocol	A standard protocol that connects models with external tools and data
A2A	Agent-to-Agent	A protocol for communication and collaboration between agents
Guardrails	Guardrails	Mechanisms that validate and filter inputs/outputs to enforce safety and policy
HITL	HITL (Human-in-the-Loop)	A design that inserts human approval or intervention into critical decisions
Autonomy Level	Autonomy Level	The degree to which an agent acts without human intervention
Agentic Workflow	Agentic Workflow	An agent-driven workflow that iterates plan, execute, and verify
Planning	Planning	An agent's ability to break a goal into sub-steps and order their execution
Task Decomposition	Task Decomposition	Splitting a large task into manageable subtasks
Reflection	Reflection / Self-Critique	A loop where the agent reviews and revises its own output to improve quality
Subagent	Subagent	A child agent to which a parent delegates a specific subtask
Handoff	Handoff	One agent passing work and context to another
Router	Router	Branching a request to the right tool, agent, or model
Trajectory	Trajectory	The full record of an agent's observation-action steps
Tool Schema	Tool Schema	A spec defining a tool's name, arguments, and types (JSON Schema)
Parallel Tool Calls	Parallel Tool Calls	Calling multiple tools at once to reduce latency
Computer Use	Computer Use	The ability to directly operate a GUI by controlling screen, mouse, and keyboard
Code Interpreter	Code Interpreter	A tool that runs code in a sandbox to compute and analyze
MCP Server	MCP Server	An external process that exposes tools, resources, and prompts over MCP
MCP Client	MCP Client	The model/host side that connects to an MCP server to use its tools and data
MCP Host	MCP Host	The application that embeds the MCP client and connects it to the model (e.g., IDE, chat app)
MCP Transport	MCP Transport (stdio/HTTP)	The MCP communication channel: local stdio or remote HTTP/SSE
MCP Resource/Tool/Prompt	MCP Resource / Tool / Prompt	The three primitives an MCP server exposes (data to read / function to run / prompt template)
A2A Agent Card	A2A Agent Card	Metadata describing an agent's capabilities and endpoints to aid discovery
A2A Task	A2A Task	The unit of work exchanged between agents in A2A
Capability Discovery	Capability Discovery	Dynamically finding the capabilities of other agents and tools
Interoperability	Interoperability	The property of agents/tools from different vendors collaborating via standards
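
The tool schema and function calling entries can be illustrated with a small dispatcher. The sketch below is deliberately minimal and tied to no particular vendor API or to MCP: the `get_weather` tool, its canned response, and the shape of the tool-call JSON are all hypothetical, and the check covers only required arguments rather than full JSON Schema validation.

```python
import json

# Hypothetical tool registry: each entry pairs a JSON Schema-style spec
# with the Python function that actually runs the tool.
TOOLS = {
    "get_weather": {
        "description": "Look up the current weather for a city (stubbed).",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": lambda city: {"city": city, "forecast": "sunny"},
    },
}


def dispatch(tool_call_json: str) -> dict:
    """Run a model-issued tool call and return a result or an error."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return {"error": f"unknown tool: {call.get('name')}"}
    args = call.get("arguments", {})
    missing = [k for k in tool["parameters"]["required"] if k not in args]
    if missing:
        return {"error": f"missing arguments: {missing}"}
    return {"result": tool["handler"](**args)}


ok = dispatch('{"name": "get_weather", "arguments": {"city": "Seoul"}}')
bad = dispatch('{"name": "delete_everything", "arguments": {}}')
```

Returning errors as data rather than raising lets an agent loop feed the failure back to the model so it can correct the call on the next step.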

11. Multimodal & Generation

Term	English / Abbr.	One-line description
Multimodal	Multimodal	A model that handles multiple modalities together, such as text, image, and audio
Vision-Language Model	VLM (Vision-Language Model)	A model that understands images and text together
Diffusion Model	Diffusion Model	An approach that generates images by progressively removing noise
Text-to-Image	Text-to-Image	Generating images from text descriptions (e.g. image generation models)
CLIP	CLIP	A model that links images and text by embedding them into the same space
Speech Recognition	ASR (Automatic Speech Recognition) / STT (Speech-to-Text)	Converting speech into text
Speech Synthesis	TTS (Text-to-Speech)	Converting text into speech
OCR	OCR (Optical Character Recognition)	Recognizing characters within an image as text
Generative AI	Generative AI	AI that generates new text, images, audio, and code
Latent Diffusion	Latent Diffusion	Generation that runs diffusion in a compressed latent space for efficiency

12. Evaluation, Safety & Operations

Term	English / Abbr.	One-line description
Hallucination	Hallucination	The phenomenon of plausibly generating content that is not factual
Grounding	Grounding	Tying outputs to verifiable evidence (documents, data)
Benchmark	Benchmark	An evaluation that compares model performance on a standard dataset
Eval	Eval	The work of measuring the quality of a model or system quantitatively and qualitatively
LLM-as-a-Judge	LLM-as-a-Judge	An evaluation method that scores output quality with another LLM
Red Teaming	Red Teaming	A safety check that finds vulnerabilities and risks through deliberate attacks
Safety / Alignment Tax	Safety / Alignment Tax	Suppressing harmful output / the performance loss accepted for the sake of safety
Bias	Bias	Unfair tendencies inherent in data or a model
Observability	Observability	Tracking and monitoring tokens, latency, cost, and quality
LLMOps	LLMOps	The overall deployment, evaluation, monitoring, and operation of LLM apps
Token Cost	Token Cost	The API usage cost proportional to the number of input and output tokens
Guardrail / Policy Eval	Guardrail / Policy Eval	An evaluation that checks compliance with safety and policy

13. Advanced & Emerging Topics

Term	English / Abbr.	One-line description
Mixture of Experts	MoE (Mixture of Experts)	Activating only some expert networks per input to scale efficiently
Grouped-Query Attention	GQA (Grouped-Query Attention)	Attention in which groups of query heads share one key/value head, shrinking the KV cache and memory use
Rotary Position Embedding	RoPE (Rotary Position Embedding)	Encoding position with rotation transforms, favorable for long contexts
FlashAttention	FlashAttention	Computing attention faster by optimizing memory access
State Space Model	SSM / Mamba	A family that processes long sequences efficiently with state spaces instead of attention
Long Context	Long Context	The ability to handle long contexts of hundreds of thousands to millions of tokens
Reasoning Model	Reasoning Model	A family of models trained to "think" longer before answering
Test-Time Compute	Test-Time Compute	A strategy that spends more compute at inference to improve accuracy
Thinking / Reasoning Tokens	Thinking / Reasoning Tokens	Internal reasoning tokens generated before the final answer
Context Caching	Context Caching	Caching a long shared context to cut the cost of repeated requests
Scaling Laws	Scaling Laws	Empirical rules relating model performance to model size, data size, and training compute
Emergent Ability	Emergent Ability	An ability that appears suddenly as scale grows
Model Merging	Model Merging	A technique that combines the weights of multiple models to create a new one
Agentic / Long-term Memory	Agentic / Long-term Memory	Memory that lets an agent store and use information across sessions

14. MLOps & LLMOps

Term	English / Abbr.	One-line description
MLOps	MLOps	The practice of automating and standardizing ML model training, deployment, monitoring, and operations
LLMOps	LLMOps	Operations specialized for LLM apps (prompts, evals, cost, safety, serving)
Model Registry	Model Registry	A store that manages model versions, metadata, and stages
Feature Store	Feature Store	A system that stores and serves features shared across training and serving
Experiment Tracking	Experiment Tracking	Recording and comparing hyperparameters, metrics, and artifacts (e.g., MLflow)
Model Serving	Model Serving	Exposing a trained model as an API or endpoint
Data/Model Versioning	Data / Model Versioning	Versioning datasets and models to ensure reproducibility
Reproducibility	Reproducibility	The property of getting the same result again from the same input and code
Drift	Drift (Data / Concept)	When the data distribution or input-output relationship changes over time
Model Monitoring	Model Monitoring	Tracking accuracy, drift, and anomalies in production
Canary / Shadow Deployment	Canary / Shadow Deployment	Validating a new model on a small fraction of live traffic / on mirrored traffic whose responses are never returned to users
Champion-Challenger	Champion-Challenger	Running the production model alongside a candidate for comparison
A/B Test	A/B Test	Comparing two versions via traffic splitting
ML Pipeline	ML Pipeline	A workflow that automates data → training → evaluation → deployment
Model Card	Model Card	A spec documenting a model's purpose, limits, and evaluation
Prompt Versioning	Prompt Versioning	Version-controlling prompts and running regression evals
Golden Dataset	Golden Dataset	A vetted ground-truth dataset used as the baseline for regression evals
Tracing	Tracing (OpenTelemetry)	Tracking the steps (spans) a request passes through for debugging and analysis
LLM Gateway	LLM Gateway	Centralized routing, key management, logging, and rate limiting for model calls
Semantic Cache	Semantic Cache	Caching results of semantically similar requests to cut cost and latency
Fallback / Routing	Fallback / Routing	Rerouting to another model or branching on failure or overload
Token Budgeting	Token Budgeting	Managing token-usage caps per request or session
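
Fallback routing, as an LLM gateway might implement it, can be sketched as a loop over an ordered list of model callables. Everything named here is hypothetical: `flaky_primary` and `stable_backup` are stubs standing in for real model clients, and a production gateway would also add timeouts, retries, and logging.

```python
from typing import Callable, Sequence


def call_with_fallback(
    prompt: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each model in order; return (model_name, response) from the first success."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # treat any failure as a reason to fall back
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Stub that simulates an overloaded primary model."""
    raise TimeoutError("overloaded")


def stable_backup(prompt: str) -> str:
    """Stub that simulates a healthy backup model."""
    return "ok: " + prompt


served_by, response = call_with_fallback(
    "hi", [("primary", flaky_primary), ("backup", stable_backup)]
)
```

Returning the name of the model that actually served the request makes it possible to track fallback rates in monitoring.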

15. AI Security

Term	English / Abbr.	One-line description
OWASP LLM Top 10	OWASP Top 10 for LLM Applications	OWASP's community-maintained list of the ten most critical security risks for LLM apps
Prompt Injection	Prompt Injection	An attack that overrides instructions via external input to make a model misbehave
Indirect Prompt Injection	Indirect Prompt Injection	Hiding malicious instructions in external data such as documents or web pages
Jailbreak	Jailbreak	An attack that coaxes the model into bypassing its safety guards
Insecure Output Handling	Insecure Output Handling	Vulnerabilities (XSS, code execution, etc.) from executing/rendering model output without validation
Excessive Agency	Excessive Agency	The risk of giving an agent too many tools/permissions, amplifying potential harm
Data Exfiltration / Leakage	Data Exfiltration / Leakage	Sensitive information leaking out through outputs or external calls
PII Exposure	PII Exposure	The risk of personally identifiable information being exposed in training or output
Data Poisoning	Data Poisoning	Contaminating a model by mixing malicious samples into training data
Model Extraction	Model Extraction / Stealing	Replicating or stealing a model by repeated querying
Membership Inference	Membership Inference	An attack to determine whether specific data was used in training
Model Inversion	Model Inversion	An attack that tries to reconstruct training data from outputs
Adversarial Example	Adversarial Example	An input that fools a model via tiny perturbations
Backdoor / Trojan	Backdoor / Trojan	A trap planted to act maliciously only on a specific trigger
Supply Chain Security	Supply Chain Security	Verifying the provenance and integrity of models, data, and dependencies
Model Provenance	Model Provenance	Traceability of where and how a model was built
Sandboxing	Sandboxing	Isolating tool/code execution to limit the blast radius
Least Privilege	Least Privilege	Granting tools and agents only the permissions they actually need
Tool Allowlist / Denylist	Tool Allowlist / Denylist	Explicitly restricting which tools can be called
Content Moderation	Content Moderation	Detecting and blocking harmful or policy-violating inputs/outputs
DLP	Data Loss Prevention	Controls that detect and block leakage of sensitive information
Prompt Firewall	Prompt Firewall / Guardrails	A security layer that inspects inputs/outputs to block attacks and violations
Audit Log	Audit Log	Recording who did what for traceability and post-incident analysis
Watermarking	Watermarking	Embedding an identifying signal in generated content to mark its origin
Human Approval	Human Approval (HITL)	A control that requires human sign-off before risky actions
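
Least privilege, tool allowlists, and human approval combine naturally into a single authorization check that runs before every tool call. The sketch below is one possible shape under stated assumptions: the tool names and the three-way decision values are hypothetical, and anything not explicitly listed is denied by default.

```python
# Hypothetical policy: tools the agent may call freely, and tools that
# are permitted only after a human signs off.
ALLOWED_TOOLS = {"search_docs", "read_file"}
NEEDS_APPROVAL = {"send_email", "delete_file"}


def authorize(tool_name: str, approved_by_human: bool = False) -> str:
    """Return "allow", "needs_approval", or "deny" for a requested tool call."""
    if tool_name in NEEDS_APPROVAL:
        return "allow" if approved_by_human else "needs_approval"
    if tool_name in ALLOWED_TOOLS:
        return "allow"
    return "deny"  # default-deny: unlisted tools are never callable


decisions = {
    "read_file": authorize("read_file"),
    "send_email": authorize("send_email"),
    "send_email (approved)": authorize("send_email", approved_by_human=True),
    "run_shell": authorize("run_shell"),
}
```

Pairing each decision with an audit log entry gives the post-incident traceability described in the table.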

Wrapping Up

That covers the AI and LLM terms most commonly used in practice today. Terminology keeps growing, so we plan to update this article periodically. If you want to go deeper, we recommend reading the in-house blog articles dedicated to RAG, embeddings, fine-tuning, and agents.

What matters more than memorizing terms is grasping the relationships between concepts. Use the "Term Map" above as a starting point, and get comfortable with one category at a time.