AI Glossary — LLM, RAG, Fine-Tuning, and Agent Terms in One Place

From LoRA, QLoRA, embeddings, and chunking to RAG, inference optimization, agents, and multimodal — the essential AI terms you need in practice, organized by category.

Data Dynamics · June 24, 2026 · 24 min read

The AI and LLM field grows new terminology fast, and the same concept often goes by many different names. This article is a reference that organizes the terms you run into most often in practice into tables grouped by category. Use it to quickly look up an unfamiliar term, or to align your team's shared vocabulary. Each entry aims for a "one-line definition," and topics that deserve a deeper treatment link out to other in-house articles.

This glossary is a living document. As the field shifts, we keep it updated. Suggestions for missing terms or more accurate definitions are always welcome.

Term Map

First, the big picture. Follow the categories below to jump straight to the area you need.

1. Fundamentals

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Artificial Intelligence | AI (Artificial Intelligence) | The umbrella term for technology that lets machines perform human cognitive tasks |
| Machine Learning | ML (Machine Learning) | A branch of AI that learns rules from data |
| Deep Learning | DL (Deep Learning) | Machine learning based on multi-layer neural networks |
| Neural Network | Neural Network | A model that maps inputs to outputs through layers of neurons connected by weights |
| Parameter | Parameter / Weight | The numeric values inside a model (weights and biases) adjusted during training |
| Hyperparameter | Hyperparameter | Settings chosen by a human before training, such as learning rate and batch size |
| Training / Inference | Training / Inference | The process of adjusting parameters / producing results with a trained model |
| Supervised / Unsupervised / RL | Supervised / Unsupervised / RL | Learning paradigms: using labeled answers / without labels / learning from rewards |
| Overfitting / Generalization | Overfitting / Generalization | Fitting only the training data / performing well on unseen data too |
| Loss Function | Loss Function | A training objective that quantifies the gap between predictions and ground truth |
| Gradient Descent | Gradient Descent | An optimization technique that updates parameters in the direction that reduces loss |
| Backpropagation | Backpropagation | An algorithm that propagates the loss gradient backward to update weights |
| Epoch / Batch | Epoch / Batch | One full pass over the data / a group of samples processed at once |
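
To make the loss function and gradient descent entries concrete, here is a minimal sketch that minimizes a one-parameter squared-error loss. The function names, the target value 3.0, and the learning rate are illustrative assumptions, not part of any particular library.

```python
def loss(w: float) -> float:
    """Squared-error loss whose minimum sits at w = 3 (illustrative target)."""
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    """Analytic derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0: float, learning_rate: float, steps: int) -> float:
    """Repeatedly move the parameter against the gradient to reduce the loss."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# The learning rate and step count are hyperparameters chosen before training.
w_final = gradient_descent(w0=0.0, learning_rate=0.1, steps=100)
print(w_final, loss(w_final))
```

In a real network the gradient is not written by hand; backpropagation computes it for every weight.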

2. LLM & Transformer Architecture

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Large Language Model | LLM (Large Language Model) | A large-scale model trained on vast text to understand and generate language |
| Transformer | Transformer | The attention-based architecture underlying modern LLMs |
| Attention | Attention | A mechanism that computes the mutual importance of input tokens as weights |
| Self-Attention | Self-Attention | Attention that computes relationships among tokens within a single sequence |
| Multi-Head Attention | Multi-Head Attention | Running multiple attentions in parallel to capture diverse relationships |
| Encoder / Decoder | Encoder / Decoder | The input-understanding part / the output-generation part. GPT-family models are decoder-only |
| Foundation Model | Foundation Model | A generally pretrained model that serves as the basis for many tasks |
| Context Window | Context Window | The maximum number of tokens a model can handle at once |
| Positional Encoding | Positional Encoding | A way to inject token order information into vectors |
| Model Size | Model Size (e.g. 7B, 70B) | A measure of model scale. B denotes one billion parameters |
| Instruct / Chat Model | Instruct / Chat Model | A model variant post-trained (aligned) for instruction following and dialogue |
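
The attention entries above can be sketched as single-head scaled dot-product self-attention in plain Python. This is a simplified illustration: it reuses the input vectors directly as queries, keys, and values, whereas a real Transformer layer applies learned projection matrices first. The toy token vectors are made up.

```python
import math


def self_attention(x):
    """Single-head self-attention over a list of token vectors.

    Simplification: x serves as queries, keys, and values (no learned projections).
    Returns the output vectors and the attention weight matrix.
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for q in x:
        # Scaled dot-product score of this token against every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        # Softmax turns the scores into weights that sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        # Each output is the weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Multi-head attention runs several such computations in parallel on different projections and concatenates the results.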

3. Tokens, Embeddings & Representations

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Token | Token | The smallest unit of text a model processes (word, subword, or character) |
| Tokenizer | Tokenizer | The module that splits text into tokens and converts them to IDs |
| Tokenization | Tokenization | The process of converting text into a token sequence (e.g. BPE, SentencePiece) |
| Vocabulary | Vocabulary | The full set of tokens the tokenizer knows |
| Embedding | Embedding | A representation that converts a token, sentence, or document into a high-dimensional real-valued vector carrying meaning |
| Embedding Model | Embedding Model | A dedicated model that turns text into vectors (the core of retrieval and RAG) |
| Dimension | Dimension | The length of an embedding vector (e.g. 768, 1536, 3072 dimensions) |
| Normalization | Normalization | Scaling vector magnitude to 1 to stabilize cosine similarity computation |
| Latent / Vector Space | Latent / Vector Space | A representation space where semantically similar items sit close together |
| Logits | Logits | The unnormalized scores for each token, just before softmax |
| Probability Distribution | Probability Distribution | The next-token probabilities obtained by normalizing the logits |
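
Two of the entries above are easy to show numerically: softmax turns logits into a probability distribution, and L2 normalization makes a plain dot product equal to cosine similarity. The sketch below uses made-up logits and two-dimensional vectors purely for illustration.

```python
import math


def softmax(logits):
    """Normalize unnormalized scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v):
    """Scale a vector so that its magnitude (L2 norm) is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


probs = softmax([2.0, 1.0, 0.1])  # toy logits for a three-token vocabulary
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
cosine = dot(a, b)  # for unit vectors, the dot product is the cosine similarity
print([round(p, 3) for p in probs], round(cosine, 2))
```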

4. Training Stages

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Pretraining | Pretraining | The first stage of training, learning next-token prediction on a large corpus |
| Fine-tuning | Fine-tuning | Further training a pretrained model on task- or domain-specific data |
| Supervised Fine-Tuning | SFT (Supervised Fine-Tuning) | Learning instruction-following ability from input-answer pairs |
| Instruction Tuning | Instruction Tuning | Training the model to "follow instructions" from diverse instruction-response data |
| RLHF | RLHF (Reinforcement Learning from Human Feedback) | Aligning a model via reinforcement learning, using human preferences turned into a reward model |
| Direct Preference Optimization | DPO (Direct Preference Optimization) | A lightweight technique that aligns directly from preference pairs without a reward model |
| RLAIF | RLAIF (Reinforcement Learning from AI Feedback) | Aligning a model where AI, rather than humans, generates the preference labels |
| Alignment | Alignment | The process of bringing model outputs in line with human intent and values |
| Reward Model | Reward Model | A model trained to score the quality and preference of responses |
| Continual / Continued Pretraining | Continual / Continued Pretraining | Continuing pretraining on data from a new domain |
| Catastrophic Forgetting | Catastrophic Forgetting | The phenomenon of losing previously learned knowledge through new training |
| Synthetic Data | Synthetic Data | Training data generated by models or rules |

5. Parameter-Efficient Fine-Tuning (PEFT)

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Parameter-Efficient Fine-Tuning | PEFT (Parameter-Efficient Fine-Tuning) | A family of techniques that train only a small subset of parameters, rather than all of them, to cut cost |
| LoRA | Low-Rank Adaptation | Approximating weight changes with the product of two low-rank matrices, so only a small number of parameters are trained |
| QLoRA | Quantized LoRA | Training LoRA while the base model is quantized to 4 bits |
| DoRA | Weight-Decomposed LoRA | A variant that decomposes weights into magnitude and direction to push LoRA's performance further |
| Adapter | Adapter | A PEFT approach that inserts small trainable modules between layers |
| Prompt Tuning | Prompt Tuning | Prepending a trainable "soft prompt" vector to the input |
| Prefix Tuning | Prefix Tuning | Adding a trainable prefix in front of the keys/values at each layer |
| Rank | Rank (r) | The size of LoRA's low-rank matrices. A trade-off between expressiveness and cost |
| Merge | Merge | Folding trained LoRA weights into the base to produce a single model |
| Full Fine-Tuning | Full Fine-Tuning | The traditional approach of updating all parameters (costly) |
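
The LoRA, rank, and merge entries can be illustrated with a small arithmetic sketch. It compares the trainable parameter count of a full weight matrix against a rank-r LoRA pair, then merges a toy adapter as W' = W + (alpha / r) · B · A. The layer size 4096, the `alpha` scaling factor, and all matrix values are assumptions chosen for illustration.

```python
def lora_param_counts(d_out: int, d_in: int, r: int):
    """Trainable parameters: full matrix versus LoRA factors B (d_out x r) and A (r x d_in)."""
    return d_out * d_in, r * (d_out + d_in)


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, b, a, alpha: float, r: int):
    """Fold the low-rank update into the base weights: W + (alpha / r) * B @ A."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


full, lora = lora_param_counts(d_out=4096, d_in=4096, r=8)
print(full, lora, f"{100 * lora / full:.2f}%")  # 16777216 65536 0.39%

# Toy rank-1 merge on a 2x2 identity matrix.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # d_out x r
a = [[0.5, 0.0]]     # r x d_in
merged = merge_lora(w, b, a, alpha=2.0, r=1)
print(merged)  # [[2.0, 0.0], [2.0, 1.0]]
```

After merging, inference needs no extra matrices, which is why merge produces a single model.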

6. Quantization, Compression & Serving Optimization

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Quantization | Quantization | Representing weights and activations in lower bits (e.g. INT4/INT8/FP8) for compression |
| Post-Training Quantization | PTQ (Post-Training Quantization) | Quantizing after training, with no additional training |
| Quantization-Aware Training | QAT (Quantization-Aware Training) | Training with quantization in mind to reduce accuracy loss |
| GPTQ / AWQ | GPTQ / AWQ | Leading post-training quantization algorithms (accuracy-preserving) |
| GGUF | GGUF | The quantized model file format used by the llama.cpp family |
| Knowledge Distillation | Knowledge Distillation | Transferring knowledge from a large teacher model to a small student model |
| Pruning | Pruning | Removing less important weights and connections to compress the model |
| KV Cache | KV Cache | Storing already-computed keys/values to speed up token generation |
| PagedAttention | PagedAttention | A memory technique that manages the KV cache in pages (vLLM) |
| Continuous Batching | Continuous Batching | A serving technique that dynamically groups requests to increase GPU utilization |
| Speculative Decoding | Speculative Decoding | Accelerating generation by having a small draft model propose tokens that the larger model then verifies |
| Throughput / Latency | Throughput / Latency | Items processed per second / time taken until a response |
| Time To First Token | TTFT (Time To First Token) | The time from a request until the first token appears |
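
As a rough illustration of the quantization entry, the sketch below applies symmetric per-tensor INT8 quantization to a handful of weights and measures the round-trip error. This is the simplest possible scheme, written under assumed toy values; algorithms such as GPTQ and AWQ are considerably more sophisticated.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.02, 1.0]  # toy weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per weight is bounded by half the scale, which is the accuracy cost traded for the smaller storage size.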

7. Inference & Decoding Parameters

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Temperature | Temperature | Controls output randomness. Low is conservative, high is more varied |
| Top-k | Top-k Sampling | Sampling only from the k highest-probability tokens |
| Top-p | Nucleus Sampling | Sampling from the smallest set of highest-probability tokens whose cumulative probability reaches p |
| Greedy Decoding | Greedy Decoding | Picking only the highest-probability token at each step |
| Beam Search | Beam Search | Tracking multiple candidate paths at once to find a better sequence |
| Repetition / Frequency Penalty | Repetition / Frequency Penalty | A correction that discourages repeating the same tokens |
| Max Tokens | Max Tokens | A limit on the maximum number of tokens to generate |
| Stop Sequence | Stop Sequence | A condition that halts generation when a specific string appears |
| Perplexity | Perplexity | A measure of how well a model predicts text (lower is better) |
| Structured Output | Structured Output / JSON Mode | Forcing output to conform to a schema (such as JSON) |
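
The decoding parameters above interact in a fixed order in most samplers: temperature rescales the logits, top-k or top-p trims the candidate set, and a token is drawn from what remains. The following sketch shows each step on a toy four-token distribution. Function names and numbers are illustrative, and real inference engines may differ in details such as tie handling.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Divide logits by the temperature, then normalize to probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    keep = set(keep)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


probs = [0.5, 0.3, 0.15, 0.05]                # toy next-token distribution
greedy_token = probs.index(max(probs))         # greedy decoding: always token 0
after_top_k = top_k_filter(probs, k=2)
after_top_p = top_p_filter(probs, p=0.75)
rng = random.Random(0)                         # seeded for repeatable sampling
sampled = rng.choices(range(len(probs)), weights=after_top_p, k=1)[0]
print(greedy_token, after_top_k, after_top_p, sampled)
```

Lowering the temperature below 1 sharpens the distribution toward the top token; raising it flattens the distribution and increases variety.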

8. RAG, Retrieval & Vectors

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Retrieval-Augmented Generation | RAG (Retrieval-Augmented Generation) | An architecture that retrieves external documents and uses them as grounding to generate answers |
| Chunking | Chunking | Preprocessing that splits long documents into smaller units for retrieval and embedding |
| Chunk Overlap | Chunk Overlap | Overlapping part of each chunk at the boundaries to reduce loss of context |
| Vector DB | Vector DB | A database that stores and searches embedding vectors (e.g. pgvector, Milvus) |
| Similarity Search | Similarity Search | Search that finds vectors close to a query vector |
| Cosine Similarity | Cosine Similarity | A common metric that measures the directional similarity of two vectors |
| Approximate Nearest Neighbor | ANN (Approximate Nearest Neighbor) | Search that finds nearby vectors quickly by trading off a little accuracy |
| HNSW | HNSW (Hierarchical Navigable Small World) | A leading graph-based ANN index |
| Dense / Sparse Retrieval | Dense / Sparse Retrieval | Embedding-based / keyword-based (BM25) retrieval |
| Hybrid Search | Hybrid Search | Search that combines dense and sparse retrieval to improve accuracy |
| Reranking | Reranking | Re-ordering first-stage retrieval results with a more precise model |
| Recall / Precision | Recall / Precision | How completely / how accurately relevant documents were retrieved |
| GraphRAG | GraphRAG | RAG augmented with a knowledge graph to strengthen relationships and summarization |
| Context Injection / Grounding | Context Injection / Grounding | Putting retrieved evidence into the prompt to tie answers to facts |
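
Chunk overlap and recall/precision are simple enough to sketch directly. The example below splits a word list into fixed-size chunks that share words at the boundaries, then scores a hypothetical retrieval result. The chunk size, overlap, and document IDs are made-up values for illustration; production pipelines usually chunk by tokens or by document structure.

```python
def chunk_words(words, chunk_size, overlap):
    """Split a word list into chunks of `chunk_size` that share `overlap` words at each boundary."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the document
    return chunks


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant. Recall: share of relevant items retrieved."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)


words = [f"w{i}" for i in range(10)]
chunks = chunk_words(words, chunk_size=4, overlap=1)
print(chunks)  # three chunks; w3 and w6 each appear in two neighboring chunks

precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],  # hypothetical search results
    relevant=["d1", "d3", "d7"],         # hypothetical ground truth
)
print(precision, recall)
```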

9. Prompt Engineering

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Prompt | Prompt | The input instruction given to a model |
| System Prompt | System Prompt | A higher-level instruction that defines the model's role and rules |
| Zero-shot / Few-shot | Zero-shot / Few-shot | Performing a task with no examples / with a few examples |
| In-Context Learning | In-Context Learning | The ability to learn a task from examples in the prompt alone, without training |
| Chain-of-Thought | CoT (Chain-of-Thought) | A technique that writes out intermediate reasoning steps to improve accuracy |
| ReAct | Reasoning + Acting | A pattern that alternates between reasoning and tool use (acting) |
| Prompt Template | Prompt Template | A reusable prompt form into which variables are inserted |
| Prompt Caching | Prompt Caching | Caching the repeated prefix of a prompt to reduce cost and latency |
| Context Engineering | Context Engineering | The work of designing and managing the information (retrieval, memory, tools) fed to a model |
| Jailbreak | Jailbreak | A prompt attack that induces a model to bypass its safety measures |
| Prompt Injection | Prompt Injection | An attack that overrides instructions through external input to make a model misbehave |

10. Agents, Tools & Protocols

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Agent | Agent | A system that plans, uses tools, and iterates on its own toward a goal |
| Tool Use / Function Calling | Tool Use / Function Calling | A model calling external functions or APIs to extend its capabilities |
| Multi-Agent | Multi-Agent | An architecture where multiple agents divide roles and collaborate |
| Orchestration | Orchestration | Coordinating the execution flow across multiple steps, agents, and tools |
| Memory | Memory | The ability to store and recall conversational and task context (short-term/long-term) |
| MCP | Model Context Protocol | A standard protocol that connects models with external tools and data |
| A2A | Agent-to-Agent | A protocol for communication and collaboration between agents |
| Guardrails | Guardrails | Mechanisms that validate and filter inputs/outputs to enforce safety and policy |
| HITL | HITL (Human-in-the-Loop) | A design that inserts human approval or intervention into critical decisions |
| Autonomy Level | Autonomy Level | The degree to which an agent acts without human intervention |
| Agentic Workflow | Agentic Workflow | An agent-driven workflow that iterates plan, execute, and verify |
| Planning | Planning | An agent's ability to break a goal into sub-steps and order their execution |
| Task Decomposition | Task Decomposition | Splitting a large task into manageable subtasks |
| Reflection | Reflection / Self-Critique | A loop where the agent reviews and revises its own output to improve quality |
| Subagent | Subagent | A child agent to which a parent delegates a specific subtask |
| Handoff | Handoff | One agent passing work and context to another |
| Router | Router | Branching a request to the right tool, agent, or model |
| Trajectory | Trajectory | The full record of an agent's observation-action steps |
| Tool Schema | Tool Schema | A spec defining a tool's name, arguments, and types (JSON Schema) |
| Parallel Tool Calls | Parallel Tool Calls | Calling multiple tools at once to reduce latency |
| Computer Use | Computer Use | The ability to directly operate a GUI by controlling screen, mouse, and keyboard |
| Code Interpreter | Code Interpreter | A tool that runs code in a sandbox to compute and analyze |
| MCP Server | MCP Server | An external process that exposes tools, resources, and prompts over MCP |
| MCP Client | MCP Client | The model/host side that connects to an MCP server to use its tools and data |
| MCP Host | MCP Host | The application that embeds the MCP client and connects it to the model (e.g., IDE, chat app) |
| MCP Transport | MCP Transport (stdio/HTTP) | The MCP communication channel: local stdio or remote HTTP/SSE |
| MCP Resource/Tool/Prompt | MCP Resource / Tool / Prompt | The three primitives an MCP server exposes (data to read / function to run / prompt template) |
| A2A Agent Card | A2A Agent Card | Metadata describing an agent's capabilities and endpoints to aid discovery |
| A2A Task | A2A Task | The unit of work exchanged between agents in A2A |
| Capability Discovery | Capability Discovery | Dynamically finding the capabilities of other agents and tools |
| Interoperability | Interoperability | The property of agents/tools from different vendors collaborating via standards |
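
To show how a tool schema and function calling fit together, here is a minimal sketch of a tool registry and dispatcher. The `get_weather` tool, the schema layout (`name`, `description`, `parameters`), and the shape of the model's tool-call JSON are all hypothetical. They resemble common function-calling conventions but are not any specific vendor's API, and the required-argument check is a hand-rolled stand-in for full JSON Schema validation.

```python
import json

# Hypothetical tool schema: name, description, and JSON Schema-style parameters.
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Return a short weather summary for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> str:
    """Stub implementation; a real tool would call an external API here."""
    return f"Sunny in {city}"


# The registry doubles as an allowlist: only tools listed here can be called.
TOOLS = {"get_weather": (GET_WEATHER_SCHEMA, get_weather)}


def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call, check it against the schema, and run the tool."""
    call = json.loads(tool_call_json)
    if call["name"] not in TOOLS:
        raise ValueError(f"tool not allowed: {call['name']}")
    schema, fn = TOOLS[call["name"]]
    args = call.get("arguments", {})
    missing = [key for key in schema["parameters"]["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return fn(**args)


result = dispatch('{"name": "get_weather", "arguments": {"city": "Seoul"}}')
print(result)  # Sunny in Seoul
```

In an agent loop, the returned string would be fed back to the model as an observation before its next reasoning step.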

11. Multimodal & Generation

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Multimodal | Multimodal | A model that handles multiple modalities together, such as text, image, and audio |
| Vision-Language Model | VLM (Vision-Language Model) | A model that understands images and text together |
| Diffusion Model | Diffusion Model | An approach that generates images by progressively removing noise |
| Text-to-Image | Text-to-Image | Generating images from text descriptions (e.g. image generation models) |
| CLIP | CLIP (Contrastive Language-Image Pre-training) | A model that links images and text by embedding them into the same space |
| Speech Recognition | ASR (Automatic Speech Recognition, Speech-to-Text) | Converting speech into text |
| Speech Synthesis | TTS (Text-to-Speech) | Converting text into speech |
| OCR | OCR (Optical Character Recognition) | Recognizing characters within an image as text |
| Generative AI | Generative AI | AI that generates new text, images, audio, and code |
| Latent Diffusion | Latent Diffusion | Generation that runs diffusion in a compressed latent space for efficiency |

12. Evaluation, Safety & Operations

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Hallucination | Hallucination | The phenomenon of plausibly generating content that is not factual |
| Grounding | Grounding | Tying outputs to verifiable evidence (documents, data) |
| Benchmark | Benchmark | An evaluation that compares model performance on a standard dataset |
| Eval | Eval | The work of measuring the quality of a model or system quantitatively and qualitatively |
| LLM-as-a-Judge | LLM-as-a-Judge | An evaluation method that scores output quality with another LLM |
| Red Teaming | Red Teaming | A safety check that finds vulnerabilities and risks through deliberate attacks |
| Safety / Alignment Tax | Safety / Alignment Tax | Suppressing harmful output / the performance loss accepted for the sake of safety |
| Bias | Bias | Unfair tendencies inherent in data or a model |
| Observability | Observability | Tracking and monitoring tokens, latency, cost, and quality |
| LLMOps | LLMOps | The overall deployment, evaluation, monitoring, and operation of LLM apps |
| Token Cost | Token Cost | The API usage cost proportional to the number of input and output tokens |
| Guardrail / Policy Eval | Guardrail / Policy Eval | An evaluation that checks compliance with safety and policy |

13. Advanced & Emerging Topics

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| Mixture of Experts | MoE (Mixture of Experts) | Activating only some expert networks per input to scale efficiently |
| Grouped-Query Attention | GQA (Grouped-Query Attention) | Attention in which groups of query heads share key/value heads, reducing the KV cache and memory use |
| Rotary Position Embedding | RoPE (Rotary Position Embedding) | Encoding position with rotation transforms, favorable for long contexts |
| FlashAttention | FlashAttention | Computing attention faster by optimizing memory access |
| State Space Model | SSM / Mamba | A family that processes long sequences efficiently with state spaces instead of attention |
| Long Context | Long Context | The ability to handle long contexts of hundreds of thousands to millions of tokens |
| Reasoning Model | Reasoning Model | A family of models trained to "think" longer before answering |
| Test-Time Compute | Test-Time Compute | A strategy that spends more compute at inference to improve accuracy |
| Thinking / Reasoning Tokens | Thinking / Reasoning Tokens | Internal reasoning tokens generated before the final answer |
| Context Caching | Context Caching | Caching a long shared context to cut the cost of repeated requests |
| Scaling Laws | Scaling Laws | Empirical rules describing the relationship between model, data, and compute scale and performance |
| Emergent Ability | Emergent Ability | An ability that appears suddenly as scale grows |
| Model Merging | Model Merging | A technique that combines the weights of multiple models to create a new one |
| Agentic / Long-term Memory | Agentic / Long-term Memory | Memory that lets an agent store and use information across sessions |

14. MLOps & LLMOps

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| MLOps | MLOps | The practice of automating and standardizing ML model training, deployment, monitoring, and operations |
| LLMOps | LLMOps | Operations specialized for LLM apps (prompts, evals, cost, safety, serving) |
| Model Registry | Model Registry | A store that manages model versions, metadata, and stages |
| Feature Store | Feature Store | A system that stores and serves features shared across training and serving |
| Experiment Tracking | Experiment Tracking | Recording and comparing hyperparameters, metrics, and artifacts (e.g., MLflow) |
| Model Serving | Model Serving | Exposing a trained model as an API or endpoint |
| Data/Model Versioning | Data / Model Versioning | Versioning datasets and models to ensure reproducibility |
| Reproducibility | Reproducibility | The property of getting the same result again from the same input and code |
| Drift | Drift (Data / Concept) | When the data distribution or input-output relationship changes over time |
| Model Monitoring | Model Monitoring | Tracking accuracy, drift, and anomalies in production |
| Canary / Shadow Deployment | Canary / Shadow Deployment | Validating a new model on a fraction of traffic / with no impact on live service |
| Champion-Challenger | Champion-Challenger | Running the production model alongside a candidate for comparison |
| A/B Test | A/B Test | Comparing two versions via traffic splitting |
| ML Pipeline | ML Pipeline | A workflow that automates data → training → evaluation → deployment |
| Model Card | Model Card | A spec documenting a model's purpose, limits, and evaluation |
| Prompt Versioning | Prompt Versioning | Version-controlling prompts and running regression evals |
| Golden Dataset | Golden Dataset | A vetted ground-truth dataset used as the baseline for regression evals |
| Tracing | Tracing (OpenTelemetry) | Tracking the steps (spans) a request passes through for debugging and analysis |
| LLM Gateway | LLM Gateway | Centralized routing, key management, logging, and limiting for model calls |
| Semantic Cache | Semantic Cache | Caching results of semantically similar requests to cut cost and latency |
| Fallback / Routing | Fallback / Routing | Rerouting to another model or branching on failure or overload |
| Token Budgeting | Token Budgeting | Managing token-usage caps per request or session |
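
The fallback entry can be sketched as a loop that tries models in priority order and moves on when one fails. The model functions below are stand-ins that simulate an overloaded primary and a healthy backup; a real gateway would wrap actual client calls and typically add timeouts, retries, and logging.

```python
from typing import Callable, List, Tuple


def call_with_fallback(
    prompt: str,
    models: List[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return the first successful response."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # on failure or overload, fall through to the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Hypothetical primary model that is currently overloaded."""
    raise TimeoutError("overloaded")


def stable_backup(prompt: str) -> str:
    """Hypothetical backup model that simply echoes the prompt."""
    return f"echo: {prompt}"


used, answer = call_with_fallback(
    "hello",
    [("primary", flaky_primary), ("backup", stable_backup)],
)
print(used, answer)  # backup echo: hello
```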

15. AI Security

| Term | English / Abbr. | One-line description |
| --- | --- | --- |
| OWASP LLM Top 10 | OWASP LLM Top 10 | A standard list of the top 10 security threats for LLM apps |
| Prompt Injection | Prompt Injection | An attack that overrides instructions via external input to make a model misbehave |
| Indirect Prompt Injection | Indirect Prompt Injection | Hiding malicious instructions in external data such as documents or web pages |
| Jailbreak | Jailbreak | An attack that coaxes the model into bypassing its safety guards |
| Insecure Output Handling | Insecure Output Handling | Vulnerabilities (XSS, code execution, etc.) from executing/rendering model output without validation |
| Excessive Agency | Excessive Agency | The risk of giving an agent too many tools/permissions, amplifying potential harm |
| Data Exfiltration / Leakage | Data Exfiltration / Leakage | Sensitive information leaking out through outputs or external calls |
| PII Exposure | PII Exposure | The risk of personally identifiable information being exposed in training or output |
| Data Poisoning | Data Poisoning | Contaminating a model by mixing malicious samples into training data |
| Model Extraction | Model Extraction / Stealing | Replicating or stealing a model by repeated querying |
| Membership Inference | Membership Inference | An attack to determine whether specific data was used in training |
| Model Inversion | Model Inversion | An attack that tries to reconstruct training data from outputs |
| Adversarial Example | Adversarial Example | An input that fools a model via tiny perturbations |
| Backdoor / Trojan | Backdoor / Trojan | A trap planted to act maliciously only on a specific trigger |
| Supply Chain Security | Supply Chain Security | Verifying the provenance and integrity of models, data, and dependencies |
| Model Provenance | Model Provenance | Traceability of where and how a model was built |
| Sandboxing | Sandboxing | Isolating tool/code execution to limit the blast radius |
| Least Privilege | Least Privilege | Granting tools and agents only the permissions they actually need |
| Tool Allowlist / Denylist | Tool Allowlist / Denylist | Explicitly restricting which tools can be called |
| Content Moderation | Content Moderation | Detecting and blocking harmful or policy-violating inputs/outputs |
| DLP | Data Loss Prevention | Controls that detect and block leakage of sensitive information |
| Prompt Firewall | Prompt Firewall / Guardrails | A security layer that inspects inputs/outputs to block attacks and violations |
| Audit Log | Audit Log | Recording who did what for traceability and post-incident analysis |
| Watermarking | Watermarking | Embedding an identifying signal in generated content to mark its origin |
| Human Approval | Human Approval (HITL) | A control that requires human sign-off before risky actions |

Wrapping Up

That covers the AI and LLM terms most commonly used in practice today. Terminology keeps growing, so we plan to update this article periodically. If you want to go deeper, we recommend reading the in-house blog articles dedicated to RAG, embeddings, fine-tuning, and agents.

What matters more than memorizing terms is grasping the relationships between concepts. Use the "Term Map" above as a starting point, and get comfortable with one category at a time.