Prompt Engineering Practical Guide - Techniques, Patterns, and Optimization Strategies
A comprehensive guide covering prompt engineering core techniques (Zero-shot, Few-shot, CoT, Self-Consistency, ToT), practical patterns, system prompt design, structured output, and evaluation/optimization strategies.
Prompt engineering is the technique of optimizing inputs to LLMs to achieve desired results. This post systematically covers fundamental techniques through advanced patterns, practical templates, and evaluation methods.
1. Prompt Engineering Fundamentals
Components of a Prompt
An effective prompt is composed of the following elements:
1. Role: model persona
2. Context: background, constraints
3. Instruction: task to perform
4. Input: data to process
5. Examples: desired I/O pairs
6. Format: response structure
7. Constraints: what NOT to do
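As a rough illustration of how these elements fit together, the sketch below assembles a prompt from whichever components are supplied, in the order listed above. The `PromptParts` class, the `build_prompt` function, and the `## ` section labels are hypothetical names chosen for this example, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container for the seven prompt components."""
    instruction: str                     # the only component treated as required here
    role: str = ""
    context: str = ""
    input_data: str = ""
    examples: list[str] = field(default_factory=list)
    output_format: str = ""
    constraints: list[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty components in the order used by the list above."""
    sections = [
        ("Role", parts.role),
        ("Context", parts.context),
        ("Instruction", parts.instruction),
        ("Input", parts.input_data),
        ("Examples", "\n".join(parts.examples)),
        ("Format", parts.output_format),
        ("Constraints", "\n".join(f"- {c}" for c in parts.constraints)),
    ]
    # Empty components are skipped so short prompts stay short.
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)


prompt = build_prompt(PromptParts(
    role="You are a DBA.",
    instruction="Write optimized SQL considering index utilization.",
    constraints=["Do not use SELECT *"],
))
print(prompt)
```

Keeping the components as separate fields makes it easier to vary one of them (for example, the constraints) while holding the rest fixed during evaluation.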
Prompt Writing Principles
| Principle | Bad Example | Good Example |
|---|---|---|
| Be specific | "Optimize the code" | "Improve this Python function's time complexity from O(n²) to O(n log n)" |
| Assign role | "Write SQL" | "You are a DBA. Write optimized SQL considering index utilization" |
| Specify format | "Tell me pros and cons" | "Present pros/cons in table format (item/pros/cons/notes)" |
| State constraints | "Summarize this" | "Summarize in 3 sentences max, translate technical terms to Korean" |
| Separate steps | "Analyze and write report" | "Step 1: Data analysis, Step 2: Derive insights, Step 3: Write report" |
2. Core Prompting Techniques
Zero-shot Prompting
Zero-shot prompting asks the model to perform a task from instructions alone, without any examples.
Classify the sentiment of the following customer review as "positive", "negative", or "neutral".
Review: "Product quality is fine but shipping was too slow. Please send faster next time."
Sentiment:
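In application code, a zero-shot classifier usually needs two small pieces around the model call: a function that builds the prompt and one that maps the free-text reply back onto the allowed labels. The sketch below shows both; `zero_shot_prompt` and `normalize_label` are hypothetical helper names, and the model call itself is left out.

```python
from typing import Optional

LABELS = ("positive", "negative", "neutral")


def zero_shot_prompt(review: str) -> str:
    """Build the sentiment prompt shown above: instructions only, no examples."""
    return (
        "Classify the sentiment of the following customer review as "
        '"positive", "negative", or "neutral".\n'
        f'Review: "{review}"\n'
        "Sentiment:"
    )


def normalize_label(raw: str) -> Optional[str]:
    """Map a model reply onto one allowed label, or None if it is not one of them."""
    cleaned = raw.strip().strip('".').lower()
    return cleaned if cleaned in LABELS else None


print(zero_shot_prompt("Product quality is fine but shipping was too slow."))
print(normalize_label(' "Negative". '))
```

Returning `None` for anything outside the label set, rather than guessing, keeps malformed replies visible so they can be counted as failures during evaluation.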
Few-shot Prompting
Few-shot prompting provides a handful of worked examples so the model can infer the pattern to follow.
Classify the following SQL error messages using these examples:
Error: "ORA-00942: table or view does not exist"
Category: Object access error
Action: Check table existence, verify permissions
Error: "ORA-01400: cannot insert NULL into"
Category: Data integrity error
Action: Check NOT NULL columns, set defaults
Error: "ORA-04031: unable to allocate shared memory"
Category:
Action:
Few-shot design tips:
| Tip | Description |
|---|---|
| Diverse examples | Include all categories evenly |
| Edge cases | Include boundary cases for accuracy |
| Consistent format | Write all examples in identical structure |
| Right number | 3-5 is usually enough (each extra example adds token cost) |
| Order | Place most relevant example last |
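Two of these tips, consistent format and placing the most relevant example last, are easy to enforce in code. The following is a minimal sketch under the assumption that each example carries a relevance score computed elsewhere (for instance by embedding similarity); the `Example` type, its `relevance` field, and `few_shot_prompt` are hypothetical.

```python
from typing import NamedTuple


class Example(NamedTuple):
    error: str
    category: str
    action: str
    relevance: float = 0.0  # assumed similarity to the query, computed elsewhere


def few_shot_prompt(examples: list[Example], query_error: str) -> str:
    """Render every example in one fixed structure, most relevant example last."""
    ordered = sorted(examples, key=lambda ex: ex.relevance)  # ascending relevance
    blocks = [
        f'Error: "{ex.error}"\nCategory: {ex.category}\nAction: {ex.action}'
        for ex in ordered
    ]
    # The query reuses the same structure with the answer fields left blank.
    blocks.append(f'Error: "{query_error}"\nCategory:\nAction:')
    header = "Classify the following SQL error messages using these examples:"
    return header + "\n\n" + "\n\n".join(blocks)


examples = [
    Example("ORA-01400: cannot insert NULL into", "Data integrity error",
            "Check NOT NULL columns, set defaults", relevance=0.9),
    Example("ORA-00942: table or view does not exist", "Object access error",
            "Check table existence, verify permissions", relevance=0.2),
]
prompt = few_shot_prompt(examples, "ORA-04031: unable to allocate shared memory")
print(prompt)
```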
Chain-of-Thought (CoT) Prompting
Chain-of-Thought prompting explicitly asks the model to reason step by step before giving its final answer.
Q: 3 servers each handle 150 requests per second. If traffic increases
2.5x during peak hours, how many servers are needed with no request loss?
A: Let's solve step by step.
1. Current total capacity: 3 × 150 = 450 req/s
2. Peak traffic: 450 × 2.5 = 1,125 req/s
3. Required servers: 1,125 ÷ 150 = 7.5
4. Round up (servers are integers): 8
Therefore, at least 8 servers are needed during peak hours.
Note: For math, logic, and code analysis tasks, simply appending "Let's think step by step" to the prompt often improves accuracy.
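The arithmetic in the worked example above can be checked mechanically, which is a useful habit when writing CoT demonstrations, since an arithmetic slip in a demonstration may be imitated by the model. The sketch below recomputes each step and also shows a small, hypothetical `cot_prompt` helper that appends the step-by-step trigger to a question.

```python
import math

servers = 3
per_server_rps = 150
peak_multiplier = 2.5

current_capacity = servers * per_server_rps          # step 1: total capacity today
peak_traffic = current_capacity * peak_multiplier    # step 2: expected peak load
required = math.ceil(peak_traffic / per_server_rps)  # steps 3-4: divide, then round up


def cot_prompt(question: str) -> str:
    """Append the zero-shot CoT trigger phrase to a question."""
    return f"Q: {question}\nA: Let's think step by step."


print(current_capacity, peak_traffic, required)
```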
Self-Consistency
Generate multiple reasoning attempts for the same question, then select the most frequent answer.
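import re
from collections import Counter

import anthropic

client = anthropic.Anthropic()

def extract_answer(text: str) -> str:
    """Assumed helper: text after the last "Answer:" marker, else the last line."""
    matches = re.findall(r"Answer:\s*(.+)", text)
    if matches:
        return matches[-1].strip()
    lines = text.strip().splitlines()
    return lines[-1].strip() if lines else ""

def self_consistency(prompt: str, n: int = 5):
    """Self-Consistency: majority vote answer selection"""
    answers = []
    for _ in range(n):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.7,  # non-zero so the sampled reasoning paths differ
            messages=[{"role": "user", "content": prompt}]
        )
        answers.append(extract_answer(response.content[0].text))
    most_common = Counter(answers).most_common(1)[0]
    return most_common[0], most_common[1] / n  # answer, confidence (vote share)
Tree-of-Thought (ToT) Prompting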
Explores multiple reasoning paths in a tree structure to find the optimal answer.
Problem: Kafka consumer lag is continuously increasing. Analyze the cause.
=== Possible cause branches ===
Path A: Producer-side issues
→ A1: Message production spike → Check metrics → [Likelihood: High]
→ A2: Message size increase → Check avg size → [Likelihood: Medium]
Path B: Consumer-side issues
→ B1: Processing logic delay → Profile processing time → [Likelihood: High]
→ B2: Insufficient consumer instances → Check partition/consumer ratio → [Likelihood: Medium]
Path C: Infrastructure issues
→ C1: Broker disk I/O bottleneck → Check disk utilization → [Likelihood: Medium]
=== Optimal diagnostic path ===
Recommended order: A1 → B1 → B2 (highest likelihood first)
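Once the branches have been generated and scored, the final ordering step is plain sorting. The sketch below covers only that last step, not a full tree search: it ranks the branches from the example by their likelihood label. The `Branch` class and `diagnostic_order` function are hypothetical. Branches with equal likelihood keep their listing order, so choosing among the Medium branches (as the recommended order above does with B2) still takes judgment or a finer-grained score.

```python
from dataclasses import dataclass

LIKELIHOOD_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass(frozen=True)
class Branch:
    branch_id: str
    hypothesis: str
    check: str
    likelihood: str


branches = [
    Branch("A1", "Message production spike", "Check metrics", "High"),
    Branch("A2", "Message size increase", "Check avg size", "Medium"),
    Branch("B1", "Processing logic delay", "Profile processing time", "High"),
    Branch("B2", "Insufficient consumer instances",
           "Check partition/consumer ratio", "Medium"),
    Branch("C1", "Broker disk I/O bottleneck", "Check disk utilization", "Medium"),
]


def diagnostic_order(candidates: list[Branch]) -> list[str]:
    """Rank branches by likelihood; ties keep their listing order (stable sort)."""
    ranked = sorted(candidates, key=lambda b: LIKELIHOOD_RANK[b.likelihood])
    return [b.branch_id for b in ranked]


order = diagnostic_order(branches)
print(" → ".join(order))
```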
Technique Comparison
| Technique | Suitable Tasks | Cost | Accuracy Gain |
|---|---|---|---|
| Zero-shot | Simple classification, translation, summary | Lowest | Baseline |
| Few-shot | Pattern learning, format specification | Low | Medium |
| CoT | Math, logic, code analysis | Medium | High |
| Self-Consistency | Problems with clear answers | High (Nx) | Very high |
| ToT | Complex decisions, diagnostics | High | Very high |
3. System Prompt Design
Role of System Prompts
System prompts define the model's overall behavior, role, and constraints. They are applied before all user messages.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    # The system prompt is passed separately from the conversation messages.
    system="""You are 'DataBot', a senior data engineer at Data Dynamics.
## Role
- Expert in Apache Spark, Kafka, NiFi, Kudu, and other big data technologies
- Familiar with internal technical standards and best practices
## Response Rules
1. Always include code examples for technical questions
2. Show before/after comparison for configuration changes
3. Explicitly state "verification needed" for uncertain information
4. Never output security-sensitive information (passwords, keys)
## Response Format
- Concise, practical answers (skip unnecessary greetings)
- Write production-ready code
- Present key points first (top-down)
## Limitations
- Do not advise on infrastructure costs or licensing
- Do not execute direct production environment changes""",
    messages=[{
        "role": "user",
        "content": "I'm getting Spark executor OOM errors. How do I fix this?"
    }]
)
System Prompt Design Patterns
Pattern 1: ROLE-TASK-FORMAT
[ROLE] You are a {role}.
[TASK] You {task description}.
[FORMAT] Respond in {format}.
Pattern 2: Behavior-based (DO/DON'T)
## DO
- Include code examples
- Explain impact of configuration changes
- Suggest at least 2 alternatives
## DON'T
- Output passwords or API keys
- State uncertain information definitively
- Directly modify production environments
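The two patterns combine naturally. As a sketch, the hypothetical `build_system_prompt` function below fills the ROLE-TASK-FORMAT slots and appends the DO/DON'T lists, producing a string that could be passed as the system prompt.

```python
def build_system_prompt(role: str, task: str, fmt: str,
                        do: list[str], dont: list[str]) -> str:
    """Combine the ROLE-TASK-FORMAT and DO/DON'T patterns into one system prompt."""
    lines = [
        f"[ROLE] You are a {role}.",
        f"[TASK] You {task}.",
        f"[FORMAT] Respond in {fmt}.",
        "",
        "## DO",
        *[f"- {item}" for item in do],
        "",
        "## DON'T",
        *[f"- {item}" for item in dont],
    ]
    return "\n".join(lines)


system_prompt = build_system_prompt(
    role="senior data engineer",
    task="answer questions about Spark and Kafka operations",
    fmt="concise Markdown with code examples",
    do=["Include code examples", "Suggest at least 2 alternatives"],
    dont=["Output passwords or API keys"],
)
print(system_prompt)
```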
4. Structured Output
Forcing JSON Output
Analyze the following server logs and return results in JSON format.
Logs:
2025-03-15 14:32:01 ERROR [PaymentService] Connection timeout to payment gateway (retry 3/3)
2025-03-15 14:32:05 WARN [OrderService] Order #12345 payment pending, fallback to queue
2025-03-15 14:33:00 INFO [OrderService] Order #12345 payment retried successfully
Output format:
{
  "incident_summary": "summary",
  "severity": "critical|warning|info",
  "affected_services": ["service names"],
  "root_cause": "root cause",
  "resolution": "resolution",
  "timeline": [
    {"time": "time", "event": "event", "level": "level"}
  ]
}
Output ONLY the JSON with no other text.
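Asking for JSON does not guarantee valid JSON, so the reply is usually validated before it is used. The sketch below parses a reply and checks it against the output format requested above; `parse_incident` is a hypothetical helper and `sample_reply` is an invented reply used only to exercise it.

```python
import json

REQUIRED_KEYS = {"incident_summary", "severity", "affected_services",
                 "root_cause", "resolution", "timeline"}
SEVERITIES = {"critical", "warning", "info"}


def parse_incident(raw: str) -> dict:
    """Parse a model reply and check it against the requested output format."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on extra text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["severity"] not in SEVERITIES:
        raise ValueError(f"unexpected severity: {data['severity']!r}")
    if not isinstance(data["timeline"], list):
        raise ValueError("timeline must be a list")
    return data


# Invented reply, for illustration only.
sample_reply = json.dumps({
    "incident_summary": "Payment gateway timeout, recovered on retry",
    "severity": "warning",
    "affected_services": ["PaymentService", "OrderService"],
    "root_cause": "Connection timeout to payment gateway",
    "resolution": "Order queued and payment retried",
    "timeline": [{"time": "14:32:01", "event": "Gateway timeout", "level": "ERROR"}],
})
print(parse_incident(sample_reply)["severity"])
```

A validation failure is a natural trigger for a retry, optionally with the error message appended to the prompt.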
XML Tag-Based Structuring
XML tags clearly delineate the input and output areas of a prompt. This approach is particularly effective with Claude.
Analyze the following code.
<code>
def process_data(df):
    result = df.groupBy("user_id").agg(
        count("*").alias("total_orders"),
        sum("amount").alias("total_amount")
    )
    return result.filter(col("total_amount") > 1000)
</code>
Write analysis results matching these tags:
<analysis>
<purpose>Code purpose</purpose>
<issues>Issues found (if any)</issues>
<optimization>Optimization suggestions</optimization>
<improved_code>Improved code</improved_code>
</analysis>
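A benefit of tagged output is that each section can be pulled out with a few lines of code. The sketch below uses a regular expression rather than an XML parser, on the assumption that model output containing code is often not well-formed XML; `extract_tag` is a hypothetical helper and `reply` is an invented response.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped contents of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.DOTALL)  # DOTALL: contents may span lines
    return match.group(1).strip() if match else None


# Invented reply, for illustration only.
reply = """<analysis>
<purpose>Aggregate orders per user and keep high spenders</purpose>
<issues>None found</issues>
</analysis>"""

print(extract_tag(reply, "purpose"))
print(extract_tag(reply, "optimization"))
```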
5. Advanced Prompt Patterns
Role-Playing Pattern
Conduct this code review from three perspectives:
<reviewer role="Security Expert">
Review the code from a security vulnerability perspective.
</reviewer>
<reviewer role="Performance Engineer">
Review the code from a performance bottleneck perspective.
</reviewer>
<reviewer role="Junior Developer">
Review from a code readability and comprehension perspective.
</reviewer>
Devil's Advocate Pattern
Present counterarguments to the following architecture decision.
Decision: "Migrate to microservices architecture"
You are a senior architect who opposes this decision.
1. Three potential risks
2. Specific scenarios where monolith is better
3. Most likely failure cause during migration
4. Alternative approaches
Graduated Complexity Pattern
Explain Kafka Consumer Groups at three levels:
[Beginner] Explain using analogies in 5 lines or less
[Intermediate] Explain core concepts and mechanics with code examples
[Advanced] Detail rebalancing protocols, partition assignment strategies, and error handling
Meta Prompting
Meta prompting has the LLM write the prompt itself.
I want to perform a systematic Spark performance tuning analysis.
Create the optimal prompt to achieve this goal, considering:
- Target: Apache Spark 3.x
- Scope: Configuration, code, infrastructure
- Output format: Checklist + improvement plan
- Environment: Kubernetes-based Spark on K8s
6. Practical Prompt Templates
Code Generation Template
Write {language} code matching these requirements.
## Requirements
{detailed requirements}
## Tech Stack
{libraries, frameworks}
## Constraints
- {constraint 1}
- {constraint 2}
## Code Quality Standards
- Include error handling
- Use type hints (Python)
- Write docstrings
- Include unit tests
## Output Format
1. Main code
2. Usage example
3. Test code
Incident Analysis Template
Analyze the following incident.
## Symptoms
{currently observed issues}
## Environment
- System: {system name}
- Version: {version}
- Infrastructure: {infra info}
## Collected Information
{logs, metrics, configuration}
## Analysis Request
1. Possible causes (highest probability first)
2. Diagnostic commands/queries per cause
3. Immediate mitigation (emergency response)
4. Root cause fix (prevent recurrence)
5. Impact assessment
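The placeholders in these templates contain spaces (for example `{system name}`), so `str.format` cannot fill them directly. One possible approach, sketched below with a hypothetical `render_template` function, is a regular-expression substitution that fails loudly when a placeholder has no value.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace every {placeholder}; raise KeyError if any placeholder has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template)
                      if name not in values})
    if missing:
        raise KeyError(f"no value for placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


template = "## Environment\n- System: {system name}\n- Version: {version}"
print(render_template(template, {"system name": "Kafka", "version": "3.7"}))
```

Failing on a missing value avoids sending a prompt that still contains a literal `{placeholder}`, which is easy to overlook in logs.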
7. Prompt Anti-Patterns and Solutions
Common Mistakes and Improvements
| Anti-Pattern | Problem | Improvement |
|---|---|---|
| Vague instructions | "Write it well" | "Summarize in 3 sentences including key metrics" |
| Excessive rules | 20+ rules listed | Compress to 5-7 core rules with priorities |
| Negative instructions | List of "don'ts" | State "dos" first |
| Missing context | Just dropping code | Include purpose, environment, expected result |
| Single-turn overload | Everything in one prompt | Separate into sequential steps |
| Ignoring Temperature | Always use defaults | Creative: 0.7-1.0, Analysis: 0.0-0.3, Code: 0.0 |
Temperature and Top-p Guide
| Task | Temperature | Top-p | Reason |
|---|---|---|---|
| Code generation | 0.0 | 1.0 | Accuracy first |
| Bug analysis | 0.0-0.2 | 1.0 | Fact-based analysis |
| Technical docs | 0.3-0.5 | 0.9 | Accurate but natural prose |
| Brainstorming | 0.7-1.0 | 0.95 | Diverse ideas |
| Creative writing | 0.8-1.0 | 0.95 | Creative expression |
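The table can be kept next to the calling code as a lookup so that sampling settings are chosen per task rather than left at defaults. This is a sketch: `SAMPLING_GUIDE` copies the ranges from the table above, while `sampling_params` and its `position` argument (0.0 for the low end of the range, 1.0 for the high end) are hypothetical.

```python
# (min temperature, max temperature, top_p), copied from the table above
SAMPLING_GUIDE = {
    "code generation": (0.0, 0.0, 1.0),
    "bug analysis": (0.0, 0.2, 1.0),
    "technical docs": (0.3, 0.5, 0.9),
    "brainstorming": (0.7, 1.0, 0.95),
    "creative writing": (0.8, 1.0, 0.95),
}


def sampling_params(task: str, position: float = 0.0) -> dict[str, float]:
    """Pick a temperature inside the task's range; raises KeyError for unknown tasks."""
    low, high, top_p = SAMPLING_GUIDE[task.lower()]
    position = min(max(position, 0.0), 1.0)  # clamp to [0, 1]
    temperature = round(low + (high - low) * position, 2)
    return {"temperature": temperature, "top_p": top_p}


print(sampling_params("Code generation"))
print(sampling_params("brainstorming", position=1.0))
```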
8. Prompt Evaluation and Optimization
Evaluation Methods
| Method | Description | Suitable For |
|---|---|---|
| Automated eval | Answer comparison (Exact Match, F1) | Classification, extraction, QA |
| LLM-as-Judge | Another LLM scores quality | Generative tasks, summarization |
| Human eval | Domain experts evaluate directly | High quality requirements |
| A/B testing | Compare prompt variants | Production optimization |
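For the automated row, Exact Match and token-level F1 can be written in a few lines. The sketch below uses a deliberately simple normalization (lowercasing and whitespace tokenization), which is an assumption; real evaluations often also strip punctuation and articles.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when prediction and reference are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)  # both empty counts as a match
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Object access error", "object access  error"))
print(token_f1("check table existence", "check table permissions"))
```

Exact Match suits closed label sets such as classification, while token F1 gives partial credit for extraction and short-answer QA.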
LLM-as-Judge Implementation
import json

import anthropic

client = anthropic.Anthropic()

def evaluate_with_llm(question: str, response: str, criteria: list[str]) -> dict:
    """Evaluate response quality with an LLM judge."""
    criteria_lines = "\n".join(f"- {c}" for c in criteria)
    eval_prompt = f"""Evaluate the quality of this response.
Question: {question}
Response: {response}
Evaluation criteria (1-5 points each):
{criteria_lines}
Return evaluation results in JSON:
{{"scores": {{"criterion": score}}, "total": average, "feedback": "improvements"}}
Output ONLY the JSON with no other text.
"""
    result = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        temperature=0.0,  # keep judging as repeatable as possible
        messages=[{"role": "user", "content": eval_prompt}]
    )
    # json.loads raises json.JSONDecodeError if the judge adds text around the JSON.
    return json.loads(result.content[0].text)
Prompt Optimization Process
1. Write initial prompt
↓
2. Evaluate with test set (10-20 cases)
↓
3. Analyze failure cases
├─ Format errors → Strengthen output format
├─ Low accuracy → Add few-shot examples
├─ Missing info → Enhance context
└─ Excessive output → Add constraints
↓
4. Modify prompt
↓
5. Re-evaluate (iterate 2-4 times)
↓
6. Finalize prompt
↓
7. Deploy to production + monitor
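Steps 2 and 3 of this process amount to a small evaluation harness: run every test case, score it, and keep the failures for analysis. The sketch below assumes exact-match scoring and takes the model call as a plain callable so it can run offline; `evaluate_prompt`, `fake_model`, and the canned replies are hypothetical stand-ins.

```python
from typing import Callable


def evaluate_prompt(generate: Callable[[str], str],
                    test_cases: list[tuple[str, str]]) -> dict:
    """Run each (input, expected) case and collect the failures for analysis."""
    if not test_cases:
        raise ValueError("test set is empty")
    failures = []
    for case_input, expected in test_cases:
        actual = generate(case_input).strip()
        if actual != expected:
            failures.append({"input": case_input, "expected": expected,
                             "actual": actual})
    passed = len(test_cases) - len(failures)
    return {"accuracy": passed / len(test_cases), "failures": failures}


# Stand-in for a real model call: a hypothetical lookup of canned replies.
canned_replies = {
    "great product": "positive",
    "arrived broken": "negative",
    "it is okay": "positive",   # deliberately wrong, to produce one failure
}


def fake_model(text: str) -> str:
    return canned_replies.get(text, "")


test_set = [("great product", "positive"),
            ("arrived broken", "negative"),
            ("it is okay", "neutral")]
report = evaluate_prompt(fake_model, test_set)
print(report["accuracy"], report["failures"])
```

Running the same test set before and after each prompt change turns step 5 into a direct comparison of two reports.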
Note: Prompt optimization is about "iterative improvement," not "perfect on first try." Collect and analyze failure cases, and refine the prompt gradually. Re-validate prompts whenever the model version changes.
References
- Wei, J. et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS
- Wang, X. et al. (2023). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." ICLR
- Yao, S. et al. (2023). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." NeurIPS
- Anthropic. "Prompt Engineering Guide" — https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
- OpenAI. "Prompt Engineering Guide" — https://platform.openai.com/docs/guides/prompt-engineering
- White, J. et al. (2023). "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT." arXiv
— Data Dynamics Engineering Team