The emergence of context engineering has ridden a wave of hype touting it as the next novel trick in the AI playbook. In reality, it is a pragmatic craft: the deliberate, meticulous arrangement of the various components of context. This article peels back the buzz to expose those unglamorous yet indispensable building blocks.
In the past few months alone, think-pieces with headlines such as “The new skill in AI is not prompting, it’s context engineering” have dominated tech feeds, and high-profile posts from leaders like Shopify’s Tobi Lütke have racked up millions of views, heralding it as the must-have competency for 2025 [1].
Yet seasoned practitioners know the core idea is hardly new. Before today's models with context windows spanning up to 2 million tokens, early-generation LLMs squeezed everything into a few thousand tokens, forcing developers to become masters of context management, often termed prompt engineering: the careful selection, trimming, and sequencing of knowledge and instructions.
Every AI agent, QA chatbot, or coding assistant that impressed us in the last 5 years was already practicing context craftsmanship, just under a different banner. What has changed is scale (bigger windows, richer retrieval pipelines) and the marketing label. The underlying mission remains the same: get the right information, in the right format, to the model at the right moment.
Why context matters
When developing AI agents, context should be treated as a first-class engineering problem. Teams build sophisticated pipelines that capture prior dialogue, extract relevant data from the knowledge base, enrich each request with domain metadata, and invoke tools on demand. This meticulous management of context pays off in three main ways:
- It reduces hallucinations and enforces factual grounding.
- It preserves conversational coherence and enables personalization.
- It lets agents break free from the knowledge constraints of the model's static training data.
In practice, drawing from my experience shipping AI assistants for HR and payroll analytics, I have come to realise that context is both a superpower and a silent saboteur.
- In some instances, the model latched onto hallucinated generations clogging the chat history and ignored the factual data returned by tools.
- On bad days, relevant data fetched from the corpus was ignored in favour of low-value filler.
What matters, however, is that we fixed these failures with a playbook that now feels like second nature: ranking data by task relevance, surfacing only the critical first few items and compressing the rest, and handing off tasks to agent nodes or other agents (a minimal sketch follows). The result: fewer hallucinations, tighter grounding, and agents that behave as if they remember what matters, not just what they see.
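Here is a minimal sketch of the ranking-and-compression half of that playbook, assuming a toy lexical-overlap scorer and a truncation digest as stand-ins for a real relevance model and an LLM summarizer:

```python
def lexical_overlap(task: str, snippet: str) -> float:
    """Toy relevance score: fraction of task words present in the snippet.
    Stand-in for an embedding-similarity or cross-encoder scorer."""
    task_words = set(task.lower().split())
    return len(task_words & set(snippet.lower().split())) / max(len(task_words), 1)

def build_context(task: str, snippets: list[str], k: int = 3) -> str:
    """Rank snippets by task relevance, surface the top-k verbatim,
    and compress the rest into a single low-priority digest."""
    ranked = sorted(snippets, key=lambda s: lexical_overlap(task, s), reverse=True)
    critical, rest = ranked[:k], ranked[k:]
    parts = [f"[fact] {s}" for s in critical]
    if rest:
        # Stand-in for an LLM summarization call over the low-priority tail.
        parts.append("[compressed] " + " | ".join(s[:80] for s in rest))
    return "\n".join(parts)
```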
The building blocks
Industry voices are already classifying context-engineering techniques into a handful of memorable buckets to make the discipline easier to teach and apply. LangChain's recent blog post "Context Engineering for Agents" [2] is a good example: it clusters techniques into four buckets (write, select, compress, isolate). We will take another approach and instead delve into the components that make up the context of AI agents.
- Instructions / system prompt: The single most privileged piece of context, typically used to set the role and policy, the rules to follow, the tone, and so on. It is usually placed at the top of the context presented to LLMs [3]. Simple persona-style role setting can help steer both reasoning and style, and instructions can also guide the model's reasoning, enable chain of thought, and even drive ReAct-style tool plans. Instructions are not just boilerplate; they are an active and highly expressive lever of context engineering (sketched after this list).
- Few-shot prompts: A classic in-context learning trick where we place a few worked examples of the problem within the prompt. Ideally, we target the cases where the model's output shows recurring hallucinations; 3-5 few-shot examples can go a long way toward improving accuracy and coherence [4]. However, if hallucinations remain widespread, it might be time to consider fine-tuning (sketched after this list).
- Relevant knowledge: Provisioning information that lives outside the model weights. This may be fetched on demand (retrieval-augmented generation), pre-loaded into the prompt, or held in cache as key-value pairs before inference (context/cache-augmented generation). RAG is a double-edged sword: sophisticated RAG pipelines have shown great improvements in accuracy and coherence, but they set us off on a wild-goose chase balancing the accuracy and relevance of retrieved data against latency [5]. It is also common practice to include context compression within this pipeline (sketched after this list).
- Short-term memory / state: A vital component of context that helps otherwise stateless LLMs track conversations. This "chat memory" is embedded into the model's context turn by turn, and most implementations use either a rolling buffer or a windowed buffer with recursive summarization after every k messages. Some implementations also use short-term memory to store other data that helps with lower-level control, such as routing and orchestration [6] (sketched after this list).
- Long-term memory: A complementary component of context. Agents have recently begun incorporating long-term memory to add a layer of personalization to their interactions. Long-term memory is the gateway to episodic memory, letting LLMs recall specific events grounded in time and space. A common implementation summarizes and stores key facts, then loads them into context before every interaction [6][7] (sketched after this list).
- Structured output schemas: Their importance can't be overstated when model outputs need to be parsed and readily accepted by application code. Output JSON schemas [8] are appended to the model's context to steer generation toward the expected shape and reduce the need to build bespoke output parsers (sketched after this list).
- Tools metadata: Modern agents have made tool calling standard [8]. Tool definitions, specifications, and other metadata are provided to models as context, often as a list of JSON objects. This holds true even in the trending MCP (Model Context Protocol) implementations (sketched after this list).
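Instructions / system prompt: a minimal sketch of the privileged top-of-context position in a chat-style request. The message shapes follow the common role/content convention; the persona and rules are illustrative, not a prescribed template.

```python
messages = [
    # The system prompt sits first: role, policy, and tone for every turn.
    {"role": "system", "content": (
        "You are a payroll analytics assistant. "
        "Answer only from the data provided; if it is missing, say so. "
        "Think step by step before giving a final answer."
    )},
    {"role": "user", "content": "Why did net pay drop for employee 1042 in June?"},
]
```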
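Few-shot prompts: a sketch of placing worked examples ahead of the live question. The examples are hypothetical and target one recurring failure mode, inventing data instead of admitting it is missing.

```python
system_message = {"role": "system",
                  "content": "You are a payroll analytics assistant."}

few_shot = [
    # Example 1: the desired refusal when the record is absent.
    {"role": "user", "content": "How many sick days does Ana have left?"},
    {"role": "assistant", "content": "I don't have Ana's leave balance in the provided records."},
    # Example 2: the desired grounded answer when the record is present.
    {"role": "user", "content": "Records: Ben has 4 sick days remaining. How many sick days does Ben have left?"},
    {"role": "assistant", "content": "Ben has 4 sick days remaining."},
]

messages = [system_message, *few_shot,
            {"role": "user", "content": "How many sick days does Cara have left?"}]
```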
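Relevant knowledge: a sketch of the on-demand (RAG) variant. `retrieve` and `llm` are stand-ins for whatever vector search and model client you use; nothing here is tied to a specific library.

```python
def answer_with_rag(question: str, retrieve, llm, top_k: int = 5) -> str:
    """Fetch relevant documents at request time and ground the prompt in them."""
    docs = retrieve(question, top_k=top_k)   # e.g. a vector-store similarity search
    context = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(docs, 1))
    prompt = ("Answer using only the documents below.\n\n"
              f"{context}\n\nQuestion: {question}")
    return llm(prompt)
```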
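Short-term memory / state: a sketch of the windowed buffer with recursive summarization described above. `summarize` is a stand-in for an LLM call that folds evicted turns into the running summary.

```python
class ChatMemory:
    """Keep the last k turns verbatim; fold older turns into a summary."""

    def __init__(self, summarize, k: int = 8):
        self.summarize = summarize  # stand-in: (old_summary, turns) -> new summary
        self.k = k
        self.summary = ""
        self.turns: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        if len(self.turns) > self.k:
            # Recursive summarization: compress everything beyond the window.
            self.summary = self.summarize(self.summary, self.turns[:-self.k])
            self.turns = self.turns[-self.k:]

    def as_context(self) -> list[dict]:
        head = ([{"role": "system", "content": f"Conversation so far: {self.summary}"}]
                if self.summary else [])
        return head + self.turns
```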
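Long-term memory: a sketch of the summarize-and-store pattern, persisting distilled facts per user and loading them before each interaction. The JSON-file store and field names are illustrative; production systems typically use a database or vector store.

```python
import json
import pathlib

MEMORY_PATH = pathlib.Path("user_memory.json")  # illustrative storage location

def save_fact(user_id: str, fact: str) -> None:
    """Persist a distilled fact, e.g. 'prefers monthly payroll summaries'."""
    store = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}
    store.setdefault(user_id, []).append(fact)
    MEMORY_PATH.write_text(json.dumps(store))

def load_profile(user_id: str) -> str:
    """Build a context line to prepend before every interaction."""
    store = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}
    facts = store.get(user_id, [])
    return ("Known about this user: " + "; ".join(facts)) if facts else ""
```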
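Structured output schemas: a sketch of appending a JSON schema to the context so the generation lands in a shape the application can parse. The schema fields are illustrative.

```python
import json

answer_schema = {
    "type": "object",
    "properties": {
        "answer":  {"type": "string"},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer", "sources"],
}

schema_instruction = ("Respond with a single JSON object matching this schema:\n"
                      + json.dumps(answer_schema, indent=2))
# Appended to the system prompt; many APIs also accept the schema natively [8].
```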
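Tools metadata: a sketch of a tool definition in the JSON shape popularized by function calling [8]; MCP servers expose tools through a closely related name/description/input-schema listing. The `get_payslip` tool is hypothetical.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_payslip",  # hypothetical tool
        "description": "Fetch a payslip for an employee and month.",
        "parameters": {
            "type": "object",
            "properties": {
                "employee_id": {"type": "string"},
                "month": {"type": "string", "description": "Format: YYYY-MM"},
            },
            "required": ["employee_id", "month"],
        },
    },
}]
# Passed to the model alongside the messages so it can plan tool calls.
```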
Conclusion
The craft of managing and provisioning context has been around since at least 1966, in systems like ELIZA. It was a challenge then, remains one in 2025, and will only grow as LLMs become more capable. Note that it is not imperative to use all the components above, or every technique in the book. We need to choose the right blend; as Andrej Karpathy aptly phrases it, context engineering is "the delicate art and science of filling the context window".
References
[1] The New Skill in AI is Not Prompting, It's Context Engineering by Phil Schmid.
[2] Context Engineering for Agents by LangChain.
[3] Neumann, A., Kirsten, E., Zafar, M. B., & Singh, J. (2025). Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs). arXiv:2505.21091
[4] Allen, B. P., Polat, F., & Groth, P. (2024). SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection. arXiv:2404.03732
[5] Shen, M., Umar, M., Maeng, K., Suh, G. E., & Gupta, U. (2025). Towards understanding systems trade-offs in retrieval-augmented generation model inference. arXiv:2412.11854
[6] Memory in LangGraph by LangChain
[7] Das, P., Chaudhury, S., Nelson, E., Melnyk, I., Swaminathan, S., Dai, S., Lozano, A., Kollias, G., Chenthamarakshan, V., Navratil, J., Dan, S., & Chen, P. (2024). Larimar: Large Language Models with Episodic Memory Control. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10109-10126.
[8] Function calling & JSON mode: Structured output generation. OpenAI Developer Docs
[9] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts. arXiv:2307.03172
[10] Çelik, T., & Markewich, L. (2025). Context Engineering – What it is, and techniques to consider.