For two years, every AI product had the same spine: chunk documents, embed, stuff top-k into the prompt, pray. Retrieval-augmented generation became synonymous with "enterprise AI." Vendors sold it. Consultants deployed it. Conference talks celebrated it.
Then the context windows grew. Not a little — a thousand-fold. And the question quietly shifted from "what should we retrieve?" to "what should we leave out?"
The retrieval tax
Every RAG pipeline pays three hidden costs: chunking destroys structure, embeddings flatten meaning, and top-k ranking discards the long tail. For a customer support bot answering FAQs, fine. For reasoning across a codebase, a legal corpus, or a research archive, where the answer often depends on passages that no single chunk contains: catastrophic.
```python
# The old way
chunks = split(doc, size=512)
vectors = embed(chunks)
results = top_k(query, vectors, k=5)
answer = llm(query + results)

# The new way
answer = llm(query + doc)  # the whole thing
```
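To make the three costs concrete, here is a toy, runnable sketch of the retrieval path. Everything in it is an assumption for illustration: `embed` is a bag-of-words stand-in rather than a real embedding model, `score` is plain word overlap rather than vector similarity, and the sample document is made up. It is not how any particular vendor's pipeline works, only the smallest version that shows the failure mode.

```python
import re
from collections import Counter

def split(doc: str, size: int) -> list[str]:
    """Fixed-size character chunking; ignores sentence and section boundaries."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag of lowercase words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def score(query_vec: Counter, chunk_vec: Counter) -> int:
    """Word-overlap count, standing in for vector similarity."""
    return sum(min(n, chunk_vec[w]) for w, n in query_vec.items())

def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    """Rank chunks by overlap with the query and keep only the best k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: score(q, embed(c)), reverse=True)
    return ranked[:k]

# Made-up sample document: a rule, its exception, and an unrelated section.
doc = (
    "Refund policy. Refunds are issued within 30 days of purchase. "
    "Exception: items bought on clearance are final sale and cannot be refunded. "
    "Shipping policy. Orders ship within two business days."
)

chunks = split(doc, size=64)
retrieved = top_k("refund policy for purchase", chunks, k=1)

# The chunk boundary cuts the word "Exception" in half, and the chunk that
# holds the exception shares no words with the query, so it is never retrieved.
exception_retrieved = any("final sale" in c for c in retrieved)
exception_in_full_doc = "final sale" in doc
```

With this input, the only retrieved chunk is the general refund rule; the clearance exception sits in a chunk that scores zero against the query. Passing the whole document keeps the rule and its exception side by side.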
What replaces it
Context engineering. The discipline of deciding what a model sees, in what order, at what fidelity — without pre-emptively throwing information away. It looks less like database design and more like film editing.
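One way to picture "what order, at what fidelity" is the rough sketch below. It assumes each passage already carries a salience score (where that score comes from is out of scope here), uses a whitespace word count as a crude token proxy, and uses a toy `compress` that keeps only the first sentence. `Passage`, `compress`, and `assemble` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    salience: float  # hypothetical relevance score in [0, 1], assumed given

def count_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())

def compress(text: str) -> str:
    """Toy summarizer: keep only the first sentence."""
    first = text.split(". ")[0].rstrip(".")
    return first + "."

def assemble(passages: list[Passage], budget: int) -> str:
    """Keep every passage, in document order. Demote the least salient
    passages to compressed form until the context fits the budget.
    If everything is compressed and it still does not fit, the
    over-budget context is returned as-is."""
    rendered = [p.text for p in passages]
    # Visit passages from least to most salient.
    order = sorted(range(len(passages)), key=lambda i: passages[i].salience)
    for i in order:
        if sum(count_tokens(r) for r in rendered) <= budget:
            break
        rendered[i] = compress(passages[i].text)
    return "\n".join(rendered)

passages = [
    Passage("Section 1 defines the parties. It lists their addresses and registration numbers.", 0.2),
    Passage("Section 2 sets the liability cap. The cap is twice the annual fee.", 0.9),
    Passage("Section 3 covers notices. Notices must be sent by registered mail.", 0.4),
]

context = assemble(passages, budget=30)
lines = context.splitlines()
```

Nothing is dropped: the low-salience first section is shortened to one sentence, the other two stay verbatim, and the original order is preserved. That is the difference from top-k, which would have removed passages outright.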
The tools are different. Caching layers that remember context per session. Summarizers that compress low-salience passages. Routers that decide when to fall back to retrieval and when to stream the full document. This is the new stack.
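As a minimal sketch of two of those pieces, a router and a per-session cache might look like the following. The names `route` and `SessionCache`, the `reserve` parameter, and the word-count token estimate are all assumptions made for illustration; a real system would use the model's own tokenizer and a more careful routing policy.

```python
from typing import Callable

def count_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())

def route(doc: str, query: str, window: int, reserve: int = 1000) -> str:
    """Return "full" when the document and query fit in the context window
    with `reserve` tokens left over for the answer, else "retrieve"."""
    needed = count_tokens(doc) + count_tokens(query) + reserve
    return "full" if needed <= window else "retrieve"

class SessionCache:
    """Per-session memo of prepared contexts, keyed by (session id, doc id)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], str] = {}
        self.hits = 0

    def get_or_build(self, session_id: str, doc_id: str,
                     build: Callable[[], str]) -> str:
        key = (session_id, doc_id)
        if key in self._store:
            self.hits += 1  # reuse the context prepared earlier in this session
        else:
            self._store[key] = build()
        return self._store[key]

short_doc = "word " * 500
long_doc = "word " * 50_000

short_route = route(short_doc, "what changed?", window=8000)
long_route = route(long_doc, "what changed?", window=8000)

cache = SessionCache()
first = cache.get_or_build("session-1", "doc-a", lambda: short_doc.strip())
second = cache.get_or_build("session-1", "doc-a", lambda: short_doc.strip())
```

Here the short document is streamed in full and the long one falls back to retrieval; the second cache lookup in the same session reuses the prepared context instead of rebuilding it.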
