What is retrieval-augmented generation (RAG)?
Retrieval-augmented generation (RAG) is a technique that connects a language model to your own knowledge sources. When a user asks a question, the system first retrieves the most relevant documents, then passes them to the model alongside the question. The answer is grounded in your data instead of relying only on what the model learned during training.
How a RAG pipeline works
Documents are ingested, split into passages and indexed, typically as vector embeddings that capture meaning rather than exact keywords. At query time the system retrieves the passages most relevant to the question, inserts them into the model’s context, and asks it to answer using that material, ideally citing its sources. The model supplies the language and reasoning; your repository supplies the facts.
Why enterprises choose RAG
Company knowledge changes daily, and retraining a model for every policy update is neither practical nor economical. With RAG, updating the answer means updating the document. Answers can cite their sources, which builds the trust adoption depends on. Because retrieval happens at query time, a well-built pipeline can also enforce document permissions, so users only get answers drawn from material they are allowed to read. For injecting knowledge, RAG is almost always worth trying before fine-tuning.
Security and quality considerations
Two design decisions dominate. First, permission-aware retrieval is non-negotiable: an index that ignores access controls will leak documents across the org chart. Second, retrieved content is untrusted input. A poisoned document can carry an indirect prompt injection that the model executes when it reads it. Answer quality is also capped by retrieval quality. RAG systems therefore need continuous evaluation and adversarial testing well beyond the launch demo.
What is the difference between RAG and fine-tuning?
RAG changes what the model knows at question time; fine-tuning changes how the model behaves. Use RAG for facts and documents that change. Consider fine-tuning for tone, format or domain-specific behaviour. Many production systems combine both.
Does RAG eliminate hallucinations?
It reduces them substantially when retrieval works, because the model has the right material in front of it. It does not eliminate them: the model can still misread sources or fill gaps. Citations, grounding checks and ongoing evaluation remain necessary.
What data sources can RAG use?
Almost anything you can index: wikis, document drives, tickets, CRM records, policies, contracts, databases. The practical constraints are data quality and access control. Both matter more than raw volume.
- Prompt injectionPrompt injection is an attack that hides malicious instructions inside content an AI system processes, such as an email, a document or a web page. The model then treats the attacker’s text as instructions and ignores the rules it was given. The OWASP Top 10 for LLM applications ranks it as the leading security risk for this type of system.
- LLM red teamingLLM red teaming is the structured, adversarial testing of AI systems. Testers deliberately attack a model or AI application with jailbreaks, prompt-injection payloads, data-extraction attempts and abuse scenarios in order to find failures before real users or attackers do.
- AI governanceAI governance is the set of policies, roles, processes and technical controls an organisation puts in place so that AI is used safely, legally and accountably. It defines which tools are approved, who may use them with which data, how usage is monitored, and how risks are assessed before a use case goes to production.
Deploy AI with confidence
Code75 implements production AI across enterprise teams, with the security testing and governance to match. You will talk to an engineer.