RAG & Agentic RAG:
the AI that speaks your data
Retrieval-Augmented Generation grounds your chatbot's answers in your documents — not in the approximate general knowledge of a generic LLM. Agentic RAG goes further: an autonomous AI agent that reasons, verifies, and orchestrates its searches to deliver complex, reliable responses.
Works on any website. Supports PDF, DOCX, PPTX, TXT, and website scraping.
RAG Pipeline in action
1. User question
"What is your refund policy?"
2. Vector search (RAG)
3 relevant passages found in your Terms & Conditions
3. Augmented generation
The LLM formulates a response based on your documents
4. Sourced answer
"You have 30 days for a full refund, as stated in Section 4 of our Terms & Conditions."
91%
Recall@10 with hybrid search
-85%
Hallucinations reduced vs. LLM alone
< 3s
Average response time
72%
Of production RAG systems use hybrid search
What is RAG?
Retrieval-Augmented Generation is the architecture that turns a generic LLM into an expert in your field, by giving it access to your documents before every response.
The problem with LLMs without RAG
An LLM like GPT-4 or Claude answers from memory. It was trained on billions of web pages, but it has no knowledge of your pricing, internal procedures, products, or FAQ. Without access to your data, it invents (hallucinates) or gives generic, unhelpful answers.
Fine-tuning (retraining) is an option, but it is costly, requires GPUs, and your data becomes stale the moment a document changes. RAG solves this by injecting relevant context at every query, without modifying the model.
The simplest analogy
An LLM without RAG is like a student sitting an exam from memory — they can make mistakes or invent answers. An LLM with RAG is the same student allowed to consult their notes. They first find the right page, then formulate their answer from what they read.
RAG in 4 steps
Document ingestion
Your PDFs, DOCX, PPTX, or web pages are parsed and cleaned. Raw content is automatically extracted.
Chunking & Embeddings
Documents are split into optimally sized passages (chunks), then each passage is converted into a numerical vector (embedding) that captures its semantic meaning.
Vector search
When a user asks a question, it is converted into a vector and compared against all stored passages. The most similar ones are retrieved in milliseconds.
Augmented generation
Relevant passages are injected into the LLM's prompt. The AI formulates a natural response grounded in your documents — not its general knowledge.
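To make these steps concrete, here is a minimal, self-contained sketch of the query-time flow. It uses the open-source sentence-transformers library for embeddings and plain cosine similarity for retrieval; the call_llm function is a hypothetical placeholder for any LLM API. This illustrates the principle, not Heeya's actual implementation.

```python
# Minimal sketch of the RAG query flow: embed, retrieve, augment, generate.
# Assumes documents were already chunked; call_llm() is a placeholder for any LLM API.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open-source embedding model

chunks = [
    "Refunds are accepted within 30 days of purchase (Terms, Section 4).",
    "Shipping takes 3 to 5 business days within the EU.",
    "Support is available Monday to Friday, 9am to 6pm.",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)  # one vector per chunk

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question (cosine similarity)."""
    q_vec = model.encode(question, normalize_embeddings=True)
    scores = chunk_vectors @ q_vec          # dot product == cosine on normalized vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below, and cite the source when possible.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)  # placeholder: swap in your LLM provider of choice
```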
The RAG pipeline in production
Behind every precise answer from your chatbot, a complete pipeline transforms your raw documents into AI-ready knowledge.
Smart ingestion
Automatic parsing of PDFs, DOCX, PPTX, and web pages. Extraction of text, tables, and metadata. Content cleaning and normalization for optimal indexing.
- PDF, DOCX, PPTX, TXT
- Website scraping
- Incremental updates
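As an illustration of what the parsing step does under the hood, here is a simplified sketch using the open-source pypdf and BeautifulSoup libraries. It is not Heeya's pipeline (which also handles DOCX, PPTX, tables, and metadata), just the general idea.

```python
# Simplified ingestion sketch: extract raw text from a PDF and from a web page.
from pypdf import PdfReader
import requests
from bs4 import BeautifulSoup

def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page of a PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def extract_webpage_text(url: str) -> str:
    """Fetch a page and strip HTML tags, keeping visible text only."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(soup.get_text(separator=" ").split())  # normalize whitespace
```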
Chunking & Embeddings
Semantic splitting of documents into optimally sized passages. Each chunk is transformed into a vector via an embedding model, capturing the semantic meaning of the text.
- Adaptive semantic chunking
- High-dimensional embeddings
- Vector storage (Qdrant)
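Production systems typically use adaptive semantic chunking, which splits on meaning rather than length. As a simplified illustration of the principle, here is a basic fixed-size chunker with overlap (the overlap preserves context across chunk boundaries):

```python
# Simplified chunking sketch: fixed-size chunks with overlap.
# Real pipelines split on semantic boundaries (sections, paragraphs) rather than raw characters.
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks
```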
Hybrid search
Combination of dense (semantic) and sparse (keyword) retrieval for optimal recall. Result reranking to prioritize the most relevant passages.
- Dense + Sparse retrieval
- Reciprocal Rank Fusion
- Contextual reranking
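Reciprocal Rank Fusion is the standard way to merge the dense and sparse result lists: each document scores the sum of 1/(k + rank) across the rankers, with k = 60 a common default. A minimal sketch:

```python
# Reciprocal Rank Fusion: merge two ranked lists of document ids into one.
def rrf(dense_ranking: list[str], sparse_ranking: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: a document ranked well by both retrievers rises to the top.
print(rrf(["doc_a", "doc_b", "doc_c"], ["doc_b", "doc_d", "doc_a"]))
```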
RAG vs Fine-tuning: which approach should you choose?
Two strategies for specializing an LLM. Understanding their respective strengths helps you make the right call — or combine both.
RAG (Retrieval-Augmented Generation)
Changes what the model sees at every query, without modifying its internal weights.
Up-to-date data — add or remove documents at any time, no retraining required
Lower cost — no GPUs or annotated datasets, just your existing documents
Transparency — every answer can be traced back to its source document
Fewer hallucinations — the AI is grounded in facts, not its own memory
Fast deployment — live in minutes with Heeya
Fine-tuning
Changes how the model behaves by modifying its internal weights.
Style and tone — adapts the model's language to a specific jargon or format
Specialized reasoning — improves performance on niche tasks
High cost — GPUs, annotated datasets, and ML expertise required
Frozen data — every update requires a full retraining run
Catastrophic forgetting — the model may lose general capabilities
In 2026: the hybrid approach is the standard
Volatile knowledge (pricing, procedures, FAQs) goes into RAG. Stable behavior (tone, format, domain reasoning) goes into fine-tuning. The two are not mutually exclusive — the best systems combine both approaches.
Agentic RAG: when the AI reasons before it responds
Classic RAG follows a linear flow: retrieve, then generate. Agentic RAG places an autonomous AI agent at the center of the pipeline. This agent decides what to search for, evaluates the quality of the results, and iterates until it produces a reliable answer.
It is the difference between an employee who follows a fixed procedure and an expert who adapts their approach to each situation. The agent can query multiple sources, cross-reference information, detect contradictions, and reformulate its search when the initial results fall short.
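In code, the difference boils down to a loop with self-evaluation. The sketch below is purely illustrative: retrieve(), grade_relevance(), rewrite_query(), and generate() are hypothetical helpers standing in for retrieval, an LLM-based relevance check, query reformulation, and answer generation.

```python
# Illustrative agentic RAG loop: retrieve, self-check, reformulate, iterate.
# retrieve(), grade_relevance(), rewrite_query(), generate() are hypothetical placeholders.
def agentic_answer(question: str, max_rounds: int = 3) -> str:
    query = question
    for _ in range(max_rounds):
        passages = retrieve(query)                      # search the knowledge base
        if grade_relevance(question, passages) >= 0.7:  # agent judges its own results
            return generate(question, passages)         # grounded answer
        query = rewrite_query(question, passages)       # reformulate and try again
    return "I don't have enough information in the knowledge base to answer reliably."
```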
Understand the fundamentals of RAG →
The 4 agentic capabilities
Reflection
The agent evaluates the quality of its own results and self-corrects before responding. If it detects an inconsistency, it triggers a new search.
Planning
Faced with a complex question, the agent breaks the task into sub-steps. "To answer this, I first need to verify X, then cross-reference with Y."
Tool use
The agent selects the right tool for each sub-task: vector search, keyword search, contact form, calculation.
Multi-agent collaboration
Multiple specialized agents cooperate: a retrieval agent, a verification agent, a formulation agent — each excelling in its own domain.
Classic RAG vs Agentic RAG
Classic RAG covers the majority of use cases. Agentic RAG takes over for complex, multi-source questions that require reasoning.
The 3 architectures of Agentic RAG
From the simple router to hierarchical orchestration, each architecture addresses a different level of complexity.
Single agent (Router)
IDEAL FOR GETTING STARTED
A single agent decides which data source to query for each question. It routes the request to the right vector database or tool.
Use case: support chatbot with multiple document bases (FAQ, procedures, pricing). This is the architecture used by default at Heeya.
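Conceptually, a router agent is a classification step placed in front of several knowledge bases. A minimal illustrative sketch, in which the collection names and the classify_llm, retrieve, and generate helpers are all hypothetical:

```python
# Illustrative router: pick the right document collection before retrieving.
# classify_llm() is a hypothetical LLM call that returns one label from the list.
COLLECTIONS = ["faq", "procedures", "pricing"]

def route(question: str) -> str:
    label = classify_llm(
        "Which knowledge base best answers this question? "
        f"Options: {', '.join(COLLECTIONS)}.\nQuestion: {question}"
    )
    return label if label in COLLECTIONS else "faq"  # safe fallback

def routed_answer(question: str) -> str:
    collection = route(question)
    passages = retrieve(question, collection=collection)  # search only that collection
    return generate(question, passages)
```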
Multi-agent
COMPLEX QUERIES
A coordinator agent delegates to specialized sub-agents. Each sub-agent masters a specific domain or search type. Results are synthesized by the coordinator.
Use case: multi-document analysis, cross-source comparisons, real estate agents cross-referencing legal and commercial data.
Hierarchical
ENTERPRISE SYSTEMS
Multi-layer architecture: strategic agents, tactical agents, and operational agents. Each layer manages a different decision scope with appropriate search granularity.
Use case: enterprise systems with dozens of sources, complex decision workflows, multi-department HR assistants.
RAG in action: real-world use cases
RAG is not just a technology — it is the foundation of any AI chatbot capable of answering accurately from your business data.
Automated customer service
The chatbot answers questions about your products, pricing, return procedures, and terms & conditions based on your actual documentation. No more maintaining a rigid decision tree.
See the customer service solution →
Training & onboarding
Import your training manuals, procedure guides, and internal materials. New employees get instant, sourced answers about company processes.
See the training solution →
E-commerce & sales
The chatbot knows your catalog, product pages, current promotions, and shipping terms. It guides buyers to the right product and handles objections with precision.
See the e-commerce solution →
Legal & compliance
Law firms use RAG to inform prospects about practice areas, procedures, and fees — without ever providing personalized legal advice.
See the legal solution →
How Heeya implements RAG
A complete RAG pipeline, ready in minutes, with no technical expertise required.
Step 1
Import your documents
Upload your PDFs, DOCX, PPTX, text files, or enter your website URL. Heeya automatically parses, cleans, and structures the content. Your documents are split into optimal chunks and vectorized.
Step 2
Configure your AI agent
Define the System Guidance (personality, rules, tone), enable tools such as the contact form, and customize the welcome message. The agent is grounded in your data within seconds.
Step 3
Deploy anywhere
Copy one line of code to embed the chatbot widget on your site. Compatible with WordPress, Wix, Shopify, Webflow, and any HTML site. Share a direct link for your social channels too.
Free trial, no credit card required. View pricing.
Our RAG technical stack
Heeya is built on battle-tested, production-grade technologies to ensure the reliability and performance of every RAG pipeline.
Vector Database: Qdrant
High-performance vector database for embedding storage and search. Native hybrid search (dense + sparse), client-isolated collections for data security.
LLMs: Multi-provider via OpenRouter
Access to the best models on the market (Gemini 2.0 Flash, Claude, GPT-4o) via a unified API. Optimal model selection based on use case and budget.
Embeddings: Advanced embedding models
Text vectorization into high-dimensional embeddings. Semantic meaning is captured for precise similarity search beyond simple keyword matching.
Pipeline: FastAPI + Async processing
Asynchronous ingestion and chunking to ensure a seamless user experience. Incremental indexing pipeline for document updates.
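For readers curious about the async pattern, here is a minimal FastAPI sketch of an ingestion endpoint that accepts a file and processes it in the background so the upload returns immediately. The process_document helper is a hypothetical placeholder; this is not Heeya's actual API.

```python
# Minimal async ingestion sketch with FastAPI: accept the upload, process in the background.
from fastapi import FastAPI, UploadFile, BackgroundTasks

app = FastAPI()

def process_document(filename: str, content: bytes) -> None:
    """Hypothetical placeholder: parse, chunk, embed, and index the document."""
    ...

@app.post("/ingest")
async def ingest(file: UploadFile, background_tasks: BackgroundTasks):
    content = await file.read()                                  # read the upload asynchronously
    background_tasks.add_task(process_document, file.filename, content)
    return {"status": "accepted", "filename": file.filename}     # responds without waiting
```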
Security & Compliance
Data isolation — each company has its own dedicated vector collection; no data is shared between clients
Zero training — your documents are never used to train or fine-tune AI models
Encryption — HTTPS connections, security headers (HSTS, X-Frame-Options), secure cookies
GDPR — data hosted in Europe, full right to delete your documents and embeddings
Full control — add or remove your documents at any time; the index updates in real time
Why Qdrant?
Qdrant is an open-source vector database optimized for large-scale similarity search. It natively supports hybrid search (dense + sparse), metadata filtering, and collection-level isolation — essential for a multi-tenant environment like Heeya.
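In practice, per-client isolation maps to one Qdrant collection per company. Here is a minimal sketch using the official qdrant-client Python library; collection names, vector size, and payloads are illustrative, not Heeya's actual configuration.

```python
# Minimal Qdrant sketch: one isolated collection per client, then vector search within it.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")  # local instance for illustration

def create_client_collection(company_id: str, vector_size: int = 384) -> None:
    """Each company gets its own collection: no data is shared across tenants."""
    client.create_collection(
        collection_name=f"company_{company_id}",
        vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
    )

def index_chunk(company_id: str, chunk_id: int, vector: list[float], text: str) -> None:
    client.upsert(
        collection_name=f"company_{company_id}",
        points=[PointStruct(id=chunk_id, vector=vector, payload={"text": text})],
    )

def search(company_id: str, query_vector: list[float], top_k: int = 5):
    """Search is scoped to the client's own collection only."""
    return client.search(
        collection_name=f"company_{company_id}",
        query_vector=query_vector,
        limit=top_k,
    )
```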
How much does a RAG chatbot cost?
Heeya makes RAG accessible to businesses of all sizes. Free trial, no credit card required. Plans scaled to your conversation volume.
€0
Free
1 RAG agent, 100 msg/month. Perfect for testing.
€19/month
Standard
1 RAG agent, 1,000 msg/month + 1 AI tool.
€99/month
Premium
3 RAG agents, 5,000 msg/month + integrations.
Frequently asked questions about RAG
What is the difference between RAG and fine-tuning an LLM?
RAG injects external knowledge at every query without modifying the model. Fine-tuning modifies the model weights to change its behavior. RAG is ideal for frequently changing data (pricing, procedures), while fine-tuning is better suited to changing the model's style or reasoning. In 2026, the hybrid approach is the production standard.
What is Agentic RAG?
Agentic RAG adds autonomous AI agents to the classic RAG pipeline. Instead of a fixed retrieve-then-generate flow, an agent dynamically decides what to search for, which tools to use, and when to verify its answer — and can coordinate multiple specialized sub-agents. It is the natural evolution for handling complex queries that require multiple sources and reasoning steps.
What types of documents can be integrated into a RAG system?
A RAG system can ingest PDFs, Word documents (DOCX), PowerPoint presentations (PPTX), text files, scraped web pages, FAQs, and internal knowledge bases. Heeya natively supports all these formats and website scraping.
Does RAG eliminate AI hallucinations?
RAG dramatically reduces hallucinations by grounding responses in real documents. If the information is not in the knowledge base, a well-configured RAG system will say so rather than inventing an answer. Zero risk does not exist, but RAG is the best available approach for making AI responses reliable.
How long does it take to set up a RAG chatbot?
With Heeya, a RAG chatbot is up and running in under 10 minutes: import your documents, customize the instructions, and embed the widget on your site. No technical expertise required. For a custom deployment with an advanced pipeline, allow 1 to 2 weeks depending on complexity.
Is my data secure in a RAG system?
With Heeya, each company operates in an isolated environment. Your documents are never used to train AI models. Vector embeddings are stored in isolated collections and all data is transmitted over encrypted connections. You retain full control over your data. View our privacy policy.
What is chunking and why does it matter?
Chunking is the process of splitting your documents into optimally sized passages before vectorization. Chunks that are too small lose context; chunks that are too large dilute the relevant information. Chunking strategy has a greater impact on retrieval quality than any other parameter in the RAG pipeline.
What is a vector database?
A vector database stores the numerical representations (embeddings) of your documents and retrieves the most similar passages to a question in milliseconds. Unlike traditional keyword search, vector search understands the meaning of the question. Heeya uses Qdrant, a high-performance open-source vector database. Learn more about the AI knowledge base.
Go deeper
Our guides and articles to master RAG and conversational AI.
ADVANCED GUIDE
Agentic RAG: complete implementation guide
Pipeline, chunking, embeddings, agentic architectures, and common pitfalls to avoid.
COMPLETE GUIDE
What is RAG? Complete guide
RAG explained step by step: definition, how it works, benefits, and use cases.
COMPARISON
Enterprise AI Chatbot Comparison 2026
The best enterprise AI chatbot solutions compared side by side.
TOOL
AI Knowledge Base
How to build and optimize a knowledge base for your chatbot.
Our solutions by industry
Customer Service
24/7 automated support grounded in your documents
Legal
24/7 legal lead qualification and appointment booking
Training
AI assistant for training organizations
Switch to RAG: AI answers grounded in your data
Build your RAG chatbot in under 10 minutes. Free trial, no credit card required.