This is the technical explanation of what happens between "workflow triggered" and "AI generates response." Understanding this pipeline — from semantic search to context injection to generation — helps you tune knowledgebases that produce consistently accurate, auditable outputs.
RAG stands for Retrieval-Augmented Generation. It is the architecture that grounds AI outputs in specific, verifiable facts rather than statistical guesses from general training data. The name describes the mechanism: before generating a response, the AI retrieves relevant information from a knowledge store and augments its context window with that information.
Without RAG, an AI model answering "what is the price of a 4-inch butterfly valve in our Q4 2025 pricing tier?" has no choice but to guess based on general internet knowledge. With RAG backed by your Structured Data KB, the AI retrieves the exact row from your pricing table and reads the number directly. The difference in accuracy is not incremental — it is categorical.
RenderDraw workflows use RAG as the default architecture for every AI block that has a knowledgebase attached. The retrieval step is automatic, transparent, and auditable. You don't write RAG prompts; you configure the knowledgebase context rules that govern what gets retrieved and how it is formatted before reaching the model.
RAG does not make the AI smarter. It makes the AI operate on your data rather than on internet averages. That distinction is everything in an enterprise context where accuracy and auditability are non-negotiable.
Every AI block execution follows this sequence when a knowledgebase is attached. Total elapsed time: typically under 3 seconds end-to-end.
The workflow engine assembles a retrieval query from the current task context. For an RFP response block, this might be: the section title, the requirement text, and key entities extracted from the source document. The query is semantic — it captures intent, not just keywords.
The query string is converted to a high-dimensional vector (embedding) by the same embedding model used during KB ingestion. This vector encodes the semantic meaning of the query in a way that can be compared to the vectors of every chunk in the KB.
RenderDraw performs approximate nearest-neighbor search (using pgvector under the hood) across all chunk embeddings in the attached KBs. Chunks are ranked by cosine similarity to the query embedding. Search latency is under 100 ms for KBs of up to 100,000 chunks.
Chunks below the configured confidence threshold are dropped. If the number of remaining chunks is below a minimum viable count and fallback behavior is set to "error," the workflow step fails and routes to a human-review gate.
The topK surviving chunks are formatted according to the citation mode setting and inserted into the AI model's context window. If multiple KBs are attached, chunks from all KBs are merged and re-ranked by relevance before assembly.
The AI model generates its response with the retrieved context in scope. Source citations — document name, version, page/section, chunk ID — are attached to the output. The full retrieval trace is logged for audit purposes.
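The retrieval half of the sequence above (score, rank, threshold, topK) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Chunk` class, the `retrieve` function, and the three-dimensional toy vectors are hypothetical, and a brute-force exact scan stands in for the approximate nearest-neighbor index that the platform uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    embedding: list[float]  # toy vector; a real KB stores model-generated embeddings


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, chunks, top_k, confidence_threshold):
    """Rank chunks by similarity, drop those below the threshold, keep top_k."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    survivors = [(s, c) for s, c in scored if s >= confidence_threshold]
    return survivors[:top_k]


# Hypothetical data: four chunks and one query embedding.
chunks = [
    Chunk("a", [0.9, 0.1, 0.0]),
    Chunk("b", [0.6, 0.8, 0.0]),
    Chunk("c", [0.0, 1.0, 0.0]),
    Chunk("d", [0.8, 0.0, 0.6]),
]
results = retrieve([1.0, 0.0, 0.0], chunks, top_k=3, confidence_threshold=0.7)
for score, chunk in results:
    print(chunk.chunk_id, round(score, 3))
```

With these toy vectors only chunks "a" and "d" clear the 0.7 floor, so fewer than topK chunks reach context assembly. That is the situation in which the fallback behavior described above becomes relevant.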
Traditional keyword search finds documents that contain the exact words in your query. Semantic search finds documents that carry the same meaning, even when the words differ. This distinction is critical for engineering and construction workflows where the same concept appears under many different terminologies across documents from different eras, clients, and standards bodies.
For example, a retrieval query for "corrosion resistance requirements for coastal installations" will semantically match chunks that say "anti-corrosion specifications for marine environments," "salt spray resistance standards for offshore equipment," and "galvanic protection requirements for waterfront structures," even though each shares at most a single content word with the query and none repeats its phrasing.
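To see why keyword search misses these chunks, here is a small sketch of a strict all-terms keyword matcher run against the three phrases from the example. The matcher and its stopword list are illustrative assumptions, not RenderDraw's implementation, and the semantic side is not shown because it needs an embedding model.

```python
import re

# Hypothetical minimal stopword list for this illustration.
STOPWORDS = {"for", "the", "of", "and", "a", "in"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_match(query: str, document: str) -> bool:
    """True only if every content word of the query appears in the document."""
    return content_words(query) <= content_words(document)


query = "corrosion resistance requirements for coastal installations"
chunks = [
    "anti-corrosion specifications for marine environments",
    "salt spray resistance standards for offshore equipment",
    "galvanic protection requirements for waterfront structures",
]

matches = [c for c in chunks if keyword_match(query, c)]
overlap = {c: sorted(content_words(query) & content_words(c)) for c in chunks}

print(matches)  # no chunk contains all five query terms
for chunk, shared in overlap.items():
    print(shared, "<-", chunk)
```

Each chunk shares exactly one content word with the query, so a strict keyword filter returns nothing, while an embedding comparison can still place all three close to the query.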
RenderDraw uses a domain-tuned embedding model for the AEC and industrial manufacturing space. It has been trained on construction specifications, engineering standards, product catalogs, and proposal libraries — so technical terminology, part numbers, and standards references are embedded with higher precision than a general-purpose embedding model produces.
Semantic search also handles typos and abbreviations gracefully. "HVAC unit" and "heating ventilation and air conditioning unit" retrieve the same chunks. "SMAW welding" and "shielded metal arc welding" resolve to the same semantic neighborhood. This robustness matters in workflows processing documents that don't all use the same house style.
Query: "load bearing capacity for elevated walkways"
Keyword match finds: only docs containing "load bearing capacity" AND "elevated walkways"
Semantic match also finds: "live load ratings for mezzanine structures," "structural capacity of overhead platforms," "IBC Section 1607 floor live loads"
All KB content is embedded at ingest time and cached. Re-embedding only occurs when document content changes or when a KB is migrated to a new version of the embedding model.
AI models have a finite context window — the total amount of text (measured in tokens) they can hold in scope at once. A context window of 128,000 tokens sounds enormous, but in a complex workflow response task, that budget is shared between: the system instructions for the AI block, the source document being processed, the retrieved KB chunks, the task-specific prompt, and the space needed to generate the response.
RenderDraw's context window manager allocates this budget dynamically. For each AI block execution, it calculates the token cost of the source document excerpt, the AI block instructions, and the response target length — then allocates the remaining budget to KB retrieval. If you have three KBs attached with topK=5 each, and the available retrieval budget is 12,000 tokens, the manager automatically adjusts chunk selection to fit within budget while maximizing relevance.
You can tune this behavior in the block's advanced context settings. Retrieval priority sets which KB's chunks are protected first when the budget is tight. Source compression enables lightweight summarization of the source document excerpt to reclaim budget for retrieval. Response budget reserves a minimum token allocation for generation to prevent the model from running out of context space mid-output.
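The budget arithmetic can be made concrete with a short sketch. The function names, the token figures, and the greedy selection policy below are assumptions chosen for illustration (the figures are picked so the retrieval budget comes out at the 12,000 tokens used in the example above); the actual context window manager may allocate differently.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    score: float  # relevance score from retrieval
    tokens: int   # token cost of the formatted chunk


def retrieval_budget(context_window: int, instructions: int, source_excerpt: int,
                     prompt: int, response_reserve: int) -> int:
    """Tokens left for KB chunks after every fixed cost is reserved."""
    fixed_costs = instructions + source_excerpt + prompt + response_reserve
    return max(context_window - fixed_costs, 0)


def fit_to_budget(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedy policy (an assumption): take chunks in relevance order, skip any that overflow."""
    selected, used = [], 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if used + cand.tokens <= budget:
            selected.append(cand)
            used += cand.tokens
    return selected


# Hypothetical token costs for one AI block execution.
budget = retrieval_budget(
    context_window=128_000,
    instructions=2_000,
    source_excerpt=110_000,
    prompt=1_000,
    response_reserve=3_000,
)
candidates = [
    Candidate("c1", 0.90, 5_000),
    Candidate("c2", 0.85, 4_000),
    Candidate("c3", 0.80, 4_000),
    Candidate("c4", 0.75, 2_000),
]
selected = fit_to_budget(candidates, budget)
print(budget, [c.chunk_id for c in selected])  # 12000 ['c1', 'c2', 'c4']
```

Here "c3" is skipped because it would push the total to 13,000 tokens, while the smaller "c4" still fits. Enabling source compression would correspond to lowering `source_excerpt`, which raises the retrieval budget by the same amount.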
The topK parameter is the number of chunks retrieved per KB query. Higher topK means more context coverage — useful for broad queries that might require evidence from multiple parts of a document. Lower topK means tighter focus — useful for precise factual lookups where more context is noise, not signal.
As a starting point: set topK=3 for precise factual KBs (pricing tables, dimensional specs), topK=5 for medium-depth technical documents, and topK=7–8 for broad proposal library searches. These are tunable per-workflow-block, not per-KB, so the same KB can serve different retrieval depths in different workflow contexts.
The confidence threshold is a cosine similarity floor applied after ranking. Chunks with similarity scores below this threshold are excluded, even if they would otherwise make the topK cutoff. This prevents low-quality matches — retrieved because they were the least-bad option, not because they were actually relevant — from polluting the AI's context.
The relationship between topK and confidence threshold is a precision-recall tradeoff. High confidence + low topK = very precise, potentially narrow. Low confidence + high topK = broad coverage, potentially noisy. For production workflows, start at confidence=0.70, topK=5, and tune based on output review. Most teams find a stable configuration within three to five test runs.
High confidence threshold with low topK: use when factual accuracy is critical (pricing, dimensions, compliance requirements).
Lower confidence threshold with higher topK: use when broad context coverage matters more than pinpoint precision (proposal libraries, narrative sections).
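The precision-recall tradeoff is easiest to see by running the same ranked scores through different settings. The similarity scores below are made up for illustration, and `select` is a hypothetical helper, not a RenderDraw function.

```python
def select(scores: list[float], top_k: int, confidence: float) -> list[float]:
    """Apply the similarity floor after ranking, then cap the result at top_k."""
    ranked = sorted(scores, reverse=True)
    return [s for s in ranked if s >= confidence][:top_k]


# Hypothetical cosine similarities for eight candidate chunks from one KB.
scores = [0.91, 0.84, 0.78, 0.72, 0.69, 0.66, 0.58, 0.41]

precise = select(scores, top_k=3, confidence=0.80)   # tight focus
starting = select(scores, top_k=5, confidence=0.70)  # suggested starting point
broad = select(scores, top_k=8, confidence=0.65)     # broad coverage

print(len(precise), len(starting), len(broad))  # 2 4 6
```

In this example the threshold, not topK, is the binding limit in all three configurations, which is typical when only a few chunks are strongly relevant. Raising topK alone would not add context here; lowering the threshold would.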
The most powerful pattern in RenderDraw's knowledgebase system is the multi-KB query: attaching multiple knowledgebases to a single AI block and letting the retrieval layer compose their results before the AI sees any of it.
Consider a workflow block generating a technical response section for an industrial RFP. The block might draw from: a Product Spec KB (what are the exact dimensions and ratings of this equipment?), a Pricing KB (what are our current pricing tiers for this configuration?), and a Compliance KB (what standards must this equipment meet for this type of installation?). All three are needed; none is sufficient alone.
In a multi-KB query, RenderDraw fires parallel retrieval queries against all attached KBs simultaneously. Results from each KB are tagged with their source KB identifier. The context assembly layer merges results from all KBs, re-ranks the combined set by relevance to the query, and truncates to the context window budget — ensuring the most relevant chunks from any KB make it into context, regardless of which KB they came from.
KB priority ordering matters when two KBs return chunks with similar relevance scores. The KB listed first in the block's KB list has priority — its chunks are preferred when the budget requires a tiebreak. For most configurations, list the most authoritative KB first (e.g. the Compliance KB before the Proposal Template KB for a compliance-heavy section).
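A minimal sketch of the merge, re-rank, and truncate step follows. It assumes the per-KB results have already been retrieved, treats scores as "similar" when they agree to two decimal places, and stops at the first chunk that would exceed the budget. All three choices, along with the `Hit` class and the KB names, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    kb_id: str
    chunk_id: str
    score: float
    tokens: int


def merge_and_rerank(hits_per_kb: dict[str, list[Hit]],
                     kb_priority: dict[str, int],
                     budget_tokens: int) -> list[Hit]:
    """Merge hits from all KBs, rank by score, break near-ties by KB priority, truncate."""
    merged = [hit for hits in hits_per_kb.values() for hit in hits]
    # Scores equal to two decimals count as a tie; lower priority number wins.
    merged.sort(key=lambda h: (-round(h.score, 2), kb_priority[h.kb_id]))
    context, used = [], 0
    for hit in merged:
        if used + hit.tokens > budget_tokens:
            break  # truncate at the first chunk that does not fit
        context.append(hit)
        used += hit.tokens
    return context


# Hypothetical results from three attached KBs.
hits_per_kb = {
    "compliance": [Hit("compliance", "c1", 0.82, 400), Hit("compliance", "c2", 0.81, 500)],
    "products": [Hit("products", "p1", 0.88, 600), Hit("products", "p2", 0.821, 300)],
    "proposals": [Hit("proposals", "w1", 0.70, 800)],
}
kb_priority = {"compliance": 1, "products": 2, "proposals": 3}

context = merge_and_rerank(hits_per_kb, kb_priority, budget_tokens=1500)
print([h.chunk_id for h in context])  # ['p1', 'c1', 'p2']
```

The product chunk "p1" leads because its score is clearly higher, regardless of KB order. "c1" is placed ahead of "p2" only because the two scores are nearly identical and the Compliance KB is listed first.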
AI block: "Generate Technical Compliance Response"
KB 1 (Compliance Standards): Priority 1 | topK=3 | confidence=0.80
KB 2 (Product Specs 2025): Priority 2 | topK=5 | confidence=0.75
KB 3 (Winning Proposals): Priority 3 | topK=4 | confidence=0.65
Parallel retrieval fires against all three. Results merge, re-rank, and assemble into a single context payload for the model.
For teams using the RenderDraw Workflow API or the MCP server to build automated pipelines, knowledgebase context is configured in the workflow definition JSON. The knowledgebases array in an AI block's config specifies which KBs to attach, in what priority order, and with what per-KB retrieval parameters.
{
  "block_type": "ai_generate",
  "name": "Generate Technical Section",
  "prompt_template": "rfp_technical_section_v2",
  "knowledgebases": [
    {
      "kb_id": "kb_compliance_ashrae_2025",
      "priority": 1,
      "topK": 3,
      "confidence_threshold": 0.80,
      "citation_mode": "inline"
    },
    {
      "kb_id": "kb_products_conveyor_q4",
      "priority": 2,
      "topK": 5,
      "confidence_threshold": 0.75,
      "citation_mode": "inline"
    },
    {
      "kb_id": "kb_proposals_industrial_wins",
      "priority": 3,
      "topK": 4,
      "confidence_threshold": 0.65,
      "citation_mode": "footnotes"
    }
  ],
  "context_settings": {
    "retrieval_budget_tokens": 6000,
    "generation_reserve_tokens": 3000,
    "source_compression": false,
    "fallback_behavior": "human_review_gate"
  }
}
The kb_id values are the unique identifiers assigned when knowledgebases are created (visible in the KB settings panel). A fallback_behavior of "human_review_gate" means that if none of the attached KBs returns a chunk above its confidence threshold, the workflow step fails gracefully and routes to the configured human reviewer rather than proceeding with an ungrounded AI response.
This configuration is also fully editable in the visual workflow builder — the JSON representation and the visual editor are synchronized. Teams can use either interface depending on whether they prefer visual configuration or programmatic control.
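Teams that generate workflow definitions programmatically may want to check a block before submitting it. The sketch below parses a trimmed version of the definition and applies a few sanity checks. The validation rules (unique priorities, thresholds between 0 and 1, topK of at least 1) are assumptions made for illustration, not documented RenderDraw constraints, and `validate_block` is a hypothetical helper rather than part of the Workflow API.

```python
import json

# Trimmed version of the AI block definition, with KBs deliberately out of order.
WORKFLOW_BLOCK = """
{
  "block_type": "ai_generate",
  "knowledgebases": [
    {"kb_id": "kb_products_conveyor_q4", "priority": 2, "topK": 5, "confidence_threshold": 0.75},
    {"kb_id": "kb_compliance_ashrae_2025", "priority": 1, "topK": 3, "confidence_threshold": 0.80}
  ],
  "context_settings": {"retrieval_budget_tokens": 6000, "generation_reserve_tokens": 3000}
}
"""


def validate_block(block: dict) -> list[dict]:
    """Check per-KB retrieval parameters and return the KBs in priority order."""
    kbs = block.get("knowledgebases", [])
    if not kbs:
        raise ValueError("AI block has no knowledgebases attached")
    priorities = [kb["priority"] for kb in kbs]
    if len(set(priorities)) != len(priorities):
        raise ValueError("KB priorities must be unique")
    for kb in kbs:
        if not 0.0 <= kb["confidence_threshold"] <= 1.0:
            raise ValueError(f"{kb['kb_id']}: confidence_threshold must be in [0, 1]")
        if kb["topK"] < 1:
            raise ValueError(f"{kb['kb_id']}: topK must be at least 1")
    return sorted(kbs, key=lambda kb: kb["priority"])


ordered = validate_block(json.loads(WORKFLOW_BLOCK))
print([kb["kb_id"] for kb in ordered])
# ['kb_compliance_ashrae_2025', 'kb_products_conveyor_q4']
```

Running a check like this in a CI step catches malformed definitions before they reach a production workflow.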