RAG time: one Pixonaut's adventures in retrieval-augmented generative AI
Tales from a professional development adventure
One of Pixo’s company values is “we invest,” which we manifest by encouraging our staff to pursue professional development opportunities.
This can take many forms—conferences, webinars, online courses, or just some good old-fashioned tinkering with new or emerging technologies.
This year, with artificial intelligence increasingly on our (human) minds, Pixo senior software engineer Ben Young decided to tackle building his own AI chatbot using retrieval-augmented generation, or RAG.
What is RAG?
First, some helpful definitions:
- LLM: Large language model, a broad category of AI designed to understand and generate human-like language as text.
- GPT: Generative pre-trained transformer, a subset of LLMs that uses the Transformer architecture and has been pre-trained on vast, publicly available datasets.
- RAG: Retrieval-augmented generation, an approach in which a language model doesn’t rely solely on its training data, but instead retrieves fresh, relevant information (from predetermined data sources) in real time as it fields user questions and generates answers.
The “household name” GPTs—e.g., ChatGPT, Gemini, Claude—have been trained on vast datasets, such as the entire internet (or thereabouts). As a result, they know a little bit about a great many things. But they lack the deep, specific knowledge—not to mention sensitive personal or proprietary information—needed to help individuals or organizations make highly contextual, strategic decisions.
There’s also a lag in their knowledge—GPTs were trained on a snapshot of information, often captured a year or more before you interact with them. That delay brings extra baggage and background noise to their interactions.
So, how do you cut through the cruft and build a chatbot that’s an expert purely in your own data?
Enter RAG.
There are a great many use cases for a RAG chatbot, for example:
- Interact with sensitive data that needs to be stored in a secure environment, such as medical records or financial reports.
- Glean insights or identify trends in a particular, focused, or proprietary dataset—without any bias from general knowledge or external data.
- Implement a customer-facing AI to interact with consumers on behalf of a business.
Selecting a framework
For his RAG-building project, Ben used the LangChain framework, which has a robust software library of tools for working with AI models in various ways.
Packages exist for working in both Python and JavaScript, though at the time of Ben’s project, Python seemed to be the more solid offering.
LangChain’s documentation also includes step-by-step tutorials for building RAG systems, which can be a helpful resource.
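For readers who want to follow along, here is a rough sense of the setup in Python. The package names and imports below reflect a recent split of the LangChain libraries and the illustrative component choices used in the sketches throughout this post; they are an assumption, not a record of Ben’s exact environment.

```python
# Hypothetical environment setup for a LangChain RAG experiment.
# Package names follow the post-0.1 split of the library and may vary by version:
#
#   pip install langchain langchain-community langchain-text-splitters \
#       langchain-huggingface langchain-chroma langchain-ollama
#
# The imports below are the building blocks used in the sketches that follow.
from langchain_text_splitters import RecursiveCharacterTextSplitter  # chunking
from langchain_huggingface import HuggingFaceEmbeddings              # embeddings
from langchain_chroma import Chroma                                  # vector store
from langchain_ollama import ChatOllama                              # locally served chat model
```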
Anatomy of a RAG
Once oriented to the LangChain framework, Ben began building out his RAG system by making selections from its suite of available tools:
Embedding model: Transforms text into lists of numbers (called vectors) that capture semantic similarity—that is, texts related by concept or human meaning end up close together, rather than texts related by proximity, keyword overlap, or rule-based relationships.
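Here’s a minimal sketch of what that looks like in practice. The embedding model named below is an illustrative choice (a small, freely available sentence-transformer), not necessarily the one Ben used.

```python
# A minimal embedding sketch. The embedding model here is an illustrative
# choice, not necessarily the one used in Ben's project.
import math

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Two sentences that share almost no keywords but are close in meaning...
a = embeddings.embed_query("The invoice must be paid by the end of the month.")
b = embeddings.embed_query("Payment for the bill is due before the 31st.")
# ...and one that is unrelated.
c = embeddings.embed_query("The office kitchen is out of coffee again.")

def cosine(u, v):
    """Cosine similarity between two vectors: higher means more semantically alike."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

print(cosine(a, b))  # expected to be noticeably higher...
print(cosine(a, c))  # ...than this score
```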
Splitter: Pre-processes documents by chunking them into manageable units. Splitting can occur at many different levels of granularity—for example, sentences, paragraphs, or chapters of a book. In some cases, it can even make sense to split and index the same dataset in multiple ways—sort of like providing more “inroads” to the most relevant information for any given inquiry.
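As a sketch, here is LangChain’s character-based splitter applied to a plain-text file; the chunk size, overlap, and file path are placeholders rather than Ben’s actual settings.

```python
# A chunking sketch. Chunk size, overlap, and the file path are placeholder
# values; real projects tune these (and often the splitting strategy itself).
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("source_text.txt", encoding="utf-8") as f:  # any plain-text document
    text = f.read()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # target characters per chunk (very roughly a few paragraphs)
    chunk_overlap=200,  # overlapping characters so ideas aren't cut off mid-thought
)
chunks = splitter.split_text(text)
print(f"Produced {len(chunks)} chunks; the first one starts with:\n{chunks[0][:200]}")
```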
Vector database: Stores vector embeddings of all the text chunks in a way that allows fast similarity search. (Note: at the time of this article, there are many vector database providers to choose from, such as Pinecone, Chroma, MongoDB, and Redis—we’re keeping a close eye on this evolving market.)
Retriever: Embeds an incoming question and searches the vector database for the chunks most semantically similar to it.
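Here is how those last two pieces might fit together. Chroma appears below simply because it runs locally with no extra infrastructure; it is one of the options named above, not necessarily the database Ben picked.

```python
# A sketch of indexing chunks in a vector store and retrieving by similarity.
# Chroma and the MiniLM embedding model are illustrative choices only.
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("source_text.txt", encoding="utf-8") as f:  # placeholder document
    text = f.read()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_text(text)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Embed every chunk and store the vectors for fast similarity search.
vectorstore = Chroma.from_texts(texts=chunks, embedding=embeddings)

# The retriever wraps the search step: embed the query, return the k closest chunks.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
for doc in retriever.invoke("What does the document say about deadlines?"):
    print(doc.page_content[:120], "...")
```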
Cue the RAG chain
With all the components established, Ben was ready to initiate the RAG chain, i.e., the orchestrated sequence of steps by which the RAG operates to answer user inquiries (sketched in code after the list):
- Receive user’s question (via a chat interface)
- Embed the question (per the chosen embedding model)
- Retrieve the most relevant chunks of information from the vector database (i.e., the chunks with the highest semantic similarity “scores” based on their vectors)
- Format the final prompt, combining the retrieved information (context) with the user’s question
- Run the selected LLM to generate an answer
- Return the final output to the user in the form of conversational, human-like text
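Wired up with LangChain’s expression language (LCEL), the whole sequence can look roughly like the sketch below. The component choices here (a local MiniLM embedding model, Chroma, and an Ollama-served chat model) are assumptions for illustration, and the wiring follows LangChain’s standard RAG tutorial pattern rather than Ben’s exact code.

```python
# An end-to-end sketch of a RAG chain using LangChain's LCEL composition
# syntax. All component choices are illustrative assumptions; the file path
# and the question are placeholders.
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Index the reference material: split, embed, store (see the earlier sketches).
with open("source_text.txt", encoding="utf-8") as f:
    text = f.read()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_text(text)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
retriever = Chroma.from_texts(chunks, embedding=embeddings).as_retriever(search_kwargs={"k": 4})

# The final prompt combines retrieved context with the user's question.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# Assumes a local Ollama server with this model pulled; any chat model works here.
llm = ChatOllama(model="qwen3:8b")

def format_docs(docs):
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    # Steps 1-3: take the question, embed it, and retrieve the most similar chunks
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt              # Step 4: format the final prompt (context + question)
    | llm                 # Step 5: run the LLM to generate an answer
    | StrOutputParser()   # Step 6: return plain, conversational text
)

print(rag_chain.invoke("What is the main point of this document?"))
```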
A whale of a corpus
Because he is a true Renaissance man (and because it was freely available from Project Gutenberg), Ben chose Herman Melville’s “Moby Dick” to be his RAG’s reference material.
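For anyone who wants to try this at home, grabbing the text is straightforward. Moby Dick is Project Gutenberg ebook #2701; the plain-text URL below follows Gutenberg’s usual pattern and may change over time, so treat it as a detail to verify.

```python
# A quick sketch of fetching the corpus from Project Gutenberg.
# The URL follows Gutenberg's usual cache pattern for ebook #2701 and may change.
import urllib.request

URL = "https://www.gutenberg.org/cache/epub/2701/pg2701.txt"
with urllib.request.urlopen(URL) as response:
    text = response.read().decode("utf-8")

# Gutenberg files include license headers and footers worth trimming before
# indexing; here we simply save the raw text for the splitter to consume.
with open("moby_dick.txt", "w", encoding="utf-8") as f:
    f.write(text)

print(f"Downloaded {len(text):,} characters of whale.")
```

In the earlier sketches, this file would stand in for the placeholder source text.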
Applying RAG to a fictional literary work—as opposed to a factual dataset—introduces another layer of interesting questions to ponder:
- Consider, for example, that Moby Dick doesn’t contain “a chapter” or “section” on Captain Ahab. Information relevant to questions about his character is not consolidated or adjacent. This illustrates the significance of vectorization: RAG requires drawing semantic connections among chunks of information distributed throughout the text.
- Novels often jump around timelines in their storytelling. Some chapters flash back while others flash forward. In many cases, arranging a novel’s events in a linear chronological timeline could dull its dramatic effect, inject unforgivable spoilers, or completely change the meaning and impact of the story.
- Literature is rife with allegory, symbolism, and other abstractions. Non-literal interpretation is often crucial to human understanding and appreciation of fiction.
Putting it to the test
So, how did Ben’s scratch-built RAG fare in its (ahem) maiden voyage? And perhaps more crucially: did it actually outperform an “off-the-shelf” LLM*?
To find this out, Ben administered a 70-question SparkNotes quiz on Moby Dick to the same LLM—with and without RAG support:
- LLM on its own: 41 correct answers out of 70, or about 59%.
- With Ben’s RAG: 51 correct answers out of 70, or about 73%.
Success—the RAG prevails!
And there’s likely room for improvement:
“This RAG setup is very simple,” Ben notes. “If we spent more time experimenting with other methods for splitting the book—or other embedding models—we would likely be able to get even better results…” then he trails off, with a mad gleam in his eye…
Just kidding. The prospect of achieving a perfect 70/70 score on the Moby Dick quiz won’t become an obsessive, all-consuming quest—just a source of inspiration for future RAG ventures, and, more importantly, a solid foundation for building custom AI solutions for our clients.
(*Editor’s note: Ben used Qwen3-8B, which is relatively small for a “large” language model. It has 8.2 billion parameters and a context length of ~32,000 tokens, compared to proprietary, state-of-the-art models that are estimated to have hundreds of billions of parameters or more. Since ChatGPT and similar models could probably ace the quiz on their own, selecting a more rudimentary LLM as the base helped produce a more meaningful comparative analysis in the context of this project.)