RAGs are the new frontier

An investment perspective

Aug 10, 2023

Chat with PDF

As an angel investor I think a lot about AI startups. The current hype has predictably produced masses of them. Some are silly, artificial or even fraudulent. Others seem to have some kind of useful product - but with no moat, so if they succeed they will be copied by Microsoft, OpenAI or other big players and crushed in the market without even the prospect of an acquisition. But there is one well-defined startup idea big enough to sustain a few great startups: integrating big information collections into smart knowledge bases that can be queried in natural language.

In the OpenAI chat plugin store there are currently 26 PDF-related plugins. On the web one can find countless other "chat with your document" apps. I would not invest in an app like that - they have no moat - but they are clearly useful. It is useful to have someone or something summarize or extract information from a document. It would be even more useful if that someone or something could extract information not only from one document but from whole collections of digitized information. Unfortunately you cannot load whole information collections into the LLM prompt; even the 100K-token limit of Claude-2 is, in many cases, orders of magnitude too small. This is why we need an LLM integrated with a search system - we need Retrieval Augmented Generation (RAG) systems.

By the way - as I explained previously - finetuning an LLM on the additional data will not work for that use case. There are some examples of LLMs trained on proprietary data - like BloombergGPT - but first, it is prohibitively expensive, especially if you need to update the model frequently with current data, and second, even the biggest LLMs cannot store all data. It might never be possible to cleanly separate the reasoning part from the information storage (as I postulated in my previous post) - but LLMs alone will never be enough. We really need to connect external information sources to our reasoning systems.

It is common to use embeddings and vector databases in RAG - but it is important to understand that what is needed is a search system and there are many other technologies suitable there.
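
To illustrate that retrieval does not have to mean embeddings, here is a minimal sketch of a keyword-based retriever feeding a prompt. Everything in it is an assumption made for illustration: the function names `tokenize`, `rank` and `build_prompt`, the simple TF-IDF-style scoring, and the prompt wording. A real system would more likely sit on top of an existing search engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents, top_k=2):
    """Rank documents by a simple TF-IDF-style overlap with the query.

    Hypothetical scoring, chosen for brevity: term frequency in the
    document times a smoothed inverse document frequency.
    """
    doc_counts = [Counter(tokenize(d)) for d in documents]
    n = len(documents)
    scored = []
    for index, counts in enumerate(doc_counts):
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in doc_counts if term in c)
            if df:
                score += counts[term] * math.log(1 + n / df)
        scored.append((score, index))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [documents[i] for score, i in scored[:top_k] if score > 0]


def build_prompt(question, documents, top_k=2):
    """Put the retrieved texts into the prompt that would be sent to an LLM."""
    context = "\n".join(f"- {d}" for d in rank(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The point of the sketch is only the division of work: the search step decides what goes into the prompt, and it can be swapped for any other search technology without touching the rest.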

Division of labour between LLM and search engine

The simplest RAG systems use the search engine to select the relevant texts and load them into the prompt. But that selection is not a trivial task, and we can use the LLMs themselves in it to make it more robust. For example, the system could imitate the way humans use search engines and make the LLM page through article titles to select those that seem relevant (and then check them with a separate prompt). ReAct: Synergizing Reasoning and Acting in Language Models describes how you can make the LLM control the search and other tools (and if you want a non-LangChain example, which I hate too, see: A simple Python implementation of the ReAct pattern for LLMs). The trade-off here is between the generality of the LLM and the efficiency of the search engine; there will never be one solution optimal for everything. With cheaper and cheaper computing, the more general solution will keep gaining ground, but optimizations will always stay important.
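
A rough sketch of such a control loop, in the spirit of the ReAct pattern, might look like the code below. It is not the implementation from the paper or from the linked post: the `Action: tool[argument]` and `Answer:` text format, the names `react_loop`, `scripted_llm` and `search`, and the tiny in-memory "search engine" are all assumptions, and the model is replaced by a scripted stand-in so that the example runs without any API.

```python
import re


def react_loop(question, llm, tools, max_steps=5):
    """Alternate between model output and tool calls until the model answers."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        answer = re.search(r"Answer:\s*(.+)", reply)
        if answer:
            return answer.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if action:
            name, argument = action.groups()
            tool = tools.get(name)
            observation = tool(argument) if tool else f"unknown tool {name}"
            # Feed the tool result back so the next model call can use it.
            transcript += f"Observation: {observation}\n"
    return None  # no answer within the step budget


def scripted_llm(transcript):
    """Stand-in for a real model: search first, then answer from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[return policy]"
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have what I need.\nAnswer: {observation}"


def search(query):
    """Hypothetical search tool backed by a tiny in-memory index."""
    notes = {"return policy": "Returns are accepted within 30 days."}
    return notes.get(query, "no results")


result = react_loop("What is the return policy?", scripted_llm, {"search": search})
```

With a real model in place of `scripted_llm`, the loop stays the same; what changes is how many steps the model needs and how much each step costs, which is exactly the generality versus efficiency trade-off.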

Some optimization/generalization axes:

Making the final selection in the search engine or using the LLM to page through the article titles, choose some, check their content and pass it to the main thread or go on checking some more articles - just like a human would.
Selecting only the latest versions of a document in the search engine or returning articles together with their version numbers and other metadata to the LLM to decide what to do.
Using a universal search engine that integrates many sources, or letting the LLM choose among specialized search engines (again imitating humans).
Universal search engines are useful on their own - but AI will bring even more utility to them and push for more development in this area.
Optimizing texts for LLM reasoning or using the content meant for human consumption. Language models are meant to understand the same texts as humans - but that does not mean these texts are optimal for them. Maybe they need shorter texts to fit into the prompts, maybe fewer repetitions because they tend to remember everything from the prompt, or maybe sometimes more repetitions - who knows how future LLMs will work?
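
The first two axes above can be sketched in a few lines. This is only an illustration under assumed names: the `Hit` record, `latest_only` and `titles_page` are hypothetical, and a real search engine would return much richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result; the fields are assumed for this example."""
    doc_id: str
    version: int
    title: str


def latest_only(hits):
    """Search-side option: keep only the highest version of each document."""
    best = {}
    for hit in hits:
        if hit.doc_id not in best or hit.version > best[hit.doc_id].version:
            best[hit.doc_id] = hit
    return list(best.values())


def titles_page(hits, page, page_size=2):
    """LLM-side option: render one page of titles with their metadata,
    so that the model can decide which articles to open."""
    start = page * page_size
    return "\n".join(
        f"[{h.doc_id} v{h.version}] {h.title}"
        for h in hits[start:start + page_size]
    )
```

`latest_only` is the cheap, optimized choice made inside the search layer; `titles_page` hands the same decision to the model, at the cost of more tokens and more calls.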

Categories of RAG systems

The business environment is constantly evolving: niches appear, grow, then disappear. Initially, many business niches for smart knowledge bases will be naturally dominated by existing companies that own or have privileged access to the relevant data - like Microsoft and Google managing our emails and other documents. But there is a lot of value in systems integrating various sources of information - so that they would, for example, understand scientific terms used in personal emails - and eventually we’ll see completely new system architectures and business models.

To start with, I expect the following broad RAG categories:

Personal - systems that will let us chat not with just one PDF but extract, summarize and integrate information from all the documents collected on our computers, cloud storage, emails, browsing history, etc.
Public - for exploring libraries of verified, public information - like academic textbooks, peer-reviewed articles, etc. (existing example: WikiChat: A Few-Shot LLM-Based Chatbot Grounded with Wikipedia). Or just public information - like Bing Chat or GPT with a browser plugin. I am waiting with excitement for a system that would let a motivated lay person like me quickly extract information from the cutting edge of science.
Enterprise - for all digitized information stored on company servers.

Some things an enterprise RAG system could do:

Handle natural language queries from users, such as “What is the status of my order?” or “How do I reset my password?”
Understand the user’s intent and extract relevant information from the query, such as order number, product name, email address, etc.
Access multiple data sources across the organization, such as databases, documents, emails, etc., and retrieve relevant results based on the user’s query and intent.
Handle complex and contextual queries from users, such as “show me the latest sales report for Q3 in Europe” or “who is the best performing employee in my team this month?”
Answer questions about a product by extracting information from user manuals, such as “what cable do I need to connect a particular power bank to my phone?”
Engage in multi-turn dialogues with the user, such as asking for clarification, providing suggestions, confirming actions, etc.

There are also some specialized categories, like code assistants, that could be used in both personal and enterprise environments.

With time I expect more integrations between these tools - like an enterprise assistant that also uses public information sources.

Enterprise RAG systems are like ERPs

Bing generated for me the following list of similarities between LLMs and ERPs:

Both are software systems that integrate various functions and data sources across the organization, such as finance, HR, manufacturing, supply chain, etc.
Both can provide automation, intelligence, and insight to support decision making and improve performance.
Both can enable collaboration and communication among internal and external stakeholders, such as employees, customers, suppliers, partners, etc.
Both can be deployed on-premise or in the cloud, depending on the organization’s needs and preferences.

I think this is spot on and it means three things. First, the trajectory of their adoption will be similar. We are now at the pre-SAP stage: many companies try to do it by themselves and there are many new tools, but nothing dominates the landscape yet. Second, there will be a lot of money to be earned by consulting companies implementing these solutions - configuring, migrating data, integrating with existing software, writing custom extensions, etc. This market will be orders of magnitude bigger than the market for the base software. Third, the current wave of more specialized tools - like customer support assistants, HR assistants, etc. - will be replaced by more generic tools, just as happened with ERP adoption. With some exceptions - code assistants will stay for sure :)

Examples of existing (more generic) offerings that use RAG, or that I suspect use RAG:

Links: https://www.pinecone.io/learn/options-for-solving-hallucinations-in-generative-ai/

A guest post by

Bozena

AI Adventures: A Programmer’s Journey
