Design principles for LLM-based systems
LLMs as the driving force
This is not only about RAG systems, but RAG makes a good example. When developing a RAG system, we have options: enhance the retrieval engine with advanced semantic search capabilities, enable the LLM to intelligently use existing search tools, or combine both approaches. Currently, much effort focuses on crafting an optimal retrieval engine that provides comprehensive background information from which the LLM can draw conclusions. An alternative is to let the LLM itself conduct searches iteratively, deciding what additional information is necessary and how best to find it. This mirrors the human approach in existing environments.

The main advantage of this second strategy is its ability to seamlessly integrate multiple information sources. It can also make effective use of existing search systems, eliminating the need to upload all company data into a vector database before an LLM-based bot can use that information to answer queries.
Challenges
LLMs process text differently from humans. For instance, simply connecting an LLM to a Google search might result in the LLM receiving the entire text of a web page, which is problematic if the page exceeds the LLM's context length. In contrast, when humans search for information, they typically don't read the entire page at once. They might start by reading the beginning to ascertain relevance, then jump to specific sections, skim the article, or search for particular words. These strategies can be emulated by an LLM (though skimming could pose a challenge, it is likely achievable). Search results often include titles and summaries (or 'snippets'), which LLMs could utilize similarly to humans.
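These reading strategies can be exposed to the LLM as small tools. Here is a minimal sketch (the function names and chunk size are illustrative, not from any particular library) of three human-like moves over a page that may exceed the context window: read the beginning, jump to sentences containing a keyword, or take the page piece by piece:

```python
def first_paragraph(page: str) -> str:
    # Read just the opening of the page to judge relevance,
    # as a human would before committing to the whole article.
    return page.split("\n\n", 1)[0]

def lookup(page: str, keyword: str) -> list[str]:
    # Jump to the sentences that mention a particular word,
    # like Ctrl-F for the LLM.
    sentences = [s.strip() for s in page.replace("\n", " ").split(".")]
    return [s for s in sentences if keyword.lower() in s.lower()]

def chunk(page: str, size: int = 2000) -> list[str]:
    # Split a page that exceeds the context length into pieces
    # the LLM can request one at a time.
    return [page[i:i + size] for i in range(0, len(page), size)]
```

Each tool returns a small amount of text, so the LLM decides what to read next instead of being flooded with the whole page.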
Case Study: Practical Application
Consider this example of an LLM using such an approach, detailed at https://andyz245.github.io/LanguageAgentTreeSearch/:
2023-10-11 19:52:06,131 - INFO - PROMPT:
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action can be three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
After each observation, provide the next Thought and next Action. Here are some examples:
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas. This took place from 1780 to 1650 million years ago (Mya), during the Paleoproterozoic (Statherian Period). It is recorded in the Colorado orogen, a >500-km-wide belt of oceanic arc rock that extends southward into New Mexico. The Colorado orogeny was likely part of the larger Yavapai orogeny.
Thought 2: It does not mention the eastern sector of the Colorado orogeny. I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The Colorado orogen, formerly called the Colorado province, is a >500-km-wide belt of oceanic arc rock (1.78–1.65 Ga) that extends southward into New Mexico and composes a major part of the Proterozoic provinces of southwestern United States. This transcontinental collisional event occurred during the Paleoproterozoic (Statherian Period).[1] The Wyoming sector of the Colorado orogeny was formerly called the Medicine Bow orogeny. The eastern sector extends into the High Plains and is called the Central Plains orogeny. The boundary between the Colorado orogeny and the Wyoming craton is the Cheyenne belt, a 5-km-wide mylonitic shear zone that verges northward. The Cheyenne belt transects and cuts off the south edge of the older Trans-Hudson orogeny.[2]
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains, so High Plains is the area. I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains, mainly in the Western United States, but also partly in the Midwest states of Nebraska, Kansas, and South Dakota, generally encompassing the western part of the Great Plains before the region reaches the Rocky Mountains. The High Plains are located in eastern Montana, southeastern Wyoming, southwestern South Dakota, western Nebraska, eastern Colorado, western Kansas, eastern New Mexico, western Oklahoma, and to just south of the Texas Panhandle.[4] The southern region of the Western High Plains ecology region contains the geological formation known as Llano Estacado which can be seen from a short distance or on satellite maps.[5] From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]
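The harness driving such a transcript can be quite small. Below is a sketch of the loop, under the assumption that `llm` is any callable mapping the transcript so far to the next Thought/Action turn, and `tools` maps "Search" and "Lookup" to real implementations (e.g. Wikipedia calls); both are left abstract here, and this is not the code behind the linked example:

```python
import re

def parse_action(reply: str) -> tuple[str, str]:
    # Pull the tool name and argument out of the model's reply,
    # e.g. "Action 3: Search[High Plains]" -> ("Search", "High Plains").
    m = re.search(r"(Search|Lookup|Finish)\[(.*?)\]", reply)
    if m is None:
        raise ValueError(f"no action found in reply: {reply!r}")
    return m.group(1), m.group(2)

def agent_loop(llm, tools, question: str, max_steps: int = 8):
    transcript = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        reply = llm(transcript)            # model emits Thought + Action
        transcript += reply + "\n"
        action, arg = parse_action(reply)
        if action == "Finish":
            return arg                     # final answer ends the loop
        observation = tools[action](arg)   # run Search or Lookup
        transcript += f"Observation {step}: {observation}\n"
    return None                            # gave up within the step budget
```

The loop appends each observation back into the transcript, so the model sees the full Thought/Action/Observation history on every turn.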
The Role of Semantic Search
Semantic search, based on embeddings and vector databases, might outperform traditional keyword-based searches. If it proves more effective at locating pertinent information, it would also be advantageous for human users. In cases where it is sometimes more effective and sometimes less so, an LLM should be able to choose between semantic and keyword-based searches, just as a human would. An LLM could even simulate a semantic search through a keyword search by leveraging its semantic understanding to generate a comprehensive list of keywords, including all synonyms of the terms from the original query.
Developing an Answer Bot
I have personally developed a simple answer bot according to that plan, available at https://github.com/zby/answerbot. Though still a work in progress, with GPT-4, it can already address most reasonable questions that require information from Wikipedia.
Prompts and other parameters should be easy to change
Testing and experimentation play a pivotal role in the development of LLM-based systems. Given the nuanced nature of these systems, a significant amount of trial and error is often required to ascertain the optimal parameters for specific tasks. This process of 'fuzzy fitting' involves rigorously testing various configurations and prompt styles to understand their impact on the system's performance. Such iterative experimentation not only helps in fine-tuning the parameters for better results but also provides valuable insights into the dynamic behavior of LLMs under different conditions.
How practical it is to write LLM independent code?
Considering the rapid evolution and improvement of models like GPT-x, Claude, Bard, and Llama, the ability to swiftly transition between them is essential. This scenario is somewhat akin to how databases function: while there are few end-user systems that can switch underlying RDBMS, libraries often need to remain database-independent.
Don't require too much structure from the LLM answers
When interacting with an LLM, it's advisable not to demand excessive structure in its responses. For example, while an LLM can number lines or actions, this approach might be brittle and subject to change with different versions. Such tasks are more reliably handled on the programming side, as seen in the previous example where the numbering seemed superfluous.