What are LLMs?
Odin's Chat is powered by a large language model (LLM). In chat, you can use the LLM to explore topics and learn new things. For practical instructions, see the article How to use Chat?.
In this article, we explain what LLMs are and how to get the best results from them.
Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand and generate human language. They are built using deep learning techniques, particularly neural networks with billions of parameters (hence "large"). These models are trained on vast amounts of text data to learn the patterns, structures, and nuances of language, enabling them to perform a variety of natural language processing tasks.
Data Collection
LLMs are trained on massive datasets comprising text from books, articles, patents, websites, and various other sources. The aim is to expose the model to a diverse range of language patterns, topics, and styles.
Training Process
The model learns to predict the next word in a sequence given the words that came before it, a process known as autoregressive training. During training, the next word in a passage is hidden, the model predicts which word belongs in that spot, and its parameters are adjusted based on how close its prediction was to the real text. This process helps the LLM build a 'model of the world': through training, it picks up knowledge about every topic in its training data. After training, it can be used as a kind of tireless librarian that helps you uncover information.
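To make the idea of next-word prediction concrete, here is a minimal sketch. It uses a tiny lookup table of word counts instead of a neural network with billions of parameters, and a toy two-sentence corpus standing in for the vast training data, but the objective is the same: given the previous word, predict the most likely next one.

```python
from collections import Counter, defaultdict

# Toy corpus: a hypothetical stand-in for an LLM's vast training data.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word follows each word. (A real LLM encodes these
# patterns in neural network weights, not an explicit table, and
# conditions on many previous words, not just one.)
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in training."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("sat"))  # 'on' — both sentences continue "sat" with "on"
```

The crucial point is that the model only ever learns "what word tends to come next"; everything an LLM appears to know emerges from doing this prediction task extremely well at enormous scale.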
LLMs are useful for various tasks, including:
- Question Answering: Providing answers to user queries based on given context.
- Natural Language Understanding: Text classification, sentiment analysis, entity recognition.
- Text Generation: Writing assistance, automated content creation, chatbots.
- Translation: Converting text from one language to another.
- Summarization: Condensing long documents into concise summaries.
Hallucinations and misinformation
LLMs can generate plausible but incorrect or misleading information. They lack the ability to verify facts and may produce false statements with confidence.
To understand why, consider what an LLM fundamentally is: a system that predicts the most likely next word given a prior sequence of words. It can predict an incorrect word as the most probable one because an LLM is a compressed model of the world, similar to how an MP3 file is a compressed version of raw audio. It remembers the key parts of the information it was trained on, but not every single detail. In other words, it finds the common thread running through its training data, remembers that, and uses it to predict the next words. The more specific the question, therefore, the more likely the model is to hallucinate. For example, asking it to cite the specific journal a particular piece of information came from will likely produce a hallucination, whereas asking which academic topics have been most actively explored in neuroscience over the past 10 years will likely produce an accurate answer.
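The mechanism behind a confident-sounding hallucination can be sketched in a few lines. The probabilities below are invented for illustration: the model assigns some probability to each candidate next word and emits the most likely one, with no step anywhere that checks whether that word is actually true.

```python
# Hypothetical next-word probabilities after the prompt
# "This result was first published in the journal ..."
# (made-up numbers; real models score tens of thousands of candidates)
next_word_probs = {
    "Nature": 0.30,
    "Science": 0.28,
    "NeurIPS": 0.22,
    "Cell": 0.20,
}

# The model emits the highest-probability word. If the true source was
# an obscure journal the model compressed away, the answer is a fluent,
# confident hallucination.
prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # 'Nature'
```

Note that the output looks exactly the same whether the model is right or wrong, which is why hallucinations are hard to spot without checking.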
If you do use an LLM to answer very specific questions, verify its answers, or first limit it to a specific piece of text to answer from. For example, in our chat-with-patent feature, the LLM is restricted to the exact text of the patent. In this scenario, hallucinations are very unlikely, because the model answers your question from a given piece of text rather than from its own compressed memory.
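The idea of restricting a model to a given text can be sketched as a prompt-construction helper. This is a generic illustration of the technique, not the actual prompt used by Odin's Chat, and the patent sentence in the usage example is invented.

```python
def build_grounded_prompt(source_text: str, question: str) -> str:
    """Build a prompt that instructs the model to answer only from
    the supplied text (a sketch of grounding, not Odin's actual prompt)."""
    return (
        "Answer the question using ONLY the text below. "
        "If the answer is not in the text, say you cannot find it.\n\n"
        f"Text:\n{source_text}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical usage with an invented patent excerpt:
prompt = build_grounded_prompt(
    "The valve of claim 1 is made of titanium.",
    "What material is the valve made of?",
)
print(prompt)
```

Because the relevant facts are placed directly in the prompt, the model can copy the answer from the text instead of reconstructing it from its compressed training data, which is what makes grounded answers so much more reliable.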