During the last two weeks I discovered that the current best use of LLMs is to act as a better Zelda guide. I also read the articles below, and you should, too.
As always, if you find anything interesting, please send it along.
Grounding LLMs With Your Data
Knowledge Retrieval Architecture for LLM’s (2023)
When we were building Algolia Answers back in 2020, one of the biggest challenges for us was connecting the source of data (our customers’ search indices) with the model (GPT-3 Ada, chosen for its speed). I’d say that’s where we spent about 75% of our time.
There were two main things we had to solve for: getting enough documents to the model so it could identify the best one, and getting enough of each document’s body so the model could understand the content. It was a tight balancing act to keep costs and latency down while still returning value to our customers.
Algolia Answers was ultimately shuttered and replaced by a new product, but the challenge remains the same for LLM tasks. In this article, Matt Boegner summarizes the literature up to January 2023 and surveys ways to architect the necessary solutions. The two primary approaches are data-as-context and fine-tuning. Fine-tuning isn’t practical for most teams, so data-as-context gets the closer look.
The architecture that he outlines is called “retrieval-augmented generation.” It’s about what it sounds like: retrieve relevant documents and then use them in the prompt. It’s interesting to see that the architecture hasn’t generally changed in the past three years, although that itself might be changing, both in minor and major ways.
The minor ways are further refinements to this architecture, such as one Boegner includes called Generate-then-Read. It’s an ingenious approach that works first inward and then back outward, and it solves the problem created by “poorly formed” user queries.
It first generates a document from the search query. That document might contain hallucinations, but that’s okay, because it’s only used to create an embedding. The embedding is then run against the search index to find similar documents (which don’t contain false information), and those documents are passed in the prompt to the LLM.
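To make the flow concrete, here’s a minimal sketch of that loop. It’s my own illustration, not Boegner’s code, and it assumes the pre-1.0 openai Python SDK plus an in-memory list of pre-embedded documents standing in for the search index:

# Generate-then-Read sketch: hallucinate a document, embed it, retrieve real documents, then answer.
# Assumes OPENAI_API_KEY is set and `index` is a list of (text, embedding) pairs built ahead of time.
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def generate(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

def generate_then_read(query: str, index: list[tuple[str, np.ndarray]], k: int = 3) -> str:
    # 1. Generate a hypothetical document from the query; hallucinations are fine here.
    hypothetical = generate(f"Write a short passage that answers: {query}")
    # 2. Embed the hypothetical document rather than the raw query.
    q = embed(hypothetical)
    # 3. Retrieve the most similar real documents by cosine similarity.
    def score(doc_vec: np.ndarray) -> float:
        return float(np.dot(q, doc_vec) / (np.linalg.norm(q) * np.linalg.norm(doc_vec)))
    top = sorted(index, key=lambda pair: score(pair[1]), reverse=True)[:k]
    context = "\n\n".join(text for text, _ in top)
    # 4. Answer the original query, grounded in the retrieved (real) documents.
    return generate(f"Answer the question using only this context:\n\n{context}\n\nQuestion: {query}")

The only change from a plain retrieval-augmented setup is step 1: the query is expanded into a document before it ever touches the index.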
The major way this whole approach might change is through increased context windows. OpenAI keeps expanding the context windows of its models, and Anthropic’s Claude has a 100K-token context window. These allow much more retrieved context to be included in the prompt. They won’t work for every need, though, as the added latency and cost can be too much for certain applications.
We should expect to see continued work in this domain, and Boegner’s article is a good place to familiarize yourself.
OpenAI Introduces Functions
Function calling and other API updates
OpenAI recently announced that two of their models can now return a JSON object to pass as arguments to a function you describe. For example, you can describe a get_weather function that takes the city and the date as arguments. For a request such as “what’s the weather in Boston tomorrow?” the model would return the following object:
{"city": "Boston, MA", "date": "23-06-2023"}
The incredible thing to me here isn’t the function itself but how it compares to entity-detection NLU tools like Dialogflow or Rasa. With those, you not only have to define dozens of utterances, you also have to define where the variable values (i.e., entities) might appear within each utterance and what values each entity might take.
And yet, with these two models, you just describe the function and that’s all there is to it.
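For a sense of just how little is involved, here’s a sketch of describing that get_weather function with the new functions parameter. It uses the pre-1.0 openai Python SDK and the 0613 model snapshot from the announcement; treat the details as illustrative rather than authoritative:

# Describe get_weather as a JSON Schema; the model decides when to "call" it.
import openai

functions = [
    {
        "name": "get_weather",
        "description": "Get the weather forecast for a city on a given date",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City and state, e.g. Boston, MA"},
                "date": {"type": "string", "description": "The date of the forecast"},
            },
            "required": ["city", "date"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "what's the weather in Boston tomorrow?"}],
    functions=functions,
)

# When the model chooses the function, the arguments come back as a JSON string to parse and pass along.
call = response["choices"][0]["message"].get("function_call")
if call:
    print(call["name"], call["arguments"])

No utterances, no entity annotations: just a name, a description, and a schema.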
What a time to be alive.
Dan Shipper, btw, thinks that this is a really big deal. His analysis is sound, although I think it’s this part that resonates the most:
This raises big questions for any company building at the infrastructure layer: Anything you’ve created could, in very short order, be eaten or obviated by a new OpenAI feature release. So you have two options: try to build things that OpenAI won’t, or keep racing further and further ahead to continually implement new ideas in the window of time before OpenAI does.
Anthropic Leans into Security
Anthropic recently announced their security hub, complete with their SOC 2 and HIPAA certifications. At a time when LLMs are entering government agencies and countries are banning their use, security and privacy can be a real differentiator.
MIT Students Debunk GPT-4’s MIT Math Prowess
In this article, three MIT seniors examine a recent paper that showed GPT-4 “acing” MIT math tests. They found serious issues with the paper, including finding that some of the questions that GPT-4 answered “correctly” were impossible to answer from the context given.
Stack Overflow’s Experiments with Generative AI
This is an interesting peek into a company’s experiments with bringing generative AI into its product, all the more so because Stack Overflow is pretty much exclusively a content company. Notably, the experiments they run are ones that could be done with purpose-built models (e.g., grouping common questions in a company chat) but are much easier to do with LLMs.