Book Review: AI-Powered Search
I started working in search about a decade ago, when AI was still on the margins. There was research and some implementations, but when we would speak with prospects, none of them asked about what we were doing with AI.
Voice search was when prospects first started asking about AI more frequently, and features like popularity-based re-ranking became important. However, even then you could still have a decent impact with "semi-intelligent" features that leveraged statistical models.
The last few years, though, customers have required ML-driven AI. Sometimes this is simple trend chasing, but there are also plenty of improvements that can come from these new advancements, like vector-based semantic search, LTR, recommendations, and more.
AI-Powered Search (Manning Publications, in pre-release) is a book written by Trey Grainger, Max Irwin, and Doug Turnbull examining these advancements and how to implement them. Within each chapter, the authors move between different topics in a self-contained manner, meaning that you can use the book as an end-to-end tutorial for getting up to speed, but you can also use it as a desk reference, dipping in when you have the need. While the authors state that the book is primarily for a technical audience, it introduces topics well enough that someone who wants to skip the code will benefit as well.
The book starts off with a broad overview, examining the new landscape in search. The most important concept to note is Reflected Intelligence, or AI that "[leverages] feedback loops to continually learn from both content and user interactions."
This is all about getting away from the manual work of synonyms or merchandising through rules, and letting the AI do the drudgery. There will always be room for a human touch, but when I speak with merchandisers, they are often spending many hours a week hand-crafting the perfect search results without AI, when AI could free them up to do higher-impact work.
The authors claim that in 2024 users are expecting search to be "domain-aware, contextual and personalized, conversational, multi-modal, intelligent, and assistive."
I think they get ahead of themselves on conversational, multi-modal, and assistive. There's still a lot of subpar "traditional" search out there that needs fixing first, and additions like personalization or advanced ranking through learning to rank have a clearer ROI. Assistive technologies, or those that summarize or provide actions proactively, are exciting when they work, but your average e-commerce search does not need this. Nor does it need to search text, images, videos, and voice together. Most really need just text and voice, and often just text, since voice will run through speech-to-text first anyway.
Next in the book is a look at natural language processing, or NLP. Natural language processing is exactly what it sounds like—processing natural language. This is important, because, as the authors point out, while text seems like it's unstructured, it's actually incredibly structured, with nouns, verbs, adjectives, direct and indirect objects, and more. Through "[knowing] a word by the company it keeps," we can understand when words are used in similar contexts, and, thus, are semantically similar. This allows search engines to handle synonyms or alternatives, or we can use more advanced approaches to handle context or understand ambiguous terms.
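As a toy illustration of that distributional idea (mine, not an example from the book), here's a sketch that compares made-up word vectors with cosine similarity; in practice the vectors would come from a trained embedding model rather than being hand-written.

```python
# A minimal sketch of "knowing a word by the company it keeps": words that
# appear in similar contexts end up with similar embedding vectors, so a high
# cosine similarity suggests a candidate synonym or alternative.
# The 4-dimensional vectors below are invented for illustration only.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = {
    "sofa":  np.array([0.9, 0.1, 0.3, 0.0]),
    "couch": np.array([0.8, 0.2, 0.4, 0.1]),
    "stove": np.array([0.1, 0.9, 0.0, 0.5]),
}

print(cosine_similarity(vectors["sofa"], vectors["couch"]))  # ~0.98: similar company
print(cosine_similarity(vectors["sofa"], vectors["stove"]))  # ~0.18: different company
```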
Of course, no understanding is all that useful if you have too many results and they are poorly ranked. That's where signal boosting and learning to rank come into play. Signal boosting essentially takes the popularity of items and boosts the most popular ones up. Of course, it's not as simple or as straightforward as a single boost—nothing is.
First, you need to decide which signals to use, and how much to weight them. Generally, you want those that signal a higher intent to have a higher weight, and few things can have a higher intent signal than actually putting down money and purchasing a product. So, it should be weighted accordingly, and the authors recommend a 25x weighting for a purchase compared to a search result click. You can also add negative weights in certain cases, such as a product return or clicking a "dislike" button on a video.
Second, you'll likely want to ensure that you build some type of seasonality into the system. Imagine that you've got an e-commerce shop selling clothing. The clicks and purchases that happened most recently should have the biggest impact, because those are the clothes most likely to be in-season.
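To make those two points concrete, here's a rough sketch (my own, not code from the book) of aggregating weighted signals with an exponential recency decay. The 25x purchase weight follows the authors' recommendation; the click and return weights and the 30-day half-life are assumptions you would tune against your own data.

```python
# Weighted signal aggregation with recency decay for signal boosting.
import math
from datetime import datetime, timedelta, timezone

SIGNAL_WEIGHTS = {
    "click": 1.0,
    "purchase": 25.0,   # the authors suggest ~25x the weight of a result click
    "return": -25.0,    # assumption: a return cancels out the purchase signal
}
HALF_LIFE_DAYS = 30.0   # assumption: how quickly "in season" fades

def boost_scores(events: list[dict], now: datetime) -> dict[str, float]:
    """Aggregate per-document boost scores, discounting older events."""
    scores: dict[str, float] = {}
    for event in events:
        weight = SIGNAL_WEIGHTS.get(event["type"], 0.0)
        age_days = (now - event["time"]).total_seconds() / 86400
        weight *= math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
        scores[event["doc_id"]] = scores.get(event["doc_id"], 0.0) + weight
    return scores

now = datetime.now(timezone.utc)
events = [
    {"doc_id": "sku-1", "type": "purchase", "time": now - timedelta(days=2)},
    {"doc_id": "sku-2", "type": "purchase", "time": now - timedelta(days=60)},
    {"doc_id": "sku-2", "type": "click",    "time": now - timedelta(days=1)},
]
print(boost_scores(events, now))  # sku-1 ≈ 23.9, sku-2 ≈ 7.2
```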
Deciding how to weight these popularity signals, as well as others, can be done manually, but you should also consider learning to rank, or LTR. LTR trains models to find the balance of signals that produces the ranking most likely to maximize search success. The authors don't imply that LTR is easy—there are three chapters on it, after all—but I do think they still downplay how difficult it is in practice to have the right data to build a successful LTR system.
I think Daniel Tunkelang's recent post, "Learn to Rank = Learn to be Humble" addresses it well:
Many companies do not have massive amounts of behavioral data or human judgments to learn from. Even when they do have a decent quantity of traffic, it often has a Zipfian or power law distribution, in which case the majority of the traffic is concentrated in a relatively small set of unique queries. For many of those queries, engagement tends to be concentrated in a handful of top-ranked results. The same tends to be true for human judgment data, since the expense of human judgments usually leads to prioritizing labels for top-ranked results of frequently issued queries.
Unfortunately, the tail is where there is often the most opportunity to improve ranking. After all, if there were a problem with a head query, it would probably have been fixed already, possibly even by hand.
What does this mean in terms of managing expectations? For me, it means communicating that the bottleneck for improving ranking is usually going to be the available training data — not just the quantity, but the alignment of its distribution with the distribution of opportunities for improvement.
Even a highly talented team of machine learning engineers with sophisticated models and abundant compute resources has to face the reality that a model can only be as good as the data used to train it.
In short, while LTR can be powerful, the availability of data can preclude it from being as powerful as expected.
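For a sense of what the moving parts look like, here's a minimal sketch (not the book's setup) using XGBoost's pairwise ranker. Everything here (feature names, judgments, group sizes) is invented, and it assumes you already have per-query feature vectors and graded relevance labels, which is exactly the data that's usually hard to come by.

```python
# A toy learning-to-rank training loop with XGBoost's pairwise objective.
import numpy as np
from xgboost import XGBRanker

# One row per (query, document) pair: [bm25_score, popularity_boost, price_rank]
X = np.array([
    [12.1, 26.0, 0.2],
    [ 9.4,  3.0, 0.8],
    [ 7.7,  1.0, 0.5],
    [11.0, 14.0, 0.1],
    [ 6.2,  0.0, 0.9],
])
y = np.array([3, 1, 0, 2, 0])   # graded relevance judgments (0 = bad, 3 = perfect)
group = [3, 2]                  # first 3 rows belong to query 1, last 2 to query 2

model = XGBRanker(objective="rank:pairwise", n_estimators=50)
model.fit(X, y, group=group)

# Score candidate documents for a new query and sort by predicted relevance.
candidates = np.array([[10.0, 5.0, 0.3], [8.0, 20.0, 0.6]])
print(np.argsort(-model.predict(candidates)))
```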
The final section of the book is on the "frontier of search," which, of course, means generative models and nearest-neighbor search through embeddings.
Of the two, I think search over embeddings is the more pressing, and I'd probably even move it from the frontier to "today." It's important to pair nearest-neighbor search with increased query understanding, such as the query interpretation the authors cover in chapter 7, as well as improved ranking. This is because nearest-neighbor search is geared towards increased recall, and that means results that are otherwise less relevant might come in. Of course, in some cases, you can't bear any loss in precision, and so this kind of search simply isn't for you, at least not yet.
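For the curious, the core retrieval mechanic is simple; below is a bare-bones sketch with random stand-in embeddings and brute-force cosine similarity. In a real system the vectors would come from an encoder model, and you would use an approximate nearest-neighbor index (or the engine's own vector field) instead of scanning every document.

```python
# Brute-force nearest-neighbor retrieval over (random stand-in) document embeddings.
import numpy as np

rng = np.random.default_rng(42)
doc_embeddings = rng.normal(size=(1000, 384))   # 1,000 docs, 384-dim vectors
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

def nearest_neighbors(query_vec: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most similar documents by cosine similarity."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = doc_embeddings @ query_vec          # cosine, since rows are unit length
    return np.argsort(-scores)[:k]

query = rng.normal(size=384)
print(nearest_neighbors(query, k=5))
```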
The authors wrap up the book by looking at multi-modal search and question answering. As I mentioned above, I don't think either of these is incredibly important in contexts that we generally consider to be search. However, search and retrieval are more than just a search box and a query, and these techniques are useful for the resurgence of conversational interfaces that have taken off post-ChatGPT. The authors address the topics from a very high level, but there are many other books that cover this in more detail—Build a Large Language Model (From Scratch) is one that I find promising.
On the whole, I find AI-Powered Search a worthwhile read for anyone who owns or builds search. You'll learn much about how to implement new search technologies, and why you might want to in the first place.