Haystack is an annual conference held in the United States about information retrieval. It sits in a nice middle ground: most conferences are either highly academic or full of people trying to sell something without coming out and saying they're trying to sell you something. (And it's usually obvious that they've barely scratched the surface of being a "practitioner" of their talk topic.) Haystack, meanwhile, has people who actually build things.
Having people who actually build things means that the talks are useful, and they aren't just trend-riding. Not to say that 2024 didn't have a certain flavor, but there was still a mixture of topics.
Retrieval Augmented Generation (RAG)
The single biggest topic of 2024 was retrieval augmented generation. If you're not familiar, RAG is the combination of retrieval (e.g., search) with generation (generally text). There were five talks on this topic.
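To make the definition concrete, here is a minimal, purely illustrative RAG sketch: retrieve the most relevant documents for a question, then hand them to a generator as context. The corpus, the keyword-overlap scoring, and the `generate` stub are all hypothetical placeholders, not anything from the talks.

```python
# A toy RAG loop: retrieve context, build a prompt, generate an answer.
# Everything here (corpus, scoring, generate stub) is illustrative only.

corpus = [
    "Haystack is a conference about search and information retrieval.",
    "Reciprocal rank fusion merges ranked lists from different retrievers.",
    "Learning to rank trains a model to order search results.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Toy keyword-overlap retrieval; real systems use BM25, vectors, or both."""
    q_terms = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; swap in your model client of choice."""
    return f"(answer generated from a prompt of {len(prompt)} characters)"

question = "What does reciprocal rank fusion do?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(generate(prompt))
```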
The first was "Expanding RAG to incorporate multimodal capabilities" by Praveen Mohan and Hajer Bouafif of Amazon. Mohan and Bouafif are like, "Oh, you're using RAG with text sources, that's cute; we're using text, images, and tables." (This isn't an actual quote.) As you can imagine, it's not easy, especially the tables. Any time you see a workflow diagram with that many boxes, you know it's a complex undertaking.
The next talk on RAG was "Measuring and Improving the R in RAG" by Scott Stults.
His talk was about how the typical way of measuring whether you have the right retrieval is to use human judgments, and how that is slow and expensive. We did that before at Algolia, and I can co-sign the slow and expensive part. (Organizing all of it isn't easy, either.) But what if LLMs could do the analysis, learning from humans? That last part is key: there have been some explorations of using LLMs alone, but they tend to be subpar compared to expert judgments, especially in specialized domains.
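In that spirit (and not as Stults's actual method), one sanity check is to compare an LLM judge's verdicts against a held-out set of human labels before trusting it. The `llm_judge` stub and the labels below are invented for illustration.

```python
# Hypothetical check of an LLM relevance judge against expert human labels.

human_labels = {  # (query, doc_id) -> 1 if relevant, 0 if not, from expert raters
    ("red running shoes", "doc1"): 1,
    ("red running shoes", "doc2"): 0,
    ("waterproof jacket", "doc3"): 1,
}

def llm_judge(query: str, doc_id: str) -> int:
    """Stand-in for prompting an LLM: 'Is this document relevant to the query?'"""
    return 1  # placeholder verdict; replace with a real model call

agreement = sum(
    llm_judge(q, d) == label for (q, d), label in human_labels.items()
) / len(human_labels)
print(f"LLM/human agreement: {agreement:.0%}")  # only trust the judge if this is high
```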
"Why RAG Projects Fail, and How to Make Yours Succeed" was the next RAG talk, given by Colin Harman of Nesh.
I appreciate that he called out that the first threat to a RAG project is simply not providing enough value. Whatever we build must solve a problem, and shoving AI in there doesn't change that fact.
Finally, there was "Chat With Your Data - A Practical Guide to Production RAG Applications" from Jeff Capobianco of Moody's.
Vector and Hybrid Search
The next most popular topic was vector search and hybrid (keyword plus vector) search. This is a topic I've been working on for the past couple of years, so I was interested to see the approaches that others took.
John Solitario of Rockset takes the point of view that "All Vector Search is Hybrid Search".
He defines hybrid search a bit more concretely as "vector search and keyword boosting" merged with reciprocal rank fusion (RRF). It can also mean vector search with metadata filtering, or two-stage retrieval plus a re-ranking stage. In the talk he goes deep on how any worthwhile vector search implementation will need to "mix and match" these approaches.
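For reference, the RRF merge itself is small: each document scores the sum of 1/(k + rank) across the ranked lists it appears in. The sketch below uses made-up document IDs and the conventional k = 60; it shows the fusion idea, not Rockset's implementation.

```python
# Reciprocal rank fusion: merge a keyword ranking and a vector ranking.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Each doc scores sum(1 / (k + rank)) across the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc_a", "doc_b", "doc_c"]  # e.g., BM25 order
vector_results = ["doc_c", "doc_a", "doc_d"]   # e.g., nearest-neighbor order
print(rrf([keyword_results, vector_results]))  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```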
A few other talks on vector and hybrid search were "Exploring dense vector search at scale" by Tom Burgmans and Mohit Sidana from Wolters Kluwer, "Vector (Hybrid) Search Live A/B Test Results in E-commerce (DIY) Sector" by Istvan Simon of Prefixbox, and "Retro Relevance: Lessons Learned Balancing Keyword and Semantic Search" by Kathleen DeRusso of Elastic.
Other Topics
Then there were the other talks. Almost all of them had an AI influence, but they covered different ground.
First up was a presentation from Eric Pugh and Stavros Macrakis on a library they created to capture "user behavior insights." Pugh and Macrakis are right that user behavior is important for search quality, and increasingly so. They say, though, that there's no standard way to collect it, which is also true. What they propose, then, is a standardized library for collection. We'll see if they're successful. They might be, but I've also seen that measuring user behavior comes with a lot of unique needs, and much of that behavior happens far from search, even when it can influence search.
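To give a feel for the kind of thing such a collector records, here is a purely hypothetical behavior event; this is not the schema from Pugh and Macrakis's library, just an illustration of tying a click back to the query that produced it.

```python
# Hypothetical search-behavior event; field names are invented for illustration.
import json
import time
import uuid

event = {
    "event_id": str(uuid.uuid4()),
    "timestamp": time.time(),
    "session_id": "session-123",  # ties the click back to the originating session
    "action": "click",            # e.g., query, click, add_to_cart
    "query_id": "query-456",
    "object_id": "product-789",
    "position": 3,                # rank of the clicked result on the page
}
print(json.dumps(event, indent=2))
```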
A talk unique in its content was my colleague Paul-Louis Nech's presentation on image retrieval. Image search is still a bit of a niche need (and might always be), but image similarity is useful to a much broader range of use cases. I see it when I weigh my produce at the store and the scale guesses what I placed there, and when a fashion website promises to "complete the look" based on the clothing I just put in my cart.
Joelle Robinson of Moody's gave her talk "Revisiting the Basics: How Round Robin Improved Search Relevancy", about how her team used a "standard Round Robin algorithm" to get better search results. What I liked most about this talk was that it was "back to the basics." Yeah, LLMs are exciting, but most search orgs aren't getting the basics right, and here's a talk that shows a foundational way to improve results that your average org can actually use. This exact approach won't be perfect for everyone, but neither will RAG.
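One plain reading of "standard Round Robin" is to interleave results from several sources, taking one from each in turn; Robinson's actual implementation surely has more to it, and the source names below are made up.

```python
# Round-robin interleave of result lists so no single source dominates the top.
from itertools import zip_longest

def round_robin(*result_lists: list[str]) -> list[str]:
    """Take one result from each source in turn, skipping exhausted sources
    and dropping duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for group in zip_longest(*result_lists):
        for doc in group:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

news = ["n1", "n2", "n3"]
research = ["r1", "r2"]
ratings = ["n1", "g1"]  # overlaps with news on "n1"
print(round_robin(news, research, ratings))  # ['n1', 'r1', 'n2', 'r2', 'g1', 'n3']
```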
The final talk that stood out to me was on learning to rank and how the Reddit team implemented it, given by Doug Turnbull, Chris Fournier, and Cliff Chen. I mentioned above, in John Solitario's talk, that one approach to hybrid search is two retrievals combined and then re-ranked; this is that re-ranking. The great thing about learning to rank (LTR) is that it does what it says on the tin: it learns the ideal ranking. Of course, there are challenges, and it was also interesting to see how a search implementation the size of Reddit's handles the infrastructure.
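For a sense of the shape of the problem (and nothing to do with Reddit's actual setup), here is a bare-bones LTR sketch using LightGBM's LambdaRank objective: per-candidate features, graded relevance labels, and query groupings. The features, labels, and sizes are invented toy data, far too small for the model to learn anything meaningful.

```python
# Minimal learning-to-rank sketch with LightGBM's LambdaRank objective.
import numpy as np
from lightgbm import LGBMRanker

# Each row is one (query, document) pair: [bm25_score, vector_score, doc_age_days]
X = np.array([
    [12.0, 0.81, 3], [4.0, 0.92, 40], [7.5, 0.40, 1],   # query 1 candidates
    [9.0, 0.70, 10], [2.0, 0.95, 5], [6.0, 0.55, 60],   # query 2 candidates
])
y = np.array([2, 1, 0, 2, 1, 0])  # graded relevance label for each candidate
group = [3, 3]                     # number of candidates per query, in row order

ranker = LGBMRanker(objective="lambdarank", n_estimators=50, min_child_samples=1)
ranker.fit(X, y, group=group)

# Score fresh candidates for a new query and sort them best-first.
candidates = np.array([[8.0, 0.60, 2], [3.0, 0.90, 15]])
order = np.argsort(-ranker.predict(candidates))
print(order)
```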
I don't believe Haystack 2025 has been announced yet, but if you want to learn more (or see all the talks), you can visit the Haystack conference website.