The Optimal Amount of Irrelevant Search Results is Non-Zero
A short one this week. Enjoy!
A few years back, Patrick McKenzie wrote a blog post, "The optimal amount of fraud is non-zero". In it he points out that fraud is a type of crime, and a society's crime rate is a reflection of its policies. For example, if you don't define anything as a crime, then you have no crime. If you lock all of your citizens away from the moment of birth, you also have no crime. Of course, neither of those is reasonable, so all societies accept some level of crime.
The post is great because it makes a statement that seems preposterous, and by the end it seems obvious.
It makes me think of search, and how the same effect applies.
As we saw earlier, search is a balance between precision and recall. A search that is perfectly precise would never show any unrelated items. It would also leave out many related products, and it would very often show nothing at all. That is perfect precision paid for with poor recall.
Almost all of the time, then, businesses have decided that it is worth sometimes showing unrelated items. That's the trade-off that they've made. The number should be small. But there will always be something there that shouldn't be.
(This isn't even getting into how search is subjective, and that there is no one definition of relevant or irrelevant for a query.)
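To make the precision and recall trade-off concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the item names, the match scores, the thresholds, and the set of "relevant" items are all assumptions, and real search engines score and rank results in far more involved ways. It only shows how tightening a cutoff removes irrelevant results and good ones along with them.

```python
def precision_recall(shown, relevant):
    """Return (precision, recall) for a list of shown items."""
    shown, relevant = set(shown), set(relevant)
    hits = shown & relevant
    # Convention assumed here: showing nothing counts as "perfectly precise".
    precision = len(hits) / len(shown) if shown else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical match scores for the query "red shoes".
scores = {
    "red shoes": 0.95,
    "red sneakers": 0.80,
    "crimson boots": 0.55,
    "red socks": 0.50,   # the irrelevant item that sneaks in
    "blue hat": 0.10,
}
# Hypothetical ground truth: what a shopper would call relevant.
relevant = {"red shoes", "red sneakers", "crimson boots"}


def results_at(threshold):
    """Show every item whose score clears the threshold."""
    return [item for item, score in scores.items() if score >= threshold]


strict = precision_recall(results_at(0.90), relevant)  # one result, nothing irrelevant
loose = precision_recall(results_at(0.50), relevant)   # every relevant item, plus the socks
empty = precision_recall(results_at(0.99), relevant)   # nothing shown at all

print("strict:", strict)
print("loose: ", loose)
print("empty: ", empty)
```

With the strict cutoff, the shopper sees one product and misses two they would have wanted. With the loose one, they see everything relevant plus a pair of socks. Most businesses would take the socks.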
So, the question is: what to do when irrelevant results peek through? Always provide ways for staff to change those results, such as through editing records, adding synonyms, or adding manual rules. And try as hard as possible to give those tools to those closest to the results. For an e-commerce business, these would be merchandisers and marketers.
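As a rough illustration of what a "manual rule" could look like, here is a small Python sketch. The rule format (a "pin" list and a "hide" list per query) and the function name `apply_rules` are hypothetical, not any particular search product's API. The point is that the rules are plain data a merchandiser could edit, applied on top of whatever the engine returns.

```python
def apply_rules(query, results, rules):
    """Apply per-query manual rules on top of the engine's results.

    `rules` maps a normalized query to a dict with optional
    "pin" and "hide" lists (a hypothetical format for this sketch).
    """
    rule = rules.get(query.strip().lower(), {})
    hidden = set(rule.get("hide", []))
    # Pinned items go first, in the order staff listed them.
    pinned = [item for item in rule.get("pin", []) if item not in hidden]
    # Everything else keeps the engine's order, minus hidden and pinned items.
    rest = [r for r in results if r not in hidden and r not in pinned]
    return pinned + rest


# Hypothetical rules a merchandiser might maintain.
rules = {
    "red shoes": {"pin": ["red sneakers"], "hide": ["red socks"]},
}

engine_results = ["red socks", "red shoes", "red sneakers"]

print(apply_rules("Red Shoes", engine_results, rules))
print(apply_rules("blue hat", ["blue hat"], rules))  # no rule: results unchanged
```

Keeping the rules as data rather than code is the design choice that matters here: it lets the people closest to the results fix a bad result without waiting on an engineer.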
By the way, this also applies to generative AI and hallucinations. A question for you to consider: how many hallucinations are you willing to have in order to have more creative AI? Does it depend on the context?