May 25, 2023

Zelda: Tears of the Kingdom came out the day after the last newsletter, but shockingly despite that, there’s still a lot happening, including:

Great posts (and a game!) on prompt injection
Copilot leaking its instructions
Microsoft announced a thing or two
Beef up your LLM skills
Evaluating GPT businesses
And more

So let’s get to it…

(As always, if you have a link I should check out, email me at z@dcoates.com.)

Chat-Based Interfaces

Why Chatbots Are Not the Future

In this article, Amelia Wattenberger goes into the problems with chat-based interfaces. Essentially, they don’t provide users with guidance on how to use them, and they don’t make it easy to provide context that you need repeatedly. Additionally, when using chat for productive purposes (e.g., rewriting a text), it’s difficult to understand what has changed.

This aligns with my own thoughts on the subject from a search engine perspective. While chat can be a useful interface in some situations, there’s still a lot to ask of users to understand what to do and how to do it and when. UX embellishments like price filters or a list of search results aren’t things we add just to fill a page but are there to guide users forward.

Prompt Injection

A lot of interesting links on prompt injection this time around. Let’s start off with a couple from Simon Wilson:

These are part of a series from Wilson’s blog, and the first starts off talking about why prompt injection matters at all. A lot of people question whether prompt injection is a problem, because if a customer changes your translation app to translate from English to pirate instead of English to French, what’s the harm?

In this use case, there’s not much harm, indeed. But Simon highlights the example of an app that reads and summarizes emails. A malicious inbound email might inject within the prompt something that forwards on to a rogue actor some of the most recent emails and then deletes the instruction. Or a type of prompt injection could be used to change search results.

So, if you’re convinced that it’s a problem by now, you then have a presentation he did that goes more in-depth.

After all of that, you might be a prompt injection expert, and know all the tricks on how to guard against it. So why not try the opposite?

Gandalf - Prompt Injection Game

This is a game where you’re trying to craft injections that will get an app to tell you a secret password. It starts off easy—just ask for the password—but gets complex rapidly. I wasn’t able to get past level 7. What about you?

Instruction Leakage

A close cousin of prompt injection is instruction leakage, or where users can get an LLM to leak the instructions you gave to it. Marvin Von Hagen got Microsoft’s GitHub Copilot Chat to leak its instructions:

These instructions go from the obvious (the name of the bot) to the more interesting (refusing to discuss life or sentience). This is why I don’t believe that prompt engineering is something that will be unnecessary soon, as some do. This right here is user experience design, is it not?

Using ChatGPT to “Hack” Hacker News

I love this one, even if Hacker News readers weren’t as impressed. The crew at Simple Analytics used ChatGPT to rank, again and again, on the front page of Hacker News.

They watched out via Google Alerts for articles that matched keywords that they knew interested the Hacker News community and then they sent the article details to ChatGPT, and let it determine whether it was a good match, using this prompt:

We are a business called Simple Analytics (http://simpleanalytics.com). It's a privacy friendly analytics tool for customers who care about privacy on the web. We like articles about: people writing about us, competitors in the privacy space, something related to privacy web analytics, or when Google Analytics gets negative press. Is the following article interesting for us? Reply with a single number between 0 and 100. 0 being not interesting at all, 100 being super interesting. After this number, add a dash (-), following by a short explanation. This is the crawled article: {{content}}

A positive hit, and the team was notified, they dropped what they were doing, and they immediately wrote an article reacting to the news.

What’s great about this is that it hits in ChatGPT’s sweet spot, where false negatives (they miss an interesting news article) don’t hurt that much and false positives (they get pinged on an article that isn’t interesting) are caught by a human before anything goes out.

Best of all, this isn’t developer-heavy, and it definitely doesn’t need an ML engineer. Anyone who knows enough to talk to an API and to run a cron job could have done the same thing.

Prompt Engineering Guides

A handful of useful prompt engineering guides to check out:

Brex’s Prompt Engineering Guide (e.g., learn how and why chain of thought is useful)
LearnPrompting.org
DAIR.AI’s Prompt Engineering Guide

Additionally, if you want to go further and learn about all things LLMs, Cohere announced their LLM University, led by Luis Serrano, who wrote the excellent Grokking Machine Learning book. The course covers everything from what an LLM is to building and deploying LLM-based apps.

This Week in Big Tech

First up, Microsoft went deep on AI in their recent Build conference, as if anyone was surprised. Someone once referred to OpenAI as “Microsoft’s AI research arm,” and they’re certainly showing off what they’re paying for. In short, there are sidebars with AI everywhere: a sidebar in Windows, a sidebar in Edge. And not a single Cortana in sight.

Also, Anthropic announced a $450 million Series C. Anthropic has quickly become one of the best-funded players in generative AI, with tech giants such as Google investing. To wit, Google previously invested $300 million in the company. Creating LLMs costs money, and LLM companies are raising it.

Finally, an article from Bloomberg that says that, based on recent job posting, Amazon is planning to add conversational search to its store. As for whether that will work, I refer back to the above discussion on chat-based interfaces, especially in search. But I might be wrong. At Amazon’s size, even a niche audience is a massive audience.

Evaluating GPT Startups

The “Three Hills” Model for Evaluating GPT Startups

Marco Witzmann discusses how to evaluate startups that are building on top of LLMs. His thesis is that startups that are just a thin wrapper won’t be very successful; there’s no moat to stop other companies or the LLMs directly doing the opposite. His example is of a content marketing company that uses an LLM to write SEO content. Of course the search engines are going to use ML to discount that content.

What provides real value instead are three things: in-context and collaborative features (embedding within a platform to enable collaboration), gated knowledge, and offline use cases. These are differentiated enough and not able to be counterbalanced by other LLM/ML work.

I don’t completely agree with Witzmann’s POV. For example, he predicts AI will radically change “most” white-collar jobs by 2024 while overlooking just how entrenched most people and most companies are in the way they do things, as well as valid privacy and security concerns.

However, the base argument is completely valid: in the land rush that we’re in today, we’ll see a lot of thin wrappers around LLMs, but the real value is in what a business provides that no one else can. Just like with any other product.

Talking to Computers: The Email