Here's the lie a lot of AI-capture flows tell: "just point your camera and we'll figure out what it is." It's a beautiful demo. It is also a flow that, in production, gets the title wrong about one time in five. And one time in five is exactly the wrong rate.

One time in twenty would feel magical. One time in two would feel honestly broken, and you'd build for it. One time in five is the worst zone: frequent enough that everyone hits it within a week, rare enough that nobody designs the flow to ask for help.
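
A quick back-of-envelope sketch of why the rate matters. It assumes each capture misses independently and that a new user takes about 10 photos in their first week; both are illustrative assumptions, not measured numbers, and the function name is made up for this example.

```typescript
// Chance that a user sees at least one wrong title across `captures` photos,
// assuming each capture misses independently with probability `missRate`.
function chanceOfAtLeastOneMiss(missRate: number, captures: number): number {
  return 1 - Math.pow(1 - missRate, captures);
}

// Assumed first week: 10 captures (an illustrative figure).
const atOneInTwenty = chanceOfAtLeastOneMiss(1 / 20, 10); // about 0.40
const atOneInFive = chanceOfAtLeastOneMiss(1 / 5, 10); // about 0.89
```

Under those assumptions, at one in twenty most people get a clean first week. At one in five, nearly nine in ten have already watched the app guess wrong.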

The "Not quite" branch

The flow we ended up with looks like this. You take a photo. The model takes its best guess in big confident lettering, "Stand mixer?", and underneath, in equal-weight type, there's a button that says "Not quite." Not "edit." Not "wrong." Not a pencil icon. Just: "Not quite."

Tap "Not quite" and you get a single-line text field, pre-populated with what the model guessed, ready for you to overwrite. No menu. No category picker. No "is it a tool? a kitchen item?" branching tree. The cursor is already in the field. You type the right name. You're done.

The right answer and the recovery from the wrong answer should feel like the same shape, just one second longer.
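
Here's one way that shape could look as a tiny state machine. This is a sketch, not our actual code: the names CaptureState, CaptureEvent, step, and run are hypothetical, and the rule that an empty title can't be submitted is an assumption added for the example.

```typescript
// Hypothetical types for illustration; not taken from the app's codebase.
type CaptureState =
  | { kind: "guessing" }
  | { kind: "suggested"; guess: string }
  | { kind: "editing"; draft: string }
  | { kind: "done"; title: string };

type CaptureEvent =
  | { type: "modelGuessed"; guess: string }
  | { type: "accept" }
  | { type: "notQuite" }
  | { type: "typed"; text: string }
  | { type: "submit" };

function step(state: CaptureState, event: CaptureEvent): CaptureState {
  switch (state.kind) {
    case "guessing":
      if (event.type === "modelGuessed") {
        return { kind: "suggested", guess: event.guess };
      }
      return state;
    case "suggested":
      if (event.type === "accept") {
        return { kind: "done", title: state.guess };
      }
      if (event.type === "notQuite") {
        // No menu, no category picker: go straight to a text field
        // pre-populated with the model's guess.
        return { kind: "editing", draft: state.guess };
      }
      return state;
    case "editing":
      if (event.type === "typed") {
        return { kind: "editing", draft: event.text };
      }
      if (event.type === "submit") {
        // Assumption for this sketch: an empty title is not accepted.
        const title = state.draft.trim();
        return title.length > 0 ? { kind: "done", title } : state;
      }
      return state;
    case "done":
      return state;
  }
}

// Replays a list of events from the initial "guessing" state.
function run(events: CaptureEvent[]): CaptureState {
  return events.reduce<CaptureState>(step, { kind: "guessing" });
}

// Right answer: one tap after the guess appears.
const happyPath = run([
  { type: "modelGuessed", guess: "Stand mixer" },
  { type: "accept" },
]);

// Wrong answer: one tap, some typing, one submit.
const recoveryPath = run([
  { type: "modelGuessed", guess: "Stand mixer" },
  { type: "notQuite" },
  { type: "typed", text: "Bread machine" },
  { type: "submit" },
]);
```

Both paths end in the same "done" state with a title. The recovery path takes two extra events and never leaves the screen.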

Why "Not quite," not "Wrong"

This is the part that sounds small and isn't. "Wrong" makes the model the villain. "Edit" makes the user the villain (they're the one who has to clean up the mess). "Not quite" lets nobody be the villain. It just admits that two people, one of them silicon, are working together on the same problem.

Tone matters in a particular way here. The capture step is the first real interaction a lot of users have with the app. If it feels like a system that's about to bill them for a mistake, they bail. If it feels like a system that's apologizing for its half of the work, they keep going.

The math we don't try to win

We spent a couple of weeks earlier in development trying to push the model accuracy up. We could move the needle, but the law of diminishing returns kicked in fast. Going from "right 78% of the time" to "right 84% of the time" cost more engineering effort than going from "wrong recovery feels frustrating" to "wrong recovery feels seamless," and the second one did more for the experience by a wide margin.

That's the shape of the bet. Don't try to win the accuracy fight. Win the fight after.

Other places we've applied this

  • Multi-item detection. The camera sees three items in a photo. Sometimes it sees four where there are three. The fix isn't to teach the model what counts as one object; the fix is a card-stack at the bottom of the screen where you swipe to dismiss the one that's wrong.
  • Care-note suggestions. The model suggests "battery latch sticks." It's almost always not quite right. The text it suggests is fully editable from the second the suggestion appears, no "accept then edit" two-step.
  • Fold name suggestions. For brand-new Folds, we suggest names like "Maya's block" based on context. They're terrible suggestions. We label them "(or call it whatever you want)" right under the field.
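
The card-stack from the first bullet is about as simple as recovery gets. A minimal sketch, assuming each detection carries an id and a label (DetectedCard and dismissCard are hypothetical names for this example):

```typescript
// Hypothetical shape for one detected item in the card-stack.
interface DetectedCard {
  id: string;
  label: string;
}

// Swiping a card away removes only that card. The original list is left
// untouched, so an undo could restore it.
function dismissCard(stack: readonly DetectedCard[], id: string): DetectedCard[] {
  return stack.filter((card) => card.id !== id);
}

// The model saw four items where there were three.
const detected: DetectedCard[] = [
  { id: "d1", label: "Stand mixer" },
  { id: "d2", label: "Whisk" },
  { id: "d3", label: "Mixing bowl" },
  { id: "d4", label: "Mixing bowl" },
];

const kept = dismissCard(detected, "d4");
```

The user never has to explain what the model got wrong. They swipe away the extra card and the other three stay put.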

What this costs

Building a great recovery flow takes about as much design time as building a great primary flow. So you're paying for two flows, not one. That's the real cost of "AI capture forgiveness." We think it's the right cost, for one reason: the moment the AI gets something wrong is, weirdly, the moment the app has the most trust to win. If the recovery is graceful, the user comes away thinking "oh, it knows when it doesn't know." That's a more valuable feeling than "it always knows."

· with care,
Jules
