Why On-Device Beats the Cloud for Thinking Tools

A half-formed thought has no business sitting on someone else’s server.

A voice memo recorded walking the dog. A note jotted before you knew what you meant. A task list that maps your week. These are the rawest material you have, and the standard deal is that you hand them to a company that processes them on hardware you’ll never see, under terms you didn’t write, often to train a model you’ll never benefit from. For the tools you think in — capture, tasks, reading — that trade stops making sense.

So we built the alternative: tools that do their work on the machine in your hands.

The cloud is more capable. That’s true and it doesn’t decide this.

Concede the strong version of the counterargument first, because it’s real. A frontier model running in a datacenter is more capable than anything that fits on a laptop. It has more parameters, more context, more raw reasoning. Put GPT-class or Claude-class models next to what runs on a phone and the datacenter wins on capability, and it isn’t close.

We use those models every day. This isn’t a case of someone who has never seen the good stuff claiming it doesn’t matter.

The point is narrower and more useful: the gap is enormous for hard reasoning and nearly irrelevant for the jobs a thinking tool actually does.

The jobs are smaller than the hype

Walk through what these tools ask a model to do.

Transcribe an hour of speech into text. Read a transcript and write a three-line summary. Suggest a title. Search across a few thousand of your own files for the meeting where you discussed pricing. Pull the action items out of a recording so they can become tasks.

None of that needs frontier reasoning. These are bounded, well-defined language jobs, and on-device models cleared the bar for them a while ago. The marginal capability of a 400-billion-parameter model in a datacenter buys you very little on “summarise this voice memo in three sentences,” and using it costs you the privacy of the memo itself.

SR-7 does its transcription with Apple’s Speech framework and its titles and summaries with Apple’s on-device FoundationModels, all on the machine. ML-42 generates per-document summaries with Apple Intelligence, locally, over the markdown already on your disk. Nothing leaves. There’s no upload step, no “processing your audio” spinner phoning home, no copy of your half-formed note living somewhere you can’t reach.

It’s good enough because the job is the right size for the model — not because the standard dropped.

Apple Silicon is a head start, not a lag

For years “on-device” meant “the slow, dumb version.” You ran the real model in the cloud and shipped a toy locally. That framing is out of date.

Apple has been putting a Neural Engine in its chips since 2017 and shipping a unified-memory architecture where the model and your data sit in the same fast pool. A model that needs to read a long transcript doesn’t fight a memory bottleneck to do it. By the time Apple shipped FoundationModels as a system framework any app can call, the hardware to run them well had been in millions of pockets for years.

So building on Apple’s on-device stack isn’t accepting a handicap to be private. It’s standing on the better part of a decade of silicon designed for exactly this. The neural hardware is the reason the local versions of these jobs are quietly excellent now, and it’s why we treat on-device as the lead, not the consolation prize.

What you actually keep

Three things follow from doing the work locally, and each one is the kind of thing you only notice when it’s gone.

Privacy is the default, not a setting. Your recordings, notes, and tasks never become someone’s training data, because they never leave. There’s no privacy policy to parse, no toggle buried in settings, no breach that can leak what was never uploaded. The architecture makes the promise — not the terms of service.

You own the work, and the format proves it. Every SR-7 recording is a markdown file with YAML front matter. Every TR-2 task is a plain checkbox in a folder you control. The model touches files you can open, read, and diff yourself. When the output is plain text on your disk, you can verify ownership for yourself instead of trusting a claim about it.
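To make “open, read, and diff yourself” concrete, here is a minimal sketch of reading such files with nothing but the Python standard library. It is an illustration under stated assumptions, not how SR-7 or TR-2 work internally: the field names (title, date), the sample note, and the helper names split_front_matter and open_tasks are all hypothetical, and the parser handles only flat key: value front matter, not full YAML.

```python
import re

def split_front_matter(text):
    """Split a markdown document into (metadata dict, body).

    Assumption: the front matter is a block of flat `key: value` lines
    between two `---` delimiters. Nested YAML needs a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the note body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as plain markdown

# Matches markdown checkboxes such as "- [ ] task" or "* [x] task".
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")

def open_tasks(text):
    """Return the text of every unchecked checkbox in the document."""
    tasks = []
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            tasks.append(match.group(2).strip())
    return tasks

# Hypothetical note; the fields and content are made up for illustration.
sample = """---
title: Pricing call
date: 2025-01-15
---
Summary of the call.

- [ ] Send revised quote
- [x] Book follow-up
"""

meta, body = split_front_matter(sample)
print(meta["title"], open_tasks(body))
```

The point of the sketch is that nothing here depends on the app: any script, editor, or diff tool can read the same files, which is what makes the ownership claim checkable.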

There’s no server to rent. The work runs on a computer you already bought. On-device AI puts that hardware to work instead of routing every transcription and summary through a datacenter that bills you monthly for the privilege, and it never quietly enrolls you in a subscription to use what you already own. There’s no per-seat cloud cost behind the app, which is exactly why these tools can be a one-time purchase. That pricing model is the only honest one for local-first software: you can’t credibly sell ownership while metering access to a machine you control. Sync, if you want it, is your own iCloud or Dropbox or git.

Where the cloud still wins, plainly

We won’t pretend on-device wins everywhere. If your work needs the deepest reasoning a model can do (long agentic chains, research across a huge corpus, the hardest synthesis), the cloud is the right tool, and we reach for it too.

The distinction is between tools you reason with and tools you think in. Reaching for a frontier model to untangle a gnarly problem is a deliberate act with a clear payoff. Capturing a thought, filing a task, skimming a document — those happen constantly, half-consciously, with material too raw and too personal to want it leaving the device for a benefit you can’t feel.

For that layer, on-device is the better tool, and it has been for a while.

Good enough, where good enough is the whole point

These tools run on Apple’s local stack because the math comes out clean. The capability the cloud adds is small for these jobs. The privacy, ownership, and no-subscription wins are large, and they compound the longer you use the tool.

A thought you haven’t finished thinking deserves to stay on your machine. That’s reason enough.