Build It Yourself

I built my own Wispr Flow alternative for iPhone and Mac

Wispr Flow is a voice dictation app. You talk, it types clean text wherever your cursor is. It's genuinely nice. It also costs 15 bucks a month, which makes it the priciest app in the whole category. There's no buy-it-once option, so that's $180 a year, forever, for an app that does basically one thing.

I wasn't going to do that. So I built my own and called it Wispr No. Below is the actual build: the files, how the pieces fit, and the stuff I had to fight through on both iPhone and Mac.

Why I bothered instead of paying

The math is silly. Superwhisper, the alternative everybody compares Wispr Flow to, runs around 8 bucks a month. VoiceInk is a one-time payment under 40. Wispr Flow is the priciest one and it's a subscription with no end date.

The whole job is short. Stream your mic to a transcription model, get text back, run it through a quick cleanup pass, drop it at your cursor. That's it. I'm not paying a monthly bill forever for that when I can own it.
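Those four steps fit in a very small shape. Here's a minimal sketch, not the actual Wispr No code: `DictationPipeline`, `toyCleanup`, and the stand-in closures are hypothetical names made up for illustration, and the real stages would talk to a mic, a transcription model, and a cleanup server.

```swift
import Foundation

/// One dictation pass: transcribe, clean up, insert.
/// Each stage is a plain closure so the sketch runs without a mic or a network.
struct DictationPipeline {
    var transcribe: (Data) -> String   // audio in, raw text out
    var cleanUp: (String) -> String    // raw text in, polished text out
    var insert: (String) -> Void       // puts the text at the cursor

    func run(audio: Data) {
        let raw = transcribe(audio)
        // Skip empty results so a silent tap inserts nothing.
        guard !raw.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { return }
        insert(cleanUp(raw))
    }
}

/// Toy cleanup pass (hypothetical): drops two filler words, capitalizes, adds a period.
func toyCleanup(_ raw: String) -> String {
    let fillers: Set<String> = ["um", "uh"]
    let words = raw.split(separator: " ").filter { !fillers.contains($0.lowercased()) }
    let sentence = words.joined(separator: " ")
    guard let first = sentence.first else { return "" }
    return first.uppercased() + String(sentence.dropFirst()) + "."
}

/// Runs the pipeline once with stand-in stages and returns what got "typed".
func demoPipeline() -> [String] {
    var typed: [String] = []
    let pipeline = DictationPipeline(
        transcribe: { _ in "um so this is a test" },
        cleanUp: toyCleanup,
        insert: { typed.append($0) }
    )
    pipeline.run(audio: Data())
    return typed
}
```

Keeping the stages as swappable closures is one way to let an iPhone keyboard and a Mac menu bar app share everything except the last step.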

The one brain both apps share

I didn't build two separate codebases. I built one set of "brain" files and pointed two Xcode projects at it, so there's a single source of truth. Change the brain once, both apps get it.

Here's what's actually in there:

  • Config.swift holds the settings: the transcription model name, my backend URL, the app token, and the shared app group ID.
  • TokenService.swift asks my server for a 60-second throwaway OpenAI key.
  • RealtimeTranscriber.swift opens a WebSocket to OpenAI, streams the mic audio up, and gets transcript text back as you talk.
  • CleanupService.swift sends the raw transcript to my server's cleanup endpoint, which runs it through an LLM with my custom dictionary so it fixes spelling, punctuation, and filler.
  • VocabDictionary.swift is just my custom word list, the weird terms I want spelled right.
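To make the streaming part concrete, here's a rough sketch of how one chunk of mic audio could be packaged for the WebSocket. It assumes the OpenAI Realtime API's `input_audio_buffer.append` client event, which carries base64-encoded audio, and 16-bit little-endian PCM as the audio format. `pcm16Data`, `AudioAppendEvent`, and `audioAppendMessage` are hypothetical names, not what RealtimeTranscriber.swift actually contains.

```swift
import Foundation

/// Converts float samples in -1...1 to 16-bit little-endian PCM bytes.
func pcm16Data(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        let clamped = max(-1, min(1, sample))
        let value = Int16((clamped * Float(Int16.max)).rounded())
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    return data
}

/// JSON shape of the "append audio" client event (assumed from the Realtime API).
struct AudioAppendEvent: Encodable {
    let type = "input_audio_buffer.append"
    let audio: String   // base64-encoded PCM16
}

/// Builds the text frame for one chunk of samples.
func audioAppendMessage(for samples: [Float]) throws -> String {
    let event = AudioAppendEvent(audio: pcm16Data(from: samples).base64EncodedString())
    let json = try JSONEncoder().encode(event)
    return String(decoding: json, as: UTF8.self)
}
```

On Apple platforms the resulting string could go out through `URLSessionWebSocketTask.send(.string(message))`. The session setup and the transcript events coming back are left out here.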

The security piece matters and it's worth copying. My real OpenAI key never ships inside either app. The app only gets a 60-second token that dies almost immediately, and the cleanup call routes through my server too. The app token on those server calls is a low-stakes gate so random people can't hammer my endpoint. Nothing valuable sits on a phone for someone to dig out.
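Here's a minimal sketch of the client side of that token exchange. Everything about the server contract is an assumption for illustration: the `/token` path, the `Authorization: Bearer` header carrying the app token, and a JSON reply shaped like `{"value": "...", "expires_at": 1700000060}`. `EphemeralKey`, `makeTokenRequest`, and `decodeEphemeralKey` are hypothetical names.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

/// A short-lived key handed out by the backend (assumed JSON shape).
struct EphemeralKey: Decodable {
    let value: String
    let expiresAt: Date

    /// True if the key still has more than `safetyMargin` seconds left.
    func isUsable(at now: Date = Date(), safetyMargin: TimeInterval = 5) -> Bool {
        now.addingTimeInterval(safetyMargin) < expiresAt
    }
}

/// Builds the request to the backend. The app token is only a gate on this
/// endpoint; the real OpenAI key stays on the server.
func makeTokenRequest(backend: URL, appToken: String) -> URLRequest {
    var request = URLRequest(url: backend.appendingPathComponent("token"))
    request.httpMethod = "POST"
    request.setValue("Bearer \(appToken)", forHTTPHeaderField: "Authorization")
    return request
}

/// Decodes the backend's reply into an EphemeralKey.
func decodeEphemeralKey(from body: Data) throws -> EphemeralKey {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase   // expires_at -> expiresAt
    decoder.dateDecodingStrategy = .secondsSince1970
    return try decoder.decode(EphemeralKey.self, from: body)
}
```

Checking `isUsable` before opening the WebSocket, and asking for a fresh key when it fails, means an expired key never gets used.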

How I built the iPhone app

This one was the hard one, and it's where most free Wispr Flow alternatives for iPhone fall apart.

It's one project with two targets: the main WisprNo app and a WisprKeyboard extension. They talk to each other through a shared App Group container, which is the part you have to set up in Xcode before anything works.

The reason for two targets: iOS flat out won't let a keyboard extension use the microphone. So the mic has to live in the main app, and the keyboard is just the typist. Here's the flow I wired up:

  1. You tap the mic on the keyboard. That mic is a SwiftUI Link, which is the one thing iOS 18 lets a keyboard use to open its own host app.
  2. The app opens and starts recording. You swipe back to wherever you were typing and it keeps listening in the background.
  3. You talk, then tap stop. The whole transcript goes to the cleanup server and comes back polished.
  4. The cleaned text gets written to a coordinated file in the App Group container, and the keyboard reads that file and inserts the text at your cursor.

Two bugs nearly broke me. First, getting the keyboard to launch the app at all, which the SwiftUI Link solved. Second, the maddening one: the text just wouldn't show up. I'd built it to pass text through UserDefaults and hit a race condition. The fix was moving to a file-based queue, writing the cleaned text to a coordinated file and letting only the active keyboard drain it.

The keys themselves felt clunky too, so I rebuilt the whole keyboard on KeyboardKit, which is free and gives you a real native-feeling QWERTY with proper touch handling. The mic and transcript bar sit in KeyboardKit's toolbar slot.
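The file-based queue idea can be sketched like this. It's a simplified variant, not the author's exact fix: instead of one coordinated file, it writes each message as its own atomically written file, so the writer and the reader never edit the same file. `TextQueue` is a hypothetical name. In a real app the directory would come from `FileManager.default.containerURL(forSecurityApplicationGroupIdentifier:)`; here any folder works.

```swift
import Foundation

/// A tiny on-disk queue: the app enqueues cleaned text, the keyboard drains it.
struct TextQueue {
    let directory: URL

    init(directory: URL) throws {
        self.directory = directory
        try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
    }

    /// App side: write one message as its own file.
    /// The zero-padded timestamp makes file names sort in arrival order.
    func enqueue(_ text: String, now: Date = Date()) throws {
        let stamp = String(Int(now.timeIntervalSince1970 * 1_000_000))
        let padded = String(repeating: "0", count: max(0, 20 - stamp.count)) + stamp
        let url = directory.appendingPathComponent("\(padded)-\(UUID().uuidString).txt")
        // Atomic write: a reader sees the whole file or nothing.
        try Data(text.utf8).write(to: url, options: .atomic)
    }

    /// Keyboard side: read every pending message in order and delete it.
    /// Only the keyboard that is currently on screen should call this.
    func drain() throws -> [String] {
        let files = try FileManager.default
            .contentsOfDirectory(at: directory, includingPropertiesForKeys: nil)
            .filter { $0.pathExtension == "txt" }
            .sorted { $0.lastPathComponent < $1.lastPathComponent }
        var texts: [String] = []
        for file in files {
            let data = try Data(contentsOf: file)
            try FileManager.default.removeItem(at: file)
            texts.append(String(decoding: data, as: UTF8.self))
        }
        return texts
    }
}
```

Because a message is deleted as soon as it's read, draining a second time returns nothing, so the same text doesn't get inserted twice.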

How I built the Mac app

Mac doesn't have any of those handcuffs. A Mac app can use the mic and type into any app directly, so there's no keyboard extension and no swipe-back dance. It's simpler and honestly better.

It's a menu bar app that lives up by the clock with no Dock icon. It's a handful of files plus all the shared brain code: WisprNoMacApp.swift for the app shell, MacSession.swift to run the session, MacAudioCapture.swift for the mic, FnPushToTalk.swift to watch the fn key, and TextInserter.swift to type into other apps.

The flow:

  1. Hold the fn key anywhere on the system. It plays a tink so you know it's listening.
  2. Talk while you hold it.
  3. Release fn. It captures the last word, runs the cleanup pass, and types the polished text straight into whatever app you're focused on, with a pop sound when it's done.

It needs two macOS permissions: Microphone and Accessibility. The Accessibility one is what actually lets it type into other apps, so don't skip it.

A few things I tuned to make it feel finished. It keeps the mic alive for about 350 milliseconds after you release fn so your last word doesn't get clipped. It pre-warms the connection so every dictation after the first starts instantly. And I guarded against a crash that hit when you tapped fn without actually saying anything.
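The release tail and the empty-tap guard are easy to model as a small state machine. This is a sketch under assumptions, not the real MacSession.swift: `PushToTalkSession` and its method names are hypothetical, timestamps are plain seconds passed in by the caller, and whatever watches the fn key is expected to call `fnPressed` and `fnReleased`.

```swift
import Foundation

/// Push-to-talk logic with a short tail after release and an empty-tap guard.
struct PushToTalkSession {
    enum State: Equatable {
        case idle
        case recording
        case trailing(stopAt: TimeInterval)   // fn released, mic still open
    }
    enum Outcome: Equatable {
        case pending                          // nothing to do yet
        case finished(transcript: String)     // clean this up and type it
        case discarded                        // nothing was said
    }

    let tail: TimeInterval
    private(set) var state: State = .idle
    private(set) var transcript = ""

    init(tail: TimeInterval = 0.35) { self.tail = tail }

    mutating func fnPressed() {
        guard state == .idle else { return }
        transcript = ""
        state = .recording
    }

    /// Transcript text still counts during the tail, so the last word survives.
    mutating func heard(_ text: String) {
        guard state != .idle else { return }
        transcript += text
    }

    mutating func fnReleased(at time: TimeInterval) {
        guard state == .recording else { return }
        state = .trailing(stopAt: time + tail)
    }

    /// Call periodically; ends the session once the tail has elapsed.
    mutating func tick(at time: TimeInterval) -> Outcome {
        guard case .trailing(let stopAt) = state, time >= stopAt else { return .pending }
        state = .idle
        let text = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        return text.isEmpty ? .discarded : .finished(transcript: text)
    }
}
```

Returning `.discarded` for an empty transcript gives the caller one obvious place to skip the cleanup call and the text insertion when fn was tapped without speech.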

So should you build your own?

If you write code, honestly, yeah. The core is a weekend. Stream the mic, get text, clean it, drop it at the cursor. The polish is where the real time goes, but that's true of everything.

If you don't write code, that's the whole reason I do this for a living. The same logic that says don't rent a $15 dictation app forever also says don't overpay for bloated software that does ten things when you need one. I'd rather build you the thing that does exactly what you need and hand you the keys. This was just me doing it for myself first.

