"Heard, chef!" - this AI definitely will not say "you're absolutely right!"
"Heard, Chef" is a native iOS cooking assistant designed to leverage existing AI with long-term personalized memory. It combines an iMessage-style chat interface with a powerful, interruptible voice mode, allowing you to manage inventory, plan meals, and get real-time cooking feedback without washing your hands.
Under the hood, it is engineered for model independence, using a custom "Brain Protocol" that decouples the user experience from the underlying AI, ensuring the app remains fast, private, and adaptable.
The app is built around a familiar, iMessage-esque chat interface.
- Natural Texting: Text your chef just like a friend. "Do I have enough eggs for a quiche?" or "Remind me to buy basil."
- Media Rich: Snap photos directly in the chat flow to ask questions or log items.
- Live Tool Chips: Watch the AI "think" and work. When you ask to check the pantry, you'll see a background chip pop up: `Checking Inventory...` followed by `Found: 6 Eggs`.
Tap the microphone for a hands-free experience designed for active cooking. Ask things like "How can I make sure this sauce won't break?" or "What else could I add to this stir-fry?"
- The 40% Modal: Voice mode slides up a non-intrusive sheet covering the bottom 40% of the screen.
- Chef Avatar: A dedicated, animated avatar provides visual feedback, reacting to your voice and the AI's processing state.
- Background Context: The chat window and tool chips remain visible behind the modal, so you can visually confirm that the AI successfully added "Paprika" to your list even while it keeps talking.
Use the camera to bridge the physical and digital kitchen.
- Receipt Scanning: Snap a photo of a grocery receipt. The AI parses the items, normalizes quantities (e.g., "2 lbs" instead of "bag"), and adds them to your inventory.
- Cooking Feedback: Unsure if your onions are caramelized enough? Snap a photo and ask, "Is this ready?" for instant visual analysis.
What sets this apart from Grok or ChatGPT voice mode is that you don't have to orchestrate custom files to store your information:
- Allergy-aware prompts: Your allergy information is injected into every prompt, so the LLM always adjusts recipes for your personal situation.
- Recipe book: Find, save, edit, and share recipes. The recipe book can be referenced while shopping or cooking, or sent between users.
- Shopping list: Never forget what you already have at home while you're at the store.
This project is architected for longevity and flexibility, avoiding vendor lock-in through strict abstraction layers.
The app does not communicate directly with any specific AI provider. Instead, it interacts with a strictly typed ChefIntelligence protocol.
- Swappable Backend: Allows the app to switch between Gemini 2.0 Flash (Cloud) for complex reasoning and potential future Local Models (e.g., Llama/Mistral via MLX) for offline privacy.
- Audio Specs: The pipeline handles PCM 16-bit, 16kHz audio for low-latency streaming.
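The abstraction layer above can be sketched roughly as follows; the method names and shapes here are assumptions for illustration, not the app's actual API.

```swift
// Hedged sketch of the ChefIntelligence abstraction: the app talks only to
// this protocol, never to a specific AI provider.
import Foundation

protocol ChefIntelligence {
    /// Send a user message (with optional image data) and get the reply text.
    func send(_ message: String, image: Data?) async throws -> String
    /// Open a low-latency voice session (PCM 16-bit, 16 kHz).
    func startVoiceSession() async throws
    func endVoiceSession()
}

/// A stub showing how a backend plugs in behind the protocol; a cloud
/// implementation would wrap Gemini, while a local one could wrap MLX models.
struct EchoBrain: ChefIntelligence {
    func send(_ message: String, image: Data?) async throws -> String {
        "Heard, chef: \(message)"
    }
    func startVoiceSession() async throws {}
    func endVoiceSession() {}
}
```

Because the UI depends only on the protocol, swapping backends is a one-line change at the injection site.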
To minimize latency and costs, "Heard, Chef" uses Active Tool Calling. Instead of dumping your entire inventory into the prompt, the AI calls specific tools to retrieve data on demand.
Available Tools:
| Domain | Function | Description |
|---|---|---|
| Inventory | `add_ingredient` | Add items with quantity normalization |
| | `remove_ingredient` | Decrement stock or remove items |
| | `update_ingredient` | Patch ingredient fields |
| | `get_ingredient` | Check details for one ingredient |
| | `list_ingredients` | List items with optional filters |
| | `search_ingredients` | Fuzzy name search |
| Recipes | `create_recipe` | Create a new recipe |
| | `update_recipe` | Update recipe fields |
| | `delete_recipe` | Remove a recipe |
| | `get_recipe` | Full recipe with ingredients and steps |
| | `list_recipes` | Browse recipes by tag |
| | `search_recipes` | Search by name or tag |
| Cross-Tool | `suggest_recipes` | Recipes matching current inventory |
| | `check_recipe_availability` | Missing list for a specific recipe |
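The on-demand tool calls above amount to a dispatcher: the model emits a tool name plus arguments, and the app resolves it against local data. The function, argument, and error names below are illustrative assumptions, not the app's actual code.

```swift
// Minimal sketch of Active Tool Calling dispatch. Only the data the model
// asks for is fetched, instead of dumping the whole inventory into the prompt.
enum ToolError: Error { case unknownTool(String) }

func dispatchTool(name: String,
                  args: [String: String],
                  inventory: [String: Int]) throws -> String {
    switch name {
    case "get_ingredient":
        let item = args["name"] ?? ""
        return "Found: \(inventory[item] ?? 0) \(item)"
    case "list_ingredients":
        return inventory.keys.sorted().joined(separator: ", ")
    default:
        // Unknown tools surface as errors rather than invented answers.
        throw ToolError.unknownTool(name)
    }
}
```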
LLMs speak in approximations; databases need precision.
- Ingestion: User says "I bought a bunch of cilantro."
- Normalization: The engine maps "bunch" to a standard unit (e.g., `count: 1`) and categorizes it under `.produce`.
- Persistence: Only validated, strictly-typed data is saved to SwiftData (SQLite), ensuring sorting and filtering always work.
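The ingestion-to-persistence step can be sketched as a small normalization function; the types and the unit/category tables below are assumptions for illustration, not the app's actual model.

```swift
// Hedged sketch of quantity normalization: vague, LLM-style units are
// collapsed into typed values before anything touches the database.
enum Category: String { case produce, dairy, pantry }

struct NormalizedIngredient {
    let name: String
    let count: Int
    let category: Category
}

func normalize(rawName: String, rawUnit: String) -> NormalizedIngredient {
    // "bunch", "bag", "head" all collapse to a count of 1.
    let vagueUnits: Set<String> = ["bunch", "bag", "head", "handful"]
    let count = vagueUnits.contains(rawUnit.lowercased()) ? 1 : (Int(rawUnit) ?? 1)
    // A tiny stand-in for the real categorizer.
    let produce: Set<String> = ["cilantro", "basil", "onion"]
    let category: Category = produce.contains(rawName.lowercased()) ? .produce : .pantry
    return NormalizedIngredient(name: rawName, count: count, category: category)
}
```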
The repo now has one real internal subsystem module:
- `Modules/VoiceCore/` owns voice/call coordination, route recovery, structured eventing, and the derived `VoiceCallUIState` used by the app.
- `app/` remains the app shell, UI, persistence wiring, and Gemini integration layer.
- `Modules/VoiceCore/Tests/VoiceCoreTests/` is the primary automated logic suite for voice behavior.
- `heardTests/` is intentionally smoke-first and exists to verify the hosted app test harness, with hosted performance checks kept experimental.
- `heardUITests/` owns simulator-driven interaction regressions, with stable CRUD/navigation/search coverage on by default and gesture-heavy keyboard dismissal coverage available as opt-in.
Supporting docs:
- `docs/testing/ios-testing-playbook.md`
- `docs/testing/testing-strategy.md`
- `docs/architecture/repo-structure-roadmap.md`
- `docs/rebuild/04-voice-regression-matrix.md`
- `scripts/xcresult-summary.sh` for compact local test-result summaries
The iOS test setup keeps one stable default lane and one explicit experimental lane.
Stable commands:
- `./scripts/test-ios.sh voicecore`
- `./scripts/test-ios.sh app-build`
- `./scripts/test-ios.sh app-smoke`
- `./scripts/test-ios.sh app-ui`
- `./scripts/test-ios.sh stable`
Experimental commands:
- `./scripts/test-ios.sh app-ui-gestures`
- `./scripts/test-ios.sh app-ui-gestures-repeat 10`
- `./scripts/test-ios.sh experimental`
Stable hosted coverage currently includes:
- `AppLaunchSmokeTests`
- `GeminiServiceSetupTests`
Non-UI tests are moving to Swift Testing. UI tests and measure-based performance tests remain on XCTest.
`GeminiServiceSetupTests` now covers an explicit matrix of hosted audio setup payload variants while keeping `SessionConfig.audio()` mapped to the current runtime default.
Use `docs/testing/ios-testing-playbook.md` as the testing source of truth. It documents:
- the shared `heard-stable` and `heard-experimental` Xcode test plans
- simulator resolution and the canonical `iPhone 17 Pro` / `iOS 26.2` target
- logical-run `.xcresult` triage via `--latest-run` and `--run <id>`
- historical directory aggregation via `--all`
- stable vs experimental coverage and promotion rules
- Xcode 17.x with the Apple Swift 6.2 toolchain
- Swift language mode: Swift 5
- Deployment target: iOS 17.0+
- Canonical test simulator: iPhone 17 Pro on iOS 26.2, or the nearest installed current runtime
- API Key: Google Gemini API Key (multimodal live access).
git clone https://github.com/asavschaeffer/heard-iOS.git
cd heard-iOS
Note: If the .xcodeproj file is not tracked, create a new iOS App in Xcode, select "SwiftData" for storage, and drag the app/ folder into the project navigator.
This project uses .xcconfig files to secure secrets.
- Create `Secrets.xcconfig` in the root directory.
- Add your key: `GEMINI_API_KEY = your_actual_key_here`
  (REST uses `gemini-2.5-flash`; the Live API uses `gemini-2.5-flash-native-audio-preview`)
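Assuming the xcconfig value is surfaced through Info.plist (a common pattern; the plist key name here is an assumption about this project's setup), the key can be read at runtime like this:

```swift
// Reads the Gemini key injected via Secrets.xcconfig, assuming the build
// setting is referenced from Info.plist under the key GEMINI_API_KEY.
import Foundation

func geminiAPIKey() -> String? {
    guard let key = Bundle.main.object(forInfoDictionaryKey: "GEMINI_API_KEY") as? String,
          !key.isEmpty else { return nil }
    return key
}
```

Keeping the key out of source and reading it through the bundle means `Secrets.xcconfig` can stay gitignored.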
- Voice Persona: You can change the voice in `GeminiService.swift`. Supported voices include `Aoede`, `Charon`, `Fenrir`, `Kore`, and `Puck`.
- System Prompt: Customize the chef's personality (e.g., "Gordon Ramsay mode" vs. "Grandma mode") in `ChefIntelligence.swift`.
The repo is past the highest-risk voice infrastructure phase.
- `VoiceCore` is landed as an internal module under `Modules/VoiceCore/`.
- Voice/call logic no longer primarily lives in `ChatViewModel`.
- Explicit lifecycle and route state handling now live inside `VoiceCore`.
- The automated test split is intentional:
  - `VoiceCoreTests` for module-owned logic
  - `heardTests` for app-host smoke coverage plus experimental hosted perf
  - `heardUITests` for simulator-driven interaction regressions
  - stable vs experimental hosted plans under `app/TestPlans/`
- `.xcresult` summaries as the default diagnostics interface
- gate-level `.xcresult` aggregation for CI and AI triage
- `VoiceCorePerformanceTests` and `AppStartupPerformanceTests` in the experimental lane
- gesture-heavy UI regressions stay opt-in until repeated local and CI evidence proves them stable enough for default CI
- physical-device validation for route-sensitive truth
- run-grouped `.xcresult` manifests under `.deriveddata/codex-tests/Logs/TestRuns/` for AI-friendly triage
- The current focus is feature development and tool expansion, with the voice and test infrastructure stable.
Short term
- finish the remaining physical-device voice regression matrix (deferred)
- evaluate a `Modules/GeminiTransport` extraction only if app-side integration pressure justifies it
Long term
- evolve toward a modular app shell with two or more real internal modules
- strengthen module-first automation while keeping hardware checks for route-sensitive audio
v1.3.0 → Done
- First-action lag (phone button, long-hold message, share button)*
  - Still intermittently present → see #4
- Keyboard dismiss in add/edit ingredients and edit recipe modals
- Speakerphone echo with Google Live API (VAD tuning + AEC)
- Nav order: Inventory → Chat → Recipes → Settings
- Launch screen dark mode (backgroundless logo)
- Chat bubble color dynamism on light mode
- Chef avatar in chat view, calling, and FaceTime
- VoiceCore module extraction with explicit state machine
- Voice selection and VAD calibration settings
- Beta system prompt editing
- App icon refresh
- Fix ingredients page camera
- Test infrastructure (test plans, VoiceCoreTests, UI regression suite, xcresult diagnostics)
UX
- Shopping list UI
New Tools
- Allergies
- Timer tool
- Unit conversion tool
Major Features
- Multiple chats
- Auth & ephemeral keys
- Onboarding
- Google Cloud backend
- Memory manager (post-conversation topic extraction and context assembly)
Ongoing
- Speakerphone echo refinement (VAD tuning, AEC)
- System prompt experimentation
- `docs/gemini-tools.md` - Drill-down toolset and Gemini tool architecture
- `docs/testing/ios-testing-playbook.md` - Canonical local verification commands and test ownership
- `docs/testing/testing-strategy.md` - Test layer philosophy, ownership rules, and future UI-test direction
- `docs/architecture/repo-structure-roadmap.md` - Current module/app ownership rules and extraction direction
- `docs/rebuild/04-voice-regression-matrix.md` - Physical-device checklist for voice and attachment regressions
- No cloud sync - Data is local only
- Live API experimental - Gemini Live API may change
- iOS only - No macOS/watchOS support
- English only - No localization yet
- No offline mode - Voice features require internet
- Cloud sync with iCloud or Supabase
- Meal planning calendar
- Nutritional information
- Recipe import from URLs
- Apple Watch companion (timer controls)
- Siri Shortcuts integration
- Widget for expiring ingredients
GNU Affero General Public License v3.0 with Commons Clause
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Commons Clause The Software is provided to you by the Licensor under the License, as amended by the "Commons Clause". You may not sell the Software. "Selling" means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/ support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software.
- Google Gemini API for the underlying intelligence.
- Chef Rah Shabazz - a maverick.