Heard, Chef

"Heard, chef!" - this AI definitely will not say "you're absolutely right!"

Overview

"Heard, Chef" is a native iOS cooking assistant designed to leverage existing AI with long-term personalized memory. It combines an iMessage-style chat interface with a powerful, interruptible voice mode, allowing you to manage inventory, plan meals, and get real-time cooking feedback without washing your hands.

Under the hood, it is engineered for model independence, using a custom "Brain Protocol" that decouples the user experience from the underlying AI, ensuring the app remains fast, private, and adaptable.

The Experience

💬 Conversational Core

The app is built around a familiar, iMessage-esque chat interface.

  • Natural Texting: Text your chef just like a friend. "Do I have enough eggs for a quiche?" or "Remind me to buy basil."
  • Media Rich: Snap photos directly in the chat flow to ask questions or log items.
  • Live Tool Chips: Watch the AI "think" and work. When you ask to check the pantry, you'll see a background chip pop up: Checking Inventory... followed by Found: 6 Eggs.

πŸŽ™οΈ Live Voice Mode

Tap the microphone for a hands-free experience designed for active cooking. "How can I make sure this sauce won't break?", "What else could I add to this stir fry?"

  • The 40% Modal: Voice mode slides up a non-intrusive sheet covering the bottom 40% of the screen.
  • Chef Avatar: A dedicated, animated avatar provides visual feedback, reacting to your voice and the AI's processing state.
  • Background Context: The chat window and tool chips remain visible behind the modal, so you can visually confirm that the AI successfully added "Paprika" to your list even while it keeps talking.

📷 Visual Intelligence

Use the camera to bridge the physical and digital kitchen.

  • Receipt Scanning: Snap a photo of a grocery receipt. The AI parses the items, normalizes quantities (e.g., "2 lbs" instead of "bag"), and adds them to your inventory.
  • Cooking Feedback: Unsure if your onions are caramelized enough? Snap a photo and ask, "Is this ready?" for instant visual analysis.

Data Memory Layer

What sets this apart from Grok or ChatGPT voice mode is that you don't have to orchestrate custom files to store your information.

  • Allergy Prompt Injection: The LLM always adjusts recipes for your personal situation.
  • Recipe Book: Find, save, edit, and share recipes. Reference the recipe book while shopping or cooking, or send recipes between users.
  • Shopping List: Never forget what you already have at home while you're at the store.
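As a sketch of how allergy-aware prompt injection could work (the type and function names below are illustrative, not the app's actual API):

```swift
import Foundation

// Illustrative sketch of allergy-aware prompt injection; UserProfile
// and systemPrompt are hypothetical names, not the repo's real API.
struct UserProfile {
    var allergies: [String]
}

func systemPrompt(base: String, profile: UserProfile) -> String {
    guard !profile.allergies.isEmpty else { return base }
    let list = profile.allergies.joined(separator: ", ")
    return base + "\nThe user is allergic to: \(list). "
        + "Always adapt recipes and flag these ingredients."
}

let prompt = systemPrompt(
    base: "You are a professional chef.",
    profile: UserProfile(allergies: ["peanuts", "shellfish"])
)
```

Because the constraint is injected on every request, the model cannot "forget" an allergy mid-conversation the way ad-hoc memory files can.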

Technical Architecture

This project is architected for longevity and flexibility, avoiding vendor lock-in through strict abstraction layers.

1. The "Brain" Protocol

The app does not communicate directly with any specific AI provider. Instead, it interacts with a strictly typed ChefIntelligence protocol.

  • Swappable Backend: Allows the app to switch between Gemini 2.0 Flash (Cloud) for complex reasoning and potential future Local Models (e.g., Llama/Mistral via MLX) for offline privacy.
  • Audio Specs: The pipeline handles PCM 16-bit, 16kHz audio for low-latency streaming.
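A minimal sketch of what such a protocol boundary might look like (the real ChefIntelligence protocol in the repo may differ in shape and naming):

```swift
import Foundation

// Sketch of the "Brain Protocol" idea: the UI talks only to this
// abstraction, never to a concrete provider. Method shapes are assumptions.
protocol ChefIntelligence {
    /// Stream a text reply for a user message, invoking tools as needed.
    func respond(to message: String) async throws -> AsyncStream<String>
    /// Feed a PCM 16-bit, 16 kHz audio frame for live voice mode.
    func streamAudio(_ pcmFrame: Data) async throws
}

// A cloud-backed brain and a future local model would both conform,
// so swapping backends never touches the UI layer.
struct StubBrain: ChefIntelligence {
    func respond(to message: String) async throws -> AsyncStream<String> {
        AsyncStream { continuation in
            continuation.yield("Heard, chef!")  // canned stub reply
            continuation.finish()
        }
    }
    func streamAudio(_ pcmFrame: Data) async throws {}
}
```

Because views hold only a `ChefIntelligence`, a local MLX-backed conformance can replace the Gemini one without touching the chat or voice UI.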

2. Precision Context Management (Tool-First)

To minimize latency and costs, "Heard, Chef" uses Active Tool Calling. Instead of dumping your entire inventory into the prompt, the AI calls specific tools to retrieve data on demand.

Available Tools:

| Domain | Function | Description |
| --- | --- | --- |
| Inventory | `add_ingredient` | Add items with quantity normalization |
| | `remove_ingredient` | Decrement stock or remove items |
| | `update_ingredient` | Patch ingredient fields |
| | `get_ingredient` | Check details for one ingredient |
| | `list_ingredients` | List items with optional filters |
| | `search_ingredients` | Fuzzy name search |
| Recipes | `create_recipe` | Create a new recipe |
| | `update_recipe` | Update recipe fields |
| | `delete_recipe` | Remove a recipe |
| | `get_recipe` | Full recipe with ingredients and steps |
| | `list_recipes` | Browse recipes by tag |
| | `search_recipes` | Search by name or tag |
| Cross-Tool | `suggest_recipes` | Recipes matching current inventory |
| | `check_recipe_availability` | Missing list for a specific recipe |
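For illustration, a tool such as add_ingredient might be declared to the model roughly as follows (the descriptor type and parameter schema here are assumptions, not the repo's actual Gemini tool format; docs/gemini-tools.md documents the real one):

```swift
// Hypothetical tool descriptor; the repo's real schema may differ.
struct ToolDeclaration {
    let name: String
    let description: String
    let parameters: [String: String]  // parameter name -> type hint
}

let addIngredient = ToolDeclaration(
    name: "add_ingredient",
    description: "Add items with quantity normalization",
    parameters: [
        "name": "string",
        "quantity": "number",
        "unit": "string (normalized, e.g. \"lbs\" or \"count\")"
    ]
)
```

Declaring narrow, typed tools like this is what lets the model fetch exactly the rows it needs instead of receiving the whole inventory in every prompt.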

3. The "Fuzzy-to-Strict" Bridge

LLMs speak in approximations; databases need precision.

  • Ingestion: User says "I bought a bunch of cilantro."
  • Normalization: The engine maps "bunch" to a standard unit (e.g., count: 1) and categorizes it under .produce.
  • Persistence: Only validated, strictly-typed data is saved to SwiftData (SQLite), ensuring sorting and filtering always work.
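The three steps above can be sketched like this (the unit table and category model are assumptions about the real engine):

```swift
// Fuzzy-to-strict normalization sketch; the actual mapping rules
// in the app's engine are assumed, not copied.
enum Category: String {
    case produce, dairy, pantry, meat
}

struct NormalizedIngredient {
    let name: String
    let count: Double
    let unit: String
    let category: Category
}

func normalize(name: String, fuzzyUnit: String, category: Category) -> NormalizedIngredient {
    // Map vague container words ("bunch", "bag") to a countable unit
    // so only strictly typed, sortable values reach SwiftData.
    let containers: Set<String> = ["bunch", "bag", "box", "carton"]
    if containers.contains(fuzzyUnit.lowercased()) {
        return NormalizedIngredient(name: name, count: 1, unit: "count", category: category)
    }
    return NormalizedIngredient(name: name, count: 1, unit: fuzzyUnit, category: category)
}

let cilantro = normalize(name: "cilantro", fuzzyUnit: "bunch", category: .produce)
```

Validating at this boundary means the database never stores "a bunch", so sorting, filtering, and shopping-list math stay reliable.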

4. Internal Modules and Verification Surfaces

The repo now has one real internal subsystem module:

  • Modules/VoiceCore/ owns voice/call coordination, route recovery, structured eventing, and the derived VoiceCallUIState used by the app.
  • app/ remains the app shell, UI, persistence wiring, and Gemini integration layer.
  • Modules/VoiceCore/Tests/VoiceCoreTests/ is the primary automated logic suite for voice behavior.
  • heardTests/ is intentionally smoke-first and exists to verify the hosted app test harness, with hosted performance checks kept experimental.
  • heardUITests/ owns simulator-driven interaction regressions, with stable CRUD/navigation/search coverage on by default and gesture-heavy keyboard dismissal coverage available as opt-in.

Supporting docs:

  • docs/testing/ios-testing-playbook.md
  • docs/testing/testing-strategy.md
  • docs/architecture/repo-structure-roadmap.md
  • docs/rebuild/04-voice-regression-matrix.md
  • scripts/xcresult-summary.sh for compact local test-result summaries

Testing Workflow

The iOS test setup keeps one stable default lane and one explicit experimental lane.

Stable commands:

  • ./scripts/test-ios.sh voicecore
  • ./scripts/test-ios.sh app-build
  • ./scripts/test-ios.sh app-smoke
  • ./scripts/test-ios.sh app-ui
  • ./scripts/test-ios.sh stable

Experimental commands:

  • ./scripts/test-ios.sh app-ui-gestures
  • ./scripts/test-ios.sh app-ui-gestures-repeat 10
  • ./scripts/test-ios.sh experimental

Stable hosted coverage currently includes:

  • AppLaunchSmokeTests
  • GeminiServiceSetupTests

Non-UI tests are moving to Swift Testing. UI tests and measure-based performance tests remain on XCTest.

GeminiServiceSetupTests now covers an explicit matrix of hosted audio setup payload variants while keeping SessionConfig.audio() mapped to the current runtime default.

Use docs/testing/ios-testing-playbook.md as the testing source of truth. It documents:

  • the shared heard-stable and heard-experimental Xcode test plans
  • simulator resolution and the canonical iPhone 17 Pro / iOS 26.2 target
  • logical-run .xcresult triage via --latest-run and --run <id>
  • historical directory aggregation via --all
  • stable vs experimental coverage and promotion rules

Setup & Requirements

  • Xcode 17.x with the Apple Swift 6.2 toolchain
  • Swift language mode: Swift 5
  • Deployment target: iOS 17.0+
  • Canonical test simulator: iPhone 17 Pro on iOS 26.2, or the nearest installed current runtime
  • API Key: Google Gemini API Key (multimodal live access).

1. Clone & Project Creation

git clone https://github.com/asavschaeffer/heard-iOS.git
cd heard-iOS

Note: If the .xcodeproj file is not tracked, create a new iOS App in Xcode, select "SwiftData" for storage, and drag the app/ folder into the project navigator.

2. API Key Configuration

This project uses .xcconfig files to secure secrets.

  1. Create Secrets.xcconfig in the root directory.
  2. Add your key:
GEMINI_API_KEY = your_actual_key_here

(REST uses gemini-2.5-flash; Live API uses gemini-2.5-flash-native-audio-preview)
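One common way to surface an .xcconfig value at runtime is via Info.plist substitution (this assumes the project's Info.plist maps the build setting through as GEMINI_API_KEY = $(GEMINI_API_KEY); the repo's actual key plumbing may differ):

```swift
import Foundation

// Reads the key injected via Secrets.xcconfig -> build settings ->
// Info.plist substitution. Returns "" outside a configured app bundle.
let geminiKey = Bundle.main.object(forInfoDictionaryKey: "GEMINI_API_KEY") as? String ?? ""
```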

3. Configuration & Customization

  • Voice Persona: You can change the voice in GeminiService.swift. Supported voices include: Aoede, Charon, Fenrir, Kore, and Puck.
  • System Prompt: Customize the chef's personality (e.g., "Gordon Ramsay mode" vs "Grandma mode") in ChefIntelligence.swift.

Current Engineering Status

The repo is past the highest-risk voice infrastructure phase.

  • VoiceCore is landed as an internal module under Modules/VoiceCore/.
  • Voice/call logic no longer primarily lives in ChatViewModel.
  • Explicit lifecycle and route state handling now live inside VoiceCore.
  • The automated test split is intentional:
    • VoiceCoreTests for module-owned logic
    • heardTests for app-host smoke coverage plus experimental hosted perf
    • heardUITests for simulator-driven interaction regressions
    • stable vs experimental hosted plans under app/TestPlans/
    • .xcresult summaries as the default diagnostics interface
    • gate-level .xcresult aggregation for CI and AI triage
  • VoiceCorePerformanceTests and AppStartupPerformanceTests in the experimental lane
  • gesture-heavy UI regressions stay opt-in until repeated local and CI evidence proves them stable enough for default CI
  • physical-device validation for route-sensitive truth
  • run-grouped .xcresult manifests under .deriveddata/codex-tests/Logs/TestRuns/ for AI-friendly triage
  • The current focus is feature development and tool expansion, with the voice and test infrastructure stable.

Roadmap

Short term

  • finish the remaining physical-device voice regression matrix (deferred)
  • evaluate a Modules/GeminiTransport extraction only if app-side integration pressure justifies it

Long term

  • evolve toward a modular app shell with two or more real internal modules
  • strengthen module-first automation while keeping hardware checks for route-sensitive audio

Todo

v1.3.0: Done

  • First-action lag (phone button, long-hold message, share button)
    • Still intermittently present (see #4)
  • Keyboard dismiss in add/edit ingredients and edit recipe modals
  • Speakerphone echo with Google Live API (VAD tuning + AEC)
  • Nav order: Inventory → Chat → Recipes → Settings
  • Launch screen dark mode (backgroundless logo)
  • Chat bubble color dynamism on light mode
  • Chef avatar in chat view, calling, and FaceTime
  • VoiceCore module extraction with explicit state machine
  • Voice selection and VAD calibration settings
  • Beta system prompt editing
  • App icon refresh
  • Fix ingredients page camera
  • Test infrastructure (test plans, VoiceCoreTests, UI regression suite, xcresult diagnostics)

UX

  • Shopping list UI

New Tools

  • Allergies
  • Timer tool
  • Unit conversion tool

Major Features

  • Multiple chats
  • Auth & ephemeral keys
  • Onboarding
  • Google Cloud backend
  • Memory manager (post-conversation topic extraction and context assembly)

Ongoing

  • Speakerphone echo refinement (VAD tuning, AEC)
  • System prompt experimentation

Specs

  • docs/gemini-tools.md - Drill-down toolset and Gemini tool architecture
  • docs/testing/ios-testing-playbook.md - Canonical local verification commands and test ownership
  • docs/testing/testing-strategy.md - Test layer philosophy, ownership rules, and future UI-test direction
  • docs/architecture/repo-structure-roadmap.md - Current module/app ownership rules and extraction direction
  • docs/rebuild/04-voice-regression-matrix.md - Physical-device checklist for voice and attachment regressions

Known Limitations

  • No cloud sync - Data is local only
  • Live API experimental - Gemini Live API may change
  • iOS only - No macOS/watchOS support
  • English only - No localization yet
  • No offline mode - Voice features require internet

Future Ideas

  • Cloud sync with iCloud or Supabase
  • Meal planning calendar
  • Nutritional information
  • Recipe import from URLs
  • Apple Watch companion (timer controls)
  • Siri Shortcuts integration
  • Widget for expiring ingredients

License

GNU Affero General Public License v3.0 with Commons Clause

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Commons Clause The Software is provided to you by the Licensor under the License, as amended by the "Commons Clause". You may not sell the Software. "Selling" means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/ support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software.

Acknowledgments
