
Privacy-first pronunciation coach. Real-time grammar and fluency analysis running 100% in-browser via WebAssembly & Whisper.


rgilks/speako


Speako 🎙️

Browser-based AI Speaking Practice

▶ Try the Live Demo

Speako Application Screenshot

Speako is a local-first application for practicing exam-style English speaking tests. It prioritizes user privacy, low latency, and a premium user experience by running AI models directly in your browser, so recordings never need to leave your device.

Features

  • 🔒 Privacy First: Voice data is processed locally on your device using Transformers.js.
  • 🎨 Premium Design: A beautiful, distraction-free "Dark Glass" interface built with Pure CSS.
  • 🧠 Smart Analysis:
    • CEFR Level Detection: ML-powered proficiency assessment using a fine-tuned DeBERTa model (robg/speako-cefr-deberta).
    • Grammar Check: Detects hedging, passive voice, and weak vocabulary.
    • Clarity Score: Real-time evaluation of speaking clarity.
    • Positive Reinforcement: Highlights strong vocabulary usage.
  • ⚡️ Ultra-Low Latency: Instant feedback without server round-trips.
  • 🚀 WebGPU Optimized: Uses hardware acceleration for fast in-browser inference, with automatic WASM fallback.
  • 📱 PWA Support: Installable as a Progressive Web App with offline model caching.
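The Grammar Check bullet mentions hedging and weak vocabulary. As a rough illustration only, the sketch below flags both from a transcript using plain word lists. The HEDGES and WEAK_WORDS lists and the findIssues function are hypothetical; they are not taken from grammar-checker.ts, which may rely on Compromise or on different rules entirely.

```typescript
// Hypothetical word lists for illustration; a real checker would need curated, larger sets.
const HEDGES: readonly string[] = ["i think", "maybe", "kind of", "sort of", "i guess", "perhaps"];
const WEAK_WORDS: ReadonlySet<string> = new Set(["good", "bad", "nice", "very", "thing", "stuff"]);

interface Issue {
  kind: "hedge" | "weak-word";
  text: string;
  index: number; // token index where the match starts
}

// Lowercase the transcript and split it into word tokens, dropping punctuation.
function tokenize(transcript: string): string[] {
  return transcript.toLowerCase().match(/[a-z']+/g) ?? [];
}

function findIssues(transcript: string): Issue[] {
  const tokens = tokenize(transcript);
  const hedgePhrases = HEDGES.map((h) => h.split(" "));
  const issues: Issue[] = [];
  for (let i = 0; i < tokens.length; i++) {
    // Multi-word hedges: every word of the phrase must match consecutive tokens from i.
    for (const phrase of hedgePhrases) {
      if (phrase.every((word, k) => tokens[i + k] === word)) {
        issues.push({ kind: "hedge", text: phrase.join(" "), index: i });
      }
    }
    // Single-token weak vocabulary.
    if (WEAK_WORDS.has(tokens[i])) {
      issues.push({ kind: "weak-word", text: tokens[i], index: i });
    }
  }
  return issues;
}
```

For example, findIssues("I think the film was very good, maybe.") flags "i think" and "maybe" as hedges and "very" and "good" as weak words. Matching whole tokens rather than substrings keeps a word like "goodbye" from being reported as "good".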

Architecture

Speako is a pure frontend application with no backend server.

Project Structure

speako/
├── src/
│   ├── components/       # UI components (split by feature)
│   │   ├── session/      # Recording session components
│   │   └── validation/   # Validation interface components
│   ├── hooks/            # Custom hooks (useSessionManager, useValidation, etc.)
│   ├── logic/            # Pure TS business logic
│   │   ├── local-transcriber.ts   # Whisper integration
│   │   ├── model-loader.ts        # Model singleton with WebGPU/WASM
│   │   ├── cefr-classifier.ts     # CEFR ML prediction
│   │   ├── grammar-checker.ts     # Grammar analysis
│   │   └── metrics-calculator.ts  # Speaking metrics
│   └── types/            # TypeScript type definitions
├── ml/                   # CEFR classifier training scripts
├── scripts/              # Helper scripts
└── public/               # Static assets and local models
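The tree lists model-loader.ts as a model singleton with WebGPU/WASM selection. A minimal sketch of that general pattern follows, assuming a loader callback that receives a device string (in practice it might wrap a Transformers.js pipeline created with a device option). The names Device, GpuLike, pickDevice, and ModelSingleton are hypothetical and do not describe the project's actual code.

```typescript
type Device = "webgpu" | "wasm";

// Only the piece of the WebGPU API this sketch needs (navigator.gpu.requestAdapter),
// declared locally so the example does not depend on @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Prefer WebGPU when the browser exposes it and returns an adapter; otherwise use WASM.
async function pickDevice(gpu: GpuLike | undefined): Promise<Device> {
  if (!gpu) return "wasm";
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}

// Loads a model at most once. Caching the promise (not the resolved model) means
// concurrent callers share a single in-flight load.
class ModelSingleton<T> {
  private instance: Promise<T> | null = null;
  private readonly load: (device: Device) => Promise<T>;
  private readonly gpu: GpuLike | undefined;

  constructor(load: (device: Device) => Promise<T>, gpu: GpuLike | undefined) {
    this.load = load;
    this.gpu = gpu;
  }

  get(): Promise<T> {
    if (this.instance === null) {
      this.instance = this.create();
    }
    return this.instance;
  }

  private async create(): Promise<T> {
    const device = await pickDevice(this.gpu);
    try {
      return await this.load(device);
    } catch (err) {
      // Loading can still fail on WebGPU after an adapter was found; retry once on WASM.
      if (device === "webgpu") return this.load("wasm");
      throw err;
    }
  }
}
```

In a browser the second constructor argument would be navigator.gpu, which is undefined where WebGPU is unavailable (TypeScript may need a cast unless WebGPU types are installed). In this sketch a failed load stays cached; a production loader might clear it so a later call can retry.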

Prerequisites

  • Node.js 20+ (check with node -v)
  • Python 3.11+ with uv for ML training (optional)

Running Locally

# Install dependencies
npm install

# Start development server
npm run dev

Open http://localhost:5173.

Available Scripts

| Script | Description |
|---|---|
| npm run dev | Start development server |
| npm run build | Build for production |
| npm run preview | Preview production build |
| npm run test | Run unit tests |
| npm run lint | Run ESLint |
| npm run format | Format code with Prettier |
| npm run prepare:models | Download models locally for offline testing |
| npm run prepare:data | Convert corpus audio to WAV for validation |
| npm run cefr:verify | Verify CEFR model is working |
| npm run deploy | Build and deploy to Cloudflare Pages |

Validation & Testing

For testing with real L2 learner audio, we use the Speak & Improve Corpus 2025 from Cambridge University Press & Assessment.

Step 1: Register & Download Corpus Package

  1. Visit ELiT Datasets - Speak & Improve Corpus 2025
  2. Complete the free registration and accept the license
  3. Download and extract sandi-corpus-2025.zip

Step 2: Download Audio Files

The audio files are hosted separately on S3. Download the dev set (smaller, for testing):

cd /path/to/sandi-corpus-2025
mkdir -p data && cd data

# Dev set (~2.7GB total)
curl -LO "https://speak-and-improve-corpus-2025.s3.eu-west-1.amazonaws.com/audio/data.flac.dev.01.zip"
curl -LO "https://speak-and-improve-corpus-2025.s3.eu-west-1.amazonaws.com/audio/data.flac.dev.02.zip"

# Unzip into data/flac/dev/
unzip data.flac.dev.01.zip
unzip data.flac.dev.02.zip

Step 3: Link to Project

cd /path/to/speako
ln -s /path/to/sandi-corpus-2025 ./test-data

Step 4: Prepare Validation Data

# Requires ffmpeg (on macOS: brew install ffmpeg)
npm run prepare:data
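Whisper models operate on 16 kHz audio, and the corpus is already 16 kHz FLAC, so the conversion mostly re-wraps each file as WAV. The sketch below only builds the ffmpeg argument list for one file and runs nothing. The function buildConversionJob, the test-data/data/wav output directory, and the choice of mono 16-bit PCM are assumptions for illustration, not a description of what prepare:data actually does.

```typescript
// Hypothetical helper: map a corpus FLAC path to a WAV path (keeping the relative
// layout) and build the ffmpeg arguments for the conversion. Nothing is executed;
// a Node script could pass `args` to child_process.spawn("ffmpeg", args).
interface ConversionJob {
  input: string;
  output: string;
  args: string[];
}

function buildConversionJob(flacPath: string, flacRoot: string, wavRoot: string): ConversionJob {
  if (!flacPath.startsWith(flacRoot)) {
    throw new Error(`${flacPath} is not under ${flacRoot}`);
  }
  // Keep the corpus' directory structure and swap the extension.
  const relative = flacPath.slice(flacRoot.length).replace(/\.flac$/i, ".wav");
  const output = wavRoot + relative;
  const args = [
    "-y", // overwrite an existing output file
    "-i", flacPath,
    "-ar", "16000", // 16 kHz sample rate
    "-ac", "1", // mono
    "-c:a", "pcm_s16le", // 16-bit PCM WAV
    output,
  ];
  return { input: flacPath, output, args };
}
```

For instance, buildConversionJob("test-data/data/flac/dev/abc/123.flac", "test-data/data/flac", "test-data/data/wav") targets test-data/data/wav/dev/abc/123.wav. Writing the output under the symlinked corpus directory, rather than inside the repository, keeps converted audio out of version control.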

Corpus Details

| Property | Value |
|---|---|
| Duration | ~315 hours of L2 learner audio |
| Format | 16 kHz FLAC |
| CEFR Levels | A2–C1 |
| Manual Transcriptions | ~55 hours with disfluency annotations |
| License | Non-commercial research only |

Caution

Do not share the corpus publicly or include it in any repository. See the license agreement for full terms.

Running Validation

Validation is performed through the web interface:

  1. Start the development server: npm run dev
  2. Navigate to http://localhost:5173/#validate
  3. Use the validation controls to run tests on the corpus

Results are saved to validation-results.json.
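The README does not state which metrics end up in validation-results.json. One standard way to compare Whisper output with the corpus' manual transcriptions is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is a generic implementation under that assumption; normalize and wordErrorRate are illustrative names, not part of the codebase.

```typescript
// Lowercase, replace punctuation with spaces, and split into words.
function normalize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed with a row-by-row Levenshtein dynamic programme over words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = normalize(reference);
  const hyp = normalize(hypothesis);
  if (ref.length === 0) {
    throw new Error("WER is undefined for an empty reference");
  }
  // At the start of iteration i, prev[j] is the distance between ref[0..i-1) and hyp[0..j).
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

WER can exceed 1 when the hypothesis contains many insertions. Since the manual transcriptions carry disfluency annotations, those markers would need to be stripped (or kept) consistently on both sides for the comparison to be fair.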

Machine Learning

For information on training the CEFR classifier, see docs/ml.md.

Note

The CEFR model is trained on UniversalCEFR (CC-BY-NC-4.0) to ensure license compliance. The S&I Corpus is used for validation only.
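A sequence-classification model such as the fine-tuned DeBERTa produces one logit per class, and the predicted level is the arg-max of their softmax. The sketch below shows only that post-processing step. The label order in CEFR_LABELS is an assumption (the real order comes from the model's own label mapping), and predictLevel is a hypothetical name rather than the API of cefr-classifier.ts.

```typescript
// Assumed label order; check the model's id2label mapping before relying on it.
const CEFR_LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"] as const;
type CefrLevel = (typeof CEFR_LABELS)[number];

interface Prediction {
  level: CefrLevel;
  confidence: number; // softmax probability of the chosen level
  probabilities: number[];
}

// Numerically stable softmax: subtract the largest logit before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function predictLevel(logits: number[]): Prediction {
  if (logits.length !== CEFR_LABELS.length) {
    throw new Error(`expected ${CEFR_LABELS.length} logits, got ${logits.length}`);
  }
  const probabilities = softmax(logits);
  // Arg-max over the class probabilities.
  let best = 0;
  for (let i = 1; i < probabilities.length; i++) {
    if (probabilities[i] > probabilities[best]) best = i;
  }
  return { level: CEFR_LABELS[best], confidence: probabilities[best], probabilities };
}
```

If the model is run through a Transformers.js text-classification pipeline, the pipeline typically returns labels with scores already, so this step matters mainly when working with raw logits.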

Developer Guide

See AGENTS.md for coding standards and agent instructions.

Deployment

To build for production:

npm run build

This produces static output in dist/, which can be deployed to any static host (e.g. Cloudflare Pages, Vercel, or Netlify).

Deploy to Cloudflare Pages

npm run deploy

References

Core Technologies

  • Transformers.js – Run Transformers in the browser
  • Preact – Fast 3kB React alternative
  • Vite – Next Generation Frontend Tooling
  • Compromise – Modest natural-language processing

Models

WebGPU

Corpus

License

MIT
