A web application for crowdsourcing audio recordings for speech-to-text (STT) and audio model training. Supports multiple corpora (text and music notation), crowd-validated quality control, and dataset export.
- Multi-corpus support: Manage multiple text and music corpora with language tagging
- Audio recording: Web-based recording with real-time waveform visualization
- Crowd validation: Users score recordings (1-5 scale) for quality control
- Quality metrics: Automatic quality scoring based on crowd validations
- Dataset export: Export validated recordings in CSV/JSON format (Whisper-compatible)
- GDPR compliant: Full data export, account deletion, and anonymization options
- Real-time waveform: Live visualization during recording using Canvas API
- Audio analysis: Automatic measurement of silence ratio, peak amplitude, and duration
- Quality gates: Recordings must meet duration (0.5s-120s) and silence (<80%) requirements
- Format: 16 kHz mono WAV, the sample rate expected by common STT models such as Whisper
- Admin dashboard: Platform statistics including users, recordings, validations, and disk usage
- User management: View users, change roles (user/admin), delete accounts
- Corpus management: Create corpora, upload source files, reprocess prompts
- Flagged recordings: Review low-quality or high-variance recordings
- Export tools: Export datasets with quality filtering and statistics
- Internationalization (i18n): Full support for English, Finnish, and Swedish
- Cookie consent: Informational banner explaining local storage usage
- Recording consent: Explicit consent gate before users can record
- Privacy & Terms: Built-in Privacy Policy and Terms of Service pages
- Dark mode: User-selectable light/dark theme with system preference detection
- Disk space monitoring: Uploads are blocked automatically when free disk space falls below 200 MB
- Progress indicators: Upload progress bars and corpus processing status
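The disk space monitoring feature in the list above (uploads blocked below 200 MB of free space) could be implemented as Express middleware placed in front of upload routes. This is a minimal sketch, not the project's code: `isUploadAllowed`, `getFreeBytes`, and `requireDiskSpace` are hypothetical names, the 507 status code is an assumption, and 200 MB is treated as 200 × 1024 × 1024 bytes. `fs.promises.statfs` requires Node.js 18.15 or later.

```javascript
// Minimum free space before uploads are refused (200 MB, taken as 200 MiB here).
const MIN_FREE_BYTES = 200 * 1024 * 1024;

// Pure check, kept separate so it can be tested without a real disk.
function isUploadAllowed(freeBytes, minFreeBytes = MIN_FREE_BYTES) {
  return freeBytes >= minFreeBytes;
}

// Bytes available to unprivileged users on the filesystem containing `dir`.
// The dynamic import keeps this snippet usable from both ESM and CommonJS.
async function getFreeBytes(dir) {
  const { statfs } = await import('node:fs/promises');
  const stats = await statfs(dir);
  return stats.bavail * stats.bsize;
}

// Hypothetical Express middleware factory. Responds with HTTP 507
// (Insufficient Storage) when space is low; the status code is an assumption.
function requireDiskSpace(dir, minFreeBytes = MIN_FREE_BYTES) {
  return async (req, res, next) => {
    let free;
    try {
      free = await getFreeBytes(dir);
    } catch (err) {
      return next(err); // let the app's error handler deal with statfs failures
    }
    if (!isUploadAllowed(free, minFreeBytes)) {
      return res
        .status(507)
        .json({ error: 'Server storage is low; uploads are temporarily disabled.' });
    }
    next();
  };
}
```

It would be mounted per route, e.g. `app.post('/api/recording', requireDiskSpace('./uploads'), handleUpload)`, where `./uploads` and `handleUpload` are placeholders for wherever the app stores audio and handles the upload.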
Tech stack:
- Backend: Node.js (ESM), Express
- Frontend: React, Vite
- Database: PostgreSQL
- Audio: Web Audio API, WAV/WebM format
Prerequisites:
- Node.js 18+
- PostgreSQL 14+
Installation:
- Clone the repository:
  git clone <repository-url>
  cd crowd-source-voice
- Install dependencies:
  npm install
  cd client && npm install && cd ..
- Set up environment variables:
  cp .env.example .env
  # Edit .env with your database credentials
- Create the database (this example uses Docker Compose):
  docker compose up -d
- Run migrations:
  npm run db:migrate
- (Optional) Seed with sample data:
  npm run db:seed
- Start the development servers:
  npm run dev

The app will be available at http://localhost:5173.
API endpoints:

Authentication:
- POST /api/auth/register - Register a new user
- POST /api/auth/login - Log in
- POST /api/auth/logout - Log out
- GET /api/auth/me - Get the current user

Corpora:
- POST /api/corpus - Create a corpus
- POST /api/corpus/:id/upload - Upload a corpus file
- GET /api/corpus - List corpora with statistics
- GET /api/corpus/:id - Get corpus details
- DELETE /api/corpus/:id - Delete a corpus

Prompts and recordings:
- GET /api/prompt?corpus_id= - Get the next prompt
- POST /api/prompt/:id/skip - Skip a prompt
- POST /api/recording - Upload a recording

Validation:
- GET /api/validation - Get a recording to validate
- POST /api/validation - Submit a validation score
- GET /api/validation/stats - Get validation statistics

Current user:
- GET /api/me/recordings - Get your own recordings
- GET /api/me/stats - Get user statistics
- GET /api/me/export - Export all personal data (GDPR)
- DELETE /api/me - Delete the account and all associated data
- POST /api/me/anonymize - Delete the account but keep recordings in anonymized form
- POST /api/me/consent/recording - Give recording consent
- DELETE /api/me/consent/recording - Withdraw recording consent

Admin:
- GET /api/admin/stats - Platform statistics (users, recordings, disk space)
- GET /api/admin/users - List all users
- PUT /api/admin/users/:id - Update a user's role
- DELETE /api/admin/users/:id - Delete a user account

Export:
- GET /api/export?corpus_id=&format=csv|json - Export a dataset
- GET /api/export/stats - Export statistics per corpus
- GET /api/export/manifest?corpus_id= - Get the file manifest for an export
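A client talking to the endpoints above mostly needs to build query strings correctly. The sketch below shows one way to do that for the export and prompt endpoints. The helper names are hypothetical, the base URL is a placeholder, cookie-based session auth (`credentials: 'include'`) is an assumption about how `/api/auth/login` works, and the prompt response shape is not documented here.

```javascript
// Hypothetical client helpers; names and base URL are illustrative only.
const EXPORT_FORMATS = new Set(['csv', 'json']);

// Build the URL for GET /api/export?corpus_id=&format=csv|json
function buildExportUrl(baseUrl, corpusId, format = 'json') {
  if (!EXPORT_FORMATS.has(format)) {
    throw new RangeError(`Unsupported export format: ${format}`);
  }
  const url = new URL('/api/export', baseUrl);
  url.searchParams.set('corpus_id', String(corpusId));
  url.searchParams.set('format', format);
  return url.toString();
}

// GET /api/prompt?corpus_id= using the browser session.
// Cookie-based auth is an assumption, hence credentials: 'include'.
async function fetchNextPrompt(baseUrl, corpusId) {
  const url = new URL('/api/prompt', baseUrl);
  url.searchParams.set('corpus_id', String(corpusId));
  const res = await fetch(url, { credentials: 'include' });
  if (!res.ok) throw new Error(`GET /api/prompt failed with status ${res.status}`);
  return res.json(); // response shape is not specified in this README
}
```

Using `URL` and `searchParams` rather than string concatenation keeps parameter values correctly encoded.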
Supported text corpus formats:
- .txt - Plain text, split into sentences
- .json - Array of strings, or of objects with a text field
- .csv - One text per line

Supported music corpus formats:
- .abc - ABC notation, split by tune (X: headers)
- .txt - One melody per line
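The format rules above map naturally onto a small dispatcher keyed by corpus kind and file extension. This is a sketch under stated assumptions, not the project's parser: `parseCorpusFile` and its helpers are hypothetical names, the sentence splitter is deliberately naive (it breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "e.g." are split too), and the CSV handling does not parse quoted fields.

```javascript
// Hypothetical parser mirroring the listed format rules.
function extensionOf(filename) {
  const dot = filename.lastIndexOf('.');
  return dot === -1 ? '' : filename.slice(dot).toLowerCase();
}

// One item per non-empty line (.csv text corpora, .txt music corpora).
// Quoted CSV fields are not handled: each line is kept whole.
function splitLines(content) {
  return content.split(/\r?\n/).map((line) => line.trim()).filter(Boolean);
}

// Naive sentence split after ., ! or ? followed by whitespace.
function splitSentences(content) {
  return content.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

// JSON: an array whose items are strings or objects with a `text` field.
function parseJsonTexts(content) {
  const data = JSON.parse(content);
  if (!Array.isArray(data)) throw new Error('JSON corpus must be an array');
  return data
    .map((item) => (typeof item === 'string' ? item : item?.text))
    .filter((text) => typeof text === 'string' && text.trim() !== '')
    .map((text) => text.trim());
}

// ABC: each tune starts at a line beginning with "X:"; anything before the
// first X: header (e.g. file-level comments) is dropped.
function splitAbcTunes(content) {
  return content
    .split(/^(?=X:)/m)
    .map((tune) => tune.trim())
    .filter((tune) => tune.startsWith('X:'));
}

function parseCorpusFile(kind, filename, content) {
  const ext = extensionOf(filename);
  if (kind === 'text') {
    if (ext === '.txt') return splitSentences(content);
    if (ext === '.json') return parseJsonTexts(content);
    if (ext === '.csv') return splitLines(content);
  } else if (kind === 'music') {
    if (ext === '.abc') return splitAbcTunes(content);
    if (ext === '.txt') return splitLines(content);
  }
  throw new Error(`Unsupported ${kind} corpus file: ${filename}`);
}
```

Note that `.txt` is interpreted differently depending on the corpus kind, which is why the kind is passed in alongside the filename.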
Before submission, recordings are analyzed for:
- Duration: Must be between 0.5 and 120 seconds
- Silence ratio: Must be less than 80% silence
- Audio level: Peak amplitude is measured for quality feedback
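The three checks above can be computed in one pass over the decoded samples. This sketch assumes mono samples as a Float32Array in [-1, 1], such as the one returned by the Web Audio API's `AudioBuffer.getChannelData(0)`. The duration and silence limits come from this README; the per-frame RMS silence threshold (0.01) and 20 ms frame size are illustrative assumptions, and `analyzeRecording` is a hypothetical name.

```javascript
// Limits from the README: 0.5-120 s duration, less than 80% silence.
const MIN_DURATION_S = 0.5;
const MAX_DURATION_S = 120;
const MAX_SILENCE_RATIO = 0.8;

// silenceThreshold (RMS per frame) and frameMs are assumed defaults.
function analyzeRecording(samples, sampleRate, { silenceThreshold = 0.01, frameMs = 20 } = {}) {
  const durationS = samples.length / sampleRate;
  const frameSize = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  let peak = 0;
  let frames = 0;
  let silentFrames = 0;

  for (let start = 0; start < samples.length; start += frameSize) {
    const end = Math.min(start + frameSize, samples.length);
    let sumSquares = 0;
    for (let i = start; i < end; i++) {
      const v = samples[i];
      sumSquares += v * v;
      peak = Math.max(peak, Math.abs(v)); // peak amplitude, for feedback only
    }
    // A frame counts as silent when its RMS level is below the threshold.
    if (Math.sqrt(sumSquares / (end - start)) < silenceThreshold) silentFrames++;
    frames++;
  }

  const silenceRatio = frames === 0 ? 1 : silentFrames / frames;
  const problems = [];
  if (durationS < MIN_DURATION_S) problems.push('too_short');
  if (durationS > MAX_DURATION_S) problems.push('too_long');
  if (silenceRatio >= MAX_SILENCE_RATIO) problems.push('too_silent');
  return { durationS, silenceRatio, peak, ok: problems.length === 0, problems };
}
```

In the browser this could be called as `analyzeRecording(buffer.getChannelData(0), buffer.sampleRate)` after decoding the recording with `AudioContext.decodeAudioData`.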
Recordings are accepted for export when:
- At least 2 validations from different users
- Average score >= 4.0 (on 1-5 scale)
Flagged recordings (a low average score, or high variance between validators' scores) appear in the admin review section.
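The acceptance rule above can be expressed as a small function over a recording's validations. The 2-validation and 4.0-average thresholds come from this README; the flagging thresholds (average at most 2.5, or population standard deviation at least 1.5) are not specified anywhere and are assumptions, as is keeping only the last score per user to enforce "different users". `summarizeValidations` is a hypothetical name.

```javascript
// Acceptance thresholds from the README.
const MIN_VALIDATIONS = 2;
const MIN_AVG_SCORE = 4.0;
// Flagging thresholds are NOT documented; these values are assumptions.
const FLAG_MAX_AVG = 2.5;
const FLAG_MIN_STDDEV = 1.5;

// validations: [{ userId, score }], assumed to be in chronological order.
function summarizeValidations(validations) {
  // One score per user (the latest wins), so counts reflect distinct users.
  const byUser = new Map();
  for (const { userId, score } of validations) {
    if (!Number.isInteger(score) || score < 1 || score > 5) continue; // ignore invalid scores
    byUser.set(userId, score);
  }
  const scores = [...byUser.values()];
  const n = scores.length;
  const avg = n ? scores.reduce((sum, s) => sum + s, 0) / n : 0;
  // Population variance of the distinct users' scores.
  const variance = n ? scores.reduce((sum, s) => sum + (s - avg) ** 2, 0) / n : 0;
  const stddev = Math.sqrt(variance);

  const accepted = n >= MIN_VALIDATIONS && avg >= MIN_AVG_SCORE;
  const flagged = n >= MIN_VALIDATIONS && (avg <= FLAG_MAX_AVG || stddev >= FLAG_MIN_STDDEV);
  return { count: n, avg, stddev, accepted, flagged };
}
```

For example, scores of 5 and 4 from two users give an average of 4.5 and are accepted, while 5 and 1 average 3.0 with a standard deviation of 2 and would be flagged under these assumed thresholds.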
The application supports multiple languages:
- English (EN) - Default
- Finnish (FI)
- Swedish (SV)
Users can switch languages via the header dropdown. The selected language is persisted in localStorage.
To add a new language:
- Create a new translation file in client/src/i18n/ (e.g., de.js)
- Export the translations object with all required keys
- Add the language to the languages array in client/src/i18n/index.js
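One way to check that a new translation file contains all required keys is to compare its key paths against the English file. This sketch assumes the translation files export plain (possibly nested) objects of strings; `keyPaths` and `missingKeys` are hypothetical helpers, not part of the project.

```javascript
// Collect dotted key paths ("nav.home") from a possibly nested object.
// Arrays and primitive values are treated as leaves.
function keyPaths(obj, prefix = '') {
  return Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value && typeof value === 'object' && !Array.isArray(value)
      ? keyPaths(value, path)
      : [path];
  });
}

// Key paths present in the reference language but missing from the candidate.
function missingKeys(reference, candidate) {
  const present = new Set(keyPaths(candidate));
  return keyPaths(reference).filter((path) => !present.has(path));
}
```

Run from a small script that imports the English and new-language objects (however those files export them), an empty result from `missingKeys(en, de)` means no keys are missing; extra keys in the new file are not reported.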
The application supports light and dark modes:
- Users can toggle between themes via the header
- Theme preference is saved to localStorage
- System preference is detected on first visit
Theme variables are defined in client/src/index.css using CSS custom properties.
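The theme selection described above (a saved choice wins, otherwise the system preference decides) can be sketched as a pure function plus a little browser wiring. The localStorage key `theme` and the `data-theme` attribute on the root element are assumptions about how the app and client/src/index.css are wired; `localStorage`, `window.matchMedia`, and `dataset` are standard browser APIs.

```javascript
const THEME_STORAGE_KEY = 'theme'; // hypothetical key name

// A saved 'light' or 'dark' wins; anything else falls back to the OS setting.
function resolveTheme(saved, systemPrefersDark) {
  if (saved === 'light' || saved === 'dark') return saved;
  return systemPrefersDark ? 'dark' : 'light';
}

// Browser only: pick the initial theme on page load.
function applyInitialTheme() {
  const saved = localStorage.getItem(THEME_STORAGE_KEY);
  const prefersDark = window.matchMedia('(prefers-color-scheme: dark)').matches;
  const theme = resolveTheme(saved, prefersDark);
  document.documentElement.dataset.theme = theme; // assumed CSS hook
  return theme;
}

// Browser only: called when the user toggles the theme in the header.
function setTheme(theme) {
  localStorage.setItem(THEME_STORAGE_KEY, theme);
  document.documentElement.dataset.theme = theme;
}
```

Keeping `resolveTheme` free of browser globals makes the precedence rule easy to test outside the browser.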
Deployment (example on a DigitalOcean droplet):
- Create a droplet (Ubuntu 22.04)
- Install Node.js 18+ and PostgreSQL 14+
- Clone the repository and install dependencies
- Set up Nginx as a reverse proxy
- Configure SSL with Let's Encrypt
- Use PM2 for process management
Example Nginx config:
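server {
    listen 80;
    server_name yourdomain.com;

    # nginx rejects request bodies over 1 MB by default. A 120 s, 16 kHz,
    # 16-bit mono WAV is about 3.8 MB, so raise the limit (value is an example).
    client_max_body_size 10m;

    location / {
        # Forward all requests to the Node.js/Express server
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        # Pass through connection upgrades (e.g. WebSockets)
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }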
}

License: MIT