🎤 Note Taker

A user-friendly web application for converting speech to text with an English-only Distil-Whisper model. Built with Gradio and powered by Hugging Face Transformers.

✨ Features

Microphone Recording: Record audio directly in your browser
File Upload: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
Real-time Transcription: Fast and accurate speech recognition
Timestamp Support: Optional timestamps for each transcribed segment
Clean UI: Modern, responsive interface built with Gradio
Local Processing: Transcription runs on your machine and audio is not sent to external servers (the model itself is downloaded once on first run)

🚀 Quick Start

Prerequisites

Python 3.10 or higher
uv package manager (see the official uv installation guide)

Installation

Clone or create the project directory:

mkdir speech-to-text-app
cd speech-to-text-app

Install dependencies using uv:
```
uv sync
```
Run the application:
```
uv run python app.py
```
Open your browser: Navigate to http://localhost:7860 to use the application.

📖 Usage

Recording Audio

Click on the "🎙️ Record Audio" tab
Click the microphone button to start recording
Speak clearly into your microphone
Click stop when finished
Click "Transcribe Recording" to convert to text

Uploading Audio Files

Click on the "📁 Upload Audio File" tab
Drag and drop or browse for your audio file
Click "Transcribe File" to convert to text
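
The two tabs described above could be wired up roughly as follows. This is a minimal sketch, not the project's actual app.py: the function names `transcribe` and `build_ui` are hypothetical, the callback body is a placeholder, and the `sources=` argument assumes Gradio 4 or later.

```python
def transcribe(audio_path):
    """Placeholder callback: a real app would run the speech model here."""
    if not audio_path:
        return "No audio provided."
    return f"(transcription of {audio_path} would appear here)"


def build_ui():
    # Imported lazily so the callback above can be exercised without Gradio installed.
    import gradio as gr

    with gr.Blocks(title="Note Taker") as demo:
        with gr.Tab("🎙️ Record Audio"):
            mic = gr.Audio(sources=["microphone"], type="filepath")
            mic_btn = gr.Button("Transcribe Recording")
            mic_out = gr.Textbox(label="Transcription")
            mic_btn.click(fn=transcribe, inputs=mic, outputs=mic_out)
        with gr.Tab("📁 Upload Audio File"):
            upload = gr.Audio(sources=["upload"], type="filepath")
            upload_btn = gr.Button("Transcribe File")
            upload_out = gr.Textbox(label="Transcription")
            upload_btn.click(fn=transcribe, inputs=upload, outputs=upload_out)
    return demo
```

Calling `build_ui().launch(server_port=7860)` would then serve the interface on the port used in the Quick Start.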

Timestamps (Optional)

Use "Transcribe with Timestamps" buttons to get time-coded transcriptions
Useful for creating subtitles or precise audio analysis
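
For subtitles, time-coded segments can be converted to SRT text. The sketch below assumes segments shaped like the `chunks` list that the Transformers speech recognition pipeline returns with `return_timestamps=True` (a `text` plus a `(start, end)` tuple in seconds). The helper names `srt_time` and `chunks_to_srt` are hypothetical and not taken from app.py.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Build SRT text from [{"timestamp": (start, end), "text": ...}, ...]."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a segment may come back without an end time;
            # fall back to the start time so the entry stays well-formed.
            end = start
        text = chunk["text"].strip()
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Writing the returned string to a `.srt` file gives a subtitle track that most video players can load.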

🛠️ Technical Details

Dependencies

Gradio: Web interface framework
Transformers: Hugging Face model library
Librosa: Audio processing library
PyTorch: Deep learning framework
Soundfile: Audio file I/O

Model Information

Model: distil-whisper/distil-small.en
Language: English only
Sample Rate: 16 kHz (automatically resampled)
Chunk Processing: 30-second chunks for long audio
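
These settings map onto the Transformers `pipeline` helper roughly as shown below. This is a sketch under assumptions: `pipeline_kwargs`, `load_transcriber`, and the constant names are hypothetical, and app.py may configure the model differently.

```python
MODEL_ID = "distil-whisper/distil-small.en"  # English-only model named above
CHUNK_LENGTH_S = 30                          # long audio is processed in 30-second chunks


def pipeline_kwargs(model_id=MODEL_ID):
    """Keyword arguments for transformers.pipeline()."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        "chunk_length_s": CHUNK_LENGTH_S,
    }


def load_transcriber(model_id=MODEL_ID):
    # Imported lazily: the first call downloads the model weights if they are not cached.
    from transformers import pipeline

    return pipeline(**pipeline_kwargs(model_id))
```

Keeping the settings in one place makes it straightforward to try a different checkpoint or chunk length later.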

Audio Processing Pipeline

Load Audio: Using librosa with automatic format detection
Convert to Mono: Stereo audio is converted to mono
Resample: Audio is resampled to 16 kHz if needed
Transcribe: Processed through Whisper model
Format Output: Clean text output with optional timestamps
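
The steps above can be sketched as a single function. This is an illustration, not the code in app.py: `transcribe_file` and its `loader` parameter are hypothetical names, and `transcriber` stands for any callable that behaves like a Transformers speech recognition pipeline.

```python
TARGET_SR = 16_000  # sample rate expected by the model, in Hz


def transcribe_file(path, transcriber, loader=None):
    """Load audio as 16 kHz mono, run the model, and return clean text.

    `transcriber` accepts {"raw": samples, "sampling_rate": rate} and
    returns a dict with a "text" key.
    """
    if loader is None:
        # librosa.load decodes the file, downmixes to mono, and resamples in one call.
        import librosa

        loader = librosa.load
    samples, sample_rate = loader(path, sr=TARGET_SR, mono=True)
    result = transcriber({"raw": samples, "sampling_rate": sample_rate})
    return result["text"].strip()
```

Passing the loader in as a parameter is a design choice for the sketch: it lets the function be tried with a stand-in loader when librosa or an audio file is not at hand.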

🔧 Development

Project Structure

speech-to-text-app/
├── app.py              # Main application file
├── pyproject.toml      # Project dependencies and configuration
├── README.md           # This file
└── .python-version     # Python version specification (created by uv)

Adding Features

The modular design makes it easy to extend:

New Models: Replace the model in SpeechToTextApp.__init__()
Audio Formats: Librosa supports most common formats automatically
UI Customization: Modify the CSS and Gradio components
Processing Options: Add new transcription parameters
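
As an illustration of the first point, the model could be made a constructor argument. The class body below is hypothetical: only the name `SpeechToTextApp` comes from this project, while `model_name`, `pipeline_factory`, and `transcribe` are assumptions made for the sketch.

```python
class SpeechToTextApp:
    """Hypothetical layout; the real class in app.py may differ."""

    def __init__(self, model_name="distil-whisper/distil-small.en", pipeline_factory=None):
        self.model_name = model_name
        if pipeline_factory is None:
            # Imported lazily so a stand-in factory can be used without Transformers.
            from transformers import pipeline as pipeline_factory
        self.transcriber = pipeline_factory(
            "automatic-speech-recognition", model=model_name
        )

    def transcribe(self, audio):
        """Return the stripped transcription text for a file path or audio dict."""
        return self.transcriber(audio)["text"].strip()
```

With this shape, trying another checkpoint is a one-argument change, for example `SpeechToTextApp(model_name="openai/whisper-small")`.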

Development Setup

Install development dependencies:
```
uv sync --extra dev
```
Run with auto-reload:
```
uv run gradio app.py
```

🔍 Troubleshooting

Common Issues

Model Download: First run may take time to download the model
Memory Usage: Large audio files may require more RAM
Browser Permissions: Ensure microphone access is granted
Audio Format: If upload fails, try converting to WAV or MP3

Performance Tips

Shorter Clips: Keep clips under 5 minutes for best performance
Clear Audio: Minimal background noise improves accuracy
Good Microphone: Higher-quality input gives better transcription
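
To get a feel for how clip length translates into work, the arithmetic below estimates the size of the decoded waveform and the number of 30-second chunks. It is a rough sketch: the function names are hypothetical, 32-bit float samples are an assumption, and overlap between chunks is ignored.

```python
import math

SAMPLE_RATE = 16_000   # Hz, after resampling
BYTES_PER_SAMPLE = 4   # assumption: 32-bit float samples
CHUNK_SECONDS = 30     # chunk length used for long audio


def decoded_size_mb(duration_s):
    """Approximate size of the decoded mono waveform in megabytes (10**6 bytes)."""
    return duration_s * SAMPLE_RATE * BYTES_PER_SAMPLE / 1_000_000


def chunk_count(duration_s):
    """Number of 30-second chunks, ignoring any overlap between chunks."""
    return math.ceil(duration_s / CHUNK_SECONDS)
```

For a 5-minute clip this gives about 19.2 MB of samples and 10 chunks. The model weights and intermediate activations come on top of that, so the waveform is only part of the memory picture.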

📝 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

🙏 Acknowledgments

Hugging Face for the Transformers library and models
Gradio for the excellent web UI framework
OpenAI Whisper for the base model architecture

Made with ❤️ using Gradio and Transformers
