A self-hosted, browser-based utility for file conversion, OCR and audio/video transcription. It wraps common CLI and Python converters (FFmpeg, LibreOffice, Pandoc, ImageMagick, etc.), plus faster-whisper and Tesseract OCR.
- Convert between many file formats
- OCR for PDFs and images (Tesseract / ocrmypdf)
- Audio and video transcription using Whisper (MP4, MKV, AVI, MOV, etc.)
- Speaker diarization: automatically identify different speakers (requires pyannote.audio)
- torchcodec for enhanced audio decoding (requires FFmpeg DLLs on Windows)
- Simple, responsive dark UI with drag-and-drop
- Background job processing with real-time status updates
- `/settings` page for configuring tools and OAuth
- CPU-only by default; GPU acceleration available
```powershell
# Clone this repository
git clone https://github.com/akron2/filewizard-win.git
cd filewizard-win

# Python 3.10-3.12 recommended (3.13+ may have compatibility issues with some packages)
python --version

# Create and activate virtual environment
python -m venv venv
# Allow script execution (required once)
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
.\venv\Scripts\Activate.ps1

# Install dependencies (on Windows, pip must be upgraded via "python -m pip")
python -m pip install --upgrade pip
pip install -r requirements_windows.txt

# Run the application
.\run.bat
```

Open http://localhost:8000 in your browser.
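Because package compatibility depends on the interpreter version, a quick pre-flight check before installing dependencies can save a failed `pip install`. This is a minimal sketch, not part of the repository: the helper name `is_recommended_python` is hypothetical, and the 3.10-3.12 bounds simply mirror the recommendation in the install steps.

```python
import sys

# Bounds mirror the README's recommendation (hypothetical helper, not part of FileWizard).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_recommended_python(version_info=sys.version_info) -> bool:
    """Return True if major.minor falls within the recommended range (inclusive)."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_recommended_python():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current}: outside the recommended 3.10-3.12 range; "
              "some dependencies may fail to install")
```

Run it with the interpreter you plan to use for the virtual environment, e.g. `python check_python.py` (file name is arbitrary).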
For full functionality, install these tools:
```powershell
# Install Chocolatey (run PowerShell as Administrator)
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install tools
choco install ffmpeg
choco install tesseract
choco install libreoffice
choco install pandoc
choco install poppler
choco install pkgconfiglite  # for html5_parser
```

Alternatively, download them manually:
- Tesseract OCR: https://github.com/UB-Mannheim/tesseract/wiki
- FFmpeg: https://ffmpeg.org/download.html
- LibreOffice: https://www.libreoffice.org/download/
- Pandoc: https://pandoc.org/installing.html
- Poppler: https://github.com/oschwartz10612/poppler-windows/releases
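After installing, a short script can confirm which tools are reachable on PATH. This is a sketch, not part of FileWizard: the executable names (`soffice` for LibreOffice, `pdftoppm` for Poppler) are assumptions about typical installs, and the app itself may locate tools differently, for example through the settings page.

```python
import shutil

# Assumed executable names per tool; adjust to match your installation.
REQUIRED_TOOLS = {
    "FFmpeg": "ffmpeg",
    "Tesseract OCR": "tesseract",
    "LibreOffice": "soffice",
    "Pandoc": "pandoc",
    "Poppler": "pdftoppm",
}


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved executable path, or None if not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:15} {path or 'NOT FOUND on PATH'}")
```

A `NOT FOUND` entry usually means the installer did not add the tool's folder to PATH; open a new terminal after installing so PATH changes take effect.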
Speaker diarization automatically identifies different speakers in conversations.
When you first use diarization:
- The app will automatically open Hugging Face pages in your browser
- Log in (or create an account)
- Click "Accept" on model pages:
- Return to terminal and press Enter
- Models will download automatically (~500MB)
- Enable "Identify Speakers (Diarization)" checkbox when transcribing
- Output format:

  ```
  [SPEAKER_00]: Hello, how are you?
  [SPEAKER_01]: I'm fine, thank you!
  ```
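To illustrate the output format, here is a minimal sketch that renders `(speaker_label, text)` pairs as `[SPEAKER]: text` lines. It assumes speaker labels have already been assigned to each transcribed segment; how FileWizard aligns pyannote.audio speaker turns with Whisper segments is not described here, and merging consecutive segments from the same speaker is a choice made for this hypothetical `format_diarized` helper.

```python
from typing import Iterable, Tuple


def format_diarized(segments: Iterable[Tuple[str, str]]) -> str:
    """Render (speaker_label, text) pairs as '[SPEAKER]: text' lines.

    Consecutive segments from the same speaker are merged onto one line;
    segments with empty text are skipped.
    """
    lines = []
    last_speaker = None
    for speaker, text in segments:
        text = text.strip()
        if not text:
            continue
        if speaker == last_speaker:
            lines[-1] += " " + text
        else:
            lines.append(f"[{speaker}]: {text}")
            last_speaker = speaker
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_diarized([
        ("SPEAKER_00", "Hello,"),
        ("SPEAKER_00", "how are you?"),
        ("SPEAKER_01", "I'm fine, thank you!"),
    ]))
```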
- Open http://localhost:8000
- Drag & drop or select files
- Choose action: Convert, OCR, or Transcribe
- Track progress in History table
| Tool | Input Formats | Output Formats | Notes |
|---|---|---|---|
| LibreOffice | .doc, .docx, .xls, .xlsx, .ppt, .pptx, .odt, .ods, .pdf, .rtf, .txt, .html, .csv | .pdf, .docx, .xlsx, .pptx, .odt, .html, .txt, .png, .jpg | Office document conversion |
| Pandoc | .md, .html, .tex, .docx, .odt, .epub, .rst | .pdf, .docx, .html, .epub, .md, .tex, .pptx | Document conversion, requires LaTeX for PDF |
| Ghostscript | .pdf, .ps, .eps | .pdf, .png, .jpg, .tiff | PDF manipulation, rasterization |
| Calibre | .epub, .mobi, .azw3, .fb2, .docx, .pdf, .html | .epub, .mobi, .azw3, .pdf, .docx, .txt | E-book format conversion |
| FFmpeg | .mp4, .mkv, .avi, .mov, .webm, .mp3, .wav, .flac, .aac | .mp4, .mkv, .avi, .mp3, .wav, .flac, .gif | Audio/video transcoding |
| libvips | .jpg, .png, .tiff, .webp, .avif, .heif | .jpg, .png, .webp, .avif, .tiff | Fast image processing |
| GraphicsMagick | .jpg, .png, .gif, .tiff, .bmp, .pdf | .jpg, .png, .gif, .tiff, .bmp, .pdf | Image processing |
| ImageMagick | .jpg, .png, .gif, .tiff, .bmp, .svg | .jpg, .png, .gif, .tiff, .bmp, .svg | Image processing |
| Inkscape | .svg, .pdf, .eps, .ai, .png | .svg, .pdf, .png, .eps | Vector graphics |
| Tesseract OCR | .png, .jpg, .tiff, .pdf (images) | .txt, .pdf (searchable) | Text recognition |
| faster-whisper | .mp3, .wav, .m4a, .flac, .ogg, .mp4, .mkv, .avi | .txt, .srt, .vtt | Audio/video transcription |
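The `.srt` output listed for faster-whisper is built from timed segments. As a sketch of that subtitle format (not FileWizard's actual writer), the code below turns segments with `start`/`end` in seconds and `text` into numbered SRT cues with `HH:MM:SS,mmm` timestamps. The `Segment` namedtuple is a stand-in assumption for faster-whisper's segment objects, which expose the same three attributes.

```python
from collections import namedtuple

# Stand-in for faster-whisper segments (.start/.end in seconds, .text).
Segment = namedtuple("Segment", ["start", "end", "text"])


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(segments) -> str:
    """Build SRT text: numbered cues separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Segment(0.0, 2.5, " Hello"), Segment(2.5, 4.0, "World ")]))
```

With faster-whisper itself, `segments, info = model.transcribe("audio.mp3")` yields objects carrying these attributes, so they could be passed to `to_srt` directly.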
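- PowerShell refuses to run `Activate.ps1` (script execution disabled): run `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser`
- Tesseract not found: `choco install tesseract`
- FFmpeg not found: `choco install ffmpeg`
- Port 8000 already in use: close the previous instance or change the port in `run.bat`.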
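To check whether port 8000 is already taken before launching, a small TCP probe is enough. This is a sketch under the assumption that the app listens on `127.0.0.1:8000` as in the default URL; the `port_in_use` helper is hypothetical and does not read `run.bat`.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)  # avoid hanging on unreachable hosts
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 8000 in use:", port_in_use(8000))
```

If it reports `True` while no FileWizard window is open, another program owns the port and changing the port in `run.bat` is the simpler fix.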
If speaker diarization fails:
- Ensure pyannote.audio is installed:

  ```powershell
  pip install pyannote.audio pyannote.pipeline
  ```

- Accept the model terms on Hugging Face (see the speaker diarization steps above)
If you see errors like `OSError: Could not load this library: ...torchcodec\libtorchcodec_core*.dll`:
- Ensure torchcodec is installed:

  ```powershell
  pip install torchcodec
  ```

- If the error persists, note that torchcodec requires the FFmpeg DLLs to be on PATH. Add your FFmpeg `bin` directory to PATH for the current session (Command Prompt syntax shown; in PowerShell use `$env:Path += ";C:\path\to\ffmpeg\bin"`):

  ```bat
  set PATH=%PATH%;C:\path\to\ffmpeg\bin
  ```

- Restart the application
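To confirm the PATH change took effect, the sketch below checks whether a given directory is listed in PATH and searches PATH for FFmpeg codec DLLs. It assumes a shared FFmpeg build whose DLLs follow the usual `avcodec-<version>.dll` naming; all helper names are hypothetical, and finding no DLLs suggests a static build that ships only `ffmpeg.exe`.

```python
import os
from pathlib import Path


def path_entries(path_value=None):
    """Split a PATH string into normalized entries (case-folded on Windows)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [os.path.normcase(os.path.normpath(p))
            for p in path_value.split(os.pathsep) if p]


def dir_on_path(directory, path_value=None) -> bool:
    """True if `directory` appears in PATH after normalization."""
    target = os.path.normcase(os.path.normpath(directory))
    return target in path_entries(path_value)


def find_ffmpeg_dlls(path_value=None):
    """List avcodec-*.dll files found in PATH directories (assumed DLL naming)."""
    hits = []
    for entry in path_entries(path_value):
        folder = Path(entry)
        if folder.is_dir():
            hits.extend(str(p) for p in folder.glob("avcodec-*.dll"))
    return hits


if __name__ == "__main__":
    print("FFmpeg DLLs on PATH:", find_ffmpeg_dlls() or "none found")
```

Run it from the same terminal session in which PATH was modified, since the change does not persist to other windows.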
Warning: Exposing this app publicly without authentication risks arbitrary code execution. Intended for local use or behind OAuth/OIDC.
- Original Repository: https://github.com/LoredCast/filewizard
- This Repository: https://github.com/akron2/filewizard-win
- Issues: https://github.com/akron2/filewizard-win/issues
