File Wizard - Windows Edition

A self-hosted, browser-based utility for file conversion, OCR and audio/video transcription. It wraps common CLI and Python converters (FFmpeg, LibreOffice, Pandoc, ImageMagick, etc.), plus faster-whisper and Tesseract OCR.

Features

Convert between many file formats
OCR for PDFs and images (Tesseract / ocrmypdf)
Audio & Video transcription using Whisper (MP4, MKV, AVI, MOV, etc.)
Speaker diarization - automatically identify different speakers (requires pyannote.audio)
torchcodec for enhanced audio decoding (requires FFmpeg DLLs on Windows)
Simple, responsive dark UI with drag-and-drop
Background job processing with real-time status updates
/settings page for configuring tools and OAuth
CPU-only by default; GPU acceleration available

Installation

Quick Start — Windows

# Clone this repository
git clone https://github.com/akron2/filewizard-win.git
cd filewizard-win

# Python 3.10-3.12 recommended (3.13+ may have compatibility issues with some packages)
python --version

# Create and activate virtual environment
python -m venv venv

# Allow script execution (required once)
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

.\venv\Scripts\Activate.ps1

# Install dependencies
pip install --upgrade pip
pip install -r requirements_windows.txt

# Run the application
.\run.bat

Open http://localhost:8000 in your browser.
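If installation fails, it can help to confirm that the interpreter inside the virtual environment falls in the recommended 3.10-3.12 range. The sketch below is not part of the project; the function name is made up for illustration, and the range is copied from the Quick Start comment.

```python
import sys

# Range taken from the Quick Start comment: 3.10 through 3.12 recommended.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_recommended_python(version=None):
    """Return True if (major, minor) falls inside the recommended range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


if __name__ == "__main__":
    current = "%d.%d" % sys.version_info[:2]
    if is_recommended_python():
        print(f"Python {current} is within the recommended range.")
    else:
        print(f"Python {current} is outside 3.10-3.12; some packages may not install.")
```

Run it with the virtual environment activated so that it reports the interpreter the app will actually use.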

External Tools

For full functionality, install these tools:

Via Chocolatey (recommended)

# Install Chocolatey (run PowerShell as Administrator)
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install tools
choco install ffmpeg
choco install tesseract
choco install libreoffice-fresh
choco install pandoc
choco install poppler
choco install pkgconfiglite  # for html5_parser

Manual Installation

Tesseract OCR: https://github.com/UB-Mannheim/tesseract/wiki
FFmpeg: https://ffmpeg.org/download.html
LibreOffice: https://www.libreoffice.org/download/
Pandoc: https://pandoc.org/installing.html
Poppler: https://github.com/oschwartz10612/poppler-windows/releases
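After installing, a quick way to see which tools are reachable from the current shell is to look them up on PATH. This is a minimal sketch, not code from this repository. The executable names are assumptions (soffice for LibreOffice, pdftoppm as a representative Poppler binary), and some installers, LibreOffice in particular, may not add their directory to PATH.

```python
import shutil

# Executable names are assumptions; adjust them to match your installs.
TOOLS = {
    "FFmpeg": "ffmpeg",
    "Tesseract OCR": "tesseract",
    "LibreOffice": "soffice",
    "Pandoc": "pandoc",
    "Poppler": "pdftoppm",
}


def find_tools(tools=TOOLS, which=shutil.which):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:<14} {path or 'NOT FOUND on PATH'}")
```

A tool reported as missing here may still work if its location is configured elsewhere, for example on the /settings page.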

Speaker Diarization

Speaker diarization automatically identifies different speakers in conversations.

First-time Setup

When you first use diarization:

The app automatically opens the Hugging Face model pages in your browser
Log in (or create an account)
Click "Accept" on each model page:
- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0
Return to the terminal and press Enter
The models then download automatically (~500MB)

Usage

Enable the "Identify Speakers (Diarization)" checkbox when transcribing.

Output format:

[SPEAKER_00]:
Hello, how are you?

[SPEAKER_01]:
I'm fine, thank you!
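To make the layout concrete, here is a small sketch of how labelled segments could be rendered in this format. It is illustrative only and not taken from diarization.py. The input shape (a list of (speaker, text) pairs) and the merging of consecutive segments from the same speaker are both assumptions.

```python
def format_transcript(segments):
    """Render (speaker, text) pairs as labelled blocks separated by blank lines.

    Consecutive segments from the same speaker are merged into one block
    (an assumption made for this sketch).
    """
    blocks = []
    for speaker, text in segments:
        if blocks and blocks[-1][0] == speaker:
            blocks[-1][1].append(text)
        else:
            blocks.append((speaker, [text]))
    return "\n\n".join(
        f"[{speaker}]:\n{' '.join(lines)}" for speaker, lines in blocks
    )


if __name__ == "__main__":
    demo = [
        ("SPEAKER_00", "Hello,"),
        ("SPEAKER_00", "how are you?"),
        ("SPEAKER_01", "I'm fine, thank you!"),
    ]
    print(format_transcript(demo))
```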

Usage

Open http://localhost:8000
Drag & drop or select files
Choose action: Convert, OCR, or Transcribe
Track progress in History table

Tools Table

Tool	Input Formats	Output Formats	Notes
LibreOffice	`.doc`, `.docx`, `.xls`, `.xlsx`, `.ppt`, `.pptx`, `.odt`, `.ods`, `.pdf`, `.rtf`, `.txt`, `.html`, `.csv`	`.pdf`, `.docx`, `.xlsx`, `.pptx`, `.odt`, `.html`, `.txt`, `.png`, `.jpg`	Office document conversion
Pandoc	`.md`, `.html`, `.tex`, `.docx`, `.odt`, `.epub`, `.rst`	`.pdf`, `.docx`, `.html`, `.epub`, `.md`, `.tex`, `.pptx`	Document conversion, requires LaTeX for PDF
Ghostscript	`.pdf`, `.ps`, `.eps`	`.pdf`, `.png`, `.jpg`, `.tiff`	PDF manipulation, rasterization
Calibre	`.epub`, `.mobi`, `.azw3`, `.fb2`, `.docx`, `.pdf`, `.html`	`.epub`, `.mobi`, `.azw3`, `.pdf`, `.docx`, `.txt`	E-book format conversion
FFmpeg	`.mp4`, `.mkv`, `.avi`, `.mov`, `.webm`, `.mp3`, `.wav`, `.flac`, `.aac`	`.mp4`, `.mkv`, `.avi`, `.mp3`, `.wav`, `.flac`, `.gif`	Audio/video transcoding
libvips	`.jpg`, `.png`, `.tiff`, `.webp`, `.avif`, `.heif`	`.jpg`, `.png`, `.webp`, `.avif`, `.tiff`	Fast image processing
GraphicsMagick	`.jpg`, `.png`, `.gif`, `.tiff`, `.bmp`, `.pdf`	`.jpg`, `.png`, `.gif`, `.tiff`, `.bmp`, `.pdf`	Image processing
ImageMagick	`.jpg`, `.png`, `.gif`, `.tiff`, `.bmp`, `.svg`	`.jpg`, `.png`, `.gif`, `.tiff`, `.bmp`, `.svg`	Image processing
Inkscape	`.svg`, `.pdf`, `.eps`, `.ai`, `.png`	`.svg`, `.pdf`, `.png`, `.eps`	Vector graphics
Tesseract OCR	`.png`, `.jpg`, `.tiff`, `.pdf` (images)	`.txt`, `.pdf` (searchable)	Text recognition
faster-whisper	`.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, `.mp4`, `.mkv`, `.avi`	`.txt`, `.srt`, `.vtt`	Audio/video transcription

Troubleshooting

PowerShell script execution blocked

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Error: "TesseractNotFoundError"

choco install tesseract

Error: "ffmpeg not found"

choco install ffmpeg

Port 8000 already in use

Close the previous instance, or change the port in run.bat.
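To check whether something is already listening on the port before starting the app, a short probe like the following can help. It is a sketch using only the standard library. The function name is hypothetical, and it only tests the loopback address.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on refusal.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8000):
        print("Port 8000 is busy: close the other instance or pick another port.")
    else:
        print("Port 8000 is free.")
```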

Diarization not working

Ensure pyannote.audio is installed: pip install pyannote.audio pyannote.pipeline
Accept model terms on Hugging Face (see Speaker Diarization section)

Audio decoding fails with torchcodec error

If you see errors like OSError: Could not load this library: ...torchcodec\libtorchcodec_core*.dll:

Ensure torchcodec is installed: pip install torchcodec
If the error persists, torchcodec requires FFmpeg DLLs to be in PATH
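Add your FFmpeg bin directory to PATH. In PowerShell: $env:PATH += ";C:\path\to\ffmpeg\bin" (in cmd.exe: set PATH=%PATH%;C:\path\to\ffmpeg\bin). Both forms affect only the current terminal session.
Restart the application from that same terminal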
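To confirm that a directory containing FFmpeg DLLs is actually on PATH, you can scan each PATH entry for one of them. The sketch below is an assumption-laden diagnostic, not part of setup_torchcodec.py. It assumes a shared FFmpeg build that ships a file matching avcodec-*.dll, and the function name is made up for illustration.

```python
import glob
import os


def ffmpeg_dll_dirs(path_value=None):
    """Return PATH directories that contain a file matching avcodec-*.dll."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        # glob.escape guards against directories with [, ], * or ? in the name.
        pattern = os.path.join(glob.escape(directory), "avcodec-*.dll")
        if glob.glob(pattern):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = ffmpeg_dll_dirs()
    if found:
        print("FFmpeg DLLs found in:", *found, sep="\n  ")
    else:
        print("No avcodec-*.dll found on PATH; a static FFmpeg build will not help here.")
```

An empty result usually means either that the bin directory is not on PATH, or that the installed FFmpeg is a static build with no separate DLLs.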

Security

Warning: Exposing this app publicly without authentication risks arbitrary code execution. Intended for local use or behind OAuth/OIDC.

Additional Information

Original Repository: https://github.com/LoredCast/filewizard
This Repository: https://github.com/akron2/filewizard-win
Issues: https://github.com/akron2/filewizard-win/issues
