Stars
FFmpeg libav tutorial - learn how media works from basic to transmuxing, transcoding and more. Translations: 🇺🇸 🇨🇳 🇰🇷 🇪🇸 🇻🇳 🇧🇷 🇷🇺
🍰 Desktop utility to download images/videos/music/text from various websites, and more.
A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
An interactive Japanese text analysis and speech synthesis web app
Lightweight, cross-platform remote desktop software with support for Web Client access.
Send files and folders anywhere in the world without storing in cloud - any size, any format, no accounts, no restrictions.
Expose the contents of .docx files without leaving your terminal. Fast, safe, and smart — no Office required!
Portable file server with accelerated resumable uploads, dedup, WebDAV, SFTP, FTP, TFTP, zeroconf, media indexer, thumbnails++ all in one file
tfw when you when your lid when uhh angle your lid sensor
Porting Tsukihime to the web to make it accessible on different devices with QoL improvements
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
Real-time webcam demo with SmolVLM and llama.cpp server
Apple Music decryption tool, inspired by zhaarey/apple-music-alac-atmos-downloader
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
A lightweight LMM-based Document Parsing Model
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
A TTS model capable of generating ultra-realistic dialogue in one pass.
Let's make video diffusion practical!
Automate your mobile devices with natural language commands - an LLM agnostic mobile Agent 🤖
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…
[IJCV 2025] Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait

