VisionScribe is a web application that uses the BLIP (Bootstrapping Language-Image Pre-training) model to generate detailed captions and narratives from images. Users upload an image, and the application returns a caption describing its contents, enriched with contextual detail.
- Image Upload: Users can upload PNG, JPG, or GIF images.
- Automatic Caption Generation: The application generates a basic caption and then enhances it with a detailed narrative using the BLIP model.
- Feedback System: Users can provide feedback by liking or disliking the generated caption and submitting additional comments.
- User Interface: A simple and responsive frontend built with Tailwind CSS and Alpine.js.
- Backend: Python, Flask
- AI Model: BLIP (Salesforce/blip-image-captioning-large)
- Database: SQLite
- Frontend: HTML, Tailwind CSS, Alpine.js
- Image Processing: Pillow (the maintained fork of PIL, the Python Imaging Library)
- Deep Learning Framework: PyTorch
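The captioning step can be sketched with the Hugging Face `transformers` API. This is a minimal illustration, not the exact code in `app.py`; the first call downloads the model checkpoint (several GB), and a real app would load the model once at startup rather than per call.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration


def generate_caption(image_path: str) -> str:
    """Generate a caption for an image with BLIP.

    Loads the model on each call for simplicity; load it once at
    startup in a real application.
    """
    model_name = "Salesforce/blip-image-captioning-large"
    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_caption("example.jpg"))  # path is a placeholder
```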
- Python 3.8+
- pip (Python package manager)
- torch
- transformers
- Flask
- Pillow
- SQLite (bundled with Python's standard library via the sqlite3 module)
```shell
git clone https://github.com/wprashed/visionscribe
cd visionscribe
pip install -r requirements.txt
```

The app uses SQLite to store user feedback. When the app starts, it automatically creates the feedback.db file.

Run the following command to start the Flask server:

```shell
python app.py
```

The server will start, and you can access the web app at http://127.0.0.1:5000.
- Navigate to the web interface at http://127.0.0.1:5000.
- Click on the "Click to upload" area or drag and drop an image.
- The app will process the image and generate a detailed caption.
- You can then provide feedback on the caption by either liking or disliking it and submitting additional comments.
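The upload step can also be scripted outside the browser. Below is a minimal standard-library sketch that POSTs an image as multipart form data; the `/caption` route is an assumption (check `app.py` for the actual route), while the `file` field name comes from the API description below.

```python
import json
import mimetypes
import urllib.request
import uuid


def build_multipart(field_name, filename, data):
    """Build a multipart/form-data body for a single file upload.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def request_caption(image_path, url="http://127.0.0.1:5000/caption"):
    """POST an image to the captioning endpoint and return the caption.

    The /caption route is an assumption; check app.py for the real one.
    """
    with open(image_path, "rb") as f:
        body, content_type = build_multipart("file", image_path, f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["caption"]


if __name__ == "__main__":
    print(request_caption("example.jpg"))  # requires the server to be running
```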
After the caption is generated, you will have the option to:
- Like/Dislike: Provide feedback on the generated caption.
- Copy: Copy the caption to your clipboard.
- Submit Feedback: If you liked or disliked the caption, you can provide further comments which will be saved to the database.
The application uses SQLite to store user feedback, which includes:
- id: A unique identifier for each feedback entry.
- caption: The generated caption.
- liked: Whether the user liked the caption (1 for liked, 0 for disliked).
- comment: The feedback comment from the user.
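A schema matching the fields above can be created with the standard-library sqlite3 module. This is an illustrative sketch only; the table and column names are assumptions, and the app creates its own schema automatically on startup.

```python
import sqlite3


def init_db(path="feedback.db"):
    """Create the feedback table if it does not exist and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            caption TEXT NOT NULL,
            liked INTEGER NOT NULL CHECK (liked IN (0, 1)),
            comment TEXT
        )
        """
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = init_db(":memory:")  # in-memory DB for demonstration
    conn.execute(
        "INSERT INTO feedback (caption, liked, comment) VALUES (?, ?, ?)",
        ("a dog on a beach", 1, "Accurate caption"),
    )
    print(conn.execute("SELECT caption, liked, comment FROM feedback").fetchone())
```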
Uploads an image and returns the generated caption.
Request Body (Form Data):
- file: The image file to be uploaded.
Response:

```json
{
  "caption": "Generated caption text here"
}
```

Submits user feedback on the caption.
Request Body (JSON):

```json
{
  "caption": "Generated caption text here",
  "liked": 1,
  "comment": "User's comment here"
}
```

Response:

```json
{
  "success": true
}
```

Feel free to fork the project, submit issues, and create pull requests. Contributions are welcome!
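The feedback endpoint described above can likewise be exercised from a script. A minimal standard-library sketch follows; the `/feedback` route name is an assumption (check `app.py` for the actual route), and the payload fields match the request body documented above.

```python
import json
import urllib.request


def submit_feedback(caption, liked, comment, url="http://127.0.0.1:5000/feedback"):
    """POST feedback JSON to the running app and return the parsed response.

    The /feedback route is an assumption; check app.py for the real one.
    """
    payload = json.dumps(
        {"caption": caption, "liked": liked, "comment": comment}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Requires the Flask server to be running locally.
    print(submit_feedback("a dog on a beach", 1, "Accurate caption"))
```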
This project is licensed under the MIT License - see the LICENSE file for details.
- The BLIP model is provided by Salesforce for image captioning and has been integrated into this project.
- Flask is used to build the backend API.
- Tailwind CSS and Alpine.js are used to build the responsive and interactive frontend.