The Mailgun Talon library (https://github.com/mailgun/talon) for extracting message quotations and signatures, hosted as a web API in a Docker container.
If you have ever tried to parse message quotations or signatures, you know that the absence of any formatting standards in this area can make the task a nightmare. Hopefully this library will make your life much easier. The name of the project is inspired by TALON, a multipurpose robot designed to perform missions ranging from reconnaissance to combat and to operate in a number of hostile environments. That’s what a good quotation and signature parser should be like 😄
Talon can be used as a web service. It can be started from the prebuilt Docker image or with the run script:
docker run -p 5505:5505 ghcr.io/bobiene/talon-web:latest
./run-web.sh
Or build and run the image locally via Docker:
./build-dock.sh
./run-dock.sh
Endpoints:
- /health (Health check)
- /talon/signature
- /talon/quotations/text
- /talon/quotations/html
- /talon/html-to-markdown
- /talon/html-to-markdown-direct
Health check endpoint for monitoring and load balancers.
{
"status": "healthy",
"service": "talon-web-api",
"version": "1.6.0",
"endpoints": [
"/talon/signature",
"/talon/quotations/text",
"/talon/quotations/html",
"/talon/html-to-markdown",
"/talon/html-to-markdown-direct"
]
}

Endpoint /talon/signature:

| POST parameter | Requirement | Comment |
|---|---|---|
| email_content | required | plain text of the e-mail body |
| email_sender | required | e-mail address of the sender |

Response:
{
"email_content": "<<content-of-post-parameter email_content>>",
"email_sender": "<<content-of-post-parameter email_sender>>",
"email_body": "<<stripped-e-mail-text (without signature)>>",
"email_signature": "<<signature, if found>>|None"
}

Endpoint /talon/quotations/text:

| POST parameter | Requirement | Comment |
|---|---|---|
| email_content | required | plain text of the e-mail body |
| email_sender | optional | e-mail address of the sender; if provided, the signature (if found) is stripped in addition to the quotation |
Response without email_sender:
{
"email_content": "<<content-of-post-parameter email_content>>",
"email_reply": "<<stripped-e-mail-text>>"
}

Response with email_sender:
{
"email_content": "<<content-of-post-parameter email_content>>",
"email_sender": "<<content-of-post-parameter email_sender>>",
"email_reply": "<<stripped-e-mail-text (without signature)>>",
"email_signature": "<<signature, if found>>|None"
}

Endpoint /talon/quotations/html:

| POST parameter | Requirement | Comment |
|---|---|---|
| email_content | required | HTML of the e-mail body |
| email_sender | optional | e-mail address of the sender; if provided, the signature (if found) is stripped in addition to the quotation |
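
The two quotation endpoints take the same parameters, so a single client helper can cover both. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the base URL assumes the service runs locally on port 5505 as started above, and it assumes the service accepts a URL-encoded form body in the same way as multipart form-data.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:5505"  # assumption: service started locally as shown above


def build_quotation_request(email_content, email_sender=None, html=False):
    """Build a POST request for /talon/quotations/text or /talon/quotations/html."""
    fields = {"email_content": email_content}
    if email_sender is not None:
        # Optional: with a sender, the service also strips the signature.
        fields["email_sender"] = email_sender
    path = "/talon/quotations/html" if html else "/talon/quotations/text"
    # Assumption: a URL-encoded form body is accepted like multipart form-data.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_reply(email_content, email_sender=None, html=False):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_quotation_request(email_content, email_sender, html)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the service running, `extract_reply(text)["email_reply"]` would hold the stripped reply, and passing `email_sender` should additionally yield an `email_signature` key.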
Response without email_sender:
{
"email_content": "<<content-of-post-parameter email_content>>",
"email_reply": "<<stripped-e-mail-text>>"
}

Response with email_sender:
{
"email_content": "<<content-of-post-parameter email_content>>",
"email_sender": "<<content-of-post-parameter email_sender>>",
"email_reply": "<<stripped-e-mail-text (without signature)>>",
"email_signature": "<<signature, if found>>|None"
}

The /talon/signature endpoint can be invoked with a GET or POST request. curl sample:
curl --location --request GET 'http://127.0.0.1:5505/talon/signature' \
--form 'email_content="Hi,
This is just a test.
Thanks,
John Doe
mobile: 052543453
email: john.doe@anywebsite.ph
website: www.anywebsite.ph"' \
--form 'email_sender="John Doe . . <john.doe@anywebsite.ph>"'
Pass the parameters as a body of type form-data.
The keys are email_content and email_sender.
The response includes email_signature. Sample response:
{
"email_content": "Hi,\n\nThis is just a test.\n\nThanks,\nJohn Doe\nmobile: 052543453\nemail: john.doe@anywebsite.ph\nwebsite: www.anywebsite.ph",
"email_sender": "John Doe . . <john.doe@anywebsite.ph>",
"email_signature": "Thanks,\nJohn Doe\nmobile: 052543453\nemail: john.doe@anywebsite.ph\nwebsite: www.anywebsite.ph"
}
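
The same call can be sketched in Python. The example below is a rough equivalent of the curl sample, not part of the service itself: it builds a multipart form-data body by hand (what `curl --form` sends) and posts it to /talon/signature. The helper names are hypothetical, and POST is used instead of GET since the endpoint is described as accepting both.

```python
import json
import urllib.request
import uuid


def encode_multipart(fields):
    """Encode text fields as a multipart/form-data body, as curl --form does."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_signature_request(email_content, email_sender,
                            base_url="http://127.0.0.1:5505"):
    """Build the POST request; base_url assumes a locally running service."""
    body, content_type = encode_multipart(
        {"email_content": email_content, "email_sender": email_sender}
    )
    return urllib.request.Request(
        base_url + "/talon/signature",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def extract_signature(email_content, email_sender):
    """Return the email_signature field of the JSON response (may be None)."""
    request = build_signature_request(email_content, email_sender)
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("email_signature")
```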
The library is inspired by the following research papers and projects:
- http://www.cs.cmu.edu/~vitor/papers/sigFilePaper_finalversion.pdf
- http://www.cs.cornell.edu/people/tj/publications/joachims_01a.pdf
The /talon/html-to-markdown endpoint converts HTML to Markdown using Talon's signature and quotation detection combined with html2text.
| POST parameter | Requirement | Comment |
|---|---|---|
| html | required | HTML content to be converted |
| sender | optional | sender's e-mail address for enhanced signature detection |
JSON Input:
{
"html": "<html><body><h1>Title</h1><p>Content...</p><hr><p>Best regards...</p></body></html>",
"sender": "max@example.com"
}

Alternative Form-Data Input:
- html_content: HTML content
- email_sender: sender email (optional)

Response:
{
"original_html": "<<content-of-post-parameter html>>",
"markdown": "# Title\n\nContent...",
"removed_signature": "Best regards\nMax Mustermann",
"sender": "max@example.com",
"success": true
}

The /talon/html-to-markdown-direct endpoint performs a direct HTML to Markdown conversion with basic signature pattern recognition, without Talon's quotation extraction.

| POST parameter | Requirement | Comment |
|---|---|---|
| html | required | HTML content to be converted |
JSON Input:
{
"html": "<html><body><h1>Title</h1><p>Content...</p><hr><p>Best regards...</p></body></html>"
}

Alternative Form-Data Input:
- html_content: HTML content

Response:
{
"original_html": "<<content-of-post-parameter html>>",
"markdown": "# Title\n\nContent...",
"success": true
}

With Talon's intelligent detection:
curl -X POST 'http://127.0.0.1:5505/talon/html-to-markdown' \
--header 'Content-Type: application/json' \
--data '{
"html": "<h1>Test</h1><p>Important content</p><hr><p>Best regards<br>Max Mustermann</p>",
"sender": "max@example.com"
}'

Direct conversion:
curl -X POST 'http://127.0.0.1:5505/talon/html-to-markdown-direct' \
--header 'Content-Type: application/json' \
--data '{
"html": "<h1>Test</h1><p>Important content</p><hr><p>Best regards<br>Max Mustermann</p>"
}'

The HTML-to-Markdown endpoints recognize the following signature patterns:
German Patterns:
- "Mit freundlichen Grüßen"
- "Freundliche Grüße"
- "Viele Grüße"
English Patterns:
- "Best regards"
- "Kind regards"
- "Sincerely"
Technical Patterns:
- <hr> tags (everything after the tag is removed)
- -- separators
- CSS classes containing "signature"
- Gmail/Outlook signature blocks
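
To illustrate how this kind of pattern matching can work, here is a minimal sketch that cuts a plain-text or Markdown body at the first closing phrase or `--` separator. It is not the service's actual implementation: the phrases come from the lists above, while the regular expressions, the function name `split_signature`, and the "cut at the first match" rule are assumptions. The HTML-level patterns (`<hr>` tags, CSS classes, Gmail/Outlook blocks) are not covered.

```python
import re

# Closing phrases from the lists above; matching rules below are assumptions.
CLOSING_PHRASES = [
    "Mit freundlichen Grüßen",
    "Freundliche Grüße",
    "Viele Grüße",
    "Best regards",
    "Kind regards",
    "Sincerely",
]

# A line that starts with one of the closing phrases (case-insensitive).
_CLOSING_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(p) for p in CLOSING_PHRASES) + r")\b",
    re.IGNORECASE,
)
# A line consisting only of "--" plus optional trailing whitespace.
_SEPARATOR_RE = re.compile(r"^--\s*$")


def split_signature(text):
    """Return (body, signature); signature is None when no pattern matches."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if _CLOSING_RE.match(line) or _SEPARATOR_RE.match(line):
            body = "\n".join(lines[:index]).rstrip()
            signature = "\n".join(lines[index:]).strip()
            return body, signature
    return text, None
```

A rule this simple will also cut at a closing phrase that happens to start a line in the middle of a message, which is one reason to combine it with a trained detector.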
Features:
- HTML-to-Markdown Conversion: Uses html2text for clean Markdown output
- Intelligent Signature Removal: Combines Talon's ML-based detection with pattern matching
- Flexible Input: Supports both JSON and form-data inputs
- Multilingual Support: Recognizes German and English signature patterns
- AI-Powered Image Processing: Download and describe images using OpenAI Vision API
- Two Conversion Modes:
  - Intelligent with Talon's quotation extraction
  - Direct with simple pattern recognition
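
The two conversion modes can be called from Python with a JSON body, mirroring the curl examples. This is a minimal sketch under the assumption that the service runs locally on port 5505; the function names and the `direct` flag are hypothetical conveniences, not part of the API.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5505"  # assumption: locally running service


def build_markdown_request(html, sender=None, direct=False):
    """Build a JSON POST request for one of the two conversion endpoints."""
    payload = {"html": html}
    if direct:
        # Direct mode only documents the html parameter, so sender is dropped.
        path = "/talon/html-to-markdown-direct"
    else:
        path = "/talon/html-to-markdown"
        if sender is not None:
            payload["sender"] = sender
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def html_to_markdown(html, sender=None, direct=False):
    """Send the request and return the markdown field of the response."""
    request = build_markdown_request(html, sender, direct)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["markdown"]
```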
The /talon/html-to-markdown endpoint supports automatic image processing with AI-generated descriptions:
| Parameter | Type | Description | Default |
|---|---|---|---|
| openai_api_key | string | OpenAI API Key for image descriptions | null (disabled) |
| base_url | string | Base URL for resolving relative image URLs | null |
| image_path | string | Local directory for downloaded images | "./images/" |
| image_prefix | string | Prefix for downloaded image filenames | "" |
- Image Detection: Finds all <img> tags in HTML
- URL Resolution: Converts relative URLs to absolute using base_url
- Image Download: Downloads images to local image_path
- AI Description: Uses OpenAI Vision API to generate German descriptions
- Markdown Enhancement: Replaces images with enhanced format
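
The steps above can be sketched with the Python standard library. This is an illustration only, not the service's code: it covers detection, URL resolution, the choice of a local file path, and the final Markdown format, and leaves out the actual download and the OpenAI call. The names `ImageCollector`, `plan_images`, and `enhanced_markdown` are hypothetical, and the filename rule (prefix plus the URL's basename) is an assumption; the service may name files differently.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Step 1: collect (src, alt) pairs from every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if attributes.get("src"):
                self.images.append((attributes["src"], attributes.get("alt") or ""))


def plan_images(html, base_url=None, image_path="./images/", image_prefix=""):
    """Steps 2 and 3 (without downloading): resolve URLs and pick local paths."""
    collector = ImageCollector()
    collector.feed(html)
    plan = []
    for src, alt in collector.images:
        if src.startswith("data:"):
            continue  # embedded base64 images are skipped
        absolute = urljoin(base_url, src) if base_url else src
        # Assumption: local name is the prefix plus the URL's last path segment.
        filename = image_prefix + posixpath.basename(absolute.split("?", 1)[0])
        plan.append({
            "alt": alt,
            "url": absolute,
            "local_path": posixpath.join(image_path, filename),
        })
    return plan


def enhanced_markdown(alt, url, description):
    """Step 5: render the image followed by its quoted AI description."""
    return f"![{alt}]({url})\n\n> **Bildbeschreibung (KI):**\n> {description}"
```

Keeping the planning step free of network access makes it easy to check which images would be processed before any download or API cost is incurred.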
Images are converted to this enhanced format:
![Original Alt Text](https://example.com/image.jpg)
> **Bildbeschreibung (KI):**
> Ein roter VW ID.7 Tourer steht im Schnee vor einem Einfamilienhaus. Im Hintergrund sind verschneite Bäume und ein bewölkter Himmel zu sehen.

Example request:
curl -X POST 'http://127.0.0.1:5505/talon/html-to-markdown' \
--header 'Content-Type: application/json' \
--data '{
"html": "<h1>Newsletter</h1><p>Unser neues Auto:</p><img src=\"car.jpg\" alt=\"Neues Fahrzeug\" />",
"sender": "marketing@auto.de",
"openai_api_key": "sk-...",
"base_url": "https://company.com/newsletter/",
"image_path": "./downloads/newsletter/",
"image_prefix": "auto-2024-"
}'

Response:
{
"markdown": "# Newsletter\n\nUnser neues Auto:\n\n\n\n> **Bildbeschreibung (KI):**\n> Ein silberner SUV steht in einer modernen Ausstellungshalle mit Glasfront und Beleuchtung.",
"processed_images": ["car.jpg"],
"downloaded_images": ["./downloads/newsletter/auto-2024-abc123.jpg"],
"original_html": "...",
"email_sender": "marketing@auto.de"
}

- Uses the gpt-4.1-mini model for cost efficiency
- Images processed at low detail level
- Only processes images when openai_api_key is provided
- Skips data URLs (base64 embedded images)