Extract entities, classify text, parse structured data, and extract relations, all in one efficient model.
GLiNER2 unifies Named Entity Recognition, Text Classification, Structured Data Extraction, and Relation Extraction into a single 205M parameter model. It provides efficient CPU-based inference without requiring complex pipelines or external API dependencies.
- One Model, Four Tasks: Entities, classification, structured data, and relations in a single forward pass
- CPU First: Lightning-fast inference on standard hardware; no GPU required
- Privacy: 100% local processing, zero external dependencies
pip install gliner2
from gliner2 import GLiNER2
# Load model once, use everywhere
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
# Extract entities in one line
text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = extractor.extract_entities(text, ["company", "person", "product", "location"])
print(result)
# {'entities': {'company': ['Apple'], 'person': ['Tim Cook'], 'product': ['iPhone 15'], 'location': ['Cupertino']}}
Our biggest and most powerful model, GLiNER XL 1B, is available exclusively via API. No GPU required, no model downloads, just instant access to state-of-the-art extraction. Get your API key at gliner.pioneer.ai.
from gliner2 import GLiNER2
# Access GLiNER XL 1B via API
extractor = GLiNER2.from_api() # Uses PIONEER_API_KEY env variable
result = extractor.extract_entities(
"OpenAI CEO Sam Altman announced GPT-5 at their San Francisco headquarters.",
["company", "person", "product", "location"]
)
# {'entities': {'company': ['OpenAI'], 'person': ['Sam Altman'], 'product': ['GPT-5'], 'location': ['San Francisco']}}
| Model | Parameters | Description | Use Case |
|---|---|---|---|
| fastino/gliner2-base-v1 | 205M | Base size | Extraction / classification |
| fastino/gliner2-large-v1 | 340M | Large size | Extraction / classification |
The models are available on Hugging Face.
Comprehensive guides for all GLiNER2 features:
- Text Classification - Single and multi-label classification with confidence scores
- Entity Extraction - Named entity recognition with descriptions and spans
- Structured Data Extraction - Parse complex JSON structures from text
- Combined Schemas - Multi-task extraction in a single pass
- Regex Validators - Filter and validate extracted spans
- Relation Extraction - Extract relationships between entities
- API Access - Use GLiNER2 via cloud API
- Training Data Format - Complete guide to preparing training data
- Model Training - Train custom models for your domain
- LoRA Adapters - Parameter-efficient fine-tuning
- Adapter Switching - Switch between domain adapters
Extract named entities with optional descriptions for precision:
# Basic entity extraction
entities = extractor.extract_entities(
"Patient received 400mg ibuprofen for severe headache at 2 PM.",
["medication", "dosage", "symptom", "time"]
)
# Output: {'entities': {'medication': ['ibuprofen'], 'dosage': ['400mg'], 'symptom': ['severe headache'], 'time': ['2 PM']}}
# Enhanced with descriptions for medical accuracy
entities = extractor.extract_entities(
"Patient received 400mg ibuprofen for severe headache at 2 PM.",
{
"medication": "Names of drugs, medications, or pharmaceutical substances",
"dosage": "Specific amounts like '400mg', '2 tablets', or '5ml'",
"symptom": "Medical symptoms, conditions, or patient complaints",
"time": "Time references like '2 PM', 'morning', or 'after lunch'"
}
)
# Same output but with higher accuracy due to context descriptions
# With confidence scores
entities = extractor.extract_entities(
"Apple Inc. CEO Tim Cook announced iPhone 15 in Cupertino.",
["company", "person", "product", "location"],
include_confidence=True
)
# Output: {
# 'entities': {
# 'company': [{'text': 'Apple Inc.', 'confidence': 0.95}],
# 'person': [{'text': 'Tim Cook', 'confidence': 0.92}],
# 'product': [{'text': 'iPhone 15', 'confidence': 0.88}],
# 'location': [{'text': 'Cupertino', 'confidence': 0.90}]
# }
# }
# With character positions (spans)
entities = extractor.extract_entities(
"Apple Inc. CEO Tim Cook announced iPhone 15 in Cupertino.",
["company", "person", "product"],
include_spans=True
)
# Output: {
# 'entities': {
# 'company': [{'text': 'Apple Inc.', 'start': 0, 'end': 10}],
# 'person': [{'text': 'Tim Cook', 'start': 15, 'end': 23}],
# 'product': [{'text': 'iPhone 15', 'start': 34, 'end': 43}]
# }
# }
# With both confidence and spans
entities = extractor.extract_entities(
"Apple Inc. CEO Tim Cook announced iPhone 15 in Cupertino.",
["company", "person", "product"],
include_confidence=True,
include_spans=True
)
# Output: {
# 'entities': {
# 'company': [{'text': 'Apple Inc.', 'confidence': 0.95, 'start': 0, 'end': 10}],
# 'person': [{'text': 'Tim Cook', 'confidence': 0.92, 'start': 15, 'end': 23}],
# 'product': [{'text': 'iPhone 15', 'confidence': 0.88, 'start': 34, 'end': 43}]
# }
# }
Single or multi-label classification with configurable confidence:
# Sentiment analysis
result = extractor.classify_text(
"This laptop has amazing performance but terrible battery life!",
{"sentiment": ["positive", "negative", "neutral"]}
)
# Output: {'sentiment': 'negative'}
# Multi-aspect classification
result = extractor.classify_text(
"Great camera quality, decent performance, but poor battery life.",
{
"aspects": {
"labels": ["camera", "performance", "battery", "display", "price"],
"multi_label": True,
"cls_threshold": 0.4
}
}
)
# Output: {'aspects': ['camera', 'performance', 'battery']}
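Conceptually, multi-label classification keeps every label whose score clears cls_threshold. The scores below are invented for illustration; this is a stand-alone sketch of the filtering step, not the library's code:

```python
# Hypothetical per-label scores, as a classifier head might produce them
scores = {"camera": 0.91, "performance": 0.55, "battery": 0.48,
          "display": 0.22, "price": 0.10}

def filter_labels(scores: dict, cls_threshold: float) -> list:
    """Keep labels whose score clears the threshold, highest score first."""
    kept = [(label, s) for label, s in scores.items() if s >= cls_threshold]
    return [label for label, _ in sorted(kept, key=lambda x: -x[1])]

print(filter_labels(scores, cls_threshold=0.4))
# ['camera', 'performance', 'battery']
```

Raising cls_threshold trades recall for precision: at 0.6, only 'camera' would survive.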
# With confidence scores
result = extractor.classify_text(
"This laptop has amazing performance but terrible battery life!",
{"sentiment": ["positive", "negative", "neutral"]},
include_confidence=True
)
# Output: {'sentiment': {'label': 'negative', 'confidence': 0.82}}
# Multi-label with confidence
schema = extractor.create_schema().classification(
"topics",
["technology", "business", "health", "politics", "sports"],
multi_label=True,
cls_threshold=0.3
)
text = "Apple announced new health monitoring features in their latest smartwatch, boosting their stock price."
results = extractor.extract(text, schema, include_confidence=True)
# Output: {
# 'topics': [
# {'label': 'technology', 'confidence': 0.92},
# {'label': 'business', 'confidence': 0.78},
# {'label': 'health', 'confidence': 0.65}
# ]
# }
Parse complex structured information with field-level control:
# Product information extraction
text = "iPhone 15 Pro Max with 256GB storage, A17 Pro chip, priced at $1199. Available in titanium and black colors."
result = extractor.extract_json(
text,
{
"product": [
"name::str::Full product name and model",
"storage::str::Storage capacity like 256GB or 1TB",
"processor::str::Chip or processor information",
"price::str::Product price with currency",
"colors::list::Available color options"
]
}
)
# Output: {
# 'product': [{
# 'name': 'iPhone 15 Pro Max',
# 'storage': '256GB',
# 'processor': 'A17 Pro chip',
# 'price': '$1199',
# 'colors': ['titanium', 'black']
# }]
# }
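The field strings above follow a name::dtype::description convention, optionally with a bracketed choices list (e.g. tier::[basic|premium|enterprise]::str::Subscription level). As a rough illustration of how such a spec decomposes, splitting on "::" recovers the parts; this is not the library's actual parser:

```python
def parse_field(spec: str) -> dict:
    """Decompose a field spec like 'name::dtype::description' or
    'name::[a|b|c]::dtype::description'. Missing parts default to
    dtype 'str' and an empty description. Illustrative only."""
    parts = spec.split("::")
    name, rest = parts[0], parts[1:]
    choices = None
    # A bracketed segment right after the name holds the allowed choices
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        choices = rest[0][1:-1].split("|")
        rest = rest[1:]
    dtype = rest[0] if rest else "str"
    description = rest[1] if len(rest) > 1 else ""
    return {"name": name, "dtype": dtype,
            "description": description, "choices": choices}

print(parse_field("colors::list::Available color options"))
# {'name': 'colors', 'dtype': 'list', 'description': 'Available color options', 'choices': None}
print(parse_field("price"))
# {'name': 'price', 'dtype': 'str', 'description': '', 'choices': None}
```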
# Multiple structured entities
text = "Apple Inc. headquarters in Cupertino launched iPhone 15 for $999 and MacBook Air for $1299."
result = extractor.extract_json(
text,
{
"company": [
"name::str::Company name",
"location::str::Company headquarters or office location"
],
"products": [
"name::str::Product name and model",
"price::str::Product retail price"
]
}
)
# Output: {
# 'company': [{'name': 'Apple Inc.', 'location': 'Cupertino'}],
# 'products': [
# {'name': 'iPhone 15', 'price': '$999'},
# {'name': 'MacBook Air', 'price': '$1299'}
# ]
# }
# With confidence scores
result = extractor.extract_json(
"The MacBook Pro costs $1999 and features M3 chip, 16GB RAM, and 512GB storage.",
{
"product": [
"name::str",
"price",
"features"
]
},
include_confidence=True
)
# Output: {
# 'product': [{
# 'name': {'text': 'MacBook Pro', 'confidence': 0.95},
# 'price': [{'text': '$1999', 'confidence': 0.92}],
# 'features': [
# {'text': 'M3 chip', 'confidence': 0.88},
# {'text': '16GB RAM', 'confidence': 0.90},
# {'text': '512GB storage', 'confidence': 0.87}
# ]
# }]
# }
# With character positions (spans)
result = extractor.extract_json(
"The MacBook Pro costs $1999 and features M3 chip.",
{
"product": [
"name::str",
"price"
]
},
include_spans=True
)
# Output: {
# 'product': [{
# 'name': {'text': 'MacBook Pro', 'start': 4, 'end': 15},
# 'price': [{'text': '$1999', 'start': 22, 'end': 27}]
# }]
# }
# With both confidence and spans
result = extractor.extract_json(
"The MacBook Pro costs $1999 and features M3 chip, 16GB RAM, and 512GB storage.",
{
"product": [
"name::str",
"price",
"features"
]
},
include_confidence=True,
include_spans=True
)
# Output: {
# 'product': [{
# 'name': {'text': 'MacBook Pro', 'confidence': 0.95, 'start': 4, 'end': 15},
# 'price': [{'text': '$1999', 'confidence': 0.92, 'start': 22, 'end': 27}],
# 'features': [
# {'text': 'M3 chip', 'confidence': 0.88, 'start': 41, 'end': 48},
# {'text': '16GB RAM', 'confidence': 0.90, 'start': 50, 'end': 58},
# {'text': '512GB storage', 'confidence': 0.87, 'start': 64, 'end': 77}
# ]
# }]
# }
Extract relationships between entities as directional tuples:
# Basic relation extraction
text = "John works for Apple Inc. and lives in San Francisco. Apple Inc. is located in Cupertino."
result = extractor.extract_relations(
text,
["works_for", "lives_in", "located_in"]
)
# Output: {
# 'relation_extraction': {
# 'works_for': [('John', 'Apple Inc.')],
# 'lives_in': [('John', 'San Francisco')],
# 'located_in': [('Apple Inc.', 'Cupertino')]
# }
# }
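Because each relation type maps to (head, tail) tuples, flattening the result into (head, relation, tail) triples, e.g. to feed a graph library, takes only a few lines. A stdlib-only sketch using the output above:

```python
# Relation output in the shape shown above
relations = {
    "works_for": [("John", "Apple Inc.")],
    "lives_in": [("John", "San Francisco")],
    "located_in": [("Apple Inc.", "Cupertino")],
}

# Flatten into (head, relation, tail) edge triples
triples = [(head, rel, tail)
           for rel, pairs in relations.items()
           for head, tail in pairs]
print(triples)
# [('John', 'works_for', 'Apple Inc.'), ('John', 'lives_in', 'San Francisco'),
#  ('Apple Inc.', 'located_in', 'Cupertino')]
```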
# With descriptions for better accuracy
schema = extractor.create_schema().relations({
"works_for": "Employment relationship where person works at organization",
"founded": "Founding relationship where person created organization",
"acquired": "Acquisition relationship where company bought another company",
"located_in": "Geographic relationship where entity is in a location"
})
text = "Elon Musk founded SpaceX in 2002. SpaceX is located in Hawthorne, California."
results = extractor.extract(text, schema)
# Output: {
# 'relation_extraction': {
# 'founded': [('Elon Musk', 'SpaceX')],
# 'located_in': [('SpaceX', 'Hawthorne, California')]
# }
# }
# With confidence scores
results = extractor.extract_relations(
"John works for Apple Inc. and lives in San Francisco.",
["works_for", "lives_in"],
include_confidence=True
)
# Output: {
# 'relation_extraction': {
# 'works_for': [{
# 'head': {'text': 'John', 'confidence': 0.95},
# 'tail': {'text': 'Apple Inc.', 'confidence': 0.92}
# }],
# 'lives_in': [{
# 'head': {'text': 'John', 'confidence': 0.94},
# 'tail': {'text': 'San Francisco', 'confidence': 0.91}
# }]
# }
# }
# With character positions (spans)
results = extractor.extract_relations(
"John works for Apple Inc. and lives in San Francisco.",
["works_for", "lives_in"],
include_spans=True
)
# Output: {
# 'relation_extraction': {
# 'works_for': [{
# 'head': {'text': 'John', 'start': 0, 'end': 4},
# 'tail': {'text': 'Apple Inc.', 'start': 15, 'end': 25}
# }],
# 'lives_in': [{
# 'head': {'text': 'John', 'start': 0, 'end': 4},
# 'tail': {'text': 'San Francisco', 'start': 39, 'end': 52}
# }]
# }
# }
# With both confidence and spans
results = extractor.extract_relations(
"John works for Apple Inc. and lives in San Francisco.",
["works_for", "lives_in"],
include_confidence=True,
include_spans=True
)
# Output: {
# 'relation_extraction': {
# 'works_for': [{
# 'head': {'text': 'John', 'confidence': 0.95, 'start': 0, 'end': 4},
# 'tail': {'text': 'Apple Inc.', 'confidence': 0.92, 'start': 15, 'end': 25}
# }],
# 'lives_in': [{
# 'head': {'text': 'John', 'confidence': 0.94, 'start': 0, 'end': 4},
# 'tail': {'text': 'San Francisco', 'confidence': 0.91, 'start': 39, 'end': 52}
# }]
# }
# }
Combine all extraction types when you need comprehensive analysis:
# Use create_schema() for multi-task scenarios
schema = (extractor.create_schema()
# Extract key entities
.entities({
"person": "Names of people, executives, or individuals",
"company": "Organization, corporation, or business names",
"product": "Products, services, or offerings mentioned"
})
# Classify the content
.classification("sentiment", ["positive", "negative", "neutral"])
.classification("category", ["technology", "business", "finance", "healthcare"])
# Extract relationships
.relations(["works_for", "founded", "located_in"])
# Extract structured product details
.structure("product_info")
.field("name", dtype="str")
.field("price", dtype="str")
.field("features", dtype="list")
.field("availability", dtype="str", choices=["in_stock", "pre_order", "sold_out"])
)
# Comprehensive extraction in one pass
text = "Apple CEO Tim Cook unveiled the revolutionary iPhone 15 Pro for $999. The device features an A17 Pro chip and titanium design. Tim Cook works for Apple, which is located in Cupertino."
results = extractor.extract(text, schema)
# Output: {
# 'entities': {
# 'person': ['Tim Cook'],
# 'company': ['Apple'],
# 'product': ['iPhone 15 Pro']
# },
# 'sentiment': 'positive',
# 'category': 'technology',
# 'relation_extraction': {
# 'works_for': [('Tim Cook', 'Apple')],
# 'located_in': [('Apple', 'Cupertino')]
# },
# 'product_info': [{
# 'name': 'iPhone 15 Pro',
# 'price': '$999',
# 'features': ['A17 Pro chip', 'titanium design'],
# 'availability': 'in_stock'
# }]
# }
financial_text = """
Transaction Report: Goldman Sachs processed a $2.5M equity trade for Tesla Inc.
on March 15, 2024. Commission: $1,250. Status: Completed.
"""
# Extract structured financial data
result = extractor.extract_json(
financial_text,
{
"transaction": [
"broker::str::Financial institution or brokerage firm",
"amount::str::Transaction amount with currency",
"security::str::Stock, bond, or financial instrument",
"date::str::Transaction date",
"commission::str::Fees or commission charged",
"status::str::Transaction status",
"type::[equity|bond|option|future|forex]::str::Type of financial instrument"
]
}
)
# Output: {
# 'transaction': [{
# 'broker': 'Goldman Sachs',
# 'amount': '$2.5M',
# 'security': 'Tesla Inc.',
# 'date': 'March 15, 2024',
# 'commission': '$1,250',
# 'status': 'Completed',
# 'type': 'equity'
# }]
# }
medical_record = """
Patient: Sarah Johnson, 34, presented with acute chest pain and shortness of breath.
Prescribed: Lisinopril 10mg daily, Metoprolol 25mg twice daily.
Follow-up scheduled for next Tuesday.
"""
result = extractor.extract_json(
medical_record,
{
"patient_info": [
"name::str::Patient full name",
"age::str::Patient age",
"symptoms::list::Reported symptoms or complaints"
],
"prescriptions": [
"medication::str::Drug or medication name",
"dosage::str::Dosage amount and frequency",
"frequency::str::How often to take the medication"
]
}
)
# Output: {
# 'patient_info': [{
# 'name': 'Sarah Johnson',
# 'age': '34',
# 'symptoms': ['acute chest pain', 'shortness of breath']
# }],
# 'prescriptions': [
# {'medication': 'Lisinopril', 'dosage': '10mg', 'frequency': 'daily'},
# {'medication': 'Metoprolol', 'dosage': '25mg', 'frequency': 'twice daily'}
# ]
# }
contract_text = """
Service Agreement between TechCorp LLC and DataSystems Inc., effective January 1, 2024.
Monthly fee: $15,000. Contract term: 24 months with automatic renewal.
Termination clause: 30-day written notice required.
"""
# Multi-task extraction for comprehensive analysis
schema = (extractor.create_schema()
.entities(["company", "date", "duration", "fee"])
.classification("contract_type", ["service", "employment", "nda", "partnership"])
.relations(["signed_by", "involves", "dated"])
.structure("contract_terms")
.field("parties", dtype="list")
.field("effective_date", dtype="str")
.field("monthly_fee", dtype="str")
.field("term_length", dtype="str")
.field("renewal", dtype="str", choices=["automatic", "manual", "none"])
.field("termination_notice", dtype="str")
)
results = extractor.extract(contract_text, schema)
# Output: {
# 'entities': {
# 'company': ['TechCorp LLC', 'DataSystems Inc.'],
# 'date': ['January 1, 2024'],
# 'duration': ['24 months'],
# 'fee': ['$15,000']
# },
# 'contract_type': 'service',
# 'relation_extraction': {
# 'involves': [('TechCorp LLC', 'DataSystems Inc.')],
# 'dated': [('Service Agreement', 'January 1, 2024')]
# },
# 'contract_terms': [{
# 'parties': ['TechCorp LLC', 'DataSystems Inc.'],
# 'effective_date': 'January 1, 2024',
# 'monthly_fee': '$15,000',
# 'term_length': '24 months',
# 'renewal': 'automatic',
# 'termination_notice': '30-day written notice'
# }]
# }
# Extract entities and relations for knowledge graph building
text = """
Elon Musk founded SpaceX in 2002. SpaceX is located in Hawthorne, California.
SpaceX acquired Swarm Technologies in 2021. Many engineers work for SpaceX.
"""
schema = (extractor.create_schema()
.entities(["person", "organization", "location", "date"])
.relations({
"founded": "Founding relationship where person created organization",
"acquired": "Acquisition relationship where company bought another company",
"located_in": "Geographic relationship where entity is in a location",
"works_for": "Employment relationship where person works at organization"
})
)
results = extractor.extract(text, schema)
# Output: {
# 'entities': {
# 'person': ['Elon Musk', 'engineers'],
# 'organization': ['SpaceX', 'Swarm Technologies'],
# 'location': ['Hawthorne, California'],
# 'date': ['2002', '2021']
# },
# 'relation_extraction': {
# 'founded': [('Elon Musk', 'SpaceX')],
# 'acquired': [('SpaceX', 'Swarm Technologies')],
# 'located_in': [('SpaceX', 'Hawthorne, California')],
# 'works_for': [('engineers', 'SpaceX')]
# }
# }
# High-precision extraction for critical fields
result = extractor.extract_json(
text,
{
"financial_data": [
"account_number::str::Bank account number", # default threshold
"amount::str::Transaction amount", # default threshold
"routing_number::str::Bank routing number" # default threshold
]
},
threshold=0.9 # High confidence for all fields
)
# Per-field thresholds using schema builder (for multi-task scenarios)
schema = (extractor.create_schema()
.structure("sensitive_data")
.field("ssn", dtype="str", threshold=0.95) # Highest precision
.field("email", dtype="str", threshold=0.8) # Medium precision
.field("phone", dtype="str", threshold=0.7) # Lower precision
)
# Structured extraction with choices and types
result = extractor.extract_json(
"Premium subscription at $99/month with mobile and web access.",
{
"subscription": [
"tier::[basic|premium|enterprise]::str::Subscription level",
"price::str::Monthly or annual cost",
"billing::[monthly|annual]::str::Billing frequency",
"features::[mobile|web|api|analytics]::list::Included features"
]
}
)
# Output: {
# 'subscription': [{
# 'tier': 'premium',
# 'price': '$99/month',
# 'billing': 'monthly',
# 'features': ['mobile', 'web']
# }]
# }
Filter extracted spans to ensure they match expected patterns, improving extraction quality and reducing false positives.
from gliner2 import GLiNER2, RegexValidator
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
# Email validation
email_validator = RegexValidator(r"^[\w\.-]+@[\w\.-]+\.\w+$")
schema = (extractor.create_schema()
.structure("contact")
.field("email", dtype="str", validators=[email_validator])
)
text = "Contact: john@company.com, not-an-email, jane@domain.org"
results = extractor.extract(text, schema)
# Output: {'contact': [{'email': 'john@company.com'}]} # Only valid emails
# Phone number validation (US format)
phone_validator = RegexValidator(r"\(\d{3}\)\s\d{3}-\d{4}", mode="partial")
schema = (extractor.create_schema()
.structure("contact")
.field("phone", dtype="str", validators=[phone_validator])
)
text = "Call (555) 123-4567 or 5551234567"
results = extractor.extract(text, schema)
# Output: {'contact': [{'phone': '(555) 123-4567'}]} # Second number filtered out
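The default mode requires the regex to match the entire extracted span, while mode="partial" accepts a match anywhere inside it, analogous to re.fullmatch versus re.search. A plain-re sketch of that distinction (not the library's internals; the candidate spans are made up):

```python
import re

# Hypothetical candidate spans a model might propose
candidates = ["(555) 123-4567", "Phone: (555) 123-4567", "5551234567"]
pattern = r"\(\d{3}\)\s\d{3}-\d{4}"

# Full mode: the whole span must match the pattern
full = [c for c in candidates if re.fullmatch(pattern, c)]
# Partial mode: a match anywhere inside the span is enough
partial = [c for c in candidates if re.search(pattern, c)]

print(full)     # ['(555) 123-4567']
print(partial)  # ['(555) 123-4567', 'Phone: (555) 123-4567']
```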
# URL validation
url_validator = RegexValidator(r"^https?://", mode="partial")
schema = (extractor.create_schema()
.structure("links")
.field("url", dtype="list", validators=[url_validator])
)
text = "Visit https://example.com or www.site.com"
results = extractor.extract(text, schema)
# Output: {'links': [{'url': ['https://example.com']}]} # www.site.com filtered out
# Exclude test data
import re
no_test_validator = RegexValidator(r"^(test|demo|sample)", exclude=True, flags=re.IGNORECASE)
schema = (extractor.create_schema()
.structure("products")
.field("name", dtype="list", validators=[no_test_validator])
)
text = "Products: iPhone, Test Phone, Samsung Galaxy"
results = extractor.extract(text, schema)
# Output: {'products': [{'name': ['iPhone', 'Samsung Galaxy']}]} # Test Phone excluded
# Multiple validators (all must pass)
username_validators = [
RegexValidator(r"^[a-zA-Z0-9_]+$"), # Alphanumeric + underscore
RegexValidator(r"^.{3,20}$"), # 3-20 characters
RegexValidator(r"^(?!admin)", exclude=True, flags=re.IGNORECASE) # No "admin"
]
schema = (extractor.create_schema()
.structure("user")
.field("username", dtype="str", validators=username_validators)
)
text = "Users: ab, john_doe, user@domain, admin, valid_user123"
results = extractor.extract(text, schema)
# Output: {'user': [{'username': 'john_doe'}]} # Only valid usernames
Process multiple texts efficiently in a single call:
# Batch entity extraction
texts = [
"Google's Sundar Pichai unveiled Gemini AI in Mountain View.",
"Microsoft CEO Satya Nadella announced Copilot at Build 2023.",
"Amazon's Andy Jassy revealed new AWS services in Seattle."
]
results = extractor.batch_extract_entities(
texts,
["company", "person", "product", "location"],
batch_size=8
)
# Returns list of results, one per input text
# Batch relation extraction
texts = [
"John works for Microsoft and lives in Seattle.",
"Sarah founded TechStartup in 2020.",
"Bob reports to Alice at Google."
]
results = extractor.batch_extract_relations(
texts,
["works_for", "founded", "reports_to", "lives_in"],
batch_size=8
)
# Returns list of relation extraction results for each text
# All requested relation types appear in each result, even if empty
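The batch_size argument controls how many texts go through the model per forward pass; conceptually the inputs are chunked like this (a stdlib sketch, not the library's implementation):

```python
def chunked(items: list, batch_size: int):
    """Yield successive batch_size-sized chunks of items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"text {i}" for i in range(5)]
print([len(batch) for batch in chunked(texts, batch_size=2)])
# [2, 2, 1]
```

Larger batches amortize per-call overhead; the last chunk may be smaller than batch_size.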
# Batch with confidence and spans
results = extractor.batch_extract_entities(
texts,
["company", "person"],
include_confidence=True,
include_spans=True,
batch_size=8
)
Train GLiNER2 on your own data to specialize it for your domain or use case.
from gliner2 import GLiNER2
from gliner2.training.data import InputExample
from gliner2.training.trainer import GLiNER2Trainer, TrainingConfig
# 1. Prepare training data
examples = [
InputExample(
text="John works at Google in California.",
entities={"person": ["John"], "company": ["Google"], "location": ["California"]}
),
InputExample(
text="Apple released iPhone 15.",
entities={"company": ["Apple"], "product": ["iPhone 15"]}
),
# Add more examples...
]
# 2. Configure training
model = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
config = TrainingConfig(
output_dir="./output",
num_epochs=10,
batch_size=8,
encoder_lr=1e-5,
task_lr=5e-4
)
# 3. Train
trainer = GLiNER2Trainer(model, config)
trainer.train(train_data=examples)
GLiNER2 uses a JSONL format where each line contains an input and an output field:
{"input": "Tim Cook is the CEO of Apple Inc., based in Cupertino, California.", "output": {"entities": {"person": ["Tim Cook"], "company": ["Apple Inc."], "location": ["Cupertino", "California"]}, "entity_descriptions": {"person": "Full name of a person", "company": "Business organization name", "location": "Geographic location or place"}}}
{"input": "OpenAI released GPT-4 in March 2023.", "output": {"entities": {"company": ["OpenAI"], "model": ["GPT-4"], "date": ["March 2023"]}}}Classification Example:
{"input": "This movie is absolutely fantastic! I loved every minute of it.", "output": {"classifications": [{"task": "sentiment", "labels": ["positive", "negative", "neutral"], "true_label": ["positive"]}]}}
{"input": "The service was terrible and the food was cold.", "output": {"classifications": [{"task": "sentiment", "labels": ["positive", "negative", "neutral"], "true_label": ["negative"]}]}}Structured Extraction Example:
{"input": "iPhone 15 Pro Max with 256GB storage, priced at $1199.", "output": {"json_structures": [{"product": {"name": "iPhone 15 Pro Max", "storage": "256GB", "price": "$1199"}}]}}Relation Extraction Example:
{"input": "John works for Apple Inc. and lives in San Francisco.", "output": {"relations": [{"works_for": {"head": "John", "tail": "Apple Inc."}}, {"lives_in": {"head": "John", "tail": "San Francisco"}}]}}from gliner2 import GLiNER2
from gliner2.training.trainer import GLiNER2Trainer, TrainingConfig
# Load model and train from JSONL file
model = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
config = TrainingConfig(output_dir="./output", num_epochs=10)
trainer = GLiNER2Trainer(model, config)
trainer.train(train_data="train.jsonl") # Path to your JSONL fileTrain lightweight adapters for domain-specific tasks:
from gliner2 import GLiNER2
from gliner2.training.data import InputExample
from gliner2.training.trainer import GLiNER2Trainer, TrainingConfig
# Prepare domain-specific data
legal_examples = [
InputExample(
text="Apple Inc. filed a lawsuit against Samsung Electronics.",
entities={"company": ["Apple Inc.", "Samsung Electronics"]}
),
# Add more examples...
]
# Configure LoRA training
model = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
config = TrainingConfig(
output_dir="./legal_adapter",
num_epochs=10,
batch_size=8,
encoder_lr=1e-5,
task_lr=5e-4,
# LoRA settings
use_lora=True, # Enable LoRA
lora_r=8, # Rank (4, 8, 16, 32)
lora_alpha=16.0, # Scaling factor (usually 2*r)
lora_dropout=0.0, # Dropout for LoRA layers
save_adapter_only=True # Save only adapter (~5MB vs ~450MB)
)
# Train adapter
trainer = GLiNER2Trainer(model, config)
trainer.train(train_data=legal_examples)
# Use the adapter
model.load_adapter("./legal_adapter/final")
results = model.extract_entities(legal_text, ["company", "law"])
Benefits of LoRA:
- Smaller size: Adapters are ~2-10 MB vs ~450 MB for full models
- Faster training: 2-3x faster than full fine-tuning
- Easy switching: Swap adapters in milliseconds for different domains
from gliner2 import GLiNER2
from gliner2.training.data import InputExample, TrainingDataset
from gliner2.training.trainer import GLiNER2Trainer, TrainingConfig
# Prepare training data
train_examples = [
InputExample(
text="Tim Cook is the CEO of Apple Inc., based in Cupertino, California.",
entities={
"person": ["Tim Cook"],
"company": ["Apple Inc."],
"location": ["Cupertino", "California"]
},
entity_descriptions={
"person": "Full name of a person",
"company": "Business organization name",
"location": "Geographic location or place"
}
),
# Add more examples...
]
# Create and validate dataset
train_dataset = TrainingDataset(train_examples)
train_dataset.validate(strict=True, raise_on_error=True)
train_dataset.print_stats()
# Split into train/validation
train_data, val_data, _ = train_dataset.split(
train_ratio=0.8,
val_ratio=0.2,
test_ratio=0.0,
shuffle=True,
seed=42
)
# Configure training
model = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
config = TrainingConfig(
output_dir="./ner_model",
experiment_name="ner_training",
num_epochs=15,
batch_size=16,
encoder_lr=1e-5,
task_lr=5e-4,
warmup_ratio=0.1,
scheduler_type="cosine",
fp16=True,
eval_strategy="epoch",
save_best=True,
early_stopping=True,
early_stopping_patience=3
)
# Train
trainer = GLiNER2Trainer(model, config)
trainer.train(train_data=train_data, val_data=val_data)
# Load best model
model = GLiNER2.from_pretrained("./ner_model/best")For more details, see the Training Tutorial and Data Format Guide.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you use GLiNER2 in your research, please cite:
@inproceedings{zaratiana-etal-2025-gliner2,
title = "{GL}i{NER}2: Schema-Driven Multi-Task Learning for Structured Information Extraction",
author = "Zaratiana, Urchade and
Pasternak, Gil and
Boyd, Oliver and
Hurn-Maloney, George and
Lewis, Ash",
editor = {Habernal, Ivan and
Schulam, Peter and
Tiedemann, J{\"o}rg},
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-demos.10/",
pages = "130--140",
ISBN = "979-8-89176-334-0",
abstract = "Information extraction (IE) is fundamental to numerous NLP applications, yet existing solutions often require specialized models for different tasks or rely on computationally expensive large language models. We present GLiNER2, a unified framework that enhances the original GLiNER architecture to support named entity recognition, text classification, and hierarchical structured data extraction within a single efficient model. Built on a fine-tuned encoder architecture, GLiNER2 maintains CPU efficiency and compact size while introducing multi-task composition through an intuitive schema-based interface. Our experiments demonstrate competitive performance across diverse IE tasks with substantial improvements in deployment accessibility compared to LLM-based alternatives. We release GLiNER2 as an open-source library available through pip, complete with pre-trained models and comprehensive documentation."
}
Built upon the original GLiNER architecture by the team at Fastino AI.