SWEBERT: Software Industry Text Classification

SWEBERT is a text classifier that assigns summaries of software-industry articles to predefined technical categories. It is built on a fine-tuned variant of ModernBERT.

Overview

SWEBERT is fine-tuned from the ModernBERT-base model to classify software engineering article summaries. The category list is still a work in progress; so far it includes:

networking

Installation

# Clone the repository
git clone https://github.com/onatm/SWEBERT.git
cd SWEBERT

# Install dependencies
uv sync

# Run the entry-point script
uv run main.py

Usage

Classification Example

from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="./SWEBERT")

# Classify a software-related article summary
article = "A SQL query is used to fetch data from a relational database."

result = classifier(article)

print(f"Prediction: {result[0]['label']} (Score: {result[0]['score']:.4f})")
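The pipeline output used above is a list of dictionaries with label and score keys. A minimal sketch of applying a confidence cutoff to that output follows. The helper name pick_label, the 0.5 threshold, and the "unknown" fallback are assumptions made for illustration and are not part of SWEBERT; the sample scores are hand-written, not real model output.

```python
from typing import Dict, List, Union

# Shape of one pipeline result: {"label": str, "score": float}
Prediction = Dict[str, Union[str, float]]


def pick_label(
    results: List[Prediction],
    threshold: float = 0.5,      # assumed cutoff, tune for your data
    fallback: str = "unknown",   # hypothetical label for low-confidence inputs
) -> str:
    """Return the highest-scoring label, or `fallback` if it scores below `threshold`."""
    if not results:
        return fallback
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Hand-written stand-in for the output of classifier(article)
sample = [{"label": "database", "score": 0.93}]
print(pick_label(sample))
```

With a real model, the same helper could be called as pick_label(classifier(article)).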

Dataset (WIP)

The model is trained on a curated dataset of software engineering article summaries, each labeled with a technical domain. The training data is a CSV file with a text column and a label column:

text,label
"A SQL query is used to fetch data from a relational database.",database
"Network latency and bandwidth are key performance metrics.",networking
"The model was trained using a support vector machine algorithm.",machine-learning
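As a rough sketch of how data in this format could be read and turned into integer class IDs, the snippet below parses the three sample rows with the standard csv module. The function names load_examples and build_label_maps are hypothetical, and sorting the labels alphabetically is an assumption; the project may load and encode its data differently.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple

# The sample rows from the format description above
SAMPLE_CSV = '''text,label
"A SQL query is used to fetch data from a relational database.",database
"Network latency and bandwidth are key performance metrics.",networking
"The model was trained using a support vector machine algorithm.",machine-learning
'''


def load_examples(handle: TextIO) -> List[Tuple[str, str]]:
    """Read (text, label) pairs from a CSV stream with a text,label header."""
    reader = csv.DictReader(handle)
    return [(row["text"], row["label"]) for row in reader]


def build_label_maps(labels: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct label to an integer ID (alphabetical order) and back."""
    names = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


examples = load_examples(io.StringIO(SAMPLE_CSV))
label2id, id2label = build_label_maps(label for _, label in examples)
print(label2id)
```

For a file on disk, the same function can be given an open file handle, for example open(path, newline="", encoding="utf-8"), where path is wherever the CSV lives.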

Model Architecture

SWEBERT is based on the ModernBERT transformer architecture with:

A sequence classification head for multi-class prediction
Fine-tuning on software-engineering-specific content
Optimized for technical text understanding
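One way to attach a sequence classification head to ModernBERT is through the Hugging Face transformers auto classes, sketched below. This is an illustration under assumptions, not the project's actual training code: the checkpoint name answerdotai/ModernBERT-base, the helper names head_kwargs and load_classifier, and the example label list are all assumed here. Calling load_classifier requires a transformers release with ModernBERT support and downloads the base weights.

```python
from typing import Any, Dict, List


def head_kwargs(labels: List[str]) -> Dict[str, Any]:
    """Build the label-related arguments for a multi-class classification head."""
    id2label = dict(enumerate(labels))
    label2id = {name: i for i, name in id2label.items()}
    return {"num_labels": len(labels), "id2label": id2label, "label2id": label2id}


def load_classifier(labels: List[str], checkpoint: str = "answerdotai/ModernBERT-base"):
    """Load a tokenizer and a base model with a freshly initialized classification head.

    The checkpoint name is an assumption; the head is untrained until fine-tuned.
    """
    # Imported lazily so this module can be imported without transformers installed
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, **head_kwargs(labels)
    )
    return tokenizer, model


# Example label list (hypothetical; the real category list is still in progress)
print(head_kwargs(["database", "networking"]))
```

Passing id2label and label2id at load time means the saved model reports category names rather than generic LABEL_0-style identifiers when used through a pipeline.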

Performance

TBD

License

[Add your license information here]
