Clear BOW 📚

A lightweight dictionary-based classifier that converts word frequencies into label probabilities using a softmax or sigmoid function. Useful for bootstrapping classifications from terminology lists.

Features

🔍 Dictionary-based classification
📊 Multi-class (softmax) support
🏷️ Multi-label (sigmoid) support
📝 Simple terminology lists
🔢 Probability outputs
💾 Model save/load functionality
🎯 93% test coverage

Installation

# Via pip
pip install clear-bow

# Or from source
git clone https://github.com/samhardyhey/clear-bow
cd clear-bow
pip install -e .

Usage

from clear_bow.classifier import DictionaryClassifier

# Define your dictionary
super_dict = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax"],
    "fund": ["unisuper", "aus super", "sun super", "qsuper"],
}

# Create classifier (multi-class by default)
dc = DictionaryClassifier(label_dictionary=super_dict)

# Or for multi-label classification
dc = DictionaryClassifier(
    label_dictionary=super_dict,
    classifier_type="multi_label"
)

# Make predictions
result = dc.predict_single("A 10% contribution to your super fund")
# Returns probability distribution across labels

# Batch predictions
results = dc.predict_batch([
    "A 10% contribution to your super fund",
    "Government regulation of super funds"
])

# Save model to disk
dc.to_disk("path/to/model")

# Load model from disk
dc = DictionaryClassifier()
dc.from_disk("path/to/model")

Development

# Setup development environment
make setup-local-dev
source venv/bin/activate

# Run tests
make test-local

# Run tests with coverage
make test-coverage

# Multi-environment testing
make test-tox

# Build distribution
make dist-bundle-build

# Clean build artifacts
make clean

# Upload to PyPI
make publish

Project Structure

clear-bow/
├── src/
│   └── clear_bow/
│       ├── __init__.py
│       └── classifier.py
├── tests/
│   ├── conftest.py
│   └── test_classifier.py
├── pyproject.toml    # Project configuration
├── tox.ini           # Multi-environment testing
└── makefile          # Development commands

Features in Detail

Multi-class Classification

Uses softmax transformation
Outputs sum to 1.0
Best for mutually exclusive categories
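
To make the multi-class behaviour concrete, here is a minimal standalone sketch of term counting followed by softmax. It does not use clear-bow itself and is not the library's implementation: the helper names count_term_hits and softmax are hypothetical, and whole-word, case-insensitive matching is an assumption.

```python
import math
import re


def count_term_hits(text, label_dictionary):
    """Count whole-word, case-insensitive term matches per label (assumed matching rule)."""
    lowered = text.lower()
    counts = {}
    for label, terms in label_dictionary.items():
        counts[label] = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
    return counts


def softmax(scores):
    """Map raw scores to probabilities that sum to 1.0."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - max_score) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


super_dict = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax"],
    "fund": ["unisuper", "aus super", "sun super", "qsuper"],
}

counts = count_term_hits("A 10% contribution to your super fund", super_dict)
probs = softmax(counts)
print(counts)
print(probs)
```

Only "contribution" matches a term in this sentence, so it receives the largest share of the probability mass, while the two labels with no hits split the remainder equally.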

Multi-label Classification

Uses sigmoid transformation
Each label gets independent probability
Best for non-exclusive categories
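
The independence of multi-label scores can be illustrated with a small standalone sketch. It applies a plain sigmoid to raw hit counts, which is an assumption: how clear-bow scales or centres counts before the sigmoid is not stated here. The hit counts below are made-up illustrative values.

```python
import math


def sigmoid(x):
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical per-label term hit counts for one document
hit_counts = {"regulation": 2, "contribution": 1, "fund": 0}

# Each label is scored on its own; no normalisation across labels
probs = {label: sigmoid(count) for label, count in hit_counts.items()}
print(probs)
print(sum(probs.values()))
```

Because each label is transformed separately, the values need not sum to 1.0. Note that with a plain sigmoid on raw counts, a label with zero hits lands at exactly 0.5, so a real implementation may shift or rescale the counts first.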

Error Handling

Validates classifier types
Handles missing/invalid files
Provides informative error messages

File Operations

Save model configuration
Save label dictionaries
Load models from disk
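
As a rough illustration of what saving and loading could involve, the sketch below writes a configuration and a label dictionary to a directory as JSON and reads them back. This is not clear-bow's on-disk format: the function names save_model and load_model and the file names config.json and label_dictionary.json are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_model(model_dir, label_dictionary, classifier_type):
    """Write the config and label dictionary as two JSON files (hypothetical layout)."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    config = {"classifier_type": classifier_type}
    (model_dir / "config.json").write_text(json.dumps(config), encoding="utf-8")
    (model_dir / "label_dictionary.json").write_text(
        json.dumps(label_dictionary), encoding="utf-8"
    )


def load_model(model_dir):
    """Read both files back, failing with a clear message if the directory is missing."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"No model directory found at: {model_dir}")
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    labels = json.loads(
        (model_dir / "label_dictionary.json").read_text(encoding="utf-8")
    )
    return labels, config["classifier_type"]


example_dict = {"fund": ["unisuper", "qsuper"], "regulation": ["asic", "tax"]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model"
    save_model(target, example_dict, "multi_label")
    loaded_dict, loaded_type = load_model(target)

print(loaded_dict, loaded_type)
```

Keeping the configuration and the dictionaries in separate files mirrors the two save operations listed above and leaves the terminology lists easy to edit by hand.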

Note: See tests for additional usage examples and edge cases.

License

MIT License - See LICENSE file for details.
