文法ーCHECK: models and code

Hi! Here you can find the trained models for bunpo-check.com, as well as libraries for generating the data and carrying out the model training.

Still to be added

Model training code
Website JS/HTML/CSS
Functions for interpreting output

Requirements

See requirements.txt

permut8r

This is a permutation library I wrote for creating random permutations of correct strings, for use as pre-training data.

What does it do?

# Original string from Wikipedia
"その使用は1世紀に遡ることができ、5世紀中葉から現代に至るまでの変遷がわかる。"

# Output from permut8r
"その使用は1世紀に遡ることができ、5世紀中葉から現代に至るまでの変遷がわかる。,011111111111111111111111110000000000000000000000"
"その使用1世紀に遡ることができ、5世紀中た葉から現代に至るまでの変遷業がわかる。,012211111111112211111112211000000000000000000000"

A string is converted to tokens via the Hugging Face tokenizer. Random permutations are then applied to it, drawn from the following:

Deleting tokens or characters
Swapping tokens around
Swapping kanji for other kanji with the same reading
Swapping particles for other random particles
Inserting random kanji or particles
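As a rough illustration of the operations listed above (this is not the actual permut8r code), the sketch below applies a few of them to a plain list of string tokens. The function names and the `PARTICLES` list are hypothetical, "swapping tokens around" is assumed to mean swapping two adjacent tokens, and the same-reading kanji swap is left out because it needs a reading dictionary.

```python
import random

# Hypothetical particle list, for illustration only.
PARTICLES = ["は", "が", "を", "に", "で", "と", "の", "へ", "も"]


def delete_token(tokens, rng):
    """Return a copy of tokens with one randomly chosen token removed."""
    if not tokens:
        return []
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def swap_tokens(tokens, rng):
    """Return a copy of tokens with two adjacent tokens swapped (assumed behaviour)."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def swap_particle(tokens, rng):
    """Return a copy of tokens with one particle replaced by a different particle."""
    out = list(tokens)
    positions = [i for i, tok in enumerate(out) if tok in PARTICLES]
    if not positions:
        return out
    i = rng.choice(positions)
    out[i] = rng.choice([p for p in PARTICLES if p != out[i]])
    return out


def insert_particle(tokens, rng):
    """Return a copy of tokens with a random particle inserted at a random position."""
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(PARTICLES)] + tokens[i:]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    tokens = ["私", "は", "学生", "です", "。"]
    for op in (delete_token, swap_tokens, swap_particle, insert_particle):
        print(op.__name__, "".join(op(tokens, rng)))
```

Each function returns a new list and leaves its input untouched, so the correct sentence can be kept next to its permuted copy.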

Each token is assigned a label of 0 (unimportant, e.g. PAD tokens), 1 (correct grammar) or 2 (error). The correct string is saved with a label sequence containing only 0s and 1s. The permuted string contains newly introduced errors, and the affected tokens are labelled 2.
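To make the label format concrete, here is a minimal sketch that reads one output row of the form shown earlier ("sentence,labels") and finds the runs of tokens labelled 2. The helper names `parse_row` and `error_spans` are hypothetical, and the sketch assumes the label string is always the part after the last ASCII comma. It only reports label positions; mapping them back to characters would need the tokenizer.

```python
LABEL_IGNORE, LABEL_OK, LABEL_ERROR = 0, 1, 2


def parse_row(row):
    """Split a 'sentence,labels' row on the last ASCII comma.

    The Japanese comma '、' inside the sentence is a different character,
    so it does not interfere with the split.
    """
    sentence, label_str = row.rsplit(",", 1)
    return sentence, [int(ch) for ch in label_str]


def error_spans(labels):
    """Return (start, end) pairs, end exclusive, for each run of error labels."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == LABEL_ERROR and start is None:
            start = i
        elif label != LABEL_ERROR and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


if __name__ == "__main__":
    # The permuted row from the example above.
    row = (
        "その使用1世紀に遡ることができ、5世紀中た葉から現代に至るまでの変遷業がわかる。,"
        "012211111111112211111112211000000000000000000000"
    )
    sentence, labels = parse_row(row)
    print(sentence)
    print(error_spans(labels))
```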

This allowed me to build a ~263M dataset of labelled sentences. Pre-training on it alone reached a decent F0.5 score of 45.0 on a test set, which went a long way towards getting good performance here.
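For reference, F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The sketch below is just the standard formula. The README does not say whether the score was computed per token or per error span, so the function takes precision and recall as given. The name `f_beta` is hypothetical.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when nothing was predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # A precise but low-recall checker scores higher under F0.5 than under F1.
    print(f_beta(0.8, 0.2, beta=0.5))
    print(f_beta(0.8, 0.2, beta=1.0))
```

Multiplying the result by 100 gives the percentage scale that a score like 45.0 presumably uses.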

Model training

Will upload soon!

Trained models

To use the trained model, you will need the Hugging Face transformers library.

Step 1: Download the model from my Google Drive and save the folder to your working directory.

Step 2: Run check.py to check your sentence.

python check.py "文法ーCHECKは人工知能により文法が正しいか確かめられるサイトです。"
