This repository contains two PyQt5-based tools for Chinese Named Entity Recognition (NER) and tag frequency analysis.
- Select a training dataset and train a CRF model
- Select raw input text and predict NER tags
- Supports Windows and macOS/Linux
- Displays prediction results in the output box
- Supports Windows and macOS/Linux
- Input text in word/tag format (e.g., 香港/ns)
- Enter a target tag (e.g., `nr`) to analyze
- Visualizes the top 20 words for that tag
- Supports Chinese fonts via a .ttf file
Install required packages:
pip install pyqt5 matplotlib
Make sure the crf/ folder contains:
- `crf_learn.exe` / `crf_learn`
- `crf_test.exe` / `crf_test`
- `template` file for training
Optional: Chinese font MingLiU.ttf for plotting.
Run the GUI:
python crf_ner_gui.py
Steps:
- Select your training dataset (`.txt`)
- Train the CRF model
- Select raw input text
- Predict NER tags
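The train and predict steps can be sketched as command lines built in Python. This is only an illustration, assuming the executables in `crf/` follow the CRF++ command-line convention (`crf_learn template train model` and `crf_test -m model input`). The helper names, the `model` file name, and the `template` file name are hypothetical and not taken from `crf_ner_gui.py`.

```python
import sys
from pathlib import Path


def crf_binary(name: str, crf_dir: str = "crf") -> str:
    """Return the path to a CRF executable, adding .exe on Windows."""
    suffix = ".exe" if sys.platform.startswith("win") else ""
    return str(Path(crf_dir) / f"{name}{suffix}")


def train_command(train_file: str, model_file: str = "model", crf_dir: str = "crf") -> list:
    # Assumed CRF++ usage: crf_learn <template> <train_file> <model_file>
    template = str(Path(crf_dir) / "template")  # assumed template file name
    return [crf_binary("crf_learn", crf_dir), template, train_file, model_file]


def predict_command(input_file: str, model_file: str = "model", crf_dir: str = "crf") -> list:
    # Assumed CRF++ usage: crf_test -m <model_file> <input_file>
    return [crf_binary("crf_test", crf_dir), "-m", model_file, input_file]


# Example (not executed here):
#   import subprocess
#   subprocess.run(train_command("train.txt"), check=True)
#   result = subprocess.run(predict_command("input.txt"),
#                           capture_output=True, text=True, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.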
Run the GUI:
python tag_frequency_gui.py
Steps:
- Paste your word/tag text in the text area
- Enter a target tag (e.g., `nr`)
- Click Plot Frequency to visualize the top 20 words
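The counting behind these steps can be approximated with a short sketch. It assumes whitespace-separated `word/tag` tokens and an exact match on the tag. The function name `top_words_for_tag` is hypothetical, and the actual matching rule in `tag_frequency_gui.py` may differ (for example, it might treat `S-nr` as matching `nr`).

```python
from collections import Counter


def top_words_for_tag(text: str, target_tag: str, n: int = 20) -> list:
    """Return the n most frequent words carrying target_tag, as (word, count) pairs."""
    counts = Counter()
    for token in text.split():
        # Split on the last "/" so a word that itself contains "/" is kept whole.
        word, sep, tag = token.rpartition("/")
        if sep and word and tag == target_tag:
            counts[word] += 1
    return counts.most_common(n)
```

The resulting pairs can then be passed to a bar chart, with words on one axis and counts on the other.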
| File / Folder | Description |
|---|---|
| `crf/` | CRF executables and template file |
| `crf_ner_gui.py` | GUI for training and predicting NER tags |
| `tag_frequency_gui.py` | GUI for plotting tag frequency |
| `MingLiU.ttf` | Optional Chinese font for plotting |
| `README.md` | Project description |
Formatted input for CRF NER GUI:
直 O
至 O
十 O
多 O
年 O
前 O
﹐ O
梁 S-nr
購 O
入 O
兇 O
案 O
現 O
場 O
深 B-nr
水 I-nr
埗 E-nr
北 B-nr
河 I-nr
街 E-nr
十 O
三 O
號 O
二 O
樓 O
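The README does not say how this one-character-per-line file is produced. As a rough sketch, word-level `word/tag` text could be expanded into B/I/E/S character tags as follows. The function name `to_char_lines` and the `entity_tags` parameter are assumptions for illustration only.

```python
def to_char_lines(text: str, entity_tags=frozenset({"nr"})) -> list:
    """Expand word/tag tokens into 'char TAG' lines using the B/I/E/S scheme."""
    lines = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            continue  # skip tokens that are not in word/tag form
        if tag not in entity_tags:
            # Characters outside the chosen entity tags are labeled O.
            lines.extend(f"{ch} O" for ch in word)
        elif len(word) == 1:
            lines.append(f"{word} S-{tag}")  # single-character entity
        else:
            lines.append(f"{word[0]} B-{tag}")  # beginning
            lines.extend(f"{ch} I-{tag}" for ch in word[1:-1])  # inside
            lines.append(f"{word[-1]} E-{tag}")  # end
    return lines
```

Joining the returned lines with newlines gives a file in the same shape as the sample above.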
Sample word/tag text for Tag Frequency Statistics:
梁/S-nr 深/B-nr 水/I-nr 埗/E-nr 北/B-nr 河/I-nr 街/E-nr
Prediction output:
梁/S-nr 深/B-nr 水/I-nr 埗/E-nr 北/B-nr 河/I-nr 街/E-nr
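One way to get from CRF output to this `char/tag` form is sketched below. It assumes CRF++-style output, where each line holds the input columns followed by the predicted tag as the last column, and blank lines separate sentences. The function name `format_predictions` is hypothetical.

```python
def format_predictions(crf_output: str) -> str:
    """Turn column-style CRF output into a single 'char/tag char/tag ...' string."""
    pairs = []
    for line in crf_output.splitlines():
        cols = line.split()
        if len(cols) < 2:
            continue  # blank or incomplete lines carry no prediction
        # First column is the character, last column is the predicted tag.
        pairs.append(f"{cols[0]}/{cols[-1]}")
    return " ".join(pairs)
```

Because the output uses the same `word/tag` shape, it can be pasted directly into the Tag Frequency GUI.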
This project is released under the MIT License.
Created by [Your Name] – email: youremail@example.com