Important
This project is under development. The dictionary is incomplete, features are subject to change, and occasional bugs are expected.
Bijar is a spellchecking tool for the Central Kurdish Wikipedia, delivered as an open-source Flask webservice and a MediaWiki gadget. It is designed to help editors improve article quality by identifying and correcting spelling errors.
The name "Bijar" (بژار) is a Kurdish word for "weeding," reflecting the tool's purpose of cleaning mistakes from text.
The Bijar gadget integrated with Wikipedia's 2010 wikitext editor, showing its options and a list of misspelled words with suggestions.
Screenshot by the project author. Licensed under CC BY-SA 4.0 via Wikimedia Commons.
- Spellcheck Engine: Identifies potential spelling errors in Central Kurdish text.
- Kurdish Morphology: Recognizes complex verb tenses, conjugations, and affixes to improve accuracy.
- Correction Suggestions: Provides a list of suggestions for each identified error.
- Community Dictionary: Allows users to request new words to be added.
- Wikipedia Gadget: Integrates directly into the ckb.wikipedia.org editing interface for eligible users.
- Public Database: Data can be queried directly using Wikimedia's Quarry and Superset tools (database: `s57137__bijar_p`). See, for example, a query for all simple verbs with their stems and properties.
- Public API: Offers endpoints for integration with other applications.
This tool is used as a gadget on the Central Kurdish Wikipedia. To learn how to enable and use it, please read the official documentation on Wikipedia.
Note
The gadget is currently available only in the 2010 wikitext editor.
For eligible users on ckb.wikipedia.org, the tool provides a complete, semi-automatic workflow.
- Activation: An eligible editor enables the Bijar gadget in their MediaWiki preferences.
- Analysis: The user clicks a button in the editor, which sends the article's wikitext to the Bijar backend service.
- Response: The backend analyzes the text, identifies potential errors, and returns a structured list of these errors with correction suggestions.
- Review and Correction: The gadget displays the results in a window below the editor. The user can then interact with this list to make corrections:
  - Clicking a misspelled word in the list automatically finds and selects it in the main editor.
  - A dropdown menu next to each word provides a list of correction suggestions to choose from.
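The correction step can be sketched in a few lines of Python. This is only an illustration of offset-based replacement, not the gadget's actual code: the function name `apply_corrections` and the `(start, end, replacement)` tuple format are assumptions, and the sample text is a placeholder.

```python
def apply_corrections(text, corrections):
    """Apply (start, end, replacement) edits to text.

    Edits are applied from the end of the text toward the beginning,
    so replacing one span never shifts the offsets of spans still pending.
    """
    for start, end, replacement in sorted(corrections, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


# Placeholder text and replacements, chosen only to show the offset handling.
text = "aaa bbb ccc"
fixed = apply_corrections(text, [(0, 3, "A"), (8, 11, "CCCC")])
print(fixed)
```

Working from the last span backwards is what allows replacements of a different length than the original word.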
The official gadget has several features and behaviors:
- Positional Awareness: The gadget identifies words by their start and end positions. If the text is edited manually, these positions can become incorrect. The gadget will show a notification prompting the user to refresh. If the live update option is enabled, it refreshes automatically.
- User Settings: The gadget UI allows users to configure several options:
  - Live Update: If enabled, the spellcheck is triggered automatically after key presses or edits, which keeps results current but increases the number of API requests.
  - Safe Mode: Enabled by default, this mode prevents checking text inside templates, link targets, file names and categories to avoid breaking them. Trusted users can disable it, but should do so with caution.
  - Suggestion Controls: Users can set the maximum number of suggestions (1-10) and the Levenshtein distance (1-3) for finding matches.
  - Other Options: Includes toggles for handling bad words and grouping duplicate errors.
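Why positions go stale after a manual edit can be shown with a minimal Python sketch. It assumes issues carry `word`, `start`, and `end` fields, as in the `check_text_block` response; the function name `find_stale_issues` is hypothetical and the gadget's own detection logic may differ.

```python
def find_stale_issues(text, issues):
    """Return the issues whose offsets no longer point at their word."""
    stale = []
    for issue in issues:
        # A position is still valid only if the slice equals the reported word.
        if text[issue["start"]:issue["end"]] != issue["word"]:
            stale.append(issue)
    return stale


original = "چەم بێ چقەڵ نابێت."
word = "چقەڵ"
start = original.index(word)
issues = [{"word": word, "type": "misspelled", "start": start, "end": start + len(word)}]

# The offsets match the text that was checked, so nothing is stale.
print(find_stale_issues(original, issues))

# Inserting one character at the start shifts every later offset by one.
edited = " " + original
print(len(find_stale_issues(edited, issues)))
```

When the check reports stale issues, the results need to be refreshed before any correction is applied.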
The Bijar webservice provides public API endpoints which can be used in other projects or custom user scripts.
This endpoint returns a JSON object containing a list of suggestions for a given word.
URL: GET https://bijar.toolforge.org/api/get_suggestions
Parameters:
| Parameter | Type | Description |
|---|---|---|
| word | string | Required. The word to check. |
| limit | integer | Optional. Max number of suggestions. Range: 1-10. Default: 5. |
| distance | integer | Optional. Levenshtein distance. Range: 1-3. Default: 2. |
Example Request:
https://bijar.toolforge.org/api/get_suggestions?word=کورشی&limit=10&distance=2
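The request above can also be assembled in code. The following Python sketch builds the URL with the standard library and percent-encodes the Kurdish word. The helper name `build_suggestions_url` is hypothetical, and clamping out-of-range values on the client is an assumption, since the server's handling of such values is not documented here.

```python
from urllib.parse import urlencode

API_URL = "https://bijar.toolforge.org/api/get_suggestions"


def build_suggestions_url(word, limit=5, distance=2):
    """Build a get_suggestions URL, clamping values to the documented ranges."""
    limit = max(1, min(10, int(limit)))       # documented range: 1-10
    distance = max(1, min(3, int(distance)))  # documented range: 1-3
    query = urlencode({"word": word, "limit": limit, "distance": distance})
    return f"{API_URL}?{query}"


url = build_suggestions_url("کورشی", limit=10, distance=2)
print(url)
```

The resulting URL can be passed to any HTTP client, for example `requests.get(url).json()["suggestions"]`.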
Example Response:
{
  "word": "کورشی",
  "distance_used": 2,
  "limit_used": 10,
  "suggestions": [
    "کوردی",
    "کورسی",
    "کورتی",
    "کوشتی",
    "کوێری",
    "کرێشی",
    "کەوشی",
    "کورد",
    "کوردەشی",
    "کورتەشی"
  ]
}

This endpoint analyzes a block of plain text and returns a JSON list of all found issues.
URL: POST https://bijar.toolforge.org/api/check_text_block
Request Body: (Content-Type: application/json)
| Parameter | Type | Description |
|---|---|---|
| text | string | Required. The block of text to be analyzed. |
JavaScript (fetch)
async function checkText(text) {
  const url = 'https://bijar.toolforge.org/api/check_text_block';
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text })
  });
  return response.json();
}
checkText('چەم بێ چقەڵ نابێت.').then(console.log);

Python (requests)
import requests

def check_text(text):
    url = 'https://bijar.toolforge.org/api/check_text_block'
    response = requests.post(url, json={'text': text})
    return response.json()

print(check_text('چەم بێ چقەڵ نابێت.'))

PHP (file_get_contents)
function check_text($text) {
    $url = 'https://bijar.toolforge.org/api/check_text_block';
    $options = [
        'http' => [
            'method' => 'POST',
            'header' => "Content-Type: application/json\r\n" .
                        "User-Agent: Bijar-API-Client\r\n", // A User-Agent is required by Toolforge.
            'content' => json_encode(['text' => $text]),
        ],
    ];
    $context = stream_context_create($options);
    $response = file_get_contents($url, false, $context);
    return json_decode($response, true);
}
print_r(check_text('چەم بێ چقەڵ نابێت.'));

Example Response:
[
  {
    "word": "چقەڵ",
    "type": "misspelled",
    "start": 7,
    "end": 11
  }
]

Notes:
- Each object in the response array represents a single found issue.
- `start` and `end` are the character offsets of the word in the original text.
- The `type` field indicates the nature of the issue (e.g., `misspelled`, `bad`).
- Wikitext Handling: The API analyzes plain text. For best results when checking wiki articles, it is recommended to first mask syntax (templates, links, etc.) on the client side before sending the text. The official ckbwiki gadget is a robust reference for this.
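One way to mask wikitext without invalidating the returned offsets is to replace each syntax span with filler of the same length. The Python sketch below is a deliberately simplified illustration, not the gadget's implementation: the function name `mask_wikitext` and both patterns are assumptions, nested templates are not handled, and only the target part of piped links is masked.

```python
import re

# Simplified patterns: non-nested templates and the target part of piped links.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")
LINK_TARGET = re.compile(r"\[\[[^\[\]|]*\|")


def mask_wikitext(text, filler=" "):
    """Replace syntax spans with filler of equal length so offsets are preserved."""
    def blank(match):
        return filler * len(match.group(0))

    masked = TEMPLATE.sub(blank, text)
    masked = LINK_TARGET.sub(blank, masked)
    return masked


wikitext = "{{Infobox|name=x}} [[Target|label]] text"
masked = mask_wikitext(wikitext)
print(repr(masked))
```

Because the masked string has the same length as the original, the `start` and `end` values returned for the masked text can be applied directly to the unmasked wikitext.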
This repository contains the source code for the webservice (backend). Follow the instructions below to set it up for local development or for production on Toolforge.
Local Development
Before you begin, ensure you have the following software installed on your local machine:
- Git: A version control system. Download Git
- Python: Version 3.10 or newer. Download Python
- MySQL/MariaDB: A local database server (e.g., XAMPP, WAMP, MAMP, or a direct installation).
1. Clone the Repository
git clone https://github.com/KurdishWikipedia/bijar.git

2. Create Virtual Environment & Install Dependencies
Open a terminal in the project's root directory (bijar/) and create a virtual environment inside the www/python/ directory.
python -m venv www/python/venv

Next, activate the environment.
- Windows (Command Prompt): `.\www\python\venv\Scripts\activate.bat`
- Windows (PowerShell): `.\www\python\venv\Scripts\Activate.ps1`
- macOS & Linux (bash/zsh): `source www/python/venv/bin/activate`

Now, install the required packages:
pip install -r requirements.txt

Finally, deactivate the environment:
deactivate

3. Set Up the Database
- Start your local MySQL/MariaDB server.
- Create a new database (e.g., `local_database`). You can do this through your database program's GUI or with a command-line client: `CREATE DATABASE local_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;`
- Database File: The database dump (`.sql` file) is required. To obtain it, please contact the project maintainer on their Wikipedia user talk page. Once you have the file, import it into the database you just created.
4. Configure Environment Variables
The application requires a local .env file for settings and secrets.
Navigate to the application's source directory:
cd www/python/src

Create the file by copying the sample for your operating system:
# On Windows
copy .env.sample .env
# On macOS & Linux
cp .env.sample .env

Open the new .env file in a text editor and follow the instructions inside to add your local configuration.
5. Generate Word Statistics
Note: This script pre-caches word statistics in a JSON file for the home page, preventing a startup timeout on Toolforge and ensuring the application runs efficiently in both local and production environments.
Activate the virtual environment:
- Windows (Command Prompt): `.\www\python\venv\Scripts\activate.bat`
- Windows (PowerShell): `.\www\python\venv\Scripts\Activate.ps1`
- macOS & Linux (bash/zsh): `source www/python/venv/bin/activate`

With your local database server running, execute the script:
python run.py generate_stats.py

After it completes, deactivate the environment:
deactivate

6. Get the Gadget Source Code
To test changes locally, you can use a browser extension like Tampermonkey to inject your local JS/CSS files into live Wikipedia pages.
Important Browser Security Note: When developing locally, the gadget runs on https://ckb.wikipedia.org while your Flask server runs on http://127.0.0.1. Modern browsers block this cross-origin request by default (CORS policy). You may need to temporarily disable web security features in your browser to allow flask-cors to work. This is for local development only and should be handled with care.
7. Run the Application
- Start your database: Ensure your local database server is running.
- Run the development server: From the project's root directory (`bijar/`), execute the appropriate script for your operating system. This will automatically manage the virtual environment and start the Flask server.
  - On Windows (cmd/PowerShell): `.\run.bat`
  - On macOS, Linux, or Git Bash: `./run.sh`
The application will now be running at http://127.0.0.1:5000 and http://localhost:5000.
Toolforge Production
(Replace <username>, <tool_name>, and <database_name> with your credentials.)
1. Connect to Toolforge
ssh <username>@login.toolforge.org
become <tool_name>

2. Clone the Repository
git clone https://github.com/KurdishWikipedia/bijar.git .

3. Create Virtual Environment & Install Dependencies
See the official documentation on Python Virtual Environments and Packages on Toolforge for more details.
toolforge webservice python3.13 shell
mkdir -p $HOME/www/python
python3 -m venv $HOME/www/python/venv
source $HOME/www/python/venv/bin/activate
pip install --upgrade pip wheel
pip install -r $HOME/requirements.txt
exit

4. Set Up the Database on Toolforge
See the Toolforge ToolsDB documentation for more information.
# Connect to MariaDB
sql tools
# Create the database
CREATE DATABASE <database_name>;
# Verify creation
SHOW DATABASES;
# Exit the MariaDB prompt
exit

Upload your local_database.sql file from your computer to your tool's home directory. In a new local terminal:
scp local_database.sql <username>@login.toolforge.org:/data/project/<tool_name>/

From your Toolforge shell, verify the upload and import the data:
# Verify the file exists
ls -l *.sql
# Import the SQL file into your tool's database
mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud <database_name> < /data/project/<tool_name>/local_database.sql
# No output indicates success.

Check that the tables were imported successfully:
sql tools
USE <database_name>;
SHOW TABLES;
# You should see all tables now.
exit

(Optional but recommended) Remove the SQL file after import. Toolforge advises against storing backups permanently on the platform.
cd ~
rm local_database.sql

5. Configure Environment Variables
Create and edit the .env file using the provided sample. The file contains all necessary instructions.
cd www/python/src
cp .env.sample .env
# Edit .env using the instructions inside the file.

Finally, secure the file:
chmod 600 .env

6. Generate Word Statistics
Note: To prevent a startup timeout on Toolforge, this script pre-caches word statistics in a JSON file for the home page.
Activate the virtual environment:
source www/python/venv/bin/activate

Run the script manually to generate the statistics. This process can be slow on Toolforge, depending on the size of the database.
python run.py generate_stats.py

After the script finishes, deactivate the environment:
deactivate

TIP: Since this script's execution is slow on Toolforge, it is more efficient to automate it with a scheduled cron job rather than running it manually.
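The pre-caching idea behind this step can be illustrated with a generic Python sketch: compute the statistics once, write them to a JSON file, and let the web application read that file at startup instead of querying the database. This is not the project's `generate_stats.py`. The function names, the `stats.json` file name, and the `total_words` key are all hypothetical placeholders.

```python
import json
import os
import tempfile


def write_stats_cache(stats, path):
    """Write stats to path atomically so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(stats, handle, ensure_ascii=False)
    # os.replace swaps the finished file into place in a single step.
    os.replace(tmp_path, path)


def read_stats_cache(path, default=None):
    """Load cached stats, falling back to default if the file is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return default


# Demonstration with placeholder values in a temporary directory.
cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "stats.json")
write_stats_cache({"total_words": 3}, cache_path)
loaded = read_stats_cache(cache_path)
print(loaded)
```

Reading a small JSON file is fast regardless of database size, which is why a cache of this kind avoids slow startup queries.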
7. Start the Webservice
toolforge webservice python3.13 start

See the official documentation about backups for details.
Note: Toolforge does not recommend storing backups on the platform permanently.
1. Export: Run this command to create a private SQL dump. The file will be saved in your tool's home ($HOME) directory.
# use umask to make the dump private (use unless the database is public)
toolforge jobs run --command 'umask o-r; ( mariadb-dump --defaults-file=$TOOL_DATA_DIR/replica.my.cnf --host=tools-readonly.db.svc.wikimedia.cloud <database_name> > $TOOL_DATA_DIR/<database_name>-$(date -I).sql )' --image mariadb backup --wait

Verify the file was created using `ls -l *.sql` from the $HOME directory.
2. Download: From your local PC's terminal, use scp to download the file.
scp <username>@login.toolforge.org:/data/project/<tool_name>/<database_name>-YYYY-MM-DD.sql .

The source code for the user interface (the gadget) is hosted directly on the Central Kurdish Wikipedia.
- JavaScript: MediaWiki:Gadget-Bijar.js
- CSS: MediaWiki:Gadget-Bijar.css
- Definition: MediaWiki:Gadgets-definition
The best place to report bugs, request features, or discuss ideas is the project's talk page on ckb.wikipedia.org.
Alternatively, you can open an issue or submit a pull request on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.