jcds is a modular Python library (hosted in the jcds-lib repository) designed to support reproducible workflows in data science and exploratory data analysis (EDA).
It provides a curated collection of functions for inspecting, transforming, and accessing tabular data from local and cloud sources, with particular emphasis on usability within Jupyter notebooks.
📚 This project originated during my time as a graduate student in the MSADS (Master of Science in Applied Data Science) program at the University of San Diego.
I often ran into repetitive tasks — inspecting nulls, handling encodings, wrangling column names, or working with messy CSVs and S3-hosted files — across multiple class and capstone projects.
To address these real-world pain points, I began building jcds as a personal toolkit grounded in DRY (Don't Repeat Yourself) principles — and have been pair programming alongside Generative AI 🤖 to refine and expand it throughout my learning journey.
✅ Compatible with Python 3.7 and above. Developed and tested on Python 3.10.

    pip install git+https://github.com/junclemente/jcds.git
    pip install git+https://github.com/junclemente/jcds.git@v0.2.1
    pip install git+https://github.com/junclemente/jcds.git@develop
    pip install git+https://github.com/junclemente/jcds.git@v0.2.1[aws]

Installs:

- boto3
- botocore
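
After installing, it can help to confirm that the interpreter meets the stated minimum version and that the optional AWS dependencies are importable. The sketch below uses only the standard library. The helper names `python_ok` and `missing_modules` are hypothetical and are not part of jcds.

```python
import importlib.util
import sys

# Minimum version taken from the compatibility note above.
MIN_PYTHON = (3, 7)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    # boto3 and botocore are the packages listed for the [aws] extra.
    print("Missing AWS dependencies:", missing_modules(["boto3", "botocore"]))
```

An empty list from `missing_modules` suggests the AWS extras were installed; any names it returns are the packages still missing.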
Use httpimport directly in Jupyter:

    import httpimport

    with httpimport.github_repo('junclemente', 'jcds', ref='<branch>'):
        import jcds as jcds

You can also import specific submodules:

    import jcds.eda as eda

Example:

    import pandas as pd
    import jcds.eda as jeda

    df = pd.read_csv('data.csv')

See the EDA workflow notebook:
When specifying the ref in httpimport.github_repo():
- `develop`: Actively evolving 🚧
- `0.x.x`: Stable and reproducible ✅

🔒 Recommended: Use a versioned tag for reproducibility.
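
One way to act on this recommendation is to check the ref before importing and warn when it is a branch name. This is a minimal sketch, not part of jcds: `is_pinned_ref` and `load_jcds` are hypothetical helpers, and the pattern assumes release tags look like `0.2.1` or `v0.2.1`. `load_jcds` needs the third-party `httpimport` package and network access.

```python
import re
import warnings

# Assumption: release tags look like "0.2.1" or "v0.2.1".
_VERSION_TAG = re.compile(r"v?\d+\.\d+\.\d+")


def is_pinned_ref(ref):
    """Return True if `ref` looks like a version tag rather than a branch."""
    return _VERSION_TAG.fullmatch(ref) is not None


def load_jcds(ref):
    """Import jcds from GitHub at `ref`, warning when `ref` is not a tag."""
    # Imported lazily so is_pinned_ref() works without httpimport installed.
    import httpimport

    if not is_pinned_ref(ref):
        warnings.warn(
            f"ref {ref!r} is not a version tag; results may not be reproducible"
        )
    with httpimport.github_repo("junclemente", "jcds", ref=ref):
        import jcds
    return jcds
```

In a notebook, `jcds = load_jcds("develop")` would still import the library but emit a warning, while a version tag would import silently.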
Each subpackage has a built-in help() function.

    import jcds
    jcds.help()

    import jcds.eda
    jcds.eda.help()

    import jcds.aws as jaws
    jaws.help()

    jcds.eda.help('dqr_cat')

This project uses pytest.
Run all tests:

    pytest

Run a specific test file:

    pytest tests/unit/test_eda_helpers.py

Measure test coverage:

    pytest --cov=jcds --cov-report=term

Built with:

- Add/update docstrings (Google or NumPy style).
- Update `mkdocs.yml` and related `.md` files.
- Preview locally:

      mkdocs serve

  URL: http://127.0.0.1:8000/

- Deploy to GitHub Pages:

      mkdocs gh-deploy

See CHANGELOG.md for version history and updates.
This project follows the Conventional Commits specification:

| Type | Description | Example |
|---|---|---|
| `feat` | New feature | `feat: add data_info report to jcds.reports` |
| `fix` | Bug fix | `fix: handle NaN values in datetime parser` |
| `chore` | Maintenance | `chore: update Makefile for git-cliff` |
| `docs` | Documentation only | `docs: update README with usage examples` |
| `style` | Formatting, no logic change | `style: reformat eda.py with black` |
| `refactor` | Code refactor | `refactor: simplify logic in show_dupes()` |
| `test` | Add or update tests | `test: add tests for aws.s3_io.read_s3()` |
| `ci` | CI/CD config changes | `ci: update GitHub Actions workflow` |

This convention keeps the commit history consistent and supports release tracking.
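
To illustrate the header format, here is a minimal sketch of a checker for the first line of a commit message. It is not part of jcds or its tooling: `parse_commit_header` is a hypothetical helper, and the accepted types are limited to those in the table above. The pattern follows the general `type(scope)!: description` shape from the Conventional Commits specification.

```python
import re

# Commit types listed in the table above.
COMMIT_TYPES = ("feat", "fix", "chore", "docs", "style", "refactor", "test", "ci")

# type, optional "(scope)", optional "!" for breaking changes, then ": description"
_HEADER = re.compile(
    r"(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)"
)


def parse_commit_header(header):
    """Return (type, scope, description) for a valid header, else None."""
    match = _HEADER.fullmatch(header)
    if match is None or match.group("type") not in COMMIT_TYPES:
        return None
    return match.group("type"), match.group("scope"), match.group("description")
```

A check like this could run in a commit-msg hook or a CI step; headers that return `None` would be rejected before they reach the history.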