📦 jcds (Python Library)

License: MIT

jcds is a modular Python library (hosted in the junclemente/jcds repository) designed to support reproducible workflows in data science and exploratory data analysis (EDA).
It provides a curated collection of functions for inspecting, transforming, and accessing tabular data from local and cloud sources, with particular emphasis on usability within Jupyter notebooks.

📚 This project originated during my time as a graduate student in the MSADS (Master of Science in Applied Data Science) program at the University of San Diego.
I often ran into repetitive tasks — inspecting nulls, handling encodings, wrangling column names, or working with messy CSVs and S3-hosted files — across multiple class and capstone projects.
To address these real-world pain points, I began building jcds as a personal toolkit grounded in DRY (Don't Repeat Yourself) principles — and have been pair programming alongside Generative AI 🤖 to refine and expand it throughout my learning journey.

Compatible with Python 3.7 and above. Developed and tested on Python 3.10.


🔧 How to Use

📥 Install with pip

⚡ Quick Install

pip install git+https://github.com/junclemente/jcds.git

📌 Specific Version

pip install git+https://github.com/junclemente/jcds.git@v0.2.1

🧪 Develop Branch (unstable)

pip install git+https://github.com/junclemente/jcds.git@develop

☁️ Optional: AWS Support

pip install "jcds[aws] @ git+https://github.com/junclemente/jcds.git@v0.2.1"

Installs:

  • boto3
  • botocore
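
The install variants above differ only in the Git ref and the optional extra. As a rough sketch, the helper below assembles the matching pip requirement string. `pip_requirement` and its parameters are hypothetical names used for illustration, and the distribution name `jcds` is an assumption; the extras form uses pip's direct-reference syntax (`name[extra] @ url`).

```python
# Hypothetical helper (not part of jcds): builds the argument for `pip install`.
REPO_URL = "git+https://github.com/junclemente/jcds.git"


def pip_requirement(ref=None, extras=(), dist_name="jcds"):
    """Return a pip requirement string for the repository.

    ref:       branch or tag to pin (None means the default branch).
    extras:    optional extras such as ["aws"].
    dist_name: assumed distribution name; only needed when extras are given.
    """
    url = REPO_URL if ref is None else f"{REPO_URL}@{ref}"
    if not extras:
        return url
    # Direct-reference form: "name[extra1,extra2] @ url"
    return f"{dist_name}[{','.join(extras)}] @ {url}"


print(pip_requirement())                      # quick install
print(pip_requirement(ref="v0.2.1"))          # specific version
print(pip_requirement(ref="develop"))         # develop branch
print(pip_requirement("v0.2.1", ["aws"]))     # with AWS support
```

Quote the printed string when passing it to pip in a shell, since it may contain spaces and square brackets.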

🌐 Import with httpimport

Use httpimport directly in Jupyter:

import httpimport

with httpimport.github_repo('junclemente', 'jcds', ref='<branch>'):
    import jcds

You can also import specific submodules inside the same with block:

with httpimport.github_repo('junclemente', 'jcds', ref='<branch>'):
    import jcds.eda as eda

🧪 Minimal Usage Example

import pandas as pd
import jcds.eda as jeda

df = pd.read_csv('data.csv')  # load a local CSV into a DataFrame
jeda.help()                   # list the EDA helpers available for inspecting df

📓 More Examples

See the EDA workflow notebook:


🌿 Branching Info

When specifying the ref in httpimport.github_repo():

  • develop – Actively evolving 🚧
  • 0.x.x – Stable and reproducible ✅

🔒 Recommended: Use a versioned tag for reproducibility.
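
One way to apply this recommendation in a notebook is to check the ref before importing. The sketch below is illustrative only: `is_versioned_ref` and `import_jcds` are hypothetical helpers, and the pattern assumes tags look like `0.2.1` or `v0.2.1`. It requires httpimport to be installed and network access when `import_jcds` is called.

```python
import re
import warnings

# Assumption: version tags look like "0.2.1" or "v0.2.1".
_VERSION_TAG = re.compile(r"v?\d+\.\d+\.\d+")


def is_versioned_ref(ref):
    """Return True when ref looks like a version tag rather than a branch."""
    return _VERSION_TAG.fullmatch(ref) is not None


def import_jcds(ref):
    """Import jcds from GitHub at the given ref, warning on unpinned refs."""
    if not is_versioned_ref(ref):
        warnings.warn(
            f"ref {ref!r} is not a version tag; results may not be reproducible"
        )
    import httpimport  # third-party; imported lazily so the check above works without it

    with httpimport.github_repo("junclemente", "jcds", ref=ref):
        import jcds
    return jcds


print(is_versioned_ref("v0.2.1"))   # True
print(is_versioned_ref("develop"))  # False
```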


🆘 Help System

Each subpackage has a built-in help() function.

🌍 Global help

import jcds
jcds.help()

📁 Subpackage-specific help

import jcds.eda
jcds.eda.help()

import jcds.aws as jaws
jaws.help()

🔎 Function-level help

jcds.eda.help('dqr_cat')
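
For readers curious how a help system with this shape can work, here is a minimal sketch built on docstrings and a name registry. It is not the jcds implementation; `show_help`, `count_missing`, and `find_duplicates` are hypothetical names chosen for illustration.

```python
import inspect


def count_missing(values):
    """Count entries that are None."""
    return sum(v is None for v in values)


def find_duplicates(values):
    """Return the values that appear more than once, in first-seen order."""
    seen, dupes = set(), []
    for v in values:
        if v in seen and v not in dupes:
            dupes.append(v)
        seen.add(v)
    return dupes


# Registry mapping public function names to the functions themselves.
_REGISTRY = {f.__name__: f for f in (count_missing, find_duplicates)}


def show_help(name=None):
    """Return an overview of all functions, or the details of one."""
    if name is None:
        # One line per function: its name and the first docstring line.
        return "\n".join(
            f"{n}: {inspect.getdoc(f).splitlines()[0]}"
            for n, f in sorted(_REGISTRY.items())
        )
    func = _REGISTRY.get(name)
    if func is None:
        return f"No function named {name!r}. Available: {', '.join(sorted(_REGISTRY))}"
    # Signature followed by the full docstring.
    return f"{name}{inspect.signature(func)}\n{inspect.getdoc(func)}"


print(show_help())
print(show_help("count_missing"))
```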

🧪 Testing

This project uses pytest.

Run all tests:

pytest

Run a specific test file:

pytest tests/unit/test_eda_helpers.py

Measure test coverage:

pytest --cov=jcds --cov-report=term
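
As a rough illustration of what pytest collects, a test module is a file of `test_*` functions containing plain assertions. The function under test here, `normalize_column_name`, is a hypothetical stand-in defined inline so the sketch is self-contained; it is not a jcds function.

```python
# Hypothetical contents of a file such as tests/unit/test_example.py


def normalize_column_name(name):
    """Lowercase a column name and join its words with underscores."""
    return "_".join(name.strip().lower().split())


def test_normalize_strips_and_lowercases():
    assert normalize_column_name("  Total Sales ") == "total_sales"


def test_normalize_collapses_whitespace():
    assert normalize_column_name("Unit   Price") == "unit_price"
```

Running `pytest` in the project root would discover and run both functions, since their names start with `test_`.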

📚 Documentation

📄 jcds documentation

Built with MkDocs.

🔁 Updating the Docs

  1. Add/update docstrings (Google or NumPy style).
  2. Update mkdocs.yml and related .md files.
  3. Preview locally (URL: http://127.0.0.1:8000/):
mkdocs serve
  4. Deploy to GitHub Pages:
mkdocs gh-deploy
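
For step 1, a Google-style docstring looks roughly like the sketch below. `missing_ratio` is a hypothetical function used only to show the section layout (Args, Returns, Raises, Example); it is not part of jcds.

```python
def missing_ratio(values):
    """Return the fraction of entries in ``values`` that are ``None``.

    Args:
        values (list): Items to inspect; ``None`` marks a missing entry.

    Returns:
        float: Share of missing entries, between 0.0 and 1.0.

    Raises:
        ValueError: If ``values`` is empty.

    Example:
        >>> missing_ratio([1, None, 3, None])
        0.5
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)
```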

📝 Changelog

See CHANGELOG.md for version history and updates.


📐 Commit Message Guide

This project follows the Conventional Commits specification:

| Type     | Description                 | Example                                      |
|----------|-----------------------------|----------------------------------------------|
| feat     | New feature                 | feat: add data_info report to jcds.reports   |
| fix      | Bug fix                     | fix: handle NaN values in datetime parser    |
| chore    | Maintenance                 | chore: update Makefile for git-cliff         |
| docs     | Documentation only          | docs: update README with usage examples      |
| style    | Formatting, no logic change | style: reformat eda.py with black            |
| refactor | Code refactor               | refactor: simplify logic in show_dupes()     |
| test     | Add or update tests         | test: add tests for aws.s3_io.read_s3()      |
| ci       | CI/CD config changes        | ci: update GitHub Actions workflow           |

Used for consistent history and release tracking.
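
A commit header in this format can be checked mechanically. The sketch below is a hypothetical validator, not a tool shipped with this project: it accepts the eight types listed here, plus the optional `(scope)` and `!` markers that the Conventional Commits format allows.

```python
import re

# The commit types this project uses.
ALLOWED_TYPES = ("feat", "fix", "chore", "docs", "style", "refactor", "test", "ci")

# <type>[(scope)][!]: <description>
_HEADER = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_commit_header(header):
    """Return the parsed parts of a commit header, or None if it is invalid."""
    match = _HEADER.match(header)
    return match.groupdict() if match else None


print(parse_commit_header("feat: add data_info report to jcds.reports"))
print(parse_commit_header("update readme"))  # None: missing type prefix
```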
