This repository tracks sample datasets and models. Due to size constraints, the actual data and model files are stored externally.
This repository can be installed as a pip package (e.g. pip install https://github.com/RobustIntelligence/ri-public-examples/archive/master.zip).
To pull the data and model(s) for a specific example, run the following from within the top-level directory:
from ri_public_examples.download_files import download_files
download_files('tabular/nyc_tlc', 'nyc_tlc')
This will download the NYC TLC datasets, models, and configs.
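The call above pairs an example path with a second name. As a minimal sketch, assuming the second argument is the local destination directory and that it can simply be the last component of the example path (as in the NYC TLC call), a small wrapper could derive it. `local_dir_for` and `fetch_example` are hypothetical helpers, not part of `ri_public_examples`; the download function is passed in as a parameter so the sketch runs without the package installed.

```python
from pathlib import PurePosixPath
from typing import Callable


def local_dir_for(example_path: str) -> str:
    """Return the last component of an example path.

    Assumption: the local directory is named after the final path
    component, e.g. 'tabular/nyc_tlc' -> 'nyc_tlc'.
    """
    return PurePosixPath(example_path).name


def fetch_example(example_path: str, download: Callable[[str, str], object]) -> str:
    """Call `download(example_path, local_dir)` and return the local directory name.

    `download` is expected to accept the same two positional arguments as the
    `download_files` call shown above (hypothetical wrapper, not repository code).
    """
    local_dir = local_dir_for(example_path)
    download(example_path, local_dir)
    return local_dir
```

With the package installed, `fetch_example('tabular/nyc_tlc', download_files)` would make the same call as the two-line example above.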
Based on the Adult Census Income dataset, this directory contains a basic CatBoost binary classification model, reference set, and evaluation set.
This is a proprietary fraud detection dataset created by Robust Intelligence. This directory contains a basic CatBoost binary classification model, reference set, and evaluation set.
This is based on public NYC Taxi and Limousine Commission data (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). This directory contains a basic CatBoost regression model, reference set, evaluation set, and test set (representing production data).
This is based on the public arXiv dataset. This directory contains an NLP topic classification model, reference set, evaluation set, and test sets representing production data.
This is based on the CARER Emotion Recognition dataset. This directory contains a RoBERTa-based model trained on tweets and used for sentiment analysis. It also contains the reference set and test sets.
This is based on the Animals with Attributes dataset. This directory contains an image classification model, a reference and evaluation set for stress testing, as well as a test set representing production data.