datasets

PyTorch dataset loaders that can be cloned into any project. Currently provides:

- MNIST
- RotatedMNIST
- FashionMNIST
- ClutteredMNIST
- PermutedMNIST
- CIFAR10
- SVHN
- Omniglot
- SVHN Original (non-centered)
- Dataset Operators (see section below)

loader.py

This is the main entry point to the datasets. It expects a parsed argparse object as input with the following members:

- args.task : which dataset to load (e.g. 'mnist')
- args.data_dir : where to save data to / load data from
- args.batch_size : batch size for the train and test loaders
- args.cuda : whether CUDA is in use; needed to pin memory for the data loaders and to create more workers

Note: all simple datasets are auto-downloaded into data_dir if they don't exist there. For larger datasets (e.g. mini-imagenet) the loader expects the data to already exist in data_dir.
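As a rough illustration of the members listed above, the following sketch builds an argparse namespace that carries them. The flag names, defaults, and help strings are assumptions chosen for this example; only the resulting attribute names (task, data_dir, batch_size, cuda) come from the list above.

```python
import argparse

# Build a parser exposing the four members the loader is documented to read.
# Flag names and default values are illustrative assumptions.
parser = argparse.ArgumentParser(description="dataset loader arguments")
parser.add_argument("--task", type=str, default="mnist", help="dataset to load")
parser.add_argument("--data-dir", type=str, default="./data", help="where to save / load data")
parser.add_argument("--batch-size", type=int, default=64, help="batch size for train and test loaders")
parser.add_argument("--cuda", action="store_true", help="pin memory and create more workers")

# An explicit list is parsed here so the sketch runs without a command line;
# a real script would call parser.parse_args() with no arguments.
args = parser.parse_args(["--task", "mnist", "--batch-size", "32"])

print(args.task, args.data_dir, args.batch_size, args.cuda)
```

argparse turns the dashes in --data-dir and --batch-size into underscores, so the namespace ends up with args.data_dir and args.batch_size.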

Dataloader structure

Each dataloader has a .train_loader and a .test_loader that can be used to iterate over the data, e.g.:

from datasets.loader import get_loader

# use argparse here to build args with the required members
# (task, data_dir, batch_size, cuda)

mnist = get_loader(args)
for data, label in mnist.train_loader:
    pass  # do whatever you want with the training data

Since all the loaders operate over images, they also expose the following members:

- loader.img_shp : the size of the image as [C, H, W]
- loader.batch_size : the batch size of the loader
- loader.output_size : dimension of the labels (e.g. 10 for MNIST)
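One common use of these members is sizing a model before any data is read. The sketch below uses a stand-in object with MNIST-like values instead of a real loader, so it runs without downloading anything; the helper names flat_input_size and batch_shape are hypothetical and not part of this repository.

```python
import math
from types import SimpleNamespace

# Stand-in for the object returned by get_loader(args). Only the three
# documented members are mimicked, with assumed MNIST-like values.
loader = SimpleNamespace(img_shp=[1, 28, 28], batch_size=64, output_size=10)


def flat_input_size(loader):
    """Number of values per image once [C, H, W] is flattened."""
    return math.prod(loader.img_shp)


def batch_shape(loader):
    """Shape of one full batch of images as [B, C, H, W]."""
    return [loader.batch_size, *loader.img_shp]


in_features = flat_input_size(loader)   # input width of a first dense layer
out_features = loader.output_size       # width of the final classification layer
print(in_features, out_features, batch_shape(loader))
```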

Dataset Operators

Sequentially Split Datasets

You can get a sequential loader by using the get_sequential_data_loaders function. It takes a dataset and splits it into several datasets, one per class (e.g. MNIST --> 10 datasets, one for each digit).
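To make the idea of a per-class split concrete, here is a minimal sketch on a toy dataset. It is not the implementation of get_sequential_data_loaders, whose signature is not described here; split_by_label is a hypothetical helper that only illustrates the grouping step.

```python
from collections import defaultdict


def split_by_label(samples):
    """Group (data, label) pairs into one list per label, ordered by label."""
    groups = defaultdict(list)
    for data, label in samples:
        groups[label].append((data, label))
    return [groups[label] for label in sorted(groups)]


# Toy dataset: strings stand in for images, integers for class labels.
toy = [("a", 0), ("b", 1), ("c", 0), ("d", 2), ("e", 1)]
subsets = split_by_label(toy)

# Each subset now holds samples of a single class.
for subset in subsets:
    print(subset[0][1], [data for data, _ in subset])
```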

Merged Datasets

Joining dataset names with + in args.task returns a merged dataset, e.g. mnist+fashion returns a mixture of the two datasets. Currently all data is hardcoded to be resized to (32, 32) (to be fixed in the future). This can be used with batch or sequential datasets.
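The + convention can be pictured as a simple split of the task string. The sketch below is an assumption about how such a string might be parsed, not the repository's own parsing code; parse_merged_task and MERGED_SIZE are hypothetical names.

```python
def parse_merged_task(task):
    """Split a '+'-joined task string into individual dataset names."""
    names = [name.strip() for name in task.split("+")]
    if any(not name for name in names):
        raise ValueError(f"empty dataset name in task string: {task!r}")
    return names


# Spatial size that merged datasets are currently resized to, per the text.
MERGED_SIZE = (32, 32)

print(parse_merged_task("mnist+fashion"))
print(parse_merged_task("mnist"))
```

A single name passes through as a one-element list, so the same code path can serve both merged and unmerged tasks.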

Rotated Datasets

Adding the rotated_ prefix to args.task will rotate the entire dataset. This can be used with sequential or normal batch datasets.
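A minimal sketch of detecting the rotation marker in a task string follows. It assumes rotated_ is used as a prefix (e.g. rotated_mnist); parse_rotated_task is a hypothetical helper, and the rotation itself is left out because the angle used by the loader is not stated here.

```python
ROTATED_PREFIX = "rotated_"


def parse_rotated_task(task):
    """Return (is_rotated, base_task) for a task string."""
    if task.startswith(ROTATED_PREFIX):
        # Strip the marker so the remainder names the underlying dataset.
        return True, task[len(ROTATED_PREFIX):]
    return False, task


print(parse_rotated_task("rotated_mnist"))
print(parse_rotated_task("mnist"))
```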

Dataset Transformations

The utils.py file houses dataset transformations such as bw_2_rgb and resize. The bw_2_rgb transform converts black and white images to RGB, and the resize transform resizes all images in the dataset (particularly useful if you are merging two datasets).
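To show what these two kinds of transformation do to an image, here is a dependency-free sketch on nested lists laid out as [C][H][W]. It is not the code in utils.py: gray_to_rgb and resize_nearest are hypothetical stand-ins, and nearest-neighbour interpolation is an assumption made to keep the example short.

```python
def gray_to_rgb(img):
    """Replicate a single-channel [1][H][W] image into three channels."""
    if len(img) != 1:
        raise ValueError("expected exactly one channel")
    # Copy each row so the three channels do not share the same lists.
    return [[row[:] for row in img[0]] for _ in range(3)]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a [C][H][W] image to [C][out_h][out_w]."""
    in_h, in_w = len(img[0]), len(img[0][0])
    return [
        [
            [chan[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)
        ]
        for chan in img
    ]


gray = [[[0, 1], [2, 3]]]          # one channel, 2x2
rgb = gray_to_rgb(gray)            # three identical channels, 2x2
big = resize_nearest(rgb, 4, 4)    # three channels, 4x4
print(len(rgb), len(big), len(big[0]), len(big[0][0]))
```

Bringing every image to the same channel count and spatial size is what lets samples from two different datasets be stacked into one batch.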
