This is a simple Lua wrapper for Python's dbcollection module. The functionality is almost the same, apart from a few minor differences related to Lua, namely how ranges are set up when fetching data.
Internally it calls Python's dbcollection module to download, process, and manage data. Lua/Torch7 then interacts solely with the metadata HDF5 file to fetch data from disk.
This package requires:
- Python's dbcollection package installed.
- Torch7
- json
- hdf5
- argcheck
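To see which of the Lua dependencies listed above are already available, a small check along these lines can help. This is a minimal sketch: the helper name `find_missing` is hypothetical, and it assumes each module can be loaded with `require` under the same name as its package.

```lua
-- Lua dependencies listed above; the module names passed to require()
-- are assumed to match the package names.
local REQUIRED = {'torch', 'json', 'hdf5', 'argcheck'}

-- Return the names from `names` that cannot be loaded with require().
local function find_missing(names)
    local missing = {}
    for _, name in ipairs(names) do
        -- pcall() catches the error raised by require() for a missing module.
        local ok = pcall(require, name)
        if not ok then
            missing[#missing + 1] = name
        end
    end
    return missing
end

local missing = find_missing(REQUIRED)
if #missing == 0 then
    print('All Lua dependencies are available.')
else
    print('Missing: ' .. table.concat(missing, ', '))
end
```

Any name reported as missing can then be installed with luarocks.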
To install Torch7 just follow the steps listed here.
The other packages should come pre-installed with Torch7. If they are missing, install them with luarocks:
luarocks install json
luarocks install hdf5
luarocks install argcheck

To install dbcollection's Lua/Torch7 API, you must first have the Python version installed on your system. If it is not installed yet, you can install it via pip, conda, or from source. Here we use pip:
$ pip install dbcollection==0.2.6

Once the Python version is installed, clone the Lua/Torch7 API repository:
$ git clone https://github.com/dbcollection/dbcollection-torch7

Then install the package via luarocks:
$ cd dbcollection-torch7/ && luarocks make rocks/*

This package follows the same API as the Python version. Once installed, simply require dbcollection to use it:
>>> dbc = require 'dbcollection'

Then, just as with the Python version, load a dataset with:
>>> mnist = dbc.load('mnist')

You can also select a specific task for a dataset with the task option:
>>> mnist = dbc.load{name='mnist', task='classification'}

This API lets you download and extract the data of most datasets directly from their source to disk. To do so, use the download() method:
>>> dbc.download{name='cifar10', data_dir='home/some/dir'}

Once a dataset has been loaded, you can retrieve data either with Torch7's HDF5 API or with the methods this package provides for reading from the .h5 metadata file.
For example, to retrieve an image and its label from the MNIST dataset using Torch7's HDF5 API:
>>> images_ptr = mnist.file:read('default/train/images')
>>> img = images_ptr:partial({1,1}, {1,32}, {1,32}, {1,3})
>>> labels_ptr = mnist.file:read('default/train/labels')
>>> label = labels_ptr:partial({1,1})

Note that partial() takes one {first, last} range per dimension of the stored dataset, so adjust the ranges if the stored image shape differs from the 32x32x3 layout assumed here.

Or you can use the API provided by this package:
>>> img = mnist:get('train', 'images', 1)
>>> label = mnist:get('train', 'labels', 1)

For a more detailed view of the Lua API documentation, see here.
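Since the main difference from the Python version concerns ranges, a small conversion helper may make the mapping concrete. This is a sketch under stated assumptions: the helper `to_lua_range` is hypothetical and not part of dbcollection, and it assumes the Python-side range is 0-based and excludes its end (as Python slices do), while Lua/Torch7 ranges are 1-based and include both ends.

```lua
-- Convert a 0-based, end-exclusive range [start, stop) into the 1-based,
-- inclusive {first, last} pair used for Lua/Torch7 indexing.
local function to_lua_range(start, stop)
    assert(start >= 0, 'start must be >= 0')
    assert(stop > start, 'stop must be greater than start')
    -- Shifting start by one moves to 1-based indexing; the exclusive stop
    -- already equals the inclusive 1-based last index.
    return {start + 1, stop}
end

local r = to_lua_range(0, 10)   -- the first ten elements
print(r[1], r[2])

-- Hypothetical use with the handle from the example above
-- (assumes the labels dataset is one-dimensional):
-- local labels = labels_ptr:partial(to_lua_range(0, 10))
```

Keeping the conversion in one place avoids off-by-one mistakes when porting index arithmetic from Python code.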