The two Jupyter notebooks that I used for my talk at PyData London are here. They serve as references for the code to interact with Riak TS through Python and PySpark. The table-creation notebook is also included if people want to run the entire set for themselves. The load script is a standalone Python script. On my VM (8 GB RAM, single processor) it takes around 48 minutes to complete, as I have not built in parallel processing of any description, because I am lazy and this is a demo! There should be no issue with the datetime objects; if you experience one, please contact me and I will investigate.
If people want to recreate the entire demo themselves, they need to do the following:
- Request the data file
- Install Riak TS - there is excellent documentation on how to do this on the Basho website.
- Install the Basho Riak Python library ("pip install riak").
- Create the relevant table using the notebook "Create table aarhus13-4ts1.3". PLEASE remember to follow the instructions noted after the relevant cells to change the replication factor, or performance on a single node will suffer (a connection and table-creation sketch follows this list).
- Run the "load-data-ts13.py" script to load the raw data
- Run the "PyData Querying examples notebook".
- If you want to explore PySpark and Riak:
a. Install Apache Spark 1.6 or above and get it working (God help you!)
b. Download the Riak Spark connector (see the website above for the download link; this is very easy to do) and remember you must start Jupyter as follows:
"SPARK_CLASSPATH=/path/to/where/you/put/the/connector.jar jupyter notebook" - you will also want to install the Python findspark module for the PySpark notebook (a PySpark sketch follows this list).
If anyone wants the dataset to run the notebooks, please contact me at setheridge@basho.com. The CSV file is 120 MB zipped, so be warned that we will have to be artistic about transferring it.