MCloud

Code for the MCloud project

Workflow:

Get data through the Open Data API
Analyze the data (through schema matching, etc.)
Recommend computing jobs

==========

ny_dump.py dumps the metadata of all the data sets on the NY Open Data site and writes each data set as a separate line in ny_dump (the file is overwritten each time the script is run). While dumping, the script also prints the result (success or failure code) for each data set. Each line contains the following info, in order:

1. publishing agency
2. data set name
3. data set description
4. data set category (the mapping is shown below)
5. data set url id (each data set has a unique id for access)
6. field names (basically, the "column names")
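As a rough illustration of the six-part record described above, one line of ny_dump could be parsed along these lines. This is only a sketch: the tab separator between parts, the comma separator between field names, the numeric storage of the category, and the names `DatasetRecord` and `parse_line` are all assumptions made for the example, not taken from ny_dump.py.

```python
from typing import List, NamedTuple


class DatasetRecord(NamedTuple):
    """One data set, in the order the parts appear on a ny_dump line."""
    agency: str
    name: str
    description: str
    category: int
    url_id: str
    fields: List[str]


def parse_line(line: str, sep: str = "\t", field_sep: str = ",") -> DatasetRecord:
    # ASSUMPTION: parts are separated by `sep` and field names by `field_sep`;
    # adjust both to whatever ny_dump.py actually writes.
    parts = line.rstrip("\n").split(sep, 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, got {len(parts)}")
    agency, name, description, category, url_id, raw_fields = parts
    fields = [f.strip() for f in raw_fields.split(field_sep) if f.strip()]
    # ASSUMPTION: the category is stored as its numeric id.
    return DatasetRecord(agency, name, description, int(category), url_id, fields)


# Made-up sample line, for illustration only.
sample = "Parks Dept\tPark Hours\tOpening hours\t1\tabcd-1234\tname, open, close"
record = parse_line(sample)
print(record.name, record.category, record.fields)
```

If the real file uses a different layout, only the two separator arguments and the `int(category)` conversion should need to change.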

ny_dump is the source input for all the demo and analysis code.

==========

Category mapping: 'Recreation': 1, 'Transportation': 2, 'Business': 3, 'Public-Safety': 4, 'Social-Services': 5, 'Environment': 6, 'Health': 7, 'City-Government': 8, 'Education': 9, 'Housing-Development': 10
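For reference, the mapping above can be written as a Python dictionary with a reverse lookup for turning an id back into a category name. The names `CATEGORY_TO_ID` and `ID_TO_CATEGORY` are chosen for this example and may not match the ones used in the scripts.

```python
# Category name -> numeric id, copied from the mapping above.
CATEGORY_TO_ID = {
    'Recreation': 1,
    'Transportation': 2,
    'Business': 3,
    'Public-Safety': 4,
    'Social-Services': 5,
    'Environment': 6,
    'Health': 7,
    'City-Government': 8,
    'Education': 9,
    'Housing-Development': 10,
}

# Reverse lookup: numeric id -> category name.
ID_TO_CATEGORY = {cat_id: name for name, cat_id in CATEGORY_TO_ID.items()}

print(ID_TO_CATEGORY[4])
```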

==========

api.py shows a one-off result of the schema matching: one random data set from the NY Open Data set list is chosen as the target and matched against the remaining data sets. The output is the 5 best-matching data sets, each printed with its distance (used as a similarity score; lower is closer) and the details of the matched data set.
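The idea of ranking data sets by a distance over their field names can be sketched as follows. The README does not say which distance api.py uses, so this example substitutes the Jaccard distance between the sets of field names as a stand-in; `jaccard_distance`, `closest`, and the sample data are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over case-insensitive field names."""
    set_a = {x.strip().lower() for x in a}
    set_b = {x.strip().lower() for x in b}
    union = set_a | set_b
    if not union:
        return 0.0  # two empty schemas are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def closest(target_fields: List[str],
            candidates: Dict[str, List[str]],
            k: int = 5) -> List[Tuple[float, str]]:
    """Return the k candidates with the smallest distance to the target."""
    scored = [(jaccard_distance(target_fields, fields), name)
              for name, fields in candidates.items()]
    return heapq.nsmallest(k, scored)


# Made-up candidates, for illustration only.
candidates = {
    "parks": ["id", "name"],
    "events": ["id", "name", "date"],
    "permits": ["permit_no"],
}
for distance, name in closest(["ID", "Name"], candidates, k=2):
    print(f"{name}: {distance:.3f}")
```

As in the description above, a lower distance means a closer match, so an exact schema match scores 0.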

==========

clf.py runs the schema matching, analyzes its accuracy, and outputs a confusion matrix.
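A confusion matrix and accuracy figure can be computed from paired true and predicted labels as in the sketch below. What clf.py actually predicts is not stated here; the example assumes the labels are category ids, and `confusion_matrix` and `accuracy` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(true_labels: Sequence[int],
                     predicted_labels: Sequence[int],
                     labels: Sequence[int]) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels, for illustration only (ASSUMPTION: labels are category ids).
true = [1, 1, 2, 2]
pred = [1, 2, 2, 2]
for row in confusion_matrix(true, pred, labels=[1, 2]):
    print(row)
print(accuracy(true, pred))
```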

==========

demo.py runs interactively to show the schema matching results. The user is presented with the info of 10 randomly picked data sets and is asked to input several column names (or fields) of their own data set. The script picks the most similar of the 10 candidates and outputs the changes needed for the user's columns to match the target data set perfectly.
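One simple way to express "the changes needed" is as two set differences: the columns the user would have to add and the columns they would have to drop. The sketch below takes that approach under the assumption that column names are compared case-insensitively; `column_changes` is a hypothetical name, and demo.py may report changes differently (for example as renames).

```python
from typing import Dict, Iterable, List


def column_changes(user_fields: Iterable[str],
                   target_fields: Iterable[str]) -> Dict[str, List[str]]:
    """Return the columns to add and remove so the user's set equals the target's."""
    user = {f.strip().lower() for f in user_fields}
    target = {f.strip().lower() for f in target_fields}
    return {
        "add": sorted(target - user),     # in the target, missing from the user's data
        "remove": sorted(user - target),  # in the user's data, absent from the target
    }


# Made-up column names, for illustration only.
print(column_changes(["Name", "zip"], ["name", "address"]))
```

When both lists come back empty, the user's columns already match the target.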
