Data subsetter

Uses MySQL, Bash and Python to create a new database, named subset, that contains a subset of the fictitious employee data from the employees database.

The size of the subset is measured by the number of employees that are copied across. The size is read from the command line.
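
As an illustration of reading the size from the command line, here is a minimal sketch using Python's standard `argparse` module. The names `positive_int` and `parse_size` are hypothetical and are not taken from `subsetter.py`; the script itself may validate its argument differently.

```python
import argparse


def positive_int(text):
    """Convert a command-line token to a positive integer."""
    value = int(text)  # a ValueError here is reported by argparse as an invalid value
    if value <= 0:
        raise argparse.ArgumentTypeError("N must be a positive integer")
    return value


def parse_size(argv):
    """Return the requested subset size from a list of command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Copy N employees into the subset database."
    )
    parser.add_argument(
        "n", type=positive_int, metavar="N", help="number of employees to copy"
    )
    return parser.parse_args(argv).n


if __name__ == "__main__":
    import sys

    print(parse_size(sys.argv[1:]))
```

Rejecting zero and negative values up front means a bad argument fails before any database work starts.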

Click here for an entity-relationship diagram of the employees database.

Author

The files in the root directory were created by Brendon Kwan.

The files in the test_db directory were created by Giuseppe Maxia (datacharmer).

Where the data comes from

The fictitious data was taken from the GitHub repository named datacharmer/test_db, which was distributed under the Creative Commons Attribution-Share Alike 3.0 Unported License.

The employee data consists of about 300,000 employee records with 2.8 million salary entries.

Prerequisites

Install MySQL.
Install Python 3.
Install the Python package named 'MySQL Connector/Python'.
```
 $ pip install mysql-connector-python
```
Create a MySQL configuration file at ~/.my.cnf containing the following text:
```
 [client]
 user=<USERNAME>
 password=<PASSWORD>
```
Replace <USERNAME> and <PASSWORD> with your MySQL username and password, respectively.
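
To confirm that the configuration file is well formed before running anything, a rough check along the following lines may help. It is only a sketch: it uses Python's standard `configparser`, which reads simple option files like the one above but does not understand every MySQL option-file feature (for example `!include` directives). The function name `read_client_credentials` is hypothetical.

```python
import configparser
from pathlib import Path


def read_client_credentials(path=None):
    """Return (user, password) from the [client] section of a MySQL option file."""
    if path is None:
        path = Path.home() / ".my.cnf"
    # interpolation=None keeps '%' characters in passwords literal;
    # allow_no_value=True accepts bare options such as 'quick'.
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    if not parser.read(path):
        raise FileNotFoundError(f"cannot read {path}")
    client = parser["client"]  # raises KeyError if the section is missing
    return client["user"], client["password"]
```

A `KeyError` from this function points to a missing `[client]` section or a missing `user` or `password` entry.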

Installation:

Download the repository.
Change directory to the repository.

Create and test the employees database. If the database exists, it will be recreated.

```
 $ cd test_db
 $ mysql < employees.sql
 $ mysql --table < test_employees_md5.sql
```

Create and test an empty database named subset with the same structure as the employees database.
```
 $ cd ..
 $ mysql < create_empty_subset.sql
 $ mysql --table < test_subset_empty.sql
```
Copy the given number of employees, N, from the employees database to the subset database.
```
 $ ./subsetter.py N
```
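
One way such a copy could work is sketched below. This is not necessarily how `subsetter.py` does it: the selection rule (lowest `emp_no` first) and the function name `build_copy_statements` are assumptions made for illustration. The table and column names come from the datacharmer/test_db schema. The function only builds SQL strings, so it can be inspected without a database connection.

```python
# Tables in the employees schema whose rows reference employees.emp_no.
CHILD_TABLES = ("dept_emp", "dept_manager", "titles", "salaries")


def build_copy_statements(n, source="employees", target="subset"):
    """Return SQL statements that copy n employees and their dependent rows."""
    n = int(n)
    if n <= 0:
        raise ValueError("n must be positive")
    statements = [
        # departments is small, so copy it whole to satisfy foreign keys.
        f"INSERT INTO {target}.departments SELECT * FROM {source}.departments",
        # Assumption: take the n employees with the lowest emp_no (repeatable).
        f"INSERT INTO {target}.employees "
        f"SELECT * FROM {source}.employees ORDER BY emp_no LIMIT {n}",
    ]
    for table in CHILD_TABLES:
        # Copy only the rows whose employee was already copied to the target.
        statements.append(
            f"INSERT INTO {target}.{table} "
            f"SELECT c.* FROM {source}.{table} AS c "
            f"JOIN {target}.employees AS e ON e.emp_no = c.emp_no"
        )
    return statements


if __name__ == "__main__":
    for statement in build_copy_statements(10):
        print(statement + ";")
```

Parent tables (departments, employees) are filled before the child tables so that foreign key constraints hold at every step.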

LICENSE

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
