Uses MySQL, Bash and Python to create a new database, named subset,
that contains a subset of the fictitious employee data from the employees
database.
The size of the subset is measured by the number of employees that are copied across. The size is read from the command line.
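As a rough illustration of reading the subset size from the command line, the sketch below parses a single positive integer with Python's `argparse`. The names `positive_int` and `parse_subset_size`, and the rule that N must be at least 1, are assumptions made for this example; the actual subsetter.py may handle its argument differently.

```python
import argparse


def positive_int(text):
    """Convert a command-line string to a positive integer (hypothetical helper)."""
    value = int(text)  # a ValueError here is caught and reported by argparse
    if value < 1:
        raise argparse.ArgumentTypeError("N must be at least 1")
    return value


def parse_subset_size(argv):
    """Return the number of employees to copy, taken from the argument list."""
    parser = argparse.ArgumentParser(
        description="Copy N employees from the employees database to subset."
    )
    parser.add_argument("n", metavar="N", type=positive_int,
                        help="number of employees to copy")
    return parser.parse_args(argv).n


# Example: the argument list a shell would pass for "./subsetter.py 100".
print(parse_subset_size(["100"]))
```

Validating the value before any database work starts means a mistyped size fails immediately with a usage message instead of partway through a copy.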
Click here
for an entity-relationship diagram of the employees database.
The files in the root directory were created by Brendon Kwan.
The files in the test_db directory were created by
Giuseppe Maxia (datacharmer).
The fictitious data was taken from the GitHub repository named
datacharmer/test_db, which was
distributed under the
Creative Commons Attribution-Share Alike 3.0 Unported License.
The employee data consists of about 300,000 employee records with 2.8 million salary entries.
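To get a feel for how large a subset might be, the sketch below scales the approximate totals quoted above to a given number of employees. It assumes salary entries are spread evenly across employees, which is only an approximation, and the function name `estimate_salary_rows` is hypothetical.

```python
# Approximate totals quoted in this README for the full employees database.
TOTAL_EMPLOYEES = 300_000
TOTAL_SALARY_ROWS = 2_800_000


def estimate_salary_rows(n_employees):
    """Estimate salary rows for n_employees, assuming the average ratio holds."""
    # Roughly 9.3 salary entries per employee on average.
    return round(n_employees * TOTAL_SALARY_ROWS / TOTAL_EMPLOYEES)


print(estimate_salary_rows(1000))
```

The real count for a particular subset depends on which employees are selected, so treat the result as an order-of-magnitude guide.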
- Install MySQL.
- Install Python 3.
- Install the Python package named 'MySQL Connector/Python'.

      $ pip install mysql-connector-python

- Create a MySQL configuration file at ~/.my.cnf containing the following text:

      [client]
      user=<USERNAME>
      password=<PASSWORD>

  where <USERNAME> and <PASSWORD> are replaced with your MySQL database username and password, respectively.
- Download the repository.
- Change directory to the repository.
- Create and test the employees database. If the database already exists, it will be recreated.

      $ cd test_db
      $ mysql < employees.sql
      $ mysql --table < test_employees_md5.sql

- Create and test an empty database named subset with the same structure as the employees database.

      $ cd ..
      $ ./create_empty_subset.sh
      $ mysql --table < test_subset_empty.sql

- Copy the given number of employees, N, from the employees database to the subset database.

      $ ./subsetter.py N
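The copy step can be pictured as a series of `INSERT ... SELECT` statements. The sketch below only builds those statements as strings and is not taken from subsetter.py. It assumes the table names of the datacharmer/test_db schema, that the N employees with the lowest `emp_no` are chosen, and that the small `departments` table is copied whole. The function name `build_copy_statements` is hypothetical.

```python
# Tables in the employees schema that reference employees through emp_no.
CHILD_TABLES = ["dept_emp", "dept_manager", "titles", "salaries"]


def build_copy_statements(n, source="employees", target="subset"):
    """Return SQL statements that copy n employees and their dependent rows."""
    n = int(n)  # force an integer so the LIMIT clause cannot carry other SQL
    if n < 1:
        raise ValueError("n must be at least 1")
    statements = [
        # Assumption: departments has no emp_no column, so it is copied whole.
        f"INSERT INTO {target}.departments SELECT * FROM {source}.departments",
        # Assumption: select the n employees with the lowest emp_no.
        f"INSERT INTO {target}.employees "
        f"SELECT * FROM {source}.employees ORDER BY emp_no LIMIT {n}",
    ]
    # Parent tables come first so foreign keys in the child tables resolve.
    for table in CHILD_TABLES:
        statements.append(
            f"INSERT INTO {target}.{table} "
            f"SELECT * FROM {source}.{table} "
            f"WHERE emp_no IN (SELECT emp_no FROM {target}.employees)"
        )
    return statements


for statement in build_copy_statements(10):
    print(statement + ";")
```

To run the statements, each one could be passed to `cursor.execute()` on a MySQL Connector/Python connection opened with the credentials in ~/.my.cnf, followed by a commit.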
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.