This is the final project for the course Data Management and Programming. You can access the data that is not in the repository through the link provided in this README file.
The raw PISA datasets are large and complex, and they exceed the GitHub file size limit (100 MB). Therefore, they cannot be included in this repository.
To reproduce my work, please download the following files and save them in a folder named "data" in your own repository, together with the other data provided in this repository. The data cleaning process can be reproduced using the QMD file 'Data_Clean'.
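Before running 'Data_Clean', it can help to confirm that the "data" folder exists and that the downloaded files are in it. The following is a minimal sketch in base R, assuming the working directory is the repository root and that the raw files use the `.sav` extension; it is not part of the project code.

```r
# Assumption: the working directory is the repository root.
data_dir <- "data"

# Create the folder if it is missing.
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# List any SPSS (.sav) files already saved in the folder.
sav_files <- list.files(data_dir, pattern = "\\.sav$", ignore.case = TRUE)

if (length(sav_files) == 0) {
  message("No .sav files found in '", data_dir, "'. Download the raw PISA files first.")
} else {
  message("Found ", length(sav_files), " .sav file(s):")
  print(sav_files)
}
```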
The data is downloaded from the official PISA website. PISA only provides files in formats that can be processed by SAS and SPSS software. To use the data in R, you will need to process it with the 'Data_Clean' code. PISA website: PISA Data Files
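As an illustration of how an SPSS file can be read into R, here is a short sketch using `read_sav()` from the haven package. This is an assumption about one possible approach, not necessarily what 'Data_Clean' does, and the file name `example_pisa_file.sav` is a hypothetical placeholder for whichever file you downloaded.

```r
# install.packages("haven")  # run once if the package is not installed
library(haven)

# Hypothetical file name: replace it with the actual .sav file in "data".
sav_path <- file.path("data", "example_pisa_file.sav")

if (file.exists(sav_path)) {
  # read_sav() returns a tibble; SPSS value labels are kept as labelled columns.
  pisa_raw <- read_sav(sav_path)

  # Save in R's native format so later steps do not need to re-read the .sav file.
  saveRDS(pisa_raw, file.path("data", "example_pisa_file.rds"))
} else {
  message("File not found: ", sav_path)
}
```

The raw files are large, so reading them may take some time and memory; `read_sav()` also accepts a `col_select` argument if only some variables are needed.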
For your convenience, I have also provided the PISA data that I have already downloaded and cleaned. In the Dropbox link, 'Pisa_Raw' contains all the original data in the format provided by PISA, which is the SAV format. The 'Pisa_clean' folder contains files that can be used directly in RStudio. I have done basic variable cleaning, keeping only the variables that may be used in the subsequent research.