This is the final project for the course Advanced Applied Statistics. The project analyzes the IMDB dataset, which contains key information on more than 3,600 movies. We set the following three goals:
- Gain a thorough understanding of the dataset.
- Build a model to predict the IMDB score of a movie based on several variables.
- Build a model to determine whether a movie is profitable, or in other words, worth investing in.
To achieve these goals, we first perform a thorough exploratory data analysis, then fit several machine learning models, including linear regression, support vector machines, and random forests. The branch structure is listed below:
- Code: all the code used in this project.
- Data: the main dataset is contained in "movie.csv".
- Proposal: the project proposal.
- Report: the final report.
- Shiny: the code for building the R Shiny app.
- Slides: the slides for presentation.
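
As a rough illustration of the profitability goal above, the sketch below shows one way a binary "profitable" label could be derived before fitting a classifier. It is written in Python for brevity and is not the project's own code. The column names `movie_title`, `budget`, and `gross`, the rule "gross greater than budget", and the sample rows are all assumptions made for this example; the actual definition used in the project is in the Code and Report folders.

```python
import csv
import io

# Hypothetical rows standing in for "movie.csv". The column names
# (movie_title, budget, gross) and the values are assumptions for illustration.
SAMPLE_CSV = """movie_title,budget,gross
Example A,100,250
Example B,200,150
Example C,50,
"""


def label_profitable(rows):
    """Return (title, is_profitable) pairs, skipping rows with missing values.

    Assumed rule: a movie counts as profitable when gross exceeds budget.
    """
    labeled = []
    for row in rows:
        if not row["budget"] or not row["gross"]:
            continue  # skip rows where budget or gross is missing
        is_profitable = float(row["gross"]) > float(row["budget"])
        labeled.append((row["movie_title"], is_profitable))
    return labeled


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labels = label_profitable(rows)
print(labels)
```

To try the same rule on the real data, the in-memory sample could be replaced with `open("movie.csv", newline="")`, adjusting the column names to match the file.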
I sincerely hope you enjoy my work.