Analyze the MovieLens dataset to identify popular genres using Hive on Hadoop in the Cloudera QuickStart VM, and extract insights for streaming platforms such as Netflix or Amazon Prime Video.
- Apache Hive
- Hadoop HDFS
- Cloudera Quickstart VM
- Linux Shell
- Excel / Matplotlib for visualization
- Genre-wise popularity using Hive `explode` and `split`
- Data stored and queried in HDFS
- Business insights for recommendation engines
- Visualization charts
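MovieLens stores all of a film's genres in one pipe-separated `genres` field (for example `Adventure|Comedy`), so a genre-wise count must first split that field and explode it into one row per genre before grouping. In Hive this is typically written as `LATERAL VIEW explode(split(genres, '\\|')) t AS genre` followed by `GROUP BY genre`. The sketch below mirrors the same split, explode and count logic in plain Python so it can be checked without a cluster. It assumes the standard `movies.csv` header `movieId,title,genres` and is an illustration, not the project's own script.

```python
from __future__ import annotations

import csv
import io
from collections import Counter


def genre_counts(csv_text: str) -> list[tuple[str, int]]:
    """Count movies per genre, mirroring Hive's split + explode + GROUP BY.

    Assumes the MovieLens movies.csv layout (movieId,title,genres) with a
    pipe-separated genres column, e.g. "Adventure|Comedy".
    """
    counts: Counter[str] = Counter()
    # DictReader honours quoted titles that contain commas.
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # split() turns one field into a list; iterating it plays the role of explode().
        for genre in row["genres"].split("|"):
            counts[genre] += 1
    # Like ORDER BY cnt DESC, with ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = (
        "movieId,title,genres\n"
        "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
        "2,Jumanji (1995),Adventure|Children|Fantasy\n"
        '11,"American President, The (1995)",Comedy|Drama|Romance\n'
    )
    for genre, n in genre_counts(sample):
        print(f"{genre}\t{n}")
```

A Hive table declared with plain `FIELDS TERMINATED BY ','` would mis-split quoted titles such as `"American President, The (1995)"`; `csv.DictReader` handles that quoting here.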
- `datasets/`: Input CSV files
- `Hive_Queries/`: All Hive scripts
- `visualizations/`: Graphs generated from the query output, and Hive CLI output proofs
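The graphs are generated from query output. As a hedged sketch, assume the genre query's result was redirected to a file, for example with `hive -S -f query.hql > genre_counts.tsv`, so each row is a tab-separated `genre<TAB>count` line (the Hive CLI separates columns with tabs). The snippet parses that text and draws a bar chart with Matplotlib. The file names `query.hql`, `genre_counts.tsv` and `genre_popularity.png` are hypothetical.

```python
from __future__ import annotations


def parse_hive_output(text: str) -> list[tuple[str, int]]:
    """Parse tab-separated 'genre<TAB>count' rows from Hive CLI output.

    Lines that do not split into exactly two fields, or whose second field
    is not an integer (such as a printed header row), are skipped.
    """
    rows: list[tuple[str, int]] = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        genre, count = parts[0].strip(), parts[1].strip()
        if not count.isdigit():
            continue
        rows.append((genre, int(count)))
    return rows


def plot_genres(rows: list[tuple[str, int]], out_png: str) -> None:
    """Save a horizontal bar chart of genre counts to out_png."""
    import matplotlib

    matplotlib.use("Agg")  # render straight to a file; no display needed in the VM
    import matplotlib.pyplot as plt

    genres = [g for g, _ in rows]
    counts = [c for _, c in rows]
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.barh(genres, counts)
    ax.invert_yaxis()  # first row (most popular, if the query sorted DESC) on top
    ax.set_xlabel("Number of movies")
    ax.set_title("Movies per genre (MovieLens)")
    fig.tight_layout()
    fig.savefig(out_png)
    plt.close(fig)
```

Usage, with the hypothetical paths above: `plot_genres(parse_hive_output(open("genre_counts.tsv").read()), "visualizations/genre_popularity.png")`. Matplotlib is imported inside `plot_genres` so the parser can be used on its own.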
See Movie analytics.docx for the full write-up.
- Set up the Cloudera VM
- Load `movies.csv` into HDFS
- Create an external Hive table
- Run the queries from `hive_queries/`
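The load, create and query steps can be scripted. The sketch below is an assumption, not the project's own tooling: it builds the `hdfs dfs` and `hive -f` command lines, writes an external-table DDL, and either only prints the commands (`dry_run=True`) or also runs them. The HDFS directory `/user/cloudera/movielens/movies`, the table name `movies`, the file `create_movies.hql`, and the `.hql` extension for query scripts are all hypothetical. `OpenCSVSerde` is chosen because MovieLens titles can contain quoted commas; it reads every column as `STRING`.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Iterable

# Hypothetical HDFS directory that backs the external table.
HDFS_DIR = "/user/cloudera/movielens/movies"

# Hypothetical DDL. OpenCSVSerde copes with quoted titles containing commas,
# but exposes every column as STRING.
DDL = f"""CREATE EXTERNAL TABLE IF NOT EXISTS movies (
  movieId STRING,
  title   STRING,
  genres  STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '{HDFS_DIR}'
TBLPROPERTIES ("skip.header.line.count"="1");
"""


def build_commands(local_csv: str, ddl_file: str, scripts: Iterable[str]) -> list[list[str]]:
    """Return argv lists for: load the CSV, create the table, run each query."""
    cmds = [
        ["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR],           # target directory
        ["hdfs", "dfs", "-put", "-f", local_csv, HDFS_DIR],  # upload, overwriting
        ["hive", "-f", ddl_file],                            # create the table
    ]
    # Query scripts run in name order, so numbered files keep their sequence.
    cmds.extend(["hive", "-f", s] for s in sorted(scripts))
    return cmds


def run(local_csv: str, query_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Write the DDL file (even in dry-run mode), then print each command
    and, unless dry_run is set, execute it."""
    ddl_file = "create_movies.hql"  # hypothetical file name
    Path(ddl_file).write_text(DDL, encoding="utf-8")
    # Assumes the query scripts use the .hql extension.
    scripts = [str(p) for p in Path(query_dir).glob("*.hql")]
    cmds = build_commands(local_csv, ddl_file, scripts)
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return cmds
```

Calling `run("movies.csv", "hive_queries")` first prints the plan; passing `dry_run=False` executes it inside the VM. Keeping `build_commands` free of side effects makes the command sequence easy to review before anything touches HDFS.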