07Codex07/HIVE_Big_Data

HIVE_Big_Data

MovieLens Genre Analytics Using Hive on Hadoop

Objective

Analyze the MovieLens dataset with Hive on Hadoop, running in the Cloudera QuickStart VM, to identify the most popular genres, and derive insights relevant to streaming platforms such as Netflix or Prime Video.

Tech Stack

  • Apache Hive
  • Hadoop HDFS
  • Cloudera QuickStart VM
  • Linux Shell
  • Excel / Matplotlib for visualization

Key Features

  • Genre-wise popularity computed with Hive's explode and split functions
  • Data stored and queried in HDFS
  • Business insights for recommendation engines
  • Visualization charts
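
The explode-and-split step can be illustrated outside Hive. The sketch below is not taken from the repository's scripts; it mirrors in plain Python what a HiveQL expression such as LATERAL VIEW explode(split(genres, '\\|')) does to the pipe-delimited genres column of movies.csv. The function names are hypothetical, and "popularity" is assumed here to mean the number of movies per genre, which the project may define differently (for example, by rating counts).

```python
from collections import Counter

# Sample rows in the MovieLens movies.csv layout: (movieId, title, genres).
# The genres column packs several genres into one pipe-delimited string.
sample = [
    (1, "Toy Story (1995)", "Adventure|Animation|Children|Comedy|Fantasy"),
    (2, "Jumanji (1995)", "Adventure|Children|Fantasy"),
    (3, "Grumpier Old Men (1995)", "Comedy|Romance"),
]


def explode_genres(rows):
    """Yield one (movie_id, genre) pair per genre.

    Hypothetical helper: split() breaks the string on '|', and the inner
    loop plays the role of Hive's explode(), emitting one row per element.
    """
    for movie_id, _title, genres in rows:
        for genre in genres.split("|"):
            yield movie_id, genre


def genre_counts(rows):
    """Count movies per genre, most frequent first (the GROUP BY / ORDER BY step)."""
    return Counter(genre for _movie_id, genre in explode_genres(rows)).most_common()


if __name__ == "__main__":
    for genre, count in genre_counts(sample):
        print(genre, count)
```

Note that Hive's split takes a regular expression, so the pipe has to be escaped there, whereas Python's str.split treats it as a literal separator.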

Project Structure

  • datasets/: Input CSV files
  • Hive_Queries/: All Hive scripts
  • visualizations/: Charts generated from the query output, plus Hive CLI output captured as proof of execution

Report

See Movie analytics.docx for the full write-up.

How to Run

  1. Set up the Cloudera QuickStart VM
  2. Load movies.csv from datasets/ into HDFS
  3. Create an external Hive table over that HDFS location
  4. Run the queries in Hive_Queries/
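
As a rough illustration of steps 2 to 4, the sketch below assembles the shell commands one might run inside the VM and prints them without executing anything. Several names are assumptions, not taken from the repository: the HDFS directory /user/cloudera/movielens/movies, the script name genre_popularity.hql, and the table definition itself. The table uses OpenCSVSerde because MovieLens titles can contain quoted commas; that SerDe reads every column as STRING.

```python
import shlex

# All three paths are assumptions for illustration only.
HDFS_DIR = "/user/cloudera/movielens/movies"       # hypothetical HDFS target directory
LOCAL_CSV = "datasets/movies.csv"                  # assumed location of the input file
QUERY_FILE = "Hive_Queries/genre_popularity.hql"   # hypothetical script name

# Hypothetical DDL: an external table over the HDFS directory.
# OpenCSVSerde handles quoted fields and exposes all columns as STRING.
CREATE_TABLE = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS movies (
  movieId STRING,
  title   STRING,
  genres  STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '{HDFS_DIR}'
TBLPROPERTIES ('skip.header.line.count'='1')
""".strip()


def build_commands():
    """Return the commands for steps 2-4 as argument lists (nothing is executed)."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR],      # create the target directory
        ["hdfs", "dfs", "-put", LOCAL_CSV, HDFS_DIR],   # step 2: load the CSV into HDFS
        ["hive", "-e", CREATE_TABLE],                   # step 3: create the external table
        ["hive", "-f", QUERY_FILE],                     # step 4: run a query script
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Because the table is external, dropping it in Hive removes only the metadata and leaves movies.csv in HDFS.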
