mtempleton94/bigdata-docker

Big Data Docker Containers

Docker containers for running a big data platform: Hadoop NameNode, Hadoop DataNodes, Hive, Impala, ZooKeeper and Postgres.

Building Containers

All containers are built from docker-compose files, but docker-compose does not support building containers from a base image, so a Makefile is included to build the containers.

Build All Containers

make build

Build Individual Container

make build-hive

Running Containers

All containers can be run using docker-compose. The -p option sets the Compose project name, which prefixes the container names and the default Docker network created for them.

docker-compose -p bigdata-net up

Individual containers can be run by referencing the container name. This is generally not recommended, because several of the containers depend on one another.

docker-compose -p bigdata-net up postgres
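
For day-to-day use it can help to wrap the common lifecycle commands in a few shell functions. The following is a minimal sketch, not part of this repository: the function names and the `COMPOSE` override variable are hypothetical, and only the `bigdata-net` project name comes from the commands above. It relies on the standard docker-compose subcommands `up -d`, `ps`, `logs -f` and `down`.

```shell
# Hypothetical helpers; source this file, then call the functions.
PROJECT="bigdata-net"                  # project name used throughout this README
COMPOSE="${COMPOSE:-docker-compose}"   # assumption: override if your binary differs

# Start every service in the background (-d = detached).
stack_up()     { $COMPOSE -p "$PROJECT" up -d; }

# List the project's containers and their current state.
stack_status() { $COMPOSE -p "$PROJECT" ps; }

# Follow the logs of one service, e.g. `stack_logs postgres`.
stack_logs()   { $COMPOSE -p "$PROJECT" logs -f "$1"; }

# Stop and remove the project's containers and its default network.
stack_down()   { $COMPOSE -p "$PROJECT" down; }
```

Running detached keeps the terminal free, and `stack_logs` can then be used to watch a single service while the others start.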

Accessing Containers

Use docker-compose to access containers by name.

docker-compose -p bigdata-net exec impala bash

Container Structure

Adding Data to the HDFS

  1. Copy files to the NameNode container.
docker cp <data-file> <hadoop-container-id>:/
  2. Enter the NameNode container.
docker-compose -p bigdata-net exec namenode bash
  3. Create a directory in the HDFS for the files.
hdfs dfs -mkdir -p /user/data/
  4. Add the files to the HDFS directory.
hdfs dfs -put <data-file> /user/data/
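
The steps above can also be scripted from the host. This is a rough sketch under several assumptions: the function name `load_to_hdfs` and the `COMPOSE`/`DOCKER` override variables are hypothetical, the Compose service is named `namenode` as in step 2, and `hdfs` is on the PATH for non-interactive `exec` sessions in that container. It uses `docker-compose ps -q` to look up the container ID and `exec -T` to run commands without a TTY.

```shell
PROJECT="bigdata-net"
COMPOSE="${COMPOSE:-docker-compose}"   # assumption: override if needed
DOCKER="${DOCKER:-docker}"

# Usage: load_to_hdfs <local-file> [hdfs-dir]
load_to_hdfs() {
  local file="$1" dest="${2:-/user/data/}" name cid
  name="$(basename "$file")"

  # Resolve the NameNode container ID from its Compose service name.
  cid="$($COMPOSE -p "$PROJECT" ps -q namenode)" || return 1
  [ -n "$cid" ] || { echo "namenode container is not running" >&2; return 1; }

  # Step 1: copy the file to the root of the container filesystem.
  $DOCKER cp "$file" "$cid:/" || return 1
  # Step 3: create the target directory in HDFS (no error if it exists).
  $COMPOSE -p "$PROJECT" exec -T namenode hdfs dfs -mkdir -p "$dest" || return 1
  # Step 4: upload the copied file into the HDFS directory.
  $COMPOSE -p "$PROJECT" exec -T namenode hdfs dfs -put "/$name" "$dest" || return 1
  # List the directory so the upload can be checked by eye.
  $COMPOSE -p "$PROJECT" exec -T namenode hdfs dfs -ls "$dest"
}
```

For example, `load_to_hdfs ./sample.csv` would place a hypothetical `sample.csv` under `/user/data/`. The file is referenced by absolute path inside the container so the result does not depend on the container's working directory.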

Running Hive Queries

Using beeline

  1. From the Hive container, run the beeline CLI.
beeline
  2. Connect to HiveServer2.
!connect jdbc:hive2://localhost:10000
  3. Run queries.
show databases;
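
The interactive session above can be collapsed into a single command with beeline's `-u` (JDBC URL) and `-e` (query string) options. The sketch below runs it from the host; the function name `hive_query` is hypothetical, and the Compose service name `hive` is an assumption inferred from the `make build-hive` target, so adjust it if the docker-compose file names the service differently.

```shell
PROJECT="bigdata-net"
COMPOSE="${COMPOSE:-docker-compose}"   # assumption: override if needed

# Usage: hive_query "<sql>"
# Runs one statement through beeline inside the Hive container.
# -T disables TTY allocation so the call also works from scripts.
hive_query() {
  $COMPOSE -p "$PROJECT" exec -T hive \
    beeline -u "jdbc:hive2://localhost:10000" -e "$1"
}
```

For example, `hive_query "show databases;"` should print the same result as step 3, provided HiveServer2 accepts the connection without credentials.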

Using JDBC with Maven

  1. From the Hive container, navigate to the directory containing the pom.xml file and project file.
cd jdbc
  2. Run the Maven package command.
mvn package
  3. Run the Java project.
cd target/
java -jar hive-jdbc-example-1.0-jar-with-dependencies.jar

Running Impala Queries

Using Impala Shell

  1. Start the Impala Shell.
impala-shell -i localhost
  2. Run queries.
show databases;
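
Impala Shell can likewise run a single query non-interactively with `-q`, and `-B` switches the output to plain delimited text that is easier to process in scripts. A minimal sketch, assuming the `impala` service name used in the Accessing Containers section; the function name `impala_query` is hypothetical.

```shell
PROJECT="bigdata-net"
COMPOSE="${COMPOSE:-docker-compose}"   # assumption: override if needed

# Usage: impala_query "<sql>"
# -q runs one query and exits; -B prints delimited rows
# instead of the default pretty-printed table.
impala_query() {
  $COMPOSE -p "$PROJECT" exec -T impala \
    impala-shell -i localhost -B -q "$1"
}
```

For example, `impala_query "show databases;"` prints one database per line.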
