Generates one or more Spark projects from existing or customized templates, driven by a YAML configuration file.
The advantages of this application are:
- Creates Spark applications built with sbt, Maven, or both.
- Creates any number of Spark applications from a single configuration file (`config.yaml` or `config_all_apps.yaml`).
- Supports various Spark templates, such as Hive, HBase, Kudu, and several file formats.
- Generates both Scala- and Java-based code.
- Generates a run script to launch the Spark application.
- Describes the deployment steps in the generated README.md file.
- Includes built-in ScalaTest code.
- If Kerberos, SSL, or both are enabled on your cluster, it generates the appropriate type of application.
The following Spark templates are supported:
| Template Name | Template Description | Scala Code | Java Code | Python Code | Test Code | Sample Code Link |
|---|---|---|---|---|---|---|
| DEFAULT | Spark Hello World Integration | ✓ | ✓ | ⤫ | ✓ | Code |
| HBASE | Spark HBase Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| HIVE | Spark Hive Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| KAFKA | Spark Kafka Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| PHOENIX | Spark Phoenix Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| KUDU | Spark Kudu Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| HWC | Spark Hive Warehouse Connector Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
| ORC | Spark ORC File Format Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
| AVRO | Spark Avro File Format Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
| PARQUET | Spark Parquet File Format Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
| S3 | Spark AWS S3 Storage Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| GCS | Spark Google Cloud Storage Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
| CASSANDRA | Spark Cassandra Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
| DELTA | Spark Delta Lake Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
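To select one of these templates for a generated project, reference the template by name in the configuration's `projectDetails` section (described in the property table below). The sketch below is only illustrative: the key names follow the property table, but the exact nesting is an assumption, so check the bundled configuration files for the authoritative layout.

```yaml
# Hypothetical sketch: each projectDetails entry selects a template by name.
# Key names come from the property table below; the nesting is an assumption.
projectDetails:
  - projectName: spark-hive-integration        # name of the generated project
    templateName: HIVE                         # one of the template names above
    projectDescription: Spark Hive Integration
  - projectName: spark-kafka-integration
    templateName: KAFKA
    projectDescription: Spark Kafka Integration
```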
```sh
git clone https://github.com/rangareddy/spark_project_template_generator.git
cd spark_project_template_generator
```

Update the Spark application details in `config.yaml` or `config_all_apps.yaml` to create Spark project(s).
Note:
- Using the `config.yaml` configuration file creates a single project by default.
- Using the `config_all_apps.yaml` configuration file creates multiple projects by default.
Open the configuration file and update it to match your cluster, for example the Java, Spark, and Scala versions.
Single Project Template Configuration file

```sh
vi src/main/resources/config.yaml
```

Multiple Projects Template Configuration file

```sh
vi src/main/resources/config_all_apps.yaml
```

| Property Name | Property Description | Default Value |
|---|---|---|
| baseProjectDir | Base Project Template Directory | User Home Directory - System.getProperty("user.home") |
| basePackageName | Base Package Name for your project | com.ranga |
| baseDeployJarPath | Base deploy path used to deploy your application to the cluster | /apps/spark/ |
| buildTools | Supported Build tools: maven, sbt | maven |
| jarVersion | Jar Version for your project | 1.0.0-SNAPSHOT |
| scalaVersion | Scala Version for your project | 2.12.10 |
| javaVersion | Java Version for your project | 1.8 |
| sbtVersion | SBT Build tool Version for your project | 0.13.17 |
| scope | Global scope for the Spark dependency jars | compile |
| secureCluster | Set to true if your cluster is Kerberized | false |
| sslCluster | Set to true if SSL is enabled on your cluster | false |
| author | Specify the author name | Ranga Reddy |
| authorEmail | Specify the author email | |
| projectDetails | Project details such as projectName, templateName, and project description | |
| componentVersions | Component name, version, and scope for each dependency; if a scope is not specified, the global scope is used | |
| templates | The jar files required by each template | |
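Putting these properties together, a configuration file might look roughly like the sketch below. It uses only the property names from the table above; the versions shown are examples, and the nested structure of `projectDetails`, `componentVersions`, and `templates` is an assumption, so treat the bundled `src/main/resources/config.yaml` as the authoritative reference.

```yaml
# Illustrative config.yaml sketch assembled from the property table above.
baseProjectDir: /home/ranga                 # defaults to the user home directory
basePackageName: com.ranga
baseDeployJarPath: /apps/spark/
buildTools: maven                           # maven, sbt, or both
jarVersion: 1.0.0-SNAPSHOT
scalaVersion: 2.12.10
javaVersion: 1.8
sbtVersion: 0.13.17
scope: compile                              # global scope for the Spark jars
secureCluster: false                        # true on a Kerberized cluster
sslCluster: false                           # true if SSL is enabled
author: Ranga Reddy
authorEmail: user@example.com               # hypothetical placeholder
projectDetails:
  - projectName: spark-hive-integration
    templateName: HIVE
    projectDescription: Spark Hive Integration
componentVersions:                          # hypothetical structure
  spark:
    version: 3.1.1
    scope: provided                         # overrides the global scope
templates:                                  # hypothetical structure
  HIVE:
    dependencies:
      - spark-hive
```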
Note: Update the configuration file carefully; otherwise you will run into configuration errors.
Build the project:

```sh
$ mvn clean package
```

Create a single project using `config.yaml`:

```sh
$ java -jar target/spark-project-template-generator-1.0.0-SNAPSHOT.jar
```

or

```sh
$ java -jar target/spark-project-template-generator-1.0.0-SNAPSHOT.jar src/main/resources/config.yaml
```

Create multiple projects using `src/main/resources/config_all_apps.yaml`:

```sh
$ java -jar target/spark-project-template-generator-1.0.0-SNAPSHOT.jar src/main/resources/config_all_apps.yaml
```

```
Application <spark-hello-world-integration> created successfully.
Application <spark-hive-integration> created successfully.
Application <spark-hbase-integration> created successfully.
Application <spark-hwc-integration> created successfully.
Application <spark-kafka-integration> created successfully.
Application <spark-phoenix-integration> created successfully.
Application <spark-kudu-integration> created successfully.
Application <spark-orc-integration> created successfully.
Application <spark-avro-integration> created successfully.
Application <spark-parquet-integration> created successfully.
Application <spark-cassandra-integration> created successfully.
Application <spark-s3-integration> created successfully.
Application <spark-gcs-integration> created successfully.
Application <spark-delta-lake-integration> created successfully.
```

Using this application, I have created most of the Spark applications listed in the following GitHub repository:
https://github.com/rangareddy/ranga_spark_experiments
Send pull requests to keep this project updated.