From 41e8b7c745ed86cabce68c734e2a629b158cb83b Mon Sep 17 00:00:00 2001
From: Jake Freck
Date: Mon, 13 Aug 2018 12:19:46 -0400
Subject: [PATCH 1/2] update getting started page, remove custom scripts doc

---
 docs/00-getting-started.md        | 139 +++++++++++++++++++-----------
 docs/01-getting-started-script.md |  44 ----------
 docs/11-custom-scripts.md         |  63 --------------
 docs/index.rst                    |   2 -
 4 files changed, 91 insertions(+), 157 deletions(-)
 delete mode 100644 docs/01-getting-started-script.md
 delete mode 100644 docs/11-custom-scripts.md

diff --git a/docs/00-getting-started.md b/docs/00-getting-started.md
index 4d08d472..1a8e1a02 100644
--- a/docs/00-getting-started.md
+++ b/docs/00-getting-started.md
@@ -2,45 +2,96 @@
 The minimum requirements to get started with this package are:
 - Python 3.5+, pip 9.0.1+
 - An Azure account
-- An Azure Batch account
-- An Azure Storage account
 
-## Cloning and installing the project
-1. Clone the repo
-2. Make sure you are running python 3.5 or greater.
-    _If the default version on your machine is python 2 make sure to run the following commands with **pip3** instead of **pip**._
+## Installation
+Before you start, ensure you are running python 3.5 or greater by running: `python --version`.
+
+### Install from pip
+The latest release of aztk is distributed through `pip`. To install, run:
+```sh
+pip install aztk
+```
 
-3. Install `aztk`:
+### Install from source
+1. Clone the repo
+    ```sh
+    git clone https://github.com/Azure/aztk.git
     ```
+2. Install `aztk`:
+    ```sh
     pip install -e .
     ```
-5. Initialize your environment:
+### Initialize your environment
+Navigate to the directory you wish to use as your spark development environment, and run:
+```sh
+aztk spark init
+```
+This will create a *.aztk* folder with preset configuration files in your current working directory.
+
+If you would like to initialize your AZTK clusters with a specific development toolset, please pass one of the following flags:
+```bash
+aztk spark init --python
+aztk spark init --R
+aztk spark init --scala
+aztk spark init --java
+```
 
-    Navigate to the directory you wish to use as your spark development environment, and run:
-    ```bash
-    aztk spark init
-    ```
-    This will create a *.aztk* folder with preset configuration files in your current working directory.
-
-    If you would like to initialize your AZTK clusters with a specific development toolset, please pass one of the following flags:
-    ```bash
-    aztk spark init --python
-    aztk spark init --R
-    aztk spark init --scala
-    aztk spark init --java
-    ```
+If you wish to have global configuration files that will be read regardless of your current working directory, run:
+```bash
+aztk spark init --global
+```
+This will put default configuration files in your home directory, *~/*. Please note that configuration files in your current working directory will take precedence over global configuration files in your home directory.
 
-    If you wish to have global configuration files that will be read regardless of your current working directory, run:
-    ```bash
-    aztk spark init --global
-    ```
-    This will put default configuration files in your home directory, *~/*. Please note that configuration files in your current working directory will take precedence over global configuration files in your home directory.
+## Account Setup
+
+To create the necessary Azure Resources, either:
+1. [Run the provided account setup script.](#account-setup-script)
+2. [Create the resources manually.](#manual-resource-creation)
+
+### Account Setup Script
+#### Overview
+The account setup script creates and configures all of the required Azure resources.
+
+The script will create and configure the following resources:
+- Resource group
+- Storage account
+- Batch account
+- Azure Active Directory application and service principal
+
+The script outputs all of the necessary information to use `aztk`. Copy the output into the `.aztk/secrets.yaml` file created when running `aztk spark init`.
 
-## Setting up your accounts
+#### Usage
+Copy and paste the following into an [Azure Cloud Shell](https://shell.azure.com):
+```sh
+wget -q https://raw.githubusercontent.com/Azure/aztk/v0.8.1/account_setup.sh &&
+chmod 755 account_setup.sh &&
+/bin/bash account_setup.sh
+```
+A series of prompts will appear, and you can set the values you desire for each field. Default values appear in brackets `[]` and will be used if no value is provided.
+```
+Azure Region [westus]:
+Resource Group Name [aztk]:
+Storage Account Name [aztkstorage]:
+Batch Account Name [aztkbatch]:
+Active Directory Application Name [aztkapplication]:
+Active Directory Application Credential Name [aztk]:
+```
 
-### Using the account setup script
-A script to create and configure the Azure resources required to use `aztk` is provided. For more more information and usage, see [Getting Started Script](01-getting-started-script.html)
+Once the script has finished running you will see the following output:
+
+```
+service_principal:
+    tenant_id:
+    client_id:
+    credential:
+    batch_account_resource_id:
+    storage_account_resource_id:
+```
+
+Copy the entire `service_principal` section into your `.aztk/secrets.yaml`. If you do not have a `secrets.yaml` file, you can create one in your current working directory by running `aztk spark init`.
+
+Now you are ready to create your first `aztk` cluster. See [Creating a Cluster](./10-clusters.html#creating-a-cluster).
 
 ### Manual resource creation
 To finish setting up, you need to fill out your Azure Batch and Azure Storage secrets in *.aztk/secrets.yaml*. We'd also recommend that you enter SSH key info in this file too.
@@ -54,14 +105,13 @@ ssh_pub_key: ~/.ssh/my-public-key.pub
 ssh_priv_key: ~/.ssh/my-private-key
 ```
 
-0. Log into Azure
-If you do not already have an Azure account, go to [https://azure.microsoft.com/](https://azure.microsoft.com/) to get started for free today.
+#### Log into Azure
+If you do not already have an Azure account, go to [https://azure.microsoft.com](https://azure.microsoft.com) and create an account.
 
-    Once you have one, simply log in and go to the [Azure Portal](https://portal.azure.com) to start creating your Azure Batch account and Azure Storage account.
+Once you have one, log in and go to the [Azure Portal](https://portal.azure.com) to create your Azure Batch account and Azure Storage account.
 
-
-#### Using AAD
-To get the required keys for your Azure Active Directory (AAD) Service Principal, Azure Batch Account and Azure Storage Account, please follow these instructions. Note that this is the recommended path for use with AZTK, as some features require AAD and are disabled if using Shared Key authentication.
+#### Using Azure Active Directory Authentication
+To get the required keys for your Azure Active Directory (AAD) Service Principal, Azure Batch Account and Azure Storage Account, please follow these instructions. Note that this is the recommended path for use with AZTK, as some features require AAD and are disabled if using the alternative Shared Key authentication.
 
 1. Register an Azure Active Directory (AAD) Application
 
@@ -131,12 +181,12 @@ service_principal:
     storage_account_resource_id:
 ```
 
-### Using Shared Keys
-_Please note that using Shared Keys prevents the use of certain AZTK features including Mixed Mode clusters and support for VNETs._
+#### Using Shared Key Authentication
+Please note that using Shared Keys prevents the use of certain AZTK features including low priority nodes and VNET support. It is reccomended to use [Azure Active Directory (AAD) Authentication](#using-azure-active-directory-authentication).

 To get the required keys for Azure Batch and Azure Storage, please follow the below instructions:
 
-1. Create a Storage account
+##### Create a Storage account
 
 - Click the '+' button at the top left of the screen and search for 'Storage'. Select 'Storage account - blob, file, table, queue' and click 'Create'
 
@@ -146,7 +196,7 @@ To get the required keys for Azure Batch and Azure Storage, please follow the be
 
 ![](./misc/Storage_2.png)
 
-2. Create a Batch account
+##### Create a Batch account
 
 - Click the '+' button at the top left of the screen and search for 'Compute'. Select 'Batch' and click 'Create'
 
@@ -156,21 +206,14 @@ To get the required keys for Azure Batch and Azure Storage, please follow the be
 
 ![](./misc/Batch_2.png)
 
-4. Save your account credentials into the secrets.yaml file
-
-- Open the secrets.yaml file in the *.aztk* folder in your current working directory (if *.aztk* doesn't exist, run `aztk spark init`). Fill in all of the fields as described below.
-
-- Go to the accounts in the Azure portal and copy paste the account names, keys and other information needed into the
-secrets file.
+##### Save your account credentials into the secrets.yaml file
 
-### Storage account
+Open the `.aztk/secrets.yaml` file in your current working directory (if `.aztk/` doesn't exist, [initialize your environment](#initialize-your-environment)). Fill in all of the fields as described below.
 
 For the Storage account, copy the name and one of the two keys:
 
 ![](./misc/Storage_secrets.png)
 
-### Batch account
-
 For the Batch account, copy the name, the url and one of the two keys:
 
 ![](./misc/Batch_secrets.png)
diff --git a/docs/01-getting-started-script.md b/docs/01-getting-started-script.md
deleted file mode 100644
index 4bdd8338..00000000
--- a/docs/01-getting-started-script.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# Getting Started Script
-
-The provided account setup script creates and configures all of the required Azure resources.
-
-The script will create and configure the following resources:
-- Resource group
-- Storage account
-- Batch account
-- Azure Active Directory application and service principal
-
-
-The script outputs all of the necessary information to use `aztk`, just copy the output into the `.aztk/secrets.yaml` file created when running `aztk spark init`.
-
-## Usage
-Copy and paste the following into an [Azure Cloud Shell](https://shell.azure.com):
-```sh
-wget -q https://raw.githubusercontent.com/Azure/aztk/v0.8.1/account_setup.sh &&
-chmod 755 account_setup.sh &&
-/bin/bash account_setup.sh
-```
-A series of prompts will appear, and you can set the values you desire for each field. Default values appear in brackets `[]` and will be used if no value is provided.
-```
-Azure Region [westus]:
-Resource Group Name [aztk]:
-Storage Account Name [aztkstorage]:
-Batch Account Name [aztkbatch]:
-Active Directory Application Name [aztkapplication]:
-Active Directory Application Credential Name [aztk]:
-```
-
-Once the script has finished running you will see the following output:
-
-```
-service_principal:
-    tenant_id:
-    client_id:
-    credential:
-    batch_account_resource_id:
-    storage_account_resource_id:
-```
-
-Copy the entire `service_principal` section in your `.aztk/secrets.yaml`. If you do not have a `secrets.yaml` file, you can create one in your current working directory by running `aztk spark init`.
-
-Now you are ready to create your first `aztk` cluster. See [Creating a Cluster](./10-clusters.html#creating-a-cluster).
diff --git a/docs/11-custom-scripts.md b/docs/11-custom-scripts.md
deleted file mode 100644
index 474d4101..00000000
--- a/docs/11-custom-scripts.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Custom scripts
-
-**Custom scripts are _DEPRECATED_. Use [plugins](15-plugins.html) instead.**
-
-Custom scripts allow for additional cluster setup steps when the cluster is being provisioned.
This is useful
-if you want to install additional software, and if you need to modify the default cluster configuration for things such as modifying spark.conf, adding jars or downloading any files you need in the cluster.
-
-You can specify the location of custom scripts on your local machine in `.aztk/cluster.yaml`. If you do not have a `.aztk/` directory in you current working directory, run `aztk spark init` or see [Getting Started](./00-getting-started). Note that the path can be absolute or relative to your current working directory.
-
-The custom scripts can be configured to run on the Spark master only, the Spark workers only, or all nodes in the cluster (Please note that by default, the Spark master node is also a Spark worker). For example, the following custom script configuration will run 3 custom scripts in the order they are provided:
-
-```yaml
-custom_scripts:
-    - script: ./custom-scripts/simple.sh
-      runOn: all-nodes
-    - script: ./custom-scripts/master-only.sh
-      runOn: master
-    - script: ./custom-scripts/worker-only.sh
-      runOn: worker
-```
-
-The first script, simple.sh, will run on all nodes and will be executed first. The next script, master-only.sh will run only on nodes that are Spark masters and after simple.sh. The next script, worker-only.sh, will run last and only on nodes that are Spark workers.
-
-Directories may also be provided in the custom_scripts section of `.aztk/cluster.yaml`.
-
-```yaml
-custom_scripts:
-    - script: /custom-scripts/
-      runOn: all-nodes
-```
-
-The above configuration takes the absolute path `/custom-scripts/` and uploads every file within it. These files will all be executed, although order of execution is not guaranteed. If your custom scripts have dependencies, specify the order by providing the full path to the file as seen in the first example.
-
-
-## Scripting considerations
-
-
-
-- The default OS is Ubuntu 16.04.
-- The scripts run on the specified nodes in the cluster _after_ Spark has been installed.
-- The scripts execute in the order provided
-- If a script directory is provided, order of execution is not guaranteed
-- The environment variable $SPARK_HOME points to the root Spark directory.
-- The environment variable $IS\_MASTER identifies if this is the node running the master role. The node running the master role _also_ runs a worker role on it.
-- The Spark cluster is set up using Standalone Mode
-
-## Provided Custom Scripts
-
-### HDFS
-
-A custom-script to install HDFS (2.8.2) is provided at `custom-scripts/hdfs.sh` directory. This will install and provision HDFS for your cluster.
-
-To enable HDFS, add this snippet to the custom_scripts section of your `.aztk/cluster.yaml` configuration file:
-
-```yaml
-custom_scripts:
-    - script: ./custom-scripts/hdfs.sh
-      runOn: all-nodes
-```
-
-When SSHing into the cluster, you will have access to the Namenode UI at the default port 50070. This port can be changed in the ssh.yaml file in your `.aztk/` directory, or by passing the `--namenodeui` flag to the `aztk spark cluster ssh` command.
-
-When enabled on the cluster, HDFS can be used to read or write data locally during program execution.
diff --git a/docs/index.rst b/docs/index.rst
index 4bfc81df..8a460934 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -11,9 +11,7 @@ This toolkit is built on top of Azure Batch but does not require any Azure Batch
    :caption: User documentation:
 
    00-getting-started
-   01-getting-started-script
    10-clusters
-   11-custom-scripts
    12-docker-image
    13-configuration
    14-azure-files

From cbe90dfa897726440a592d721e912b7bbcb5840e Mon Sep 17 00:00:00 2001
From: Jake Freck
Date: Thu, 16 Aug 2018 17:03:52 -0700
Subject: [PATCH 2/2] recommend venv

---
 docs/00-getting-started.md | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/docs/00-getting-started.md b/docs/00-getting-started.md
index 1a8e1a02..72491dfc 100644
--- a/docs/00-getting-started.md
+++ b/docs/00-getting-started.md
@@ -7,7 +7,21 @@ The minimum requirements to get started with this package are:
 Before you start, ensure you are running python 3.5 or greater by running: `python --version`.
 
 ### Install from pip
-The latest release of aztk is distributed through `pip`. To install, run:
+It is recommended that you install `aztk` in a virtual environment:
+```
+# venv is included with Python 3.3+; on Debian/Ubuntu it may need a separate package:
+# sudo apt-get install python3-venv
+
+# create a virtual environment called env
+python -m venv env
+
+# activate the virtual environment (linux)
+source env/bin/activate
+
+# activate the virtual environment (windows)
+env/Scripts/activate
+```
+To install `aztk` using `pip`, run:
 ```sh
 pip install aztk
 ```
@@ -28,7 +42,7 @@ aztk spark init
 ```
 This will create a *.aztk* folder with preset configuration files in your current working directory.

-If you would like to initialize your AZTK clusters with a specific development toolset, please pass one of the following flags:
+If you would like to initialize your `aztk` clusters with a specific development toolset, please pass one of the following flags:
 ```bash
 aztk spark init --python
 aztk spark init --R
@@ -111,7 +125,7 @@ If you do not already have an Azure account, go to [https://azure.microsoft.com]
 Once you have one, log in and go to the [Azure Portal](https://portal.azure.com) to create your Azure Batch account and Azure Storage account.
 
 #### Using Azure Active Directory Authentication
-To get the required keys for your Azure Active Directory (AAD) Service Principal, Azure Batch Account and Azure Storage Account, please follow these instructions. Note that this is the recommended path for use with AZTK, as some features require AAD and are disabled if using the alternative Shared Key authentication.
+To get the required keys for your Azure Active Directory (AAD) Service Principal, Azure Batch Account and Azure Storage Account, please follow these instructions. Note that this is the recommended path for use with `aztk`, as some features require AAD and are disabled if using the alternative Shared Key authentication.
 
 1. Register an Azure Active Directory (AAD) Application
 
@@ -182,7 +196,7 @@ service_principal:
 ```
 
 #### Using Shared Key Authentication
-Please note that using Shared Keys prevents the use of certain AZTK features including low priority nodes and VNET support. It is reccomended to use [Azure Active Directory (AAD) Authentication](#using-azure-active-directory-authentication).
+Please note that using Shared Keys prevents the use of certain `aztk` features including low priority nodes and VNET support. It is recommended to use [Azure Active Directory (AAD) Authentication](#using-azure-active-directory-authentication).
 
 To get the required keys for Azure Batch and Azure Storage, please follow the below instructions:
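
Note on the `service_principal` block that the updated getting started page asks readers to copy into `.aztk/secrets.yaml`: a quick check for fields left blank could look like the sketch below. It is only an illustration. The helper name `missing_service_principal_fields` is hypothetical and not part of `aztk`, and it assumes the YAML file has already been parsed into a plain dict (for example with a YAML library, not shown here). The field names are the ones printed by the account setup script in this patch.

```python
# Hypothetical helper, not part of aztk: reports which service_principal
# fields are absent or blank in an already-parsed .aztk/secrets.yaml.
REQUIRED_FIELDS = (
    "tenant_id",
    "client_id",
    "credential",
    "batch_account_resource_id",
    "storage_account_resource_id",
)


def missing_service_principal_fields(secrets):
    """Return the required service_principal keys that are absent or blank."""
    # An empty YAML file parses to None, so fall back to an empty dict.
    section = (secrets or {}).get("service_principal") or {}
    missing = []
    for field in REQUIRED_FIELDS:
        value = section.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {"service_principal": {"tenant_id": "my-tenant", "client_id": ""}}
    print(missing_service_principal_fields(example))
```

Keeping the check as a pure function over a dict avoids tying the sketch to any particular YAML parser; an empty result means every field in the section has a value.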