Bug: add spark.history.fs.logDirectory to required keys#456
Conversation
node_scripts/install/spark.py
Outdated
```python
# configure the history server
spark_event_log_enabled_key = 'spark.eventLog.enabled'
spark_event_log_directory_key = 'spark.eventLog.dir'
spark_history_fs_log_directory = 'spark.history.fs.logDirectory'
```
You should probably add this as a commented-out field in https://github.com/Azure/aztk/blob/master/aztk_cli/config/spark-defaults.conf
node_scripts/install/spark.py
Outdated
```python
spark_history_fs_log_directory = 'spark.history.fs.logDirectory'
path_to_spark_defaults_conf = os.path.join(spark_home, 'conf/spark-defaults.conf')
properties = parse_configuration_file(path_to_spark_defaults_conf)
required_keys = [spark_event_log_directory_key, spark_history_fs_log_directory]
```
Missing spark_event_log_enabled?
I think this broke between Spark versions. Different Spark versions have different requirements, which makes this a difficult list to maintain.
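A sketch of the check the reviewer is asking for, with `spark.eventLog.enabled` added to the required keys. The key names come from the diff above; `history_server_configured` and the sample `partial` dict are illustrative, not from the PR:

```python
# Key names taken from the diff under review.
spark_event_log_enabled_key = 'spark.eventLog.enabled'
spark_event_log_directory_key = 'spark.eventLog.dir'
spark_history_fs_log_directory = 'spark.history.fs.logDirectory'

required_keys = [
    spark_event_log_enabled_key,       # the key the reviewer notes is missing
    spark_event_log_directory_key,
    spark_history_fs_log_directory,
]

def history_server_configured(properties):
    """Return True only when every required key is present in the parsed conf."""
    return bool(properties) and all(key in properties for key in required_keys)

# Example: a conf missing spark.eventLog.enabled fails the check.
partial = {
    'spark.eventLog.dir': '/tmp/spark-events',
    'spark.history.fs.logDirectory': '/tmp/spark-events',
}
print(history_server_configured(partial))  # False
```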
aztk/node_scripts/install/spark.py
Outdated
```python
if properties:
    if all(key in properties for key in required_keys):
        configure_history_server_log_path(properties[spark_history_fs_log_directory])
exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
```
Shouldn't lines 205 - 208 be part of the `if all(key in properties...)` block? Otherwise this will always execute, even if the history server was not configured, right? I'm not sure whether Spark will just ignore it, since the properties say it's a no-op, but it's still worth not even starting the process.
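The refactor the reviewer is suggesting could look like the sketch below: the start-up logic moves inside the configuration check so no process is launched when the history server isn't configured. `maybe_start_history_server` is a hypothetical helper; the real script would invoke the returned path rather than return it:

```python
import os

def maybe_start_history_server(spark_home, properties, required_keys):
    """Build the history-server start command only when the conf check passes.

    Hypothetical refactor of the reviewed code: returns the path to the
    start script when configured, or None when there is nothing to start.
    """
    if properties and all(key in properties for key in required_keys):
        # only reached when every required key is present
        return os.path.join(spark_home, "sbin", "start-history-server.sh")
    return None  # history server not configured: don't start the process
```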
docs/13-configuration.md
Outdated
```yaml
# id: <id of the cluster to be created>
id: spark_cluster

# vm_size: <vm-size, see available options here: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/>
```
Probably worth updating the link to Batch pricing: https://azure.microsoft.com/pricing/details/batch/
docs/13-configuration.md
Outdated
```yaml
# web_ui_port: <local port where the spark master web ui is forwarded to>
web_ui_port: 8080

# jupyter_port: <local port where jupyter is forwarded to>
```
Should we get rid of the items below since they are now driven by plugins?
yeah -- we should probably get rid of ssh.yaml entirely. The only thing it does now is set a default username. It's been almost completely replaced by plugins.
docs/13-configuration.md
Outdated
```yaml
connect: true
```

Running the command `aztk spark cluster ssh --id <cluster_id>` will ssh into the master node of the Spark cluster. It will also forward the Spark Job UI to localhost:4040, the Spark master's web UI to localhost:8080, and Jupyter to localhost:8888.
This is also out of date now.
docs/13-configuration.md
Outdated
Running the command `aztk spark cluster ssh --id <cluster_id>` will ssh into the master node of the Spark cluster. It will also forward the Spark Job UI to localhost:4040, the Spark master's web UI to localhost:8080, and Jupyter to localhost:8888.

Note that all of the settings in ssh.yaml will be overridden by parameters passed on the command line.
docs/13-configuration.md
Outdated
### History Server
If you want to use Spark's history server, please set the following values in your `.aztk/spark-defaults.conf` file:
Worth linking off to the spark docs for this? (hard part would be to know which version of the docs though...)
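For reference, the three keys this PR treats as required would look like this in `.aztk/spark-defaults.conf`. The directory values are illustrative, not from the source:

```
# example values; point both directories at your actual event log location
spark.eventLog.enabled          true
spark.eventLog.dir              file:/tmp/spark-events
spark.history.fs.logDirectory   file:/tmp/spark-events
```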
Fix #452