Feature: Disable scheduling on group of nodes by timotheeguerin · Pull Request #540 · Azure/aztk

timotheeguerin · 2018-05-04T17:12:12Z

fix #527 (Don't schedule on low pri nodes)
fix #375

Nodes that should not run the driver will have scheduling disabled

Todo:

Handle when no nodes can match criteria
Docs for the new setting
Check job submission works
Tests
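As a rough illustration of the idea in the description, the sketch below decides per node whether the driver may be scheduled there, and raises when no node qualifies (the first Todo item). The enum members, function names, and node tuples are assumptions made for this example and are not taken from the aztk code.

```python
from enum import Enum


class SchedulingTarget(Enum):
    # Hypothetical members; the real enum in aztk may differ.
    DEDICATED = "dedicated"
    MASTER = "master"
    ANY = "any"


def should_enable_scheduling(target, is_dedicated, is_master):
    """Return True if the driver may be scheduled on this node."""
    if target is SchedulingTarget.ANY:
        return True
    if target is SchedulingTarget.DEDICATED:
        return is_dedicated
    if target is SchedulingTarget.MASTER:
        return is_master
    raise ValueError("Unknown scheduling target: {}".format(target))


def eligible_nodes(target, nodes):
    """Filter (node_id, is_dedicated, is_master) tuples down to eligible ids.

    Raises RuntimeError when nothing matches the target.
    """
    matches = [
        node_id
        for node_id, is_dedicated, is_master in nodes
        if should_enable_scheduling(target, is_dedicated, is_master)
    ]
    if not matches:
        raise RuntimeError(
            "No node matches scheduling target {}".format(target.value))
    return matches
```

Every node outside the returned list would be the one to have scheduling disabled; how that call is made against the Batch service is outside this sketch.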

jafreck · 2018-05-07T02:36:08Z

aztk/models/models.py

-
+from .scheduling_target import SchedulingTarget

 class FileShare:


Insert a second blank line here.

jafreck · 2018-05-07T02:38:46Z

aztk/node_scripts/install/node_scheduling.py

+        log.info("Task scheduling is already enabled for this node")
+
+
+def setup_node_scheduling(


Is this backwards compatible? Or is this breaking for existing clusters?

It shouldn't be breaking, I guess; the actual scheduling shouldn't change.

jafreck · 2018-05-07T02:40:54Z

aztk/spark/models/models.py

-            id,
-            applications,
-            vm_size,
+            id = None,


These values had no default because they are required. Why are we defaulting to None? To facilitate merging?

Yeah, this means we can't specify the default object, though I guess it could be set to None.
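The trade-off discussed here can be sketched as follows: every field defaults to None so that partial configurations can be merged, and required fields are checked in a separate validation step instead of by the constructor signature. The class, its fields, and the `merge`/`validate` methods are hypothetical stand-ins, not aztk's actual model.

```python
class ClusterConfig:
    """Hypothetical configuration model used only for illustration."""

    REQUIRED = ("id", "vm_size")

    def __init__(self, id=None, vm_size=None, vm_count=None):
        self.id = id
        self.vm_size = vm_size
        self.vm_count = vm_count

    def merge(self, other):
        """Overlay every field that `other` sets (is not None) onto this one."""
        for name, value in vars(other).items():
            if value is not None:
                setattr(self, name, value)
        return self

    def validate(self):
        """Fail if a required field is still unset after merging."""
        missing = [n for n in self.REQUIRED if getattr(self, n) is None]
        if missing:
            raise ValueError("Missing required fields: " + ", ".join(missing))
```

For example, `ClusterConfig(vm_size="standard_a2").merge(ClusterConfig(id="c1")).validate()` passes, while validating an empty `ClusterConfig()` raises. The cost is that a missing required value is reported at validation time and not at construction.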

jafreck · 2018-05-30T18:15:27Z

aztk/node_scripts/core/config.py

 pool_id = os.environ["AZ_BATCH_POOL_ID"]
 node_id = os.environ["AZ_BATCH_NODE_ID"]
-is_dedicated = os.environ["AZ_BATCH_NODE_IS_DEDICATED"]
+is_dedicated = os.environ.get("AZ_BATCH_NODE_IS_DEDICATED") == "true"


Why does this use os.environ.get() when the ones above do not?

We could switch them all, though those are Batch variables, so they should always be set.

AZ_BATCH_NODE_IS_DEDICATED is also a Batch variable, though.

Yeah, so should we switch them all to .get or leave them with []?

I feel like they should all be [], since that raises an error as soon as the environment variable is missing. .get does not fail until the variable is used, which is more confusing. Setting those environment variables is part of the service contract, so we should be fine failing if the service doesn't set them.
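The difference between the two access styles can be shown with a small sketch. The function names are made up for this example; the variable names come from the diff above. The lenient version has one extra hazard: comparing a missing value to "true" quietly yields False, so the node would be treated as non-dedicated without any error.

```python
import os


def load_config_strict(env=os.environ):
    """Index access: a missing variable raises KeyError at load time."""
    return {
        "pool_id": env["AZ_BATCH_POOL_ID"],
        "node_id": env["AZ_BATCH_NODE_ID"],
        "is_dedicated": env["AZ_BATCH_NODE_IS_DEDICATED"] == "true",
    }


def load_config_lenient(env=os.environ):
    """.get access: a missing variable becomes None (or False after ==)."""
    return {
        "pool_id": env.get("AZ_BATCH_POOL_ID"),
        "node_id": env.get("AZ_BATCH_NODE_ID"),
        "is_dedicated": env.get("AZ_BATCH_NODE_IS_DEDICATED") == "true",
    }
```

Both functions accept any mapping, so they can be exercised with a plain dict in place of the real environment.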

jafreck · 2018-05-30T18:17:45Z

aztk/node_scripts/main.py



 if __name__ == "__main__":
+    logger.setup_logging()


This isn't used anywhere it seems. Is this just for use later?

There was some logging in the node scripts that was not being printed to stdout. I think this should fix it; I'll double check.

Yep, it works. I just updated the one place still using logging, which doesn't work.
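For context, Python's logging module drops INFO-level messages until a handler and level are configured, which matches the symptom described here. The sketch below shows what a setup function of this kind might do; it is an assumption for illustration and not the body of aztk's `logger.setup_logging()`.

```python
import logging
import sys


def setup_logging(level=logging.INFO, stream=None):
    """Attach a stream handler to the root logger (stdout by default)."""
    target = stream if stream is not None else sys.stdout
    handler = logging.StreamHandler(target)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler
```

Calling it once under `if __name__ == "__main__":` before any other work means that loggers created in imported modules inherit the handler through propagation to the root logger.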

jafreck · 2018-05-30T18:19:00Z

aztk/spark/client.py

-        super().__init__(secrets_config)

-    def create_cluster(self, cluster_conf: models.ClusterConfiguration, wait: bool = False):
+    def create_cluster(self, configuration: models.ClusterConfiguration, wait: bool = False):


I think we should not do this, since it is a breaking change. In the SDK rewrite, we will make sure the parameter names are better. But for now, we should preserve backwards compatibility and then deprecate these methods.
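One way to rename a parameter without breaking callers is to accept both names for a while and warn on the old one. The sketch below illustrates that pattern under assumptions: the `Client` class is a hypothetical stand-in, and this is not necessarily how the PR resolved the comment.

```python
import warnings


class Client:
    """Hypothetical client; only the argument handling is of interest."""

    def create_cluster(self, configuration=None, wait=False, cluster_conf=None):
        # Old keyword still works, but emits a DeprecationWarning.
        if cluster_conf is not None:
            warnings.warn(
                "cluster_conf is deprecated, use configuration instead",
                DeprecationWarning,
                stacklevel=2,
            )
            if configuration is not None:
                raise TypeError(
                    "Pass either configuration or cluster_conf, not both")
            configuration = cluster_conf
        if configuration is None:
            raise TypeError("configuration is required")
        # A real client would create the cluster here.
        return configuration, wait
```

Keeping the new parameter in the first position means existing positional calls such as `client.create_cluster(conf, True)` behave as before, and only callers using the `cluster_conf=` keyword see the warning.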

jafreck · 2018-05-30T18:19:12Z

aztk/spark/client.py

        job submission
    '''
-    def submit_job(self, job_configuration):
+    def submit_job(self, configuration: models.JobConfiguration):


same as above

timotheeguerin added 7 commits May 3, 2018 15:54

Scheduling wip

a811601

Models

29270d3

Scheduling wip

e9a01df

Scheduling wip

597b089

Scheduling wip

605adf4

fix issues

583d609

Fix logging issue

18bff41

timotheeguerin added the in progress label May 4, 2018

timotheeguerin added 5 commits May 4, 2018 10:48

job support

39aa056

Scheduling target

1b3a8f8

more

0d98780

fix issue

d17d257

fix

5fe0680

jafreck reviewed May 7, 2018


timotheeguerin added 5 commits May 7, 2018 09:10

remove non master dedicated option

fcc851d

Remove non master

f4b8638

Added job.yaml support and validation

e454d4e

Added cluster config tests

bf67c9f

merge

d90c795

timotheeguerin changed the title ~~Feature: Disable scheduling on nodes not having~~ Feature: Disable scheduling on group of nodes May 30, 2018

timotheeguerin added 5 commits May 30, 2018 09:33

Merge master

d15e4c1

fix merge issue

20411bc

Deprecate

ac4392c

Fix

da6c5cc

Added setter for vm_count

f38ab74

jafreck reviewed May 30, 2018


timotheeguerin added 3 commits May 30, 2018 11:31

Logging -> log

64a891e

Rename

635b576

Swtich .get to [] for is dedicated env

ca97b70

jafreck approved these changes May 30, 2018


timotheeguerin merged commit 8fea9ce into master May 30, 2018

timotheeguerin deleted the feature/scheduling branch May 30, 2018 20:02


