Skip to content

Conversation

@mhenke1
Copy link
Contributor

@mhenke1 mhenke1 commented Jun 15, 2018

This PR introduces a configurable multiplicator to influence the forced active ack timeout period.

Description

In rare condition one can see that the load balancer put activations into queues of invokers falsely
identified to have free capacity because forced active acks are send out for activation that are waiting in the queue to be executed. With increasing the timeout the loadbalancer can be influenced to distribute the load wider in this cases.

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • [x ] I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

[PG3:2420 succeeded]

min = 100 ms
max = 5 m
std = 1 m
timeoutfactor = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment as this knob isn’t obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@codecov-io
Copy link

codecov-io commented Jun 15, 2018

Codecov Report

Merging #3767 into master will decrease coverage by 0.69%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #3767     +/-   ##
=========================================
- Coverage   75.84%   75.14%   -0.7%     
=========================================
  Files         145      132     -13     
  Lines        6921     6143    -778     
  Branches      410      374     -36     
=========================================
- Hits         5249     4616    -633     
+ Misses       1672     1527    -145
Impacted Files Coverage Δ
...e/loadBalancer/ShardingContainerPoolBalancer.scala 32.35% <0%> (+0.74%) ⬆️
...ontainerpool/logging/ElasticSearchRestClient.scala 76.08% <0%> (-1.7%) ⬇️
...rc/main/scala/whisk/core/entity/ExecManifest.scala 96.29% <0%> (-1.58%) ⬇️
.../containerpool/docker/DockerContainerFactory.scala 26.66% <0%> (-0.92%) ⬇️
...sk/core/containerpool/docker/DockerContainer.scala 77.02% <0%> (-0.9%) ⬇️
.../scala/src/main/scala/whisk/core/entity/Exec.scala 83.11% <0%> (-0.65%) ⬇️
...a/whisk/core/loadBalancer/InvokerSupervision.scala 82.25% <0%> (-0.56%) ⬇️
...la/src/main/scala/whisk/common/TransactionId.scala 92.98% <0%> (-0.47%) ⬇️
...isk/core/controller/actions/PrimitiveActions.scala 87.5% <0%> (-0.1%) ⬇️
...cala/whisk/core/containerpool/ContainerProxy.scala 92% <0%> (-0.05%) ⬇️
... and 39 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b5abe7...13dd24f. Read the comment docs.

"CONFIG_whisk_timeLimit_min": "{{ limit_action_time_min | default() }}"
"CONFIG_whisk_timeLimit_max": "{{ limit_action_time_max | default() }}"
"CONFIG_whisk_timeLimit_std": "{{ limit_action_time_std | default() }}"
"CONFIG_whisk_timeLimit_timeoutfactor": "{{ limit_action_timeoutfactor | default() }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this rather be part of the loadbalancer config since it is specific to the loadbalancer after all?

@mhenke1 mhenke1 changed the title introduce timeout factor to controll invoker queue behavior introduce timeout factor to control invoker queue behavior Jun 19, 2018
@markusthoemmes markusthoemmes requested a review from cbickel July 23, 2018 10:14
@markusthoemmes markusthoemmes added the review Review for this PR has been requested and yet needs to be done. label Jul 23, 2018
If an active-ack does not appear after a certain timeout (action timeout + 1 minute), the active-ack is considered as lost and we "force" it to keep the loadbalancer's state sane.

In some cases where the system gets increasingly slow, this timeout is too narrow. This makes the value configurable to be able to rapidly adjust it and to find a good default in production environments.

Co-authored-by: Markus Thömmes <markusthoemmes@me.com>
@markusthoemmes
Copy link
Contributor

PG3 2561 🔵

Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@markusthoemmes markusthoemmes merged commit 9e47321 into apache:master Jul 25, 2018
@mhenke1 mhenke1 deleted the timeout_factor branch August 17, 2018 14:10
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
If an active-ack does not appear after a certain timeout (action timeout + 1 minute), the active-ack is considered as lost and we "force" it to keep the loadbalancer's state sane.

In some cases where the system gets increasingly slow, this timeout is too narrow. This makes the value configurable to be able to rapidly adjust it and to find a good default in production environments.

Co-authored-by: Markus Thömmes <markusthoemmes@me.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review Review for this PR has been requested and yet needs to be done.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants