Details
Bug
Status: Resolved
Major
Resolution: Fixed
None
None
Reviewed
Description
If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it cannot properly scale even if its max-capacity and the parent's max-capacity would allow it.
Example:
Cluster Capacity: 16 GB / 16 cpu (2 nodes, each with 8 GB / 8 cpu)
Container allocation size: 1G / 1 vcore
root.dynamic
  Effective Capacity: <memory: 8192, vCores: 8> (50.0%)
  Effective Max Capacity: <memory:16384, vCores:16> (100.0%)
  Template:
    Capacity: 40%
    Max Capacity: 100%
    User Limit Factor: 4
leaf-queue-template.capacity = 40%
leaf-queue-template.maximum-capacity = 100%
leaf-queue-template.maximum-am-resource-percent = 50%
leaf-queue-template.minimum-user-limit-percent = 100%
leaf-queue-template.user-limit-factor = 4
"root.dynamic" has a maximum capacity of 100% and a capacity of 50%.
Let's assume there are running containers in these dynamic queues (MR sleep jobs):
root.dynamic.user1 = 1 AM + 3 containers (capacity = 40%)
root.dynamic.user2 = 1 AM + 3 containers (capacity = 40%)
root.dynamic.user3 = 1 AM + 15 containers (capacity = 0%)
This scenario results in an underutilized cluster, with approximately 18% of the capacity left unused. On the other hand, it is still possible to submit a new application to root.dynamic.user1 or root.dynamic.user2, and in that case 100% utilization can be reached.
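The 40% / 40% / 0% capacity split in the example can be reproduced with a small model. This is only a sketch, not YARN code: it assumes a leaf queue gets the template capacity while the parent still has enough unassigned guaranteed capacity, and 0% otherwise. The names assign_leaf_capacities and guaranteed_mb are hypothetical.

```python
# Hypothetical model of the example above; not YARN code.
CLUSTER_MB = 16 * 1024          # 2 nodes x 8 GB
PARENT_CAPACITY_PCT = 50.0      # root.dynamic capacity
TEMPLATE_CAPACITY_PCT = 40.0    # leaf-queue-template.capacity


def assign_leaf_capacities(leaf_names, template_pct=TEMPLATE_CAPACITY_PCT):
    """Give each leaf the template capacity while the parent still has
    enough unassigned guaranteed capacity, otherwise 0% (assumed policy)."""
    remaining = 100.0  # percent of the parent's capacity not yet assigned
    assigned = {}
    for name in leaf_names:
        if remaining >= template_pct:
            assigned[name] = template_pct
            remaining -= template_pct
        else:
            assigned[name] = 0.0
    return assigned


def guaranteed_mb(leaf_pct):
    """Guaranteed memory of a leaf, as a share of root.dynamic's capacity."""
    parent_mb = CLUSTER_MB * PARENT_CAPACITY_PCT / 100.0
    return parent_mb * leaf_pct / 100.0


leaves = assign_leaf_capacities(["user1", "user2", "user3"])
for name, pct in leaves.items():
    print(f"root.dynamic.{name}: {pct:.0f}% -> {guaranteed_mb(pct):.1f} MB guaranteed")
```

In this model root.dynamic.user3 has no guaranteed share, so everything it runs has to come from headroom below max-capacity, which is the scaling this report says does not work properly.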