When will the autopilot GKE bursting be enabled again?

We can't use bursting in GKE Autopilot because of the following known issue. Any ETA?

To mitigate this issue, we temporarily disabled bursting in GKE Autopilot clusters that were created or upgraded to version 1.29.2-gke.1060000 and later on or after April 24, 2024. Clusters that enabled bursting prior to April 24, 2024 continue to support bursting.

same question

We don't have an ETA on wholesale re-enabling it, but if you are OK with the minimal risk of the issue occurring, you can contact support and ask them to enable it on your cluster(s).

+1

Our test cluster runs on Autopilot and we wanted to put all of its pods on the minimum (50m) CPU with a burst to 500m, since they see very sporadic use. The minimum without bursting is 250m, so the cluster is currently 5x as expensive as it should be... We're planning to run 50-60 pods on there; that's 15 vCPU vs 3 vCPU... Big difference...
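Back-of-the-envelope for the numbers above (assuming 60 pods, the 250m non-bursting floor, and the 50m bursting minimum):

```shell
# Rough CPU-request totals: 60 pods at the 250m non-bursting minimum
# vs the same pods at the 50m minimum that bursting allows.
pods=60
echo "$(( pods * 250 / 1000 )) vCPU vs $(( pods * 50 / 1000 )) vCPU"
# Prints: 15 vCPU vs 3 vCPU
```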

I really do not want to recreate everything on a Standard cluster just because of a bug. But if we had an ETA and knew that it won't be fixed this year, it might be worth doing the work to move to a Standard cluster with our own managed node pool.

> you can contact support and ask them to enable it on your cluster(s)

But you need to pay to contact support... to ask them for an exemption regarding their own bugs? No thanks...

+1 actually. We have the exact same problem as jannes1 described, and I don't see why we should pay Support to fix their own bugs. Isn't there a workaround to enable bursting in a new Autopilot cluster? We are ready to back up our cluster and move to another one, but if that's also impossible, then we will have to manage a Standard cluster too.

Hi guys, we contacted customer support, since it'd still be cheaper to do that than to deal with non-burstable workloads, and they told us that the bug was fixed in 1.30.2.

So you can upgrade your cluster to 1.30.2-gke.1447000 (found in the Rapid release channel). It may take a long time for all nodes to be updated as well, and you might want to do a manual upgrade to the same version to force GKE to restart your control plane. After doing all of this, it took about 24-36 hours before our cluster started accepting burstable workloads.
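For reference, the steps above look roughly like this (CLUSTER_NAME and LOCATION are placeholders; which patch version is available depends on your release channel, so check first):

```shell
# Move the cluster to the Rapid channel so 1.30.2-gke.1447000
# is available (placeholder cluster name and location).
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --release-channel=rapid

# Upgrade the control plane to the fixed version; Autopilot then
# rolls the nodes automatically, which can take a while.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=LOCATION --master \
    --cluster-version=1.30.2-gke.1447000
```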

The GKE release note published on July 17 implies that bursting in GKE Autopilot has been re-enabled on 1.30.2-gke.1394000+.

> GKE Autopilot now supports opportunistic bursting and lower Pod minimums upon cluster creation or upgrade to 1.30.2-gke.1394000 or later, resolving a previous issue with containerd.

https://cloud.google.com/kubernetes-engine/docs/release-notes#July_17_2024

I believe that bursting still does not work in Autopilot.

Here is my setup:

gcloud container clusters describe staging-k8s-cluster-fc75090 \
    --location=europe-west4 \
    --format="value(initialClusterVersion)"
1.27.3-gke.1700


gcloud container clusters describe staging-k8s-cluster-fc75090 \
    --location=europe-west4 \
    --format="value(currentNodeVersion)"
1.30.2-gke.1587003

gcloud container clusters describe staging-k8s-cluster-fc75090 \
    --location=europe-west4 \
    --format="value(currentMasterVersion)"
1.30.2-gke.1587003

The cluster was upgraded to this version more than 36 hours ago.

When creating the simplest deployment with bursting:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          resources:
            requests:
              memory: 1G
              cpu: 250m
            limits:
              memory: 2G
          cpu: 500m

I receive:

kubectl diff -f nginx.yaml
diff -u -N /var/folders/33/9_k7r7q94_jclsm693n_qwcc0000gn/T/LIVE-3788445005/apps.v1.Deployment.apps.nginx-deployment /var/folders/33/9_k7r7q94_jclsm693n_qwcc0000gn/T/MERGED-1305063325/apps.v1.Deployment.apps.nginx-deployment
--- /var/folders/33/9_k7r7q94_jclsm693n_qwcc0000gn/T/LIVE-3788445005/apps.v1.Deployment.apps.nginx-deployment      2024-08-16 12:17:26
+++ /var/folders/33/9_k7r7q94_jclsm693n_qwcc0000gn/T/MERGED-1305063325/apps.v1.Deployment.apps.nginx-deployment    2024-08-16 12:17:26
@@ -0,0 +1,61 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  annotations:
+    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"limits":{"cpu":"500m","memory":"2G"},"requests":{"cpu":"250m","memory":"1G"},"name":"nginx"}]},"output":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"1Gi","memory":"1G"},"requests":{"cpu":"250m","ephemeral-storage":"1Gi","memory":"1G"},"name":"nginx"}]},"modified":true}'
+    autopilot.gke.io/warden-version: 3.0.22
+  creationTimestamp: "2024-08-16T10:17:26Z"
+  generation: 1
+  name: nginx-deployment
+  namespace: apps
+  uid: 8d8e0aa6-fab6-4a3f-a9f8-66c82d1e3d23
+spec:
+  progressDeadlineSeconds: 600
+  replicas: 1
+  revisionHistoryLimit: 10
+  selector:
+    matchLabels:
+      app: nginx
+  strategy:
+    rollingUpdate:
+      maxSurge: 25%
+      maxUnavailable: 25%
+    type: RollingUpdate
+  template:
+    metadata:
+      creationTimestamp: null
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - image: nginx:latest
+        imagePullPolicy: Always
+        name: nginx
+        resources:
+          limits:
+            cpu: 250m
+            ephemeral-storage: 1Gi
+            memory: 1G
+          requests:
+            cpu: 250m
+            ephemeral-storage: 1Gi
+            memory: 1G
+        securityContext:
+          capabilities:
+            drop:
+            - NET_RAW
+        terminationMessagePath: /dev/termination-log
+        terminationMessagePolicy: File
+      dnsPolicy: ClusterFirst
+      restartPolicy: Always
+      schedulerName: default-scheduler
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
+      terminationGracePeriodSeconds: 30
+      tolerations:
+      - effect: NoSchedule
+        key: kubernetes.io/arch
+        operator: Equal
+        value: amd64

In other words, Autopilot is adjusting the limits down to match the requests, effectively disabling bursting.
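A quick way to check what Autopilot actually admitted is to read that resource-adjustment annotation back from the live object (deployment name and namespace taken from the diff above; adjust to your own workload):

```shell
# Print the adjustment Autopilot applied on admission. If the "output"
# limits equal the requests even though the manifest set higher limits,
# bursting is still not active on the cluster.
kubectl get deployment nginx-deployment -n apps \
  -o jsonpath='{.metadata.annotations.autopilot\.gke\.io/resource-adjustment}'
```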

Restart the control plane by manually upgrading it to the same GKE version. This has to happen after the nodes have been upgraded to the new version (so: upgrade control plane -> upgrade nodes -> manually restart control plane). LMK if that works:

gcloud container clusters upgrade staging-k8s-cluster-fc75090 --master \
    --cluster-version 1.30.2-gke.1587003
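After forcing the control-plane restart, you can confirm that both sides report a bursting-capable version with the same describe command used above:

```shell
# Both values should be 1.30.2-gke.1394000 or later before Autopilot
# starts admitting burstable Pods (cluster name and location as above).
gcloud container clusters describe staging-k8s-cluster-fc75090 \
    --location=europe-west4 \
    --format="value(currentMasterVersion,currentNodeVersion)"
```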

 

It helped, thank you. I had missed that the "restart" must be done once again.

Yeah, it's a bit annoying, but thankfully it only needs to happen once, after you upgrade from an unsupported version to a version that supports bursting!
