Consistent "no space left on device" error when pulling a large Docker image

Failed to pull image "....": failed to pull and unpack image "...": failed to extract layer sha256:0e7837d7a4dd0d0859fa04eb81e0739d5c69aae45b7bbc121f029faeadc768da: write /var/lib/containerd/io.containerd.snapshotter.v1.gcfs/snapshotter/snapshots/338/fs/root/.cache/pip/http/2/2/f/4/9/22f4973b37e633e25de4a56e0cda7f53500f73af7e1c49143575a99b: no space left on device: unknown

Failed to pull image "...": failed to pull and unpack image "...": failed to extract layer sha256:0e7837d7a4dd0d0859fa04eb81e0739d5c69aae45b7bbc121f029faeadc768da: write /var/lib/containerd/io.containerd.snapshotter.v1.gcfs/snapshotter/snapshots/361/fs/usr/local/lib/python3.11/site-packages/nvidia/cublas/lib/libcublasLt.so.12: no space left on device: unknown

Our Autopilot cluster was recently auto-upgraded to 1.29.4-gke.1043002, and that's when the issue started. We were on 1.29.1 before, and this was working fine.

The logs above are just a couple of examples.

One thing to note is that the Docker image we are pulling has a virtual size of 29.7 GB.

 


A temporary solution was to rebuild the Docker image to be a little smaller. The main thing we did was run `pip cache purge` at the end of the build to save a few GB, and that somehow worked.
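
To see how much the pip cache and similar directories add to the image before rebuilding, something like the following could be run inside the container. This is only a minimal sketch: the helper names `dir_size_bytes` and `report` are made up for illustration, and the candidate paths are assumptions (the pip cache location is inferred from the `/root/.cache/pip` path in the error above). It reports apparent file sizes, not actual disk blocks used.

```python
import os
from pathlib import Path


def dir_size_bytes(root: str) -> int:
    """Return the total apparent size in bytes of regular files under root."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinked files so their targets are not counted twice.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


def report(paths):
    """Return (path, size_in_bytes) pairs for existing directories, largest first."""
    sizes = [(p, dir_size_bytes(p)) for p in paths if os.path.isdir(p)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates; adjust to the image being inspected.
    candidates = [
        str(Path.home() / ".cache" / "pip"),
        "/var/cache/apt",
        "/tmp",
    ]
    for path, size in report(candidates):
        print(f"{size / 1e9:8.2f} GB  {path}")
```

Keep in mind that image layers are additive, so deleting the cache only shrinks the image if it happens in the same `RUN` step as the `pip install` that created it. Passing `--no-cache-dir` to `pip install` avoids writing the cache in the first place.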

Some are working, but we are still seeing a lot of failures related to "no space left on device".
