Segfault in amd64 container using Rosetta #6773

Closed
koehn opened this issue Mar 21, 2023 · 32 comments

Comments

@koehn

koehn commented Mar 21, 2023

  • [x] I have tried with the latest version of Docker Desktop
  • [x] I have tried disabling enabled experimental features
  • [x] I have uploaded Diagnostics
  • Diagnostics ID: 2381464F-94BB-4A0B-B72F-0C88910859D5/20230321150015

Expected behavior

Expected amd64 containers run under Rosetta to function the same as containers run under QEMU.

Actual behavior

amd64 containers run under Rosetta segfault.

Information

  • macOS Version: 13.2.1
  • Intel chip or Apple chip: M1 arm64
  • Docker Desktop Version: 4.17.0 (99724)

Output of /Applications/Docker.app/Contents/MacOS/com.docker.diagnose check

For the record: Docker is running and working fine; no idea why diagnose is reporting the VM down.

[2023-03-21T15:04:20.693017000Z][com.docker.diagnose][I] set path configuration to OnHost
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[FAIL] DD0017: can a VM be started? vm has not started: vm has not started
[FAIL] DD0016: is the LinuxKit VM running? vm is not running: vm has not started
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0038: is the connection to Docker working?
[PASS] DD0014: are the backend processes running?
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[WARN] DD0017: can a VM be started? vm has not started: vm has not started
[WARN] DD0016: is the LinuxKit VM running? vm is not running: vm has not started
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?

Please note the following 2 warnings:

1 : The check: can a VM be started?
    Produced the following warning: vm has not started: vm has not started

The Docker engine runs inside a Linux VM. Therefore we must be able to start Virtual Machines.

2 : The check: is the LinuxKit VM running?
    Produced the following warning: vm is not running: vm has not started

The Docker engine runs inside a Linux VM. Therefore the VM must be running.


Please investigate the following 1 issue:

1 : The test: can a VM be started?
    Failed with: vm has not started: vm has not started

The Docker engine runs inside a Linux VM. Therefore we must be able to start Virtual Machines.

Immediately after running the above command, I ran:

$ docker run --rm hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Steps to reproduce the behavior

Use this code/Dockerfile, which does a very simple build of a Go file and adds it to a distroless base image.

From an Apple Silicon machine, run: docker buildx build -t koehn/fetchurl --platform linux/amd64,linux/arm64 . --push with Rosetta enabled, then with it disabled. For me, the build with Rosetta enabled segfaults:

#21 [linux/amd64 build 5/5] RUN go build -o /go/bin/fetchurl &&     upx --brute /go/bin/fetchurl
#21 32.80 crypto/md5: /usr/local/go/pkg/tool/linux_amd64/asm: signal: segmentation fault
#21 49.00 vendor/golang.org/x/net/http/httpproxy: /usr/local/go/pkg/tool/linux_amd64/compile: signal: segmentation fault
#21 ERROR: process "/dev/.buildkit_qemu_emulator /bin/sh -c go build -o /go/bin/fetchurl &&     upx --brute /go/bin/fetchurl" did not complete successfully: exit code: 1
------
 > [linux/amd64 build 5/5] RUN go build -o /go/bin/fetchurl &&     upx --brute /go/bin/fetchurl:
#21 32.80 crypto/md5: /usr/local/go/pkg/tool/linux_amd64/asm: signal: segmentation fault
#21 49.00 vendor/golang.org/x/net/http/httpproxy: /usr/local/go/pkg/tool/linux_amd64/compile: signal: segmentation fault
------
Dockerfile:8
--------------------
   7 |     
   8 | >>> RUN go build -o /go/bin/fetchurl && \
   9 | >>>     upx --brute /go/bin/fetchurl
  10 |     
--------------------
ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c go build -o /go/bin/fetchurl &&     upx --brute /go/bin/fetchurl" did not complete successfully: exit code: 1
time="2023-03-21T14:29:53Z" level=error msg="execution failed: exit status 1"

Again, when I disable Rosetta and run the build again, it succeeds. The arm64 build always succeeds.
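
To narrow a failure like this down, it can help to build only the emulated platform with plain progress output and pull the crashing tools out of the log. The following is a minimal sketch, assuming a POSIX shell; the helper names segfault_tools and run_amd64_build are made up for illustration and are not part of Docker or buildx. Nothing is built until run_amd64_build is called explicitly.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Docker or buildx.

# Read BuildKit output on stdin and print each tool that crashed, one per line.
# Matches lines such as:
#   #21 32.80 crypto/md5: /usr/local/go/pkg/tool/linux_amd64/asm: signal: segmentation fault
segfault_tools() {
  grep 'signal: segmentation fault' |
    sed -E 's/^(.*: )?([^ :]+): signal: segmentation fault.*$/\2/' |
    LC_ALL=C sort -u
}

# Build only linux/amd64 so the arm64 output does not drown the errors.
# $1: build context directory (defaults to the current directory).
run_amd64_build() {
  docker buildx build --platform linux/amd64 --no-cache --progress=plain "${1:-.}" 2>&1
}
```

A possible invocation is run_amd64_build . | tee build.log | segfault_tools, run once with Rosetta enabled and once with it disabled, so the two lists can be compared.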

@Otterverse

I have the same issue when using UPX to compress a binary (which also happens to be a Go binary). UPX definitely does some strange things to a binary, but since it still works under the old QEMU system, this does seem to be a Rosetta-specific issue.

@LaurentLesle

Getting the same when using docker buildx build --platform linux/amd64 on MacOS M1

/usr/local/go/pkg/tool/linux_amd64/compile: signal: segmentation fault

I am working around the issue by disabling Rosetta, but builds are much slower.

@koehn
Author

koehn commented Jul 13, 2023

Just checking in to see if anyone is listening… still getting segfaults using Rosetta on Docker 4.21.1 (114176)/MacOS Ventura 13.4.1 (22F82).

@reypm

reypm commented Aug 16, 2023

I have tried the following versions:

  • 4.19.0
  • 4.20.0
  • 4.21.0
  • 4.21.1
  • 4.22.0

and still getting the same error:

base-img | qemu: uncaught target signal 7 (Bus error) - core dumped
base-img | Bus error

I am running macOS Ventura 13.5. This is driving me crazy since I cannot find a solution for this issue, any help?

@zerok

zerok commented Nov 8, 2023

Can also reproduce this on macOS 14.1 with Docker For Mac 4.25.0. This is especially problematic as Rosetta is now enabled by default.

Just to add another example of what fails: https://github.com/zerok/docker-rosetta-issue

@AlekSi

AlekSi commented Nov 8, 2023

It might be related to docker/buildx#2028

@koehn
Author

koehn commented Nov 8, 2023

I recently learned that you can create a builder in buildx that uses multiple machines with Docker installed using e.g., ssh. When you run a multi-platform build using that builder, buildx sends builds to native hardware whenever possible, avoiding this issue.

 docker buildx ls
NAME/NODE       DRIVER/ENDPOINT                              STATUS  BUILDKIT             PLATFORMS
multiarch *     docker-container                                                          
  mbp           desktop-linux                                running v0.12.3              linux/arm64*, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
  Amd64_machine ssh://[email protected]                       running v0.12.3              linux/amd64*, linux/386*, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

You can include pods running on a Kubernetes cluster as well.

See docker buildx create --append for more details.

I don’t know if this is a bug in Rosetta, Docker, or QEMU, but in the meantime I have fast builds that work.
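
For anyone who wants to try this setup, here is a rough sketch of how such a builder might be created. The node names, the builder name, and the ssh endpoint are placeholders, and the run wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set; --name, --driver, --node, --platform and --append are existing docker buildx create flags.

```shell
#!/bin/sh
# Sketch only: node names and the ssh endpoint below are placeholders.

# Echo each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# $1: builder name
# $2: ssh endpoint of a native amd64 machine with Docker installed
create_multiarch_builder() {
  # First node: the local (arm64) Docker engine, via the current context.
  run docker buildx create --name "$1" --driver docker-container \
    --node local-arm64 --platform linux/arm64
  # Second node: appended to the same builder and preferred for linux/amd64.
  run docker buildx create --name "$1" --append \
    --node remote-amd64 --platform linux/amd64 "$2"
  # Select the builder for subsequent docker buildx build calls.
  run docker buildx use "$1"
}

# Dry run: prints the three commands without touching Docker.
create_multiarch_builder multiarch ssh://user@amd64-host
```

Running the script with DRY_RUN=0 executes the commands; afterwards docker buildx ls should list both nodes under the builder, similar to the output above.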

@AdrienPoupa

AdrienPoupa commented Nov 20, 2023

We are getting a segmentation fault with an Ubuntu-based image running PHP-FPM and Composer:

2023-11-20 13:16:13 Segmentation fault
2023-11-20 13:16:13 bash: line 1:    21 Segmentation fault      composer install --no-interaction

Anything we can do to report the issue?

This has always been happening on my end, and is more annoying since the option got enabled by default (currently using 4.25.1)

@dgageot
Member

dgageot commented Nov 21, 2023

Hi @zerok, I've tried to reproduce from your project but couldn't make it fail. Could you share the exact commands you used to trigger the issue?

@dgageot
Member

dgageot commented Nov 21, 2023

Hi @AdrienPoupa, could you share the exact commands you used to trigger the issue? Which version of php are you using? Have you tried with a more recent version?

@dgageot
Member

dgageot commented Nov 21, 2023

@koehn same, I couldn't reproduce with your project.

FWIW, I'm on Sonoma 14.1.1 and Docker Desktop 4.25.1

@zerok

zerok commented Nov 21, 2023

Hi @dgageot 🙂 A checkout of that repository and then running go run . should produce the segfault (or a different cgo exit error, that's somehow not deterministic yet).

Happened for me with 4.25.0 (126437) every time I ran it.

With 4.25.1 the story is a bit different. I had to run the same command multiple times but in the vast majority of runs I now see this:

Stderr:
go: downloading dagger.io/dagger v0.9.3
go: downloading github.com/vektah/gqlparser/v2 v2.5.6
go: downloading github.com/Khan/genqlient v0.6.0
go: downloading github.com/99designs/gqlgen v0.17.31
go: downloading golang.org/x/exp v0.0.0-20231006140011-7918f672742d
go: downloading golang.org/x/sync v0.4.0
go: downloading github.com/adrg/xdg v0.4.0
go: downloading github.com/mitchellh/go-homedir v1.1.0
runtime/cgo: gcc: signal: segmentation fault
encoding/binary: /usr/local/go/pkg/tool/linux_amd64/compile: signal: segmentation fault
runtime/debug: /usr/local/go/pkg/tool/linux_amd64/asm: signal: segmentation fault
exit status 1

... or similar errors.

I'm also on macOS 14.1.1.

@dgageot
Member

dgageot commented Nov 21, 2023

You're on an Apple Silicon Mac and you're cloning and go running inside an amd64 container?

@zerok

zerok commented Nov 21, 2023

Exactly 🙂

@dgageot
Member

dgageot commented Nov 21, 2023

Which image are you using to build? I used golang and I can't repro.

@zerok

zerok commented Nov 21, 2023

That's what Dagger is for. All you should need to do to reproduce this issue is running go run . directly on the host system (without going into a container). I have Go 1.21.4 installed on the host system.

@dgageot
Member

dgageot commented Nov 21, 2023

Oh, that makes more sense. I'll give it a try

@dgageot
Member

dgageot commented Nov 21, 2023

OK, this one is interesting.

It succeeded in building the image, the first time I ran it.
Then I added a white space to the go build step, in order to invalidate the cache, ran it again, and it started to fail on the last step.
The error is process "/dev/.buildkit_qemu_emulator go build" did not complete successfully: exit code: 1

I think this one is a buildx specific error and more precisely an issue with how Dagger configures the qemu emulation layer.
I might be wrong, so summoning the gods of thunder, @tonistiigi and @shykes

FWIW, when I go through the same build steps with a Dockerfile, it all works:

cat << EOF | docker build -f- --platform=linux/amd64 --no-cache /var/empty
FROM golang:1.21
WORKDIR /src
RUN git clone https://github.com/zerok/docker-rosetta-issue .
RUN go build
EOF

@AdrienPoupa

Those are the logs I am getting when trying to load any page:

[21-Nov-2023 14:45:51] WARNING: [pool www] child 74 exited on signal 11 (SIGSEGV) after 65.243212 seconds from start
[21-Nov-2023 14:45:51] NOTICE: [pool www] child 551 started

Even a simple php -v gives a segfault:

# php -v
PHP 8.1.23 (cli) (built: Sep  2 2023 06:59:15) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.23, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.23, Copyright (c), by Zend Technologies
    with Xdebug v3.2.1, Copyright (c) 2002-2023, by Derick Rethans
Segmentation fault

I'll try to rebuild the image. I can send you the Dockerfile privately if needed.

@dgageot
Member

dgageot commented Nov 21, 2023

@AdrienPoupa could you try with php 8.2.something?
I believe this fixes the issue.

@AlekSi

AlekSi commented Nov 21, 2023

@dgageot wrote: "an issue with how Dagger configures the qemu emulation layer"

I don't think it is about Dagger. Consider this reproducer, for example: docker/buildx#2028 (comment)

@AdrienPoupa

AdrienPoupa commented Nov 21, 2023

I can try to compile an 8.2 image, but this is not really an acceptable workaround given our production environment runs 8.1 and migrating is not a trivial task, not to mention people relying on older PHP versions to work in Docker.

I rebuilt the image with 8.1, I am still having the issue:

php -v
PHP 8.1.25 (cli) (built: Oct 27 2023 14:00:40) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.25, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.25, Copyright (c), by Zend Technologies
Segmentation fault

This is how we are installing PHP and extensions:

RUN apt-get update && \
  apt-get install --no-install-recommends --yes \
  inotify-tools \
  time \
  zip \
  unzip \
  wget \
  file \
  build-essential \
  zlib1g-dev \
  libmaxminddb-dev \
  libfcgi-bin \
  php8.1 \
  php8.1-apcu \
  php8.1-bcmath \
  php8.1-bz2 \
  php8.1-cli \
  php8.1-curl \
  php8.1-dba \
  php8.1-enchant \
  php8.1-fpm \
  php8.1-gd \
  php8.1-gmp \
  php8.1-intl \
  php8.1-ldap \
  php8.1-mbstring \
  php8.1-memcached \
  php8.1-mysql \
  php8.1-odbc \
  php8.1-pgsql \
  php8.1-opcache \
  php8.1-readline \
  php8.1-redis \
  php8.1-soap \
  php8.1-sqlite3 \
  php8.1-tidy \
  php8.1-xml \
  php8.1-xmlrpc \
  php8.1-yaml \
  php8.1-zip \
  php-pear \
  php8.1-dev \
  librdkafka-dev \
  && pecl install rdkafka grpc maxminddb \
  && apt-get remove -y php-pear php8.1-dev linux-headers build-essential zlib1g-dev \
  && apt-get autoremove -y \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /tmp/pear/* \
  && echo "extension = rdkafka.so" > /etc/php/8.1/mods-available/rdkafka.ini \
  && echo "extension = grpc.so" > /etc/php/8.1/mods-available/grpc.ini \
  && echo "extension = maxminddb.so" > /etc/php/8.1/mods-available/maxminddb.ini \
  && phpenmod rdkafka \
  && phpenmod maxminddb \
  && phpenmod grpc

We are also installing NewRelic's PHP agent but not using it in dev:

RUN wget -O - https://download.newrelic.com/548C16BF.gpg | apt-key add -
RUN sh -c 'echo "deb http://apt.newrelic.com/debian/ newrelic non-free" \
  > /etc/apt/sources.list.d/newrelic.list'

RUN apt-get update \
  && apt-get install --no-install-recommends --yes newrelic-php5 \
  && apt-get autoremove -y \
  && rm -rf /var/lib/apt/lists/*

@AdrienPoupa

I built an 8.2 image, and it works fine:

php -v
PHP 8.2.12 (cli) (built: Oct 26 2023 17:33:49) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.12, Copyright (c) Zend Technologies
    with Zend OPcache v8.2.12, Copyright (c), by Zend Technologies

It seems 8.1 is broken, but that means we can't use the new option with our existing image, even with the latest PHP 8.1 stable version - meaning I feel the activation by default was a bit rushed.

@dgageot
Member

dgageot commented Nov 21, 2023

@zerok a fix to your workflow should ship in Docker Desktop 4.26.0. We could reproduce it on 4.25.1 but not on our main branch where we added many fixes to Rosetta.

@AdrienPoupa no fix for php < 8.2 yet. I'm still working on it and will keep you posted!

@dgageot
Member

dgageot commented Nov 21, 2023

@AdrienPoupa FWIW, I also couldn't reproduce the issue with php:8.1.25-alpine image

@AdrienPoupa

Same here, php:8.1-fpm works fine so it is either coming from the different Ubuntu base, or one of the extensions

@dgageot
Member

dgageot commented Nov 24, 2023

@AdrienPoupa it's coming from OPCache and the way it's using mmap. It has been fixed in php 8.2.
You can also try 8.1 with this option in opcache.ini: opcache.preferred_memory_model=shm
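
As a sketch of how that setting could be applied in an image, the helper below drops an override file into a PHP configuration directory. The function name, the file name, and the example directories are assumptions for illustration; opcache.preferred_memory_model is an existing OPcache directive, but the directory PHP scans for .ini files differs between images and should be checked with php --ini.

```shell
#!/bin/sh
# Hypothetical helper: write an OPcache override into a PHP config directory.
# $1: a directory that PHP scans for additional .ini files
#     (assumption: the right path depends on the image)
write_opcache_override() {
  conf_dir=$1
  mkdir -p "$conf_dir" || return 1
  # The file name is arbitrary; ini files are read in alphabetical order,
  # so a "zz-" prefix lets this setting override earlier files.
  printf '%s\n' \
    '; Workaround for OPcache segfaults under Rosetta (PHP 8.1)' \
    'opcache.preferred_memory_model=shm' \
    > "$conf_dir/zz-opcache-rosetta.ini"
}
```

On a Debian or Ubuntu style layout this might be called as write_opcache_override /etc/php/8.1/cli/conf.d and again for /etc/php/8.1/fpm/conf.d, followed by php -v to check that the segfault is gone.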

@thaJeztah
Member

Looks like a fix / improvement was made in Docker Desktop, which will ship with the next release

@dgageot
Member

dgageot commented Dec 6, 2023

@AdrienPoupa did you get a chance to test Docker Desktop v4.26.0?

@AdrienPoupa

I just did, looks like it is fixed, thank you very much!

@dgageot
Member

dgageot commented Dec 6, 2023

I'm going to close this issue. Feel free to reopen

dgageot closed this as completed Dec 6, 2023

@zerok

zerok commented Dec 6, 2023

Thank you 😊 my case also works with 4.26 🥳
