Good morning,

Apparently this mail (sent on 2019-01-29) didn't end up on the list, even
though the MTA accepted it, so I'm resending it without attachments.

To get list readers on the same page, this is the description of the problem
I sent to Miklos a while ago:

> Since kernel 4.19 our NixOS test suites are failing[1] with EPERM while
> trying to switch_root from the initrd to a new overlayfs-mounted file system.
>
> The tests are running using a 9p[2] filesystem as the lowerdir[3].
>
> After bisecting I found a6518f73e60e5044656d1ba587e7463479a9381a to be the
> culprit. I also tested by reverting said commit on top of the latest 4.19.7
> stable kernel and the issue doesn't occur there.
>
> To reproduce this with Nix[4], the following command can be used:
>
> nix-build -I nixpkgs=channel:nixos-unstable ''

Unfortunately I have only been able to reproduce this with NixOS VM tests, so
I still have no clue what could be wrong here.

However, the problem turns out not to be related to switch_root, initrd and
execve, but generally seems to be an issue with our test scenario.

The scenario with NixOS VM tests is as follows:

  * It uses QEMU in conjunction with 9p to share /nix/store with the guest.
  * The guest VM mounts that share in $targetRoot/nix/.ro-store during initrd.
  * Still in initrd, an overlayfs is mounted on $targetRoot/nix/store with the
    following options:

      lowerdir=$targetRoot/nix/.ro-store
      upperdir=$targetRoot/nix/.rw-store/store
      workdir=$targetRoot/nix/.rw-store/work

  * Since the aforementioned change, however, access to any of the files on
    $targetRoot/nix/store and /nix/store (after the switch_root) fails with
    EPERM.

I haven't yet been able to pin down which part of this exactly causes the
error, but running overlayfs without 9p works, so I *think* it might be related
to 9p or possibly remote file systems in general (I don't remember exactly, but
I think I tried it with sshfs as well).
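
For readers who want to picture the setup, here is a minimal dry-run sketch of
the mounts described above. It only prints the commands rather than running
them. The 9p mount tag "store", the 9p options and the /mnt-root default for
targetRoot are assumptions for illustration and are not taken from the actual
initrd script.

```shell
#!/bin/sh
# Dry-run sketch: prints the mount commands instead of executing them.
# Assumptions (not from the original mail): the 9p mount tag "store", the
# 9p mount options, and the default value of targetRoot.
targetRoot="${targetRoot:-/mnt-root}"

overlay_opts() {
    # Join the three overlayfs options from the description with commas.
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' \
        "$1/nix/.ro-store" "$1/nix/.rw-store/store" "$1/nix/.rw-store/work"
}

# 1. Mount the 9p share exported by QEMU as the read-only store.
echo "mount -t 9p -o trans=virtio,version=9p2000.L store $targetRoot/nix/.ro-store"
# 2. Mount the overlay on top of it.
echo "mount -t overlay overlay -o $(overlay_opts "$targetRoot") $targetRoot/nix/store"
# 3. Any file access below the overlay is where the EPERM would show up.
echo "ls $targetRoot/nix/store"
```

Running the printed commands as root inside a guest with a matching 9p export
should approximate the scenario, though I have not verified that this alone
triggers the error outside the NixOS VM tests.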

Right now, we're close to our next stable release, and while reverting
a6518f73e60e5044656d1ba587e7463479a9381a fixes the issue above, it breaks
overlayfs elsewhere[5].

On Fri, Dec 07, 2018 at 01:59:59PM +0100, Miklos Szeredi wrote:
> Not sure what debug options are available; would you be able to strace
> the switch_root process? Enable pr_debug for overlayfs ( echo "file
> fs/overlayfs/* +p" > /dynamic_debug/control)?

The full log, the output of strace -f and the Nix expression file I was using
for the test, along with the kernel config, can be found here:

https://gist.github.com/aszlig/2eb6be1d1af38313c6b0584ea6a8d0c8

As mentioned, I could only reproduce this with our NixOS VM tests, so in order
to reproduce it, you need to have Nix[4] installed and run the Nix expression
file reproducing the issue with:

  nix-build overlayfs-linux-4.19-regression.nix

All of my machines are running NixOS, so I gave that expression to one Arch
Linux user running kernel 4.19.12 and an Ubuntu user with kernel 4.15, just to
be sure it is reproducible on non-NixOS hosts.

> Would you mind moving this to a public mailing list for more eyeballs?

Sure, I just added the linux-unionfs list to CC, along with a few Nixers.

a!

PS: I also tested this with the latest mainline (5.0-rc4) and the issue still
    exists.

[1]: https://nix-cache.s3.amazonaws.com/log/6dps3pkw19q9vr9xfhm0npddnykxkj6m-vm-test-run-kernel-latest.drv
[2]: https://github.com/NixOS/nixpkgs/blob/e1b7493cfedba7e715070faa64b77c39d0cf4bff/nixos/modules/virtualisation/qemu-vm.nix#L494
[3]: https://github.com/NixOS/nixpkgs/blob/e1b7493cfedba7e715070faa64b77c39d0cf4bff/nixos/modules/virtualisation/qemu-vm.nix#L449-L450
[4]: https://nixos.org/nix/
[5]: https://github.com/NixOS/nixpkgs/issues/54509

--
aszlig
Universal dilettante