Approaches for same-on-same linux-user execve?

From: Alex Bennée @ 2021-10-07 14:32 UTC
  To: Laurent Vivier, Richard Henderson
  Cc: assad.hashmi, qemu-devel, James Bottomley, qemu-arm,
	Eric W. Biederman, Arnd Bergmann

Hi,

I came across a use-case this week on ARM, although it may also apply
to any architecture where QEMU's emulation is ahead of the hardware
currently widely available, for example if you want to exercise SVE
code on AArch64. When the linux-user guest architecture is not the same
as the host architecture then binfmt_misc works perfectly fine.

However, when you are running same-on-same you can't use binfmt_misc to
redirect execution to QEMU. Any attempt to trap native binaries will
cause your userspace to hang, because the QEMU binary is itself a
native binary: binfmt_misc will be invoked again to run the QEMU needed
to run your application, and a deadlock ensues.

There are some hacks you can apply at a local level, like tweaking the
ELF header of the binaries you want to run under emulation and adjusting
the binfmt_misc mask appropriately. This works but is messy and a faff
to set up.
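
As a rough illustration of that hack, here is a minimal Python model of
how a magic/mask rule could tell tagged binaries apart from untagged
ones (including QEMU itself). The tag offset and tag value are
hypothetical choices for the example, and matches() only approximates
the byte-wise masked compare that binfmt_misc performs; it is not kernel
code.

```python
# Sketch only: models a binfmt_misc-style masked compare on e_ident.
ELF_MAGIC = b"\x7fELF"
TAG_OFFSET = 9     # hypothetical: first padding byte of e_ident
TAG_VALUE = 0xAA   # hypothetical marker meaning "run me under QEMU"


def matches(header: bytes, magic: bytes, mask: bytes) -> bool:
    """True if every masked byte of header equals the masked magic byte."""
    if len(header) < len(magic):
        return False
    return all(((h ^ m) & k) == 0 for h, m, k in zip(header, magic, mask))


def tagged_rule() -> tuple[bytes, bytes]:
    """Build a magic/mask pair that checks the ELF magic and the tag byte."""
    magic = bytearray(16)
    mask = bytearray(16)
    magic[:4] = ELF_MAGIC
    mask[:4] = b"\xff" * 4
    magic[TAG_OFFSET] = TAG_VALUE
    mask[TAG_OFFSET] = 0xFF
    return bytes(magic), bytes(mask)


def tag(header: bytes) -> bytes:
    """Return a copy of the header with the marker byte set."""
    out = bytearray(header)
    out[TAG_OFFSET] = TAG_VALUE
    return bytes(out)


# e_ident of an untagged 64-bit little-endian binary (e.g. QEMU itself).
native = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
```

With such a rule the untagged QEMU binary no longer matches, so the
recursion is avoided, at the cost of having to rewrite every binary you
want emulated.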

An ideal setup would be for the kernel to catch a SIGILL from a failing
user space program and then re-launch the process using QEMU with the
old process's maps and execution state so it could continue. However I
suspect there are enough moving parts to make this very fragile (e.g.
what happens to the results of library feature probing code). So two
approaches I can think of are:

Trap execve in QEMU linux-user
------------------------------

We could add a flag to QEMU so at the point of execve it manually
invokes the new process with QEMU, passing on the flag to persist this
behaviour.
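
A minimal sketch of the argument rewriting this would involve, written
in Python purely for illustration. The flag name --persist-emulation and
the QEMU path are hypothetical, nothing like this exists in QEMU today,
and preserving the guest's original argv[0] is ignored here.

```python
# Sketch only: how a guest execve() could be redirected through QEMU.
PERSIST_FLAG = "--persist-emulation"  # hypothetical flag name


def rewrite_execve(qemu_path, guest_path, guest_argv, persist=True):
    """Return (path, argv) for the host execve() that replaces the guest's.

    When persist is False the guest's request is passed through untouched,
    which is what linux-user effectively does today.
    """
    if not persist:
        return guest_path, list(guest_argv)
    # Re-enter QEMU and hand the flag on, so children of the new process
    # are wrapped in the same way.
    host_argv = [qemu_path, PERSIST_FLAG, guest_path] + list(guest_argv[1:])
    return qemu_path, host_argv
```

For example, a guest call to execve("/bin/ls", ["ls", "-l"]) would
become an exec of the QEMU binary with the flag, "/bin/ls" and "-l" as
its arguments.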

Add path mask to binfmt_misc
----------------------------

The other option would be to extend binfmt_misc to have a path mask so
it only applies its alternative execution scheme to binaries in a
particular section of the file-system (or maybe some sort of pattern?).
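
To make the idea concrete, here is a small Python sketch of what a
pattern-based path check might look like. binfmt_misc has no such field
today; the function name, the glob-style pattern and the example paths
are all assumptions made for illustration.

```python
# Sketch only: a hypothetical path filter in front of a binfmt_misc rule.
import fnmatch
import posixpath


def path_rule_applies(binary_path: str, pattern: str) -> bool:
    """True if the binary lives under the part of the tree the rule covers.

    The path is normalised first so that ".." components cannot be used
    to make a binary outside the tree appear to be inside it.
    """
    normalized = posixpath.normpath(binary_path)
    return fnmatch.fnmatchcase(normalized, pattern)


# Hypothetical rule: only binaries inside this rootfs go through QEMU.
EMULATED_TREE = "/opt/sve-rootfs/*"
```

Because the QEMU binary itself would live outside the matched tree, it
would be run natively and the deadlock described above would not occur.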

Are there any other approaches you could take? Which do you think has
the most merit?

Thanks,

-- 
Alex Bennée
