Re: Approaches for same-on-same linux-user execve?

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: assad.hashmi@linaro.org,
	Richard Henderson <richard.henderson@linaro.org>,
	Laurent Vivier <laurent@vivier.eu>,
	qemu-devel@nongnu.org,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	qemu-arm@nongnu.org, "Eric W. Biederman" <ebiederm@xmission.com>,
	Arnd Bergmann <arnd.bergmann@linaro.org>
Subject: Re: Approaches for same-on-same linux-user execve?
Date: Fri, 8 Oct 2021 12:01:07 +0100
Message-ID: <YWAk88MtPDufjzmK@redhat.com>
In-Reply-To: <877deoevj8.fsf@linaro.org>

On Thu, Oct 07, 2021 at 03:32:19PM +0100, Alex Bennée wrote:
> Hi,
> 
> I came across a use-case this week for ARM, although this may also be
> applicable to other architectures where QEMU's emulation is ahead of the
> hardware currently widely available - for example if you want to
> exercise SVE code on AArch64. When the linux-user architecture is not
> the same as the host architecture then binfmt_misc works perfectly fine.
> 
> However, when you are running same-on-same you can't use
> binfmt_misc to redirect execution to QEMU, because any attempt to
> trap native binaries will cause your userspace to hang: binfmt_misc
> will also be invoked to run the (native) QEMU binary needed to run your
> application, and a deadlock ensues.
> 
> There are some hacks you can apply at a local level, like tweaking the
> ELF header of the binaries you want to run under emulation and adjusting
> the binfmt_misc mask appropriately. This works but is messy and a faff to
> set up.
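To make the "tweak the ELF header" hack a bit more concrete, here is a minimal sketch under assumptions of my own: the marker lives in the last e_ident padding byte (offset 15), the marker value 0x51 is arbitrary, and neither the kernel nor QEMU's ELF loader rejects a non-zero padding byte. It only formats the rule line in the documented :name:type:offset:magic:mask:interpreter:flags layout; patching the test binaries themselves is left out. Since the QEMU binary is not marked, the rule cannot match QEMU itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ELF_IDENT_LEN 16     /* size of e_ident[] */
#define MARKER_OFFSET 15     /* ASSUMPTION: last byte of the e_ident padding */
#define MARKER_VALUE  0x51   /* ASSUMPTION: arbitrary marker byte ('Q') */

/* Append s to out, tracking the used length; -1 if it would not fit. */
static int append(char *out, size_t out_len, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (*used + n + 1 > out_len)
        return -1;
    memcpy(out + *used, s, n + 1);
    *used += n;
    return 0;
}

/* Append each byte as a literal "\xNN"; binfmt_misc decodes these escapes,
 * and escaping every byte also keeps ':' out of the field. */
static int append_hex(char *out, size_t out_len, size_t *used,
                      const unsigned char *bytes, size_t n)
{
    char hex[5];                                   /* "\xNN" + NUL */
    for (size_t i = 0; i < n; i++) {
        snprintf(hex, sizeof hex, "\\x%02x", bytes[i]);
        if (append(out, out_len, used, hex) != 0)
            return -1;
    }
    return 0;
}

/* Build ":name:M:0:magic:mask:interp:" matching the ELF magic plus the
 * marker byte; every other e_ident byte is masked out. */
int format_marked_rule(char *out, size_t out_len,
                       const char *name, const char *interp)
{
    unsigned char magic[ELF_IDENT_LEN] = { 0x7f, 'E', 'L', 'F' };
    unsigned char mask[ELF_IDENT_LEN] = { 0xff, 0xff, 0xff, 0xff };
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    magic[MARKER_OFFSET] = MARKER_VALUE;
    mask[MARKER_OFFSET] = 0xff;

    if (append(out, out_len, &used, ":") || append(out, out_len, &used, name) ||
        append(out, out_len, &used, ":M:0:") ||
        append_hex(out, out_len, &used, magic, ELF_IDENT_LEN) ||
        append(out, out_len, &used, ":") ||
        append_hex(out, out_len, &used, mask, ELF_IDENT_LEN) ||
        append(out, out_len, &used, ":") || append(out, out_len, &used, interp) ||
        append(out, out_len, &used, ":"))
        return -1;
    return 0;
}
```

As root you would then write the resulting string (e.g. for an assumed interpreter path of /usr/bin/qemu-aarch64) to /proc/sys/fs/binfmt_misc/register, and set byte 15 of each binary you want emulated to 0x51.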
> 
> An ideal setup would be for the kernel to catch a SIGILL from a
> failing user space program and then to re-launch the process using QEMU
> with the old process's maps and execution state so it could continue.
> However I suspect there are enough moving parts to make this very
> fragile (e.g. what happens to the results of library feature probing
> code). So two approaches I can think of are:
> 
> Trap execve in QEMU linux-user
> ------------------------------
> 
> We could add a flag to QEMU so at the point of execve it manually
> invokes the new process with QEMU, passing on the flag to persist this
> behaviour.
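A minimal sketch of the argv rewrite such a flag implies, assuming QEMU's execve emulation would call a helper like this and then pass the result to the real execve(). The option name "-hypothetical-trap-execve" and the helper build_qemu_argv() are made up for illustration; QEMU has neither today.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL option name: QEMU has no such flag today. It stands for the
 * "persist execve trapping" flag proposed above. */
#define PERSIST_FLAG "-hypothetical-trap-execve"

/* Build the argv a trapped execve() could be rewritten to:
 *   { qemu_path, PERSIST_FLAG, filename, argv[1], ..., argv[argc-1], NULL }
 * The guest's own argv[0] is replaced by filename in this sketch.
 * Returns a malloc'd array whose strings are borrowed from the caller
 * (free only the array), or NULL on allocation failure. */
char **build_qemu_argv(const char *qemu_path, const char *filename,
                       char *const argv[])
{
    size_t argc = 0;
    size_t extra;
    char **out;

    assert(qemu_path != NULL && filename != NULL && argv != NULL);
    while (argv[argc] != NULL)
        argc++;

    extra = argc > 0 ? argc - 1 : 0;        /* guest args after argv[0] */
    out = calloc(3 + extra + 1, sizeof(*out));
    if (out == NULL)
        return NULL;

    out[0] = (char *)qemu_path;
    out[1] = (char *)PERSIST_FLAG;
    out[2] = (char *)filename;
    if (extra > 0)
        memcpy(&out[3], &argv[1], extra * sizeof(*out));
    out[3 + extra] = NULL;                  /* execve() needs the terminator */
    return out;
}
```

Because the child QEMU is started with the same flag, every further execve() in the tree would be rewritten the same way.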
> 
> 
> Add path mask to binfmt_misc
> ----------------------------
> 
> The other option would be to extend binfmt_misc to have a path mask so
> it only applies its alternative execution scheme to binaries in a
> particular section of the file-system (or maybe some sort of pattern?).
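A user-space model of how a path mask could combine with today's magic/mask matching. The path_glob field is hypothetical (binfmt_misc currently matches only on magic or extension), the rule is assumed to use offset 0, and POSIX fnmatch() is just a stand-in for whatever pattern matcher the kernel would actually use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* A binfmt_misc-style magic rule plus the HYPOTHETICAL path_glob field. */
struct rule {
    const unsigned char *magic;
    const unsigned char *mask;    /* same length as magic */
    size_t len;
    const char *path_glob;        /* NULL keeps today's behaviour: any path */
};

/* Mirrors binfmt_misc's masked comparison: masked bytes must be equal. */
static bool magic_matches(const struct rule *r,
                          const unsigned char *hdr, size_t hdr_len)
{
    if (hdr_len < r->len)
        return false;
    for (size_t i = 0; i < r->len; i++)
        if ((hdr[i] ^ r->magic[i]) & r->mask[i])
            return false;
    return true;
}

/* The rule fires only if the header matches AND the path matches the glob.
 * FNM_PATHNAME stops '*' from matching across '/'. */
bool rule_applies(const struct rule *r, const char *path,
                  const unsigned char *hdr, size_t hdr_len)
{
    assert(r != NULL && path != NULL);
    if (!magic_matches(r, hdr, hdr_len))
        return false;
    return r->path_glob == NULL || fnmatch(r->path_glob, path, FNM_PATHNAME) == 0;
}
```

With a glob such as "/opt/sve-tests/*", the QEMU binary in /usr/bin never matches, which is what breaks the recursion.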
> 
> Are there any other approaches you could take? Which do you think has
> the most merit?

Could a new Linux personality flag be useful in combination with a
new flag in binfmt_misc?

e.g. a flag "E" for binfmt_misc which indicates the rule must only be
applied if the process is execve()d with the PER_USE_BINFMT personality
flag set.

That would let you add a native match rule to binfmt_misc without
it affecting your system initially. To then run native binaries via
qemu-user, you just need to set the personality() flag, and only
that sub-process tree gets redirected.
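A rough sketch of both halves of this idea. Neither exists today: PER_USE_BINFMT is not defined by Linux, so the bit value below is a placeholder, and binfmt_rule_applies() only models the proposed "E" check in user space. The launcher half uses the real personality(2) and execvp() calls and relies on the personality being inherited across fork() and kept across execve().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/personality.h>
#include <unistd.h>

/* HYPOTHETICAL: PER_USE_BINFMT is not defined by Linux; this bit value is a
 * placeholder picked only for illustration. */
#ifndef PER_USE_BINFMT
#define PER_USE_BINFMT 0x10000000UL
#endif

/* Model of the proposed check for a binfmt_misc rule carrying the "E" flag:
 * such a rule is skipped unless the exec'ing process has PER_USE_BINFMT. */
bool binfmt_rule_applies(bool rule_has_e_flag, unsigned long persona)
{
    return !rule_has_e_flag || (persona & PER_USE_BINFMT) != 0;
}

/* Launcher side: set the flag on ourselves, then exec the target. Children
 * inherit the personality, so the whole sub-process tree would carry it.
 * Returns only on failure. */
int exec_with_binfmt(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    int cur = personality(0xffffffff);      /* 0xffffffff: query only */
    if (cur == -1) {
        perror("personality(query)");
        return -1;
    }
    if (personality((unsigned long)cur | PER_USE_BINFMT) == -1) {
        perror("personality(set)");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A small wrapper binary calling exec_with_binfmt(&argv[1]) would then play the role of "run this command under qemu-user".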

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|