On Tue, May 07, 2019 at 02:00:59PM -0700, Elena Ufimtseva wrote:
> On Mon, Mar 11, 2019 at 10:20:06AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> > > 
> > > 
> 
> Hi Daniel, Stefan
> 
> We have not replied in a while as we were trying to figure out
> the best approach after multiple comments we have received on the
> patch series.
> 
> Leaving other concerns that you, Stefan and others shared with us
> out of this particular topic, we would like to get your opinion on
> the following approach.
> 
> Please see below.
> 
> > > > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > 
> > > > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > > >> I guess one obvious answer is that the existing security mechanisms like
> > > > >> SELinux/AppArmor/DAC can be made to work in a more fine grained manner if
> > > >> there are distinct processes. This would allow for a more useful seccomp
> > > >> filter to better protect against secondary kernel exploits should QEMU
> > > >> itself be exploited, if we can protect individual components.
> > > > 
> > > > Fine-grained sandboxing is possible in theory but tedious in practice.
> > > > From what I can tell this patch series doesn't implement any sandboxing
> > > > for child processes.
> > > > 
> > > 
> > > 	The policies aren’t in QEMU, but in the selinux config files.
> > > They would say, for example, that when the QEMU process exec()s the
> > > disk emulation process, the process security context type transitions
> > > to a new type.  This type would have permission to access the VM image
> > > objects, whereas the QEMU process type (and any other device emulation
> > > process types) cannot access them.
> > 
> > Note that currently all QEMU instances run by libvirt have seccomp
> > policy applied that explicitly forbids any use of fork+exec as a way
> > to reduce avenues of attack for an exploited QEMU.
> > 
> > Even in a modularized QEMU I'd be loath to allow QEMU to have the
> > fork+exec privilege, unless "QEMU" in this case was just a stub
> > process that does nothing more than fork+exec the other binaries,
> > while having zero attack surface exposed to the untrusted guest OS.
> 
> We see libvirt uses QEMU’s -sandbox option to indicate that QEMU
> should use seccomp() to prohibit future use of certain system calls,
> including fork() and exec().  Our idea is to enumerate the remote
> processes needed via QEMU command line options, and have QEMU exec()
> those processes before -sandbox is processed.
> We will also initialize seccomp for the emulated device processes.

Sounds good.
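
To check that I understand the ordering, here is a minimal sketch of the
spawn step.  The helper name spawn_remote() is made up for illustration
and is not taken from the patch series.  I'm assuming each remote process
is described by a full argv whose first element is an absolute path.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start one remote device emulation process.
 * It must be called before the seccomp filter that forbids
 * fork()/exec() is installed in the parent.
 *
 * Returns the child's pid, or -1 if fork() failed.
 */
pid_t spawn_remote(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: replace ourselves with the remote process binary. */
        execv(argv[0], argv);
        /* Only reached if execv() failed. */
        _exit(127);
    }
    return pid;
}
```

The caller would invoke this once per remote process listed on the
command line and only afterwards apply the -sandbox filter, so the main
QEMU process never needs fork()/exec() again.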

My experience with seccomp is that whitelisting syscalls is fragile
because of library dependencies.  Even glibc might invoke syscalls you
didn't expect, especially after a kernel/glibc upgrade, forcing you to
modify the whitelist.

However, once a whitelist is successfully in place it's a simple way to
reduce the syscall attack surface and I think it's worthwhile.
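
For what it's worth, a whitelist does not need much code.  Below is a
rough sketch using a raw seccomp BPF filter.  QEMU's existing -sandbox
code uses libseccomp; I'm using raw BPF here only to keep the example
free of dependencies.  The names build_allowlist() and
install_allowlist() are hypothetical, and the sketch only handles x86_64
and aarch64.  Denied syscalls fail with EPERM instead of killing the
process, which I'm assuming is the friendlier choice while you are still
discovering which syscalls glibc really makes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical limits for this sketch. */
#define MAX_ALLOWED 64
/* 4 setup instructions, 2 per allowed syscall, 1 default action. */
#define MAX_FILTER_LEN (4 + 2 * MAX_ALLOWED + 1)

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "sketch only covers x86_64 and aarch64"
#endif

/*
 * Fill 'out' (at least MAX_FILTER_LEN entries) with a BPF program that
 * allows exactly the syscalls in 'allowed' and fails the rest with EPERM.
 * Returns the number of instructions written.
 */
size_t build_allowlist(const int *allowed, size_t n, struct sock_filter *out)
{
    size_t i = 0;

    assert(n <= MAX_ALLOWED);

    /* Kill on an unexpected architecture: syscall numbers differ per ABI. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                            SKETCH_AUDIT_ARCH, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);

    /* Load the syscall number and compare it against each allowed entry. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    for (size_t j = 0; j < n; j++) {
        /* Match: fall through to ALLOW.  No match: skip over it. */
        out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                (unsigned int)allowed[j], 0, 1);
        out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                                SECCOMP_RET_ALLOW);
    }

    /* Default action for everything not on the list. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                   SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA));
    return i;
}

/* Install the filter in the calling process.  Returns 0 on success. */
int install_allowlist(const int *allowed, size_t n)
{
    struct sock_filter insns[MAX_FILTER_LEN];
    struct sock_fprog prog = {
        .len = (unsigned short)build_allowlist(allowed, n, insns),
        .filter = insns,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        return -1;
    }
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A remote device process would call install_allowlist() once its setup is
done, passing something like { SYS_read, SYS_write, SYS_exit_group } plus
whatever else testing shows it needs.  That last part is where the
fragility I mentioned comes in.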