On Tue, Apr 23, 2019 at 05:26:33PM -0400, Jag Raman wrote:
> On 3/26/2019 6:20 PM, Philippe Mathieu-Daudé wrote:
> 
> > > > > Please share the SELinux policy files, containerization scripts, etc.
> > > > > There is probably a home for them in qemu.git, libvirt.git, or elsewhere
> > > > > upstream.
> > > > > 
> > > > > We need to find a way to make the sandboxing improvements available to
> > > > > users besides yourself and easily reusable for developers who wish to
> > > > > convert additional device models.
> > > 
> > 
> > Also for testing this series.
> 
> Hi,
> 
> We are wondering how to deliver the example SELinux policies. I have
> posted on Fedora's SELinux mailing list to get information on how to
> upstream SELinux policy.
> 
> We are developing SELinux Type Enforcements and MCS labels to sandbox
> the emulation process. Details of the example Type Enforcement are
> available below.
> 
> We are also working on changes to libvirt, to launch the remote process
> and apply MCS labels. Libvirt changes will be posted separately in the
> future.
> 
> The Type Enforcements for SELinux are available in the pastebin location
> below (also copied at the end of this email):
> https://pastebin.com/t1bpS6MY

Can multiple LSI SCSI controllers be launched such that each process
only has access to a subset of disk images?  Or is the disk image label
per-VM so that there is no isolation between LSI SCSI controller
processes for that VM?

My concern with this overall approach is the practicality vs its
benefits.  Regarding practicality, each emulated device needs to be
proxied separately.  The QEMU subsystem used by the device also needs to
be proxied.  Global state, monitor commands, and live migration all
require code changes to support proxied operation.  This is very
invasive.

Then each emulated device needs an SELinux policy to achieve the
benefits of confinement.  I have no idea how to correctly write a policy
like this and it's likely that developers who contribute a single new
device will not be proficient in it either.  Writing these policies is a
rare skill and few people will be good at it.  It also makes me worry
about how we test and review them.

Despite the efforts required in making this work, all processes still
effectively have full access to the guest since they can access guest
RAM.  What I mean is that the device is actually not confined to its
host process (e.g. LSI SCSI controller process) because it can write
code to executable guest RAM pages.  The guest will then execute that
code and therefore all guest I/O (networking, disk, etc) is still
available indirectly to the "confined" processes.  They are not really
sandboxed from the outside world, regardless of how strict the SELinux
policy is :(.
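
To make the guest RAM point concrete, here is a minimal sketch (Python,
Unix only).  It is not QEMU code and does not involve SELinux.  The
function name and the payload bytes are made up for illustration.  An
anonymous shared mapping stands in for guest RAM, the forked child
stands in for a "confined" device process, and the parent stands in for
the guest that later reads (and in the real case executes) those pages:

```python
import mmap
import os


def inject_via_shared_ram(payload: bytes, offset: int = 0) -> bytes:
    """Return what the 'guest' side sees after the 'device' side writes.

    Hypothetical illustration only: the mapping stands in for guest RAM.
    """
    # Anonymous mapping; on Unix the default is MAP_SHARED, so it stays
    # shared between parent and child after fork().
    ram = mmap.mmap(-1, mmap.PAGESIZE)

    pid = os.fork()
    if pid == 0:
        # Child: the "confined" device process.  A file or network policy
        # does not stop it from writing to memory it has already mapped.
        ram[offset:offset + len(payload)] = payload
        os._exit(0)

    # Parent: the "guest".  Wait for the device process, then look at RAM.
    os.waitpid(pid, 0)
    seen = ram[offset:offset + len(payload)]
    ram.close()
    return seen


if __name__ == "__main__":
    # Placeholder bytes standing in for injected guest code.
    print(inject_via_shared_ram(b"\x90\x90\xc3"))
```

The write goes through no matter what other resources the child has
been denied, which is why confining a device process does not by itself
confine what the guest can be made to do.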

There are performance issues due to proxying as well, but let's ignore
them for now and focus on security.

How do the benefits compare against today's monolithic approach?  If the
guest exploits monolithic QEMU it has full access to all host files and
APIs available to QEMU.  However, these are largely just the resources
that belong to the guest anyway - not resources we are trying to keep
away from the guest.  With multi-process QEMU each process still has
access to all guest interfaces via the code injection I mentioned above,
but the SELinux policy could restrict access to some resources.  But
this benefit is really small in my opinion, given that the resources
belong to the guest anyway and the guest can already access them.

I think you can implement this for a handful of devices as a one-time
thing, but the invasiveness and the impracticality of getting wide
coverage of QEMU make this approach questionable.

Am I mistaken about the invasiveness or impracticality?

Am I misunderstanding the security benefits compared to what already
exists today?

A more practical approach is to strip down QEMU (compiling out unused
devices and features) and to run virtio devices in vhost-user processes
(e.g. virtio-input, virtio-gpu, virtio-fs).  This achieves similar goals
without proxy objects or invasive changes to QEMU since the vhost-user
devices use a different codebase and aren't accessible via the QEMU
monitor.  The limitation is that existing QEMU code and non-virtio
devices aren't available in this model.

Stefan