On Tue, Apr 23, 2019 at 05:26:33PM -0400, Jag Raman wrote:
> On 3/26/2019 6:20 PM, Philippe Mathieu-Daudé wrote:
>
> > > > > Please share the SELinux policy files, containerization scripts, etc.
> > > > > There is probably a home for them in qemu.git, libvirt.git, or elsewhere
> > > > > upstream.
> > > > >
> > > > > We need to find a way to make the sandboxing improvements available to
> > > > > users besides yourself and easily reusable for developers who wish to
> > > > > convert additional device models.
> > > > >
> > Also for testing this series.
>
> Hi,
>
> We are wondering how to deliver the example SELinux policies. I have
> posted on Fedora's SELinux mailing list to get info. on how to upstream
> SElinux policy.
>
> We are developing SELinux Type Enforcements and MCS labels to sandbox
> the emulation process. Details regarding example Type Enforcement is
> available below.
>
> We are also working on changes to libvirt, to launch the remote process
> and apply MCS labels. Libvirt changes will be posted separately in the
> future.
>
> The Type Enforcements for SElinux is available in the pastebin location
> below (also copied at the end of this email):
> https://pastebin.com/t1bpS6MY

Can multiple LSI SCSI controllers be launched such that each process
only has access to a subset of disk images?  Or is the disk image label
per-VM so that there is no isolation between LSI SCSI controller
processes for that VM?

My concern with this overall approach is its practicality versus its
benefits.

Regarding practicality, each emulated device needs to be proxied
separately.  The QEMU subsystem used by the device also needs to be
proxied.  Global state, monitor commands, and live migration all require
code changes to support proxied operation.  This is very invasive.

Then each emulated device needs an SELinux policy to achieve the
benefits of confinement.  I have no idea how to correctly write a policy
like this, and it's likely that developers who contribute a single new
device will not be proficient in it either.  Writing these policies is a
rare task and few people will be good at it.  It also makes me worry
about how we test and review them.

Despite the effort required to make this work, all processes still
effectively have full access to the guest since they can access guest
RAM.  What I mean is that the device is actually not confined to its
host process (e.g. LSI SCSI controller process) because it can write
code to executable guest RAM pages.  The guest will then execute that
code, and therefore all guest I/O (networking, disk, etc) is still
available indirectly to the "confined" processes.  They are not really
sandboxed from the outside world, regardless of how strict the SELinux
policy is :(.

There are performance issues due to proxying as well, but let's ignore
them for now and focus on security.

How do the benefits compare against today's monolithic approach?  If the
guest exploits monolithic QEMU, it has full access to all host files and
APIs available to QEMU.  However, these are largely just the resources
that belong to the guest anyway - not resources we are trying to keep
away from the guest.  With multi-process QEMU each process still has
access to all guest interfaces via the code injection I mentioned above,
but the SELinux policy could restrict access to some resources.  But
this benefit is really small in my opinion, given that the resources
belong to the guest anyway and the guest can already access them.

I think you can implement this for a handful of devices as a one-time
thing, but the invasiveness and the impracticality of getting wide
coverage of QEMU make this approach questionable.

Am I mistaken about the invasiveness or impracticality?  Am I
misunderstanding the security benefits compared to what already exists
today?

A more practical approach is to strip down QEMU (compiling out unused
devices and features) and to run virtio devices in vhost-user processes
(e.g. virtio-input, virtio-gpu, virtio-fs).  This achieves similar goals
without proxy objects or invasive changes to QEMU, since the vhost-user
devices use a different codebase and aren't accessible via the QEMU
monitor.  The limitation is that existing QEMU code and non-virtio
devices aren't available in this model.

Stefan