From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:55117) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UhhgD-00028V-Au for qemu-devel@nongnu.org; Wed, 29 May 2013 10:48:49 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UhhgA-0001kx-43 for qemu-devel@nongnu.org; Wed, 29 May 2013 10:48:45 -0400
Received: from mx1.redhat.com ([209.132.183.28]:30342) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Uhhg9-0001ko-TR for qemu-devel@nongnu.org; Wed, 29 May 2013 10:48:42 -0400
Date: Wed, 29 May 2013 17:48:58 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130529144858.GC10462@redhat.com>
References: <20130527093409.GH21969@stefanha-thinkpad.redhat.com>
 <51A496C4.1020602@os.inf.tu-dresden.de>
 <87r4grca4p.fsf@codemonkey.ws>
 <20130528171742.GB30296@redhat.com>
 <20130529074929.GC20199@stefanha-thinkpad.redhat.com>
 <20130529090859.GH4472@redhat.com>
 <20130529142143.GA9545@stefanha-thinkpad.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130529142143.GA9545@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
To: Stefan Hajnoczi
Cc: "snabb-devel@googlegroups.com" , qemu-devel@nongnu.org, Anthony Liguori , Julian Stecklina

On Wed, May 29, 2013 at 04:21:43PM +0200, Stefan Hajnoczi wrote:
> On Wed, May 29, 2013 at 12:08:59PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 29, 2013 at 09:49:29AM +0200, Stefan Hajnoczi wrote:
> > > On Tue, May 28, 2013 at 08:17:42PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, May 28, 2013 at 12:00:38PM -0500, Anthony Liguori wrote:
> > > > > Julian Stecklina writes:
> > > > > 
> > > > > > On 05/28/2013 12:10 PM, Luke Gorrie wrote:
> > > > > >> On 27 May 2013 11:34, Stefan Hajnoczi wrote:
> > > > > >> 
> > > > > >>     vhost_net is about connecting a virtio-net speaking process to a
> > > > > >>     tun-like device. The problem you are trying to solve is connecting a
> > > > > >>     virtio-net speaking process to Snabb Switch.
> > > > > >> 
> > > > > >> Yep!
> > > > > > 
> > > > > > Since I am on a similar path as Luke, let me share another idea.
> > > > > > 
> > > > > > What about extending qemu in a way that allows PCI device models to be
> > > > > > implemented in another process?
> > > > > 
> > > > > We aren't going to support any interface that enables out-of-tree
> > > > > devices. This is just plugins in a different form with even more
> > > > > downsides. You cannot easily keep track of dirty info, and the guest
> > > > > physical address translation to host is difficult to keep in sync
> > > > > (imagine the complexity of memory hotplug).
> > > > > 
> > > > > Basically, it's easy to hack up but extremely hard to do something that
> > > > > works correctly overall.
> > > > > 
> > > > > There isn't a compelling reason to implement something like this other
> > > > > than avoiding getting code into QEMU. Best to just submit your device
> > > > > to QEMU for inclusion.
> > > > > 
> > > > > If you want to avoid copying in a vswitch, better to use something like
> > > > > vmsplice as I outlined in another thread.
> > > > > 
> > > > > > This is not as hard as it may sound.
> > > > > > qemu would open a domain socket to this process and map VM memory over
> > > > > > to the other side. This can be accomplished by having file descriptors
> > > > > > in qemu to VM memory (reusing -mem-path code) and passing those over the
> > > > > > domain socket. The other side can then just mmap them.
> > > > > > The socket would
> > > > > > also be used for configuration and I/O by the guest on the PCI
> > > > > > I/O/memory regions. You could also use this to do IRQs, or use eventfds,
> > > > > > whatever works better.
> > > > > > 
> > > > > > To have a zero-copy userspace switch, the switch would offer virtio-net
> > > > > > devices to any qemu that wants to connect to it and implement the
> > > > > > complete device logic itself. Since it has access to all guest memory,
> > > > > > it can just do memcpy for packet data. Of course, this only works for
> > > > > > 64-bit systems, because you need vast amounts of virtual address space.
> > > > > > In my experience, doing this in userspace is _way less painful_.
> > > > > > 
> > > > > > If you can get away with polling in the switch, the overhead of doing
> > > > > > all this in userspace is zero. And as long as you can rate-limit
> > > > > > explicit notifications over the socket, even that overhead should be
> > > > > > okay.
> > > > > > 
> > > > > > Opinions?
> > > > > 
> > > > > I don't see any compelling reason to do something like this. It's
> > > > > jumping through a tremendous number of hoops to avoid putting code that
> > > > > belongs in QEMU in tree.
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Anthony Liguori
> > > > > 
> > > > > > Julian
> > > > 
> > > > OTOH an in-tree device that runs in a separate process would
> > > > be useful, e.g. for security.
> > > > For example, we could limit a virtio-net device process
> > > > to only access tap and vhost files.
> > > 
> > > For tap or vhost files only this is good for security. I'm not sure it
> > > has many advantages over a QEMU process under SELinux though.
> > 
> > At the moment SELinux necessarily gives QEMU rights to,
> > e.g., access the filesystem.
> > This process would only get access to tap and vhost.
> > 
> > We can also run it as a different user.
> > Defence in depth.
> > 
> > We can also limit, e.g., the CPU of this process aggressively
> > (as it's not doing anything on the data path).
> > 
> > I could go on.
> > 
> > And it's really easy too, until you want to use it in production,
> > at which point you need to cover lots of
> > nasty details like hotplug and migration.
> 
> I think there are diminishing returns. Once QEMU is isolated so it
> cannot open arbitrary files, just has access to the resources granted by
> the management tool on startup, etc., then I'm not sure it's worth the
> complexity and performance cost of splitting the model up into even
> smaller pieces.

Well, this part is network-facing, so there is some value in isolating
it; I don't know how big that value is.

> IMO there isn't a trust boundary that's worth isolating
> here (compare to sshd privilege separation, where separate uids really
> make sense and are necessary; with QEMU, having multiple uids that lack
> capabilities to do much doesn't win much over the SELinux setup).
> 
> > > Obviously when the switch process has shared memory access to multiple
> > > guests' RAM, the security is worse than a QEMU process solution but
> > > better than a vhost kernel solution.
> > > So the security story is not a clear win.
> > > 
> > > Stefan
> > 
> > How exactly you pass packets between guest and host is very unlikely to
> > affect your security in a meaningful way.
> > 
> > Except, if you lose networking, or if it's just slow beyond any measure,
> > you are suddenly more secure against network-based attacks.
> 
> The fact that a single switch process has shared memory access to all
> guests' RAM is critical. If the switch process is exploited, then that
> exposes other guests' data! (Think of a multi-tenant host with guests
> belonging to different users.)
> 
> Stefan

Well, local privilege escalation bugs are common enough that you should
be very careful with any network-facing application, whether or not it
has access to all guests' memory when well-behaved.

-- 
MST