From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:45427)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1UhNWh-0002FO-7P for qemu-devel@nongnu.org; Tue, 28 May 2013 13:17:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from )
 id 1UhNWZ-0003Ae-LK for qemu-devel@nongnu.org; Tue, 28 May 2013 13:17:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:9882)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1UhNWZ-0003AV-Dw for qemu-devel@nongnu.org; Tue, 28 May 2013 13:17:27 -0400
Date: Tue, 28 May 2013 20:17:42 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130528171742.GB30296@redhat.com>
References: <20130527093409.GH21969@stefanha-thinkpad.redhat.com>
 <51A496C4.1020602@os.inf.tu-dresden.de>
 <87r4grca4p.fsf@codemonkey.ws>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87r4grca4p.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
To: Anthony Liguori
Cc: "snabb-devel@googlegroups.com", qemu-devel@nongnu.org, Julian Stecklina

On Tue, May 28, 2013 at 12:00:38PM -0500, Anthony Liguori wrote:
> Julian Stecklina writes:
>
> > On 05/28/2013 12:10 PM, Luke Gorrie wrote:
> >> On 27 May 2013 11:34, Stefan Hajnoczi wrote:
> >>
> >>     vhost_net is about connecting the a virtio-net speaking process to a
> >>     tun-like device.  The problem you are trying to solve is connecting a
> >>     virtio-net speaking process to Snabb Switch.
> >>
> >>
> >> Yep!
> >
> > Since I am on a similar path as Luke, let me share another idea.
> >
> > What about extending qemu in a way to allow PCI device models to be
> > implemented in another process.
>
> We aren't going to support any interface that enables out of tree
> devices.  This is just plugins in a different form with even more
> downsides.
> You cannot easily keep track of dirty info, the guest
> physical address translation to host is difficult to keep in sync
> (imagine the complexity of memory hotplug).
>
> Basically, it's easy to hack up but extremely hard to do something that
> works correctly overall.
>
> There isn't a compelling reason to implement something like this other
> than avoiding getting code into QEMU.  Best to just submit your device
> to QEMU for inclusion.
>
> If you want to avoid copying in a vswitch, better to use something like
> vmsplice as I outlined in another thread.
>
> > This is not as hard as it may sound.
> > qemu would open a domain socket to this process and map VM memory over
> > to the other side. This can be accomplished by having file descriptors
> > in qemu to VM memory (reusing -mem-path code) and passing those over the
> > domain socket. The other side can then just mmap them. The socket would
> > also be used for configuration and I/O by the guest on the PCI
> > I/O/memory regions. You could also use this to do IRQs or use eventfds,
> > whatever works better.
> >
> > To have a zero copy userspace switch, the switch would offer virtio-net
> > devices to any qemu that wants to connect to it and implement the
> > complete device logic itself. Since it has access to all guest memory,
> > it can just do memcpy for packet data. Of course, this only works for
> > 64-bit systems, because you need vast amounts of virtual address space.
> > In my experience, doing this in userspace is _way less painful_.
> >
> > If you can get away with polling in the switch the overhead of doing all
> > this in userspace is zero. And as long as you can rate-limit explicit
> > notifications over the socket even that overhead should be okay.
> >
> > Opinions?
>
> I don't see any compelling reason to do something like this.  It's
> jumping through a tremendous number of hoops to avoid putting code that
> belongs in QEMU in tree.
>
> Regards,
>
> Anthony Liguori
>
> >
> > Julian

OTOH, an in-tree device that runs in a separate process would
be useful, e.g. for security.
For example, we could limit a virtio-net device process
to accessing only tap and vhost files.
If there's a bug, we can kill this process, with the result
that the NIC gets stalled but everything else keeps going.
Possibly restart it on the next guest reset.
There could be other advantages.

-- 
MST
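
For reference, the descriptor-passing scheme Julian describes in the quoted
text (open a domain socket, pass file descriptors for file-backed VM memory,
mmap them on the other side) can be sketched roughly as below. This is only
an illustration and not QEMU code: both ends live in one process for brevity,
a temporary file stands in for -mem-path backed guest RAM, and the names
GUEST_RAM_SIZE, send_fd, recv_fd and demo are made up for the example.

```python
import array
import mmap
import os
import socket
import tempfile

# Hypothetical, tiny "guest RAM" size chosen only for this demo.
GUEST_RAM_SIZE = 4 * mmap.PAGESIZE


def send_fd(sock, fd):
    """Pass one file descriptor over a Unix domain socket (SCM_RIGHTS)."""
    payload = array.array("i", [fd]).tobytes()
    sock.sendmsg([b"M"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, payload)])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    # Stand-in for the domain socket between the VMM and the device process.
    vmm_sock, dev_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # "VMM" side: file-backed memory, mapped shared (the mmap default on Unix).
    with tempfile.TemporaryFile() as backing:
        backing.truncate(GUEST_RAM_SIZE)
        guest_ram = mmap.mmap(backing.fileno(), GUEST_RAM_SIZE)
        send_fd(vmm_sock, backing.fileno())

        # "Device" side: receive the descriptor and map the same pages.
        dev_fd = recv_fd(dev_sock)
        dev_view = mmap.mmap(dev_fd, GUEST_RAM_SIZE)
        os.close(dev_fd)  # the mapping remains valid after the fd is closed

        # A write through one mapping is visible through the other;
        # the data itself never travels over the socket.
        guest_ram[0:6] = b"packet"
        seen = bytes(dev_view[0:6])

        dev_view.close()
        guest_ram.close()

    vmm_sock.close()
    dev_sock.close()
    return seen


if __name__ == "__main__":
    print(demo())
```

The sketch covers only the memory-sharing step. It does not address the
concerns raised in the thread, such as dirty tracking or keeping the guest
physical to host address translation in sync across memory hotplug.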