From: Anthony Liguori
In-Reply-To: <51A38770.4040106@redhat.com>
References: <20130527093409.GH21969@stefanha-thinkpad.redhat.com> <51A37F06.2080300@redhat.com> <874ndoflc2.fsf@codemonkey.ws> <51A38770.4040106@redhat.com>
Date: Mon, 27 May 2013 12:01:07 -0500
Message-ID: <87wqqk8ii4.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
To: Paolo Bonzini
Cc: Luke Gorrie, "snabb-devel@googlegroups.com", qemu-devel@nongnu.org, Stefan Hajnoczi, mst@redhat.com

Paolo Bonzini writes:

> On 27/05/2013 18:18, Anthony Liguori wrote:
>> Paolo Bonzini writes:
>>
>>> On 27/05/2013 11:34, Stefan Hajnoczi wrote:
>>>> On Sun, May 26, 2013 at 11:32:49AM +0200, Luke Gorrie wrote:
>>>>> Stefan put us onto the highly promising track of vhost/virtio. We have
>>>>> implemented this between Snabb Switch and the Linux kernel, but not
>>>>> directly between Snabb Switch and QEMU guests.
>>>>> The "roadblock" we have hit is embarrassingly basic: QEMU is using
>>>>> user-to-kernel system calls to set up vhost (open /dev/net/tun and
>>>>> /dev/vhost-net, ioctl()s) and I haven't found a good way to map these
>>>>> towards Snabb Switch instead of the kernel.
>>>>
>>>> vhost_net is about connecting a virtio-net speaking process to a
>>>> tun-like device. The problem you are trying to solve is connecting a
>>>> virtio-net speaking process to Snabb Switch.
>>>>
>>>> Either you need to replace vhost or you need a tun-like device
>>>> interface.
>>>>
>>>> How does your switch talk to hardware?
>>>
>>> And also, is your switch monolithic or does it consist of different
>>> processes?
>>>
>>> If you already have processes talking to each other, the first thing
>>> that came to my mind was a new network backend, similar to net/vde.c but
>>> more featureful (so that you support the virtio headers for offloading,
>>> for example). Then you would use
>>> "-netdev snabb,id=net0 -device e1000,netdev=net0".
>>
>> It would be very interesting to combine this with vmsplice/splice.
>
> Was zero-copy vmsplice/splice actually ever implemented? I thought it
> was reverted.

Not sure what context you're talking about re: zero copy...

A pipe can store references to pages instead of having a buffer that
stores data. That certainly is there today; otherwise the interface
would be pointless.

When splicing from pipe to pipe, you can move those references without
copying the data.

When vmsplicing from a userspace region to a pipe, the kernel just
stores references to the pages.

vmsplicing from a pipe to userspace, OTOH, will copy the data. This is
fixable, at least when dealing with pages gifted via SPLICE_F_GIFT.

For guest-to-guest traffic, you wouldn't be gifting the pages, I don't
think.

For implementing guest-to-guest traffic, the source QEMU can vmsplice
the packet to a pipe that is shared with the vswitch.
The vswitch can tee(2) the first N bytes to a second pipe so that it can
read the info needed for routing decisions. Once the decision is made, if
it's a local guest, it can splice() the packet to the appropriate
destination QEMU process or another vswitch daemon (no data copy here).
Finally, the destination QEMU process can vmsplice() from the pipe, which
will copy the data (this is the only copy).

If the vswitch needs to route externally, then it would need to splice()
to a macvtap. macvtap should be able to send the packet without copying
the data. Not sure that this last part will work as expected, but if it
doesn't, that's a bug that can/should be fixed.

The kernel cannot do better than the above, modulo any overhead from
userspace context switching[*]. Guest-to-guest requires a copy.

Normally macvtap is undesirable because it's tightly connected to a
network adapter, but that is a desirable trait in this case.

N.B., I'm not advocating making all switching decisions in userspace.
Just pointing out how it can be done efficiently.

[*] In theory the kernel could do zero-copy receive, but I'm not sure
it's feasible in practice.

Regards,

Anthony Liguori

>
> Paolo
>
>>> It would be slower than vhost-net, for example no zero-copy
>>> transmission.
>>
>> With splice, I think you could at least get single-copy guest-to-guest
>> networking, which is about as good as can be done.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>> 3. Use the kernel as a middle-man. Create a double-ended "veth"
>>>> interface and have Snabb Switch and QEMU each open a PF_PACKET
>>>> socket and accelerate it with VHOST_NET.
>>>
>>> As Michael mentioned, this could be macvtap on the interface that you
>>> have already created in the switch and passed to vhost-net. Then you do
>>> not have to do anything in QEMU.
>>>
>>> Paolo
>>>
>>>> If you are using the Linux network stack then it might be better to
>>>> integrate with vhost, maybe as a tun-like device driver.
>>>>
>>>> Stefan