From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:41645)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Uh2Cc-0006Ws-0p for qemu-devel@nongnu.org;
	Mon, 27 May 2013 14:31:29 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Uh2CY-00006n-Rj for qemu-devel@nongnu.org;
	Mon, 27 May 2013 14:31:25 -0400
Received: from mail-ob0-x233.google.com ([2607:f8b0:4003:c01::233]:40995)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Uh2CY-00006j-OA for qemu-devel@nongnu.org;
	Mon, 27 May 2013 14:31:22 -0400
Received: by mail-ob0-f179.google.com with SMTP id wo10so4782463obc.10
	for ; Mon, 27 May 2013 11:31:22 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20130527171339.GB18800@redhat.com>
References: <20130527093409.GH21969@stefanha-thinkpad.redhat.com>
	<51A37F06.2080300@redhat.com>
	<874ndoflc2.fsf@codemonkey.ws>
	<51A38770.4040106@redhat.com>
	<87wqqk8ii4.fsf@codemonkey.ws>
	<20130527171339.GB18800@redhat.com>
Date: Mon, 27 May 2013 13:31:22 -0500
Message-ID:
From: Anthony Liguori
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: "Michael S. Tsirkin"
Cc: Luke Gorrie , Paolo Bonzini ,
	"snabb-devel@googlegroups.com" , qemu-devel , Stefan Hajnoczi

On Mon, May 27, 2013 at 12:13 PM, Michael S. Tsirkin wrote:
>
> On Mon, May 27, 2013 at 12:01:07PM -0500, Anthony Liguori wrote:
> > Paolo Bonzini writes:
> >
> > Finally, the destination QEMU process can vmsplice() from the pipe which
> > will copy the data (this is the only copy).
>
> AFAIK splice is mostly useless for networking as there's no way to
> get notified when packet has been sent.

I suspect you could use a thread pool to work around this. It's
certainly not useless if your goal is to do userspace switching...

> > If vswitch needs to route externally, then it would need to splice() to
> > a macvtap.
> >
> > macvtap should be able to send the packet without copying the data. Not
> > sure that this last work will work as expected but if it doesn't, that's
> > a bug that can/should be fixed.
> >
> > The kernel cannot do better than the above modulo any overhead from
> > userspace context switching[*].
>
> Also modulo scheduler latency - kernel processes packets
> in interrupt context. There's a reason e.g. OVS runs data-path in
> kernel.

Ack. Like I say below, I think network routing belongs in the kernel.

Regards,

Anthony Liguori

> > Guest-to-guest requires a copy.
> > Normally macvtap is undesirable because it's tightly connected to a
> > network adapter but that is a desirable trait in this case.
> >
> > N.B., I'm not advocating making all switching decisions in
> > userspace. Just pointing out how it can be done efficiently.
> >
> > [*] in theory the kernel could do zero copy receive but i'm not sure
> > it's feasible in practice.
> >
> > Regards,
> >
> > Anthony Liguori
> >
> > >
> > > Paolo
> > >
> > >>> It would be slower than vhost-net, for example no zero-copy
> > >>> transmission.
> > >>
> > >> With splice, I think you could at least get single copy guest-to-guest
> > >> networking which is about as good as can be done.
> > >>
> > >> Regards,
> > >>
> > >> Anthony Liguori
> > >>
> > >>>> 3. Use the kernel as a middle-man. Create a double-ended "veth"
> > >>>> interface and have Snabb Switch and QEMU each open a PF_PACKET
> > >>>> socket and accelerate it with VHOST_NET.
> > >>>
> > >>> As Michael, mentioned, this could be macvtap on the interface that you
> > >>> have already created in the switch and passed to vhost-net. Then you do
> > >>> not have to do anything in QEMU.
> > >>>
> > >>> Paolo
> > >>>
> > >>>> If you are using the Linux network stack then it might be better to
> > >>>> integrate with vhost maybe as a tun-like device driver.
> > >>>>
> > >>>> Stefan
> > >>>>
> > >>>>