From: Rusty Russell <rusty@rustcorp.com.au>
To: Anthony Liguori <anthony@codemonkey.ws>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm <kvm@vger.kernel.org>, qemu-devel <qemu-devel@nongnu.org>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>,
	herbert@gondor.hengli.com.au, netdev@vger.kernel.org
Subject: Re: updated: kvm networking todo wiki
Date: Mon, 03 Jun 2013 10:02:43 +0930	[thread overview]
Message-ID: <871u8kuj84.fsf@rustcorp.com.au> (raw)
In-Reply-To: <8761y034zg.fsf@codemonkey.ws>

Anthony Liguori <anthony@codemonkey.ws> writes:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote:
>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>> 
>>> > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>>> >> Anthony Liguori <anthony@codemonkey.ws> writes:
>>> >>> Rusty Russell <rusty@rustcorp.com.au> writes:
>>> >>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>> >>>>> FWIW, I think what's more interesting is using vhost-net as a networking
>>> >>>>> backend with virtio-net in QEMU being what's guest facing.
>>> >>>>>
>>> >>>>> In theory, this gives you the best of both worlds: QEMU acts as a first
>>> >>>>> line of defense against a malicious guest while still getting the
>>> >>>>> performance advantages of vhost-net (zero-copy).
>>> >>>>>
>>> >>>> It would be an interesting idea if we didn't already have the vhost
>>> >>>> model where we don't need the userspace bounce.
>>> >>>
>>> >>> The model is very interesting for QEMU because then we can use vhost as
>>> >>> a backend for other types of network adapters (like vmxnet3 or even
>>> >>> e1000).
>>> >>>
>>> >>> It also helps for things like fault tolerance where we need to be able
>>> >>> to control packet flow within QEMU.
>>> >>
>>> >> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).
>>> >>
>>> >> Then I'm really confused as to what this would look like.  A zero copy
>>> >> sendmsg?  We should be able to implement that today.
>>> >>
>>> >> On the receive side, what can we do better than readv?  If we need to
>>> >> return to userspace to tell the guest that we've got a new packet, we
>>> >> don't win on latency.  We might reduce syscall overhead with a
>>> >> multi-dimensional readv to read multiple packets at once?
>>> >
>>> > Sounds like recvmmsg(2).
>>> 
>>> Could we map this to mergeable rx buffers though?
>>> 
>>> Regards,
>>> 
>>> Anthony Liguori
>>
>> Yes, because we don't have to complete buffers in order.
>
> What I meant though was for GRO, we don't know how large the received
> packet is going to be.  Mergeable rx buffers let us allocate a pool of
> data for all incoming packets instead of allocating max packet size *
> max packets.
>
> recvmmsg expects an array of msghdrs, and I presume each needs to be
> given a fixed size.  So this seems incompatible with mergeable rx
> buffers.

Good point.  You'd need to build 64KB buffers to pass to recvmmsg, then
reuse the parts it didn't touch on the next call.  Since mergeable rx
buffers can hand out page-sized chunks instead, this limits us to about
a 16th of what we could do with an interface which understood buffer
merging, but I don't know how much that would matter in practice.  We'd
need some benchmarks....
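
Roughly what I have in mind, as an untested sketch (NBUF, BUFSIZE and
receive_batch are made-up names, and it assumes fd is a real socket,
e.g. a packet socket, since recvmmsg won't work on a plain tap fd):

    /* One 64KB slot per packet; recvmmsg fills up to NBUF of them
     * per syscall.  _GNU_SOURCE is needed for recvmmsg(). */
    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <string.h>

    #define NBUF    16
    #define BUFSIZE 65536   /* max GSO packet size */

    static char bufs[NBUF][BUFSIZE];

    static int receive_batch(int fd)
    {
            struct mmsghdr msgs[NBUF];
            struct iovec iovs[NBUF];
            int i;

            for (i = 0; i < NBUF; i++) {
                    iovs[i].iov_base = bufs[i];
                    iovs[i].iov_len = BUFSIZE;
                    memset(&msgs[i].msg_hdr, 0,
                           sizeof(msgs[i].msg_hdr));
                    msgs[i].msg_hdr.msg_iov = &iovs[i];
                    msgs[i].msg_hdr.msg_iovlen = 1;
            }

            /* msgs[i].msg_len says how much of each slot was used;
             * everything past that (and every slot past the return
             * value) is untouched and can be reused next call. */
            return recvmmsg(fd, msgs, NBUF, MSG_DONTWAIT, NULL);
    }

The waste is that a 60-byte ACK still pins a whole 64KB slot until it's
consumed, where mergeable buffers would have spent a single page on it.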

Cheers,
Rusty.
