linux-kernel.vger.kernel.org archive mirror
* [PATCH V8 0/4 net-next] macvtap/vhost TX zero-copy support
From: Shirley Ma @ 2011-07-06 22:15 UTC
  To: David Miller, mst; +Cc: netdev, kvm, linux-kernel

This patchset adds support for TX zero-copy between the guest and host
kernel through vhost. It significantly reduces CPU utilization on the
local host on which the guest is located (host CPU usage dropped by about
50% in a single-stream test, while bandwidth for 4K messages increased by
about 50%). The patchset is based on the previous submission and on
community comments about when and how guest kernel buffers should be
released. This is the simplest approach I could come up with after
comparing several other solutions.

This patchset integrates V3 review comments from the community:

1. Add more comments on how to use the device ZEROCOPY flag;
2. Change device ZEROCOPY to available bit 31;
3. Fix skb header linear allocation when virtio_net GSO is not enabled.

It integrates V4 review comments from MST and Sridhar:
1. In vhost, use socket poll wakeup for outstanding DMAs;
2. Add detailed comments for the vhost_zerocopy_signal_used call;
3. Sleep while vhost is shutting down instead of busy-waiting for
   outstanding DMAs;
4. Copy small packets instead of doing a zero-copy callback in macvtap,
   and mark their DMA as done in vhost;
5. Change zerocopy to bool in macvtap.

It integrates V5 review comments from MST and
Michał Mirosław <mirqus@...il.com>:
1. Prevent userspace apps from holding on to skb userspace buffers by
copying userspace buffers to the kernel in skb_clone, skb_copy, pskb_copy
and pskb_expand_head.
2. Use the HIGHDMA and SG feature bits to enable ZEROCOPY, removing the
dependency on a new feature bit; one can be added later when a new
feature bit is available.
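A minimal user-space sketch of item 1, assuming a simplified one-fragment buffer model: before an skb that still references userspace memory is cloned or copied, its data is copied into kernel-owned memory and the userspace buffer is released right away. The struct `zc_buf` and function `zc_buf_copy_to_kernel()` are hypothetical illustrations, not the skbuff code in this patchset, and `malloc()` stands in for a kernel page allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one skb fragment that may borrow userspace memory. */
struct zc_buf {
    const unsigned char *user_data; /* borrowed user buffer, NULL once released */
    unsigned char *kern_data;       /* kernel-owned copy, NULL until copied */
    size_t len;
    int user_released;              /* set when the release callback would fire */
};

/*
 * Called before cloning/copying: make the data private to the "kernel" so a
 * long-lived clone (e.g. one held by tcpdump) does not pin the user buffer.
 * Returns 0 on success, -1 on allocation failure.
 */
static int zc_buf_copy_to_kernel(struct zc_buf *b)
{
    if (b->kern_data)               /* already private: nothing to do */
        return 0;
    assert(b->user_data != NULL);

    b->kern_data = malloc(b->len ? b->len : 1); /* avoid malloc(0) */
    if (!b->kern_data)
        return -1;
    memcpy(b->kern_data, b->user_data, b->len);

    /* The user buffer is no longer referenced, so it can be returned now. */
    b->user_data = NULL;
    b->user_released = 1;
    return 0;
}
```

Copying on clone trades one extra memcpy on the (rare) tcpdump/filter path for a guarantee that no clone can keep guest memory pinned indefinitely.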

It integrates V6 review comments from Eric Dumazet:
1. Move the ubuf_info object from the skb to the caller, and use just one
pointer in skb_shared_info to point to the ubuf_info object.

2. Change the zero-copy size threshold from 256 bytes to PAGE_SIZE (4K)
because of a performance issue with small message sizes.

3. While vhost is shutting down, release outstanding userspace buffers
without waiting for any lower-device DMAs to finish. Do we really care
about possibly wrong data being sent on the wire during shutdown?
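The copy-versus-zero-copy decision described in these notes (small packets are copied, and zero-copy is used only when the socket has it enabled and the lower device advertises SG and HIGHDMA) can be sketched as a single predicate. The names `zc_caps` and `should_zerocopy()` are hypothetical, and the size threshold is passed in as a parameter rather than hard-coded; the actual macvtap check may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability snapshot of the socket and lower device. */
struct zc_caps {
    bool sock_zerocopy; /* models SOCK_ZEROCOPY being set on the socket */
    bool dev_sg;        /* lower device supports scatter-gather */
    bool dev_highdma;   /* lower device can DMA from high memory */
};

/*
 * Return true if a TX packet of len bytes should be sent zero-copy.
 * Packets at or below the threshold are cheaper to copy, so they take the
 * normal copy path and need no completion callback.
 */
static bool should_zerocopy(const struct zc_caps *caps, size_t len,
                            size_t threshold)
{
    assert(caps != NULL);
    if (!caps->sock_zerocopy || !caps->dev_sg || !caps->dev_highdma)
        return false;
    return len > threshold;
}
```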

This patchset integrates version 7 review comments from Michael:
1. Add a comment about fixing the busy-wait while the vhost ring changes,
and clean up.

2. Add a new tx flag for zero-copy skbs, and use destructor_arg to avoid a
new pointer in skb_shared_info.
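A user-space model of item 2, assuming simplified stand-ins for the kernel structures. The skb carries a zero-copy tx flag, its `destructor_arg` points at a caller-owned info object holding a callback, and that callback runs only when the last reference is dropped. `skb_model`, `ubuf_model`, `TXF_DEV_ZEROCOPY` and `mark_dma_done()` are hypothetical names. The patchset's actual changes are in include/linux/skbuff.h and net/core/skbuff.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the caller-owned userspace buffer info. */
struct ubuf_model {
    void (*callback)(struct ubuf_model *uarg); /* run when DMA is done */
    void *ctx;                                 /* e.g. the vhost queue */
};

#define TXF_DEV_ZEROCOPY (1u << 0) /* hypothetical bit value */

/* Hypothetical stand-in for an skb and its shared info. */
struct skb_model {
    unsigned int users;   /* reference count */
    unsigned int tx_flags;
    void *destructor_arg; /* points at a ubuf_model when zero-copy */
};

/* Drop one reference; only the last one releases the user buffers. */
static void skb_model_put(struct skb_model *skb)
{
    assert(skb->users > 0);
    if (--skb->users > 0)
        return;
    if (skb->tx_flags & TXF_DEV_ZEROCOPY) {
        struct ubuf_model *uarg = skb->destructor_arg;

        uarg->callback(uarg);
    }
}

/* Example callback: count completed zero-copy transmissions. */
static void mark_dma_done(struct ubuf_model *uarg)
{
    int *completions = uarg->ctx;

    (*completions)++;
}
```

Reusing `destructor_arg` this way means a zero-copy skb costs one flag bit and no extra pointer in the shared info.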

This patchset includes:
1/4: Add a new socket zero-copy flag, SOCK_ZEROCOPY;

2/4: Add a new tx flag in skb_shared_info, SKBTX_DEV_ZEROCOPY, which marks
skbs whose userspace buffers need a release callback once the lower device
DMA is done for that skb, i.e. when the last reference count is gone;

Whenever skb_clone, skb_copy, pskb_copy or pskb_expand_head gets called
(e.g. from tcpdump or filtering), these userspace buffers will be copied
into the kernel, since we don't want userspace apps to hold userspace
buffers too long.

Use the skb destructor_arg as a pointer to the userspace buffer info.

3/4: Add a vhost zero-copy callback, invoked when the skb's last refcnt is
gone; add vhost_zerocopy_signal_used to notify the guest to release TX skb
buffers.

4/4: Add macvtap zero-copy in the lower device when the packet being sent
is larger than 256 bytes.
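The bookkeeping in 3/4 can be modeled as a small ring, assuming that completed buffers are returned to the guest in submission order. Each zero-copy packet records its guest descriptor, the skb callback marks the entry done, and a signal-used pass returns finished entries, stopping at the first one whose DMA is still outstanding. This is a hypothetical user-space sketch: `zc_queue`, `zc_submit()` and `zc_signal_used()` are illustrative names rather than the vhost code, and appending to `used[]` stands in for adding a used-ring entry and signalling the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZC_RING 8 /* hypothetical number of in-flight zero-copy slots */

/* Slots in [done_idx, upend_idx) are zero-copy packets still owned by vhost. */
struct zc_queue {
    unsigned int head_id[ZC_RING]; /* guest descriptor id per slot */
    bool dma_done[ZC_RING];        /* set by the skb completion callback */
    size_t done_idx;               /* oldest slot not yet returned */
    size_t upend_idx;              /* next free slot */
    unsigned int used[64];         /* log of ids returned to the guest */
    size_t nused;
};

/* Record a packet handed to the lower device without copying. */
static void zc_submit(struct zc_queue *q, unsigned int id)
{
    assert((q->upend_idx + 1) % ZC_RING != q->done_idx); /* ring not full */
    q->head_id[q->upend_idx] = id;
    q->dma_done[q->upend_idx] = false;
    q->upend_idx = (q->upend_idx + 1) % ZC_RING;
}

/* Return finished buffers in order; stop at the first outstanding DMA. */
static size_t zc_signal_used(struct zc_queue *q)
{
    size_t i, n = 0;

    for (i = q->done_idx; i != q->upend_idx; i = (i + 1) % ZC_RING) {
        if (!q->dma_done[i])
            break;
        q->dma_done[i] = false;
        assert(q->nused < sizeof(q->used) / sizeof(q->used[0]));
        q->used[q->nused++] = q->head_id[i];
        n++;
    }
    q->done_idx = i;
    return n;
}
```

Returning buffers strictly in order keeps the state down to two indices, at the cost that one slow DMA delays the return of later buffers whose DMA has already finished.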

The patchset is built against net-next linux-3.0.0-rc5. It has passed a
netperf/netserver multiple-stream stress test, a tcpdump suspended test,
and a dynamic SG change test.

Single TCP_STREAM 120-second test results on 2.6.39-rc3 over an ixgbe
10Gb NIC ("orig" rows are the unpatched baseline; CPU is in % of one CPU):

Message     BW (Mb/s)   qemu-kvm CPU   vhost-net CPU   PerfTop irq/s
4K            7408.57          92.1%           22.6%            1229
4K (orig)     4913.17         118.1%           84.1%            2086
8K            9129.90          89.3%           23.3%            1141
8K (orig)     7094.55         115.9%           84.7%            2157
16K           9178.81          89.1%           23.3%            1139
16K (orig)    8927.10         118.7%           83.4%            2262
64K           9171.43          88.4%           24.9%            1253
64K (orig)    9085.85         115.9%           82.4%            2229
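You can check the headline figures against the table with the relative change (after - before) / before * 100. Here is a tiny helper for that. The name `pct_change()` is hypothetical.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the 4K rows, pct_change(4913.17, 7408.57) gives about +50.8% bandwidth, and pct_change(84.1, 22.6) gives about -73.1% vhost-net CPU.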

For message sizes of 2K or less, there is a known KVM guest TX overrun
issue. With this zero-copy patch the issue becomes more severe: guest
io_exits have tripled, so performance is poor. Once the TX overrun
problem has been addressed, I will retest small-message performance.

 drivers/net/macvtap.c  |  132 ++++++++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/net.c    |   45 ++++++++++++++++-
 drivers/vhost/vhost.c  |   48 +++++++++++++++++
 drivers/vhost/vhost.h  |   15 ++++++
 include/linux/skbuff.h |   16 ++++++
 include/net/sock.h     |    1 +
 net/core/skbuff.c      |   79 ++++++++++++++++++++++++++++-
 7 files changed, 324 insertions(+), 14 deletions(-)

Thanks
Shirley



* Re: [PATCH V8 0/4 net-next] macvtap/vhost TX zero-copy support
From: David Miller @ 2011-07-07 11:08 UTC
  To: mashirle; +Cc: mst, netdev, kvm, linux-kernel

From: Shirley Ma <mashirle@us.ibm.com>
Date: Wed, 06 Jul 2011 15:15:25 -0700

> This patchset adds support for TX zero-copy between the guest and host
> kernel through vhost. It significantly reduces CPU utilization on the
> local host on which the guest is located (host CPU usage dropped by about
> 50% in a single-stream test, while bandwidth for 4K messages increased by
> about 50%). The patchset is based on the previous submission and on
> community comments about when and how guest kernel buffers should be
> released. This is the simplest approach I could come up with after
> comparing several other solutions.
> 
> This patchset integrates V3 review comments from the community:

I'm personally fine with this patch set.  Unless there are others
who object, please fix the use-after-free bug I reported, respin
the patch set, and I'll apply it.

Thanks.


* Re: [PATCH V8 0/4 net-next] macvtap/vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-07-07 11:37 UTC
  To: David Miller; +Cc: mashirle, netdev, kvm, linux-kernel

On Thu, Jul 07, 2011 at 04:08:40AM -0700, David Miller wrote:
> From: Shirley Ma <mashirle@us.ibm.com>
> Date: Wed, 06 Jul 2011 15:15:25 -0700
> 
> > This patchset adds support for TX zero-copy between the guest and host
> > kernel through vhost. It significantly reduces CPU utilization on the
> > local host on which the guest is located (host CPU usage dropped by about
> > 50% in a single-stream test, while bandwidth for 4K messages increased by
> > about 50%). The patchset is based on the previous submission and on
> > community comments about when and how guest kernel buffers should be
> > released. This is the simplest approach I could come up with after
> > comparing several other solutions.
> > 
> > This patchset integrates V3 review comments from the community:
> 
> I'm personally fine with this patch set.  Unless there are others
> who object, please fix the use-after-free bug I reported, respin
> the patch set, and I'll apply it.
> 
> Thanks.

There's the FIXME in patch 4 where it spins in vhost waiting for
the pages to get freed. I'm fixing that up as Shirley's on vacation.

Apply patches 1-3 for now?

-- 
MST


* Re: [PATCH V8 0/4 net-next] macvtap/vhost TX zero-copy support
From: David Miller @ 2011-07-07 11:49 UTC
  To: mst; +Cc: mashirle, netdev, kvm, linux-kernel

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 7 Jul 2011 14:37:15 +0300

> Apply patches 1-3 for now?

Done, and I fixed the use-after-free in patch #2.

