linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 0/6] tun zerocopy support
@ 2012-07-20 19:23 Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 1/6] skbuff: add an api to orphan frags Michael S. Tsirkin
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

This adds support for experimental zero copy transmit to tun.

This includes some patches from Ian's patchset to support zerocopy with tun,
so it should help that work progress: we are still trying to figure out
how to make everything work properly with tcp but tun seems easier, and
it's helpful by itself since not everyone can use macvtap.

Same as with macvtap, I get single-percentage wins in CPU utilization
on guest to external from this patchset, and a performance regression on
guest to host, so more work is needed until this feature can move out of
experimental status, but I think it's useful for some people already.

Pls review and consider for 3.6.

There's some code duplication between tun and macvtap now: common code
could move to net/core/datagram.c, this patch does not do this yet.

Changes from v2:
	Fixed some bugs so it's stable now

Michael S. Tsirkin (6):
  skbuff: add an api to orphan frags
  skbuff: convert to skb_orphan_frags
  skbuff: export skb_copy_ubufs
  tun: orphan frags on xmit
  net: orphan frags on receive
  tun: experimental zero copy tx support

 drivers/net/tun.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/skbuff.h |  16 ++++++
 net/core/dev.c         |   7 ++-
 net/core/skbuff.c      |  24 +++-----
 4 files changed, 167 insertions(+), 28 deletions(-)

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv3 1/6] skbuff: add an api to orphan frags
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
@ 2012-07-20 19:23 ` Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 2/6] skbuff: convert to skb_orphan_frags Michael S. Tsirkin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

Many places do
       if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
		skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/skbuff.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 642cb73..d205c4b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1667,6 +1667,22 @@ static inline void skb_orphan(struct sk_buff *skb)
 }
 
 /**
+ *	skb_orphan_frags - orphan the frags contained in a buffer
+ *	@skb: buffer to orphan frags from
+ *	@gfp_mask: allocation mask for replacement pages
+ *
+ *	For each frag in the SKB which needs a destructor (i.e. has an
+ *	owner) create a copy of that frag and release the original
+ *	page by calling the destructor.
+ */
+static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
+		return 0;
+	return skb_copy_ubufs(skb, gfp_mask);
+}
+
+/**
  *	__skb_queue_purge - empty a list
  *	@list: list to empty
  *
-- 
MST


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv3 2/6] skbuff: convert to skb_orphan_frags
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 1/6] skbuff: add an api to orphan frags Michael S. Tsirkin
@ 2012-07-20 19:23 ` Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 4/6] tun: orphan frags on xmit Michael S. Tsirkin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

Reduce code duplication a bit using the new helper.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/skbuff.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ccfcb7d..438bbc5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -804,10 +804,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
-	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-		if (skb_copy_ubufs(skb, gfp_mask))
-			return NULL;
-	}
+	if (skb_orphan_frags(skb, gfp_mask))
+		return NULL;
 
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
@@ -927,12 +925,10 @@ struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom, gfp_t gfp_mask)
 	if (skb_shinfo(skb)->nr_frags) {
 		int i;
 
-		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-			if (skb_copy_ubufs(skb, gfp_mask)) {
-				kfree_skb(n);
-				n = NULL;
-				goto out;
-			}
+		if (skb_orphan_frags(skb, gfp_mask)) {
+			kfree_skb(n);
+			n = NULL;
+			goto out;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
@@ -1005,10 +1001,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	 */
 	if (skb_cloned(skb)) {
 		/* copy this zero copy skb frags */
-		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-			if (skb_copy_ubufs(skb, gfp_mask))
-				goto nofrags;
-		}
+		if (skb_orphan_frags(skb, gfp_mask))
+			goto nofrags;
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
 			skb_frag_ref(skb, i);
 
-- 
MST


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv3 4/6] tun: orphan frags on xmit
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 1/6] skbuff: add an api to orphan frags Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 2/6] skbuff: convert to skb_orphan_frags Michael S. Tsirkin
@ 2012-07-20 19:23 ` Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 5/6] net: orphan frags on receive Michael S. Tsirkin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

tun xmit is actually receive of the internal tun
socket. Orphan the frags same as we do for normal rx path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/tun.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f3a454c..b95a7f4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -416,6 +416,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Orphan the skb - required as we might hang on to it
 	 * for indefinite time. */
+	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+		goto drop;
 	skb_orphan(skb);
 
 	/* Enqueue packet */
-- 
MST


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv3 5/6] net: orphan frags on receive
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2012-07-20 19:23 ` [PATCHv3 4/6] tun: orphan frags on xmit Michael S. Tsirkin
@ 2012-07-20 19:23 ` Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 3/6] skbuff: export skb_copy_ubufs Michael S. Tsirkin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

zero copy packets are normally sent to the outside
network, but bridging, tun etc might loop them
back to host networking stack. If this happens
destructors will never be called, so orphan
the frags immediately on receive.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/dev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d70e4a3..cca02ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1632,6 +1632,8 @@ static inline int deliver_skb(struct sk_buff *skb,
 			      struct packet_type *pt_prev,
 			      struct net_device *orig_dev)
 {
+	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+		return -ENOMEM;
 	atomic_inc(&skb->users);
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
@@ -3262,7 +3264,10 @@ ncls:
 	}
 
 	if (pt_prev) {
-		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+			ret = -ENOMEM;
+		else
+			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
 		atomic_long_inc(&skb->dev->rx_dropped);
 		kfree_skb(skb);
-- 
MST


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv3 3/6] skbuff: export skb_copy_ubufs
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2012-07-20 19:23 ` [PATCHv3 5/6] net: orphan frags on receive Michael S. Tsirkin
@ 2012-07-20 19:23 ` Michael S. Tsirkin
  2012-07-20 19:23 ` [PATCHv3 6/6] tun: experimental zero copy tx support Michael S. Tsirkin
  2012-07-21  0:49 ` [PATCHv3 0/6] tun zerocopy support David Miller
  6 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

Export skb_copy_ubufs so that modules can orphan frags.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 438bbc5..368f65c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -784,7 +784,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
 	return 0;
 }
-
+EXPORT_SYMBOL_GPL(skb_copy_ubufs);
 
 /**
  *	skb_clone	-	duplicate an sk_buff
-- 
MST


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv3 6/6] tun: experimental zero copy tx support
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2012-07-20 19:23 ` [PATCHv3 3/6] skbuff: export skb_copy_ubufs Michael S. Tsirkin
@ 2012-07-20 19:23 ` Michael S. Tsirkin
  2012-07-21  0:49 ` [PATCHv3 0/6] tun zerocopy support David Miller
  6 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem

Let vhost-net utilize zero copy tx when used with tun.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/tun.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 134 insertions(+), 12 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b95a7f4..c62163e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -100,6 +100,8 @@ do {								\
 } while (0)
 #endif
 
+#define GOODCOPY_LEN 128
+
 #define FLT_EXACT_COUNT 8
 struct tap_filter {
 	unsigned int    count;    /* Number of addrs. Zero means disabled */
@@ -604,19 +606,100 @@ static struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
 	return skb;
 }
 
+/* set skb frags from iovec, this can move to core network code for reuse */
+static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
+				  int offset, size_t count)
+{
+	int len = iov_length(from, count) - offset;
+	int copy = skb_headlen(skb);
+	int size, offset1 = 0;
+	int i = 0;
+
+	/* Skip over from offset */
+	while (count && (offset >= from->iov_len)) {
+		offset -= from->iov_len;
+		++from;
+		--count;
+	}
+
+	/* copy up to skb headlen */
+	while (count && (copy > 0)) {
+		size = min_t(unsigned int, copy, from->iov_len - offset);
+		if (copy_from_user(skb->data + offset1, from->iov_base + offset,
+				   size))
+			return -EFAULT;
+		if (copy > size) {
+			++from;
+			--count;
+			offset = 0;
+		} else
+			offset += size;
+		copy -= size;
+		offset1 += size;
+	}
+
+	if (len == offset1)
+		return 0;
+
+	while (count--) {
+		struct page *page[MAX_SKB_FRAGS];
+		int num_pages;
+		unsigned long base;
+		unsigned long truesize;
+
+		len = from->iov_len - offset;
+		if (!len) {
+			offset = 0;
+			++from;
+			continue;
+		}
+		base = (unsigned long)from->iov_base + offset;
+		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+		if (i + size > MAX_SKB_FRAGS)
+			return -EMSGSIZE;
+		num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+		if (num_pages != size) {
+			for (i = 0; i < num_pages; i++)
+				put_page(page[i]);
+			return -EFAULT;
+		}
+		truesize = size * PAGE_SIZE;
+		skb->data_len += len;
+		skb->len += len;
+		skb->truesize += truesize;
+		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
+		while (len) {
+			int off = base & ~PAGE_MASK;
+			int size = min_t(int, len, PAGE_SIZE - off);
+			__skb_fill_page_desc(skb, i, page[i], off, size);
+			skb_shinfo(skb)->nr_frags++;
+			/* increase sk_wmem_alloc */
+			base += size;
+			len -= size;
+			i++;
+		}
+		offset = 0;
+		++from;
+	}
+	return 0;
+}
+
 /* Get packet from user space buffer */
-static ssize_t tun_get_user(struct tun_struct *tun,
-			    const struct iovec *iv, size_t count,
-			    int noblock)
+static ssize_t tun_get_user(struct tun_struct *tun, void *msg_control,
+			    const struct iovec *iv, size_t total_len,
+			    size_t count, int noblock)
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
-	size_t len = count, align = NET_SKB_PAD;
+	size_t len = total_len, align = NET_SKB_PAD;
 	struct virtio_net_hdr gso = { 0 };
 	int offset = 0;
+	int copylen;
+	bool zerocopy = false;
+	int err;
 
 	if (!(tun->flags & TUN_NO_PI)) {
-		if ((len -= sizeof(pi)) > count)
+		if ((len -= sizeof(pi)) > total_len)
 			return -EINVAL;
 
 		if (memcpy_fromiovecend((void *)&pi, iv, 0, sizeof(pi)))
@@ -625,7 +708,7 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 	}
 
 	if (tun->flags & TUN_VNET_HDR) {
-		if ((len -= tun->vnet_hdr_sz) > count)
+		if ((len -= tun->vnet_hdr_sz) > total_len)
 			return -EINVAL;
 
 		if (memcpy_fromiovecend((void *)&gso, iv, offset, sizeof(gso)))
@@ -647,14 +730,46 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 			return -EINVAL;
 	}
 
-	skb = tun_alloc_skb(tun, align, len, gso.hdr_len, noblock);
+	if (msg_control)
+		zerocopy = true;
+
+	if (zerocopy) {
+		/* Userspace may produce vectors with count greater than
+		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
+		 * to let the rest of data to be fit in the frags.
+		 */
+		if (count > MAX_SKB_FRAGS) {
+			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
+			if (copylen < offset)
+				copylen = 0;
+			else
+				copylen -= offset;
+		} else
+				copylen = 0;
+		/* There are 256 bytes to be copied in skb, so there is enough
+		 * room for skb expand head in case it is used.
+		 * The rest of the buffer is mapped from userspace.
+		 */
+		if (copylen < gso.hdr_len)
+			copylen = gso.hdr_len;
+		if (!copylen)
+			copylen = GOODCOPY_LEN;
+	} else
+		copylen = len;
+
+	skb = tun_alloc_skb(tun, align, copylen, gso.hdr_len, noblock);
 	if (IS_ERR(skb)) {
 		if (PTR_ERR(skb) != -EAGAIN)
 			tun->dev->stats.rx_dropped++;
 		return PTR_ERR(skb);
 	}
 
-	if (skb_copy_datagram_from_iovec(skb, 0, iv, offset, len)) {
+	if (zerocopy)
+		err = zerocopy_sg_from_iovec(skb, iv, offset, count);
+	else
+		err = skb_copy_datagram_from_iovec(skb, 0, iv, offset, len);
+
+	if (err) {
 		tun->dev->stats.rx_dropped++;
 		kfree_skb(skb);
 		return -EFAULT;
@@ -728,12 +843,18 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
+	/* copy skb_ubuf_info for callback when skb has no error */
+	if (zerocopy) {
+		skb_shinfo(skb)->destructor_arg = msg_control;
+		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+	}
+
 	netif_rx_ni(skb);
 
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
 
-	return count;
+	return total_len;
 }
 
 static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
@@ -748,7 +869,7 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 
 	tun_debug(KERN_INFO, tun, "tun_chr_write %ld\n", count);
 
-	result = tun_get_user(tun, iv, iov_length(iv, count),
+	result = tun_get_user(tun, NULL, iv, iov_length(iv, count), count,
 			      file->f_flags & O_NONBLOCK);
 
 	tun_put(tun);
@@ -962,8 +1083,8 @@ static int tun_sendmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *m, size_t total_len)
 {
 	struct tun_struct *tun = container_of(sock, struct tun_struct, socket);
-	return tun_get_user(tun, m->msg_iov, total_len,
-			    m->msg_flags & MSG_DONTWAIT);
+	return tun_get_user(tun, m->msg_control, m->msg_iov, total_len,
+			    m->msg_iovlen, m->msg_flags & MSG_DONTWAIT);
 }
 
 static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
@@ -1133,6 +1254,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		sock_init_data(&tun->socket, sk);
 		sk->sk_write_space = tun_sock_write_space;
 		sk->sk_sndbuf = INT_MAX;
+		sock_set_flag(sk, SOCK_ZEROCOPY);
 
 		tun_sk(sk)->tun = tun;
 
-- 
MST

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 0/6] tun zerocopy support
  2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2012-07-20 19:23 ` [PATCHv3 6/6] tun: experimental zero copy tx support Michael S. Tsirkin
@ 2012-07-21  0:49 ` David Miller
  2012-07-21 22:05   ` Michael S. Tsirkin
  6 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-07-21  0:49 UTC (permalink / raw)
  To: mst; +Cc: jasowang, eric.dumazet, netdev, linux-kernel, ebiederm

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 20 Jul 2012 22:23:03 +0300

> Same as with macvtap, I get single-percentage wins in CPU utilization
> on guest to external from this patchset, and a performance regression on
> guest to host, so more work is needed until this feature can move out of
> experimental status, but I think it's useful for some people already.
> 
> Pls review and consider for 3.6.

It doesn't improve performance in one case, and hurts performance in
another.

You'll have to give me a more compelling argument than that.  You've
just given me every reason not to include these patches in 3.6


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 0/6] tun zerocopy support
  2012-07-21  0:49 ` [PATCHv3 0/6] tun zerocopy support David Miller
@ 2012-07-21 22:05   ` Michael S. Tsirkin
  2012-07-22 19:40     ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2012-07-21 22:05 UTC (permalink / raw)
  To: David Miller
  Cc: jasowang, eric.dumazet, netdev, linux-kernel, ebiederm, Ian Campbell

On Fri, Jul 20, 2012 at 05:49:02PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Fri, 20 Jul 2012 22:23:03 +0300
> 
> > Same as with macvtap, I get single-percentage wins in CPU utilization
> > on guest to external from this patchset, and a performance regression on
> > guest to host, so more work is needed until this feature can move out of
> > experimental status, but I think it's useful for some people already.
> > 
> > Pls review and consider for 3.6.
> 
> It doesn't improve performance in one case, and hurts performance in
> another.
> 
> You'll have to give me a more compelling argument than that.  You've
> just given me every reason not to include these patches in 3.6

OK let me clarify a bit, I think this wasn't explained well:
it's not true that this makes users suffer :)

This patch has no effect unless experimental zero copy mode in vhost-net
is enabled, and it is very clearly marked as experimental.

I agree a small win in CPU use is nothing to write home about,
I don't yet understand why the win is so small - macvtap has zero copy
supported for a while and it has exactly same issues.
I hope adding tun zerocopy support upstream will help us
make progress faster and find the bottleneck, so far not many people use
macvtap so zero copy mode there didn't make progress.

I do know why local performance regresses with zero copy enabled:
instead of plain copy to user we got get user pages and then memcpy.
We'll need some logic here to detect this and turn off zero copy.

The core patches will also be helpful for Ian's more ambitious work.

Overall I think it's a step in the right direction and it's easier to
work if core parts are upstream, but if you think we need to wait
until the gains are more significant, I understand that too.

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 0/6] tun zerocopy support
  2012-07-21 22:05   ` Michael S. Tsirkin
@ 2012-07-22 19:40     ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2012-07-22 19:40 UTC (permalink / raw)
  To: mst; +Cc: jasowang, eric.dumazet, netdev, linux-kernel, ebiederm, Ian.Campbell

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 22 Jul 2012 01:05:34 +0300

> I agree a small win in CPU use is nothing to write home about,
> I don't yet understand why the win is so small - macvtap has zero copy
> supported for a while and it has exactly same issues.
> I hope adding tun zerocopy support upstream will help us
> make progress faster and find the bottleneck, so far not many people use
> macvtap so zero copy mode there didn't make progress.
> 
> I do know why local performance regresses with zero copy enabled:
> instead of plain copy to user we got get user pages and then memcpy.
> We'll need some logic here to detect this and turn off zero copy.
> 
> The core patches will also be helpful for Ian's more ambitious work.
> 
> Overall I think it's a step in the right direction and it's easier to
> work if core parts are upstream, but if you think we need to wait
> until the gains are more significant, I understand that too.

Ok, I've applied this series, let's see what happens.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-07-22 19:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-20 19:23 [PATCHv3 0/6] tun zerocopy support Michael S. Tsirkin
2012-07-20 19:23 ` [PATCHv3 1/6] skbuff: add an api to orphan frags Michael S. Tsirkin
2012-07-20 19:23 ` [PATCHv3 2/6] skbuff: convert to skb_orphan_frags Michael S. Tsirkin
2012-07-20 19:23 ` [PATCHv3 4/6] tun: orphan frags on xmit Michael S. Tsirkin
2012-07-20 19:23 ` [PATCHv3 5/6] net: orphan frags on receive Michael S. Tsirkin
2012-07-20 19:23 ` [PATCHv3 3/6] skbuff: export skb_copy_ubufs Michael S. Tsirkin
2012-07-20 19:23 ` [PATCHv3 6/6] tun: experimental zero copy tx support Michael S. Tsirkin
2012-07-21  0:49 ` [PATCHv3 0/6] tun zerocopy support David Miller
2012-07-21 22:05   ` Michael S. Tsirkin
2012-07-22 19:40     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).