linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next V4 0/6] switch to use tx skb array in tun
@ 2016-06-30  6:45 Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 1/6] ptr_ring: support zero length ring Jason Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang

Hi all:

This series tries to switch to use skb array in tun. This is used to
eliminate the spinlock contention between producer and consumer. The
conversion was straightforward: just introdce a tx skb array and use
it instead of sk_receive_queue.

A minor issue is to keep the tx_queue_len behaviour, since tun used to
use it for the length of sk_receive_queue. This is done through:

- add the ability to resize multiple rings at once to avoid handling
  partial resize failure for mutiple rings.
- add the support for zero length ring.
- introduce a notifier which was triggered when tx_queue_len was
  changed for a netdev.
- resize all queues during the tx_queue_len changing.

Tests shows about 15% improvement on guest rx pps:

Before: ~1300000pps
After : ~1500000pps

Changes from V3:
- fix kbuild warnings
- call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN

Changes from V2:
- add multiple rings resizing support for ptr_ring/skb_array
- add zero length ring support
- introdce a NETDEV_CHANGE_TX_QUEUE_LEN
- drop new flags

Changes from V1:
- switch to use skb array instead of a customized circular buffer
- add non-blocking support
- rename .peek to .peek_len
- drop lockless peeking since test show very minor improvement

Jason Wang (5):
  ptr_ring: support zero length ring
  skb_array: minor tweak
  skb_array: add wrappers for resizing
  net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
  tun: switch to use skb array for tx

Michael S. Tsirkin (1):
  ptr_ring: support resizing multiple queues

 drivers/net/tun.c                | 138 ++++++++++++++++++++++++++++++++++++---
 drivers/vhost/net.c              |  16 ++++-
 include/linux/net.h              |   1 +
 include/linux/netdevice.h        |   1 +
 include/linux/ptr_ring.h         |  77 ++++++++++++++++++----
 include/linux/skb_array.h        |  13 +++-
 net/core/net-sysfs.c             |  15 ++++-
 net/core/rtnetlink.c             |  16 +++--
 tools/virtio/ringtest/ptr_ring.c |   5 ++
 9 files changed, 255 insertions(+), 27 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next V4 1/6] ptr_ring: support zero length ring
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
@ 2016-06-30  6:45 ` Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 2/6] skb_array: minor tweak Jason Wang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang

Sometimes, we need zero length ring. But current code will crash since
we don't do any check before accessing the ring. This patch fixes this.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/ptr_ring.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 562a65e..d78b8b8 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -102,7 +102,7 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
  */
 static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
-	if (r->queue[r->producer])
+	if (unlikely(!r->size) || r->queue[r->producer])
 		return -ENOSPC;
 
 	r->queue[r->producer++] = ptr;
@@ -164,7 +164,9 @@ static inline int ptr_ring_produce_bh(struct ptr_ring *r, void *ptr)
  */
 static inline void *__ptr_ring_peek(struct ptr_ring *r)
 {
-	return r->queue[r->consumer];
+	if (likely(r->size))
+		return r->queue[r->consumer];
+	return NULL;
 }
 
 /* Note: callers invoking this in a loop must use a compiler barrier,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V4 2/6] skb_array: minor tweak
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 1/6] ptr_ring: support zero length ring Jason Wang
@ 2016-06-30  6:45 ` Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 3/6] ptr_ring: support resizing multiple queues Jason Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/skb_array.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 678bfbf..2dd0d1e 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -151,12 +151,12 @@ static inline int skb_array_init(struct skb_array *a, int size, gfp_t gfp)
 	return ptr_ring_init(&a->ring, size, gfp);
 }
 
-void __skb_array_destroy_skb(void *ptr)
+static void __skb_array_destroy_skb(void *ptr)
 {
 	kfree_skb(ptr);
 }
 
-int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
+static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
 {
 	return ptr_ring_resize(&a->ring, size, gfp, __skb_array_destroy_skb);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V4 3/6] ptr_ring: support resizing multiple queues
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 1/6] ptr_ring: support zero length ring Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 2/6] skb_array: minor tweak Jason Wang
@ 2016-06-30  6:45 ` Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 4/6] skb_array: add wrappers for resizing Jason Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang

From: "Michael S. Tsirkin" <mst@redhat.com>

Sometimes, we need support resizing multiple queues at once. This is
because it was not easy to recover to recover from a partial failure
of multiple queues resizing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/ptr_ring.h         | 71 +++++++++++++++++++++++++++++++++++-----
 tools/virtio/ringtest/ptr_ring.c |  5 +++
 2 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index d78b8b8..2052011 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -349,20 +349,14 @@ static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
 	return 0;
 }
 
-static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
-				  void (*destroy)(void *))
+static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
+					   int size, gfp_t gfp,
+					   void (*destroy)(void *))
 {
-	unsigned long flags;
 	int producer = 0;
-	void **queue = __ptr_ring_init_queue_alloc(size, gfp);
 	void **old;
 	void *ptr;
 
-	if (!queue)
-		return -ENOMEM;
-
-	spin_lock_irqsave(&(r)->producer_lock, flags);
-
 	while ((ptr = ptr_ring_consume(r)))
 		if (producer < size)
 			queue[producer++] = ptr;
@@ -375,6 +369,23 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
 	old = r->queue;
 	r->queue = queue;
 
+	return old;
+}
+
+static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
+				  void (*destroy)(void *))
+{
+	unsigned long flags;
+	void **queue = __ptr_ring_init_queue_alloc(size, gfp);
+	void **old;
+
+	if (!queue)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&(r)->producer_lock, flags);
+
+	old = __ptr_ring_swap_queue(r, queue, size, gfp, destroy);
+
 	spin_unlock_irqrestore(&(r)->producer_lock, flags);
 
 	kfree(old);
@@ -382,6 +393,48 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
 	return 0;
 }
 
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+					   int size,
+					   gfp_t gfp, void (*destroy)(void *))
+{
+	unsigned long flags;
+	void ***queues;
+	int i;
+
+	queues = kmalloc(nrings * sizeof *queues, gfp);
+	if (!queues)
+		goto noqueues;
+
+	for (i = 0; i < nrings; ++i) {
+		queues[i] = __ptr_ring_init_queue_alloc(size, gfp);
+		if (!queues[i])
+			goto nomem;
+	}
+
+	for (i = 0; i < nrings; ++i) {
+		spin_lock_irqsave(&(rings[i])->producer_lock, flags);
+		queues[i] = __ptr_ring_swap_queue(rings[i], queues[i],
+						  size, gfp, destroy);
+		spin_unlock_irqrestore(&(rings[i])->producer_lock, flags);
+	}
+
+	for (i = 0; i < nrings; ++i)
+		kfree(queues[i]);
+
+	kfree(queues);
+
+	return 0;
+
+nomem:
+	while (--i >= 0)
+		kfree(queues[i]);
+
+	kfree(queues);
+
+noqueues:
+	return -ENOMEM;
+}
+
 static inline void ptr_ring_cleanup(struct ptr_ring *r, void (*destroy)(void *))
 {
 	void *ptr;
diff --git a/tools/virtio/ringtest/ptr_ring.c b/tools/virtio/ringtest/ptr_ring.c
index 74abd74..68e4f9f 100644
--- a/tools/virtio/ringtest/ptr_ring.c
+++ b/tools/virtio/ringtest/ptr_ring.c
@@ -17,6 +17,11 @@
 typedef pthread_spinlock_t  spinlock_t;
 
 typedef int gfp_t;
+static void *kmalloc(unsigned size, gfp_t gfp)
+{
+	return memalign(64, size);
+}
+
 static void *kzalloc(unsigned size, gfp_t gfp)
 {
 	void *p = memalign(64, size);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V4 4/6] skb_array: add wrappers for resizing
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
                   ` (2 preceding siblings ...)
  2016-06-30  6:45 ` [PATCH net-next V4 3/6] ptr_ring: support resizing multiple queues Jason Wang
@ 2016-06-30  6:45 ` Jason Wang
  2016-06-30  6:45 ` [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN Jason Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/skb_array.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 2dd0d1e..f4dfade 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -161,6 +161,15 @@ static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
 	return ptr_ring_resize(&a->ring, size, gfp, __skb_array_destroy_skb);
 }
 
+static inline int skb_array_resize_multiple(struct skb_array **rings,
+					    int nrings, int size, gfp_t gfp)
+{
+	BUILD_BUG_ON(offsetof(struct skb_array, ring));
+	return ptr_ring_resize_multiple((struct ptr_ring **)rings,
+					nrings, size, gfp,
+					__skb_array_destroy_skb);
+}
+
 static inline void skb_array_cleanup(struct skb_array *a)
 {
 	ptr_ring_cleanup(&a->ring, __skb_array_destroy_skb);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
                   ` (3 preceding siblings ...)
  2016-06-30  6:45 ` [PATCH net-next V4 4/6] skb_array: add wrappers for resizing Jason Wang
@ 2016-06-30  6:45 ` Jason Wang
  2016-06-30 16:33   ` John Fastabend
  2016-06-30  6:45 ` [PATCH net-next V4 6/6] tun: switch to use skb array for tx Jason Wang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang, John Fastabend

This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this
will be triggered when tx_queue_len. It could be used by net device
who want to do some processing at that time. An example is tun who may
want to resize tx array when tx_queue_len is changed.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/netdevice.h |  1 +
 net/core/net-sysfs.c      | 15 ++++++++++++++-
 net/core/rtnetlink.c      | 16 ++++++++++++----
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e84d9d2..7dc2ec7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2237,6 +2237,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_PRECHANGEUPPER	0x001A
 #define NETDEV_CHANGELOWERSTATE	0x001B
 #define NETDEV_UDP_TUNNEL_PUSH_INFO	0x001C
+#define NETDEV_CHANGE_TX_QUEUE_LEN	0x001E
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7a0b616..6e4f347 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -322,7 +322,20 @@ NETDEVICE_SHOW_RW(flags, fmt_hex);
 
 static int change_tx_queue_len(struct net_device *dev, unsigned long new_len)
 {
-	dev->tx_queue_len = new_len;
+	int res, orig_len = dev->tx_queue_len;
+
+	if (new_len != orig_len) {
+		dev->tx_queue_len = new_len;
+		res = call_netdevice_notifiers(NETDEV_CHANGE_TX_QUEUE_LEN, dev);
+		res = notifier_to_errno(res);
+		if (res) {
+			netdev_err(dev,
+				   "refused to change device tx_queue_len\n");
+			dev->tx_queue_len = orig_len;
+			return -EFAULT;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eb49ca2..b16b779 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1927,11 +1927,19 @@ static int do_setlink(const struct sk_buff *skb,
 
 	if (tb[IFLA_TXQLEN]) {
 		unsigned long value = nla_get_u32(tb[IFLA_TXQLEN]);
-
-		if (dev->tx_queue_len ^ value)
+		unsigned long orig_len = dev->tx_queue_len;
+
+		if (dev->tx_queue_len ^ value) {
+			dev->tx_queue_len = value;
+			err = call_netdevice_notifiers(
+			      NETDEV_CHANGE_TX_QUEUE_LEN, dev);
+			err = notifier_to_errno(err);
+			if (err) {
+				dev->tx_queue_len = orig_len;
+				goto errout;
+			}
 			status |= DO_SETLINK_NOTIFY;
-
-		dev->tx_queue_len = value;
+		}
 	}
 
 	if (tb[IFLA_OPERSTATE])
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V4 6/6] tun: switch to use skb array for tx
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
                   ` (4 preceding siblings ...)
  2016-06-30  6:45 ` [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN Jason Wang
@ 2016-06-30  6:45 ` Jason Wang
  2016-06-30 15:45 ` [PATCH net-next V4 0/6] switch to use tx skb array in tun Michael S. Tsirkin
  2016-07-01  9:40 ` David Miller
  7 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-06-30  6:45 UTC (permalink / raw)
  To: mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, Jason Wang

We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.

This patch tries to address this by:

- switch from sk_receive_queue to a skb_array, and resize it when
  tx_queue_len was changed.
- introduce a new proto_ops peek_len which was used for peeking the
  skb length.
- implement a tun version of peek_len for vhost_net to use and convert
  vhost_net to use peek_len if possible.

Pktgen test shows about 15.3% improvement on guest receiving pps for small
buffers:

Before: ~1300000pps
After : ~1500000pps

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c   | 138 +++++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/net.c |  16 +++++-
 include/linux/net.h |   1 +
 3 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4884802..7475215 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -71,6 +71,7 @@
 #include <net/sock.h>
 #include <linux/seq_file.h>
 #include <linux/uio.h>
+#include <linux/skb_array.h>
 
 #include <asm/uaccess.h>
 
@@ -167,6 +168,7 @@ struct tun_file {
 	};
 	struct list_head next;
 	struct tun_struct *detached;
+	struct skb_array tx_array;
 };
 
 struct tun_flow_entry {
@@ -515,7 +517,11 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
 
 static void tun_queue_purge(struct tun_file *tfile)
 {
-	skb_queue_purge(&tfile->sk.sk_receive_queue);
+	struct sk_buff *skb;
+
+	while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
+		kfree_skb(skb);
+
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
@@ -560,6 +566,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
+		if (tun)
+			skb_array_cleanup(&tfile->tx_array);
 		sock_put(&tfile->sk);
 	}
 }
@@ -613,6 +621,7 @@ static void tun_detach_all(struct net_device *dev)
 static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filter)
 {
 	struct tun_file *tfile = file->private_data;
+	struct net_device *dev = tun->dev;
 	int err;
 
 	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
@@ -642,6 +651,13 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
 		if (!err)
 			goto out;
 	}
+
+	if (!tfile->detached &&
+	    skb_array_init(&tfile->tx_array, dev->tx_queue_len, GFP_KERNEL)) {
+		err = -ENOMEM;
+		goto out;
+	}
+
 	tfile->queue_index = tun->numqueues;
 	tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
 	rcu_assign_pointer(tfile->tun, tun);
@@ -891,8 +907,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	nf_reset(skb);
 
-	/* Enqueue packet */
-	skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
+	if (skb_array_produce(&tfile->tx_array, skb))
+		goto drop;
 
 	/* Notify and wake up reader process */
 	if (tfile->flags & TUN_FASYNC)
@@ -1107,7 +1123,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, sk_sleep(sk), wait);
 
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_array_empty(&tfile->tx_array))
 		mask |= POLLIN | POLLRDNORM;
 
 	if (sock_writeable(sk) ||
@@ -1426,22 +1442,61 @@ done:
 	return total;
 }
 
+static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
+				     int *err)
+{
+	DECLARE_WAITQUEUE(wait, current);
+	struct sk_buff *skb = NULL;
+
+	skb = skb_array_consume(&tfile->tx_array);
+	if (skb)
+		goto out;
+	if (noblock) {
+		*err = -EAGAIN;
+		goto out;
+	}
+
+	add_wait_queue(&tfile->wq.wait, &wait);
+	current->state = TASK_INTERRUPTIBLE;
+
+	while (1) {
+		skb = skb_array_consume(&tfile->tx_array);
+		if (skb)
+			break;
+		if (signal_pending(current)) {
+			*err = -ERESTARTSYS;
+			break;
+		}
+		if (tfile->socket.sk->sk_shutdown & RCV_SHUTDOWN) {
+			*err = -EFAULT;
+			break;
+		}
+
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+	remove_wait_queue(&tfile->wq.wait, &wait);
+
+out:
+	return skb;
+}
+
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 			   struct iov_iter *to,
 			   int noblock)
 {
 	struct sk_buff *skb;
 	ssize_t ret;
-	int peeked, err, off = 0;
+	int err;
 
 	tun_debug(KERN_INFO, tun, "tun_do_read\n");
 
 	if (!iov_iter_count(to))
 		return 0;
 
-	/* Read frames from queue */
-	skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
-				  &peeked, &off, &err);
+	/* Read frames from ring */
+	skb = tun_ring_recv(tfile, noblock, &err);
 	if (!skb)
 		return err;
 
@@ -1574,8 +1629,25 @@ out:
 	return ret;
 }
 
+static int tun_peek_len(struct socket *sock)
+{
+	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
+	struct tun_struct *tun;
+	int ret = 0;
+
+	tun = __tun_get(tfile);
+	if (!tun)
+		return 0;
+
+	ret = skb_array_peek_len(&tfile->tx_array);
+	tun_put(tun);
+
+	return ret;
+}
+
 /* Ops structure to mimic raw sockets with tun */
 static const struct proto_ops tun_socket_ops = {
+	.peek_len = tun_peek_len,
 	.sendmsg = tun_sendmsg,
 	.recvmsg = tun_recvmsg,
 };
@@ -2397,6 +2469,53 @@ static const struct ethtool_ops tun_ethtool_ops = {
 	.get_ts_info	= ethtool_op_get_ts_info,
 };
 
+static int tun_queue_resize(struct tun_struct *tun)
+{
+	struct net_device *dev = tun->dev;
+	struct tun_file *tfile;
+	struct skb_array **arrays;
+	int n = tun->numqueues + tun->numdisabled;
+	int ret, i;
+
+	arrays = kmalloc(sizeof *arrays * n, GFP_KERNEL);
+	if (!arrays)
+		return -ENOMEM;
+
+	for (i = 0; i < tun->numqueues; i++) {
+		tfile = rtnl_dereference(tun->tfiles[i]);
+		arrays[i] = &tfile->tx_array;
+	}
+	list_for_each_entry(tfile, &tun->disabled, next)
+		arrays[i++] = &tfile->tx_array;
+
+	ret = skb_array_resize_multiple(arrays, n,
+					dev->tx_queue_len, GFP_KERNEL);
+
+	kfree(arrays);
+	return ret;
+}
+
+static int tun_device_event(struct notifier_block *unused,
+			    unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct tun_struct *tun = netdev_priv(dev);
+
+	switch (event) {
+	case NETDEV_CHANGE_TX_QUEUE_LEN:
+		if (tun_queue_resize(tun))
+			return NOTIFY_BAD;
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block tun_notifier_block __read_mostly = {
+	.notifier_call	= tun_device_event,
+};
 
 static int __init tun_init(void)
 {
@@ -2416,6 +2535,8 @@ static int __init tun_init(void)
 		pr_err("Can't register misc device %d\n", TUN_MINOR);
 		goto err_misc;
 	}
+
+	register_netdevice_notifier(&tun_notifier_block);
 	return  0;
 err_misc:
 	rtnl_link_unregister(&tun_link_ops);
@@ -2427,6 +2548,7 @@ static void tun_cleanup(void)
 {
 	misc_deregister(&tun_miscdev);
 	rtnl_link_unregister(&tun_link_ops);
+	unregister_netdevice_notifier(&tun_notifier_block);
 }
 
 /* Get an underlying socket object from tun file.  Returns error unless file is
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1d3e45f..e032ca3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -481,10 +481,14 @@ out:
 
 static int peek_head_len(struct sock *sk)
 {
+	struct socket *sock = sk->sk_socket;
 	struct sk_buff *head;
 	int len = 0;
 	unsigned long flags;
 
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
 	if (likely(head)) {
@@ -497,6 +501,16 @@ static int peek_head_len(struct sock *sk)
 	return len;
 }
 
+static int sk_has_rx_data(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
+	return skb_queue_empty(&sk->sk_receive_queue);
+}
+
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
@@ -513,7 +527,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		endtime = busy_clock() + vq->busyloop_timeout;
 
 		while (vhost_can_busy_poll(&net->dev, endtime) &&
-		       skb_queue_empty(&sk->sk_receive_queue) &&
+		       !sk_has_rx_data(sk) &&
 		       vhost_vq_avail_empty(&net->dev, vq))
 			cpu_relax_lowlatency();
 
diff --git a/include/linux/net.h b/include/linux/net.h
index 9aa49a0..b6b3843 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -185,6 +185,7 @@ struct proto_ops {
 	ssize_t 	(*splice_read)(struct socket *sock,  loff_t *ppos,
 				       struct pipe_inode_info *pipe, size_t len, unsigned int flags);
 	int		(*set_peek_off)(struct sock *sk, int val);
+	int		(*peek_len)(struct socket *sock);
 };
 
 #define DECLARE_SOCKADDR(type, dst, src)	\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
                   ` (5 preceding siblings ...)
  2016-06-30  6:45 ` [PATCH net-next V4 6/6] tun: switch to use skb array for tx Jason Wang
@ 2016-06-30 15:45 ` Michael S. Tsirkin
  2016-07-01  6:04   ` Jason Wang
  2016-07-01  9:40 ` David Miller
  7 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2016-06-30 15:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, linux-kernel, davem, kvm, virtualization, eric.dumazet, brouer

On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote:
> Hi all:
> 
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introdce a tx skb array and use
> it instead of sk_receive_queue.
> 
> A minor issue is to keep the tx_queue_len behaviour, since tun used to
> use it for the length of sk_receive_queue. This is done through:
> 
> - add the ability to resize multiple rings at once to avoid handling
>   partial resize failure for mutiple rings.
> - add the support for zero length ring.
> - introduce a notifier which was triggered when tx_queue_len was
>   changed for a netdev.
> - resize all queues during the tx_queue_len changing.
> 
> Tests shows about 15% improvement on guest rx pps:
> 
> Before: ~1300000pps
> After : ~1500000pps

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Acked-from-altitude: 34697 feet.


> Changes from V3:
> - fix kbuild warnings
> - call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN
> 
> Changes from V2:
> - add multiple rings resizing support for ptr_ring/skb_array
> - add zero length ring support
> - introdce a NETDEV_CHANGE_TX_QUEUE_LEN
> - drop new flags
> 
> Changes from V1:
> - switch to use skb array instead of a customized circular buffer
> - add non-blocking support
> - rename .peek to .peek_len
> - drop lockless peeking since test show very minor improvement
> 
> Jason Wang (5):
>   ptr_ring: support zero length ring
>   skb_array: minor tweak
>   skb_array: add wrappers for resizing
>   net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
>   tun: switch to use skb array for tx
> 
> Michael S. Tsirkin (1):
>   ptr_ring: support resizing multiple queues
> 
>  drivers/net/tun.c                | 138 ++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/net.c              |  16 ++++-
>  include/linux/net.h              |   1 +
>  include/linux/netdevice.h        |   1 +
>  include/linux/ptr_ring.h         |  77 ++++++++++++++++++----
>  include/linux/skb_array.h        |  13 +++-
>  net/core/net-sysfs.c             |  15 ++++-
>  net/core/rtnetlink.c             |  16 +++--
>  tools/virtio/ringtest/ptr_ring.c |   5 ++
>  9 files changed, 255 insertions(+), 27 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
  2016-06-30  6:45 ` [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN Jason Wang
@ 2016-06-30 16:33   ` John Fastabend
  0 siblings, 0 replies; 15+ messages in thread
From: John Fastabend @ 2016-06-30 16:33 UTC (permalink / raw)
  To: Jason Wang, mst, netdev, linux-kernel, davem
  Cc: kvm, virtualization, eric.dumazet, brouer, John Fastabend

On 16-06-29 11:45 PM, Jason Wang wrote:
> This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this
> will be triggered when tx_queue_len. It could be used by net device
> who want to do some processing at that time. An example is tun who may
> want to resize tx array when tx_queue_len is changed.
> 
> Cc: John Fastabend <john.r.fastabend@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---


Thanks for adding the setlink case.

Acked-by: John Fastabend <john.r.fastabend@intel.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
  2016-06-30 15:45 ` [PATCH net-next V4 0/6] switch to use tx skb array in tun Michael S. Tsirkin
@ 2016-07-01  6:04   ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-07-01  6:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, davem, kvm, virtualization, eric.dumazet, brouer



On 2016年06月30日 23:45, Michael S. Tsirkin wrote:
> On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote:
>> >Hi all:
>> >
>> >This series tries to switch to use skb array in tun. This is used to
>> >eliminate the spinlock contention between producer and consumer. The
>> >conversion was straightforward: just introdce a tx skb array and use
>> >it instead of sk_receive_queue.
>> >
>> >A minor issue is to keep the tx_queue_len behaviour, since tun used to
>> >use it for the length of sk_receive_queue. This is done through:
>> >
>> >- add the ability to resize multiple rings at once to avoid handling
>> >   partial resize failure for mutiple rings.
>> >- add the support for zero length ring.
>> >- introduce a notifier which was triggered when tx_queue_len was
>> >   changed for a netdev.
>> >- resize all queues during the tx_queue_len changing.
>> >
>> >Tests shows about 15% improvement on guest rx pps:
>> >
>> >Before: ~1300000pps
>> >After : ~1500000pps
> Acked-by: Michael S. Tsirkin<mst@redhat.com>
>
> Acked-from-altitude: 34697 feet.

Wow, thanks a lot!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
  2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
                   ` (6 preceding siblings ...)
  2016-06-30 15:45 ` [PATCH net-next V4 0/6] switch to use tx skb array in tun Michael S. Tsirkin
@ 2016-07-01  9:40 ` David Miller
  7 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2016-07-01  9:40 UTC (permalink / raw)
  To: jasowang
  Cc: mst, netdev, linux-kernel, kvm, virtualization, eric.dumazet, brouer

From: Jason Wang <jasowang@redhat.com>
Date: Thu, 30 Jun 2016 14:45:30 +0800

> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introdce a tx skb array and use
> it instead of sk_receive_queue.

Series applied, thanks Jason.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
  2016-07-08  6:19 ` Michael S. Tsirkin
@ 2016-07-08  9:14   ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2016-07-08  9:14 UTC (permalink / raw)
  To: Michael S. Tsirkin, Craig Gallek
  Cc: netdev, LKML, David Miller, kvm, virtualization, Eric Dumazet, brouer



On 2016年07月08日 14:19, Michael S. Tsirkin wrote:
> On Wed, Jul 06, 2016 at 01:45:58PM -0400, Craig Gallek wrote:
>> >On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang<jasowang@redhat.com>  wrote:
>>> > >Hi all:
>>> > >
>>> > >This series tries to switch to use skb array in tun. This is used to
>>> > >eliminate the spinlock contention between producer and consumer. The
>>> > >conversion was straightforward: just introdce a tx skb array and use
>>> > >it instead of sk_receive_queue.
>> >
>> >I'm seeing the splat below after this series.  I'm still wrapping my
>> >head around this code, but it appears to be happening because the
>> >tun_struct passed into tun_queue_resize is uninitialized.
>> >Specifically, iteration over the disabled list_head fails because prev
>> >= next = NULL.  This seems to happen when a startup script on my test
>> >machine changes the queue length.  I'll try to figure out what's
>> >happening, but if it's obvious to someone else from the stack, please
>> >let me know.
> Don't see anything obvious. I'm traveling, will look at it when I'm back
> unless it's fixed by then. Jason, any idea?
>

Looks like Craig has posted a fix to this:

http://patchwork.ozlabs.org/patch/645645/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
  2016-07-06 17:45 Craig Gallek
  2016-07-08  6:19 ` Michael S. Tsirkin
@ 2016-07-08  6:19 ` Michael S. Tsirkin
  1 sibling, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2016-07-08  6:19 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Jason Wang, netdev, LKML, David Miller, kvm, virtualization,
	Eric Dumazet, brouer

On Wed, Jul 06, 2016 at 01:45:58PM -0400, Craig Gallek wrote:
> On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
> > Hi all:
> >
> > This series tries to switch to use skb array in tun. This is used to
> > eliminate the spinlock contention between producer and consumer. The
> > conversion was straightforward: just introdce a tx skb array and use
> > it instead of sk_receive_queue.
> 
> I'm seeing the splat below after this series.  I'm still wrapping my
> head around this code, but it appears to be happening because the
> tun_struct passed into tun_queue_resize is uninitialized.
> Specifically, iteration over the disabled list_head fails because prev
> = next = NULL.  This seems to happen when a startup script on my test
> machine changes the queue length.  I'll try to figure out what's
> happening, but if it's obvious to someone else from the stack, please
> let me know.


Uploading your .config might help BTW.

> [   72.322236] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000010
> [   72.329993] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
> [   72.336032] PGD 7f054f1067 PUD 7ef6f3f067 PMD 0
> [   72.340616] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> [   72.345498] gsmi: Log Shutdown Reason 0x03
> [   72.349541] Modules linked in: w1_therm wire cdc_acm ehci_pci
> ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
> [   72.359870] CPU: 12 PID: 7820 Comm: set.ixion-haswe Not tainted
> 4.7.0-dbx-DEV #10
> [   72.360253] mlx4_en: eth0: Link Up
> [   72.370618] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15,
> BIOS 2.50.0 01/21/2016
> [   72.378525] task: ffff883f2501e8c0 ti: ffff883f3ef08000 task.ti:
> ffff883f3ef08000
> [   72.385917] RIP: 0010:[<ffffffff8153c1a0>]  [<ffffffff8153c1a0>]
> tun_device_event+0x110/0x340
> [   72.394353] RSP: 0018:ffff883f3ef0bbe8  EFLAGS: 00010202
> [   72.399599] RAX: fffffffffffffae8 RBX: ffff887ef9883378 RCX: 0000000000000000
> [   72.406647] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
> [   72.413694] RBP: ffff883f3ef0bc58 R08: 0000000000000000 R09: 0000000000000001
> [   72.420742] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
> [   72.427789] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883f3ef0bd10
> [   72.434837] FS:  00007fac4e5dd700(0000) GS:ffff883f7f700000(0000)
> knlGS:0000000000000000
> [   72.442832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   72.448507] CR2: 0000000000000010 CR3: 0000007ef66ac000 CR4: 00000000001406e0
> [   72.455555] Stack:
> [   72.457541]  ffff883f3ef0bc18 0000000000000246 000000000000001e
> ffff887ef9880000
> [   72.464880]  ffff883f3ef0bd10 0000000000000000 0000000000000000
> ffffffff00000000
> [   72.472219]  ffff883f3ef0bc58 ffffffff81d24070 00000000fffffff9
> ffffffff81cec7a0
> [   72.479559] Call Trace:
> [   72.481986]  [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
> [   72.487844]  [<ffffffff816365c0>] ? show_tx_maxrate+0x30/0x30
> [   72.493520]  [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
> [   72.499801]  [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
> [   72.506001]  [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
> [   72.512706]  [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
> [   72.518986]  [<ffffffff816365f8>] change_tx_queue_len+0x38/0x80
> [   72.524838]  [<ffffffff816381cf>] netdev_store.isra.5+0xbf/0xd0
> [   72.530688]  [<ffffffff81638330>] tx_queue_len_store+0x50/0x60
> [   72.536459]  [<ffffffff814a6798>] dev_attr_store+0x18/0x30
> [   72.541888]  [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
> [   72.547306]  [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
> [   72.553077]  [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
> [   72.559534]  [<ffffffff8125a108>] __vfs_write+0x28/0x120
> [   72.564781]  [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
> [   72.570542]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
> [   72.576303]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
> [   72.582063]  [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
> [   72.587138]  [<ffffffff8125c092>] SyS_write+0x52/0xa0
> [   72.592135]  [<ffffffff817528a5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> [   72.598497] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00
> 00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6
> 41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa
> ff ff
> [   72.617767] RIP  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
> [   72.623883]  RSP <ffff883f3ef0bbe8>
> [   72.627327] CR2: 0000000000000010
> [   72.630638] ---[ end trace b0c54137cf861b91 ]---

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
  2016-07-06 17:45 Craig Gallek
@ 2016-07-08  6:19 ` Michael S. Tsirkin
  2016-07-08  9:14   ` Jason Wang
  2016-07-08  6:19 ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2016-07-08  6:19 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Jason Wang, netdev, LKML, David Miller, kvm, virtualization,
	Eric Dumazet, brouer

On Wed, Jul 06, 2016 at 01:45:58PM -0400, Craig Gallek wrote:
> On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
> > Hi all:
> >
> > This series tries to switch to use skb array in tun. This is used to
> > eliminate the spinlock contention between producer and consumer. The
> > conversion was straightforward: just introdce a tx skb array and use
> > it instead of sk_receive_queue.
> 
> I'm seeing the splat below after this series.  I'm still wrapping my
> head around this code, but it appears to be happening because the
> tun_struct passed into tun_queue_resize is uninitialized.
> Specifically, iteration over the disabled list_head fails because prev
> = next = NULL.  This seems to happen when a startup script on my test
> machine changes the queue length.  I'll try to figure out what's
> happening, but if it's obvious to someone else from the stack, please
> let me know.


Don't see anything obvious. I'm traveling, will look at it when I'm back
unless it's fixed by then. Jason, any idea?

> [   72.322236] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000010
> [   72.329993] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
> [   72.336032] PGD 7f054f1067 PUD 7ef6f3f067 PMD 0
> [   72.340616] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> [   72.345498] gsmi: Log Shutdown Reason 0x03
> [   72.349541] Modules linked in: w1_therm wire cdc_acm ehci_pci
> ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
> [   72.359870] CPU: 12 PID: 7820 Comm: set.ixion-haswe Not tainted
> 4.7.0-dbx-DEV #10
> [   72.360253] mlx4_en: eth0: Link Up
> [   72.370618] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15,
> BIOS 2.50.0 01/21/2016
> [   72.378525] task: ffff883f2501e8c0 ti: ffff883f3ef08000 task.ti:
> ffff883f3ef08000
> [   72.385917] RIP: 0010:[<ffffffff8153c1a0>]  [<ffffffff8153c1a0>]
> tun_device_event+0x110/0x340
> [   72.394353] RSP: 0018:ffff883f3ef0bbe8  EFLAGS: 00010202
> [   72.399599] RAX: fffffffffffffae8 RBX: ffff887ef9883378 RCX: 0000000000000000
> [   72.406647] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
> [   72.413694] RBP: ffff883f3ef0bc58 R08: 0000000000000000 R09: 0000000000000001
> [   72.420742] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
> [   72.427789] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883f3ef0bd10
> [   72.434837] FS:  00007fac4e5dd700(0000) GS:ffff883f7f700000(0000)
> knlGS:0000000000000000
> [   72.442832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   72.448507] CR2: 0000000000000010 CR3: 0000007ef66ac000 CR4: 00000000001406e0
> [   72.455555] Stack:
> [   72.457541]  ffff883f3ef0bc18 0000000000000246 000000000000001e
> ffff887ef9880000
> [   72.464880]  ffff883f3ef0bd10 0000000000000000 0000000000000000
> ffffffff00000000
> [   72.472219]  ffff883f3ef0bc58 ffffffff81d24070 00000000fffffff9
> ffffffff81cec7a0
> [   72.479559] Call Trace:
> [   72.481986]  [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
> [   72.487844]  [<ffffffff816365c0>] ? show_tx_maxrate+0x30/0x30
> [   72.493520]  [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
> [   72.499801]  [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
> [   72.506001]  [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
> [   72.512706]  [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
> [   72.518986]  [<ffffffff816365f8>] change_tx_queue_len+0x38/0x80
> [   72.524838]  [<ffffffff816381cf>] netdev_store.isra.5+0xbf/0xd0
> [   72.530688]  [<ffffffff81638330>] tx_queue_len_store+0x50/0x60
> [   72.536459]  [<ffffffff814a6798>] dev_attr_store+0x18/0x30
> [   72.541888]  [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
> [   72.547306]  [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
> [   72.553077]  [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
> [   72.559534]  [<ffffffff8125a108>] __vfs_write+0x28/0x120
> [   72.564781]  [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
> [   72.570542]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
> [   72.576303]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
> [   72.582063]  [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
> [   72.587138]  [<ffffffff8125c092>] SyS_write+0x52/0xa0
> [   72.592135]  [<ffffffff817528a5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> [   72.598497] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00
> 00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6
> 41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa
> ff ff
> [   72.617767] RIP  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
> [   72.623883]  RSP <ffff883f3ef0bbe8>
> [   72.627327] CR2: 0000000000000010
> [   72.630638] ---[ end trace b0c54137cf861b91 ]---

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
@ 2016-07-06 17:45 Craig Gallek
  2016-07-08  6:19 ` Michael S. Tsirkin
  2016-07-08  6:19 ` Michael S. Tsirkin
  0 siblings, 2 replies; 15+ messages in thread
From: Craig Gallek @ 2016-07-06 17:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, LKML, David Miller, kvm, virtualization,
	Eric Dumazet, brouer

On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
> Hi all:
>
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introdce a tx skb array and use
> it instead of sk_receive_queue.

I'm seeing the splat below after this series.  I'm still wrapping my
head around this code, but it appears to be happening because the
tun_struct passed into tun_queue_resize is uninitialized.
Specifically, iteration over the disabled list_head fails because prev
= next = NULL.  This seems to happen when a startup script on my test
machine changes the queue length.  I'll try to figure out what's
happening, but if it's obvious to someone else from the stack, please
let me know.

[   72.322236] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000010
[   72.329993] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[   72.336032] PGD 7f054f1067 PUD 7ef6f3f067 PMD 0
[   72.340616] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[   72.345498] gsmi: Log Shutdown Reason 0x03
[   72.349541] Modules linked in: w1_therm wire cdc_acm ehci_pci
ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
[   72.359870] CPU: 12 PID: 7820 Comm: set.ixion-haswe Not tainted
4.7.0-dbx-DEV #10
[   72.360253] mlx4_en: eth0: Link Up
[   72.370618] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15,
BIOS 2.50.0 01/21/2016
[   72.378525] task: ffff883f2501e8c0 ti: ffff883f3ef08000 task.ti:
ffff883f3ef08000
[   72.385917] RIP: 0010:[<ffffffff8153c1a0>]  [<ffffffff8153c1a0>]
tun_device_event+0x110/0x340
[   72.394353] RSP: 0018:ffff883f3ef0bbe8  EFLAGS: 00010202
[   72.399599] RAX: fffffffffffffae8 RBX: ffff887ef9883378 RCX: 0000000000000000
[   72.406647] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
[   72.413694] RBP: ffff883f3ef0bc58 R08: 0000000000000000 R09: 0000000000000001
[   72.420742] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
[   72.427789] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883f3ef0bd10
[   72.434837] FS:  00007fac4e5dd700(0000) GS:ffff883f7f700000(0000)
knlGS:0000000000000000
[   72.442832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.448507] CR2: 0000000000000010 CR3: 0000007ef66ac000 CR4: 00000000001406e0
[   72.455555] Stack:
[   72.457541]  ffff883f3ef0bc18 0000000000000246 000000000000001e
ffff887ef9880000
[   72.464880]  ffff883f3ef0bd10 0000000000000000 0000000000000000
ffffffff00000000
[   72.472219]  ffff883f3ef0bc58 ffffffff81d24070 00000000fffffff9
ffffffff81cec7a0
[   72.479559] Call Trace:
[   72.481986]  [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
[   72.487844]  [<ffffffff816365c0>] ? show_tx_maxrate+0x30/0x30
[   72.493520]  [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
[   72.499801]  [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
[   72.506001]  [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
[   72.512706]  [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
[   72.518986]  [<ffffffff816365f8>] change_tx_queue_len+0x38/0x80
[   72.524838]  [<ffffffff816381cf>] netdev_store.isra.5+0xbf/0xd0
[   72.530688]  [<ffffffff81638330>] tx_queue_len_store+0x50/0x60
[   72.536459]  [<ffffffff814a6798>] dev_attr_store+0x18/0x30
[   72.541888]  [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
[   72.547306]  [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
[   72.553077]  [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
[   72.559534]  [<ffffffff8125a108>] __vfs_write+0x28/0x120
[   72.564781]  [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
[   72.570542]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
[   72.576303]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
[   72.582063]  [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
[   72.587138]  [<ffffffff8125c092>] SyS_write+0x52/0xa0
[   72.592135]  [<ffffffff817528a5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[   72.598497] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00
00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6
41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa
ff ff
[   72.617767] RIP  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[   72.623883]  RSP <ffff883f3ef0bbe8>
[   72.627327] CR2: 0000000000000010
[   72.630638] ---[ end trace b0c54137cf861b91 ]---

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-07-08  9:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-30  6:45 [PATCH net-next V4 0/6] switch to use tx skb array in tun Jason Wang
2016-06-30  6:45 ` [PATCH net-next V4 1/6] ptr_ring: support zero length ring Jason Wang
2016-06-30  6:45 ` [PATCH net-next V4 2/6] skb_array: minor tweak Jason Wang
2016-06-30  6:45 ` [PATCH net-next V4 3/6] ptr_ring: support resizing multiple queues Jason Wang
2016-06-30  6:45 ` [PATCH net-next V4 4/6] skb_array: add wrappers for resizing Jason Wang
2016-06-30  6:45 ` [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN Jason Wang
2016-06-30 16:33   ` John Fastabend
2016-06-30  6:45 ` [PATCH net-next V4 6/6] tun: switch to use skb array for tx Jason Wang
2016-06-30 15:45 ` [PATCH net-next V4 0/6] switch to use tx skb array in tun Michael S. Tsirkin
2016-07-01  6:04   ` Jason Wang
2016-07-01  9:40 ` David Miller
2016-07-06 17:45 Craig Gallek
2016-07-08  6:19 ` Michael S. Tsirkin
2016-07-08  9:14   ` Jason Wang
2016-07-08  6:19 ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).