netdev.vger.kernel.org archive mirror
* [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap
@ 2011-09-17  6:02 Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 1/5] tuntap: move socket to tun_file Jason Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-17  6:02 UTC (permalink / raw)
  To: krkumar2, eric.dumazet, mst, netdev, linux-kernel, virtualization, davem
  Cc: kvm, rusty, qemu-devel, mirq-linux, joe, shemminger

Hello all:

This series is V2 of multiqueue tun/tap (V1 at
http://www.mail-archive.com/kvm@vger.kernel.org/msg59479.html), an
approach that lets tun/tap benefit from multicore/multiqueue
environments by spreading the network load across different
sockets/queues.

A quick overview of the design:

- Allow multiple sockets to be attached to a tun/tap device.
- Use RCU to synchronize between the data path and system calls.
- A simple hash-based algorithm is used to choose the tx queue.
- Two new ioctls were added for userspace to attach a socket to and
detach it from the device (see the usage sketch below).
- ABI compatibility is maintained, and multiqueue is only enabled for
tap since kvm is the only user as far as I can see; but it could be
used by tun as well.
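
As a rough illustration of how userspace would request a multiqueue tap
once this series is applied, a minimal sketch follows (IFF_MULTI_QUEUE
and the TUNSETIFF behaviour come from patches 3 and 4; everything else
is ordinary tun/tap usage):

/* Hypothetical usage sketch only; it assumes the IFF_MULTI_QUEUE flag
 * proposed by this series is present in linux/if_tun.h. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&ifr, 0, sizeof(ifr));
	/* TAP device, no packet info, vnet header, multiqueue enabled;
	 * the name is left empty so the kernel picks one (mqtap%d). */
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_MULTI_QUEUE;

	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		perror("TUNSETIFF");
		return 1;
	}

	/* fd is now attached as the first queue of the new tap device */
	printf("created %s\n", ifr.ifr_name);
	return 0;
}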

In order to use multiqueue virtio-net in the guest, changes to qemu
and the guest driver are also needed. Please refer to
http://www.spinics.net/lists/kvm/msg52808.html for the guest driver
changes; I will also post a new version of the qemu changes soon.

A wiki page was created to describe the detailed design of all parts
involved in the multiqueue implementation:
http://www.linux-kvm.org/page/Multiqueue. Some basic test results can
be seen at http://www.linux-kvm.org/page/Multiqueue-performance-Sep-13;
I will post the detailed numbers as an attachment in a reply to this
thread.

Changes from V1:

1 Simplify the socket array management by not leaving NULL holes in
the slots (see the sketch below).
2 Optimize the tx queue selection.
3 Fix a bug in tun_detach_all().
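
A minimal, self-contained illustration of the array management in
change 1; plain ints stand in for the tun_file pointers, and the real
code in patch 4 additionally uses rcu_assign_pointer() and updates
queue_index:

/* Detach slot `index` by moving the last attached entry into the hole,
 * so entries 0..numqueues-1 stay valid and are never NULL. */
static void detach_slot(int *slots, unsigned int *numqueues,
			unsigned int index)
{
	slots[index] = slots[*numqueues - 1];	/* last entry fills the hole */
	(*numqueues)--;				/* array stays dense */
}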

Some notes on the test results:

The results show very good scaling for guest receiving and for
large-packet sending, but some regressions were met under specific
conditions:

1 The current implementation suffers from a regression with multiple
sessions of small-packet transmission from the guest; this regression
becomes severe when testing between localhost and the guest.

From the test results, we can see that more pio exits were measured.
The reason is that a small number of concurrent sessions may not even
overload a single queue, and using multiple queues may bring extra
overhead. When multiple connections transmit small packets through a
single queue, the queue is almost full and the vhost thread is busy
with tx, so the guest has more chances to see a notification-disabled
tx queue when it wants to transmit packets (a high number of tx
packets per pio exit). But when packets are transmitted through
multiple queues, each queue is not fully utilized, so the guest has
less chance to see a notification-disabled queue when transmitting
packets; hence more pio exits and more vhost thread wakeups/sleeps
were observed.

As Michael pointed out, other features such as PLE may also help the
performance: when using a single queue, multiple guest vcpus may
contend on the tx lock, which can be caught by PLE and save cpu
utilization. Multiqueue cannot benefit from this, as it sees less
lock contention.

The solution for this still needs to be investigated; any suggestions
are welcome.

2 The current implementation may also see a regression for
single-session packet transmission.

The reason is that packets from a single flow are not always handled
by the same queue/vhost thread.

Various methods could handle this:

2.1 Hack the guest driver to store the queue index in the rxhash and
use it when choosing the tx queue in the guest. This needs some hack
to store the rxhash into the sk and pass it back to the skb in
skb_orphan_try(). sk_rxhash is only used by RPS now, so a cleaner
method is needed.

2.2 Hack tun/tap to add a hash-to-queue table, and use the hash of the
skb to look up the queue index. This method would introduce more
overhead, since the rxhash would be calculated on each skb reception
or transmission.

I've tried both 2.1 and 2.2; either could solve the problem, but both
may introduce regressions for multiple sessions. A more reasonable
method is still needed.

Please comment, thanks. Any suggestions are welcome.

---

Jason Wang (5):
      tuntap: move socket to tun_file
      tuntap: categorize ioctl
      tuntap: introduce multiqueue flags
      tuntap: multiqueue support
      tuntap: add ioctls to attach or detach a file from tap device


 drivers/net/tun.c      |  718 ++++++++++++++++++++++++++++--------------------
 include/linux/if_tun.h |    5 
 2 files changed, 430 insertions(+), 293 deletions(-)

-- 
Jason Wang

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [net-next RFC V2 PATCH 1/5] tuntap: move socket to tun_file
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
@ 2011-09-17  6:02 ` Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 2/5] tuntap: categorize ioctl Jason Wang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-17  6:02 UTC (permalink / raw)
  To: krkumar2, eric.dumazet, mst, netdev, linux-kernel, virtualization, davem
  Cc: kvm, rusty, qemu-devel, mirq-linux, joe, shemminger

In order to let tap transmit skbs to multiple sockets, the first step
is to move the socket from tun_struct to tun_file. The reference
between the tap device and the socket is set up during TUNSETIFF as
usual. After this we can go ahead and allow multiple files to be
attached to a tap device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |  347 +++++++++++++++++++++++++++--------------------------
 1 files changed, 178 insertions(+), 169 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7bea9c6..b64ad05 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -109,9 +109,16 @@ struct tap_filter {
 };
 
 struct tun_file {
+	struct sock sk;
+	struct socket socket;
+	struct socket_wq wq;
+	int vnet_hdr_sz;
+	struct tap_filter txflt;
 	atomic_t count;
 	struct tun_struct *tun;
 	struct net *net;
+	struct fasync_struct *fasync;
+	unsigned int flags;
 };
 
 struct tun_sock;
@@ -126,29 +133,12 @@ struct tun_struct {
 	u32			set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
 			  NETIF_F_TSO6|NETIF_F_UFO)
-	struct fasync_struct	*fasync;
-
-	struct tap_filter       txflt;
-	struct socket		socket;
-	struct socket_wq	wq;
-
-	int			vnet_hdr_sz;
 
 #ifdef TUN_DEBUG
 	int debug;
 #endif
 };
 
-struct tun_sock {
-	struct sock		sk;
-	struct tun_struct	*tun;
-};
-
-static inline struct tun_sock *tun_sk(struct sock *sk)
-{
-	return container_of(sk, struct tun_sock, sk);
-}
-
 static int tun_attach(struct tun_struct *tun, struct file *file)
 {
 	struct tun_file *tfile = file->private_data;
@@ -169,10 +159,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
 	err = 0;
 	tfile->tun = tun;
 	tun->tfile = tfile;
-	tun->socket.file = file;
 	netif_carrier_on(tun->dev);
 	dev_hold(tun->dev);
-	sock_hold(tun->socket.sk);
+	sock_hold(&tfile->sk);
 	atomic_inc(&tfile->count);
 
 out:
@@ -182,15 +171,15 @@ out:
 
 static void __tun_detach(struct tun_struct *tun)
 {
+	struct tun_file *tfile = tun->tfile;
 	/* Detach from net device */
 	netif_tx_lock_bh(tun->dev);
 	netif_carrier_off(tun->dev);
 	tun->tfile = NULL;
-	tun->socket.file = NULL;
 	netif_tx_unlock_bh(tun->dev);
 
 	/* Drop read queue */
-	skb_queue_purge(&tun->socket.sk->sk_receive_queue);
+	skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
 
 	/* Drop the extra count on the net device */
 	dev_put(tun->dev);
@@ -349,19 +338,12 @@ static void tun_net_uninit(struct net_device *dev)
 	/* Inform the methods they need to stop using the dev.
 	 */
 	if (tfile) {
-		wake_up_all(&tun->wq.wait);
+		wake_up_all(&tfile->wq.wait);
 		if (atomic_dec_and_test(&tfile->count))
 			__tun_detach(tun);
 	}
 }
 
-static void tun_free_netdev(struct net_device *dev)
-{
-	struct tun_struct *tun = netdev_priv(dev);
-
-	sock_put(tun->socket.sk);
-}
-
 /* Net device open. */
 static int tun_net_open(struct net_device *dev)
 {
@@ -380,24 +362,25 @@ static int tun_net_close(struct net_device *dev)
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct tun_struct *tun = netdev_priv(dev);
+	struct tun_file *tfile = tun->tfile;
 
 	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
 
 	/* Drop packet if interface is not attached */
-	if (!tun->tfile)
+	if (!tfile)
 		goto drop;
 
 	/* Drop if the filter does not like it.
 	 * This is a noop if the filter is disabled.
 	 * Filter can be enabled only for the TAP devices. */
-	if (!check_filter(&tun->txflt, skb))
+	if (!check_filter(&tfile->txflt, skb))
 		goto drop;
 
-	if (tun->socket.sk->sk_filter &&
-	    sk_filter(tun->socket.sk, skb))
+	if (tfile->socket.sk->sk_filter &&
+	    sk_filter(tfile->socket.sk, skb))
 		goto drop;
 
-	if (skb_queue_len(&tun->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
+	if (skb_queue_len(&tfile->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
 		if (!(tun->flags & TUN_ONE_QUEUE)) {
 			/* Normal queueing mode. */
 			/* Packet scheduler handles dropping of further packets. */
@@ -418,12 +401,12 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_orphan(skb);
 
 	/* Enqueue packet */
-	skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
+	skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
 
 	/* Notify and wake up reader process */
-	if (tun->flags & TUN_FASYNC)
-		kill_fasync(&tun->fasync, SIGIO, POLL_IN);
-	wake_up_interruptible_poll(&tun->wq.wait, POLLIN |
+	if (tfile->flags & TUN_FASYNC)
+		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
+	wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
 				   POLLRDNORM | POLLRDBAND);
 	return NETDEV_TX_OK;
 
@@ -550,11 +533,11 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 	if (!tun)
 		return POLLERR;
 
-	sk = tun->socket.sk;
+	sk = tfile->socket.sk;
 
 	tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
 
-	poll_wait(file, &tun->wq.wait, wait);
+	poll_wait(file, &tfile->wq.wait, wait);
 
 	if (!skb_queue_empty(&sk->sk_receive_queue))
 		mask |= POLLIN | POLLRDNORM;
@@ -573,11 +556,11 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 
 /* prepad is the amount to reserve at front.  len is length after that.
  * linear is a hint as to how much to copy (usually headers). */
-static struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
+static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
 				     size_t prepad, size_t len,
 				     size_t linear, int noblock)
 {
-	struct sock *sk = tun->socket.sk;
+	struct sock *sk = tfile->socket.sk;
 	struct sk_buff *skb;
 	int err;
 
@@ -601,7 +584,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
 }
 
 /* Get packet from user space buffer */
-static ssize_t tun_get_user(struct tun_struct *tun,
+static ssize_t tun_get_user(struct tun_file *tfile,
 			    const struct iovec *iv, size_t count,
 			    int noblock)
 {
@@ -610,8 +593,10 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 	size_t len = count, align = NET_SKB_PAD;
 	struct virtio_net_hdr gso = { 0 };
 	int offset = 0;
+	struct tun_struct *tun = NULL;
+	bool drop = false, error = false;
 
-	if (!(tun->flags & TUN_NO_PI)) {
+	if (!(tfile->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) > count)
 			return -EINVAL;
 
@@ -620,8 +605,8 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 		offset += sizeof(pi);
 	}
 
-	if (tun->flags & TUN_VNET_HDR) {
-		if ((len -= tun->vnet_hdr_sz) > count)
+	if (tfile->flags & TUN_VNET_HDR) {
+		if ((len -= tfile->vnet_hdr_sz) > count)
 			return -EINVAL;
 
 		if (memcpy_fromiovecend((void *)&gso, iv, offset, sizeof(gso)))
@@ -633,41 +618,43 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 
 		if (gso.hdr_len > len)
 			return -EINVAL;
-		offset += tun->vnet_hdr_sz;
+		offset += tfile->vnet_hdr_sz;
 	}
 
-	if ((tun->flags & TUN_TYPE_MASK) == TUN_TAP_DEV) {
+	if ((tfile->flags & TUN_TYPE_MASK) == TUN_TAP_DEV) {
 		align += NET_IP_ALIGN;
 		if (unlikely(len < ETH_HLEN ||
 			     (gso.hdr_len && gso.hdr_len < ETH_HLEN)))
 			return -EINVAL;
 	}
 
-	skb = tun_alloc_skb(tun, align, len, gso.hdr_len, noblock);
+	skb = tun_alloc_skb(tfile, align, len, gso.hdr_len, noblock);
+
 	if (IS_ERR(skb)) {
 		if (PTR_ERR(skb) != -EAGAIN)
-			tun->dev->stats.rx_dropped++;
-		return PTR_ERR(skb);
+			drop = true;
+		count = PTR_ERR(skb);
+		goto err;
 	}
 
 	if (skb_copy_datagram_from_iovec(skb, 0, iv, offset, len)) {
-		tun->dev->stats.rx_dropped++;
+		drop = true;
 		kfree_skb(skb);
-		return -EFAULT;
+		count = -EFAULT;
+		goto err;
 	}
 
 	if (gso.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		if (!skb_partial_csum_set(skb, gso.csum_start,
 					  gso.csum_offset)) {
-			tun->dev->stats.rx_frame_errors++;
-			kfree_skb(skb);
-			return -EINVAL;
+			error = true;
+			goto err_free;
 		}
 	}
 
-	switch (tun->flags & TUN_TYPE_MASK) {
+	switch (tfile->flags & TUN_TYPE_MASK) {
 	case TUN_TUN_DEV:
-		if (tun->flags & TUN_NO_PI) {
+		if (tfile->flags & TUN_NO_PI) {
 			switch (skb->data[0] & 0xf0) {
 			case 0x40:
 				pi.proto = htons(ETH_P_IP);
@@ -676,18 +663,15 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 				pi.proto = htons(ETH_P_IPV6);
 				break;
 			default:
-				tun->dev->stats.rx_dropped++;
-				kfree_skb(skb);
-				return -EINVAL;
+				drop = true;
+				goto err_free;
 			}
 		}
 
 		skb_reset_mac_header(skb);
 		skb->protocol = pi.proto;
-		skb->dev = tun->dev;
 		break;
 	case TUN_TAP_DEV:
-		skb->protocol = eth_type_trans(skb, tun->dev);
 		break;
 	}
 
@@ -704,9 +688,8 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 			break;
 		default:
-			tun->dev->stats.rx_frame_errors++;
-			kfree_skb(skb);
-			return -EINVAL;
+			error = true;
+			goto err_free;
 		}
 
 		if (gso.gso_type & VIRTIO_NET_HDR_GSO_ECN)
@@ -714,9 +697,8 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 
 		skb_shinfo(skb)->gso_size = gso.gso_size;
 		if (skb_shinfo(skb)->gso_size == 0) {
-			tun->dev->stats.rx_frame_errors++;
-			kfree_skb(skb);
-			return -EINVAL;
+			error = true;
+			goto err_free;
 		}
 
 		/* Header must be checked, and gso_segs computed. */
@@ -724,11 +706,40 @@ static ssize_t tun_get_user(struct tun_struct *tun,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
-	netif_rx_ni(skb);
+	tun = __tun_get(tfile);
+	if (!tun) {
+		return -EBADFD;
+	}
+
+	switch (tfile->flags & TUN_TYPE_MASK) {
+	case TUN_TUN_DEV:
+		skb->dev = tun->dev;
+		break;
+	case TUN_TAP_DEV:
+		skb->protocol = eth_type_trans(skb, tun->dev);
+		break;
+	}
 
+	netif_rx_ni(skb);
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
+	tun_put(tun);
+	return count;
 
+err_free:
+	count = -EINVAL;
+	kfree_skb(skb);
+err:
+	tun = __tun_get(tfile);
+	if (!tun) {
+		return -EBADFD;
+	}
+
+	if (drop)
+		tun->dev->stats.rx_dropped++;
+	if (error)
+		tun->dev->stats.rx_frame_errors++;
+	tun_put(tun);
 	return count;
 }
 
@@ -736,30 +747,25 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 			      unsigned long count, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct tun_struct *tun = tun_get(file);
+	struct tun_file *tfile = file->private_data;
 	ssize_t result;
 
-	if (!tun)
-		return -EBADFD;
-
-	tun_debug(KERN_INFO, tun, "tun_chr_write %ld\n", count);
-
-	result = tun_get_user(tun, iv, iov_length(iv, count),
+	result = tun_get_user(tfile, iv, iov_length(iv, count),
 			      file->f_flags & O_NONBLOCK);
 
-	tun_put(tun);
 	return result;
 }
 
 /* Put packet to the user space buffer */
-static ssize_t tun_put_user(struct tun_struct *tun,
+static ssize_t tun_put_user(struct tun_file *tfile,
 			    struct sk_buff *skb,
 			    const struct iovec *iv, int len)
 {
+	struct tun_struct *tun = NULL;
 	struct tun_pi pi = { 0, skb->protocol };
 	ssize_t total = 0;
 
-	if (!(tun->flags & TUN_NO_PI)) {
+	if (!(tfile->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) < 0)
 			return -EINVAL;
 
@@ -773,9 +779,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 		total += sizeof(pi);
 	}
 
-	if (tun->flags & TUN_VNET_HDR) {
+	if (tfile->flags & TUN_VNET_HDR) {
 		struct virtio_net_hdr gso = { 0 }; /* no info leak */
-		if ((len -= tun->vnet_hdr_sz) < 0)
+		if ((len -= tfile->vnet_hdr_sz) < 0)
 			return -EINVAL;
 
 		if (skb_is_gso(skb)) {
@@ -818,7 +824,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 		if (unlikely(memcpy_toiovecend(iv, (void *)&gso, total,
 					       sizeof(gso))))
 			return -EFAULT;
-		total += tun->vnet_hdr_sz;
+		total += tfile->vnet_hdr_sz;
 	}
 
 	len = min_t(int, skb->len, len);
@@ -826,29 +832,32 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	skb_copy_datagram_const_iovec(skb, 0, iv, total, len);
 	total += skb->len;
 
-	tun->dev->stats.tx_packets++;
-	tun->dev->stats.tx_bytes += len;
+	tun = __tun_get(tfile);
+	if (tun) {
+		tun->dev->stats.tx_packets++;
+		tun->dev->stats.tx_bytes += len;
+		tun_put(tun);
+	}
 
 	return total;
 }
 
-static ssize_t tun_do_read(struct tun_struct *tun,
+static ssize_t tun_do_read(struct tun_file *tfile,
 			   struct kiocb *iocb, const struct iovec *iv,
 			   ssize_t len, int noblock)
 {
 	DECLARE_WAITQUEUE(wait, current);
 	struct sk_buff *skb;
 	ssize_t ret = 0;
-
-	tun_debug(KERN_INFO, tun, "tun_chr_read\n");
+	struct tun_struct *tun = NULL;
 
 	if (unlikely(!noblock))
-		add_wait_queue(&tun->wq.wait, &wait);
+		add_wait_queue(&tfile->wq.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
 		/* Read frames from the queue */
-		if (!(skb=skb_dequeue(&tun->socket.sk->sk_receive_queue))) {
+		if (!(skb=skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
 			if (noblock) {
 				ret = -EAGAIN;
 				break;
@@ -857,25 +866,38 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 				ret = -ERESTARTSYS;
 				break;
 			}
+
+			tun = __tun_get(tfile);
+			if (!tun) {
+				ret = -EIO;
+				break;
+			}
 			if (tun->dev->reg_state != NETREG_REGISTERED) {
 				ret = -EIO;
+				tun_put(tun);
 				break;
 			}
+			tun_put(tun);
 
 			/* Nothing to read, let's sleep */
 			schedule();
 			continue;
 		}
-		netif_wake_queue(tun->dev);
 
-		ret = tun_put_user(tun, skb, iv, len);
+		tun = __tun_get(tfile);
+		if (tun) {
+			netif_wake_queue(tun->dev);
+			tun_put(tun);
+		}
+
+		ret = tun_put_user(tfile, skb, iv, len);
 		kfree_skb(skb);
 		break;
 	}
 
 	current->state = TASK_RUNNING;
 	if (unlikely(!noblock))
-		remove_wait_queue(&tun->wq.wait, &wait);
+		remove_wait_queue(&tfile->wq.wait, &wait);
 
 	return ret;
 }
@@ -885,21 +907,17 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 {
 	struct file *file = iocb->ki_filp;
 	struct tun_file *tfile = file->private_data;
-	struct tun_struct *tun = __tun_get(tfile);
 	ssize_t len, ret;
 
-	if (!tun)
-		return -EBADFD;
 	len = iov_length(iv, count);
 	if (len < 0) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	ret = tun_do_read(tun, iocb, iv, len, file->f_flags & O_NONBLOCK);
+	ret = tun_do_read(tfile, iocb, iv, len, file->f_flags & O_NONBLOCK);
 	ret = min_t(ssize_t, ret, len);
 out:
-	tun_put(tun);
 	return ret;
 }
 
@@ -911,7 +929,7 @@ static void tun_setup(struct net_device *dev)
 	tun->group = -1;
 
 	dev->ethtool_ops = &tun_ethtool_ops;
-	dev->destructor = tun_free_netdev;
+	dev->destructor = free_netdev;
 }
 
 /* Trivial set of netlink ops to allow deleting tun or tap
@@ -931,7 +949,7 @@ static struct rtnl_link_ops tun_link_ops __read_mostly = {
 
 static void tun_sock_write_space(struct sock *sk)
 {
-	struct tun_struct *tun;
+	struct tun_file *tfile = NULL;
 	wait_queue_head_t *wqueue;
 
 	if (!sock_writeable(sk))
@@ -945,37 +963,38 @@ static void tun_sock_write_space(struct sock *sk)
 		wake_up_interruptible_sync_poll(wqueue, POLLOUT |
 						POLLWRNORM | POLLWRBAND);
 
-	tun = tun_sk(sk)->tun;
-	kill_fasync(&tun->fasync, SIGIO, POLL_OUT);
-}
-
-static void tun_sock_destruct(struct sock *sk)
-{
-	free_netdev(tun_sk(sk)->tun->dev);
+	tfile = container_of(sk, struct tun_file, sk);
+	kill_fasync(&tfile->fasync, SIGIO, POLL_OUT);
 }
 
 static int tun_sendmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *m, size_t total_len)
 {
-	struct tun_struct *tun = container_of(sock, struct tun_struct, socket);
-	return tun_get_user(tun, m->msg_iov, total_len,
-			    m->msg_flags & MSG_DONTWAIT);
+	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
+	ssize_t result;
+
+	result= tun_get_user(tfile, m->msg_iov, total_len,
+			     m->msg_flags & MSG_DONTWAIT);
+	return result;
 }
 
 static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *m, size_t total_len,
 		       int flags)
 {
-	struct tun_struct *tun = container_of(sock, struct tun_struct, socket);
+	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	int ret;
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC))
 		return -EINVAL;
-	ret = tun_do_read(tun, iocb, m->msg_iov, total_len,
+
+	ret = tun_do_read(tfile, iocb, m->msg_iov, total_len,
 			  flags & MSG_DONTWAIT);
 	if (ret > total_len) {
 		m->msg_flags |= MSG_TRUNC;
 		ret = flags & MSG_TRUNC ? ret : total_len;
 	}
+
 	return ret;
 }
 
@@ -988,7 +1007,7 @@ static const struct proto_ops tun_socket_ops = {
 static struct proto tun_proto = {
 	.name		= "tun",
 	.owner		= THIS_MODULE,
-	.obj_size	= sizeof(struct tun_sock),
+	.obj_size	= sizeof(struct tun_file),
 };
 
 static int tun_flags(struct tun_struct *tun)
@@ -1039,8 +1058,8 @@ static DEVICE_ATTR(group, 0444, tun_show_group, NULL);
 
 static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 {
-	struct sock *sk;
 	struct tun_struct *tun;
+	struct tun_file *tfile = file->private_data;
 	struct net_device *dev;
 	int err;
 
@@ -1061,7 +1080,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		     (tun->group != -1 && !in_egroup_p(tun->group))) &&
 		    !capable(CAP_NET_ADMIN))
 			return -EPERM;
-		err = security_tun_dev_attach(tun->socket.sk);
+		err = security_tun_dev_attach(tfile->socket.sk);
 		if (err < 0)
 			return err;
 
@@ -1105,24 +1124,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		tun = netdev_priv(dev);
 		tun->dev = dev;
 		tun->flags = flags;
-		tun->txflt.count = 0;
-		tun->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
-		err = -ENOMEM;
-		sk = sk_alloc(net, AF_UNSPEC, GFP_KERNEL, &tun_proto);
-		if (!sk)
-			goto err_free_dev;
-
-		tun->socket.wq = &tun->wq;
-		init_waitqueue_head(&tun->wq.wait);
-		tun->socket.ops = &tun_socket_ops;
-		sock_init_data(&tun->socket, sk);
-		sk->sk_write_space = tun_sock_write_space;
-		sk->sk_sndbuf = INT_MAX;
-
-		tun_sk(sk)->tun = tun;
-
-		security_tun_dev_post_create(sk);
+		security_tun_dev_post_create(&tfile->sk);
 
 		tun_net_init(dev);
 
@@ -1132,15 +1135,13 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		err = register_netdevice(tun->dev);
 		if (err < 0)
-			goto err_free_sk;
+			goto err_free_dev;
 
 		if (device_create_file(&tun->dev->dev, &dev_attr_tun_flags) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_owner) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_group))
 			pr_err("Failed to create tun sysfs files\n");
 
-		sk->sk_destruct = tun_sock_destruct;
-
 		err = tun_attach(tun, file);
 		if (err < 0)
 			goto failed;
@@ -1163,6 +1164,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	else
 		tun->flags &= ~TUN_VNET_HDR;
 
+	/* Cache flags from tun device */
+	tfile->flags = tun->flags;
 	/* Make sure persistent devices do not get stuck in
 	 * xoff state.
 	 */
@@ -1172,11 +1175,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	strcpy(ifr->ifr_name, tun->dev->name);
 	return 0;
 
- err_free_sk:
-	sock_put(sk);
- err_free_dev:
+err_free_dev:
 	free_netdev(dev);
- failed:
+failed:
 	return err;
 }
 
@@ -1348,9 +1349,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	case TUNSETTXFILTER:
 		/* Can be set only for TAPs */
 		ret = -EINVAL;
-		if ((tun->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
+		if ((tfile->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
 			break;
-		ret = update_filter(&tun->txflt, (void __user *)arg);
+		ret = update_filter(&tfile->txflt, (void __user *)arg);
 		break;
 
 	case SIOCGIFHWADDR:
@@ -1370,7 +1371,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		break;
 
 	case TUNGETSNDBUF:
-		sndbuf = tun->socket.sk->sk_sndbuf;
+		sndbuf = tfile->socket.sk->sk_sndbuf;
 		if (copy_to_user(argp, &sndbuf, sizeof(sndbuf)))
 			ret = -EFAULT;
 		break;
@@ -1381,11 +1382,11 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 			break;
 		}
 
-		tun->socket.sk->sk_sndbuf = sndbuf;
+		tfile->socket.sk->sk_sndbuf = sndbuf;
 		break;
 
 	case TUNGETVNETHDRSZ:
-		vnet_hdr_sz = tun->vnet_hdr_sz;
+		vnet_hdr_sz = tfile->vnet_hdr_sz;
 		if (copy_to_user(argp, &vnet_hdr_sz, sizeof(vnet_hdr_sz)))
 			ret = -EFAULT;
 		break;
@@ -1400,27 +1401,27 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 			break;
 		}
 
-		tun->vnet_hdr_sz = vnet_hdr_sz;
+		tfile->vnet_hdr_sz = vnet_hdr_sz;
 		break;
 
 	case TUNATTACHFILTER:
 		/* Can be set only for TAPs */
 		ret = -EINVAL;
-		if ((tun->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
+		if ((tfile->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
 			break;
 		ret = -EFAULT;
 		if (copy_from_user(&fprog, argp, sizeof(fprog)))
 			break;
 
-		ret = sk_attach_filter(&fprog, tun->socket.sk);
+		ret = sk_attach_filter(&fprog, tfile->socket.sk);
 		break;
 
 	case TUNDETACHFILTER:
 		/* Can be set only for TAPs */
 		ret = -EINVAL;
-		if ((tun->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
+		if ((tfile->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
 			break;
-		ret = sk_detach_filter(tun->socket.sk);
+		ret = sk_detach_filter(tfile->socket.sk);
 		break;
 
 	default:
@@ -1472,43 +1473,50 @@ static long tun_chr_compat_ioctl(struct file *file,
 
 static int tun_chr_fasync(int fd, struct file *file, int on)
 {
-	struct tun_struct *tun = tun_get(file);
+	struct tun_file *tfile = file->private_data;
 	int ret;
 
-	if (!tun)
-		return -EBADFD;
-
-	tun_debug(KERN_INFO, tun, "tun_chr_fasync %d\n", on);
-
-	if ((ret = fasync_helper(fd, file, on, &tun->fasync)) < 0)
+	if ((ret = fasync_helper(fd, file, on, &tfile->fasync)) < 0)
 		goto out;
 
 	if (on) {
 		ret = __f_setown(file, task_pid(current), PIDTYPE_PID, 0);
 		if (ret)
 			goto out;
-		tun->flags |= TUN_FASYNC;
+		tfile->flags |= TUN_FASYNC;
 	} else
-		tun->flags &= ~TUN_FASYNC;
+		tfile->flags &= ~TUN_FASYNC;
 	ret = 0;
 out:
-	tun_put(tun);
 	return ret;
 }
 
 static int tun_chr_open(struct inode *inode, struct file * file)
 {
+	struct net *net = current->nsproxy->net_ns;
 	struct tun_file *tfile;
 
 	DBG1(KERN_INFO, "tunX: tun_chr_open\n");
 
-	tfile = kmalloc(sizeof(*tfile), GFP_KERNEL);
+	tfile = (struct tun_file *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
+					&tun_proto);
 	if (!tfile)
 		return -ENOMEM;
-	atomic_set(&tfile->count, 0);
+
 	tfile->tun = NULL;
-	tfile->net = get_net(current->nsproxy->net_ns);
+	tfile->net = net;
+	tfile->txflt.count = 0;
+	tfile->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
+	tfile->socket.wq = &tfile->wq;
+	init_waitqueue_head(&tfile->wq.wait);
+	tfile->socket.file = file;
+	tfile->socket.ops = &tun_socket_ops;
+	sock_init_data(&tfile->socket, &tfile->sk);
+
+	tfile->sk.sk_write_space = tun_sock_write_space;
+	tfile->sk.sk_sndbuf = INT_MAX;
 	file->private_data = tfile;
+
 	return 0;
 }
 
@@ -1532,14 +1540,14 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 				unregister_netdevice(dev);
 			rtnl_unlock();
 		}
-	}
 
-	tun = tfile->tun;
-	if (tun)
-		sock_put(tun->socket.sk);
+		/* drop the reference that netdevice holds */
+		sock_put(&tfile->sk);
 
-	put_net(tfile->net);
-	kfree(tfile);
+	}
+
+	/* drop the reference that file holds */
+	sock_put(&tfile->sk);
 
 	return 0;
 }
@@ -1668,13 +1676,14 @@ static void tun_cleanup(void)
 struct socket *tun_get_socket(struct file *file)
 {
 	struct tun_struct *tun;
+	struct tun_file *tfile = file->private_data;
 	if (file->f_op != &tun_fops)
 		return ERR_PTR(-EINVAL);
 	tun = tun_get(file);
 	if (!tun)
 		return ERR_PTR(-EBADFD);
 	tun_put(tun);
-	return &tun->socket;
+	return &tfile->socket;
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [net-next RFC V2 PATCH 2/5] tuntap: categorize ioctl
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 1/5] tuntap: move socket to tun_file Jason Wang
@ 2011-09-17  6:02 ` Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 3/5] tuntap: introduce multiqueue flags Jason Wang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-17  6:02 UTC (permalink / raw)
  To: krkumar2, eric.dumazet, mst, netdev, linux-kernel, virtualization, davem
  Cc: kvm, rusty, qemu-devel, mirq-linux, joe, shemminger

As we've moved the socket-related structures to file->private_data,
move the ioctls that only touch the socket out of tun_chr_ioctl(), as
they don't need to hold the rtnl lock.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |   52 ++++++++++++++++++++++++++++++++++------------------
 1 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b64ad05..dc768e0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1238,10 +1238,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	struct tun_file *tfile = file->private_data;
 	struct tun_struct *tun;
 	void __user* argp = (void __user*)arg;
-	struct sock_fprog fprog;
 	struct ifreq ifr;
-	int sndbuf;
-	int vnet_hdr_sz;
 	int ret;
 
 	if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89)
@@ -1346,14 +1343,6 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		ret = set_offload(tun, arg);
 		break;
 
-	case TUNSETTXFILTER:
-		/* Can be set only for TAPs */
-		ret = -EINVAL;
-		if ((tfile->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
-			break;
-		ret = update_filter(&tfile->txflt, (void __user *)arg);
-		break;
-
 	case SIOCGIFHWADDR:
 		/* Get hw address */
 		memcpy(ifr.ifr_hwaddr.sa_data, tun->dev->dev_addr, ETH_ALEN);
@@ -1370,6 +1359,37 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		ret = dev_set_mac_address(tun->dev, &ifr.ifr_hwaddr);
 		break;
 
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+unlock:
+	rtnl_unlock();
+	if (tun)
+		tun_put(tun);
+	return ret;
+}
+
+static long __tun_socket_ioctl(struct file *file, unsigned int cmd,
+			       unsigned long arg, int ifreq_len)
+{
+	struct tun_file *tfile = file->private_data;
+	void __user* argp = (void __user*)arg;
+	struct sock_fprog fprog;
+	int sndbuf;
+	int vnet_hdr_sz;
+	int ret = 0;
+
+	switch (cmd) {
+	case TUNSETTXFILTER:
+		/* Can be set only for TAPs */
+		ret = -EINVAL;
+		if ((tfile->flags & TUN_TYPE_MASK) != TUN_TAP_DEV)
+			break;
+		ret = update_filter(&tfile->txflt, (void __user *)arg);
+		break;
+
 	case TUNGETSNDBUF:
 		sndbuf = tfile->socket.sk->sk_sndbuf;
 		if (copy_to_user(argp, &sndbuf, sizeof(sndbuf)))
@@ -1425,21 +1445,17 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		break;
 
 	default:
-		ret = -EINVAL;
+		ret = __tun_chr_ioctl(file, cmd, arg, ifreq_len);
 		break;
 	}
 
-unlock:
-	rtnl_unlock();
-	if (tun)
-		tun_put(tun);
 	return ret;
 }
 
 static long tun_chr_ioctl(struct file *file,
 			  unsigned int cmd, unsigned long arg)
 {
-	return __tun_chr_ioctl(file, cmd, arg, sizeof (struct ifreq));
+	return __tun_socket_ioctl(file, cmd, arg, sizeof (struct ifreq));
 }
 
 #ifdef CONFIG_COMPAT
@@ -1467,7 +1483,7 @@ static long tun_chr_compat_ioctl(struct file *file,
 	 * driver are compatible though, we don't need to convert the
 	 * contents.
 	 */
-	return __tun_chr_ioctl(file, cmd, arg, sizeof(struct compat_ifreq));
+	return __tun_socket_ioctl(file, cmd, arg, sizeof(struct compat_ifreq));
 }
 #endif /* CONFIG_COMPAT */
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [net-next RFC V2 PATCH 3/5] tuntap: introduce multiqueue flags
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 1/5] tuntap: move socket to tun_file Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 2/5] tuntap: categorize ioctl Jason Wang
@ 2011-09-17  6:02 ` Jason Wang
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 4/5] tuntap: multiqueue support Jason Wang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-17  6:02 UTC (permalink / raw)
  To: krkumar2, eric.dumazet, mst, netdev, linux-kernel, virtualization, davem
  Cc: kvm, rusty, qemu-devel, mirq-linux, joe, shemminger

Add flags to be used when creating a multiqueue tuntap device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_tun.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 06b1829..c92a291 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -34,6 +34,7 @@
 #define TUN_ONE_QUEUE	0x0080
 #define TUN_PERSIST 	0x0100	
 #define TUN_VNET_HDR 	0x0200
+#define TUN_TAP_MQ      0x0400
 
 /* Ioctl defines */
 #define TUNSETNOCSUM  _IOW('T', 200, int) 
@@ -61,6 +62,7 @@
 #define IFF_ONE_QUEUE	0x2000
 #define IFF_VNET_HDR	0x4000
 #define IFF_TUN_EXCL	0x8000
+#define IFF_MULTI_QUEUE 0x0100
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [net-next RFC V2 PATCH 4/5] tuntap: multiqueue support
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
                   ` (2 preceding siblings ...)
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 3/5] tuntap: introduce multiqueue flags Jason Wang
@ 2011-09-17  6:02 ` Jason Wang
  2011-09-17  6:03 ` [net-next RFC V2 PATCH 5/5] tuntap: add ioctls to attach or detach a file from tap device Jason Wang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-17  6:02 UTC (permalink / raw)
  To: krkumar2, eric.dumazet, mst, netdev, linux-kernel, virtualization, davem
  Cc: kvm, rusty, qemu-devel, mirq-linux, joe, shemminger

This patch adds multiqueue support to the tap device by allowing
multiple sockets to be attached to it. Packet transmission/reception
can then be parallelized by putting packets into different sockets.

The following steps are used when choosing the tx queue:
1 For packets coming from multiqueue nics, we just choose the tx queue
based on which physical queue the packet came from.
2 Otherwise we try to use the rxhash to choose the queue (see the
sketch below).
3 If all of the above fails, we always use the first queue.
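
For step 2, the rxhash is mapped to a queue index with a
multiply-and-shift instead of a modulo; a standalone sketch of the
mapping used by tun_get_queue() below (u32/u64 as in linux/types.h):

/* Map a 32-bit rxhash to [0, numqueues); the u64 cast keeps the
 * multiplication from overflowing before the shift. */
static inline u32 hash_to_queue_index(u32 rxhash, unsigned int numqueues)
{
	return ((u64)rxhash * numqueues) >> 32;
}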

To make the tx path lockless, like macvtap, netif_tx_lock_bh() is
replaced by RCU and NETIF_F_LLTX to synchronize between the hot path
and system calls.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |  358 +++++++++++++++++++++++++++++++++--------------------
 1 files changed, 223 insertions(+), 135 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index dc768e0..ec29f85 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -108,6 +108,8 @@ struct tap_filter {
 	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
+#define MAX_TAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
+
 struct tun_file {
 	struct sock sk;
 	struct socket socket;
@@ -115,16 +117,18 @@ struct tun_file {
 	int vnet_hdr_sz;
 	struct tap_filter txflt;
 	atomic_t count;
-	struct tun_struct *tun;
+	struct tun_struct __rcu *tun;
 	struct net *net;
 	struct fasync_struct *fasync;
 	unsigned int flags;
+	u16 queue_index;
 };
 
 struct tun_sock;
 
 struct tun_struct {
-	struct tun_file		*tfile;
+	struct tun_file		*tfiles[MAX_TAP_QUEUES];
+	unsigned int            numqueues;
 	unsigned int 		flags;
 	uid_t			owner;
 	gid_t			group;
@@ -139,80 +143,160 @@ struct tun_struct {
 #endif
 };
 
-static int tun_attach(struct tun_struct *tun, struct file *file)
+static DEFINE_SPINLOCK(tun_lock);
+
+/*
+ * tun_get_queue(): calculate the queue index
+ *     - if skbs comes from mq nics, we can just borrow
+ *     - if not, calculate from the hash
+ */
+static struct tun_file *tun_get_queue(struct net_device *dev,
+                                     struct sk_buff *skb)
 {
-	struct tun_file *tfile = file->private_data;
-	int err;
+	struct tun_struct *tun = netdev_priv(dev);
+	struct tun_file *tfile = NULL;
+	int numqueues = tun->numqueues;
+	__u32 rxq;
 
-	ASSERT_RTNL();
+	BUG_ON(!rcu_read_lock_held());
 
-	netif_tx_lock_bh(tun->dev);
+	if (!numqueues)
+		goto out;
 
-	err = -EINVAL;
-	if (tfile->tun)
+	if (numqueues == 1) {
+		tfile = rcu_dereference(tun->tfiles[0]);
 		goto out;
+	}
 
-	err = -EBUSY;
-	if (tun->tfile)
+	if (likely(skb_rx_queue_recorded(skb))) {
+		rxq = skb_get_rx_queue(skb);
+
+		while (unlikely(rxq >= numqueues))
+			rxq -= numqueues;
+
+		tfile = rcu_dereference(tun->tfiles[rxq]);
 		goto out;
+	}
 
-	err = 0;
-	tfile->tun = tun;
-	tun->tfile = tfile;
-	netif_carrier_on(tun->dev);
-	dev_hold(tun->dev);
-	sock_hold(&tfile->sk);
-	atomic_inc(&tfile->count);
+	/* Check if we can use flow to select a queue */
+	rxq = skb_get_rxhash(skb);
+	if (rxq) {
+		u32 idx = ((u64)rxq * numqueues) >> 32;
+		tfile = rcu_dereference(tun->tfiles[idx]);
+		goto out;
+	}
 
+	tfile = rcu_dereference(tun->tfiles[0]);
 out:
-	netif_tx_unlock_bh(tun->dev);
-	return err;
+	return tfile;
 }
 
-static void __tun_detach(struct tun_struct *tun)
+static int tun_detach(struct tun_file *tfile, bool clean)
 {
-	struct tun_file *tfile = tun->tfile;
-	/* Detach from net device */
-	netif_tx_lock_bh(tun->dev);
-	netif_carrier_off(tun->dev);
-	tun->tfile = NULL;
-	netif_tx_unlock_bh(tun->dev);
-
-	/* Drop read queue */
-	skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
-
-	/* Drop the extra count on the net device */
-	dev_put(tun->dev);
-}
+	struct tun_struct *tun;
+	struct net_device *dev = NULL;
+	bool destroy = false;
 
-static void tun_detach(struct tun_struct *tun)
-{
-	rtnl_lock();
-	__tun_detach(tun);
-	rtnl_unlock();
-}
+	spin_lock(&tun_lock);
 
-static struct tun_struct *__tun_get(struct tun_file *tfile)
-{
-	struct tun_struct *tun = NULL;
+	tun = rcu_dereference_protected(tfile->tun,
+					lockdep_is_held(&tun_lock));
+	if (tun) {
+		u16 index = tfile->queue_index;
+		BUG_ON(index > tun->numqueues);
+		BUG_ON(!tun->tfiles[tun->numqueues - 1]);
+		dev = tun->dev;
+
+		rcu_assign_pointer(tun->tfiles[index],
+				   tun->tfiles[tun->numqueues - 1]);
+		tun->tfiles[index]->queue_index = index;
+		rcu_assign_pointer(tfile->tun, NULL);
+		--tun->numqueues;
+		sock_put(&tfile->sk);
 
-	if (atomic_inc_not_zero(&tfile->count))
-		tun = tfile->tun;
+		if (tun->numqueues == 0 && !(tun->flags & TUN_PERSIST))
+			destroy = true;
+	}
+
+	spin_unlock(&tun_lock);
+
+	synchronize_rcu();
+	if (clean)
+		sock_put(&tfile->sk);
 
-	return tun;
+	if (destroy) {
+		rtnl_lock();
+		if (dev->reg_state == NETREG_REGISTERED)
+			unregister_netdevice(dev);
+		rtnl_unlock();
+	}
+
+	return 0;
 }
 
-static struct tun_struct *tun_get(struct file *file)
+static void tun_detach_all(struct net_device *dev)
 {
-	return __tun_get(file->private_data);
+	struct tun_struct *tun = netdev_priv(dev);
+	struct tun_file *tfile, *tfile_list[MAX_TAP_QUEUES];
+	int i, j = 0;
+
+	spin_lock(&tun_lock);
+
+	for (i = 0; i < MAX_TAP_QUEUES && tun->numqueues; i++) {
+		tfile = rcu_dereference_protected(tun->tfiles[i],
+						lockdep_is_held(&tun_lock));
+		BUG_ON(!tfile);
+		wake_up_all(&tfile->wq.wait);
+		tfile_list[j++] = tfile;
+		rcu_assign_pointer(tfile->tun, NULL);
+		--tun->numqueues;
+	}
+	BUG_ON(tun->numqueues != 0);
+	spin_unlock(&tun_lock);
+
+	synchronize_rcu();
+	for(--j; j >= 0; j--)
+		sock_put(&tfile_list[j]->sk);
 }
 
-static void tun_put(struct tun_struct *tun)
+static int tun_attach(struct tun_struct *tun, struct file *file)
 {
-	struct tun_file *tfile = tun->tfile;
+	struct tun_file *tfile = file->private_data;
+	int err;
+
+	ASSERT_RTNL();
+
+	spin_lock(&tun_lock);
+
+	err = -EINVAL;
+	if (rcu_dereference_protected(tfile->tun, lockdep_is_held(&tun_lock)))
+		goto out;
+
+	err = -EBUSY;
+	if (!(tun->flags & TUN_TAP_MQ) &&
+		rcu_dereference_protected(tun->tfiles[0],
+					lockdep_is_held(&tun_lock))) {
+		/* Multiqueue is only for TAP */
+		goto out;
+	}
+
+	if (tun->numqueues == MAX_TAP_QUEUES)
+		goto out;
+
+	err = 0;
+	tfile->queue_index = tun->numqueues;
+	rcu_assign_pointer(tfile->tun, tun);
+	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
+	sock_hold(&tfile->sk);
+	tun->numqueues++;
+
+	if (tun->numqueues == 1)
+		netif_carrier_on(tun->dev);
 
-	if (atomic_dec_and_test(&tfile->count))
-		tun_detach(tfile->tun);
+	/* device is allowed to go away first, so no need to hold extra refcnt.	 */
+out:
+	spin_unlock(&tun_lock);
+	return err;
 }
 
 /* TAP filtering */
@@ -332,16 +416,7 @@ static const struct ethtool_ops tun_ethtool_ops;
 /* Net device detach from fd. */
 static void tun_net_uninit(struct net_device *dev)
 {
-	struct tun_struct *tun = netdev_priv(dev);
-	struct tun_file *tfile = tun->tfile;
-
-	/* Inform the methods they need to stop using the dev.
-	 */
-	if (tfile) {
-		wake_up_all(&tfile->wq.wait);
-		if (atomic_dec_and_test(&tfile->count))
-			__tun_detach(tun);
-	}
+	tun_detach_all(dev);
 }
 
 /* Net device open. */
@@ -361,10 +436,10 @@ static int tun_net_close(struct net_device *dev)
 /* Net device start xmit */
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	struct tun_struct *tun = netdev_priv(dev);
-	struct tun_file *tfile = tun->tfile;
+	struct tun_file *tfile = NULL;
 
-	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
+	rcu_read_lock();
+	tfile = tun_get_queue(dev, skb);
 
 	/* Drop packet if interface is not attached */
 	if (!tfile)
@@ -381,7 +456,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 
 	if (skb_queue_len(&tfile->socket.sk->sk_receive_queue) >= dev->tx_queue_len) {
-		if (!(tun->flags & TUN_ONE_QUEUE)) {
+		if (!(tfile->flags & TUN_ONE_QUEUE) && !(tfile->flags & TUN_TAP_MQ)) {
 			/* Normal queueing mode. */
 			/* Packet scheduler handles dropping of further packets. */
 			netif_stop_queue(dev);
@@ -390,7 +465,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 			 * error is more appropriate. */
 			dev->stats.tx_fifo_errors++;
 		} else {
-			/* Single queue mode.
+			/* Single queue mode or multi queue mode.
 			 * Driver handles dropping of all packets itself. */
 			goto drop;
 		}
@@ -408,9 +483,11 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
 	wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
 				   POLLRDNORM | POLLRDBAND);
+	rcu_read_unlock();
 	return NETDEV_TX_OK;
 
 drop:
+	rcu_read_unlock();
 	dev->stats.tx_dropped++;
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
@@ -526,16 +603,22 @@ static void tun_net_init(struct net_device *dev)
 static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 {
 	struct tun_file *tfile = file->private_data;
-	struct tun_struct *tun = __tun_get(tfile);
+	struct tun_struct *tun = NULL;
 	struct sock *sk;
 	unsigned int mask = 0;
 
-	if (!tun)
+	if (!tfile)
 		return POLLERR;
 
-	sk = tfile->socket.sk;
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
+	if (!tun) {
+		rcu_read_unlock();
+		return POLLERR;
+	}
+	rcu_read_unlock();
 
-	tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
+	sk = &tfile->sk;
 
 	poll_wait(file, &tfile->wq.wait, wait);
 
@@ -547,10 +630,12 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 	     sock_writeable(sk)))
 		mask |= POLLOUT | POLLWRNORM;
 
-	if (tun->dev->reg_state != NETREG_REGISTERED)
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
+	if (!tun || tun->dev->reg_state != NETREG_REGISTERED)
 		mask = POLLERR;
+	rcu_read_unlock();
 
-	tun_put(tun);
 	return mask;
 }
 
@@ -706,11 +791,12 @@ static ssize_t tun_get_user(struct tun_file *tfile,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
-	tun = __tun_get(tfile);
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
 	if (!tun) {
+		rcu_read_unlock();
 		return -EBADFD;
 	}
-
 	switch (tfile->flags & TUN_TYPE_MASK) {
 	case TUN_TUN_DEV:
 		skb->dev = tun->dev;
@@ -719,27 +805,29 @@ static ssize_t tun_get_user(struct tun_file *tfile,
 		skb->protocol = eth_type_trans(skb, tun->dev);
 		break;
 	}
-
-	netif_rx_ni(skb);
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
-	tun_put(tun);
+	rcu_read_unlock();
+
+	netif_rx_ni(skb);
+
 	return count;
 
 err_free:
 	count = -EINVAL;
 	kfree_skb(skb);
 err:
-	tun = __tun_get(tfile);
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
 	if (!tun) {
+		rcu_read_unlock();
 		return -EBADFD;
 	}
-
 	if (drop)
 		tun->dev->stats.rx_dropped++;
 	if (error)
 		tun->dev->stats.rx_frame_errors++;
-	tun_put(tun);
+	rcu_read_unlock();
 	return count;
 }
 
@@ -832,12 +920,13 @@ static ssize_t tun_put_user(struct tun_file *tfile,
 	skb_copy_datagram_const_iovec(skb, 0, iv, total, len);
 	total += skb->len;
 
-	tun = __tun_get(tfile);
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
 	if (tun) {
 		tun->dev->stats.tx_packets++;
 		tun->dev->stats.tx_bytes += len;
-		tun_put(tun);
 	}
+	rcu_read_unlock();
 
 	return total;
 }
@@ -867,28 +956,31 @@ static ssize_t tun_do_read(struct tun_file *tfile,
 				break;
 			}
 
-			tun = __tun_get(tfile);
+			rcu_read_lock();
+			tun = rcu_dereference(tfile->tun);
 			if (!tun) {
-				ret = -EIO;
+				ret = -EBADFD;
+				rcu_read_unlock();
 				break;
 			}
 			if (tun->dev->reg_state != NETREG_REGISTERED) {
 				ret = -EIO;
-				tun_put(tun);
+				rcu_read_unlock();
 				break;
 			}
-			tun_put(tun);
+			rcu_read_unlock();
 
 			/* Nothing to read, let's sleep */
 			schedule();
 			continue;
 		}
 
-		tun = __tun_get(tfile);
+		rcu_read_lock();
+		tun = rcu_dereference(tfile->tun);
 		if (tun) {
 			netif_wake_queue(tun->dev);
-			tun_put(tun);
 		}
+		rcu_read_unlock();
 
 		ret = tun_put_user(tfile, skb, iv, len);
 		kfree_skb(skb);
@@ -1028,6 +1120,9 @@ static int tun_flags(struct tun_struct *tun)
 	if (tun->flags & TUN_VNET_HDR)
 		flags |= IFF_VNET_HDR;
 
+	if (tun->flags & TUN_TAP_MQ)
+		flags |= IFF_MULTI_QUEUE;
+
 	return flags;
 }
 
@@ -1107,6 +1202,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			/* TAP device */
 			flags |= TUN_TAP_DEV;
 			name = "tap%d";
+			if (ifr->ifr_flags & IFF_MULTI_QUEUE) {
+				flags |= TUN_TAP_MQ;
+				name = "mqtap%d";
+			}
 		} else
 			return -EINVAL;
 
@@ -1132,6 +1231,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			TUN_USER_FEATURES;
 		dev->features = dev->hw_features;
+		if (ifr->ifr_flags & IFF_MULTI_QUEUE)
+			dev->features |= NETIF_F_LLTX;
 
 		err = register_netdevice(tun->dev);
 		if (err < 0)
@@ -1164,6 +1265,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	else
 		tun->flags &= ~TUN_VNET_HDR;
 
+	if (ifr->ifr_flags & IFF_MULTI_QUEUE)
+		tun->flags |= TUN_TAP_MQ;
+	else
+		tun->flags &= ~TUN_TAP_MQ;
+
 	/* Cache flags from tun device */
 	tfile->flags = tun->flags;
 	/* Make sure persistent devices do not get stuck in
@@ -1254,38 +1360,39 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 				(unsigned int __user*)argp);
 	}
 
-	rtnl_lock();
-
-	tun = __tun_get(tfile);
-	if (cmd == TUNSETIFF && !tun) {
+	ret = 0;
+	if (cmd == TUNSETIFF) {
+		rtnl_lock();
 		ifr.ifr_name[IFNAMSIZ-1] = '\0';
-
 		ret = tun_set_iff(tfile->net, file, &ifr);
-
+		rtnl_unlock();
 		if (ret)
-			goto unlock;
-
+			return ret;
 		if (copy_to_user(argp, &ifr, ifreq_len))
-			ret = -EFAULT;
-		goto unlock;
+			return -EFAULT;
+		return ret;
 	}
 
+	rtnl_lock();
+
+	rcu_read_lock();
+
 	ret = -EBADFD;
+	tun = rcu_dereference(tfile->tun);
 	if (!tun)
 		goto unlock;
 
-	tun_debug(KERN_INFO, tun, "tun_chr_ioctl cmd %d\n", cmd);
 
-	ret = 0;
-	switch (cmd) {
+	switch(cmd) {
 	case TUNGETIFF:
 		ret = tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
+		rcu_read_unlock();
 		if (ret)
-			break;
+			goto out;
 
 		if (copy_to_user(argp, &ifr, ifreq_len))
 			ret = -EFAULT;
-		break;
+		goto out;
 
 	case TUNSETNOCSUM:
 		/* Disable/Enable checksum */
@@ -1347,9 +1454,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		/* Get hw address */
 		memcpy(ifr.ifr_hwaddr.sa_data, tun->dev->dev_addr, ETH_ALEN);
 		ifr.ifr_hwaddr.sa_family = tun->dev->type;
+		rcu_read_unlock();
 		if (copy_to_user(argp, &ifr, ifreq_len))
 			ret = -EFAULT;
-		break;
+		goto out;
 
 	case SIOCSIFHWADDR:
 		/* Set hw address */
@@ -1365,9 +1473,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	}
 
 unlock:
+	rcu_read_unlock();
+out:
 	rtnl_unlock();
-	if (tun)
-		tun_put(tun);
 	return ret;
 }
 
@@ -1539,31 +1647,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 static int tun_chr_close(struct inode *inode, struct file *file)
 {
 	struct tun_file *tfile = file->private_data;
-	struct tun_struct *tun;
-
-	tun = __tun_get(tfile);
-	if (tun) {
-		struct net_device *dev = tun->dev;
-
-		tun_debug(KERN_INFO, tun, "tun_chr_close\n");
-
-		__tun_detach(tun);
-
-		/* If desirable, unregister the netdevice. */
-		if (!(tun->flags & TUN_PERSIST)) {
-			rtnl_lock();
-			if (dev->reg_state == NETREG_REGISTERED)
-				unregister_netdevice(dev);
-			rtnl_unlock();
-		}
-
-		/* drop the reference that netdevice holds */
-		sock_put(&tfile->sk);
-
-	}
 
-	/* drop the reference that file holds */
-	sock_put(&tfile->sk);
+	tun_detach(tfile, true);
 
 	return 0;
 }
@@ -1691,14 +1776,17 @@ static void tun_cleanup(void)
  * holding a reference to the file for as long as the socket is in use. */
 struct socket *tun_get_socket(struct file *file)
 {
-	struct tun_struct *tun;
+	struct tun_struct *tun = NULL;
 	struct tun_file *tfile = file->private_data;
 	if (file->f_op != &tun_fops)
 		return ERR_PTR(-EINVAL);
-	tun = tun_get(file);
-	if (!tun)
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
+	if (!tun) {
+		rcu_read_unlock();
 		return ERR_PTR(-EBADFD);
-	tun_put(tun);
+	}
+	rcu_read_unlock();
 	return &tfile->socket;
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [net-next RFC V2 PATCH 5/5] tuntap: add ioctls to attach or detach a file from tap device
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
                   ` (3 preceding siblings ...)
  2011-09-17  6:02 ` [net-next RFC V2 PATCH 4/5] tuntap: multiqueue support Jason Wang
@ 2011-09-17  6:03 ` Jason Wang
  2011-09-17 19:17 ` [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Michael S. Tsirkin
  2011-09-19 14:45 ` Ben Hutchings
  6 siblings, 0 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-17  6:03 UTC (permalink / raw)
  To: krkumar2, eric.dumazet, mst, netdev, linux-kernel, virtualization, davem
  Cc: kvm, rusty, qemu-devel, mirq-linux, joe, shemminger

New ioctls are added to let multiple files/sockets be attached to or
detached from a tap device.
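
A hedged usage sketch of the new ioctls from userspace follows; the
argument layout mirrors the TUNATTACHQUEUE handling below (which copies
in a struct ifreq), and passing 0 to TUNDETACHQUEUE is an assumption
since the detach path ignores its argument:

/* Hypothetical usage sketch only; it assumes the TUNATTACHQUEUE and
 * TUNDETACHQUEUE ioctls proposed in this patch. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Open a new fd and attach it as an extra queue of an existing tap. */
static int attach_queue(const char *dev_name)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, dev_name, IFNAMSIZ - 1);
	/* only these flags are accepted by the attach path below */
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;

	if (ioctl(fd, TUNATTACHQUEUE, &ifr) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* this fd now backs one queue of dev_name */
}

/* Detach the queue backed by fd without closing the fd. */
static int detach_queue(int fd)
{
	return ioctl(fd, TUNDETACHQUEUE, 0);
}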

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c      |   25 ++++++++++++++++++++++---
 include/linux/if_tun.h |    3 +++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ec29f85..6a1b591 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1343,11 +1343,12 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 {
 	struct tun_file *tfile = file->private_data;
 	struct tun_struct *tun;
+	struct net_device *dev = NULL;
 	void __user* argp = (void __user*)arg;
 	struct ifreq ifr;
 	int ret;
 
-	if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89)
+	if (cmd == TUNSETIFF || cmd == TUNATTACHQUEUE || _IOC_TYPE(cmd) == 0x89)
 		if (copy_from_user(&ifr, argp, ifreq_len))
 			return -EFAULT;
 
@@ -1356,7 +1357,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		 * This is needed because we never checked for invalid flags on
 		 * TUNSETIFF. */
 		return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-				IFF_VNET_HDR,
+				IFF_VNET_HDR | IFF_MULTI_QUEUE,
 				(unsigned int __user*)argp);
 	}
 
@@ -1372,6 +1373,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 			return -EFAULT;
 		return ret;
 	}
+	if (cmd == TUNDETACHQUEUE) {
+		return tun_detach(tfile, false);
+	}
 
 	rtnl_lock();
 
@@ -1379,7 +1383,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 
 	ret = -EBADFD;
 	tun = rcu_dereference(tfile->tun);
-	if (!tun)
+	if (!tun && cmd != TUNATTACHQUEUE)
 		goto unlock;
 
 
@@ -1394,6 +1398,21 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 			ret = -EFAULT;
 		goto out;
 
+       case TUNATTACHQUEUE:
+               dev = __dev_get_by_name(tfile->net, ifr.ifr_name);
+               if (!dev || dev->netdev_ops != &tap_netdev_ops) {
+                       ret = -EINVAL;
+               } else if (ifr.ifr_flags &
+                       ~(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR)) {
+		       /* ignore illegal flag */
+                       ret = -EINVAL;
+               } else {
+                       tfile->flags = TUN_TAP_DEV | TUN_NO_PI | TUN_VNET_HDR;
+                       tun = netdev_priv(dev);
+                       ret = tun_attach(tun, file);
+               }
+               break;
+
 	case TUNSETNOCSUM:
 		/* Disable/Enable checksum */
 
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index c92a291..d3f24d8 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -54,6 +54,9 @@
 #define TUNDETACHFILTER _IOW('T', 214, struct sock_fprog)
 #define TUNGETVNETHDRSZ _IOR('T', 215, int)
 #define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNATTACHQUEUE  _IOW('T', 217, int)
+#define TUNDETACHQUEUE  _IOW('T', 218, int)
+
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
                   ` (4 preceding siblings ...)
  2011-09-17  6:03 ` [net-next RFC V2 PATCH 5/5] tuntap: add ioctls to attach or detach a file from tap device Jason Wang
@ 2011-09-17 19:17 ` Michael S. Tsirkin
  2011-09-19  9:44   ` Jason Wang
  2011-09-19 14:45 ` Ben Hutchings
  6 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2011-09-17 19:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, kvm, eric.dumazet, qemu-devel, netdev, rusty,
	linux-kernel, virtualization, joe, shemminger, mirq-linux, davem

On Sat, Sep 17, 2011 at 02:02:04PM +0800, Jason Wang wrote:
> A wiki page was created to describe the detailed design of all parts
> involved in the multiqueue implementation:
> http://www.linux-kvm.org/page/Multiqueue. Some basic test results can
> be seen at http://www.linux-kvm.org/page/Multiqueue-performance-Sep-13;
> I will post the detailed numbers as an attachment in a reply to this
> thread.

Does it make sense to test both with and without RPS in guest?

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap
  2011-09-17 19:17 ` [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Michael S. Tsirkin
@ 2011-09-19  9:44   ` Jason Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Wang @ 2011-09-19  9:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, kvm, eric.dumazet, qemu-devel, netdev, rusty,
	linux-kernel, virtualization, joe, shemminger, mirq-linux, davem

On 09/18/2011 03:17 AM, Michael S. Tsirkin wrote:
> On Sat, Sep 17, 2011 at 02:02:04PM +0800, Jason Wang wrote:
>> A wiki page was created to describe the detailed design of all parts
>> involved in the multiqueue implementation:
>> http://www.linux-kvm.org/page/Multiqueue. Some basic test results can
>> be seen at http://www.linux-kvm.org/page/Multiqueue-performance-Sep-13;
>> I will post the detailed numbers as an attachment in a reply to this
>> thread.
> Does it make sense to test both with and without RPS in guest?
>
I've tested with RPS in guest, but didn't see improvements.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap
  2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
                   ` (5 preceding siblings ...)
  2011-09-17 19:17 ` [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Michael S. Tsirkin
@ 2011-09-19 14:45 ` Ben Hutchings
  6 siblings, 0 replies; 9+ messages in thread
From: Ben Hutchings @ 2011-09-19 14:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, eric.dumazet, mst, netdev, linux-kernel,
	virtualization, davem, kvm, rusty, qemu-devel, mirq-linux, joe,
	shemminger

On Sat, 2011-09-17 at 14:02 +0800, Jason Wang wrote:
[...]
> 2 The current implementation may also see a regression for
> single-session packet transmission.
> 
> The reason is that packets from a single flow are not always handled
> by the same queue/vhost thread.
> 
> Various methods could handle this:
> 
> 2.1 Hack the guest driver to store the queue index in the rxhash and
> use it when choosing the tx queue in the guest. This needs some hack
> to store the rxhash into the sk and pass it back to the skb in
> skb_orphan_try(). sk_rxhash is only used by RPS now, so a cleaner
> method is needed.
[...]

I have previously suggested doing this as a general rule.  However, I
now think we can do much better with accelerated RFS and automatic XPS
(but the latter is not yet implemented).  For virtio_net, accelerated
RFS would effectively push the guest's RFS socket map out to the host.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-09-19 14:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-17  6:02 [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Jason Wang
2011-09-17  6:02 ` [net-next RFC V2 PATCH 1/5] tuntap: move socket to tun_file Jason Wang
2011-09-17  6:02 ` [net-next RFC V2 PATCH 2/5] tuntap: categorize ioctl Jason Wang
2011-09-17  6:02 ` [net-next RFC V2 PATCH 3/5] tuntap: introduce multiqueue flags Jason Wang
2011-09-17  6:02 ` [net-next RFC V2 PATCH 4/5] tuntap: multiqueue support Jason Wang
2011-09-17  6:03 ` [net-next RFC V2 PATCH 5/5] tuntap: add ioctls to attach or detach a file from tap device Jason Wang
2011-09-17 19:17 ` [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap Michael S. Tsirkin
2011-09-19  9:44   ` Jason Wang
2011-09-19 14:45 ` Ben Hutchings
