linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next PATCH 0/8] Multiqueue API for macvtap
@ 2013-06-06  9:54 Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Hi all:

This series implements a fully tuntap compatiable API which could be used
by userspace to manage multiple macvtap queues. The main parts is to add
TUNSETQUEUE ioctl support for macvtap.

Patch 1 - 5 was some tuntap compatibility and misc cleanups.
Patch 6 removes the linear search in macvtap by reshuffling the macvtaps array
each time a queue is removed.
Patch 7 implement TUNSETQUEUE ioctl by introducing a list to track all queues
while use the array to just track all enabled queues.
Patch 8 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
multiqueue API is completed.

Flow caches implememtation were missed in this version, since I am doing
rework on the tuntap flow caches. Have some some stress test with both netperf
and pktgen.

Please review, thanks.

Changes from RFC V3:
- drop the TUNSETIFF checks
- disable taps first during macvtap_put_queue() to simplify codes
- typo fixes and better comments from Michael on numvtaps in macvtap_get_queue()
- BUG_ON() and unnecessary checks of numdisabled in macvtap_del_queues()
- drop numdisabled and introduce numqueues to track the number of queues
- better style in macvtap_ioctl_set_queue()
- rename tap_link to queue_list
- blank line fixes again :(

Changes from RFC V2:
- Don't place disabled taps in the array and use a linked like to track all taps
- Remove unnecessary best-effort barriers
- Add ACCESS_ONCE() for vlan->numvtaps to prevent compiler generate unexpected
codes
- Add a comment to explain that numvtaps is just a hint in macvtap_get_queue()
- blank line fixes

Changes from RFC V1:
- Drop the patch of "return -EBADFD when TUNGETIFF fails" to preserve backward
compatibility
- Add a small patch of optimizing macvtap_do_read() (see patch 1/8)
- Use different functions to do tap enabling and creating
- Use different functions to do tap disabling and closing
- Remove the emtpy line in patch 6/8
- Add a line in patch 3/8
- Add a comment to explain the checking when doing TUNSETIFF with
IFF_MULTI_QUEUE
- Add comments to explian the second swapping when moving taps between disabled
areas and enabled areas.
- Fix the commit log of patch 7/8 ( v1 says the linked list were used which is
wrong )
- Other minor bug fixes

Jason Wang (8):
  macvtap: fix a possible race between queue selection and changing
    queues
  macvtap: do not add self to waitqueue if doing a nonblock read
  macvlan: switch to use IS_ENABLED()
  macvtap: introduce macvtap_get_vlan()
  macvlan: change the max number of queues to 16
  macvtap: eliminate linear search
  macvtap: add TUNSETQUEUE ioctl
  macvtap: enable multiqueue flag

 drivers/net/macvtap.c      |  206 +++++++++++++++++++++++++++++++-------------
 include/linux/if_macvlan.h |    8 ++-
 2 files changed, 152 insertions(+), 62 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Complier may generate codes that re-read the vlan->numvtaps during
macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the
same time and which can lead unexpected result (e.g. very huge value).

We need prevent the compiler from generating such codes by adding an
ACCESS_ONCE() to make sure vlan->numvtaps were only read once.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 68efb91..5e485e3 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -172,7 +172,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *tap = NULL;
-	int numvtaps = vlan->numvtaps;
+	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
 	__u32 rxq;
 
 	if (!numvtaps)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED() Jason Wang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

There's no need to add self to waitqueue if doing a nonblock read. This could
help to avoid the spinlock contention.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5e485e3..8949631 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -845,7 +845,9 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	ssize_t ret = 0;
 
 	while (len) {
-		prepare_to_wait(sk_sleep(&q->sk), &wait, TASK_INTERRUPTIBLE);
+		if (!noblock)
+			prepare_to_wait(sk_sleep(&q->sk), &wait,
+					TASK_INTERRUPTIBLE);
 
 		/* Read frames from the queue */
 		skb = skb_dequeue(&q->sk.sk_receive_queue);
@@ -867,7 +869,8 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 		break;
 	}
 
-	finish_wait(sk_sleep(&q->sk), &wait);
+	if (!noblock)
+		finish_wait(sk_sleep(&q->sk), &wait);
 	return ret;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED()
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 84dde1d..e47ad46 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -8,7 +8,7 @@
 #include <net/netlink.h>
 #include <linux/u64_stats_sync.h>
 
-#if defined(CONFIG_MACVTAP) || defined(CONFIG_MACVTAP_MODULE)
+#if IS_ENABLED(CONFIG_MACVTAP)
 struct socket *macvtap_get_socket(struct file *);
 #else
 #include <linux/err.h>
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan()
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (2 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED() Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:05   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Factor out the device holding logic to a macvtap_get_vlan(), this will be also
used by multiqueue API.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 8949631..d18130b 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -893,6 +893,24 @@ out:
 	return ret;
 }
 
+static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan;
+
+	rcu_read_lock_bh();
+	vlan = rcu_dereference_bh(q->vlan);
+	if (vlan)
+		dev_hold(vlan->dev);
+	rcu_read_unlock_bh();
+
+	return vlan;
+}
+
+static void macvtap_put_vlan(struct macvlan_dev *vlan)
+{
+	dev_put(vlan->dev);
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -924,12 +942,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	case TUNGETIFF:
-		rcu_read_lock_bh();
-		vlan = rcu_dereference_bh(q->vlan);
-		if (vlan)
-			dev_hold(vlan->dev);
-		rcu_read_unlock_bh();
-
+		vlan = macvtap_get_vlan(q);
 		if (!vlan)
 			return -ENOLINK;
 
@@ -937,7 +950,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
 		    put_user(q->flags, &ifr->ifr_flags))
 			ret = -EFAULT;
-		dev_put(vlan->dev);
+		macvtap_put_vlan(vlan);
 		return ret;
 
 	case TUNGETFEATURES:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 5/8] macvlan: change the max number of queues to 16
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (3 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:05   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Macvtap should be at least compatible with tap, so change the max number to 16.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index e47ad46..62d8bda 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES	(NR_CPUS < 16 ? NR_CPUS : 16)
+#define MAX_MACVTAP_QUEUES	16
 
 #define MACVLAN_MC_FILTER_BITS	8
 #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 6/8] macvtap: eliminate linear search
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (4 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:06   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Linear search were used in both get_slot() and macvtap_get_queue(), this is
because:

- macvtap didn't reshuffle the array of taps when create or destroy a queue, so
  when adding a new queue, macvtap must do linear search to find a location for
  the new queue. This will also complicate the TUNSETQUEUE implementation for
  multiqueue API.
- the queue itself didn't track the queue index, so the we must do a linear
  search in the array to find the location of a existed queue.

The solution is straightforward: reshuffle the array and introduce a queue_index
to macvtap_queue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   64 +++++++++++++++---------------------------------
 1 files changed, 20 insertions(+), 44 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index d18130b..5ccba99 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -44,6 +44,7 @@ struct macvtap_queue {
 	struct macvlan_dev __rcu *vlan;
 	struct file *file;
 	unsigned int flags;
+	u16 queue_index;
 };
 
 static struct proto macvtap_proto = {
@@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
  */
 static DEFINE_SPINLOCK(macvtap_lock);
 
-/*
- * get_slot: return a [unused/occupied] slot in vlan->taps[]:
- *	- if 'q' is NULL, return the first empty slot;
- *	- otherwise, return the slot this pointer occupies.
- */
-static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
-{
-	int i;
-
-	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
-		if (rcu_dereference_protected(vlan->taps[i],
-					      lockdep_is_held(&macvtap_lock)) == q)
-			return i;
-	}
-
-	/* Should never happen */
-	BUG_ON(1);
-}
-
 static int macvtap_set_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	int index;
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
@@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
 		goto out;
 
 	err = 0;
-	index = get_slot(vlan, NULL);
 	rcu_assign_pointer(q->vlan, vlan);
-	rcu_assign_pointer(vlan->taps[index], q);
+	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
 	sock_hold(&q->sk);
 
 	q->file = file;
+	q->queue_index = vlan->numvtaps;
 	file->private_data = q;
 
 	vlan->numvtaps++;
@@ -140,15 +121,22 @@ out:
  */
 static void macvtap_put_queue(struct macvtap_queue *q)
 {
+	struct macvtap_queue *nq;
 	struct macvlan_dev *vlan;
 
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference_protected(q->vlan,
 					 lockdep_is_held(&macvtap_lock));
 	if (vlan) {
-		int index = get_slot(vlan, q);
+		int index = q->queue_index;
+		BUG_ON(index >= vlan->numvtaps);
+
+		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
+					       lockdep_is_held(&macvtap_lock));
+		rcu_assign_pointer(vlan->taps[index], nq);
+		nq->queue_index = index;
 
-		RCU_INIT_POINTER(vlan->taps[index], NULL);
+		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
 		--vlan->numvtaps;
@@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 	rxq = skb_get_rxhash(skb);
 	if (rxq) {
 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
-		if (tap)
-			goto out;
+		goto out;
 	}
 
 	if (likely(skb_rx_queue_recorded(skb))) {
@@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 			rxq -= numvtaps;
 
 		tap = rcu_dereference(vlan->taps[rxq]);
-		if (tap)
-			goto out;
-	}
-
-	/* Everything failed - find first available queue */
-	for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
-		tap = rcu_dereference(vlan->taps[rxq]);
-		if (tap)
-			break;
+		goto out;
 	}
 
+	tap = rcu_dereference(vlan->taps[0]);
 out:
 	return tap;
 }
@@ -219,19 +199,15 @@ static void macvtap_del_queues(struct net_device *dev)
 	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
 	int i, j = 0;
 
-	/* macvtap_put_queue can free some slots, so go through all slots */
 	spin_lock(&macvtap_lock);
-	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+	for (i = 0; i < vlan->numvtaps; i++) {
 		q = rcu_dereference_protected(vlan->taps[i],
 					      lockdep_is_held(&macvtap_lock));
-		if (q) {
-			qlist[j++] = q;
-			RCU_INIT_POINTER(vlan->taps[i], NULL);
-			RCU_INIT_POINTER(q->vlan, NULL);
-			vlan->numvtaps--;
-		}
+		BUG_ON(q == NULL);
+		qlist[j++] = q;
+		RCU_INIT_POINTER(vlan->taps[i], NULL);
+		RCU_INIT_POINTER(q->vlan, NULL);
 	}
-	BUG_ON(vlan->numvtaps != 0);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
 	spin_unlock(&macvtap_lock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (5 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:06   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
  2013-06-08  6:49 ` [net-next PATCH 0/8] Multiqueue API for macvtap David Miller
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
enable a queue of macvtap. This is used to be compatible at API layer of tuntap
to simplify the userspace to manage the queues. This is done through introducing
a linked list to track all taps while using vlan->taps array to only track
active taps.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c      |  135 +++++++++++++++++++++++++++++++++++++------
 include/linux/if_macvlan.h |    4 +
 2 files changed, 120 insertions(+), 19 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5ccba99..d2d1d55 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -45,6 +45,8 @@ struct macvtap_queue {
 	struct file *file;
 	unsigned int flags;
 	u16 queue_index;
+	bool enabled;
+	struct list_head next;
 };
 
 static struct proto macvtap_proto = {
@@ -85,14 +87,36 @@ static const struct proto_ops macvtap_socket_ops;
  */
 static DEFINE_SPINLOCK(macvtap_lock);
 
-static int macvtap_set_queue(struct net_device *dev, struct file *file,
+static int macvtap_enable_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	spin_lock(&macvtap_lock);
+
+	if (q->enabled)
+		goto out;
+
+	err = 0;
+	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
+	q->queue_index = vlan->numvtaps;
+	q->enabled = true;
+
+	vlan->numvtaps++;
+out:
+	spin_unlock(&macvtap_lock);
+	return err;
+}
+
+static int macvtap_set_queue(struct net_device *dev, struct file *file,
+			     struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
-	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
+	if (vlan->numqueues == MAX_MACVTAP_QUEUES)
 		goto out;
 
 	err = 0;
@@ -102,15 +126,57 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
 
 	q->file = file;
 	q->queue_index = vlan->numvtaps;
+	q->enabled = true;
 	file->private_data = q;
+	list_add_tail(&q->next, &vlan->queue_list);
 
 	vlan->numvtaps++;
+	vlan->numqueues++;
 
 out:
 	spin_unlock(&macvtap_lock);
 	return err;
 }
 
+static int __macvtap_disable_queue(struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan;
+	struct macvtap_queue *nq;
+
+	vlan = rcu_dereference_protected(q->vlan,
+					 lockdep_is_held(&macvtap_lock));
+
+	if (!q->enabled)
+		return -EINVAL;
+
+	if (vlan) {
+		int index = q->queue_index;
+		BUG_ON(index >= vlan->numvtaps);
+		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
+					       lockdep_is_held(&macvtap_lock));
+		nq->queue_index = index;
+
+		rcu_assign_pointer(vlan->taps[index], nq);
+		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
+		q->enabled = false;
+
+		vlan->numvtaps--;
+	}
+
+	return 0;
+}
+
+static int macvtap_disable_queue(struct macvtap_queue *q)
+{
+	int err;
+
+	spin_lock(&macvtap_lock);
+	err = __macvtap_disable_queue(q);
+	spin_unlock(&macvtap_lock);
+
+	return err;
+}
+
 /*
  * The file owning the queue got closed, give up both
  * the reference that the files holds as well as the
@@ -121,25 +187,19 @@ out:
  */
 static void macvtap_put_queue(struct macvtap_queue *q)
 {
-	struct macvtap_queue *nq;
 	struct macvlan_dev *vlan;
 
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference_protected(q->vlan,
 					 lockdep_is_held(&macvtap_lock));
 	if (vlan) {
-		int index = q->queue_index;
-		BUG_ON(index >= vlan->numvtaps);
-
-		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
-					       lockdep_is_held(&macvtap_lock));
-		rcu_assign_pointer(vlan->taps[index], nq);
-		nq->queue_index = index;
+		if (q->enabled)
+			BUG_ON(__macvtap_disable_queue(q));
 
-		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
+		vlan->numqueues--;
 		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
-		--vlan->numvtaps;
+		list_del_init(&q->next);
 	}
 
 	spin_unlock(&macvtap_lock);
@@ -160,6 +220,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *tap = NULL;
+	/* Access to taps array is protected by rcu, but access to numvtaps
+	 * isn't. Below we use it to lookup a queue, but treat it as a hint
+	 * and validate that the result isn't NULL - in case we are
+	 * racing against queue removal.
+	 */
 	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
 	__u32 rxq;
 
@@ -196,18 +261,22 @@ out:
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
+	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
 	int i, j = 0;
 
 	spin_lock(&macvtap_lock);
-	for (i = 0; i < vlan->numvtaps; i++) {
-		q = rcu_dereference_protected(vlan->taps[i],
-					      lockdep_is_held(&macvtap_lock));
-		BUG_ON(q == NULL);
+	list_for_each_entry_safe(q, tmp, &vlan->queue_list, next) {
+		list_del_init(&q->next);
 		qlist[j++] = q;
-		RCU_INIT_POINTER(vlan->taps[i], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
+		if (q->enabled)
+			vlan->numvtaps--;
+		vlan->numqueues--;
 	}
+	for (i = 0; i < vlan->numvtaps; i++)
+		RCU_INIT_POINTER(vlan->taps[i], NULL);
+	BUG_ON(vlan->numvtaps);
+	BUG_ON(vlan->numqueues);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
 	spin_unlock(&macvtap_lock);
@@ -298,6 +367,9 @@ static int macvtap_newlink(struct net *src_net,
 			   struct nlattr *tb[],
 			   struct nlattr *data[])
 {
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	INIT_LIST_HEAD(&vlan->queue_list);
+
 	/* Don't put anything that may fail after macvlan_common_newlink
 	 * because we can't undo what it does.
 	 */
@@ -887,6 +959,25 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
 	dev_put(vlan->dev);
 }
 
+static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
+{
+	struct macvtap_queue *q = file->private_data;
+	struct macvlan_dev *vlan;
+	int ret;
+
+	vlan = macvtap_get_vlan(q);
+	if (!vlan)
+		return -EINVAL;
+
+	if (flags & IFF_ATTACH_QUEUE)
+		ret = macvtap_enable_queue(vlan->dev, file, q);
+	else if (flags & IFF_DETACH_QUEUE)
+		ret = macvtap_disable_queue(q);
+
+	macvtap_put_vlan(vlan);
+	return ret;
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -910,7 +1001,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 			return -EFAULT;
 
 		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
+		if ((u & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
+		    (IFF_NO_PI | IFF_TAP))
 			ret = -EINVAL;
 		else
 			q->flags = u;
@@ -929,6 +1021,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		macvtap_put_vlan(vlan);
 		return ret;
 
+	case TUNSETQUEUE:
+		if (get_user(u, &ifr->ifr_flags))
+			return -EFAULT;
+		return macvtap_ioctl_set_queue(file, u);
+
 	case TUNGETFEATURES:
 		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
 			return -EFAULT;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 62d8bda..1912133 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -69,8 +69,12 @@ struct macvlan_dev {
 	u16			flags;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
+	/* This array tracks active taps. */
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
+	/* This list tracks all taps (both enabled and disabled) */
+	struct list_head	queue_list;
 	int			numvtaps;
+	int			numqueues;
 	int			minor;
 };
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 8/8] macvtap: enable multiqueue flag
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (6 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:05   ` Michael S. Tsirkin
  2013-06-08  6:49 ` [net-next PATCH 0/8] Multiqueue API for macvtap David Miller
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

To notify the userspace about our capability of multiqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index d2d1d55..992151c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -31,10 +31,6 @@
  * macvtap_proto is used to allocate queues through the sock allocation
  * mechanism.
  *
- * TODO: multiqueue support is currently not implemented, even though
- * macvtap is basically prepared for that. We will need to add this
- * here as well as in virtio-net and qemu to get line rate on 10gbit
- * adapters from a guest.
  */
 struct macvtap_queue {
 	struct sock sk;
@@ -1027,7 +1023,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return macvtap_ioctl_set_queue(file, u);
 
 	case TUNGETFEATURES:
-		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
+		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR |
+			     IFF_MULTI_QUEUE, up))
 			return -EFAULT;
 		return 0;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan()
  2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-06-06 11:05   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:36PM +0800, Jason Wang wrote:
> Factor out the device holding logic to a macvtap_get_vlan(), this will be also
> used by multiqueue API.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |   27 ++++++++++++++++++++-------
>  1 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 8949631..d18130b 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -893,6 +893,24 @@ out:
>  	return ret;
>  }
>  
> +static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan;
> +
> +	rcu_read_lock_bh();
> +	vlan = rcu_dereference_bh(q->vlan);
> +	if (vlan)
> +		dev_hold(vlan->dev);
> +	rcu_read_unlock_bh();
> +
> +	return vlan;
> +}
> +
> +static void macvtap_put_vlan(struct macvlan_dev *vlan)
> +{
> +	dev_put(vlan->dev);
> +}
> +
>  /*
>   * provide compatibility with generic tun/tap interface
>   */
> @@ -924,12 +942,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		return ret;
>  
>  	case TUNGETIFF:
> -		rcu_read_lock_bh();
> -		vlan = rcu_dereference_bh(q->vlan);
> -		if (vlan)
> -			dev_hold(vlan->dev);
> -		rcu_read_unlock_bh();
> -
> +		vlan = macvtap_get_vlan(q);
>  		if (!vlan)
>  			return -ENOLINK;
>  
> @@ -937,7 +950,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
>  		    put_user(q->flags, &ifr->ifr_flags))
>  			ret = -EFAULT;
> -		dev_put(vlan->dev);
> +		macvtap_put_vlan(vlan);
>  		return ret;
>  
>  	case TUNGETFEATURES:
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 5/8] macvlan: change the max number of queues to 16
  2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
@ 2013-06-06 11:05   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:37PM +0800, Jason Wang wrote:
> Macvtap should be at least compatible with tap, so change the max number to 16.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  include/linux/if_macvlan.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index e47ad46..62d8bda 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
>   * Maximum times a macvtap device can be opened. This can be used to
>   * configure the number of receive queue, e.g. for multiqueue virtio.
>   */
> -#define MAX_MACVTAP_QUEUES	(NR_CPUS < 16 ? NR_CPUS : 16)
> +#define MAX_MACVTAP_QUEUES	16
>  
>  #define MACVLAN_MC_FILTER_BITS	8
>  #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 8/8] macvtap: enable multiqueue flag
  2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
@ 2013-06-06 11:05   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:40PM +0800, Jason Wang wrote:
> To notify the userspace about our capability of multiqueue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index d2d1d55..992151c 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -31,10 +31,6 @@
>   * macvtap_proto is used to allocate queues through the sock allocation
>   * mechanism.
>   *
> - * TODO: multiqueue support is currently not implemented, even though
> - * macvtap is basically prepared for that. We will need to add this
> - * here as well as in virtio-net and qemu to get line rate on 10gbit
> - * adapters from a guest.
>   */
>  struct macvtap_queue {
>  	struct sock sk;
> @@ -1027,7 +1023,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		return macvtap_ioctl_set_queue(file, u);
>  
>  	case TUNGETFEATURES:
> -		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
> +		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR |
> +			     IFF_MULTI_QUEUE, up))
>  			return -EFAULT;
>  		return 0;
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 6/8] macvtap: eliminate linear search
  2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
@ 2013-06-06 11:06   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:38PM +0800, Jason Wang wrote:
> Linear search were used in both get_slot() and macvtap_get_queue(), this is
> because:
> 
> - macvtap didn't reshuffle the array of taps when create or destroy a queue, so
>   when adding a new queue, macvtap must do linear search to find a location for
>   the new queue. This will also complicate the TUNSETQUEUE implementation for
>   multiqueue API.
> - the queue itself didn't track the queue index, so the we must do a linear
>   search in the array to find the location of a existed queue.
> 
> The solution is straightforward: reshuffle the array and introduce a queue_index
> to macvtap_queue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |   64 +++++++++++++++---------------------------------
>  1 files changed, 20 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index d18130b..5ccba99 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -44,6 +44,7 @@ struct macvtap_queue {
>  	struct macvlan_dev __rcu *vlan;
>  	struct file *file;
>  	unsigned int flags;
> +	u16 queue_index;
>  };
>  
>  static struct proto macvtap_proto = {
> @@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
>   */
>  static DEFINE_SPINLOCK(macvtap_lock);
>  
> -/*
> - * get_slot: return a [unused/occupied] slot in vlan->taps[]:
> - *	- if 'q' is NULL, return the first empty slot;
> - *	- otherwise, return the slot this pointer occupies.
> - */
> -static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
> -{
> -	int i;
> -
> -	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
> -		if (rcu_dereference_protected(vlan->taps[i],
> -					      lockdep_is_held(&macvtap_lock)) == q)
> -			return i;
> -	}
> -
> -	/* Should never happen */
> -	BUG_ON(1);
> -}
> -
>  static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  				struct macvtap_queue *q)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> -	int index;
>  	int err = -EBUSY;
>  
>  	spin_lock(&macvtap_lock);
> @@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  		goto out;
>  
>  	err = 0;
> -	index = get_slot(vlan, NULL);
>  	rcu_assign_pointer(q->vlan, vlan);
> -	rcu_assign_pointer(vlan->taps[index], q);
> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>  	sock_hold(&q->sk);
>  
>  	q->file = file;
> +	q->queue_index = vlan->numvtaps;
>  	file->private_data = q;
>  
>  	vlan->numvtaps++;
> @@ -140,15 +121,22 @@ out:
>   */
>  static void macvtap_put_queue(struct macvtap_queue *q)
>  {
> +	struct macvtap_queue *nq;
>  	struct macvlan_dev *vlan;
>  
>  	spin_lock(&macvtap_lock);
>  	vlan = rcu_dereference_protected(q->vlan,
>  					 lockdep_is_held(&macvtap_lock));
>  	if (vlan) {
> -		int index = get_slot(vlan, q);
> +		int index = q->queue_index;
> +		BUG_ON(index >= vlan->numvtaps);
> +
> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> +					       lockdep_is_held(&macvtap_lock));
> +		rcu_assign_pointer(vlan->taps[index], nq);
> +		nq->queue_index = index;
>  
> -		RCU_INIT_POINTER(vlan->taps[index], NULL);
> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
>  		RCU_INIT_POINTER(q->vlan, NULL);
>  		sock_put(&q->sk);
>  		--vlan->numvtaps;
> @@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  	rxq = skb_get_rxhash(skb);
>  	if (rxq) {
>  		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
> -		if (tap)
> -			goto out;
> +		goto out;
>  	}
>  
>  	if (likely(skb_rx_queue_recorded(skb))) {
> @@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  			rxq -= numvtaps;
>  
>  		tap = rcu_dereference(vlan->taps[rxq]);
> -		if (tap)
> -			goto out;
> -	}
> -
> -	/* Everything failed - find first available queue */
> -	for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
> -		tap = rcu_dereference(vlan->taps[rxq]);
> -		if (tap)
> -			break;
> +		goto out;
>  	}
>  
> +	tap = rcu_dereference(vlan->taps[0]);
>  out:
>  	return tap;
>  }
> @@ -219,19 +199,15 @@ static void macvtap_del_queues(struct net_device *dev)
>  	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
>  	int i, j = 0;
>  
> -	/* macvtap_put_queue can free some slots, so go through all slots */
>  	spin_lock(&macvtap_lock);
> -	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
> +	for (i = 0; i < vlan->numvtaps; i++) {
>  		q = rcu_dereference_protected(vlan->taps[i],
>  					      lockdep_is_held(&macvtap_lock));
> -		if (q) {
> -			qlist[j++] = q;
> -			RCU_INIT_POINTER(vlan->taps[i], NULL);
> -			RCU_INIT_POINTER(q->vlan, NULL);
> -			vlan->numvtaps--;
> -		}
> +		BUG_ON(q == NULL);
> +		qlist[j++] = q;
> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
> +		RCU_INIT_POINTER(q->vlan, NULL);
>  	}
> -	BUG_ON(vlan->numvtaps != 0);
>  	/* guarantee that any future macvtap_set_queue will fail */
>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
>  	spin_unlock(&macvtap_lock);
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl
  2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-06-06 11:06   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:39PM +0800, Jason Wang wrote:
> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
> to simplify the userspace to manage the queues. This is done through introducing
> a linked list to track all taps while using vlan->taps array to only track
> active taps.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c      |  135 +++++++++++++++++++++++++++++++++++++------
>  include/linux/if_macvlan.h |    4 +
>  2 files changed, 120 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 5ccba99..d2d1d55 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -45,6 +45,8 @@ struct macvtap_queue {
>  	struct file *file;
>  	unsigned int flags;
>  	u16 queue_index;
> +	bool enabled;
> +	struct list_head next;
>  };
>  
>  static struct proto macvtap_proto = {
> @@ -85,14 +87,36 @@ static const struct proto_ops macvtap_socket_ops;
>   */
>  static DEFINE_SPINLOCK(macvtap_lock);
>  
> -static int macvtap_set_queue(struct net_device *dev, struct file *file,
> +static int macvtap_enable_queue(struct net_device *dev, struct file *file,
>  				struct macvtap_queue *q)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> +	int err = -EINVAL;
> +
> +	spin_lock(&macvtap_lock);
> +
> +	if (q->enabled)
> +		goto out;
> +
> +	err = 0;
> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
> +	q->queue_index = vlan->numvtaps;
> +	q->enabled = true;
> +
> +	vlan->numvtaps++;
> +out:
> +	spin_unlock(&macvtap_lock);
> +	return err;
> +}
> +
> +static int macvtap_set_queue(struct net_device *dev, struct file *file,
> +			     struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan = netdev_priv(dev);
>  	int err = -EBUSY;
>  
>  	spin_lock(&macvtap_lock);
> -	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
> +	if (vlan->numqueues == MAX_MACVTAP_QUEUES)
>  		goto out;
>  
>  	err = 0;
> @@ -102,15 +126,57 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  
>  	q->file = file;
>  	q->queue_index = vlan->numvtaps;
> +	q->enabled = true;
>  	file->private_data = q;
> +	list_add_tail(&q->next, &vlan->queue_list);
>  
>  	vlan->numvtaps++;
> +	vlan->numqueues++;
>  
>  out:
>  	spin_unlock(&macvtap_lock);
>  	return err;
>  }
>  
> +static int __macvtap_disable_queue(struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan;
> +	struct macvtap_queue *nq;
> +
> +	vlan = rcu_dereference_protected(q->vlan,
> +					 lockdep_is_held(&macvtap_lock));
> +
> +	if (!q->enabled)
> +		return -EINVAL;
> +
> +	if (vlan) {
> +		int index = q->queue_index;
> +		BUG_ON(index >= vlan->numvtaps);
> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> +					       lockdep_is_held(&macvtap_lock));
> +		nq->queue_index = index;
> +
> +		rcu_assign_pointer(vlan->taps[index], nq);
> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> +		q->enabled = false;
> +
> +		vlan->numvtaps--;
> +	}
> +
> +	return 0;
> +}
> +
> +static int macvtap_disable_queue(struct macvtap_queue *q)
> +{
> +	int err;
> +
> +	spin_lock(&macvtap_lock);
> +	err = __macvtap_disable_queue(q);
> +	spin_unlock(&macvtap_lock);
> +
> +	return err;
> +}
> +
>  /*
>   * The file owning the queue got closed, give up both
>   * the reference that the files holds as well as the
> @@ -121,25 +187,19 @@ out:
>   */
>  static void macvtap_put_queue(struct macvtap_queue *q)
>  {
> -	struct macvtap_queue *nq;
>  	struct macvlan_dev *vlan;
>  
>  	spin_lock(&macvtap_lock);
>  	vlan = rcu_dereference_protected(q->vlan,
>  					 lockdep_is_held(&macvtap_lock));
>  	if (vlan) {
> -		int index = q->queue_index;
> -		BUG_ON(index >= vlan->numvtaps);
> -
> -		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> -					       lockdep_is_held(&macvtap_lock));
> -		rcu_assign_pointer(vlan->taps[index], nq);
> -		nq->queue_index = index;
> +		if (q->enabled)
> +			BUG_ON(__macvtap_disable_queue(q));
>  
> -		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> +		vlan->numqueues--;
>  		RCU_INIT_POINTER(q->vlan, NULL);
>  		sock_put(&q->sk);
> -		--vlan->numvtaps;
> +		list_del_init(&q->next);
>  	}
>  
>  	spin_unlock(&macvtap_lock);
> @@ -160,6 +220,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
>  	struct macvtap_queue *tap = NULL;
> +	/* Access to taps array is protected by rcu, but access to numvtaps
> +	 * isn't. Below we use it to lookup a queue, but treat it as a hint
> +	 * and validate that the result isn't NULL - in case we are
> +	 * racing against queue removal.
> +	 */
>  	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
>  	__u32 rxq;
>  
> @@ -196,18 +261,22 @@ out:
>  static void macvtap_del_queues(struct net_device *dev)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> -	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
> +	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
>  	int i, j = 0;
>  
>  	spin_lock(&macvtap_lock);
> -	for (i = 0; i < vlan->numvtaps; i++) {
> -		q = rcu_dereference_protected(vlan->taps[i],
> -					      lockdep_is_held(&macvtap_lock));
> -		BUG_ON(q == NULL);
> +	list_for_each_entry_safe(q, tmp, &vlan->queue_list, next) {
> +		list_del_init(&q->next);
>  		qlist[j++] = q;
> -		RCU_INIT_POINTER(vlan->taps[i], NULL);
>  		RCU_INIT_POINTER(q->vlan, NULL);
> +		if (q->enabled)
> +			vlan->numvtaps--;
> +		vlan->numqueues--;
>  	}
> +	for (i = 0; i < vlan->numvtaps; i++)
> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
> +	BUG_ON(vlan->numvtaps);
> +	BUG_ON(vlan->numqueues);
>  	/* guarantee that any future macvtap_set_queue will fail */
>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
>  	spin_unlock(&macvtap_lock);
> @@ -298,6 +367,9 @@ static int macvtap_newlink(struct net *src_net,
>  			   struct nlattr *tb[],
>  			   struct nlattr *data[])
>  {
> +	struct macvlan_dev *vlan = netdev_priv(dev);
> +	INIT_LIST_HEAD(&vlan->queue_list);
> +
>  	/* Don't put anything that may fail after macvlan_common_newlink
>  	 * because we can't undo what it does.
>  	 */
> @@ -887,6 +959,25 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>  	dev_put(vlan->dev);
>  }
>  
> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
> +{
> +	struct macvtap_queue *q = file->private_data;
> +	struct macvlan_dev *vlan;
> +	int ret;
> +
> +	vlan = macvtap_get_vlan(q);
> +	if (!vlan)
> +		return -EINVAL;
> +
> +	if (flags & IFF_ATTACH_QUEUE)
> +		ret = macvtap_enable_queue(vlan->dev, file, q);
> +	else if (flags & IFF_DETACH_QUEUE)
> +		ret = macvtap_disable_queue(q);
> +
> +	macvtap_put_vlan(vlan);
> +	return ret;
> +}
> +
>  /*
>   * provide compatibility with generic tun/tap interface
>   */
> @@ -910,7 +1001,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  			return -EFAULT;
>  
>  		ret = 0;
> -		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
> +		if ((u & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
> +		    (IFF_NO_PI | IFF_TAP))
>  			ret = -EINVAL;
>  		else
>  			q->flags = u;
> @@ -929,6 +1021,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		macvtap_put_vlan(vlan);
>  		return ret;
>  
> +	case TUNSETQUEUE:
> +		if (get_user(u, &ifr->ifr_flags))
> +			return -EFAULT;
> +		return macvtap_ioctl_set_queue(file, u);
> +
>  	case TUNGETFEATURES:
>  		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
>  			return -EFAULT;
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index 62d8bda..1912133 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -69,8 +69,12 @@ struct macvlan_dev {
>  	u16			flags;
>  	int (*receive)(struct sk_buff *skb);
>  	int (*forward)(struct net_device *dev, struct sk_buff *skb);
> +	/* This array tracks active taps. */
>  	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
> +	/* This list tracks all taps (both enabled and disabled) */
> +	struct list_head	queue_list;
>  	int			numvtaps;
> +	int			numqueues;
>  	int			minor;
>  };
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 0/8] Multiqueue API for macvtap
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (7 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
@ 2013-06-08  6:49 ` David Miller
  8 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2013-06-08  6:49 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, linux-kernel, sergei.shtylyov

From: Jason Wang <jasowang@redhat.com>
Date: Thu,  6 Jun 2013 17:54:32 +0800

> This series implements a fully tuntap compatiable API which could be used
> by userspace to manage multiple macvtap queues. The main parts is to add
> TUNSETQUEUE ioctl support for macvtap.

Series applied, thanks everyone.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-06-08  6:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
2013-06-06  9:54 ` [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
2013-06-06  9:54 ` [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED() Jason Wang
2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
2013-06-06 11:05   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
2013-06-06 11:05   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
2013-06-06 11:06   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
2013-06-06 11:06   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
2013-06-06 11:05   ` Michael S. Tsirkin
2013-06-08  6:49 ` [net-next PATCH 0/8] Multiqueue API for macvtap David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).