linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next rfc V3 0/9] Multiqueue API for macvtap
@ 2013-06-05  6:36 Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues Jason Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Hi all:

This series implements a v3 of fully tuntap compatiable API which could be used
by userspace to manage multiple macvtap queues. The main parts is to add
TUNSETQUEUE ioctl support for macvtap.

Patch 1 - 5 was some tuntap compatibility and misc cleanups.
Patch 6 removes the linear search in macvtap by reshuffling the macvtaps array
each time a queue is removed. After this, we could store both enabled and
disabled queues in the same array without introducing extra data structure.
Patch 7 let TUNSETIFF can create multiqueue device, nothing but some check
were added.
Patch 8 implement TUNSETQUEUE ioctl
Patch 9 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
multiqueue API is completed.

Flow caches implememtation were missed in this version, since I am doing
rework on the tuntap flow caches. Have some some stress test with both netperf
and pktgen.

Please review, thanks.

Changes from V2:
- Don't place disabled taps in the array and use a linked like to track all taps
- Remove unnecessary best-effort barriers
- Add ACCESS_ONCE() for vlan->numvtaps to prevent compiler generate unexpected
  codes
- Add a comment to explain that numvtaps is just a hint in macvtap_get_queue()
- blank line fixes

Changes from V1:
- Drop the patch of "return -EBADFD when TUNGETIFF fails" to preserve backward
  compatibility
- Add a small patch of optimizing macvtap_do_read() (see patch 1/8)
- Use different functions to do tap enabling and creating
- Use different functions to do tap disabling and closing
- Remove the emtpy line in patch 6/8
- Add a line in patch 3/8
- Add a comment to explain the checking when doing TUNSETIFF with
  IFF_MULTI_QUEUE
- Add comments to explian the second swapping when moving taps between disabled
  areas and enabled areas.
- Fix the commit log of patch 7/8 ( v1 says the linked list were used which is
  wrong )
- Other minor bug fixes

Jason Wang (9):
  macvtap: fix a possible race between queue selection and changing
    queues
  macvtap: do not add self to waitqueue if doing a nonblock read
  macvlan: switch to use IS_ENABLED()
  macvtap: introduce macvtap_get_vlan()
  macvlan: change the max number of queues to 16
  macvtap: eliminate linear search
  macvtap: allow TUNSETIFF to create multiqueue device
  macvtap: add TUNSETQUEUE ioctl
  macvtap: enable multiqueue flag

 drivers/net/macvtap.c      |  261 ++++++++++++++++++++++++++++++++------------
 include/linux/if_macvlan.h |    8 +-
 2 files changed, 197 insertions(+), 72 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05 10:35   ` Michael S. Tsirkin
  2013-06-05  6:36 ` [net-next rfc V3 2/9] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Complier may generate codes that re-read the vlan->numvtaps during
macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the
same time and which can lead unexpected result (e.g. very huge value).

We need prevent the compiler from generating such codes by adding an
ACCESS_ONCE() to make sure vlan->numvtaps were only read once.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 68efb91..5e485e3 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -172,7 +172,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *tap = NULL;
-	int numvtaps = vlan->numvtaps;
+	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
 	__u32 rxq;
 
 	if (!numvtaps)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 2/9] macvtap: do not add self to waitqueue if doing a nonblock read
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05 11:07   ` Michael S. Tsirkin
  2013-06-05  6:36 ` [net-next rfc V3 3/9] macvlan: switch to use IS_ENABLED() Jason Wang
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

There's no need to add self to waitqueue if doing a nonblock read. This could
help to avoid the spinlock contention.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5e485e3..8949631 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -845,7 +845,9 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	ssize_t ret = 0;
 
 	while (len) {
-		prepare_to_wait(sk_sleep(&q->sk), &wait, TASK_INTERRUPTIBLE);
+		if (!noblock)
+			prepare_to_wait(sk_sleep(&q->sk), &wait,
+					TASK_INTERRUPTIBLE);
 
 		/* Read frames from the queue */
 		skb = skb_dequeue(&q->sk.sk_receive_queue);
@@ -867,7 +869,8 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 		break;
 	}
 
-	finish_wait(sk_sleep(&q->sk), &wait);
+	if (!noblock)
+		finish_wait(sk_sleep(&q->sk), &wait);
 	return ret;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 3/9] macvlan: switch to use IS_ENABLED()
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 2/9] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 4/9] macvtap: introduce macvtap_get_vlan() Jason Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 84dde1d..e47ad46 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -8,7 +8,7 @@
 #include <net/netlink.h>
 #include <linux/u64_stats_sync.h>
 
-#if defined(CONFIG_MACVTAP) || defined(CONFIG_MACVTAP_MODULE)
+#if IS_ENABLED(CONFIG_MACVTAP)
 struct socket *macvtap_get_socket(struct file *);
 #else
 #include <linux/err.h>
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 4/9] macvtap: introduce macvtap_get_vlan()
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (2 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 3/9] macvlan: switch to use IS_ENABLED() Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 5/9] macvlan: change the max number of queues to 16 Jason Wang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Factor out the device holding logic to a macvtap_get_vlan(), this will be also
used by multiqueue API.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 8949631..d18130b 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -893,6 +893,24 @@ out:
 	return ret;
 }
 
+static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan;
+
+	rcu_read_lock_bh();
+	vlan = rcu_dereference_bh(q->vlan);
+	if (vlan)
+		dev_hold(vlan->dev);
+	rcu_read_unlock_bh();
+
+	return vlan;
+}
+
+static void macvtap_put_vlan(struct macvlan_dev *vlan)
+{
+	dev_put(vlan->dev);
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -924,12 +942,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	case TUNGETIFF:
-		rcu_read_lock_bh();
-		vlan = rcu_dereference_bh(q->vlan);
-		if (vlan)
-			dev_hold(vlan->dev);
-		rcu_read_unlock_bh();
-
+		vlan = macvtap_get_vlan(q);
 		if (!vlan)
 			return -ENOLINK;
 
@@ -937,7 +950,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
 		    put_user(q->flags, &ifr->ifr_flags))
 			ret = -EFAULT;
-		dev_put(vlan->dev);
+		macvtap_put_vlan(vlan);
 		return ret;
 
 	case TUNGETFEATURES:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 5/9] macvlan: change the max number of queues to 16
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (3 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 4/9] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 6/9] macvtap: eliminate linear search Jason Wang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Macvtap should be at least compatible with tap, so change the max number to 16.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index e47ad46..62d8bda 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES	(NR_CPUS < 16 ? NR_CPUS : 16)
+#define MAX_MACVTAP_QUEUES	16
 
 #define MACVLAN_MC_FILTER_BITS	8
 #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 6/9] macvtap: eliminate linear search
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (4 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 5/9] macvlan: change the max number of queues to 16 Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05  6:36 ` [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Linear search were used in both get_slot() and macvtap_get_queue(), this is
because:

- macvtap didn't reshuffle the array of taps when create or destroy a queue, so
  when adding a new queue, macvtap must do linear search to find a location for
  the new queue. This will also complicate the TUNSETQUEUE implementation for
  multiqueue API.
- the queue itself didn't track the queue index, so the we must do a linear
  search in the array to find the location of a existed queue.

The solution is straightforward: reshuffle the array and introduce a queue_index
to macvtap_queue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   64 +++++++++++++++---------------------------------
 1 files changed, 20 insertions(+), 44 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index d18130b..5ccba99 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -44,6 +44,7 @@ struct macvtap_queue {
 	struct macvlan_dev __rcu *vlan;
 	struct file *file;
 	unsigned int flags;
+	u16 queue_index;
 };
 
 static struct proto macvtap_proto = {
@@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
  */
 static DEFINE_SPINLOCK(macvtap_lock);
 
-/*
- * get_slot: return a [unused/occupied] slot in vlan->taps[]:
- *	- if 'q' is NULL, return the first empty slot;
- *	- otherwise, return the slot this pointer occupies.
- */
-static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
-{
-	int i;
-
-	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
-		if (rcu_dereference_protected(vlan->taps[i],
-					      lockdep_is_held(&macvtap_lock)) == q)
-			return i;
-	}
-
-	/* Should never happen */
-	BUG_ON(1);
-}
-
 static int macvtap_set_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	int index;
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
@@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
 		goto out;
 
 	err = 0;
-	index = get_slot(vlan, NULL);
 	rcu_assign_pointer(q->vlan, vlan);
-	rcu_assign_pointer(vlan->taps[index], q);
+	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
 	sock_hold(&q->sk);
 
 	q->file = file;
+	q->queue_index = vlan->numvtaps;
 	file->private_data = q;
 
 	vlan->numvtaps++;
@@ -140,15 +121,22 @@ out:
  */
 static void macvtap_put_queue(struct macvtap_queue *q)
 {
+	struct macvtap_queue *nq;
 	struct macvlan_dev *vlan;
 
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference_protected(q->vlan,
 					 lockdep_is_held(&macvtap_lock));
 	if (vlan) {
-		int index = get_slot(vlan, q);
+		int index = q->queue_index;
+		BUG_ON(index >= vlan->numvtaps);
+
+		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
+					       lockdep_is_held(&macvtap_lock));
+		rcu_assign_pointer(vlan->taps[index], nq);
+		nq->queue_index = index;
 
-		RCU_INIT_POINTER(vlan->taps[index], NULL);
+		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
 		--vlan->numvtaps;
@@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 	rxq = skb_get_rxhash(skb);
 	if (rxq) {
 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
-		if (tap)
-			goto out;
+		goto out;
 	}
 
 	if (likely(skb_rx_queue_recorded(skb))) {
@@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 			rxq -= numvtaps;
 
 		tap = rcu_dereference(vlan->taps[rxq]);
-		if (tap)
-			goto out;
-	}
-
-	/* Everything failed - find first available queue */
-	for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
-		tap = rcu_dereference(vlan->taps[rxq]);
-		if (tap)
-			break;
+		goto out;
 	}
 
+	tap = rcu_dereference(vlan->taps[0]);
 out:
 	return tap;
 }
@@ -219,19 +199,15 @@ static void macvtap_del_queues(struct net_device *dev)
 	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
 	int i, j = 0;
 
-	/* macvtap_put_queue can free some slots, so go through all slots */
 	spin_lock(&macvtap_lock);
-	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+	for (i = 0; i < vlan->numvtaps; i++) {
 		q = rcu_dereference_protected(vlan->taps[i],
 					      lockdep_is_held(&macvtap_lock));
-		if (q) {
-			qlist[j++] = q;
-			RCU_INIT_POINTER(vlan->taps[i], NULL);
-			RCU_INIT_POINTER(q->vlan, NULL);
-			vlan->numvtaps--;
-		}
+		BUG_ON(q == NULL);
+		qlist[j++] = q;
+		RCU_INIT_POINTER(vlan->taps[i], NULL);
+		RCU_INIT_POINTER(q->vlan, NULL);
 	}
-	BUG_ON(vlan->numvtaps != 0);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
 	spin_unlock(&macvtap_lock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (5 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 6/9] macvtap: eliminate linear search Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05 10:43   ` Michael S. Tsirkin
  2013-06-05  6:36 ` [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl Jason Wang
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Though the queue were in fact created by open(), we still need to add this check
to be compatible with tuntap which can let mgmt software use a single API to
manage queues. This patch only validates the device name and moves the TUNSETIFF
to a helper.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   51 ++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5ccba99..14764cc 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -869,6 +869,7 @@ out:
 	return ret;
 }
 
+
 static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan;
@@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
 	dev_put(vlan->dev);
 }
 
+static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
+{
+	struct macvtap_queue *q = file->private_data;
+	struct net *net = current->nsproxy->net_ns;
+	struct inode *inode = file_inode(file);
+	struct net_device *dev, *dev2;
+	struct ifreq ifr;
+
+	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
+		return -EFAULT;
+
+	/* To keep the same behavior of tuntap, validate ifr_name */
+	if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
+		dev = __dev_get_by_name(net, ifr.ifr_name);
+		if (!dev)
+			return -EINVAL;
+
+		dev2 = dev_get_by_macvtap_minor(iminor(inode));
+		if (!dev2)
+			return -EINVAL;
+
+		if (dev != dev2) {
+			dev_put(dev2);
+			return -EINVAL;
+		}
+
+		dev_put(dev2);
+	}
+
+	if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
+	    (IFF_NO_PI | IFF_TAP))
+		return -EINVAL;
+	else
+		q->flags = ifr.ifr_flags;
+
+	return 0;
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -905,17 +944,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 
 	switch (cmd) {
 	case TUNSETIFF:
-		/* ignore the name, just look at flags */
-		if (get_user(u, &ifr->ifr_flags))
-			return -EFAULT;
-
-		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
-			ret = -EINVAL;
-		else
-			q->flags = u;
-
-		return ret;
+		return macvtap_set_iff(file, ifr);
 
 	case TUNGETIFF:
 		vlan = macvtap_get_vlan(q);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (6 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05 10:59   ` Michael S. Tsirkin
  2013-06-05  6:36 ` [net-next rfc V3 9/9] macvtap: enable multiqueue flag Jason Wang
  2013-06-05 10:36 ` [net-next rfc V3 0/9] Multiqueue API for macvtap Michael S. Tsirkin
  9 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
enable a queue of macvtap. This is used to be compatible at API layer of tuntap
to simplify the userspace to manage the queues. This is done through introducing
a linked list to track all taps while using vlan->taps array to only track
active taps.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c      |  133 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/if_macvlan.h |    4 +
 2 files changed, 122 insertions(+), 15 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 14764cc..355e6ad 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -45,6 +45,8 @@ struct macvtap_queue {
 	struct file *file;
 	unsigned int flags;
 	u16 queue_index;
+	bool enabled;
+	struct list_head next;
 };
 
 static struct proto macvtap_proto = {
@@ -85,14 +87,37 @@ static const struct proto_ops macvtap_socket_ops;
  */
 static DEFINE_SPINLOCK(macvtap_lock);
 
-static int macvtap_set_queue(struct net_device *dev, struct file *file,
+static int macvtap_enable_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	spin_lock(&macvtap_lock);
+
+	if (q->enabled)
+		goto out;
+
+	err = 0;
+	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
+	q->queue_index = vlan->numvtaps;
+	q->enabled = true;
+
+	vlan->numvtaps++;
+	vlan->numdisabled--;
+out:
+	spin_unlock(&macvtap_lock);
+	return err;
+}
+
+static int macvtap_set_queue(struct net_device *dev, struct file *file,
+			     struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
-	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
+	if (vlan->numvtaps + vlan->numdisabled == MAX_MACVTAP_QUEUES)
 		goto out;
 
 	err = 0;
@@ -102,7 +127,9 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
 
 	q->file = file;
 	q->queue_index = vlan->numvtaps;
+	q->enabled = true;
 	file->private_data = q;
+	list_add_tail(&q->next, &vlan->tap_link);
 
 	vlan->numvtaps++;
 
@@ -111,6 +138,38 @@ out:
 	return err;
 }
 
+static int macvtap_disable_queue(struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan;
+	struct macvtap_queue *nq;
+	int err = 0;
+
+	spin_lock(&macvtap_lock);
+	vlan = rcu_dereference_protected(q->vlan,
+					 lockdep_is_held(&macvtap_lock));
+
+	if (!q->enabled) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (vlan) {
+		int index = q->queue_index;
+		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
+					       lockdep_is_held(&macvtap_lock));
+		nq->queue_index = index;
+
+		rcu_assign_pointer(vlan->taps[index], nq);
+		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
+		q->enabled = false;
+
+		vlan->numvtaps--;
+		vlan->numdisabled++;
+	}
+out:
+	spin_unlock(&macvtap_lock);
+	return err;
+}
 /*
  * The file owning the queue got closed, give up both
  * the reference that the files holds as well as the
@@ -128,18 +187,24 @@ static void macvtap_put_queue(struct macvtap_queue *q)
 	vlan = rcu_dereference_protected(q->vlan,
 					 lockdep_is_held(&macvtap_lock));
 	if (vlan) {
+		int numvtaps = vlan->numvtaps;
 		int index = q->queue_index;
-		BUG_ON(index >= vlan->numvtaps);
 
-		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
-					       lockdep_is_held(&macvtap_lock));
-		rcu_assign_pointer(vlan->taps[index], nq);
-		nq->queue_index = index;
+		if (q->enabled) {
+			BUG_ON(index >= vlan->numvtaps);
+			nq = rcu_dereference_protected(vlan->taps[numvtaps - 1],
+						lockdep_is_held(&macvtap_lock));
+			rcu_assign_pointer(vlan->taps[index], nq);
+			nq->queue_index = index;
+
+			RCU_INIT_POINTER(vlan->taps[numvtaps - 1], NULL);
+			vlan->numvtaps--;
+		} else
+			vlan->numdisabled--;
 
-		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
-		--vlan->numvtaps;
+		list_del_init(&q->next);
 	}
 
 	spin_unlock(&macvtap_lock);
@@ -160,6 +225,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *tap = NULL;
+	/* numvtaps here is just a hint, even if the number of active taps were
+	 * reduced in the same sime, since we are protected by rcu read lock,
+	 * we're safe here.
+	 */
 	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
 	__u32 rxq;
 
@@ -196,20 +265,25 @@ out:
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
+	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
 	int i, j = 0;
 
 	spin_lock(&macvtap_lock);
-	for (i = 0; i < vlan->numvtaps; i++) {
-		q = rcu_dereference_protected(vlan->taps[i],
-					      lockdep_is_held(&macvtap_lock));
-		BUG_ON(q == NULL);
+	list_for_each_entry_safe(q, tmp, &vlan->tap_link, next) {
+		list_del_init(&q->next);
 		qlist[j++] = q;
-		RCU_INIT_POINTER(vlan->taps[i], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
+		if (q->enabled)
+			vlan->numvtaps--;
+		else
+			vlan->numdisabled--;
 	}
+	for (i = 0; i < vlan->numvtaps; i++)
+		RCU_INIT_POINTER(vlan->taps[i], NULL);
+	BUG_ON(vlan->numvtaps + vlan->numdisabled != 0);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
+	vlan->numdisabled = 0;
 	spin_unlock(&macvtap_lock);
 
 	synchronize_rcu();
@@ -298,6 +372,9 @@ static int macvtap_newlink(struct net *src_net,
 			   struct nlattr *tb[],
 			   struct nlattr *data[])
 {
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	INIT_LIST_HEAD(&vlan->tap_link);
+
 	/* Don't put anything that may fail after macvlan_common_newlink
 	 * because we can't undo what it does.
 	 */
@@ -926,6 +1003,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
 	return 0;
 }
 
+static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
+{
+	struct macvtap_queue *q = file->private_data;
+	struct macvlan_dev *vlan;
+	int ret = -EINVAL;
+
+	vlan = macvtap_get_vlan(q);
+	if (!vlan)
+		goto done;
+
+	if (flags & IFF_ATTACH_QUEUE)
+		ret = macvtap_enable_queue(vlan->dev, file, q);
+	else if (flags & IFF_DETACH_QUEUE)
+		ret = macvtap_disable_queue(q);
+
+	macvtap_put_vlan(vlan);
+
+done:
+	return ret;
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -958,6 +1056,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		macvtap_put_vlan(vlan);
 		return ret;
 
+	case TUNSETQUEUE:
+		if (get_user(u, &ifr->ifr_flags))
+			return -EFAULT;
+		return macvtap_ioctl_set_queue(file, u);
+
 	case TUNGETFEATURES:
 		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
 			return -EFAULT;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 62d8bda..87b9091 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -69,8 +69,12 @@ struct macvlan_dev {
 	u16			flags;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
+	/* This array tracks active taps. */
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
+	/* This list tracks all taps (both enabled and disabled) */
+	struct list_head	tap_link;
 	int			numvtaps;
+	int			numdisabled;
 	int			minor;
 };
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [net-next rfc V3 9/9] macvtap: enable multiqueue flag
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (7 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-06-05  6:36 ` Jason Wang
  2013-06-05 10:36 ` [net-next rfc V3 0/9] Multiqueue API for macvtap Michael S. Tsirkin
  9 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-05  6:36 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

To notify the userspace about our capability of multiqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 355e6ad..b907aee 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -31,10 +31,6 @@
  * macvtap_proto is used to allocate queues through the sock allocation
  * mechanism.
  *
- * TODO: multiqueue support is currently not implemented, even though
- * macvtap is basically prepared for that. We will need to add this
- * here as well as in virtio-net and qemu to get line rate on 10gbit
- * adapters from a guest.
  */
 struct macvtap_queue {
 	struct sock sk;
@@ -1062,7 +1058,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return macvtap_ioctl_set_queue(file, u);
 
 	case TUNGETFEATURES:
-		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
+		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR |
+			     IFF_MULTI_QUEUE, up))
 			return -EFAULT;
 		return 0;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues
  2013-06-05  6:36 ` [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues Jason Wang
@ 2013-06-05 10:35   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-05 10:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Wed, Jun 05, 2013 at 02:36:24PM +0800, Jason Wang wrote:
> Complier may generate codes that re-read the vlan->numvtaps during
> macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the
> same time and which can lead unexpected result (e.g. very huge value).
> 
> We need prevent the compiler from generating such codes by adding an
> ACCESS_ONCE() to make sure vlan->numvtaps were only read once.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I would add that this is a theoretical issue, so no need
for stable.

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  drivers/net/macvtap.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 68efb91..5e485e3 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -172,7 +172,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
>  	struct macvtap_queue *tap = NULL;
> -	int numvtaps = vlan->numvtaps;
> +	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
>  	__u32 rxq;
>  
>  	if (!numvtaps)
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 0/9] Multiqueue API for macvtap
  2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
                   ` (8 preceding siblings ...)
  2013-06-05  6:36 ` [net-next rfc V3 9/9] macvtap: enable multiqueue flag Jason Wang
@ 2013-06-05 10:36 ` Michael S. Tsirkin
  2013-06-06  3:07   ` Jason Wang
  9 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-05 10:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Wed, Jun 05, 2013 at 02:36:23PM +0800, Jason Wang wrote:
> Hi all:
> 
> This series implements a v3 of fully tuntap compatiable API which could be used
> by userspace to manage multiple macvtap queues. The main parts is to add
> TUNSETQUEUE ioctl support for macvtap.
> 
> Patch 1 - 5 was some tuntap compatibility and misc cleanups.
> Patch 6 removes the linear search in macvtap by reshuffling the macvtaps array
> each time a queue is removed. After this, we could store both enabled and
> disabled queues in the same array without introducing extra data structure.
> Patch 7 let TUNSETIFF can create multiqueue device, nothing but some check
> were added.
> Patch 8 implement TUNSETQUEUE ioctl
> Patch 9 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
> multiqueue API is completed.
> 
> Flow caches implememtation were missed in this version, since I am doing
> rework on the tuntap flow caches. Have some some stress test with both netperf
> and pktgen.
> 
> Please review, thanks.

FYI by netdev rules RFC means "don't apply yet".

> Changes from V2:
> - Don't place disabled taps in the array and use a linked like to track all taps
> - Remove unnecessary best-effort barriers
> - Add ACCESS_ONCE() for vlan->numvtaps to prevent compiler generate unexpected
>   codes
> - Add a comment to explain that numvtaps is just a hint in macvtap_get_queue()
> - blank line fixes
> 
> Changes from V1:
> - Drop the patch of "return -EBADFD when TUNGETIFF fails" to preserve backward
>   compatibility
> - Add a small patch of optimizing macvtap_do_read() (see patch 1/8)
> - Use different functions to do tap enabling and creating
> - Use different functions to do tap disabling and closing
> - Remove the emtpy line in patch 6/8
> - Add a line in patch 3/8
> - Add a comment to explain the checking when doing TUNSETIFF with
>   IFF_MULTI_QUEUE
> - Add comments to explian the second swapping when moving taps between disabled
>   areas and enabled areas.
> - Fix the commit log of patch 7/8 ( v1 says the linked list were used which is
>   wrong )
> - Other minor bug fixes
> 
> Jason Wang (9):
>   macvtap: fix a possible race between queue selection and changing
>     queues
>   macvtap: do not add self to waitqueue if doing a nonblock read
>   macvlan: switch to use IS_ENABLED()
>   macvtap: introduce macvtap_get_vlan()
>   macvlan: change the max number of queues to 16
>   macvtap: eliminate linear search
>   macvtap: allow TUNSETIFF to create multiqueue device
>   macvtap: add TUNSETQUEUE ioctl
>   macvtap: enable multiqueue flag
> 
>  drivers/net/macvtap.c      |  261 ++++++++++++++++++++++++++++++++------------
>  include/linux/if_macvlan.h |    8 +-
>  2 files changed, 197 insertions(+), 72 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-05  6:36 ` [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
@ 2013-06-05 10:43   ` Michael S. Tsirkin
  2013-06-06  3:13     ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-05 10:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Wed, Jun 05, 2013 at 02:36:30PM +0800, Jason Wang wrote:
> Though the queue were in fact created by open(), we still need to add this check
> to be compatible with tuntap which can let mgmt software use a single API to
> manage queues. This patch only validates the device name and moves the TUNSETIFF
> to a helper.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

The patch is OK, the description is confusing.
What you mean is simply:

	Allow IFF_MULTI_QUEUE in TUNSETIFF for macvtap, to match
	tun behaviour.

And if you put it like this, I would say make this
the last patch in the series, so userspace
can use IFF_MULTI_QUEUE to detect new versus old
behaviour.

> ---
>  drivers/net/macvtap.c |   51 ++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 40 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 5ccba99..14764cc 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -869,6 +869,7 @@ out:
>  	return ret;
>  }
>  
> +
>  static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
>  {
>  	struct macvlan_dev *vlan;

Please don't.

> @@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>  	dev_put(vlan->dev);
>  }
>  
> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
> +{
> +	struct macvtap_queue *q = file->private_data;
> +	struct net *net = current->nsproxy->net_ns;
> +	struct inode *inode = file_inode(file);
> +	struct net_device *dev, *dev2;
> +	struct ifreq ifr;
> +
> +	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
> +		return -EFAULT;
> +
> +	/* To keep the same behavior of tuntap, validate ifr_name */

So I'm not sure - why is it important to validate ifr_name here?
We ignore the name for all other flags - why is IFF_MULTI_QUEUE
special?

> +	if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
> +		dev = __dev_get_by_name(net, ifr.ifr_name);
> +		if (!dev)
> +			return -EINVAL;
> +
> +		dev2 = dev_get_by_macvtap_minor(iminor(inode));
> +		if (!dev2)
> +			return -EINVAL;
> +
> +		if (dev != dev2) {
> +			dev_put(dev2);
> +			return -EINVAL;
> +		}
> +
> +		dev_put(dev2);
> +	}
> +
> +	if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
> +	    (IFF_NO_PI | IFF_TAP))
> +		return -EINVAL;
> +	else
> +		q->flags = ifr.ifr_flags;
> +
> +	return 0;
> +}
> +
>  /*
>   * provide compatibility with generic tun/tap interface
>   */
> @@ -905,17 +944,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  
>  	switch (cmd) {
>  	case TUNSETIFF:
> -		/* ignore the name, just look at flags */

This is actually a useful comment that you've removed.

> -		if (get_user(u, &ifr->ifr_flags))
> -			return -EFAULT;
> -
> -		ret = 0;
> -		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
> -			ret = -EINVAL;
> -		else
> -			q->flags = u;
> -
> -		return ret;
> +		return macvtap_set_iff(file, ifr);
>  
>  	case TUNGETIFF:
>  		vlan = macvtap_get_vlan(q);
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl
  2013-06-05  6:36 ` [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-06-05 10:59   ` Michael S. Tsirkin
  2013-06-06  3:16     ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-05 10:59 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Wed, Jun 05, 2013 at 02:36:31PM +0800, Jason Wang wrote:
> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
> to simplify the userspace to manage the queues. This is done through introducing
> a linked list to track all taps while using vlan->taps array to only track
> active taps.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/macvtap.c      |  133 +++++++++++++++++++++++++++++++++++++++-----
>  include/linux/if_macvlan.h |    4 +
>  2 files changed, 122 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 14764cc..355e6ad 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -45,6 +45,8 @@ struct macvtap_queue {
>  	struct file *file;
>  	unsigned int flags;
>  	u16 queue_index;
> +	bool enabled;
> +	struct list_head next;
>  };
>  
>  static struct proto macvtap_proto = {
> @@ -85,14 +87,37 @@ static const struct proto_ops macvtap_socket_ops;
>   */
>  static DEFINE_SPINLOCK(macvtap_lock);
>  
> -static int macvtap_set_queue(struct net_device *dev, struct file *file,
> +static int macvtap_enable_queue(struct net_device *dev, struct file *file,
>  				struct macvtap_queue *q)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> +	int err = -EINVAL;
> +
> +	spin_lock(&macvtap_lock);
> +
> +	if (q->enabled)
> +		goto out;
> +
> +	err = 0;
> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
> +	q->queue_index = vlan->numvtaps;
> +	q->enabled = true;
> +
> +	vlan->numvtaps++;
> +	vlan->numdisabled--;
> +out:
> +	spin_unlock(&macvtap_lock);
> +	return err;
> +}
> +
> +static int macvtap_set_queue(struct net_device *dev, struct file *file,
> +			     struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan = netdev_priv(dev);
>  	int err = -EBUSY;
>  
>  	spin_lock(&macvtap_lock);
> -	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
> +	if (vlan->numvtaps + vlan->numdisabled == MAX_MACVTAP_QUEUES)
>  		goto out;
>  
>  	err = 0;
> @@ -102,7 +127,9 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  
>  	q->file = file;
>  	q->queue_index = vlan->numvtaps;
> +	q->enabled = true;
>  	file->private_data = q;
> +	list_add_tail(&q->next, &vlan->tap_link);
>  
>  	vlan->numvtaps++;
>  
> @@ -111,6 +138,38 @@ out:
>  	return err;
>  }
>  
> +static int macvtap_disable_queue(struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan;
> +	struct macvtap_queue *nq;
> +	int err = 0;
> +
> +	spin_lock(&macvtap_lock);
> +	vlan = rcu_dereference_protected(q->vlan,
> +					 lockdep_is_held(&macvtap_lock));
> +
> +	if (!q->enabled) {
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (vlan) {
> +		int index = q->queue_index;
> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> +					       lockdep_is_held(&macvtap_lock));
> +		nq->queue_index = index;
> +
> +		rcu_assign_pointer(vlan->taps[index], nq);
> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> +		q->enabled = false;
> +
> +		vlan->numvtaps--;
> +		vlan->numdisabled++;
> +	}
> +out:
> +	spin_unlock(&macvtap_lock);
> +	return err;
> +}
>  /*
>   * The file owning the queue got closed, give up both
>   * the reference that the files holds as well as the
> @@ -128,18 +187,24 @@ static void macvtap_put_queue(struct macvtap_queue *q)
>  	vlan = rcu_dereference_protected(q->vlan,
>  					 lockdep_is_held(&macvtap_lock));
>  	if (vlan) {
> +		int numvtaps = vlan->numvtaps;
>  		int index = q->queue_index;
> -		BUG_ON(index >= vlan->numvtaps);
>  
> -		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> -					       lockdep_is_held(&macvtap_lock));
> -		rcu_assign_pointer(vlan->taps[index], nq);
> -		nq->queue_index = index;
> +		if (q->enabled) {
> +			BUG_ON(index >= vlan->numvtaps);
> +			nq = rcu_dereference_protected(vlan->taps[numvtaps - 1],
> +						lockdep_is_held(&macvtap_lock));
> +			rcu_assign_pointer(vlan->taps[index], nq);

Do we really need these tricks?
Can't we call macvtap_disable_queue and then only handle disable queues
here?

> +			nq->queue_index = index;
> +
> +			RCU_INIT_POINTER(vlan->taps[numvtaps - 1], NULL);
> +			vlan->numvtaps--;
> +		} else
> +			vlan->numdisabled--;
>  
> -		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
>  		RCU_INIT_POINTER(q->vlan, NULL);
>  		sock_put(&q->sk);
> -		--vlan->numvtaps;
> +		list_del_init(&q->next);
>  	}
>  
>  	spin_unlock(&macvtap_lock);
> @@ -160,6 +225,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
>  	struct macvtap_queue *tap = NULL;
> +	/* numvtaps here is just a hint, even if the number of active taps were
> +	 * reduced in the same sime,

typo

> since we are protected by rcu read lock,
> +	 * we're safe here.

I don't really understand what is this comment trying to say.
What is protected? What is safe? safe time as what?

> +	 */
>  	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
>  	__u32 rxq;
>  
> @@ -196,20 +265,25 @@ out:
>  static void macvtap_del_queues(struct net_device *dev)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> -	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
> +	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
>  	int i, j = 0;
>  
>  	spin_lock(&macvtap_lock);
> -	for (i = 0; i < vlan->numvtaps; i++) {
> -		q = rcu_dereference_protected(vlan->taps[i],
> -					      lockdep_is_held(&macvtap_lock));
> -		BUG_ON(q == NULL);
> +	list_for_each_entry_safe(q, tmp, &vlan->tap_link, next) {
> +		list_del_init(&q->next);
>  		qlist[j++] = q;
> -		RCU_INIT_POINTER(vlan->taps[i], NULL);
>  		RCU_INIT_POINTER(q->vlan, NULL);
> +		if (q->enabled)
> +			vlan->numvtaps--;
> +		else
> +			vlan->numdisabled--;
>  	}
> +	for (i = 0; i < vlan->numvtaps; i++)
> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
> +	BUG_ON(vlan->numvtaps + vlan->numdisabled != 0);

Better two BUG_ON checks - they are both 0 right?
And no need for != 0.

>  	/* guarantee that any future macvtap_set_queue will fail */
>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
> +	vlan->numdisabled = 0;

What's this doing? We know it's 0 ...

>  	spin_unlock(&macvtap_lock);
>  
>  	synchronize_rcu();
> @@ -298,6 +372,9 @@ static int macvtap_newlink(struct net *src_net,
>  			   struct nlattr *tb[],
>  			   struct nlattr *data[])
>  {
> +	struct macvlan_dev *vlan = netdev_priv(dev);
> +	INIT_LIST_HEAD(&vlan->tap_link);
> +
>  	/* Don't put anything that may fail after macvlan_common_newlink
>  	 * because we can't undo what it does.
>  	 */
> @@ -926,6 +1003,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>  	return 0;
>  }
>  
> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
> +{
> +	struct macvtap_queue *q = file->private_data;
> +	struct macvlan_dev *vlan;
> +	int ret = -EINVAL;
> +
> +	vlan = macvtap_get_vlan(q);
> +	if (!vlan)
> +		goto done;

Can be just return -EINVAL, and you won't need to initialize
above.

> +
> +	if (flags & IFF_ATTACH_QUEUE)
> +		ret = macvtap_enable_queue(vlan->dev, file, q);
> +	else if (flags & IFF_DETACH_QUEUE)
> +		ret = macvtap_disable_queue(q);
> +
> +	macvtap_put_vlan(vlan);
> +
> +done:
> +	return ret;
> +}
> +
>  /*
>   * provide compatibility with generic tun/tap interface
>   */
> @@ -958,6 +1056,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		macvtap_put_vlan(vlan);
>  		return ret;
>  
> +	case TUNSETQUEUE:
> +		if (get_user(u, &ifr->ifr_flags))
> +			return -EFAULT;
> +		return macvtap_ioctl_set_queue(file, u);
> +
>  	case TUNGETFEATURES:
>  		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
>  			return -EFAULT;
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index 62d8bda..87b9091 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -69,8 +69,12 @@ struct macvlan_dev {
>  	u16			flags;
>  	int (*receive)(struct sk_buff *skb);
>  	int (*forward)(struct net_device *dev, struct sk_buff *skb);
> +	/* This array tracks active taps. */
>  	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
> +	/* This list tracks all taps (both enabled and disabled) */
> +	struct list_head	tap_link;

Bad name, e.g. queue_list would be better.

>  	int			numvtaps;
> +	int			numdisabled;

So numvtaps is number of entries in the taps array.
But to get items on list, you must add numdisabled and
numvtaps, that's confusing.
Instead of numdisabled, let's add a counter for
number of entries in the tap_link list.

How about numqueues?

>  	int			minor;
>  };
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 2/9] macvtap: do not add self to waitqueue if doing a nonblock read
  2013-06-05  6:36 ` [net-next rfc V3 2/9] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
@ 2013-06-05 11:07   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-05 11:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Wed, Jun 05, 2013 at 02:36:25PM +0800, Jason Wang wrote:
> There's no need to add self to waitqueue if doing a nonblock read. This could
> help to avoid the spinlock contention.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 5e485e3..8949631 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -845,7 +845,9 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
>  	ssize_t ret = 0;
>  
>  	while (len) {
> -		prepare_to_wait(sk_sleep(&q->sk), &wait, TASK_INTERRUPTIBLE);
> +		if (!noblock)
> +			prepare_to_wait(sk_sleep(&q->sk), &wait,
> +					TASK_INTERRUPTIBLE);
>  
>  		/* Read frames from the queue */
>  		skb = skb_dequeue(&q->sk.sk_receive_queue);
> @@ -867,7 +869,8 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
>  		break;
>  	}
>  
> -	finish_wait(sk_sleep(&q->sk), &wait);
> +	if (!noblock)
> +		finish_wait(sk_sleep(&q->sk), &wait);
>  	return ret;
>  }
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 0/9] Multiqueue API for macvtap
  2013-06-05 10:36 ` [net-next rfc V3 0/9] Multiqueue API for macvtap Michael S. Tsirkin
@ 2013-06-06  3:07   ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-06  3:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On 06/05/2013 06:36 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 05, 2013 at 02:36:23PM +0800, Jason Wang wrote:
>> > Hi all:
>> > 
>> > This series implements a v3 of fully tuntap compatiable API which could be used
>> > by userspace to manage multiple macvtap queues. The main parts is to add
>> > TUNSETQUEUE ioctl support for macvtap.
>> > 
>> > Patch 1 - 5 was some tuntap compatibility and misc cleanups.
>> > Patch 6 removes the linear search in macvtap by reshuffling the macvtaps array
>> > each time a queue is removed. After this, we could store both enabled and
>> > disabled queues in the same array without introducing extra data structure.
>> > Patch 7 let TUNSETIFF can create multiqueue device, nothing but some check
>> > were added.
>> > Patch 8 implement TUNSETQUEUE ioctl
>> > Patch 9 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
>> > multiqueue API is completed.
>> > 
>> > Flow caches implememtation were missed in this version, since I am doing
>> > rework on the tuntap flow caches. Have some some stress test with both netperf
>> > and pktgen.
>> > 
>> > Please review, thanks.
> FYI by netdev rules RFC means "don't apply yet".
>

Ok, let me remove the rfc in next series.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-05 10:43   ` Michael S. Tsirkin
@ 2013-06-06  3:13     ` Jason Wang
  2013-06-06  6:59       ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-06  3:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On 06/05/2013 06:43 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 05, 2013 at 02:36:30PM +0800, Jason Wang wrote:
>> Though the queue were in fact created by open(), we still need to add this check
>> to be compatible with tuntap which can let mgmt software use a single API to
>> manage queues. This patch only validates the device name and moves the TUNSETIFF
>> to a helper.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> The patch is OK, the description is confusing.
> What you mean is simply:
>
> 	Allow IFF_MULTI_QUEUE in TUNSETIFF for macvtap, to match
> 	tun behaviour.
>
> And if you put it like this, I would say make this
> the last patch in the series, so userspace
> can use IFF_MULTI_QUEUE to detect new versus old
> behaviour.

Make sense, thanks.
>
>> ---
>>  drivers/net/macvtap.c |   51 ++++++++++++++++++++++++++++++++++++++----------
>>  1 files changed, 40 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 5ccba99..14764cc 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -869,6 +869,7 @@ out:
>>  	return ret;
>>  }
>>  
>> +
>>  static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
>>  {
>>  	struct macvlan_dev *vlan;
> Please don't.

Ok.
>
>> @@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>>  	dev_put(vlan->dev);
>>  }
>>  
>> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>> +{
>> +	struct macvtap_queue *q = file->private_data;
>> +	struct net *net = current->nsproxy->net_ns;
>> +	struct inode *inode = file_inode(file);
>> +	struct net_device *dev, *dev2;
>> +	struct ifreq ifr;
>> +
>> +	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
>> +		return -EFAULT;
>> +
>> +	/* To keep the same behavior of tuntap, validate ifr_name */
> So I'm not sure - why is it important to validate ifr_name here?
> We ignore the name for all other flags - why is IFF_MULTI_QUEUE
> special?

It raises another question, why not validate ifname like tuntap? We
should warn userspace about their error, otherwise they may create
queues on the wrong device. In fact I want validate for both, but keep
the behaviour w/o IFF_MULTI_QUEUE for backward compatibility.
>
>> +	if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
>> +		dev = __dev_get_by_name(net, ifr.ifr_name);
>> +		if (!dev)
>> +			return -EINVAL;
>> +
>> +		dev2 = dev_get_by_macvtap_minor(iminor(inode));
>> +		if (!dev2)
>> +			return -EINVAL;
>> +
>> +		if (dev != dev2) {
>> +			dev_put(dev2);
>> +			return -EINVAL;
>> +		}
>> +
>> +		dev_put(dev2);
>> +	}
>> +
>> +	if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
>> +	    (IFF_NO_PI | IFF_TAP))
>> +		return -EINVAL;
>> +	else
>> +		q->flags = ifr.ifr_flags;
>> +
>> +	return 0;
>> +}
>> +
>>  /*
>>   * provide compatibility with generic tun/tap interface
>>   */
>> @@ -905,17 +944,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>>  
>>  	switch (cmd) {
>>  	case TUNSETIFF:
>> -		/* ignore the name, just look at flags */
> This is actually a useful comment that you've removed.

Will get it back.
>
>> -		if (get_user(u, &ifr->ifr_flags))
>> -			return -EFAULT;
>> -
>> -		ret = 0;
>> -		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
>> -			ret = -EINVAL;
>> -		else
>> -			q->flags = u;
>> -
>> -		return ret;
>> +		return macvtap_set_iff(file, ifr);
>>  
>>  	case TUNGETIFF:
>>  		vlan = macvtap_get_vlan(q);
>> -- 
>> 1.7.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl
  2013-06-05 10:59   ` Michael S. Tsirkin
@ 2013-06-06  3:16     ` Jason Wang
  2013-06-06  7:23       ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-06  3:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On 06/05/2013 06:59 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 05, 2013 at 02:36:31PM +0800, Jason Wang wrote:
>> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
>> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
>> to simplify the userspace to manage the queues. This is done through introducing
>> a linked list to track all taps while using vlan->taps array to only track
>> active taps.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/net/macvtap.c      |  133 +++++++++++++++++++++++++++++++++++++++-----
>>  include/linux/if_macvlan.h |    4 +
>>  2 files changed, 122 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 14764cc..355e6ad 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -45,6 +45,8 @@ struct macvtap_queue {
>>  	struct file *file;
>>  	unsigned int flags;
>>  	u16 queue_index;
>> +	bool enabled;
>> +	struct list_head next;
>>  };
>>  
>>  static struct proto macvtap_proto = {
>> @@ -85,14 +87,37 @@ static const struct proto_ops macvtap_socket_ops;
>>   */
>>  static DEFINE_SPINLOCK(macvtap_lock);
>>  
>> -static int macvtap_set_queue(struct net_device *dev, struct file *file,
>> +static int macvtap_enable_queue(struct net_device *dev, struct file *file,
>>  				struct macvtap_queue *q)
>>  {
>>  	struct macvlan_dev *vlan = netdev_priv(dev);
>> +	int err = -EINVAL;
>> +
>> +	spin_lock(&macvtap_lock);
>> +
>> +	if (q->enabled)
>> +		goto out;
>> +
>> +	err = 0;
>> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>> +	q->queue_index = vlan->numvtaps;
>> +	q->enabled = true;
>> +
>> +	vlan->numvtaps++;
>> +	vlan->numdisabled--;
>> +out:
>> +	spin_unlock(&macvtap_lock);
>> +	return err;
>> +}
>> +
>> +static int macvtap_set_queue(struct net_device *dev, struct file *file,
>> +			     struct macvtap_queue *q)
>> +{
>> +	struct macvlan_dev *vlan = netdev_priv(dev);
>>  	int err = -EBUSY;
>>  
>>  	spin_lock(&macvtap_lock);
>> -	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
>> +	if (vlan->numvtaps + vlan->numdisabled == MAX_MACVTAP_QUEUES)
>>  		goto out;
>>  
>>  	err = 0;
>> @@ -102,7 +127,9 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>>  
>>  	q->file = file;
>>  	q->queue_index = vlan->numvtaps;
>> +	q->enabled = true;
>>  	file->private_data = q;
>> +	list_add_tail(&q->next, &vlan->tap_link);
>>  
>>  	vlan->numvtaps++;
>>  
>> @@ -111,6 +138,38 @@ out:
>>  	return err;
>>  }
>>  
>> +static int macvtap_disable_queue(struct macvtap_queue *q)
>> +{
>> +	struct macvlan_dev *vlan;
>> +	struct macvtap_queue *nq;
>> +	int err = 0;
>> +
>> +	spin_lock(&macvtap_lock);
>> +	vlan = rcu_dereference_protected(q->vlan,
>> +					 lockdep_is_held(&macvtap_lock));
>> +
>> +	if (!q->enabled) {
>> +		err = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	if (vlan) {
>> +		int index = q->queue_index;
>> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
>> +					       lockdep_is_held(&macvtap_lock));
>> +		nq->queue_index = index;
>> +
>> +		rcu_assign_pointer(vlan->taps[index], nq);
>> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
>> +		q->enabled = false;
>> +
>> +		vlan->numvtaps--;
>> +		vlan->numdisabled++;
>> +	}
>> +out:
>> +	spin_unlock(&macvtap_lock);
>> +	return err;
>> +}
>>  /*
>>   * The file owning the queue got closed, give up both
>>   * the reference that the files holds as well as the
>> @@ -128,18 +187,24 @@ static void macvtap_put_queue(struct macvtap_queue *q)
>>  	vlan = rcu_dereference_protected(q->vlan,
>>  					 lockdep_is_held(&macvtap_lock));
>>  	if (vlan) {
>> +		int numvtaps = vlan->numvtaps;
>>  		int index = q->queue_index;
>> -		BUG_ON(index >= vlan->numvtaps);
>>  
>> -		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
>> -					       lockdep_is_held(&macvtap_lock));
>> -		rcu_assign_pointer(vlan->taps[index], nq);
>> -		nq->queue_index = index;
>> +		if (q->enabled) {
>> +			BUG_ON(index >= vlan->numvtaps);
>> +			nq = rcu_dereference_protected(vlan->taps[numvtaps - 1],
>> +						lockdep_is_held(&macvtap_lock));
>> +			rcu_assign_pointer(vlan->taps[index], nq);
> Do we really need these tricks?
> Can't we call macvtap_disable_queue and then only handle disable queues
> here?

We could, will do it.
>
>> +			nq->queue_index = index;
>> +
>> +			RCU_INIT_POINTER(vlan->taps[numvtaps - 1], NULL);
>> +			vlan->numvtaps--;
>> +		} else
>> +			vlan->numdisabled--;
>>  
>> -		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
>>  		RCU_INIT_POINTER(q->vlan, NULL);
>>  		sock_put(&q->sk);
>> -		--vlan->numvtaps;
>> +		list_del_init(&q->next);
>>  	}
>>  
>>  	spin_unlock(&macvtap_lock);
>> @@ -160,6 +225,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>>  {
>>  	struct macvlan_dev *vlan = netdev_priv(dev);
>>  	struct macvtap_queue *tap = NULL;
>> +	/* numvtaps here is just a hint, even if the number of active taps were
>> +	 * reduced in the same sime,
> typo

Will correct it.
>
>> since we are protected by rcu read lock,
>> +	 * we're safe here.
> I don't really understand what is this comment trying to say.
> What is protected? What is safe? safe time as what?

Will make it clear as:

"numvtaps is just a hint, even if the number of active taps were reduced
in the same tap, since the tap pointers were protected by rcu read lock,
they are safe even if some of them were being removed"

>
>> +	 */
>>  	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
>>  	__u32 rxq;
>>  
>> @@ -196,20 +265,25 @@ out:
>>  static void macvtap_del_queues(struct net_device *dev)
>>  {
>>  	struct macvlan_dev *vlan = netdev_priv(dev);
>> -	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
>> +	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
>>  	int i, j = 0;
>>  
>>  	spin_lock(&macvtap_lock);
>> -	for (i = 0; i < vlan->numvtaps; i++) {
>> -		q = rcu_dereference_protected(vlan->taps[i],
>> -					      lockdep_is_held(&macvtap_lock));
>> -		BUG_ON(q == NULL);
>> +	list_for_each_entry_safe(q, tmp, &vlan->tap_link, next) {
>> +		list_del_init(&q->next);
>>  		qlist[j++] = q;
>> -		RCU_INIT_POINTER(vlan->taps[i], NULL);
>>  		RCU_INIT_POINTER(q->vlan, NULL);
>> +		if (q->enabled)
>> +			vlan->numvtaps--;
>> +		else
>> +			vlan->numdisabled--;
>>  	}
>> +	for (i = 0; i < vlan->numvtaps; i++)
>> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
>> +	BUG_ON(vlan->numvtaps + vlan->numdisabled != 0);
> Better two BUG_ON checks - they are both 0 right?
> And no need for != 0.

Sure.
>
>>  	/* guarantee that any future macvtap_set_queue will fail */
>>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
>> +	vlan->numdisabled = 0;
> What's this doing? We know it's 0 ...

Yes, will remove it.
>>  	spin_unlock(&macvtap_lock);
>>  
>>  	synchronize_rcu();
>> @@ -298,6 +372,9 @@ static int macvtap_newlink(struct net *src_net,
>>  			   struct nlattr *tb[],
>>  			   struct nlattr *data[])
>>  {
>> +	struct macvlan_dev *vlan = netdev_priv(dev);
>> +	INIT_LIST_HEAD(&vlan->tap_link);
>> +
>>  	/* Don't put anything that may fail after macvlan_common_newlink
>>  	 * because we can't undo what it does.
>>  	 */
>> @@ -926,6 +1003,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>>  	return 0;
>>  }
>>  
>> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
>> +{
>> +	struct macvtap_queue *q = file->private_data;
>> +	struct macvlan_dev *vlan;
>> +	int ret = -EINVAL;
>> +
>> +	vlan = macvtap_get_vlan(q);
>> +	if (!vlan)
>> +		goto done;
> Can be just return -EINVAL, and you won't need to initialize
> above.

Ok.
>
>> +
>> +	if (flags & IFF_ATTACH_QUEUE)
>> +		ret = macvtap_enable_queue(vlan->dev, file, q);
>> +	else if (flags & IFF_DETACH_QUEUE)
>> +		ret = macvtap_disable_queue(q);
>> +
>> +	macvtap_put_vlan(vlan);
>> +
>> +done:
>> +	return ret;
>> +}
>> +
>>  /*
>>   * provide compatibility with generic tun/tap interface
>>   */
>> @@ -958,6 +1056,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>>  		macvtap_put_vlan(vlan);
>>  		return ret;
>>  
>> +	case TUNSETQUEUE:
>> +		if (get_user(u, &ifr->ifr_flags))
>> +			return -EFAULT;
>> +		return macvtap_ioctl_set_queue(file, u);
>> +
>>  	case TUNGETFEATURES:
>>  		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
>>  			return -EFAULT;
>> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
>> index 62d8bda..87b9091 100644
>> --- a/include/linux/if_macvlan.h
>> +++ b/include/linux/if_macvlan.h
>> @@ -69,8 +69,12 @@ struct macvlan_dev {
>>  	u16			flags;
>>  	int (*receive)(struct sk_buff *skb);
>>  	int (*forward)(struct net_device *dev, struct sk_buff *skb);
>> +	/* This array tracks active taps. */
>>  	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
>> +	/* This list tracks all taps (both enabled and disabled) */
>> +	struct list_head	tap_link;
> Bad name, e.g. queue_list would be better.

Ok.
>>  	int			numvtaps;
>> +	int			numdisabled;
> So numvtaps is number of entries in the taps array.
> But to get items on list, you must add numdisabled and
> numvtaps, that's confusing.
> Instead of numdisabled, let's add a counter for
> number of entries in the tap_link list.
>
> How about numqueues?

This sounds better.
>>  	int			minor;
>>  };
>>  
>> -- 
>> 1.7.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-06  3:13     ` Jason Wang
@ 2013-06-06  6:59       ` Michael S. Tsirkin
  2013-06-06  7:12         ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06  6:59 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 11:13:29AM +0800, Jason Wang wrote:
> On 06/05/2013 06:43 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 05, 2013 at 02:36:30PM +0800, Jason Wang wrote:
> >> Though the queue were in fact created by open(), we still need to add this check
> >> to be compatible with tuntap which can let mgmt software use a single API to
> >> manage queues. This patch only validates the device name and moves the TUNSETIFF
> >> to a helper.
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > The patch is OK, the description is confusing.
> > What you mean is simply:
> >
> > 	Allow IFF_MULTI_QUEUE in TUNSETIFF for macvtap, to match
> > 	tun behaviour.
> >
> > And if you put it like this, I would say make this
> > the last patch in the series, so userspace
> > can use IFF_MULTI_QUEUE to detect new versus old
> > behaviour.
> 
> Make sense, thanks.
> >
> >> ---
> >>  drivers/net/macvtap.c |   51 ++++++++++++++++++++++++++++++++++++++----------
> >>  1 files changed, 40 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> >> index 5ccba99..14764cc 100644
> >> --- a/drivers/net/macvtap.c
> >> +++ b/drivers/net/macvtap.c
> >> @@ -869,6 +869,7 @@ out:
> >>  	return ret;
> >>  }
> >>  
> >> +
> >>  static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
> >>  {
> >>  	struct macvlan_dev *vlan;
> > Please don't.
> 
> Ok.
> >
> >> @@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
> >>  	dev_put(vlan->dev);
> >>  }
> >>  
> >> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
> >> +{
> >> +	struct macvtap_queue *q = file->private_data;
> >> +	struct net *net = current->nsproxy->net_ns;
> >> +	struct inode *inode = file_inode(file);
> >> +	struct net_device *dev, *dev2;
> >> +	struct ifreq ifr;
> >> +
> >> +	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
> >> +		return -EFAULT;
> >> +
> >> +	/* To keep the same behavior of tuntap, validate ifr_name */
> > So I'm not sure - why is it important to validate ifr_name here?
> > We ignore the name for all other flags - why is IFF_MULTI_QUEUE
> > special?
> 
> It raises another question, why not validate ifname like tuntap? We
> should warn userspace about their error, otherwise they may create
> queues on the wrong device. In fact I want validate for both, but keep
> the behaviour w/o IFF_MULTI_QUEUE for backward compatibility.

Basically macvtap ignores ifr_name because it doesn't need it.
Making it ignore it without IFF_MULTI_QUEUE but
not with IFF_MULTI_QUEUE seems ugly.

Do you think we'll need ifr_name at some point?
Why not validate then, when we actually do?


> >
> >> +	if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
> >> +		dev = __dev_get_by_name(net, ifr.ifr_name);
> >> +		if (!dev)
> >> +			return -EINVAL;
> >> +
> >> +		dev2 = dev_get_by_macvtap_minor(iminor(inode));
> >> +		if (!dev2)
> >> +			return -EINVAL;
> >> +
> >> +		if (dev != dev2) {
> >> +			dev_put(dev2);
> >> +			return -EINVAL;
> >> +		}
> >> +
> >> +		dev_put(dev2);
> >> +	}
> >> +
> >> +	if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
> >> +	    (IFF_NO_PI | IFF_TAP))
> >> +		return -EINVAL;
> >> +	else
> >> +		q->flags = ifr.ifr_flags;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  /*
> >>   * provide compatibility with generic tun/tap interface
> >>   */
> >> @@ -905,17 +944,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
> >>  
> >>  	switch (cmd) {
> >>  	case TUNSETIFF:
> >> -		/* ignore the name, just look at flags */
> > This is actually a useful comment that you've removed.
> 
> Will get it back.
> >
> >> -		if (get_user(u, &ifr->ifr_flags))
> >> -			return -EFAULT;
> >> -
> >> -		ret = 0;
> >> -		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
> >> -			ret = -EINVAL;
> >> -		else
> >> -			q->flags = u;
> >> -
> >> -		return ret;
> >> +		return macvtap_set_iff(file, ifr);
> >>  
> >>  	case TUNGETIFF:
> >>  		vlan = macvtap_get_vlan(q);
> >> -- 
> >> 1.7.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-06  6:59       ` Michael S. Tsirkin
@ 2013-06-06  7:12         ` Jason Wang
  2013-06-06  7:26           ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2013-06-06  7:12 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On 06/06/2013 02:59 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 06, 2013 at 11:13:29AM +0800, Jason Wang wrote:
>> On 06/05/2013 06:43 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jun 05, 2013 at 02:36:30PM +0800, Jason Wang wrote:
>>>> Though the queue were in fact created by open(), we still need to add this check
>>>> to be compatible with tuntap which can let mgmt software use a single API to
>>>> manage queues. This patch only validates the device name and moves the TUNSETIFF
>>>> to a helper.
>>>>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> The patch is OK, the description is confusing.
>>> What you mean is simply:
>>>
>>> 	Allow IFF_MULTI_QUEUE in TUNSETIFF for macvtap, to match
>>> 	tun behaviour.
>>>
>>> And if you put it like this, I would say make this
>>> the last patch in the series, so userspace
>>> can use IFF_MULTI_QUEUE to detect new versus old
>>> behaviour.

[...]
>>>> @@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>>>>  	dev_put(vlan->dev);
>>>>  }
>>>>  
>>>> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>>>> +{
>>>> +	struct macvtap_queue *q = file->private_data;
>>>> +	struct net *net = current->nsproxy->net_ns;
>>>> +	struct inode *inode = file_inode(file);
>>>> +	struct net_device *dev, *dev2;
>>>> +	struct ifreq ifr;
>>>> +
>>>> +	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
>>>> +		return -EFAULT;
>>>> +
>>>> +	/* To keep the same behavior of tuntap, validate ifr_name */
>>> So I'm not sure - why is it important to validate ifr_name here?
>>> We ignore the name for all other flags - why is IFF_MULTI_QUEUE
>>> special?
>> It raises another question, why not validate ifname like tuntap? We
>> should warn userspace about their error, otherwise they may create
>> queues on the wrong device. In fact I want validate for both, but keep
>> the behaviour w/o IFF_MULTI_QUEUE for backward compatibility.
> Basically macvtap ignores ifr_name because it doesn't need it.
> Making it ignore it without IFF_MULTI_QUEUE but
> not with IFF_MULTI_QUEUE seems ugly.
>
> Do you think we'll need ifr_name at some point?
> Why not validate then, when we actually do?
>
>
>

If we want to be more compatible with tuntap to simplify userspace codes.

E.g: There's a userspace who want to create both taps and macvtaps using
the same codes. For tuntap, we can let kernel name the device, so
creating a mq device looks like:

open()
tunsetiff()
if_name = tungetiff()
tunsetiff(if_name)
...
tunsetiff(if_name)

For tuntap, if we specifies a wrong ifr_name, kernel will complains.
We'd better do the same for macvtap.

[...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl
  2013-06-06  3:16     ` Jason Wang
@ 2013-06-06  7:23       ` Michael S. Tsirkin
  2013-06-06  7:30         ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06  7:23 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 11:16:51AM +0800, Jason Wang wrote:
> On 06/05/2013 06:59 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 05, 2013 at 02:36:31PM +0800, Jason Wang wrote:
> >> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
> >> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
> >> to simplify the userspace to manage the queues. This is done through introducing
> >> a linked list to track all taps while using vlan->taps array to only track
> >> active taps.
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  drivers/net/macvtap.c      |  133 +++++++++++++++++++++++++++++++++++++++-----
> >>  include/linux/if_macvlan.h |    4 +
> >>  2 files changed, 122 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> >> index 14764cc..355e6ad 100644
> >> --- a/drivers/net/macvtap.c
> >> +++ b/drivers/net/macvtap.c
> >> @@ -45,6 +45,8 @@ struct macvtap_queue {
> >>  	struct file *file;
> >>  	unsigned int flags;
> >>  	u16 queue_index;
> >> +	bool enabled;
> >> +	struct list_head next;
> >>  };
> >>  
> >>  static struct proto macvtap_proto = {
> >> @@ -85,14 +87,37 @@ static const struct proto_ops macvtap_socket_ops;
> >>   */
> >>  static DEFINE_SPINLOCK(macvtap_lock);
> >>  
> >> -static int macvtap_set_queue(struct net_device *dev, struct file *file,
> >> +static int macvtap_enable_queue(struct net_device *dev, struct file *file,
> >>  				struct macvtap_queue *q)
> >>  {
> >>  	struct macvlan_dev *vlan = netdev_priv(dev);
> >> +	int err = -EINVAL;
> >> +
> >> +	spin_lock(&macvtap_lock);
> >> +
> >> +	if (q->enabled)
> >> +		goto out;
> >> +
> >> +	err = 0;
> >> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
> >> +	q->queue_index = vlan->numvtaps;
> >> +	q->enabled = true;
> >> +
> >> +	vlan->numvtaps++;
> >> +	vlan->numdisabled--;
> >> +out:
> >> +	spin_unlock(&macvtap_lock);
> >> +	return err;
> >> +}
> >> +
> >> +static int macvtap_set_queue(struct net_device *dev, struct file *file,
> >> +			     struct macvtap_queue *q)
> >> +{
> >> +	struct macvlan_dev *vlan = netdev_priv(dev);
> >>  	int err = -EBUSY;
> >>  
> >>  	spin_lock(&macvtap_lock);
> >> -	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
> >> +	if (vlan->numvtaps + vlan->numdisabled == MAX_MACVTAP_QUEUES)
> >>  		goto out;
> >>  
> >>  	err = 0;
> >> @@ -102,7 +127,9 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
> >>  
> >>  	q->file = file;
> >>  	q->queue_index = vlan->numvtaps;
> >> +	q->enabled = true;
> >>  	file->private_data = q;
> >> +	list_add_tail(&q->next, &vlan->tap_link);
> >>  
> >>  	vlan->numvtaps++;
> >>  
> >> @@ -111,6 +138,38 @@ out:
> >>  	return err;
> >>  }
> >>  
> >> +static int macvtap_disable_queue(struct macvtap_queue *q)
> >> +{
> >> +	struct macvlan_dev *vlan;
> >> +	struct macvtap_queue *nq;
> >> +	int err = 0;
> >> +
> >> +	spin_lock(&macvtap_lock);
> >> +	vlan = rcu_dereference_protected(q->vlan,
> >> +					 lockdep_is_held(&macvtap_lock));
> >> +
> >> +	if (!q->enabled) {
> >> +		err = -EINVAL;
> >> +		goto out;
> >> +	}
> >> +
> >> +	if (vlan) {
> >> +		int index = q->queue_index;
> >> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> >> +					       lockdep_is_held(&macvtap_lock));
> >> +		nq->queue_index = index;
> >> +
> >> +		rcu_assign_pointer(vlan->taps[index], nq);
> >> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> >> +		q->enabled = false;
> >> +
> >> +		vlan->numvtaps--;
> >> +		vlan->numdisabled++;
> >> +	}
> >> +out:
> >> +	spin_unlock(&macvtap_lock);
> >> +	return err;
> >> +}
> >>  /*
> >>   * The file owning the queue got closed, give up both
> >>   * the reference that the files holds as well as the
> >> @@ -128,18 +187,24 @@ static void macvtap_put_queue(struct macvtap_queue *q)
> >>  	vlan = rcu_dereference_protected(q->vlan,
> >>  					 lockdep_is_held(&macvtap_lock));
> >>  	if (vlan) {
> >> +		int numvtaps = vlan->numvtaps;
> >>  		int index = q->queue_index;
> >> -		BUG_ON(index >= vlan->numvtaps);
> >>  
> >> -		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> >> -					       lockdep_is_held(&macvtap_lock));
> >> -		rcu_assign_pointer(vlan->taps[index], nq);
> >> -		nq->queue_index = index;
> >> +		if (q->enabled) {
> >> +			BUG_ON(index >= vlan->numvtaps);
> >> +			nq = rcu_dereference_protected(vlan->taps[numvtaps - 1],
> >> +						lockdep_is_held(&macvtap_lock));
> >> +			rcu_assign_pointer(vlan->taps[index], nq);
> > Do we really need these tricks?
> > Can't we call macvtap_disable_queue and then only handle disable queues
> > here?
> 
> We could, will do it.
> >
> >> +			nq->queue_index = index;
> >> +
> >> +			RCU_INIT_POINTER(vlan->taps[numvtaps - 1], NULL);
> >> +			vlan->numvtaps--;
> >> +		} else
> >> +			vlan->numdisabled--;
> >>  
> >> -		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> >>  		RCU_INIT_POINTER(q->vlan, NULL);
> >>  		sock_put(&q->sk);
> >> -		--vlan->numvtaps;
> >> +		list_del_init(&q->next);
> >>  	}
> >>  
> >>  	spin_unlock(&macvtap_lock);
> >> @@ -160,6 +225,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
> >>  {
> >>  	struct macvlan_dev *vlan = netdev_priv(dev);
> >>  	struct macvtap_queue *tap = NULL;
> >> +	/* numvtaps here is just a hint, even if the number of active taps were
> >> +	 * reduced in the same sime,
> > typo
> 
> Will correct it.
> >
> >> since we are protected by rcu read lock,
> >> +	 * we're safe here.
> > I don't really understand what is this comment trying to say.
> > What is protected? What is safe? safe time as what?
> 
> Will make it clear as:
> 
> "numvtaps is just a hint, even if the number of active taps were reduced
> in the same tap, since the tap pointers were protected by rcu read lock,
> they are safe even if some of them were being removed"


I think I see.  I think what you really meant is two things:
- near macvtap_forward:
 /* called under rcu read lock */
- and here: 
/*
 * Access to taps array is protected by rcu, but access to numvtaps
 * isn't. Below we use it to lookup a queue, but treat it as a hint
 * and validate that the result isn't NULL - in case we are
 * racing against queue removal.
 */







> >> +	 */
> >>  	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
> >>  	__u32 rxq;
> >>  
> >> @@ -196,20 +265,25 @@ out:
> >>  static void macvtap_del_queues(struct net_device *dev)
> >>  {
> >>  	struct macvlan_dev *vlan = netdev_priv(dev);
> >> -	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
> >> +	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
> >>  	int i, j = 0;
> >>  
> >>  	spin_lock(&macvtap_lock);
> >> -	for (i = 0; i < vlan->numvtaps; i++) {
> >> -		q = rcu_dereference_protected(vlan->taps[i],
> >> -					      lockdep_is_held(&macvtap_lock));
> >> -		BUG_ON(q == NULL);
> >> +	list_for_each_entry_safe(q, tmp, &vlan->tap_link, next) {
> >> +		list_del_init(&q->next);
> >>  		qlist[j++] = q;
> >> -		RCU_INIT_POINTER(vlan->taps[i], NULL);
> >>  		RCU_INIT_POINTER(q->vlan, NULL);
> >> +		if (q->enabled)
> >> +			vlan->numvtaps--;
> >> +		else
> >> +			vlan->numdisabled--;
> >>  	}
> >> +	for (i = 0; i < vlan->numvtaps; i++)
> >> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
> >> +	BUG_ON(vlan->numvtaps + vlan->numdisabled != 0);
> > Better two BUG_ON checks - they are both 0 right?
> > And no need for != 0.
> 
> Sure.
> >
> >>  	/* guarantee that any future macvtap_set_queue will fail */
> >>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
> >> +	vlan->numdisabled = 0;
> > What's this doing? We know it's 0 ...
> 
> Yes, will remove it.
> >>  	spin_unlock(&macvtap_lock);
> >>  
> >>  	synchronize_rcu();
> >> @@ -298,6 +372,9 @@ static int macvtap_newlink(struct net *src_net,
> >>  			   struct nlattr *tb[],
> >>  			   struct nlattr *data[])
> >>  {
> >> +	struct macvlan_dev *vlan = netdev_priv(dev);
> >> +	INIT_LIST_HEAD(&vlan->tap_link);
> >> +
> >>  	/* Don't put anything that may fail after macvlan_common_newlink
> >>  	 * because we can't undo what it does.
> >>  	 */
> >> @@ -926,6 +1003,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
> >>  	return 0;
> >>  }
> >>  
> >> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
> >> +{
> >> +	struct macvtap_queue *q = file->private_data;
> >> +	struct macvlan_dev *vlan;
> >> +	int ret = -EINVAL;
> >> +
> >> +	vlan = macvtap_get_vlan(q);
> >> +	if (!vlan)
> >> +		goto done;
> > Can be just return -EINVAL, and you won't need to initialize
> > above.
> 
> Ok.
> >
> >> +
> >> +	if (flags & IFF_ATTACH_QUEUE)
> >> +		ret = macvtap_enable_queue(vlan->dev, file, q);
> >> +	else if (flags & IFF_DETACH_QUEUE)
> >> +		ret = macvtap_disable_queue(q);
> >> +
> >> +	macvtap_put_vlan(vlan);
> >> +
> >> +done:
> >> +	return ret;
> >> +}
> >> +
> >>  /*
> >>   * provide compatibility with generic tun/tap interface
> >>   */
> >> @@ -958,6 +1056,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
> >>  		macvtap_put_vlan(vlan);
> >>  		return ret;
> >>  
> >> +	case TUNSETQUEUE:
> >> +		if (get_user(u, &ifr->ifr_flags))
> >> +			return -EFAULT;
> >> +		return macvtap_ioctl_set_queue(file, u);
> >> +
> >>  	case TUNGETFEATURES:
> >>  		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
> >>  			return -EFAULT;
> >> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> >> index 62d8bda..87b9091 100644
> >> --- a/include/linux/if_macvlan.h
> >> +++ b/include/linux/if_macvlan.h
> >> @@ -69,8 +69,12 @@ struct macvlan_dev {
> >>  	u16			flags;
> >>  	int (*receive)(struct sk_buff *skb);
> >>  	int (*forward)(struct net_device *dev, struct sk_buff *skb);
> >> +	/* This array tracks active taps. */
> >>  	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
> >> +	/* This list tracks all taps (both enabled and disabled) */
> >> +	struct list_head	tap_link;
> > Bad name, e.g. queue_list would be better.
> 
> Ok.
> >>  	int			numvtaps;
> >> +	int			numdisabled;
> > So numvtaps is number of entries in the taps array.
> > But to get items on list, you must add numdisabled and
> > numvtaps, that's confusing.
> > Instead of numdisabled, let's add a counter for
> > number of entries in the tap_link list.
> >
> > How about numqueues?
> 
> This sounds better.
> >>  	int			minor;
> >>  };
> >>  
> >> -- 
> >> 1.7.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-06  7:12         ` Jason Wang
@ 2013-06-06  7:26           ` Michael S. Tsirkin
  2013-06-06  7:31             ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06  7:26 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 03:12:52PM +0800, Jason Wang wrote:
> On 06/06/2013 02:59 PM, Michael S. Tsirkin wrote:
> > On Thu, Jun 06, 2013 at 11:13:29AM +0800, Jason Wang wrote:
> >> On 06/05/2013 06:43 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Jun 05, 2013 at 02:36:30PM +0800, Jason Wang wrote:
> >>>> Though the queue were in fact created by open(), we still need to add this check
> >>>> to be compatible with tuntap which can let mgmt software use a single API to
> >>>> manage queues. This patch only validates the device name and moves the TUNSETIFF
> >>>> to a helper.
> >>>>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> The patch is OK, the description is confusing.
> >>> What you mean is simply:
> >>>
> >>> 	Allow IFF_MULTI_QUEUE in TUNSETIFF for macvtap, to match
> >>> 	tun behaviour.
> >>>
> >>> And if you put it like this, I would say make this
> >>> the last patch in the series, so userspace
> >>> can use IFF_MULTI_QUEUE to detect new versus old
> >>> behaviour.
> 
> [...]
> >>>> @@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
> >>>>  	dev_put(vlan->dev);
> >>>>  }
> >>>>  
> >>>> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
> >>>> +{
> >>>> +	struct macvtap_queue *q = file->private_data;
> >>>> +	struct net *net = current->nsproxy->net_ns;
> >>>> +	struct inode *inode = file_inode(file);
> >>>> +	struct net_device *dev, *dev2;
> >>>> +	struct ifreq ifr;
> >>>> +
> >>>> +	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
> >>>> +		return -EFAULT;
> >>>> +
> >>>> +	/* To keep the same behavior of tuntap, validate ifr_name */
> >>> So I'm not sure - why is it important to validate ifr_name here?
> >>> We ignore the name for all other flags - why is IFF_MULTI_QUEUE
> >>> special?
> >> It raises another question, why not validate ifname like tuntap? We
> >> should warn userspace about their error, otherwise they may create
> >> queues on the wrong device. In fact I want validate for both, but keep
> >> the behaviour w/o IFF_MULTI_QUEUE for backward compatibility.
> > Basically macvtap ignores ifr_name because it doesn't need it.
> > Making it ignore it without IFF_MULTI_QUEUE but
> > not with IFF_MULTI_QUEUE seems ugly.
> >
> > Do you think we'll need ifr_name at some point?
> > Why not validate then, when we actually do?
> >
> >
> >
> 
> If we want to be more compatible with tuntap to simplify userspace codes.
> 
> E.g: There's a userspace who want to create both taps and macvtaps using
> the same codes. For tuntap, we can let kernel name the device, so
> creating a mq device looks like:
> 
> open()
> tunsetiff()
> if_name = tungetiff()
> tunsetiff(if_name)
> ...
> tunsetiff(if_name)
> 
> For tuntap, if we specifies a wrong ifr_name, kernel will complains.
> We'd better do the same for macvtap.
> 
> [...]

I don't think we need to worry about returning same error to buggy
applications. Maybe it would make sense if we did it like this
the first time around, or maybe it won't, but adding
inconsistency between macvtap interfaces is even worse.

-- 
MST

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl
  2013-06-06  7:23       ` Michael S. Tsirkin
@ 2013-06-06  7:30         ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-06  7:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On 06/06/2013 03:23 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 06, 2013 at 11:16:51AM +0800, Jason Wang wrote:
>> > On 06/05/2013 06:59 PM, Michael S. Tsirkin wrote:
>>> > > On Wed, Jun 05, 2013 at 02:36:31PM +0800, Jason Wang wrote:
>>>> > >> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
>>>> > >> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
>>>> > >> to simplify the userspace to manage the queues. This is done through introducing
>>>> > >> a linked list to track all taps while using vlan->taps array to only track
>>>> > >> active taps.
>>>> > >>
>>>> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> > >> ---
>>>> > >>  drivers/net/macvtap.c      |  133 +++++++++++++++++++++++++++++++++++++++-----
>>>> > >>  include/linux/if_macvlan.h |    4 +
>>>> > >>  2 files changed, 122 insertions(+), 15 deletions(-)
>>>> > >>
>>>> > >> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>>>> > >> index 14764cc..355e6ad 100644
>>>> > >> --- a/drivers/net/macvtap.c
>>>> > >> +++ b/drivers/net/macvtap.c

[...]
>>>>
>>> > >
>>>> > >> since we are protected by rcu read lock,
>>>> > >> +	 * we're safe here.
>>> > > I don't really understand what is this comment trying to say.
>>> > > What is protected? What is safe? safe time as what?
>> > 
>> > Will make it clear as:
>> > 
>> > "numvtaps is just a hint, even if the number of active taps were reduced
>> > in the same tap, since the tap pointers were protected by rcu read lock,
>> > they are safe even if some of them were being removed"
> I think I see.  I think what you really meant is two things:
> - near macvtap_forward:
>  /* called under rcu read lock */
> - and here: 
> /*
>  * Access to taps array is protected by rcu, but access to numvtaps
>  * isn't. Below we use it to lookup a queue, but treat it as a hint
>  * and validate that the result isn't NULL - in case we are
>  * racing against queue removal.
>  */

Yes, will correct the comments.

Thanks

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device
  2013-06-06  7:26           ` Michael S. Tsirkin
@ 2013-06-06  7:31             ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2013-06-06  7:31 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On 06/06/2013 03:26 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 06, 2013 at 03:12:52PM +0800, Jason Wang wrote:
>> On 06/06/2013 02:59 PM, Michael S. Tsirkin wrote:
>>> On Thu, Jun 06, 2013 at 11:13:29AM +0800, Jason Wang wrote:
>>>> On 06/05/2013 06:43 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Jun 05, 2013 at 02:36:30PM +0800, Jason Wang wrote:
>>>>>> Though the queue were in fact created by open(), we still need to add this check
>>>>>> to be compatible with tuntap which can let mgmt software use a single API to
>>>>>> manage queues. This patch only validates the device name and moves the TUNSETIFF
>>>>>> to a helper.
>>>>>>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> The patch is OK, the description is confusing.
>>>>> What you mean is simply:
>>>>>
>>>>> 	Allow IFF_MULTI_QUEUE in TUNSETIFF for macvtap, to match
>>>>> 	tun behaviour.
>>>>>
>>>>> And if you put it like this, I would say make this
>>>>> the last patch in the series, so userspace
>>>>> can use IFF_MULTI_QUEUE to detect new versus old
>>>>> behaviour.
>> [...]
>>>>>> @@ -887,6 +888,44 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>>>>>>  	dev_put(vlan->dev);
>>>>>>  }
>>>>>>  
>>>>>> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>>>>>> +{
>>>>>> +	struct macvtap_queue *q = file->private_data;
>>>>>> +	struct net *net = current->nsproxy->net_ns;
>>>>>> +	struct inode *inode = file_inode(file);
>>>>>> +	struct net_device *dev, *dev2;
>>>>>> +	struct ifreq ifr;
>>>>>> +
>>>>>> +	if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
>>>>>> +		return -EFAULT;
>>>>>> +
>>>>>> +	/* To keep the same behavior of tuntap, validate ifr_name */
>>>>> So I'm not sure - why is it important to validate ifr_name here?
>>>>> We ignore the name for all other flags - why is IFF_MULTI_QUEUE
>>>>> special?
>>>> It raises another question, why not validate ifname like tuntap? We
>>>> should warn userspace about their error, otherwise they may create
>>>> queues on the wrong device. In fact I want validate for both, but keep
>>>> the behaviour w/o IFF_MULTI_QUEUE for backward compatibility.
>>> Basically macvtap ignores ifr_name because it doesn't need it.
>>> Making it ignore it without IFF_MULTI_QUEUE but
>>> not with IFF_MULTI_QUEUE seems ugly.
>>>
>>> Do you think we'll need ifr_name at some point?
>>> Why not validate then, when we actually do?
>>>
>>>
>>>
>> If we want to be more compatible with tuntap to simplify userspace codes.
>>
>> E.g: There's a userspace who want to create both taps and macvtaps using
>> the same codes. For tuntap, we can let kernel name the device, so
>> creating a mq device looks like:
>>
>> open()
>> tunsetiff()
>> if_name = tungetiff()
>> tunsetiff(if_name)
>> ...
>> tunsetiff(if_name)
>>
>> For tuntap, if we specifies a wrong ifr_name, kernel will complains.
>> We'd better do the same for macvtap.
>>
>> [...]
> I don't think we need to worry about returning same error to buggy
> applications. Maybe it would make sense if we did it like this
> the first time around, or maybe it won't, but adding
> inconsistency between macvtap interfaces is even worse.
>

Ok, I will drop this patch in next version.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-06-06  7:32 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-05  6:36 [net-next rfc V3 0/9] Multiqueue API for macvtap Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 1/9] macvtap: fix a possible race between queue selection and changing queues Jason Wang
2013-06-05 10:35   ` Michael S. Tsirkin
2013-06-05  6:36 ` [net-next rfc V3 2/9] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
2013-06-05 11:07   ` Michael S. Tsirkin
2013-06-05  6:36 ` [net-next rfc V3 3/9] macvlan: switch to use IS_ENABLED() Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 4/9] macvtap: introduce macvtap_get_vlan() Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 5/9] macvlan: change the max number of queues to 16 Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 6/9] macvtap: eliminate linear search Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 7/9] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
2013-06-05 10:43   ` Michael S. Tsirkin
2013-06-06  3:13     ` Jason Wang
2013-06-06  6:59       ` Michael S. Tsirkin
2013-06-06  7:12         ` Jason Wang
2013-06-06  7:26           ` Michael S. Tsirkin
2013-06-06  7:31             ` Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 8/9] macvtap: add TUNSETQUEUE ioctl Jason Wang
2013-06-05 10:59   ` Michael S. Tsirkin
2013-06-06  3:16     ` Jason Wang
2013-06-06  7:23       ` Michael S. Tsirkin
2013-06-06  7:30         ` Jason Wang
2013-06-05  6:36 ` [net-next rfc V3 9/9] macvtap: enable multiqueue flag Jason Wang
2013-06-05 10:36 ` [net-next rfc V3 0/9] Multiqueue API for macvtap Michael S. Tsirkin
2013-06-06  3:07   ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).