* [net-next RFC 0/8] multiqueue API support for macvtap
@ 2013-05-23 3:12 Jason Wang
2013-05-23 3:12 ` [net-next RFC 1/8] macvlan: switch to use IS_ENABLED() Jason Wang
` (8 more replies)
0 siblings, 9 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Hi all:
This series implements a fully tuntap compatiable API which could be used by
userspace to manage multi macvtap queues. The main parts is to add TUNSETQUEUE
ioctl support for macvtap.
Patch 1 - 4 was some tuntap compatibility and misc cleanups.
Patch 5 removes the linear search in macvtap by reshuffling the macvtaps array
each time a queue is removed. After this, we could store both enabled and
disabled queues in the same array without introducing extra data structure.
Patch 6 let TUNSETIFF can create multiqueue device, nothing but some check were
added.
Patch 7 implement TUNSETQUEUE ioctl
Patch 8 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
multiqueue API is completed.
Flow caches implememtation were missed in this version, since I am doing rework
on the tuntap flow caches. Have some some stress test with both netperf and
pktgen.
Please review, thanks.
Jason Wang (8):
macvlan: switch to use IS_ENABLED()
macvtap: return -EBADFD when TUNGETIFF fails
macvtap: introduce macvtap_get_vlan()
macvlan: reduce the max number of taps to 8
macvtap: eliminate linear search
macvtap: allow TUNSETIFF to create multiqueue device
macvtap: add TUNSETQUEUE ioctl
macvtap: enable multiqueue flag
drivers/net/macvtap.c | 265 ++++++++++++++++++++++++++++++-------------
include/linux/if_macvlan.h | 11 ++-
2 files changed, 194 insertions(+), 82 deletions(-)
^ permalink raw reply [flat|nested] 23+ messages in thread
* [net-next RFC 1/8] macvlan: switch to use IS_ENABLED()
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 11:31 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails Jason Wang
` (7 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
include/linux/if_macvlan.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 84dde1d..e47ad46 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -8,7 +8,7 @@
#include <net/netlink.h>
#include <linux/u64_stats_sync.h>
-#if defined(CONFIG_MACVTAP) || defined(CONFIG_MACVTAP_MODULE)
+#if IS_ENABLED(CONFIG_MACVTAP)
struct socket *macvtap_get_socket(struct file *);
#else
#include <linux/err.h>
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
2013-05-23 3:12 ` [net-next RFC 1/8] macvlan: switch to use IS_ENABLED() Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 11:54 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan() Jason Wang
` (6 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Tuntap return -EBADFD when TUNGETIFF fails, we should return the same value.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/macvtap.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 59e9605..ce1c72a 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -928,7 +928,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
rcu_read_unlock_bh();
if (!vlan)
- return -ENOLINK;
+ return -EBADFD;
ret = 0;
if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan()
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
2013-05-23 3:12 ` [net-next RFC 1/8] macvlan: switch to use IS_ENABLED() Jason Wang
2013-05-23 3:12 ` [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 15:11 ` Sergei Shtylyov
2013-05-23 3:12 ` [net-next RFC 4/8] macvlan: reduce the max number of taps to 8 Jason Wang
` (5 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Factor out the device holding logic to a macvtap_get_vlan(), this will be also
used by multiqueue API.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/macvtap.c | 26 +++++++++++++++++++-------
1 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index ce1c72a..a36e49e 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -890,6 +890,23 @@ out:
return ret;
}
+static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
+{
+ struct macvlan_dev *vlan;
+ rcu_read_lock_bh();
+ vlan = rcu_dereference_bh(q->vlan);
+ if (vlan)
+ dev_hold(vlan->dev);
+ rcu_read_unlock_bh();
+
+ return vlan;
+}
+
+static void macvtap_put_vlan(struct macvlan_dev *vlan)
+{
+ dev_put(vlan->dev);
+}
+
/*
* provide compatibility with generic tun/tap interface
*/
@@ -921,12 +938,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
return ret;
case TUNGETIFF:
- rcu_read_lock_bh();
- vlan = rcu_dereference_bh(q->vlan);
- if (vlan)
- dev_hold(vlan->dev);
- rcu_read_unlock_bh();
-
+ vlan = macvtap_get_vlan(q);
if (!vlan)
return -EBADFD;
@@ -934,7 +946,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
put_user(q->flags, &ifr->ifr_flags))
ret = -EFAULT;
- dev_put(vlan->dev);
+ macvtap_put_vlan(vlan);
return ret;
case TUNGETFEATURES:
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 4/8] macvlan: reduce the max number of taps to 8
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
` (2 preceding siblings ...)
2013-05-23 3:12 ` [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 6:37 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 5/8] macvtap: eliminate linear search Jason Wang
` (4 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
To be same with tap.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
include/linux/if_macvlan.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index e47ad46..32e943a 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
* Maximum times a macvtap device can be opened. This can be used to
* configure the number of receive queue, e.g. for multiqueue virtio.
*/
-#define MAX_MACVTAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
+#define MAX_MACVTAP_QUEUES 8
#define MACVLAN_MC_FILTER_BITS 8
#define MACVLAN_MC_FILTER_SZ (1 << MACVLAN_MC_FILTER_BITS)
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 5/8] macvtap: eliminate linear search
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
` (3 preceding siblings ...)
2013-05-23 3:12 ` [net-next RFC 4/8] macvlan: reduce the max number of taps to 8 Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 11:41 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
` (3 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Linear search were used in both get_slot() and macvtap_get_queue(), this is
because:
- macvtap didn't reshuffle the array of taps when create or destroy a queue, so
when adding a new queue, macvtap must do linear search to find a location for
the new queue. This will also complicate the TUNSETQUEUE implementation for
multiqueue API.
- the queue itself didn't track the queue index, so the we must do a linear
search in the array to find the location of a existed queue.
The solution is straightforward: reshuffle the array and introduce a queue_index
to macvtap_queue.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/macvtap.c | 63 +++++++++++++++---------------------------------
1 files changed, 20 insertions(+), 43 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index a36e49e..5fd341c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -44,6 +44,7 @@ struct macvtap_queue {
struct macvlan_dev __rcu *vlan;
struct file *file;
unsigned int flags;
+ u16 queue_index;
};
static struct proto macvtap_proto = {
@@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
*/
static DEFINE_SPINLOCK(macvtap_lock);
-/*
- * get_slot: return a [unused/occupied] slot in vlan->taps[]:
- * - if 'q' is NULL, return the first empty slot;
- * - otherwise, return the slot this pointer occupies.
- */
-static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
-{
- int i;
-
- for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
- if (rcu_dereference_protected(vlan->taps[i],
- lockdep_is_held(&macvtap_lock)) == q)
- return i;
- }
-
- /* Should never happen */
- BUG_ON(1);
-}
-
static int macvtap_set_queue(struct net_device *dev, struct file *file,
struct macvtap_queue *q)
{
struct macvlan_dev *vlan = netdev_priv(dev);
- int index;
int err = -EBUSY;
spin_lock(&macvtap_lock);
@@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
goto out;
err = 0;
- index = get_slot(vlan, NULL);
rcu_assign_pointer(q->vlan, vlan);
- rcu_assign_pointer(vlan->taps[index], q);
+ rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
sock_hold(&q->sk);
q->file = file;
+ q->queue_index = vlan->numvtaps;
file->private_data = q;
vlan->numvtaps++;
@@ -140,16 +121,23 @@ out:
*/
static void macvtap_put_queue(struct macvtap_queue *q)
{
+ struct macvtap_queue *nq;
struct macvlan_dev *vlan;
spin_lock(&macvtap_lock);
vlan = rcu_dereference_protected(q->vlan,
lockdep_is_held(&macvtap_lock));
if (vlan) {
- int index = get_slot(vlan, q);
+ int index = q->queue_index;
+ BUG_ON(index >= vlan->numvtaps);
- RCU_INIT_POINTER(vlan->taps[index], NULL);
+ rcu_assign_pointer(vlan->taps[index],
+ vlan->taps[vlan->numvtaps - 1]);
RCU_INIT_POINTER(q->vlan, NULL);
+ nq = rcu_dereference_protected(vlan->taps[index],
+ lockdep_is_held(&macvtap_lock));
+ nq->queue_index = index;
+
sock_put(&q->sk);
--vlan->numvtaps;
}
@@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
rxq = skb_get_rxhash(skb);
if (rxq) {
tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
- if (tap)
- goto out;
+ goto out;
}
if (likely(skb_rx_queue_recorded(skb))) {
@@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
rxq -= numvtaps;
tap = rcu_dereference(vlan->taps[rxq]);
- if (tap)
- goto out;
- }
-
- /* Everything failed - find first available queue */
- for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
- tap = rcu_dereference(vlan->taps[rxq]);
- if (tap)
- break;
+ goto out;
}
+ tap = rcu_dereference(vlan->taps[0]);
out:
return tap;
}
@@ -221,17 +201,14 @@ static void macvtap_del_queues(struct net_device *dev)
/* macvtap_put_queue can free some slots, so go through all slots */
spin_lock(&macvtap_lock);
- for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+ for (i = 0; i < vlan->numvtaps; i++) {
q = rcu_dereference_protected(vlan->taps[i],
lockdep_is_held(&macvtap_lock));
- if (q) {
- qlist[j++] = q;
- RCU_INIT_POINTER(vlan->taps[i], NULL);
- RCU_INIT_POINTER(q->vlan, NULL);
- vlan->numvtaps--;
- }
+ BUG_ON(q == NULL);
+ qlist[j++] = q;
+ RCU_INIT_POINTER(vlan->taps[i], NULL);
+ RCU_INIT_POINTER(q->vlan, NULL);
}
- BUG_ON(vlan->numvtaps != 0);
/* guarantee that any future macvtap_set_queue will fail */
vlan->numvtaps = MAX_MACVTAP_QUEUES;
spin_unlock(&macvtap_lock);
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
` (4 preceding siblings ...)
2013-05-23 3:12 ` [net-next RFC 5/8] macvtap: eliminate linear search Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 11:45 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
` (2 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Though the queue were in fact created by open(), we still need to add this check
to be compatible with tuntap which can let mgmt software use a single API to
manage queues. This patch only validates the device name and moves the TUNSETIFF
to a helper.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/macvtap.c | 50 ++++++++++++++++++++++++++++++++++++++----------
1 files changed, 39 insertions(+), 11 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5fd341c..e3f9344 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -867,6 +867,7 @@ out:
return ret;
}
+
static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
{
struct macvlan_dev *vlan;
@@ -884,6 +885,43 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
dev_put(vlan->dev);
}
+static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
+{
+ struct macvtap_queue *q = file->private_data;
+ struct net *net = current->nsproxy->net_ns;
+ struct inode *inode = file_inode(file);
+ struct net_device *dev, *dev2;
+ struct ifreq ifr;
+
+ if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
+ return -EFAULT;
+
+ if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
+ dev = __dev_get_by_name(net, ifr.ifr_name);
+ if (!dev)
+ return -EINVAL;
+
+ dev2 = dev_get_by_macvtap_minor(iminor(inode));
+ if (!dev2)
+ return -EINVAL;
+
+ if (dev != dev2) {
+ dev_put(dev2);
+ return -EINVAL;
+ }
+
+ dev_put(dev2);
+ }
+
+ if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
+ (IFF_NO_PI | IFF_TAP))
+ return -EINVAL;
+ else
+ q->flags = ifr.ifr_flags;
+
+ return 0;
+}
+
/*
* provide compatibility with generic tun/tap interface
*/
@@ -902,17 +940,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
switch (cmd) {
case TUNSETIFF:
- /* ignore the name, just look at flags */
- if (get_user(u, &ifr->ifr_flags))
- return -EFAULT;
-
- ret = 0;
- if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
- ret = -EINVAL;
- else
- q->flags = u;
-
- return ret;
+ return macvtap_set_iff(file, ifr);
case TUNGETIFF:
vlan = macvtap_get_vlan(q);
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
` (5 preceding siblings ...)
2013-05-23 3:12 ` [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 11:52 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 8/8] macvtap: enable multiqueue flag Jason Wang
2013-05-23 11:53 ` [net-next RFC 0/8] multiqueue API support for macvtap Michael S. Tsirkin
8 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
enable a queue of macvtap. This is used to be compatible at API layer of tuntap
to simplify the userspace to manage the queues.
This is done by just introduce a pointer detached which was set when the queue
were detached. Only the queue with NULL set to this pointer can be used to
forward packets.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/macvtap.c | 147 +++++++++++++++++++++++++++++++++++--------
include/linux/if_macvlan.h | 7 ++
2 files changed, 126 insertions(+), 28 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index e3f9344..06b10a5 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -85,67 +85,131 @@ static const struct proto_ops macvtap_socket_ops;
*/
static DEFINE_SPINLOCK(macvtap_lock);
+static void macvtap_swap_slot(struct macvlan_dev *vlan, int a, int b)
+{
+ struct macvtap_queue *q1, *q2;
+
+ if (a == b)
+ return;
+
+ q1 = rcu_dereference_protected(vlan->taps[a],
+ lockdep_is_held(&macvtap_lock));
+ q2 = rcu_dereference_protected(vlan->taps[b],
+ lockdep_is_held(&macvtap_lock));
+
+ BUG_ON(q1 == NULL || q2 == NULL);
+
+ rcu_assign_pointer(vlan->taps[a], vlan->taps[b]);
+ rcu_assign_pointer(vlan->taps[b], q1);
+
+ q1->queue_index = b;
+ q2->queue_index = a;
+}
+
static int macvtap_set_queue(struct net_device *dev, struct file *file,
- struct macvtap_queue *q)
+ struct macvtap_queue *q, bool enable)
{
struct macvlan_dev *vlan = netdev_priv(dev);
int err = -EBUSY;
+ int total;
spin_lock(&macvtap_lock);
- if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
+
+ total = vlan->numvtaps + vlan->numdisabled;
+ if (!enable && total == MAX_MACVTAP_QUEUES)
+ goto out;
+
+ err = -EINVAL;
+ if (enable && q->queue_index < vlan->numvtaps)
goto out;
err = 0;
- rcu_assign_pointer(q->vlan, vlan);
- rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
- sock_hold(&q->sk);
- q->file = file;
- q->queue_index = vlan->numvtaps;
- file->private_data = q;
+ if (enable) {
+ BUG_ON(q->queue_index >= total);
+ macvtap_swap_slot(vlan, q->queue_index, vlan->numvtaps);
+ --vlan->numdisabled;
+ } else {
+ rcu_assign_pointer(q->vlan, vlan);
+ rcu_assign_pointer(vlan->taps[total], q);
+ sock_hold(&q->sk);
+
+ q->file = file;
+ q->queue_index = total;
+ file->private_data = q;
+ if (vlan->numdisabled)
+ macvtap_swap_slot(vlan, vlan->numvtaps, total);
+
+ /* Make sure the pointers were seen before numvtaps. */
+ smp_wmb();
+ }
vlan->numvtaps++;
-
out:
spin_unlock(&macvtap_lock);
return err;
}
/*
- * The file owning the queue got closed, give up both
- * the reference that the files holds as well as the
- * one from the macvlan_dev if that still exists.
+ * The file owning the queue got closed or disabled.
+ *
+ * If closed give up both the reference that the files holds as well as the
+ * one from the macvlan_dev if that still exists. If disabled, reshuffle the
+ * array of taps.
*
* Using the spinlock makes sure that we don't get
* to the queue again after destroying it.
*/
-static void macvtap_put_queue(struct macvtap_queue *q)
+static int macvtap_put_queue(struct macvtap_queue *q, bool clean)
{
- struct macvtap_queue *nq;
struct macvlan_dev *vlan;
+ int err = -EINVAL;
spin_lock(&macvtap_lock);
vlan = rcu_dereference_protected(q->vlan,
lockdep_is_held(&macvtap_lock));
+
if (vlan) {
+ int total = vlan->numvtaps + vlan->numdisabled;
int index = q->queue_index;
- BUG_ON(index >= vlan->numvtaps);
+ bool disabled = q->queue_index >= vlan->numvtaps;
- rcu_assign_pointer(vlan->taps[index],
- vlan->taps[vlan->numvtaps - 1]);
- RCU_INIT_POINTER(q->vlan, NULL);
- nq = rcu_dereference_protected(vlan->taps[index],
- lockdep_is_held(&macvtap_lock));
- nq->queue_index = index;
+ if (!clean) {
+ BUG_ON(q->queue_index >= total);
+ if (q->queue_index >= vlan->numvtaps)
+ goto out;
+ }
- sock_put(&q->sk);
- --vlan->numvtaps;
- }
+ err = 0;
+ macvtap_swap_slot(vlan, index, total - 1);
+ if (!disabled && vlan->numdisabled)
+ macvtap_swap_slot(vlan, index, vlan->numvtaps - 1);
+
+ if (clean) {
+ RCU_INIT_POINTER(vlan->taps[total - 1], NULL);
+ RCU_INIT_POINTER(q->vlan, NULL);
+ sock_put(&q->sk);
+ if (disabled)
+ --vlan->numdisabled;
+ else
+ --vlan->numvtaps;
+ } else {
+ --vlan->numvtaps;
+ ++vlan->numdisabled;
+ }
+
+ } else if (clean)
+ err = 0;
+out:
spin_unlock(&macvtap_lock);
synchronize_rcu();
- sock_put(&q->sk);
+
+ if (clean)
+ sock_put(&q->sk);
+
+ return err;
}
/*
@@ -201,7 +265,7 @@ static void macvtap_del_queues(struct net_device *dev)
/* macvtap_put_queue can free some slots, so go through all slots */
spin_lock(&macvtap_lock);
- for (i = 0; i < vlan->numvtaps; i++) {
+ for (i = 0; i < vlan->numvtaps + vlan->numdisabled; i++) {
q = rcu_dereference_protected(vlan->taps[i],
lockdep_is_held(&macvtap_lock));
BUG_ON(q == NULL);
@@ -211,6 +275,7 @@ static void macvtap_del_queues(struct net_device *dev)
}
/* guarantee that any future macvtap_set_queue will fail */
vlan->numvtaps = MAX_MACVTAP_QUEUES;
+ vlan->numdisabled = 0;
spin_unlock(&macvtap_lock);
synchronize_rcu();
@@ -384,7 +449,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
if ((dev->features & NETIF_F_HIGHDMA) && (dev->features & NETIF_F_SG))
sock_set_flag(&q->sk, SOCK_ZEROCOPY);
- err = macvtap_set_queue(dev, file, q);
+ err = macvtap_set_queue(dev, file, q, false);
if (err)
sock_put(&q->sk);
@@ -398,7 +463,7 @@ out:
static int macvtap_release(struct inode *inode, struct file *file)
{
struct macvtap_queue *q = file->private_data;
- macvtap_put_queue(q);
+ macvtap_put_queue(q, true);
return 0;
}
@@ -922,6 +987,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
return 0;
}
+static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
+{
+ struct macvtap_queue *q = file->private_data;
+ struct macvlan_dev *vlan;
+ int ret = -EINVAL;
+
+ vlan = macvtap_get_vlan(q);
+ if (!vlan)
+ goto done;
+
+ if (flags & IFF_ATTACH_QUEUE)
+ ret = macvtap_set_queue(vlan->dev, file, q, true);
+ else if (flags & IFF_DETACH_QUEUE)
+ ret = macvtap_put_queue(q, false);
+
+ macvtap_put_vlan(vlan);
+
+done:
+ return ret;
+}
+
/*
* provide compatibility with generic tun/tap interface
*/
@@ -954,6 +1040,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
macvtap_put_vlan(vlan);
return ret;
+ case TUNSETQUEUE:
+ if (get_user(u, &ifr->ifr_flags))
+ return -EFAULT;
+ return macvtap_ioctl_set_queue(file, u);
+
case TUNGETFEATURES:
if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
return -EFAULT;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 32e943a..f134afb 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -69,8 +69,15 @@ struct macvlan_dev {
u16 flags;
int (*receive)(struct sk_buff *skb);
int (*forward)(struct net_device *dev, struct sk_buff *skb);
+ /* This array tracks all taps (include disabled ones) and will be
+ * reshuffled to keep the following order:
+ * [0, numvtaps) : enabled taps,
+ * [numvtaps, numvtaps + numdisabled) : disabled taps,
+ * [numvtaps + numdisabled, MAX_MACVTAP_QUEUES) : unused slots
+ */
struct macvtap_queue *taps[MAX_MACVTAP_QUEUES];
int numvtaps;
+ int numdisabled;
int minor;
};
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [net-next RFC 8/8] macvtap: enable multiqueue flag
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
` (6 preceding siblings ...)
2013-05-23 3:12 ` [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-05-23 3:12 ` Jason Wang
2013-05-23 11:53 ` [net-next RFC 0/8] multiqueue API support for macvtap Michael S. Tsirkin
8 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-23 3:12 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/macvtap.c | 7 ++-----
1 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 06b10a5..6a4af74 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -31,10 +31,6 @@
* macvtap_proto is used to allocate queues through the sock allocation
* mechanism.
*
- * TODO: multiqueue support is currently not implemented, even though
- * macvtap is basically prepared for that. We will need to add this
- * here as well as in virtio-net and qemu to get line rate on 10gbit
- * adapters from a guest.
*/
struct macvtap_queue {
struct sock sk;
@@ -1046,7 +1042,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
return macvtap_ioctl_set_queue(file, u);
case TUNGETFEATURES:
- if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
+ if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR |
+ IFF_MULTI_QUEUE, up))
return -EFAULT;
return 0;
--
1.7.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [net-next RFC 4/8] macvlan: reduce the max number of taps to 8
2013-05-23 3:12 ` [net-next RFC 4/8] macvlan: reduce the max number of taps to 8 Jason Wang
@ 2013-05-23 6:37 ` Michael S. Tsirkin
2013-05-24 5:14 ` Jason Wang
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 6:37 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:29AM +0800, Jason Wang wrote:
> To be same with tap.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Well for tap the very specific reason was that
there's an array of big queue structures,
so we need to limit it to make it fit in a page.
No such reason here right?
We need at least as much as tap to be compatible, so
let's just make it 16 unconditionally?
> ---
> include/linux/if_macvlan.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index e47ad46..32e943a 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
> * Maximum times a macvtap device can be opened. This can be used to
> * configure the number of receive queue, e.g. for multiqueue virtio.
> */
> -#define MAX_MACVTAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
> +#define MAX_MACVTAP_QUEUES 8
>
> #define MACVLAN_MC_FILTER_BITS 8
> #define MACVLAN_MC_FILTER_SZ (1 << MACVLAN_MC_FILTER_BITS)
> --
> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 1/8] macvlan: switch to use IS_ENABLED()
2013-05-23 3:12 ` [net-next RFC 1/8] macvlan: switch to use IS_ENABLED() Jason Wang
@ 2013-05-23 11:31 ` Michael S. Tsirkin
0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 11:31 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:26AM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> include/linux/if_macvlan.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index 84dde1d..e47ad46 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -8,7 +8,7 @@
> #include <net/netlink.h>
> #include <linux/u64_stats_sync.h>
>
> -#if defined(CONFIG_MACVTAP) || defined(CONFIG_MACVTAP_MODULE)
> +#if IS_ENABLED(CONFIG_MACVTAP)
> struct socket *macvtap_get_socket(struct file *);
> #else
> #include <linux/err.h>
> --
> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 5/8] macvtap: eliminate linear search
2013-05-23 3:12 ` [net-next RFC 5/8] macvtap: eliminate linear search Jason Wang
@ 2013-05-23 11:41 ` Michael S. Tsirkin
2013-05-24 5:33 ` Jason Wang
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 11:41 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:30AM +0800, Jason Wang wrote:
> Linear search were used in both get_slot() and macvtap_get_queue(), this is
> because:
>
> - macvtap didn't reshuffle the array of taps when create or destroy a queue, so
> when adding a new queue, macvtap must do linear search to find a location for
> the new queue. This will also complicate the TUNSETQUEUE implementation for
> multiqueue API.
> - the queue itself didn't track the queue index, so the we must do a linear
> search in the array to find the location of a existed queue.
>
> The solution is straightforward: reshuffle the array and introduce a queue_index
> to macvtap_queue.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/macvtap.c | 63 +++++++++++++++---------------------------------
> 1 files changed, 20 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index a36e49e..5fd341c 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -44,6 +44,7 @@ struct macvtap_queue {
> struct macvlan_dev __rcu *vlan;
> struct file *file;
> unsigned int flags;
> + u16 queue_index;
> };
>
> static struct proto macvtap_proto = {
> @@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
> */
> static DEFINE_SPINLOCK(macvtap_lock);
>
> -/*
> - * get_slot: return a [unused/occupied] slot in vlan->taps[]:
> - * - if 'q' is NULL, return the first empty slot;
> - * - otherwise, return the slot this pointer occupies.
> - */
> -static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
> -{
> - int i;
> -
> - for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
> - if (rcu_dereference_protected(vlan->taps[i],
> - lockdep_is_held(&macvtap_lock)) == q)
> - return i;
> - }
> -
> - /* Should never happen */
> - BUG_ON(1);
> -}
> -
> static int macvtap_set_queue(struct net_device *dev, struct file *file,
> struct macvtap_queue *q)
> {
> struct macvlan_dev *vlan = netdev_priv(dev);
> - int index;
> int err = -EBUSY;
>
> spin_lock(&macvtap_lock);
> @@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
> goto out;
>
> err = 0;
> - index = get_slot(vlan, NULL);
> rcu_assign_pointer(q->vlan, vlan);
> - rcu_assign_pointer(vlan->taps[index], q);
> + rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
> sock_hold(&q->sk);
>
> q->file = file;
> + q->queue_index = vlan->numvtaps;
> file->private_data = q;
>
> vlan->numvtaps++;
> @@ -140,16 +121,23 @@ out:
> */
> static void macvtap_put_queue(struct macvtap_queue *q)
> {
> + struct macvtap_queue *nq;
> struct macvlan_dev *vlan;
>
> spin_lock(&macvtap_lock);
> vlan = rcu_dereference_protected(q->vlan,
> lockdep_is_held(&macvtap_lock));
> if (vlan) {
> - int index = get_slot(vlan, q);
> + int index = q->queue_index;
> + BUG_ON(index >= vlan->numvtaps);
>
> - RCU_INIT_POINTER(vlan->taps[index], NULL);
> + rcu_assign_pointer(vlan->taps[index],
> + vlan->taps[vlan->numvtaps - 1]);
> RCU_INIT_POINTER(q->vlan, NULL);
> + nq = rcu_dereference_protected(vlan->taps[index],
> + lockdep_is_held(&macvtap_lock));
> + nq->queue_index = index;
> +
> sock_put(&q->sk);
Hmm this looks like a bug BTW: sock_put should be after
no one uses this tap, so must be after synchronize_rcu.
No?
> --vlan->numvtaps;
> }
So:
nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
lockdep_is_held(&macvtap_lock));
rcu_assign_pointer(vlan->taps[index], nq);
same thing?
> @@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
> rxq = skb_get_rxhash(skb);
> if (rxq) {
> tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
> - if (tap)
> - goto out;
> + goto out;
> }
>
> if (likely(skb_rx_queue_recorded(skb))) {
> @@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
> rxq -= numvtaps;
>
> tap = rcu_dereference(vlan->taps[rxq]);
> - if (tap)
> - goto out;
> - }
> -
> - /* Everything failed - find first available queue */
> - for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
> - tap = rcu_dereference(vlan->taps[rxq]);
> - if (tap)
> - break;
> + goto out;
> }
>
> + tap = rcu_dereference(vlan->taps[0]);
> out:
> return tap;
> }
> @@ -221,17 +201,14 @@ static void macvtap_del_queues(struct net_device *dev)
>
> /* macvtap_put_queue can free some slots, so go through all slots */
> spin_lock(&macvtap_lock);
> - for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
> + for (i = 0; i < vlan->numvtaps; i++) {
> q = rcu_dereference_protected(vlan->taps[i],
> lockdep_is_held(&macvtap_lock));
> - if (q) {
> - qlist[j++] = q;
> - RCU_INIT_POINTER(vlan->taps[i], NULL);
> - RCU_INIT_POINTER(q->vlan, NULL);
> - vlan->numvtaps--;
> - }
> + BUG_ON(q == NULL);
> + qlist[j++] = q;
> + RCU_INIT_POINTER(vlan->taps[i], NULL);
> + RCU_INIT_POINTER(q->vlan, NULL);
> }
> - BUG_ON(vlan->numvtaps != 0);
> /* guarantee that any future macvtap_set_queue will fail */
> vlan->numvtaps = MAX_MACVTAP_QUEUES;
> spin_unlock(&macvtap_lock);
> --
> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device
2013-05-23 3:12 ` [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
@ 2013-05-23 11:45 ` Michael S. Tsirkin
2013-05-24 5:36 ` Jason Wang
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 11:45 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:31AM +0800, Jason Wang wrote:
> Though the queue were in fact created by open(), we still need to add this check
> to be compatible with tuntap which can let mgmt software use a single API to
> manage queues. This patch only validates the device name and moves the TUNSETIFF
> to a helper.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/macvtap.c | 50 ++++++++++++++++++++++++++++++++++++++----------
> 1 files changed, 39 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 5fd341c..e3f9344 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -867,6 +867,7 @@ out:
> return ret;
> }
>
> +
Please don't add empty lines like this :).
> static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
> {
> struct macvlan_dev *vlan;
> @@ -884,6 +885,43 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
> dev_put(vlan->dev);
> }
>
> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
> +{
> + struct macvtap_queue *q = file->private_data;
> + struct net *net = current->nsproxy->net_ns;
> + struct inode *inode = file_inode(file);
> + struct net_device *dev, *dev2;
> + struct ifreq ifr;
> +
> + if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
> + return -EFAULT;
> +
> + if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
So why do we validate the name?
Pls add a comment.
> + dev = __dev_get_by_name(net, ifr.ifr_name);
> + if (!dev)
> + return -EINVAL;
> +
> + dev2 = dev_get_by_macvtap_minor(iminor(inode));
> + if (!dev2)
> + return -EINVAL;
> +
> + if (dev != dev2) {
> + dev_put(dev2);
> + return -EINVAL;
> + }
> +
> + dev_put(dev2);
> + }
> +
> + if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
> + (IFF_NO_PI | IFF_TAP))
> + return -EINVAL;
> + else
> + q->flags = ifr.ifr_flags;
> +
> + return 0;
> +}
> +
> /*
> * provide compatibility with generic tun/tap interface
> */
> @@ -902,17 +940,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>
> switch (cmd) {
> case TUNSETIFF:
> - /* ignore the name, just look at flags */
> - if (get_user(u, &ifr->ifr_flags))
> - return -EFAULT;
> -
> - ret = 0;
> - if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
> - ret = -EINVAL;
> - else
> - q->flags = u;
> -
> - return ret;
> + return macvtap_set_iff(file, ifr);
>
> case TUNGETIFF:
> vlan = macvtap_get_vlan(q);
> --
> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl
2013-05-23 3:12 ` [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-05-23 11:52 ` Michael S. Tsirkin
2013-05-24 6:19 ` Jason Wang
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 11:52 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:32AM +0800, Jason Wang wrote:
> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
> to simplify the userspace to manage the queues.
>
> This is done by just introduce a pointer detached which was set when the queue
> were detached. Only the queue with NULL set to this pointer can be used to
> forward packets.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Isn't this a bit too tricky?
Why keep disabled queues on the array at all?
Let's add a linked list of all queues for accounting.
Use array for enabled queues only, for data path.
> ---
> drivers/net/macvtap.c | 147 +++++++++++++++++++++++++++++++++++--------
> include/linux/if_macvlan.h | 7 ++
> 2 files changed, 126 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index e3f9344..06b10a5 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -85,67 +85,131 @@ static const struct proto_ops macvtap_socket_ops;
> */
> static DEFINE_SPINLOCK(macvtap_lock);
>
> +static void macvtap_swap_slot(struct macvlan_dev *vlan, int a, int b)
> +{
> + struct macvtap_queue *q1, *q2;
> +
> + if (a == b)
> + return;
> +
> + q1 = rcu_dereference_protected(vlan->taps[a],
> + lockdep_is_held(&macvtap_lock));
> + q2 = rcu_dereference_protected(vlan->taps[b],
> + lockdep_is_held(&macvtap_lock));
> +
> + BUG_ON(q1 == NULL || q2 == NULL);
> +
> + rcu_assign_pointer(vlan->taps[a], vlan->taps[b]);
> + rcu_assign_pointer(vlan->taps[b], q1);
> +
> + q1->queue_index = b;
> + q2->queue_index = a;
> +}
> +
> static int macvtap_set_queue(struct net_device *dev, struct file *file,
> - struct macvtap_queue *q)
> + struct macvtap_queue *q, bool enable)
> {
> struct macvlan_dev *vlan = netdev_priv(dev);
> int err = -EBUSY;
> + int total;
>
> spin_lock(&macvtap_lock);
> - if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
> +
> + total = vlan->numvtaps + vlan->numdisabled;
> + if (!enable && total == MAX_MACVTAP_QUEUES)
> + goto out;
> +
> + err = -EINVAL;
> + if (enable && q->queue_index < vlan->numvtaps)
> goto out;
>
> err = 0;
> - rcu_assign_pointer(q->vlan, vlan);
> - rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
> - sock_hold(&q->sk);
>
> - q->file = file;
> - q->queue_index = vlan->numvtaps;
> - file->private_data = q;
> + if (enable) {
> + BUG_ON(q->queue_index >= total);
> + macvtap_swap_slot(vlan, q->queue_index, vlan->numvtaps);
> + --vlan->numdisabled;
> + } else {
> + rcu_assign_pointer(q->vlan, vlan);
> + rcu_assign_pointer(vlan->taps[total], q);
> + sock_hold(&q->sk);
> +
> + q->file = file;
> + q->queue_index = total;
> + file->private_data = q;
> + if (vlan->numdisabled)
> + macvtap_swap_slot(vlan, vlan->numvtaps, total);
> +
> + /* Make sure the pointers were seen before numvtaps. */
> + smp_wmb();
> + }
>
> vlan->numvtaps++;
> -
> out:
> spin_unlock(&macvtap_lock);
> return err;
> }
>
> /*
> - * The file owning the queue got closed, give up both
> - * the reference that the files holds as well as the
> - * one from the macvlan_dev if that still exists.
> + * The file owning the queue got closed or disabled.
> + *
> + * If closed give up both the reference that the files holds as well as the
> + * one from the macvlan_dev if that still exists. If disabled, reshuffle the
> + * array of taps.
> *
> * Using the spinlock makes sure that we don't get
> * to the queue again after destroying it.
> */
> -static void macvtap_put_queue(struct macvtap_queue *q)
> +static int macvtap_put_queue(struct macvtap_queue *q, bool clean)
s/clean/release/
and add a comment explaining what it is.
> {
> - struct macvtap_queue *nq;
> struct macvlan_dev *vlan;
> + int err = -EINVAL;
>
> spin_lock(&macvtap_lock);
> vlan = rcu_dereference_protected(q->vlan,
> lockdep_is_held(&macvtap_lock));
> +
> if (vlan) {
> + int total = vlan->numvtaps + vlan->numdisabled;
> int index = q->queue_index;
> - BUG_ON(index >= vlan->numvtaps);
> + bool disabled = q->queue_index >= vlan->numvtaps;
>
> - rcu_assign_pointer(vlan->taps[index],
> - vlan->taps[vlan->numvtaps - 1]);
> - RCU_INIT_POINTER(q->vlan, NULL);
> - nq = rcu_dereference_protected(vlan->taps[index],
> - lockdep_is_held(&macvtap_lock));
> - nq->queue_index = index;
> + if (!clean) {
> + BUG_ON(q->queue_index >= total);
> + if (q->queue_index >= vlan->numvtaps)
> + goto out;
> + }
>
> - sock_put(&q->sk);
> - --vlan->numvtaps;
> - }
> + err = 0;
> + macvtap_swap_slot(vlan, index, total - 1);
> + if (!disabled && vlan->numdisabled)
What's the logic here?
> + macvtap_swap_slot(vlan, index, vlan->numvtaps - 1);
So these swaps do a lot of RCU assign, presumably we need
synchronize_rcu as well?
> +
> + if (clean) {
> + RCU_INIT_POINTER(vlan->taps[total - 1], NULL);
> + RCU_INIT_POINTER(q->vlan, NULL);
> + sock_put(&q->sk);
> + if (disabled)
> + --vlan->numdisabled;
> + else
> + --vlan->numvtaps;
> + } else {
> + --vlan->numvtaps;
> + ++vlan->numdisabled;
> + }
> +
> + } else if (clean)
> + err = 0;
>
> +out:
> spin_unlock(&macvtap_lock);
>
> synchronize_rcu();
> - sock_put(&q->sk);
> +
> + if (clean)
> + sock_put(&q->sk);
> +
> + return err;
> }
>
> /*
> @@ -201,7 +265,7 @@ static void macvtap_del_queues(struct net_device *dev)
>
> /* macvtap_put_queue can free some slots, so go through all slots */
> spin_lock(&macvtap_lock);
> - for (i = 0; i < vlan->numvtaps; i++) {
> + for (i = 0; i < vlan->numvtaps + vlan->numdisabled; i++) {
> q = rcu_dereference_protected(vlan->taps[i],
> lockdep_is_held(&macvtap_lock));
> BUG_ON(q == NULL);
> @@ -211,6 +275,7 @@ static void macvtap_del_queues(struct net_device *dev)
> }
> /* guarantee that any future macvtap_set_queue will fail */
> vlan->numvtaps = MAX_MACVTAP_QUEUES;
> + vlan->numdisabled = 0;
> spin_unlock(&macvtap_lock);
>
> synchronize_rcu();
> @@ -384,7 +449,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
> if ((dev->features & NETIF_F_HIGHDMA) && (dev->features & NETIF_F_SG))
> sock_set_flag(&q->sk, SOCK_ZEROCOPY);
>
> - err = macvtap_set_queue(dev, file, q);
> + err = macvtap_set_queue(dev, file, q, false);
> if (err)
> sock_put(&q->sk);
>
> @@ -398,7 +463,7 @@ out:
> static int macvtap_release(struct inode *inode, struct file *file)
> {
> struct macvtap_queue *q = file->private_data;
> - macvtap_put_queue(q);
> + macvtap_put_queue(q, true);
> return 0;
> }
>
> @@ -922,6 +987,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
> return 0;
> }
>
> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
> +{
> + struct macvtap_queue *q = file->private_data;
> + struct macvlan_dev *vlan;
> + int ret = -EINVAL;
> +
> + vlan = macvtap_get_vlan(q);
> + if (!vlan)
> + goto done;
> +
> + if (flags & IFF_ATTACH_QUEUE)
> + ret = macvtap_set_queue(vlan->dev, file, q, true);
> + else if (flags & IFF_DETACH_QUEUE)
> + ret = macvtap_put_queue(q, false);
> +
> + macvtap_put_vlan(vlan);
> +
> +done:
> + return ret;
> +}
> +
> /*
> * provide compatibility with generic tun/tap interface
> */
> @@ -954,6 +1040,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
> macvtap_put_vlan(vlan);
> return ret;
>
> + case TUNSETQUEUE:
> + if (get_user(u, &ifr->ifr_flags))
> + return -EFAULT;
> + return macvtap_ioctl_set_queue(file, u);
> +
> case TUNGETFEATURES:
> if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
> return -EFAULT;
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index 32e943a..f134afb 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -69,8 +69,15 @@ struct macvlan_dev {
> u16 flags;
> int (*receive)(struct sk_buff *skb);
> int (*forward)(struct net_device *dev, struct sk_buff *skb);
> + /* This array tracks all taps (include disabled ones) and will be
> + * reshuffled to keep the following order:
> + * [0, numvtaps) : enabled taps,
> + * [numvtaps, numvtaps + numdisabled) : disabled taps,
> + * [numvtaps + numdisabled, MAX_MACVTAP_QUEUES) : unused slots
> + */
> struct macvtap_queue *taps[MAX_MACVTAP_QUEUES];
> int numvtaps;
> + int numdisabled;
> int minor;
> };
>
> --
> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 0/8] multiqueue API support for macvtap
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
` (7 preceding siblings ...)
2013-05-23 3:12 ` [net-next RFC 8/8] macvtap: enable multiqueue flag Jason Wang
@ 2013-05-23 11:53 ` Michael S. Tsirkin
8 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 11:53 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:25AM +0800, Jason Wang wrote:
> Hi all:
>
> This series implements a fully tuntap compatiable API which could be used by
> userspace to manage multi macvtap queues. The main parts is to add TUNSETQUEUE
> ioctl support for macvtap.
>
> Patch 1 - 4 was some tuntap compatibility and misc cleanups.
> Patch 5 removes the linear search in macvtap by reshuffling the macvtaps array
> each time a queue is removed. After this, we could store both enabled and
> disabled queues in the same array without introducing extra data structure.
> Patch 6 let TUNSETIFF can create multiqueue device, nothing but some check were
> added.
> Patch 7 implement TUNSETQUEUE ioctl
> Patch 8 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
> multiqueue API is completed.
>
> Flow caches implememtation were missed in this version, since I am doing rework
> on the tuntap flow caches. Have some some stress test with both netperf and
> pktgen.
>
> Please review, thanks.
Overall looks good. Sent some comments on specific patches.
Thanks!
> Jason Wang (8):
> macvlan: switch to use IS_ENABLED()
> macvtap: return -EBADFD when TUNGETIFF fails
> macvtap: introduce macvtap_get_vlan()
> macvlan: reduce the max number of taps to 8
> macvtap: eliminate linear search
> macvtap: allow TUNSETIFF to create multiqueue device
> macvtap: add TUNSETQUEUE ioctl
> macvtap: enable multiqueue flag
>
> drivers/net/macvtap.c | 265 ++++++++++++++++++++++++++++++-------------
> include/linux/if_macvlan.h | 11 ++-
> 2 files changed, 194 insertions(+), 82 deletions(-)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails
2013-05-23 3:12 ` [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails Jason Wang
@ 2013-05-23 11:54 ` Michael S. Tsirkin
2013-05-24 6:28 ` Jason Wang
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-05-23 11:54 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, May 23, 2013 at 11:12:27AM +0800, Jason Wang wrote:
> Tuntap return -EBADFD when TUNGETIFF fails, we should return the same value.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Can you add some more comments on why this matters?
Any userspace that cares will have to handle both, right?
> ---
> drivers/net/macvtap.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 59e9605..ce1c72a 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -928,7 +928,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
> rcu_read_unlock_bh();
>
> if (!vlan)
> - return -ENOLINK;
> + return -EBADFD;
>
> ret = 0;
> if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
> --
> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan()
2013-05-23 3:12 ` [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-05-23 15:11 ` Sergei Shtylyov
2013-05-24 6:21 ` Jason Wang
0 siblings, 1 reply; 23+ messages in thread
From: Sergei Shtylyov @ 2013-05-23 15:11 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, mst, netdev, linux-kernel
Hello.
On 23-05-2013 7:12, Jason Wang wrote:
> Factor out the device holding logic to a macvtap_get_vlan(), this will be also
> used by multiqueue API.
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/macvtap.c | 26 +++++++++++++++++++-------
> 1 files changed, 19 insertions(+), 7 deletions(-)
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index ce1c72a..a36e49e 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -890,6 +890,23 @@ out:
> return ret;
> }
>
> +static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
> +{
> + struct macvlan_dev *vlan;
Empty line wouldn't hurt here, after declaration.
> + rcu_read_lock_bh();
> + vlan = rcu_dereference_bh(q->vlan);
> + if (vlan)
> + dev_hold(vlan->dev);
> + rcu_read_unlock_bh();
> +
> + return vlan;
> +}
[...]
WBR, Sergei
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 4/8] macvlan: reduce the max number of taps to 8
2013-05-23 6:37 ` Michael S. Tsirkin
@ 2013-05-24 5:14 ` Jason Wang
0 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-24 5:14 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 05/23/2013 02:37 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:29AM +0800, Jason Wang wrote:
>> To be same with tap.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Well for tap the very specific reason was that
> there's an array of big queue structures,
> so we need to limit it to make it fit in a page.
> No such reason here right?
Right, one of the reason is that the flow caches itself occupies too
much space, I'd move it out of tun_struct.
>
> We need at least as much as tap to be compatible, so
> let's just make it 16 unconditionally?
Ok, in fact we can make it even more.
>
>> ---
>> include/linux/if_macvlan.h | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
>> index e47ad46..32e943a 100644
>> --- a/include/linux/if_macvlan.h
>> +++ b/include/linux/if_macvlan.h
>> @@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
>> * Maximum times a macvtap device can be opened. This can be used to
>> * configure the number of receive queue, e.g. for multiqueue virtio.
>> */
>> -#define MAX_MACVTAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
>> +#define MAX_MACVTAP_QUEUES 8
>>
>> #define MACVLAN_MC_FILTER_BITS 8
>> #define MACVLAN_MC_FILTER_SZ (1 << MACVLAN_MC_FILTER_BITS)
>> --
>> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 5/8] macvtap: eliminate linear search
2013-05-23 11:41 ` Michael S. Tsirkin
@ 2013-05-24 5:33 ` Jason Wang
0 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-24 5:33 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 05/23/2013 07:41 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:30AM +0800, Jason Wang wrote:
>> Linear search were used in both get_slot() and macvtap_get_queue(), this is
>> because:
>>
>> - macvtap didn't reshuffle the array of taps when create or destroy a queue, so
>> when adding a new queue, macvtap must do linear search to find a location for
>> the new queue. This will also complicate the TUNSETQUEUE implementation for
>> multiqueue API.
>> - the queue itself didn't track the queue index, so the we must do a linear
>> search in the array to find the location of a existed queue.
>>
>> The solution is straightforward: reshuffle the array and introduce a queue_index
>> to macvtap_queue.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/net/macvtap.c | 63 +++++++++++++++---------------------------------
>> 1 files changed, 20 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index a36e49e..5fd341c 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -44,6 +44,7 @@ struct macvtap_queue {
>> struct macvlan_dev __rcu *vlan;
>> struct file *file;
>> unsigned int flags;
>> + u16 queue_index;
>> };
>>
>> static struct proto macvtap_proto = {
>> @@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
>> */
>> static DEFINE_SPINLOCK(macvtap_lock);
>>
>> -/*
>> - * get_slot: return a [unused/occupied] slot in vlan->taps[]:
>> - * - if 'q' is NULL, return the first empty slot;
>> - * - otherwise, return the slot this pointer occupies.
>> - */
>> -static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
>> - if (rcu_dereference_protected(vlan->taps[i],
>> - lockdep_is_held(&macvtap_lock)) == q)
>> - return i;
>> - }
>> -
>> - /* Should never happen */
>> - BUG_ON(1);
>> -}
>> -
>> static int macvtap_set_queue(struct net_device *dev, struct file *file,
>> struct macvtap_queue *q)
>> {
>> struct macvlan_dev *vlan = netdev_priv(dev);
>> - int index;
>> int err = -EBUSY;
>>
>> spin_lock(&macvtap_lock);
>> @@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>> goto out;
>>
>> err = 0;
>> - index = get_slot(vlan, NULL);
>> rcu_assign_pointer(q->vlan, vlan);
>> - rcu_assign_pointer(vlan->taps[index], q);
>> + rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>> sock_hold(&q->sk);
>>
>> q->file = file;
>> + q->queue_index = vlan->numvtaps;
>> file->private_data = q;
>>
>> vlan->numvtaps++;
>> @@ -140,16 +121,23 @@ out:
>> */
>> static void macvtap_put_queue(struct macvtap_queue *q)
>> {
>> + struct macvtap_queue *nq;
>> struct macvlan_dev *vlan;
>>
>> spin_lock(&macvtap_lock);
>> vlan = rcu_dereference_protected(q->vlan,
>> lockdep_is_held(&macvtap_lock));
>> if (vlan) {
>> - int index = get_slot(vlan, q);
>> + int index = q->queue_index;
>> + BUG_ON(index >= vlan->numvtaps);
>>
>> - RCU_INIT_POINTER(vlan->taps[index], NULL);
>> + rcu_assign_pointer(vlan->taps[index],
>> + vlan->taps[vlan->numvtaps - 1]);
>> RCU_INIT_POINTER(q->vlan, NULL);
>> + nq = rcu_dereference_protected(vlan->taps[index],
>> + lockdep_is_held(&macvtap_lock));
>> + nq->queue_index = index;
>> +
>> sock_put(&q->sk);
> Hmm this looks like a bug BTW: sock_put should be after
> no one uses this tap, so must be after synchronize_rcu.
> No?
Two refcnts were held, one is the socket itself (and we can safely put
it here), another is held by device when doing macvtap_set_queue(), it
was released by another sock_put() after synchronize_rcu().
>
>> --vlan->numvtaps;
>> }
> So:
> nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> lockdep_is_held(&macvtap_lock));
> rcu_assign_pointer(vlan->taps[index], nq);
>
> same thing?
Same and cleaner, thanks.
>
>> @@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>> rxq = skb_get_rxhash(skb);
>> if (rxq) {
>> tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
>> - if (tap)
>> - goto out;
>> + goto out;
>> }
>>
>> if (likely(skb_rx_queue_recorded(skb))) {
>> @@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>> rxq -= numvtaps;
>>
>> tap = rcu_dereference(vlan->taps[rxq]);
>> - if (tap)
>> - goto out;
>> - }
>> -
>> - /* Everything failed - find first available queue */
>> - for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
>> - tap = rcu_dereference(vlan->taps[rxq]);
>> - if (tap)
>> - break;
>> + goto out;
>> }
>>
>> + tap = rcu_dereference(vlan->taps[0]);
>> out:
>> return tap;
>> }
>> @@ -221,17 +201,14 @@ static void macvtap_del_queues(struct net_device *dev)
>>
>> /* macvtap_put_queue can free some slots, so go through all slots */
>> spin_lock(&macvtap_lock);
>> - for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
>> + for (i = 0; i < vlan->numvtaps; i++) {
>> q = rcu_dereference_protected(vlan->taps[i],
>> lockdep_is_held(&macvtap_lock));
>> - if (q) {
>> - qlist[j++] = q;
>> - RCU_INIT_POINTER(vlan->taps[i], NULL);
>> - RCU_INIT_POINTER(q->vlan, NULL);
>> - vlan->numvtaps--;
>> - }
>> + BUG_ON(q == NULL);
>> + qlist[j++] = q;
>> + RCU_INIT_POINTER(vlan->taps[i], NULL);
>> + RCU_INIT_POINTER(q->vlan, NULL);
>> }
>> - BUG_ON(vlan->numvtaps != 0);
>> /* guarantee that any future macvtap_set_queue will fail */
>> vlan->numvtaps = MAX_MACVTAP_QUEUES;
>> spin_unlock(&macvtap_lock);
>> --
>> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device
2013-05-23 11:45 ` Michael S. Tsirkin
@ 2013-05-24 5:36 ` Jason Wang
0 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-24 5:36 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 05/23/2013 07:45 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:31AM +0800, Jason Wang wrote:
>> Though the queue were in fact created by open(), we still need to add this check
>> to be compatible with tuntap which can let mgmt software use a single API to
>> manage queues. This patch only validates the device name and moves the TUNSETIFF
>> to a helper.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/net/macvtap.c | 50 ++++++++++++++++++++++++++++++++++++++----------
>> 1 files changed, 39 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 5fd341c..e3f9344 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -867,6 +867,7 @@ out:
>> return ret;
>> }
>>
>> +
> Please don't add empty lines like this :).
Sure.
>
>> static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
>> {
>> struct macvlan_dev *vlan;
>> @@ -884,6 +885,43 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>> dev_put(vlan->dev);
>> }
>>
>> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>> +{
>> + struct macvtap_queue *q = file->private_data;
>> + struct net *net = current->nsproxy->net_ns;
>> + struct inode *inode = file_inode(file);
>> + struct net_device *dev, *dev2;
>> + struct ifreq ifr;
>> +
>> + if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
>> + return -EFAULT;
>> +
>> + if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
> So why do we validate the name?
> Pls add a comment.
Ok. The validation is to prevent the userspace create the queues on the
wrong device. Not sure whether or not this is needed for single queue case.
>
>> + dev = __dev_get_by_name(net, ifr.ifr_name);
>> + if (!dev)
>> + return -EINVAL;
>> +
>> + dev2 = dev_get_by_macvtap_minor(iminor(inode));
>> + if (!dev2)
>> + return -EINVAL;
>> +
>> + if (dev != dev2) {
>> + dev_put(dev2);
>> + return -EINVAL;
>> + }
>> +
>> + dev_put(dev2);
>> + }
>> +
>> + if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
>> + (IFF_NO_PI | IFF_TAP))
>> + return -EINVAL;
>> + else
>> + q->flags = ifr.ifr_flags;
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * provide compatibility with generic tun/tap interface
>> */
>> @@ -902,17 +940,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>>
>> switch (cmd) {
>> case TUNSETIFF:
>> - /* ignore the name, just look at flags */
>> - if (get_user(u, &ifr->ifr_flags))
>> - return -EFAULT;
>> -
>> - ret = 0;
>> - if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
>> - ret = -EINVAL;
>> - else
>> - q->flags = u;
>> -
>> - return ret;
>> + return macvtap_set_iff(file, ifr);
>>
>> case TUNGETIFF:
>> vlan = macvtap_get_vlan(q);
>> --
>> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl
2013-05-23 11:52 ` Michael S. Tsirkin
@ 2013-05-24 6:19 ` Jason Wang
0 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-24 6:19 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 05/23/2013 07:52 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:32AM +0800, Jason Wang wrote:
>> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
>> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
>> to simplify the userspace to manage the queues.
>>
>> This is done by just introduce a pointer detached which was set when the queue
>> were detached. Only the queue with NULL set to this pointer can be used to
>> forward packets.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Isn't this a bit too tricky?
Well, not tricky and it was just like what we did the reshuffling in
PATCH 5/8
> Why keep disabled queues on the array at all?
Both are ok, but this method need less data structures in macvlan_dev.
> Let's add a linked list of all queues for accounting.
> Use array for enabled queues only, for data path.
This would not be simpler than just do in place reshuffling.
>
>> ---
>> drivers/net/macvtap.c | 147 +++++++++++++++++++++++++++++++++++--------
>> include/linux/if_macvlan.h | 7 ++
>> 2 files changed, 126 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index e3f9344..06b10a5 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -85,67 +85,131 @@ static const struct proto_ops macvtap_socket_ops;
>> */
>> static DEFINE_SPINLOCK(macvtap_lock);
>>
>> +static void macvtap_swap_slot(struct macvlan_dev *vlan, int a, int b)
>> +{
>> + struct macvtap_queue *q1, *q2;
>> +
>> + if (a == b)
>> + return;
>> +
>> + q1 = rcu_dereference_protected(vlan->taps[a],
>> + lockdep_is_held(&macvtap_lock));
>> + q2 = rcu_dereference_protected(vlan->taps[b],
>> + lockdep_is_held(&macvtap_lock));
>> +
>> + BUG_ON(q1 == NULL || q2 == NULL);
>> +
>> + rcu_assign_pointer(vlan->taps[a], vlan->taps[b]);
>> + rcu_assign_pointer(vlan->taps[b], q1);
>> +
>> + q1->queue_index = b;
>> + q2->queue_index = a;
>> +}
>> +
>> static int macvtap_set_queue(struct net_device *dev, struct file *file,
>> - struct macvtap_queue *q)
>> + struct macvtap_queue *q, bool enable)
>> {
>> struct macvlan_dev *vlan = netdev_priv(dev);
>> int err = -EBUSY;
>> + int total;
>>
>> spin_lock(&macvtap_lock);
>> - if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
>> +
>> + total = vlan->numvtaps + vlan->numdisabled;
>> + if (!enable && total == MAX_MACVTAP_QUEUES)
>> + goto out;
>> +
>> + err = -EINVAL;
>> + if (enable && q->queue_index < vlan->numvtaps)
>> goto out;
>>
>> err = 0;
>> - rcu_assign_pointer(q->vlan, vlan);
>> - rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>> - sock_hold(&q->sk);
>>
>> - q->file = file;
>> - q->queue_index = vlan->numvtaps;
>> - file->private_data = q;
>> + if (enable) {
>> + BUG_ON(q->queue_index >= total);
>> + macvtap_swap_slot(vlan, q->queue_index, vlan->numvtaps);
>> + --vlan->numdisabled;
>> + } else {
>> + rcu_assign_pointer(q->vlan, vlan);
>> + rcu_assign_pointer(vlan->taps[total], q);
>> + sock_hold(&q->sk);
>> +
>> + q->file = file;
>> + q->queue_index = total;
>> + file->private_data = q;
>> + if (vlan->numdisabled)
>> + macvtap_swap_slot(vlan, vlan->numvtaps, total);
>> +
>> + /* Make sure the pointers were seen before numvtaps. */
>> + smp_wmb();
>> + }
>>
>> vlan->numvtaps++;
>> -
>> out:
>> spin_unlock(&macvtap_lock);
>> return err;
>> }
>>
>> /*
>> - * The file owning the queue got closed, give up both
>> - * the reference that the files holds as well as the
>> - * one from the macvlan_dev if that still exists.
>> + * The file owning the queue got closed or disabled.
>> + *
>> + * If closed give up both the reference that the files holds as well as the
>> + * one from the macvlan_dev if that still exists. If disabled, reshuffle the
>> + * array of taps.
>> *
>> * Using the spinlock makes sure that we don't get
>> * to the queue again after destroying it.
>> */
>> -static void macvtap_put_queue(struct macvtap_queue *q)
>> +static int macvtap_put_queue(struct macvtap_queue *q, bool clean)
> s/clean/release/
> and add a comment explaining what it is.
Ok.
>> {
>> - struct macvtap_queue *nq;
>> struct macvlan_dev *vlan;
>> + int err = -EINVAL;
>>
>> spin_lock(&macvtap_lock);
>> vlan = rcu_dereference_protected(q->vlan,
>> lockdep_is_held(&macvtap_lock));
>> +
>> if (vlan) {
>> + int total = vlan->numvtaps + vlan->numdisabled;
>> int index = q->queue_index;
>> - BUG_ON(index >= vlan->numvtaps);
>> + bool disabled = q->queue_index >= vlan->numvtaps;
>>
>> - rcu_assign_pointer(vlan->taps[index],
>> - vlan->taps[vlan->numvtaps - 1]);
>> - RCU_INIT_POINTER(q->vlan, NULL);
>> - nq = rcu_dereference_protected(vlan->taps[index],
>> - lockdep_is_held(&macvtap_lock));
>> - nq->queue_index = index;
>> + if (!clean) {
>> + BUG_ON(q->queue_index >= total);
>> + if (q->queue_index >= vlan->numvtaps)
>> + goto out;
>> + }
>>
>> - sock_put(&q->sk);
>> - --vlan->numvtaps;
>> - }
>> + err = 0;
>> + macvtap_swap_slot(vlan, index, total - 1);
>> + if (!disabled && vlan->numdisabled)
> What's the logic here?
If there's at least one disabled taps, the above swap will move a
disabled tap into enabled area. So we need to do swap it with the last
enabled tap to keep the right order.
>
>> + macvtap_swap_slot(vlan, index, vlan->numvtaps - 1);
> So these swaps do a lot of RCU assign, presumably we need
> synchronize_rcu as well?
There is a synchronize_rcu() below, and we do the cleanup -sock_put()
after that. And since we purge the socket during its destruction, no
need to care about the little possibility that a packet were sent to the
disabled or removed queue.
>
>> +
>> + if (clean) {
>> + RCU_INIT_POINTER(vlan->taps[total - 1], NULL);
>> + RCU_INIT_POINTER(q->vlan, NULL);
>> + sock_put(&q->sk);
>> + if (disabled)
>> + --vlan->numdisabled;
>> + else
>> + --vlan->numvtaps;
>> + } else {
>> + --vlan->numvtaps;
>> + ++vlan->numdisabled;
>> + }
>> +
>> + } else if (clean)
>> + err = 0;
>>
>> +out:
>> spin_unlock(&macvtap_lock);
>>
>> synchronize_rcu();
>> - sock_put(&q->sk);
>> +
>> + if (clean)
>> + sock_put(&q->sk);
>> +
>> + return err;
>> }
>>
>> /*
>> @@ -201,7 +265,7 @@ static void macvtap_del_queues(struct net_device *dev)
>>
>> /* macvtap_put_queue can free some slots, so go through all slots */
>> spin_lock(&macvtap_lock);
>> - for (i = 0; i < vlan->numvtaps; i++) {
>> + for (i = 0; i < vlan->numvtaps + vlan->numdisabled; i++) {
>> q = rcu_dereference_protected(vlan->taps[i],
>> lockdep_is_held(&macvtap_lock));
>> BUG_ON(q == NULL);
>> @@ -211,6 +275,7 @@ static void macvtap_del_queues(struct net_device *dev)
>> }
>> /* guarantee that any future macvtap_set_queue will fail */
>> vlan->numvtaps = MAX_MACVTAP_QUEUES;
>> + vlan->numdisabled = 0;
>> spin_unlock(&macvtap_lock);
>>
>> synchronize_rcu();
>> @@ -384,7 +449,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
>> if ((dev->features & NETIF_F_HIGHDMA) && (dev->features & NETIF_F_SG))
>> sock_set_flag(&q->sk, SOCK_ZEROCOPY);
>>
>> - err = macvtap_set_queue(dev, file, q);
>> + err = macvtap_set_queue(dev, file, q, false);
>> if (err)
>> sock_put(&q->sk);
>>
>> @@ -398,7 +463,7 @@ out:
>> static int macvtap_release(struct inode *inode, struct file *file)
>> {
>> struct macvtap_queue *q = file->private_data;
>> - macvtap_put_queue(q);
>> + macvtap_put_queue(q, true);
>> return 0;
>> }
>>
>> @@ -922,6 +987,27 @@ static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>> return 0;
>> }
>>
>> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
>> +{
>> + struct macvtap_queue *q = file->private_data;
>> + struct macvlan_dev *vlan;
>> + int ret = -EINVAL;
>> +
>> + vlan = macvtap_get_vlan(q);
>> + if (!vlan)
>> + goto done;
>> +
>> + if (flags & IFF_ATTACH_QUEUE)
>> + ret = macvtap_set_queue(vlan->dev, file, q, true);
>> + else if (flags & IFF_DETACH_QUEUE)
>> + ret = macvtap_put_queue(q, false);
>> +
>> + macvtap_put_vlan(vlan);
>> +
>> +done:
>> + return ret;
>> +}
>> +
>> /*
>> * provide compatibility with generic tun/tap interface
>> */
>> @@ -954,6 +1040,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>> macvtap_put_vlan(vlan);
>> return ret;
>>
>> + case TUNSETQUEUE:
>> + if (get_user(u, &ifr->ifr_flags))
>> + return -EFAULT;
>> + return macvtap_ioctl_set_queue(file, u);
>> +
>> case TUNGETFEATURES:
>> if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
>> return -EFAULT;
>> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
>> index 32e943a..f134afb 100644
>> --- a/include/linux/if_macvlan.h
>> +++ b/include/linux/if_macvlan.h
>> @@ -69,8 +69,15 @@ struct macvlan_dev {
>> u16 flags;
>> int (*receive)(struct sk_buff *skb);
>> int (*forward)(struct net_device *dev, struct sk_buff *skb);
>> + /* This array tracks all taps (include disabled ones) and will be
>> + * reshuffled to keep the following order:
>> + * [0, numvtaps) : enabled taps,
>> + * [numvtaps, numvtaps + numdisabled) : disabled taps,
>> + * [numvtaps + numdisabled, MAX_MACVTAP_QUEUES) : unused slots
>> + */
>> struct macvtap_queue *taps[MAX_MACVTAP_QUEUES];
>> int numvtaps;
>> + int numdisabled;
>> int minor;
>> };
>>
>> --
>> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan()
2013-05-23 15:11 ` Sergei Shtylyov
@ 2013-05-24 6:21 ` Jason Wang
0 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-24 6:21 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: davem, mst, netdev, linux-kernel
On 05/23/2013 11:11 PM, Sergei Shtylyov wrote:
> Hello.
>
> On 23-05-2013 7:12, Jason Wang wrote:
>
>> Factor out the device holding logic to a macvtap_get_vlan(), this
>> will be also
>> used by multiqueue API.
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/net/macvtap.c | 26 +++++++++++++++++++-------
>> 1 files changed, 19 insertions(+), 7 deletions(-)
>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index ce1c72a..a36e49e 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -890,6 +890,23 @@ out:
>> return ret;
>> }
>>
>> +static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
>> +{
>> + struct macvlan_dev *vlan;
>
> Empty line wouldn't hurt here, after declaration.
Will add one line here. Thanks
>
>> + rcu_read_lock_bh();
>> + vlan = rcu_dereference_bh(q->vlan);
>> + if (vlan)
>> + dev_hold(vlan->dev);
>> + rcu_read_unlock_bh();
>> +
>> + return vlan;
>> +}
> [...]
>
> WBR, Sergei
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails
2013-05-23 11:54 ` Michael S. Tsirkin
@ 2013-05-24 6:28 ` Jason Wang
0 siblings, 0 replies; 23+ messages in thread
From: Jason Wang @ 2013-05-24 6:28 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 05/23/2013 07:54 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:27AM +0800, Jason Wang wrote:
>> Tuntap return -EBADFD when TUNGETIFF fails, we should return the same value.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Can you add some more comments on why this matters?
Just to keep compatibility for tuntap.
> Any userspace that cares will have to handle both, right?
Ideally the userspace should expect the same behavior for both tap and
macvtap.
Maybe we should just keep this.
>
>> ---
>> drivers/net/macvtap.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 59e9605..ce1c72a 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -928,7 +928,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>> rcu_read_unlock_bh();
>>
>> if (!vlan)
>> - return -ENOLINK;
>> + return -EBADFD;
>>
>> ret = 0;
>> if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
>> --
>> 1.7.1
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2013-05-24 6:28 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-23 3:12 [net-next RFC 0/8] multiqueue API support for macvtap Jason Wang
2013-05-23 3:12 ` [net-next RFC 1/8] macvlan: switch to use IS_ENABLED() Jason Wang
2013-05-23 11:31 ` Michael S. Tsirkin
2013-05-23 3:12 ` [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails Jason Wang
2013-05-23 11:54 ` Michael S. Tsirkin
2013-05-24 6:28 ` Jason Wang
2013-05-23 3:12 ` [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan() Jason Wang
2013-05-23 15:11 ` Sergei Shtylyov
2013-05-24 6:21 ` Jason Wang
2013-05-23 3:12 ` [net-next RFC 4/8] macvlan: reduce the max number of taps to 8 Jason Wang
2013-05-23 6:37 ` Michael S. Tsirkin
2013-05-24 5:14 ` Jason Wang
2013-05-23 3:12 ` [net-next RFC 5/8] macvtap: eliminate linear search Jason Wang
2013-05-23 11:41 ` Michael S. Tsirkin
2013-05-24 5:33 ` Jason Wang
2013-05-23 3:12 ` [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device Jason Wang
2013-05-23 11:45 ` Michael S. Tsirkin
2013-05-24 5:36 ` Jason Wang
2013-05-23 3:12 ` [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
2013-05-23 11:52 ` Michael S. Tsirkin
2013-05-24 6:19 ` Jason Wang
2013-05-23 3:12 ` [net-next RFC 8/8] macvtap: enable multiqueue flag Jason Wang
2013-05-23 11:53 ` [net-next RFC 0/8] multiqueue API support for macvtap Michael S. Tsirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).