All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] macvtap fixes.
@ 2011-10-20 14:24 Eric W. Biederman
  2011-10-20 14:26 ` [PATCH 1/5] macvtap: Close a race between macvtap_open and macvtap_dellink Eric W. Biederman
  2011-10-21  6:56 ` [PATCH 0/5] macvtap fixes David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Eric W. Biederman @ 2011-10-20 14:24 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Arnd Bergmann, Jason Wang, Michael S. Tsirkin,
	Ian Campbell, Shirly Ma


This series of patches fixes a series of minor bugs in the macvtap code.

The fixes to handle failures in newlink and the change in how we handle
minor device number allocations are particularly significant.

Eric W. Biederman (5):
      macvtap: Close a race between macvtap_open and macvtap_dellink.
      macvtap: Fix macvtap_open races in the zero copy enable code.
      macvtap: Don't leak unreceived packets when we delete a macvtap device.
      macvtap: Rewrite macvtap_newlink so the error handling works.
      macvtap: Fix the minor device number allocation

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] macvtap: Close a race between macvtap_open and macvtap_dellink.
  2011-10-20 14:24 [PATCH 0/5] macvtap fixes Eric W. Biederman
@ 2011-10-20 14:26 ` Eric W. Biederman
  2011-10-20 14:26   ` [PATCH 2/5] macvtap: Fix macvtap_open races in the zero copy enable code Eric W. Biederman
  2011-10-21  6:56 ` [PATCH 0/5] macvtap fixes David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2011-10-20 14:26 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Arnd Bergmann, Jason Wang, Michael S. Tsirkin,
	Ian Campbell, Shirly Ma


There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues called from macvtap_dellink.   After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed so prevent success by
reporting that all of the available queues are in use.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/macvtap.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 3da5578..70aa628 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -231,6 +231,8 @@ static void macvtap_del_queues(struct net_device *dev)
 		}
 	}
 	BUG_ON(vlan->numvtaps != 0);
+	/* guarantee that any future macvtap_set_queue will fail */
+	vlan->numvtaps = MAX_MACVTAP_QUEUES;
 	spin_unlock(&macvtap_lock);
 
 	synchronize_rcu();
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] macvtap: Fix macvtap_open races in the zero copy enable code.
  2011-10-20 14:26 ` [PATCH 1/5] macvtap: Close a race between macvtap_open and macvtap_dellink Eric W. Biederman
@ 2011-10-20 14:26   ` Eric W. Biederman
  2011-10-20 14:27     ` [PATCH 3/5] macvtap: Don't leak unreceived packets when we delete a macvtap device Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2011-10-20 14:26 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Arnd Bergmann, Jason Wang, Michael S. Tsirkin,
	Ian Campbell, Shirly Ma


To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags.   Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags.  This is important because nothing holds a reference to lowerdev
and on a very bad day we lowerdev could be a pointer to stale memory.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/macvtap.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 70aa628..1d9c9c2 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -343,7 +343,6 @@ static int macvtap_open(struct inode *inode, struct file *file)
 {
 	struct net *net = current->nsproxy->net_ns;
 	struct net_device *dev = dev_get_by_index(net, iminor(inode));
-	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *q;
 	int err;
 
@@ -376,12 +375,12 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	/*
 	 * so far only KVM virtio_net uses macvtap, enable zero copy between
 	 * guest kernel and host kernel when lower device supports zerocopy
+	 *
+	 * The macvlan supports zerocopy iff the lower device supports zero
+	 * copy so we don't have to look at the lower device directly.
 	 */
-	if (vlan) {
-		if ((vlan->lowerdev->features & NETIF_F_HIGHDMA) &&
-		    (vlan->lowerdev->features & NETIF_F_SG))
-			sock_set_flag(&q->sk, SOCK_ZEROCOPY);
-	}
+	if ((dev->features & NETIF_F_HIGHDMA) && (dev->features & NETIF_F_SG))
+		sock_set_flag(&q->sk, SOCK_ZEROCOPY);
 
 	err = macvtap_set_queue(dev, file, q);
 	if (err)
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/5] macvtap: Don't leak unreceived packets when we delete a macvtap device.
  2011-10-20 14:26   ` [PATCH 2/5] macvtap: Fix macvtap_open races in the zero copy enable code Eric W. Biederman
@ 2011-10-20 14:27     ` Eric W. Biederman
  2011-10-20 14:28       ` [PATCH 4/5] macvtap: Rewrite macvtap_newlink so the error handling works Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2011-10-20 14:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Arnd Bergmann, Jason Wang, Michael S. Tsirkin,
	Ian Campbell, Shirly Ma


To avoid leaking packets in the receive queue.  Add a socket destructor
that will run whenever destroy a macvtap socket.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/macvtap.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 1d9c9c2..515aa87 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -339,6 +339,11 @@ static void macvtap_sock_write_space(struct sock *sk)
 		wake_up_interruptible_poll(wqueue, POLLOUT | POLLWRNORM | POLLWRBAND);
 }
 
+static void macvtap_sock_destruct(struct sock *sk)
+{
+	skb_queue_purge(&sk->sk_receive_queue);
+}
+
 static int macvtap_open(struct inode *inode, struct file *file)
 {
 	struct net *net = current->nsproxy->net_ns;
@@ -369,6 +374,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	q->sock.ops = &macvtap_socket_ops;
 	sock_init_data(&q->sock, &q->sk);
 	q->sk.sk_write_space = macvtap_sock_write_space;
+	q->sk.sk_destruct = macvtap_sock_destruct;
 	q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
 	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] macvtap: Rewrite macvtap_newlink so the error handling works.
  2011-10-20 14:27     ` [PATCH 3/5] macvtap: Don't leak unreceived packets when we delete a macvtap device Eric W. Biederman
@ 2011-10-20 14:28       ` Eric W. Biederman
  2011-10-20 14:29         ` [PATCH 5/5] macvtap: Fix the minor device number allocation Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2011-10-20 14:28 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Arnd Bergmann, Jason Wang, Michael S. Tsirkin,
	Ian Campbell, Shirly Ma


Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.

Move device_create into a netdevice creation notifier.   The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/macvtap.c |   73 +++++++++++++++++++++++++++++++++----------------
 1 files changed, 49 insertions(+), 24 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 515aa87..25689e9 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -280,34 +280,16 @@ static int macvtap_newlink(struct net *src_net,
 			   struct nlattr *tb[],
 			   struct nlattr *data[])
 {
-	struct device *classdev;
-	dev_t devt;
-	int err;
-
-	err = macvlan_common_newlink(src_net, dev, tb, data,
-				     macvtap_receive, macvtap_forward);
-	if (err)
-		goto out;
-
-	devt = MKDEV(MAJOR(macvtap_major), dev->ifindex);
-
-	classdev = device_create(macvtap_class, &dev->dev, devt,
-				 dev, "tap%d", dev->ifindex);
-	if (IS_ERR(classdev)) {
-		err = PTR_ERR(classdev);
-		macvtap_del_queues(dev);
-	}
-
-out:
-	return err;
+	/* Don't put anything that may fail after macvlan_common_newlink
+	 * because we can't undo what it does.
+	 */
+	return macvlan_common_newlink(src_net, dev, tb, data,
+				      macvtap_receive, macvtap_forward);
 }
 
 static void macvtap_dellink(struct net_device *dev,
 			    struct list_head *head)
 {
-	device_destroy(macvtap_class,
-		       MKDEV(MAJOR(macvtap_major), dev->ifindex));
-
 	macvtap_del_queues(dev);
 	macvlan_dellink(dev, head);
 }
@@ -975,6 +957,42 @@ struct socket *macvtap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(macvtap_get_socket);
 
+static int macvtap_device_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct device *classdev;
+	dev_t devt;
+
+	if (dev->rtnl_link_ops != &macvtap_link_ops)
+		return NOTIFY_DONE;
+
+
+	switch (event) {
+	case NETDEV_REGISTER:
+		/* Create the device node here after the network device has
+		 * been registered but before register_netdevice has
+		 * finished running.
+		 */
+		devt = MKDEV(MAJOR(macvtap_major), dev->ifindex);
+		classdev = device_create(macvtap_class, &dev->dev, devt,
+					 dev, "tap%d", dev->ifindex);
+		if (IS_ERR(classdev))
+			return notifier_from_errno(PTR_ERR(classdev));
+		break;
+	case NETDEV_UNREGISTER:
+		devt = MKDEV(MAJOR(macvtap_major), dev->ifindex);
+		device_destroy(macvtap_class, devt);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block macvtap_notifier_block __read_mostly = {
+	.notifier_call	= macvtap_device_event,
+};
+
 static int macvtap_init(void)
 {
 	int err;
@@ -995,12 +1013,18 @@ static int macvtap_init(void)
 		goto out3;
 	}
 
-	err = macvlan_link_register(&macvtap_link_ops);
+	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
 		goto out4;
 
+	err = macvlan_link_register(&macvtap_link_ops);
+	if (err)
+		goto out5;
+
 	return 0;
 
+out5:
+	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
 	class_unregister(macvtap_class);
 out3:
@@ -1015,6 +1039,7 @@ module_init(macvtap_init);
 static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
+	unregister_netdevice_notifier(&macvtap_notifier_block);
 	class_unregister(macvtap_class);
 	cdev_del(&macvtap_cdev);
 	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] macvtap: Fix the minor device number allocation
  2011-10-20 14:28       ` [PATCH 4/5] macvtap: Rewrite macvtap_newlink so the error handling works Eric W. Biederman
@ 2011-10-20 14:29         ` Eric W. Biederman
  0 siblings, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2011-10-20 14:29 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Arnd Bergmann, Jason Wang, Michael S. Tsirkin,
	Ian Campbell, Shirly Ma


On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices.  I have systems running
automated tests that that hit this condition in just a few days.

Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.

Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device.  With macvtap
specific data structures it is impossible to find any other
kind of networking device.

Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers.  It doesn't solve the
original problem but there is no penalty for a larger minor
device range.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/macvtap.c      |   87 ++++++++++++++++++++++++++++++++++++--------
 include/linux/if_macvlan.h |    1 +
 2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 25689e9..1b7082d 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -51,15 +51,13 @@ static struct proto macvtap_proto = {
 };
 
 /*
- * Minor number matches netdev->ifindex, so need a potentially
- * large value. This also makes it possible to split the
- * tap functionality out again in the future by offering it
- * from other drivers besides macvtap. As long as every device
- * only has one tap, the interface numbers assure that the
- * device nodes are unique.
+ * Variables for dealing with macvtaps device numbers.
  */
 static dev_t macvtap_major;
-#define MACVTAP_NUM_DEVS 65536
+#define MACVTAP_NUM_DEVS (1U << MINORBITS)
+static DEFINE_MUTEX(minor_lock);
+static DEFINE_IDR(minor_idr);
+
 #define GOODCOPY_LEN 128
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
@@ -275,6 +273,58 @@ static int macvtap_receive(struct sk_buff *skb)
 	return macvtap_forward(skb->dev, skb);
 }
 
+static int macvtap_get_minor(struct macvlan_dev *vlan)
+{
+	int retval = -ENOMEM;
+	int id;
+
+	mutex_lock(&minor_lock);
+	if (idr_pre_get(&minor_idr, GFP_KERNEL) == 0)
+		goto exit;
+
+	retval = idr_get_new_above(&minor_idr, vlan, 1, &id);
+	if (retval < 0) {
+		if (retval == -EAGAIN)
+			retval = -ENOMEM;
+		goto exit;
+	}
+	if (id < MACVTAP_NUM_DEVS) {
+		vlan->minor = id;
+	} else {
+		printk(KERN_ERR "too many macvtap devices\n");
+		retval = -EINVAL;
+		idr_remove(&minor_idr, id);
+	}
+exit:
+	mutex_unlock(&minor_lock);
+	return retval;
+}
+
+static void macvtap_free_minor(struct macvlan_dev *vlan)
+{
+	mutex_lock(&minor_lock);
+	if (vlan->minor) {
+		idr_remove(&minor_idr, vlan->minor);
+		vlan->minor = 0;
+	}
+	mutex_unlock(&minor_lock);
+}
+
+static struct net_device *dev_get_by_macvtap_minor(int minor)
+{
+	struct net_device *dev = NULL;
+	struct macvlan_dev *vlan;
+
+	mutex_lock(&minor_lock);
+	vlan = idr_find(&minor_idr, minor);
+	if (vlan) {
+		dev = vlan->dev;
+		dev_hold(dev);
+	}
+	mutex_unlock(&minor_lock);
+	return dev;
+}
+
 static int macvtap_newlink(struct net *src_net,
 			   struct net_device *dev,
 			   struct nlattr *tb[],
@@ -329,7 +379,7 @@ static void macvtap_sock_destruct(struct sock *sk)
 static int macvtap_open(struct inode *inode, struct file *file)
 {
 	struct net *net = current->nsproxy->net_ns;
-	struct net_device *dev = dev_get_by_index(net, iminor(inode));
+	struct net_device *dev = dev_get_by_macvtap_minor(iminor(inode));
 	struct macvtap_queue *q;
 	int err;
 
@@ -337,11 +387,6 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	if (!dev)
 		goto out;
 
-	/* check if this is a macvtap device */
-	err = -EINVAL;
-	if (dev->rtnl_link_ops != &macvtap_link_ops)
-		goto out;
-
 	err = -ENOMEM;
 	q = (struct macvtap_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
 					     &macvtap_proto);
@@ -961,12 +1006,15 @@ static int macvtap_device_event(struct notifier_block *unused,
 				unsigned long event, void *ptr)
 {
 	struct net_device *dev = ptr;
+	struct macvlan_dev *vlan;
 	struct device *classdev;
 	dev_t devt;
+	int err;
 
 	if (dev->rtnl_link_ops != &macvtap_link_ops)
 		return NOTIFY_DONE;
 
+	vlan = netdev_priv(dev);
 
 	switch (event) {
 	case NETDEV_REGISTER:
@@ -974,15 +1022,22 @@ static int macvtap_device_event(struct notifier_block *unused,
 		 * been registered but before register_netdevice has
 		 * finished running.
 		 */
-		devt = MKDEV(MAJOR(macvtap_major), dev->ifindex);
+		err = macvtap_get_minor(vlan);
+		if (err)
+			return notifier_from_errno(err);
+
+		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
 		classdev = device_create(macvtap_class, &dev->dev, devt,
 					 dev, "tap%d", dev->ifindex);
-		if (IS_ERR(classdev))
+		if (IS_ERR(classdev)) {
+			macvtap_free_minor(vlan);
 			return notifier_from_errno(PTR_ERR(classdev));
+		}
 		break;
 	case NETDEV_UNREGISTER:
-		devt = MKDEV(MAJOR(macvtap_major), dev->ifindex);
+		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
 		device_destroy(macvtap_class, devt);
+		macvtap_free_minor(vlan);
 		break;
 	}
 
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index e28b2e4..d103dca 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -64,6 +64,7 @@ struct macvlan_dev {
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
 	int			numvtaps;
+	int			minor;
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5] macvtap fixes.
  2011-10-20 14:24 [PATCH 0/5] macvtap fixes Eric W. Biederman
  2011-10-20 14:26 ` [PATCH 1/5] macvtap: Close a race between macvtap_open and macvtap_dellink Eric W. Biederman
@ 2011-10-21  6:56 ` David Miller
  2011-10-24  6:31   ` Michael S. Tsirkin
  1 sibling, 1 reply; 8+ messages in thread
From: David Miller @ 2011-10-21  6:56 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, arnd, jasowang, mst, ian.campbell, mashirley

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 20 Oct 2011 07:24:59 -0700

> 
> This series of patches fixes a series of minor bugs in the macvtap code.
> 
> The fixes to handle failures in newlink and the change in how we handle
> minor device number allocations are particularly significant.

Applied to net-next, thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5] macvtap fixes.
  2011-10-21  6:56 ` [PATCH 0/5] macvtap fixes David Miller
@ 2011-10-24  6:31   ` Michael S. Tsirkin
  0 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2011-10-24  6:31 UTC (permalink / raw)
  To: David Miller; +Cc: ebiederm, netdev, arnd, jasowang, ian.campbell, mashirley

On Fri, Oct 21, 2011 at 02:56:07AM -0400, David Miller wrote:
> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Thu, 20 Oct 2011 07:24:59 -0700
> 
> > 
> > This series of patches fixes a series of minor bugs in the macvtap code.
> > 
> > The fixes to handle failures in newlink and the change in how we handle
> > minor device number allocations are particularly significant.
> 
> Applied to net-next, thanks.

The 'Don't leak unreceived packets' patch probably makes sense in 'net'.

-- 
MST

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-10-24  6:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-20 14:24 [PATCH 0/5] macvtap fixes Eric W. Biederman
2011-10-20 14:26 ` [PATCH 1/5] macvtap: Close a race between macvtap_open and macvtap_dellink Eric W. Biederman
2011-10-20 14:26   ` [PATCH 2/5] macvtap: Fix macvtap_open races in the zero copy enable code Eric W. Biederman
2011-10-20 14:27     ` [PATCH 3/5] macvtap: Don't leak unreceived packets when we delete a macvtap device Eric W. Biederman
2011-10-20 14:28       ` [PATCH 4/5] macvtap: Rewrite macvtap_newlink so the error handling works Eric W. Biederman
2011-10-20 14:29         ` [PATCH 5/5] macvtap: Fix the minor device number allocation Eric W. Biederman
2011-10-21  6:56 ` [PATCH 0/5] macvtap fixes David Miller
2011-10-24  6:31   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.