All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
@ 2013-12-28 13:38 Florian Westphal
  2013-12-29 22:01 ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2013-12-28 13:38 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

commit 797f87f83b60685ff8a13fa0572d2f10393c50d3
(macvlan: fix netdev feature propagation from lower device) can result
in oops:

[   81.011639] 8021q: adding VLAN 0 to HW filter on device wan0
[   81.030402] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
[   81.032267] IP: [<ffffffff813269d0>] macvlan_hard_header+0x40/0x60
[..]
[   81.034359] RIP: 0010:[<ffffffff813269d0>]  [<ffffffff813269d0>] macvlan_hard_header+0x40/0x60
[..]
[   81.034359]  <IRQ> 
[   81.034359]  [<ffffffff8139c7ed>] ? neigh_resolve_output+0x16d/0x2b0
[   81.034359]  [<ffffffff81484246>] ? ip6_finish_output2+0x176/0x600
[   81.034359]  [<ffffffff81484246>] ip6_finish_output2+0x176/0x600
[   81.034359]  [<ffffffff81484129>] ? ip6_finish_output2+0x59/0x600
[   81.034359]  [<ffffffff814865c6>] ip6_finish_output+0x96/0x1f0
[   81.034359]  [<ffffffff81486773>] ip6_output+0x53/0x1c0
[   81.034359]  [<ffffffff814a7dc2>] mld_sendpack+0x2b2/0x330
[   81.034359]  [<ffffffff814a8774>] mld_ifc_timer_expire+0x194/0x2c0

...if the lower device supports NETIF_F_HW_VLAN_CTAG_TX flag and a vlan
is created on top of the macvlan device, i.e.

ip link add link eth0 name wan0 type macvlan
ip link add link wan0 name wan1 type vlan id 2
ip link set wan0 up

reason is that 8021q sets dev->header_ops to the realdev in
'NETIF_F_HW_VLAN_CTAG_TX present' case - but macvlan_heard_header
assumes that the *dev pointer passed is a macvlan device.

But thats not the case in the above scenario.
macvlan_hard_header is invokes with *dev being the 8021q interface,
which then oopses since the netdev_priv area is something completely different.

Cap lowerdev feature set to the one explicitly set via MACVLAN_FEATURES
before trying to increment any features.

Fixes: 797f87f83b ("macvlan: fix netdev feature propagation from lower device")
Reported-by: Will Trives <renevant@internode.on.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 WARNING: I am not sure this is correct.

 We lose flags that the lowerdev of the macvlan could handle.

 What 8021q is doing in net/8021q/vlan_dev.c:vlan_dev_init seems strange to me.

 Where does it say that is ok to just do
  dev->header_ops      = real_dev->header_ops;

 and then assume that header_ops->create() et al. will cope
 with dev being a 8021q device instead of real_dev?

 To me this was completely unexpected.

 Or is the real bug the use of netdev_priv in macvlan_hard_header()?

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 60406b0..cd2791b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -690,13 +690,13 @@ static netdev_features_t macvlan_fix_features(struct net_device *dev,
 					      netdev_features_t features)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	netdev_features_t mask;
+	netdev_features_t mask, lowerdev_features;
 
-	features |= NETIF_F_ALL_FOR_ALL;
 	features &= (vlan->set_features | ~MACVLAN_FEATURES);
+	lowerdev_features = vlan->lowerdev->features & MACVLAN_FEATURES;
 	mask = features;
 
-	features = netdev_increment_features(vlan->lowerdev->features,
+	features = netdev_increment_features(lowerdev_features,
 					     features,
 					     mask);
 	if (!vlan->fwd_priv)
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2013-12-28 13:38 [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev Florian Westphal
@ 2013-12-29 22:01 ` David Miller
  2013-12-30 10:27   ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2013-12-29 22:01 UTC (permalink / raw)
  To: fw; +Cc: netdev

From: Florian Westphal <fw@strlen.de>
Date: Sat, 28 Dec 2013 14:38:40 +0100

>  Or is the real bug the use of netdev_priv in macvlan_hard_header()?

That's a really good question.

On the surface it really looks like header ops cannot assume anything
about the netdev_priv().  These operations are really meant to be
stateless and apply to an entire class of devices.  macvlan is trying
to make it's operations specific to exactly one device.

For example, eth_header() only assumes that the device the packet is
going over has an ETH_ALEN length MAC address.

Or at least, based upon what the vlan is trying to do it appears this
way.

But, ironically, vlan_dev_hard_header() makes this same assumption.  It
uses netdev_priv() the same way macvlan is.

So the more I think about this, the more I think that the operations
assignment VLAN is doing is illegal.  If it wants to "pass through"
it must do so with explicit handlers, ones that pass the expected
device to the hard header ops.

So vlan_dev.c would have things like:

static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev,
				     unsigned short type,
				     const void *daddr, const void *saddr,
				     unsigned int len)
{
	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
	struct net_device *real_dev = vlan->real_dev;

	return real_dev->header_ops->create(skb, real_dev, type, daddr, saddr, len);
}

static int vlan_passthru_rebuild_header(struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct net_device *real_dev;
	struct vlan_dev_priv *vlan;

	vlan = vlan_dev_priv(dev);
	real_dev = vlan->real_dev;

	return real_dev->header_ops->rebuild(skb);
}

and then:

static const struct header_ops vlan_passthru_header_ops = {
	.create	 = vlan_passthru_hard_header,
	.rebuild = vlan_passthru_rebuild_header,
	.parse	 = eth_header_parse,
};

BTW, ARCNET is another case which cares about the netdev_priv() in it's
header_ops.

Can someone test this?

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 762896e..91eb467 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -530,6 +530,35 @@ static const struct header_ops vlan_header_ops = {
 	.parse	 = eth_header_parse,
 };
 
+static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev,
+				     unsigned short type,
+				     const void *daddr, const void *saddr,
+				     unsigned int len)
+{
+	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+	struct net_device *real_dev = vlan->real_dev;
+
+	return real_dev->header_ops->create(skb, real_dev, type, daddr, saddr, len);
+}
+
+static int vlan_passthru_rebuild_header(struct sk_buff *skb)
+{
+	struct net_device *dev = skb->dev;
+	struct net_device *real_dev;
+	struct vlan_dev_priv *vlan;
+
+	vlan = vlan_dev_priv(dev);
+	real_dev = vlan->real_dev;
+
+	return real_dev->header_ops->rebuild(skb);
+}
+
+static const struct header_ops vlan_passthru_header_ops = {
+	.create	 = vlan_passthru_hard_header,
+	.rebuild = vlan_passthru_rebuild_header,
+	.parse	 = eth_header_parse,
+};
+
 static struct device_type vlan_type = {
 	.name	= "vlan",
 };
@@ -573,7 +602,7 @@ static int vlan_dev_init(struct net_device *dev)
 
 	dev->needed_headroom = real_dev->needed_headroom;
 	if (real_dev->features & NETIF_F_HW_VLAN_CTAG_TX) {
-		dev->header_ops      = real_dev->header_ops;
+		dev->header_ops      = &vlan_passthru_header_ops;
 		dev->hard_header_len = real_dev->hard_header_len;
 	} else {
 		dev->header_ops      = &vlan_header_ops;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2013-12-29 22:01 ` David Miller
@ 2013-12-30 10:27   ` Florian Westphal
  2013-12-31 21:24     ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2013-12-30 10:27 UTC (permalink / raw)
  To: David Miller; +Cc: fw, netdev

David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Sat, 28 Dec 2013 14:38:40 +0100
> 
> >  Or is the real bug the use of netdev_priv in macvlan_hard_header()?
> 
> That's a really good question.
[..]
> So the more I think about this, the more I think that the operations
> assignment VLAN is doing is illegal.  If it wants to "pass through"
> it must do so with explicit handlers, ones that pass the expected
> device to the hard header ops.

Both bonding and team seem to do the same.  Guess it was never
a problem since most create hooks don't use netdev_priv.

> So vlan_dev.c would have things like:
> 
> static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev,
> 				     unsigned short type,
> 				     const void *daddr, const void *saddr,
> 				     unsigned int len)
> {
> 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
> 	struct net_device *real_dev = vlan->real_dev;
> 
> 	return real_dev->header_ops->create(skb, real_dev, type, daddr, saddr, len);
> }

I wonder if this needs to use dev_hard_header() to protect against
NULL header_ops/create hook. macvlan does this.

> static int vlan_passthru_rebuild_header(struct sk_buff *skb)
> {
> 	struct net_device *dev = skb->dev;
> 	struct net_device *real_dev;
> 	struct vlan_dev_priv *vlan;
> 
> 	vlan = vlan_dev_priv(dev);
> 	real_dev = vlan->real_dev;
> 
> 	return real_dev->header_ops->rebuild(skb);
> }

similar, e.g. drivers/net/plip/plip.c defines 'create' and 'cache' but
no 'rebuild'.

> Can someone test this?

Here is what i did in qemu guest running net-tree+your patch below:

ip link add link eth0 name wan0 type macvlan
ip link add link wan0 name wan1 type vlan id 2
ip link set wan0 up

[ without your patch it would oops real soon now ]

ip addr add 10.0.0.2/8 dev wan1
ip link set wan1 up

- host/guest can xmit data using vlan interface (tested with netcat)
- tcpdump -ne on host shows:
[ .. ] 9:fc:b7, ethertype 802.1Q (0x8100), length 1518: vlan 2, p 0, ethertype IPv4, 10.0.0.1.12345 > 10.0.0.2.50369 [..]
ethtool -k wan0 on guest shows:
rx-vlan-offload: on [fixed]
tx-vlan-offload: on [fixed]

So, this looks good to me.

Many thanks for investigating this issue!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2013-12-30 10:27   ` Florian Westphal
@ 2013-12-31 21:24     ` David Miller
  2014-01-03 11:39       ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2013-12-31 21:24 UTC (permalink / raw)
  To: fw; +Cc: netdev

From: Florian Westphal <fw@strlen.de>
Date: Mon, 30 Dec 2013 11:27:12 +0100

> I wonder if this needs to use dev_hard_header() to protect against
> NULL header_ops/create hook. macvlan does this.

Good catch.

> similar, e.g. drivers/net/plip/plip.c defines 'create' and 'cache' but
> no 'rebuild'.

Likewise, and it looks like the neighbour code could crash if the right
things happen over a PLIP device.  Also fixed by the patch below.

The following is what I committed and will submit to -stable, thanks!

====================
vlan: Fix header ops passthru when doing TX VLAN offload.

When the vlan code detects that the real device can do TX VLAN offloads
in hardware, it tries to arrange for the real device's header_ops to
be invoked directly.

But it does so illegally, by simply hooking the real device's
header_ops up to the VLAN device.

This doesn't work because we will end up invoking a set of header_ops
routines which expect a device type which matches the real device, but
will see a VLAN device instead.

Fix this by providing a pass-thru set of header_ops which will arrange
to pass the proper real device instead.

To facilitate this add a dev_rebuild_header().  There are
implementations which provide a ->cache and ->create but not a
->rebuild (f.e. PLIP).  So we need a helper function just like
dev_hard_header() to avoid crashes.

Use this helper in the one existing place where the
header_ops->rebuild was being invoked, the neighbour code.

With lots of help from Florian Westphal.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |  9 +++++++++
 net/8021q/vlan_dev.c      | 19 ++++++++++++++++++-
 net/core/neighbour.c      |  2 +-
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d9a550b..7514b9c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1912,6 +1912,15 @@ static inline int dev_parse_header(const struct sk_buff *skb,
 	return dev->header_ops->parse(skb, haddr);
 }
 
+static inline int dev_rebuild_header(struct sk_buff *skb)
+{
+	const struct net_device *dev = skb->dev;
+
+	if (!dev->header_ops || !dev->header_ops->rebuild)
+		return 0;
+	return dev->header_ops->rebuild(skb);
+}
+
 typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int len);
 int register_gifconf(unsigned int family, gifconf_func_t *gifconf);
 static inline int unregister_gifconf(unsigned int family)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 762896e..47c908f 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -530,6 +530,23 @@ static const struct header_ops vlan_header_ops = {
 	.parse	 = eth_header_parse,
 };
 
+static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev,
+				     unsigned short type,
+				     const void *daddr, const void *saddr,
+				     unsigned int len)
+{
+	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+	struct net_device *real_dev = vlan->real_dev;
+
+	return dev_hard_header(skb, real_dev, type, daddr, saddr, len);
+}
+
+static const struct header_ops vlan_passthru_header_ops = {
+	.create	 = vlan_passthru_hard_header,
+	.rebuild = dev_rebuild_header,
+	.parse	 = eth_header_parse,
+};
+
 static struct device_type vlan_type = {
 	.name	= "vlan",
 };
@@ -573,7 +590,7 @@ static int vlan_dev_init(struct net_device *dev)
 
 	dev->needed_headroom = real_dev->needed_headroom;
 	if (real_dev->features & NETIF_F_HW_VLAN_CTAG_TX) {
-		dev->header_ops      = real_dev->header_ops;
+		dev->header_ops      = &vlan_passthru_header_ops;
 		dev->hard_header_len = real_dev->hard_header_len;
 	} else {
 		dev->header_ops      = &vlan_header_ops;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 36b1443..932c6d7 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,7 +1275,7 @@ int neigh_compat_output(struct neighbour *neigh, struct sk_buff *skb)
 
 	if (dev_hard_header(skb, dev, ntohs(skb->protocol), NULL, NULL,
 			    skb->len) < 0 &&
-	    dev->header_ops->rebuild(skb))
+	    dev_rebuild_header(skb))
 		return 0;
 
 	return dev_queue_xmit(skb);
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2013-12-31 21:24     ` David Miller
@ 2014-01-03 11:39       ` Florian Westphal
  2014-01-03 20:33         ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2014-01-03 11:39 UTC (permalink / raw)
  To: David Miller; +Cc: fw, netdev

David Miller <davem@davemloft.net> wrote:
> vlan: Fix header ops passthru when doing TX VLAN offload.
> 
> When the vlan code detects that the real device can do TX VLAN offloads
> in hardware, it tries to arrange for the real device's header_ops to
> be invoked directly.

Sorry for the late reply.

> +static inline int dev_rebuild_header(struct sk_buff *skb)
> +{
> +	const struct net_device *dev = skb->dev;
> +
> +	if (!dev->header_ops || !dev->header_ops->rebuild)
> +		return 0;
> +	return dev->header_ops->rebuild(skb);
> +}
> +
[..]

> +static const struct header_ops vlan_passthru_header_ops = {
> +	.create	 = vlan_passthru_hard_header,
> +	.rebuild = dev_rebuild_header,

Doesn't that result in infinite recursion when invoking
dev_rebuild_header() on skb whose dev->header_ops is
vlan_passthru_header_ops?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2014-01-03 11:39       ` Florian Westphal
@ 2014-01-03 20:33         ` David Miller
  2014-01-04 13:51           ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2014-01-03 20:33 UTC (permalink / raw)
  To: fw; +Cc: netdev

From: Florian Westphal <fw@strlen.de>
Date: Fri, 3 Jan 2014 12:39:04 +0100

>> +static const struct header_ops vlan_passthru_header_ops = {
>> +	.create	 = vlan_passthru_hard_header,
>> +	.rebuild = dev_rebuild_header,
> 
> Doesn't that result in infinite recursion when invoking
> dev_rebuild_header() on skb whose dev->header_ops is
> vlan_passthru_header_ops?

The skb->dev should be the real_dev at this point, no?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2014-01-03 20:33         ` David Miller
@ 2014-01-04 13:51           ` Florian Westphal
  2014-01-05  1:13             ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2014-01-04 13:51 UTC (permalink / raw)
  To: David Miller; +Cc: fw, netdev

David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Fri, 3 Jan 2014 12:39:04 +0100
> 
> >> +static const struct header_ops vlan_passthru_header_ops = {
> >> +	.create	 = vlan_passthru_hard_header,
> >> +	.rebuild = dev_rebuild_header,
> > 
> > Doesn't that result in infinite recursion when invoking
> > dev_rebuild_header() on skb whose dev->header_ops is
> > vlan_passthru_header_ops?
> 
> The skb->dev should be the real_dev at this point, no?

Oh, this is fun.

I grep'd for invocations of ->rebuild() because I wanted to understand
when/where it is used.

I only found one single instance, namely

neigh_compat_output() in net/core/neighbour.c

It does
        struct net_device *dev = skb->dev;
        __skb_pull(skb, skb_network_offset(skb));
       if (dev_hard_header(skb, dev, ntohs(skb->protocol), NULL, NULL, skb->len) < 0 &&
            dev->header_ops->rebuild(skb))
               return 0;

So I thought, if skb->dev is the vlan device, we would invoke
dev_rebuild_header(), which grabs skb->dev again and invokes
dev_rebuild_header again, etc. etc.

But: neigh_compat_output (suspicious name...) is only wired up in
net/ipv4/arp.c, in 'static const struct neigh_ops arp_broken_ops'.
... and arp_broken_ops is only set if dev->type is one of
ARPHRD_ROSE, ARPHRD_AX25, ARPHRD_NETROM.

Could it be that ->rebuild() is completely obsolete and could be removed
from almost all drivers (except above types)?

Archeology exercise #1 digs up 3b04ddde02c in linux.git, which
creats header_ops->rebuild, from the old  dev->rebuild_header.

Exercise #2 then finds commit 275513d2e1c78 in netdev-vger-cvs.git tree.
Quote from commit message:

        - dev->rebuild_header WILL DISAPPEAR. All the code
          relying on its existance is wrong, though still works.

Alexey calling from 1997 ;-)

I'll do some more digging before working on this.

I've placed a BUG() in eth_rebuild_header on my workstation, lets see if
it dies 8-}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev
  2014-01-04 13:51           ` Florian Westphal
@ 2014-01-05  1:13             ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2014-01-05  1:13 UTC (permalink / raw)
  To: fw; +Cc: netdev

From: Florian Westphal <fw@strlen.de>
Date: Sat, 4 Jan 2014 14:51:32 +0100

> Archeology exercise #1 digs up 3b04ddde02c in linux.git, which
> creats header_ops->rebuild, from the old  dev->rebuild_header.
> 
> Exercise #2 then finds commit 275513d2e1c78 in netdev-vger-cvs.git tree.
> Quote from commit message:
> 
>         - dev->rebuild_header WILL DISAPPEAR. All the code
>           relying on its existance is wrong, though still works.
> 
> Alexey calling from 1997 ;-)

These protocols which use the broken ARP ops are special, the
problem is that they do things like send packets out when their
header ops are invoked.

It's non-trivial to undo this design, and these aren't easy protocols
to test.

That's why we've had this compat stuff for them for more than a
decade.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-01-05  1:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-28 13:38 [PATCH] macvlan: fix oops with vlan-on-top and HW_VLAN_CTAG_TX lowerdev Florian Westphal
2013-12-29 22:01 ` David Miller
2013-12-30 10:27   ` Florian Westphal
2013-12-31 21:24     ` David Miller
2014-01-03 11:39       ` Florian Westphal
2014-01-03 20:33         ` David Miller
2014-01-04 13:51           ` Florian Westphal
2014-01-05  1:13             ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.