netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Stack sends oversize UDP packet to the driver
@ 2019-01-20 22:26 Michael Chan
  2019-01-22  0:36 ` Daniel Axtens
       [not found] ` <CAF2d9jgskHTb-nmbVo9A2CQhh9T3OnH_vbfGcMBii13oq1teCw@mail.gmail.com>
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Chan @ 2019-01-20 22:26 UTC (permalink / raw)
  To: Netdev, David Miller

I've received a bug report of oversized UDP packets sent to the
bnxt_en driver for transmission.  There is no check for illegal length
in the driver and it will send a corrupted BD to the NIC if the
non-TSO length exceeds the maximum MTU supported by the driver.  This
ultimately causes the driver to hang.

Looking a little deeper, it looks like the route of the SKB was
initially to "lo" and therefore no fragmentation was done.  And it
looks like the route later got changed to the bnxt_en dev before
transmission.  The user was doing multiple VM reboots and the bad
length was happening on the Linux host.

I can add a length check in the driver to prevent this.  But is there
a better way to prevent this in the stack?  Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-20 22:26 Stack sends oversize UDP packet to the driver Michael Chan
@ 2019-01-22  0:36 ` Daniel Axtens
  2019-01-22  0:59   ` Michael Chan
       [not found] ` <CAF2d9jgskHTb-nmbVo9A2CQhh9T3OnH_vbfGcMBii13oq1teCw@mail.gmail.com>
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Axtens @ 2019-01-22  0:36 UTC (permalink / raw)
  To: Michael Chan, Netdev, David Miller

Hi Michael,

> I've received a bug report of oversized UDP packets sent to the
> bnxt_en driver for transmission.  There is no check for illegal length
> in the driver and it will send a corrupted BD to the NIC if the
> non-TSO length exceeds the maximum MTU supported by the driver.  This
> ultimately causes the driver to hang.
>
> Looking a little deeper, it looks like the route of the SKB was
> initially to "lo" and therefore no fragmentation was done.  And it
> looks like the route later got changed to the bnxt_en dev before
> transmission.  The user was doing multiple VM reboots and the bad
> length was happening on the Linux host.
>
> I can add a length check in the driver to prevent this.  But is there
> a better way to prevent this in the stack?  Thanks.

I hit a similar sounding issue on a bnx2x - see commit
8914a595110a6eca69a5e275b323f5d09e18f4f9

In that case, a GSO packet with gso_size too large for the firmware was
coming to the bnx2x driver from an ibmveth device via Open vSwitch. I
also toyed with a fix in the stack and ended up fixing just the driver.

I was hoping to get a generic fix in to the stack afterwards, but didn't
get anything finished. Looking back at old branches, it looks like I
considered adding MTU validation to validate_xmit_skb, but I never got
that upstream. My vague recollection is that I ended up caught by edge
cases: GSO_DODGY allows an untrusted source to set gso parameters, so
that needed to be validated first - and that was complex and potentially
slow, and I just got overtaken by more urgent work. (Note that this was
a year ago and was in many ways my introduction to TSO/GSO, so I could
be completely wrong.) Anyway, I can send you my partial work if it would
be helpful.

Regards,
Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
       [not found] ` <CAF2d9jgskHTb-nmbVo9A2CQhh9T3OnH_vbfGcMBii13oq1teCw@mail.gmail.com>
@ 2019-01-22  0:45   ` Michael Chan
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Chan @ 2019-01-22  0:45 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश
	बंडेवार)
  Cc: Netdev, David Miller, Eric Dumazet, Willem de Bruijn

On Mon, Jan 21, 2019 at 4:13 PM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:

> I have cooked few patches to address this issue, however, I'm finding it really hard to reproduce this issue in my test environment. If you could try them and see if those fixes the issue, we can refine them if necessary and eventually formalize patches.
>
>

Hi Mahesh, yes, I can have the patches tested, so please send them my
way.  Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-22  0:36 ` Daniel Axtens
@ 2019-01-22  0:59   ` Michael Chan
  2019-01-22 18:28     ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Chan @ 2019-01-22  0:59 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: Netdev, David Miller

On Mon, Jan 21, 2019 at 4:36 PM Daniel Axtens <dja@axtens.net> wrote:

> I hit a similar sounding issue on a bnx2x - see commit
> 8914a595110a6eca69a5e275b323f5d09e18f4f9
>
> In that case, a GSO packet with gso_size too large for the firmware was
> coming to the bnx2x driver from an ibmveth device via Open vSwitch. I
> also toyed with a fix in the stack and ended up fixing just the driver.

Hi Daniel, yes I remember the issue that you reported at that time.
The current issue is similar but for non-TSO UDP packets.

>
> I was hoping to get a generic fix in to the stack afterwards, but didn't
> get anything finished. Looking back at old branches, it looks like I
> considered adding MTU validation to validate_xmit_skb,

MTU validation in validate_xmit_skb() would definitely prevent the
current reported issue.  Perhaps we can add a WARN() when an invalid
length is detected so that the underlying issue will be reported and
fixed.  The current issue is that somehow the dst of the SKB gets
changed to "lo" and fragmentation is not done for the proper output
device.  Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-22  0:59   ` Michael Chan
@ 2019-01-22 18:28     ` Mahesh Bandewar (महेश बंडेवार)
  2019-01-22 20:09       ` David Miller
  2019-01-30  9:07       ` Michael Chan
  0 siblings, 2 replies; 12+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-01-22 18:28 UTC (permalink / raw)
  To: Michael Chan, Daniel Axtens
  Cc: Netdev, David Miller, Eric Dumazet, Willem de Bruijn

[-- Attachment #1: Type: text/plain, Size: 2578 bytes --]

On Mon, Jan 21, 2019 at 4:59 PM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Mon, Jan 21, 2019 at 4:36 PM Daniel Axtens <dja@axtens.net> wrote:
>
> > I hit a similar sounding issue on a bnx2x - see commit
> > 8914a595110a6eca69a5e275b323f5d09e18f4f9
> >
> > In that case, a GSO packet with gso_size too large for the firmware was
> > coming to the bnx2x driver from an ibmveth device via Open vSwitch. I
> > also toyed with a fix in the stack and ended up fixing just the driver.
>
> Hi Daniel, yes I remember the issue that you reported at that time.
> The current issue is similar but for non-TSO UDP packets.
>
> >
> > I was hoping to get a generic fix in to the stack afterwards, but didn't
> > get anything finished. Looking back at old branches, it looks like I
> > considered adding MTU validation to validate_xmit_skb,
>
> MTU validation in validate_xmit_skb() would definitely prevent the
> current reported issue.  Perhaps we can add a WARN() when an invalid
> length is detected so that the underlying issue will be reported and
> fixed.  The current issue is that somehow the dst of the SKB gets
> changed to "lo" and fragmentation is not done for the proper output
> device.  Thanks.

[Looks like my earlier reply didn't make it to most of the people in
the list, So let me include that here also]

>>>
We have been battling with something similar. The suspicion is that
when an egress packet gets transferred from L4 to L3 some strange
sequence of event causes dst invalidation which results into dst
device to change to 'lo' which has the MTU of 65k, this makes the
stack pass to a frame larger than the MTU to the driver which results
into the undesired outcome you have mentioned.

I have cooked few patches to address this issue, however, I'm finding
it really hard to reproduce this issue in my test environment. If you
could try them and see if those fixes the issue, we can refine them if
necessary and eventually formalize patches.
<<<

Actually while looking at this issue I did look at the fix Daniel
mentioned (8914a595110a6eca69a5e275b323f5d09e18f4f9) and wanted to
reach out to him to see if the solution that I'm trying to implement
in the stack works in that scenario too.

The idea behind the fix is very simple and it is to create a dst-only
(unregistered) device with a very low MTU and use it instead of 'lo'
while invalidating the dst. This would make it *not* forward packets
to driver which might need fragmentation.

Attaching the patches to this mail. [Let me know if I should provide
them in a different way]

thanks,
--mahesh..

[-- Attachment #2: 0001-loopback-create-blackhole-net-device-similar-to-loop.patch --]
[-- Type: application/octet-stream, Size: 4587 bytes --]

From 9c248c001e5f90a60ebf5289806992ebb0dfdec5 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Thu, 4 Oct 2018 13:33:38 -0700
Subject: [RFC PATCH 1/2] loopback: create blackhole net device similar to loopack.

Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b)  MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registed it wont have ifindex.

Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.

Change-Id: I19fb3dfd7a91c8cba6c29274e80269371fc6d54a
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/loopback.c    | 72 ++++++++++++++++++++++++++++++++++-----
 include/linux/netdevice.h |  2 ++
 2 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2df7f60fe052..a67daa16f49a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -59,6 +59,13 @@
 #include <net/net_namespace.h>
 #include <linux/u64_stats_sync.h>
 
+/* blackhole_netdev - a device used for dsts that are marked expired!
+ * This is global device (instead of per-net-ns) since it's not needed
+ * to be per-ns and gets initialized at boot time.
+ */
+struct net_device *blackhole_netdev;
+EXPORT_SYMBOL(blackhole_netdev);
+
 /* The higher levels take care of making this non-reentrant (it's
  * called with bh's disabled).
  */
@@ -166,12 +173,14 @@ static const struct net_device_ops loopback_ops = {
 	.ndo_set_mac_address = eth_mac_addr,
 };
 
-/* The loopback device is special. There is only one instance
- * per network namespace.
- */
-static void loopback_setup(struct net_device *dev)
+static void gen_lo_setup(struct net_device *dev,
+			 unsigned int mtu,
+			 const struct ethtool_ops *eth_ops,
+			 const struct header_ops *hdr_ops,
+			 const struct net_device_ops *dev_ops,
+			 void (*dev_destructor)(struct net_device *dev))
 {
-	dev->mtu		= 64 * 1024;
+	dev->mtu		= mtu;
 	dev->hard_header_len	= ETH_HLEN;	/* 14	*/
 	dev->min_header_len	= ETH_HLEN;	/* 14	*/
 	dev->addr_len		= ETH_ALEN;	/* 6	*/
@@ -190,11 +199,20 @@ static void loopback_setup(struct net_device *dev)
 		| NETIF_F_NETNS_LOCAL
 		| NETIF_F_VLAN_CHALLENGED
 		| NETIF_F_LOOPBACK;
-	dev->ethtool_ops	= &loopback_ethtool_ops;
-	dev->header_ops		= &eth_header_ops;
-	dev->netdev_ops		= &loopback_ops;
+	dev->ethtool_ops	= eth_ops;
+	dev->header_ops		= hdr_ops;
+	dev->netdev_ops		= dev_ops;
 	dev->needs_free_netdev	= true;
-	dev->priv_destructor	= loopback_dev_free;
+	dev->priv_destructor	= dev_destructor;
+}
+
+/* The loopback device is special. There is only one instance
+ * per network namespace.
+ */
+static void loopback_setup(struct net_device *dev)
+{
+	gen_lo_setup(dev, (64 * 1024), &loopback_ethtool_ops, &eth_header_ops,
+		     &loopback_ops, loopback_dev_free);
 }
 
 /* Setup and register the loopback device. */
@@ -229,3 +247,39 @@ static __net_init int loopback_net_init(struct net *net)
 struct pernet_operations __net_initdata loopback_net_ops = {
 	.init = loopback_net_init,
 };
+
+/* blackhole netdevice */
+static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
+					 struct net_device *dev)
+{
+	kfree_skb(skb);
+	net_warn_ratelimited("%s(): Dropping skb.\n", __func__);
+	return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops blackhole_netdev_ops = {
+	.ndo_start_xmit = blackhole_netdev_xmit,
+};
+
+/* This is a dst-dummy device used specifically for invalidated
+ * DSTs and unlike loopback, this is not per-ns.
+ */
+static void blackhole_netdev_setup(struct net_device *dev)
+{
+	gen_lo_setup(dev, ETH_MIN_MTU, NULL, NULL, &blackhole_netdev_ops, NULL);
+}
+
+/* Setup and register the blackhole_netdev. */
+static int __init blackhole_netdev_init(void)
+{
+	blackhole_netdev = alloc_netdev(0, "blackhole_dev", NET_NAME_UNKNOWN,
+					blackhole_netdev_setup);
+	if (!blackhole_netdev)
+		return -ENOMEM;
+
+	dev_init_scheduler(blackhole_netdev);
+
+	return 0;
+}
+
+device_initcall(blackhole_netdev_init);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a57b9a853aab..6a52b112ad82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4838,4 +4838,6 @@ do {								\
 #define PTYPE_HASH_SIZE	(16)
 #define PTYPE_HASH_MASK	(PTYPE_HASH_SIZE - 1)
 
+extern struct net_device *blackhole_netdev;
+
 #endif	/* _LINUX_NETDEVICE_H */
-- 
2.20.1.321.g9e740568ce-goog


[-- Attachment #3: 0002-blackhole_netdev-use-blackhole_netdev-to-invalidate-.patch --]
[-- Type: application/octet-stream, Size: 2003 bytes --]

From ad615fa0994f3e671a9e97121be24936c06d6b36 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Thu, 4 Oct 2018 13:40:31 -0700
Subject: [RFC PATCH 2/2] blackhole_netdev: use blackhole_netdev to invalidate dst entries

Use blackhole_netdev instead of 'lo' device  with lower MTU when marking
dst "dead".

Change-Id: I6c5cf4bb6aafe50ec3ad31defa03906b00b2cd8b
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 net/core/dst.c   | 2 +-
 net/ipv4/route.c | 3 +--
 net/ipv6/route.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index a263309df115..a07b05d0a826 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -176,7 +176,7 @@ void dst_dev_put(struct dst_entry *dst)
 		dst->ops->ifdown(dst, dev, true);
 	dst->input = dst_discard;
 	dst->output = dst_discard_out;
-	dst->dev = dev_net(dst->dev)->loopback_dev;
+	dst->dev = blackhole_netdev;
 	dev_hold(dst->dev);
 	dev_put(dev);
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 99be68b15da0..012c64ebbd0c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1485,7 +1485,6 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 
 void rt_flush_dev(struct net_device *dev)
 {
-	struct net *net = dev_net(dev);
 	struct rtable *rt;
 	int cpu;
 
@@ -1496,7 +1495,7 @@ void rt_flush_dev(struct net_device *dev)
 		list_for_each_entry(rt, &ul->head, rt_uncached) {
 			if (rt->dst.dev != dev)
 				continue;
-			rt->dst.dev = net->loopback_dev;
+			rt->dst.dev = blackhole_netdev;
 			dev_hold(rt->dst.dev);
 			dev_put(dev);
 		}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8e11f9a557b1..695f3aeb1f4d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -179,7 +179,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
 			}
 
 			if (rt_dev == dev) {
-				rt->dst.dev = loopback_dev;
+				rt->dst.dev = blackhole_netdev;
 				dev_hold(rt->dst.dev);
 				dev_put(rt_dev);
 			}
-- 
2.20.1.321.g9e740568ce-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-22 18:28     ` Mahesh Bandewar (महेश बंडेवार)
@ 2019-01-22 20:09       ` David Miller
  2019-01-30  9:07       ` Michael Chan
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2019-01-22 20:09 UTC (permalink / raw)
  To: maheshb; +Cc: michael.chan, dja, netdev, edumazet, willemb

From: Mahesh Bandewar (महेश बंडेवार) <maheshb@google.com>
Date: Tue, 22 Jan 2019 10:28:58 -0800

> On Mon, Jan 21, 2019 at 4:59 PM Michael Chan <michael.chan@broadcom.com> wrote:
>>
>> On Mon, Jan 21, 2019 at 4:36 PM Daniel Axtens <dja@axtens.net> wrote:
>>
>> > I hit a similar sounding issue on a bnx2x - see commit
>> > 8914a595110a6eca69a5e275b323f5d09e18f4f9
>> >
>> > In that case, a GSO packet with gso_size too large for the firmware was
>> > coming to the bnx2x driver from an ibmveth device via Open vSwitch. I
>> > also toyed with a fix in the stack and ended up fixing just the driver.
>>
>> Hi Daniel, yes I remember the issue that you reported at that time.
>> The current issue is similar but for non-TSO UDP packets.

I just want to step in and say that I really don't want drivers to have to
deal with this problem, if possible.  Drivers are hard enough to write
already.

However, some aspects of this problem, like the MTU changes, route flaps,
packet mirroring, etc. are all in one way or another asynchonous to the
data fast path and therefore we may end up having to do something like
this afterall right at the last step before ->ndo_start_xmit() or similar.

Just my $0.02

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-22 18:28     ` Mahesh Bandewar (महेश बंडेवार)
  2019-01-22 20:09       ` David Miller
@ 2019-01-30  9:07       ` Michael Chan
  2019-01-31  1:00         ` Mahesh Bandewar (महेश बंडेवार)
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Chan @ 2019-01-30  9:07 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश
	बंडेवार)
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet, Willem de Bruijn

On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:

>
> The idea behind the fix is very simple and it is to create a dst-only
> (unregistered) device with a very low MTU and use it instead of 'lo'
> while invalidating the dst. This would make it *not* forward packets
> to driver which might need fragmentation.
>

We tested the 2 patches many times and including an overnight test.  I
can confirm that the oversize UDP packets are no longer seen with the
patches applied.  However, I don't see the blackhole xmit function
getting called to free the SKBs though.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-30  9:07       ` Michael Chan
@ 2019-01-31  1:00         ` Mahesh Bandewar (महेश बंडेवार)
  2019-02-05 19:35           ` Michael Chan
  0 siblings, 1 reply; 12+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-01-31  1:00 UTC (permalink / raw)
  To: Michael Chan
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet, Willem de Bruijn

On Wed, Jan 30, 2019 at 1:07 AM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
>
> >
> > The idea behind the fix is very simple and it is to create a dst-only
> > (unregistered) device with a very low MTU and use it instead of 'lo'
> > while invalidating the dst. This would make it *not* forward packets
> > to driver which might need fragmentation.
> >
>
> We tested the 2 patches many times and including an overnight test.  I
> can confirm that the oversize UDP packets are no longer seen with the
> patches applied.  However, I don't see the blackhole xmit function
> getting called to free the SKBs though.
>
Thanks for the confirmation Michael. The blackhole device mtu is
really small, so I would assume the fragmentation code dropped those
packets before calling the xmit function (in ip_fragment), you could
verify that with icmp counters.

> Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-01-31  1:00         ` Mahesh Bandewar (महेश बंडेवार)
@ 2019-02-05 19:35           ` Michael Chan
  2019-02-07  4:51             ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Chan @ 2019-02-05 19:35 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश
	बंडेवार)
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet, Willem de Bruijn

On Wed, Jan 30, 2019 at 5:00 PM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
>
> On Wed, Jan 30, 2019 at 1:07 AM Michael Chan <michael.chan@broadcom.com> wrote:
> >
> > On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
> > <maheshb@google.com> wrote:
> >
> > >
> > > The idea behind the fix is very simple and it is to create a dst-only
> > > (unregistered) device with a very low MTU and use it instead of 'lo'
> > > while invalidating the dst. This would make it *not* forward packets
> > > to driver which might need fragmentation.
> > >
> >
> > We tested the 2 patches many times and including an overnight test.  I
> > can confirm that the oversize UDP packets are no longer seen with the
> > patches applied.  However, I don't see the blackhole xmit function
> > getting called to free the SKBs though.
> >
> Thanks for the confirmation Michael. The blackhole device mtu is
> really small, so I would assume the fragmentation code dropped those
> packets before calling the xmit function (in ip_fragment), you could
> verify that with icmp counters.
>

I've looked at this a little more.  The blackhole_dev is not IFF_UP |
IFF_RUNNING, right?  May be that's why the packets are never getting
to the xmit function?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-02-05 19:35           ` Michael Chan
@ 2019-02-07  4:51             ` Mahesh Bandewar (महेश बंडेवार)
  2019-02-08 20:26               ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 12+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-02-07  4:51 UTC (permalink / raw)
  To: Michael Chan
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet, Willem de Bruijn

On Tue, Feb 5, 2019 at 11:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Wed, Jan 30, 2019 at 5:00 PM Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 1:07 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > >
> > > On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
> > > <maheshb@google.com> wrote:
> > >
> > > >
> > > > The idea behind the fix is very simple and it is to create a dst-only
> > > > (unregistered) device with a very low MTU and use it instead of 'lo'
> > > > while invalidating the dst. This would make it *not* forward packets
> > > > to driver which might need fragmentation.
> > > >
> > >
> > > We tested the 2 patches many times and including an overnight test.  I
> > > can confirm that the oversize UDP packets are no longer seen with the
> > > patches applied.  However, I don't see the blackhole xmit function
> > > getting called to free the SKBs though.
> > >
> > Thanks for the confirmation Michael. The blackhole device mtu is
> > really small, so I would assume the fragmentation code dropped those
> > packets before calling the xmit function (in ip_fragment), you could
> > verify that with icmp counters.
> >
>
> I've looked at this a little more.  The blackhole_dev is not IFF_UP |
> IFF_RUNNING, right?  May be that's why the packets are never getting
> to the xmit function?
Yes, so I added those two flags and ended up writing a test-module for
the device (which I will include while posting the patch-series).
However, adding those flags is also not sufficient since the qdisc is
initialized to noop_qdisc so qdisc enqueue will drop packets before
hitting the ndo_start_xmit().

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-02-07  4:51             ` Mahesh Bandewar (महेश बंडेवार)
@ 2019-02-08 20:26               ` Mahesh Bandewar (महेश बंडेवार)
  2019-02-12  8:55                 ` Michael Chan
  0 siblings, 1 reply; 12+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-02-08 20:26 UTC (permalink / raw)
  To: Michael Chan
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet, Willem de Bruijn

[-- Attachment #1: Type: text/plain, Size: 2315 bytes --]

On Wed, Feb 6, 2019 at 8:51 PM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
>
> On Tue, Feb 5, 2019 at 11:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 5:00 PM Mahesh Bandewar (महेश बंडेवार)
> > <maheshb@google.com> wrote:
> > >
> > > On Wed, Jan 30, 2019 at 1:07 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > > >
> > > > On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
> > > > <maheshb@google.com> wrote:
> > > >
> > > > >
> > > > > The idea behind the fix is very simple and it is to create a dst-only
> > > > > (unregistered) device with a very low MTU and use it instead of 'lo'
> > > > > while invalidating the dst. This would make it *not* forward packets
> > > > > to driver which might need fragmentation.
> > > > >
> > > >
> > > > We tested the 2 patches many times and including an overnight test.  I
> > > > can confirm that the oversize UDP packets are no longer seen with the
> > > > patches applied.  However, I don't see the blackhole xmit function
> > > > getting called to free the SKBs though.
> > > >
> > > Thanks for the confirmation Michael. The blackhole device mtu is
> > > really small, so I would assume the fragmentation code dropped those
> > > packets before calling the xmit function (in ip_fragment), you could
> > > verify that with icmp counters.
> > >
> >
> > I've looked at this a little more.  The blackhole_dev is not IFF_UP |
> > IFF_RUNNING, right?  May be that's why the packets are never getting
> > to the xmit function?
> Yes, so I added those two flags and ended up writing a test-module for
> the device (which I will include while posting the patch-series).
> However, adding those flags is also not sufficient since the qdisc is
> initialized to noop_qdisc so qdisc enqueue will drop packets before
> hitting the ndo_start_xmit().

I have another version of the fix (with help from Eric) and this
should hit the .ndo_start_xmit() of the blackhole_dev. I'm adding
these flags during the setup and then calling dev_activate() to change
noop qdisc to null qdisc. Please give this patch set a try and let me
know if the blackhole_dev xmit path gets exercised in your test
scenario.

[-- Attachment #2: 0001-loopback-create-blackhole-net-device-similar-to-loop.patch --]
[-- Type: application/octet-stream, Size: 4714 bytes --]

From 8aa152e0feb929479ef3f2f93fdd2edcf6f463b8 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Thu, 4 Oct 2018 13:33:38 -0700
Subject: [PATCH 1/3] loopback: create blackhole net device similar to loopack.

Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b)  MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registed it wont have ifindex.

Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.

Change-Id: I19fb3dfd7a91c8cba6c29274e80269371fc6d54a
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/loopback.c    | 76 ++++++++++++++++++++++++++++++++++-----
 include/linux/netdevice.h |  2 ++
 2 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2df7f60fe052..48d7842d5f9c 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -59,6 +59,13 @@
 #include <net/net_namespace.h>
 #include <linux/u64_stats_sync.h>
 
+/* blackhole_netdev - a device used for dsts that are marked expired!
+ * This is global device (instead of per-net-ns) since it's not needed
+ * to be per-ns and gets initialized at boot time.
+ */
+struct net_device *blackhole_netdev;
+EXPORT_SYMBOL(blackhole_netdev);
+
 /* The higher levels take care of making this non-reentrant (it's
  * called with bh's disabled).
  */
@@ -166,12 +173,14 @@ static const struct net_device_ops loopback_ops = {
 	.ndo_set_mac_address = eth_mac_addr,
 };
 
-/* The loopback device is special. There is only one instance
- * per network namespace.
- */
-static void loopback_setup(struct net_device *dev)
+static void gen_lo_setup(struct net_device *dev,
+			 unsigned int mtu,
+			 const struct ethtool_ops *eth_ops,
+			 const struct header_ops *hdr_ops,
+			 const struct net_device_ops *dev_ops,
+			 void (*dev_destructor)(struct net_device *dev))
 {
-	dev->mtu		= 64 * 1024;
+	dev->mtu		= mtu;
 	dev->hard_header_len	= ETH_HLEN;	/* 14	*/
 	dev->min_header_len	= ETH_HLEN;	/* 14	*/
 	dev->addr_len		= ETH_ALEN;	/* 6	*/
@@ -190,11 +199,20 @@ static void loopback_setup(struct net_device *dev)
 		| NETIF_F_NETNS_LOCAL
 		| NETIF_F_VLAN_CHALLENGED
 		| NETIF_F_LOOPBACK;
-	dev->ethtool_ops	= &loopback_ethtool_ops;
-	dev->header_ops		= &eth_header_ops;
-	dev->netdev_ops		= &loopback_ops;
+	dev->ethtool_ops	= eth_ops;
+	dev->header_ops		= hdr_ops;
+	dev->netdev_ops		= dev_ops;
 	dev->needs_free_netdev	= true;
-	dev->priv_destructor	= loopback_dev_free;
+	dev->priv_destructor	= dev_destructor;
+}
+
+/* The loopback device is special. There is only one instance
+ * per network namespace.
+ */
+static void loopback_setup(struct net_device *dev)
+{
+	gen_lo_setup(dev, (64 * 1024), &loopback_ethtool_ops, &eth_header_ops,
+		     &loopback_ops, loopback_dev_free);
 }
 
 /* Setup and register the loopback device. */
@@ -229,3 +247,43 @@ static __net_init int loopback_net_init(struct net *net)
 struct pernet_operations __net_initdata loopback_net_ops = {
 	.init = loopback_net_init,
 };
+
+/* blackhole netdevice */
+static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
+					 struct net_device *dev)
+{
+	kfree_skb(skb);
+	net_warn_ratelimited("%s(): Dropping skb.\n", __func__);
+	return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops blackhole_netdev_ops = {
+	.ndo_start_xmit = blackhole_netdev_xmit,
+};
+
+/* This is a dst-dummy device used specifically for invalidated
+ * DSTs and unlike loopback, this is not per-ns.
+ */
+static void blackhole_netdev_setup(struct net_device *dev)
+{
+	gen_lo_setup(dev, ETH_MIN_MTU, NULL, NULL, &blackhole_netdev_ops, NULL);
+}
+
+/* Setup and register the blackhole_netdev. */
+static int __init blackhole_netdev_init(void)
+{
+	blackhole_netdev = alloc_netdev(0, "blackhole_dev", NET_NAME_UNKNOWN,
+					blackhole_netdev_setup);
+	if (!blackhole_netdev)
+		return -ENOMEM;
+
+	dev_init_scheduler(blackhole_netdev);
+	dev_activate(blackhole_netdev);
+
+	blackhole_netdev->flags |= IFF_UP | IFF_RUNNING;
+	dev_net_set(blackhole_netdev, &init_net);
+
+	return 0;
+}
+
+device_initcall(blackhole_netdev_init);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1d95e634f3fe..c10cba10388a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4848,4 +4848,6 @@ do {								\
 #define PTYPE_HASH_SIZE	(16)
 #define PTYPE_HASH_MASK	(PTYPE_HASH_SIZE - 1)
 
+extern struct net_device *blackhole_netdev;
+
 #endif	/* _LINUX_NETDEVICE_H */
-- 
2.20.1.791.gb4d0f1c61a-goog


[-- Attachment #3: 0002-blackhole_netdev-use-blackhole_netdev-to-invalidate-.patch --]
[-- Type: application/octet-stream, Size: 2000 bytes --]

From 8ca028c8a51f0e8b9376c195fe96086d4dfa18ac Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Thu, 4 Oct 2018 13:40:31 -0700
Subject: [PATCH 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst
 entries

Use blackhole_netdev instead of 'lo' device  with lower MTU when marking
dst "dead".

Change-Id: I6c5cf4bb6aafe50ec3ad31defa03906b00b2cd8b
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 net/core/dst.c   | 2 +-
 net/ipv4/route.c | 3 +--
 net/ipv6/route.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index a263309df115..a07b05d0a826 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -176,7 +176,7 @@ void dst_dev_put(struct dst_entry *dst)
 		dst->ops->ifdown(dst, dev, true);
 	dst->input = dst_discard;
 	dst->output = dst_discard_out;
-	dst->dev = dev_net(dst->dev)->loopback_dev;
+	dst->dev = blackhole_netdev;
 	dev_hold(dst->dev);
 	dev_put(dev);
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 16259ea9df54..4e4f1b36e306 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1485,7 +1485,6 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 
 void rt_flush_dev(struct net_device *dev)
 {
-	struct net *net = dev_net(dev);
 	struct rtable *rt;
 	int cpu;
 
@@ -1496,7 +1495,7 @@ void rt_flush_dev(struct net_device *dev)
 		list_for_each_entry(rt, &ul->head, rt_uncached) {
 			if (rt->dst.dev != dev)
 				continue;
-			rt->dst.dev = net->loopback_dev;
+			rt->dst.dev = blackhole_netdev;
 			dev_hold(rt->dst.dev);
 			dev_put(dev);
 		}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index dc066fdf7e46..aca0e5651f5b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -179,7 +179,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
 			}
 
 			if (rt_dev == dev) {
-				rt->dst.dev = loopback_dev;
+				rt->dst.dev = blackhole_netdev;
 				dev_hold(rt->dst.dev);
 				dev_put(rt_dev);
 			}
-- 
2.20.1.791.gb4d0f1c61a-goog


[-- Attachment #4: 0003-blackhole_dev-add-a-selftest.patch --]
[-- Type: application/octet-stream, Size: 6342 bytes --]

From 992832040ffe08b106bbf21c0fa8a7317655d530 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Wed, 10 Oct 2018 15:25:01 -0700
Subject: [PATCH 3/3] blackhole_dev: add a selftest

Change-Id: Ia06c3edc124c652ee03833eb60a5c6229d4c4ec5
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 lib/Kconfig.debug                             |   9 ++
 lib/Makefile                                  |   1 +
 lib/test_blackhole_dev.c                      | 100 ++++++++++++++++++
 tools/testing/selftests/net/Makefile          |   3 +-
 tools/testing/selftests/net/config            |   1 +
 .../selftests/net/test_blackhole_dev.sh       |  11 ++
 6 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 lib/test_blackhole_dev.c
 create mode 100755 tools/testing/selftests/net/test_blackhole_dev.sh

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d4df5b24d75e..593ecd4bc815 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1901,6 +1901,15 @@ config TEST_BPF
 
 	  If unsure, say N.
 
+config TEST_BLACKHOLE_DEV
+	tristate "Test BPF filter functionality"
+	depends on m && NET
+	help
+	  This builds the "test_blackhole_dev" module that validates the
+	  data path through this blackhole netdev.
+
+	  If unsure, say N.
+
 config FIND_BIT_BENCHMARK
 	tristate "Test find_bit functions"
 	help
diff --git a/lib/Makefile b/lib/Makefile
index e1b59da71418..733f66ceb51a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_TEST_KMOD) += test_kmod.o
 obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o
 obj-$(CONFIG_TEST_MEMCAT_P) += test_memcat_p.o
 obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o
+obj-$(CONFIG_TEST_BLACKHOLE_DEV) += test_blackhole_dev.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/test_blackhole_dev.c b/lib/test_blackhole_dev.c
new file mode 100644
index 000000000000..4c40580a99a3
--- /dev/null
+++ b/lib/test_blackhole_dev.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This module tests the blackhole_dev that is created during the
+ * net subsystem initialization. The test this module performs is
+ * by injecting an skb into the stack with skb->dev as the
+ * blackhole_dev and expects kernel to behave in a sane manner
+ * (in other words, *not crash*)!
+ *
+ * Copyright (c) 2018, Mahesh Bandewar <maheshb@google.com>
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/udp.h>
+#include <linux/ipv6.h>
+
+#include <net/dst.h>
+
+#define SKB_SIZE  256
+#define HEAD_SIZE (14+40+8)	/* Ether + IPv6 + UDP */
+#define TAIL_SIZE 32		/* random tail-room */
+
+#define UDP_PORT 1234
+
+static int __init test_blackholedev_init(void)
+{
+	struct ipv6hdr *ip6h;
+	struct sk_buff *skb;
+	struct ethhdr *ethh;
+	struct udphdr *uh;
+	int data_len;
+	int ret;
+
+	skb = alloc_skb(SKB_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	/* Reserve head-room for the headers */
+	skb_reserve(skb, HEAD_SIZE);
+
+	/* Add data to the skb */
+	data_len = SKB_SIZE - (HEAD_SIZE + TAIL_SIZE);
+	memset(__skb_put(skb, data_len), 0xf, data_len);
+
+	/* Add protocol data */
+	/* (Transport) UDP */
+	uh = (struct udphdr *)skb_push(skb, sizeof(struct udphdr));
+	skb_set_transport_header(skb, 0);
+	uh->source = uh->dest = htons(UDP_PORT);
+	uh->len = htons(data_len);
+	uh->check = 0;
+	/* (Network) IPv6 */
+	ip6h = (struct ipv6hdr *)skb_push(skb, sizeof(struct ipv6hdr));
+	skb_set_network_header(skb, 0);
+	ip6h->hop_limit = 32;
+	ip6h->payload_len = data_len + sizeof(struct udphdr);
+	ip6h->nexthdr = IPPROTO_UDP;
+	ip6h->saddr = in6addr_loopback;
+	ip6h->daddr = in6addr_loopback;
+	/* Ether */
+	ethh = (struct ethhdr *)skb_push(skb, sizeof(struct ethhdr));
+	skb_set_mac_header(skb, 0);
+
+	skb->protocol = htons(ETH_P_IPV6);
+	skb->pkt_type = PACKET_HOST;
+	skb->dev = blackhole_netdev;
+
+	/* Now attempt to send the packet */
+	ret = dev_queue_xmit(skb);
+
+	switch (ret) {
+	case NET_XMIT_SUCCESS:
+		pr_warn("dev_queue_xmit() returned NET_XMIT_SUCCESS\n");
+		break;
+	case NET_XMIT_DROP:
+		pr_warn("dev_queue_xmit() returned NET_XMIT_DROP\n");
+		break;
+	case NET_XMIT_CN:
+		pr_warn("dev_queue_xmit() returned NET_XMIT_CN\n");
+		break;
+	default:
+		pr_err("dev_queue_xmit() returned UNKNOWN(%d)\n", ret);
+	}
+
+	return 0;
+}
+
+static void __exit test_blackholedev_exit(void)
+{
+	pr_warn("test_blackholedev module terminating.\n");
+}
+
+module_init(test_blackholedev_init);
+module_exit(test_blackholedev_exit);
+
+MODULE_AUTHOR("Mahesh Bandewar <maheshb@google.com>");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index f8f3e90700c0..8952b2039fdc 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -4,8 +4,9 @@
 CFLAGS =  -Wall -Wl,--no-as-needed -O2 -g
 CFLAGS += -I../../../../usr/include/
 
+<<<<<<< HEAD
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh \
-	      rtnetlink.sh xfrm_policy.sh
+	      rtnetlink.sh xfrm_policy.sh test_blackhole_dev.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
 TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_any.sh
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 5821bdd98d20..4553c7367aad 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -28,3 +28,4 @@ CONFIG_NF_TABLES_IPV6=y
 CONFIG_NF_TABLES_IPV4=y
 CONFIG_NFT_CHAIN_NAT_IPV6=m
 CONFIG_NFT_CHAIN_NAT_IPV4=m
+CONFIG_TEST_BLACKHOLE_DEV=m
diff --git a/tools/testing/selftests/net/test_blackhole_dev.sh b/tools/testing/selftests/net/test_blackhole_dev.sh
new file mode 100755
index 000000000000..3119b80e711f
--- /dev/null
+++ b/tools/testing/selftests/net/test_blackhole_dev.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# Runs blackhole-dev test using blackhole-dev kernel module
+
+if /sbin/modprobe -q test_blackhole_dev ; then
+	/sbin/modprobe -q -r test_blackhole_dev;
+	echo "test_blackhole_dev: ok";
+else
+	echo "test_blackhole_dev: [FAIL]";
+	exit 1;
+fi
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Stack sends oversize UDP packet to the driver
  2019-02-08 20:26               ` Mahesh Bandewar (महेश बंडेवार)
@ 2019-02-12  8:55                 ` Michael Chan
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Chan @ 2019-02-12  8:55 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश
	बंडेवार)
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet, Willem de Bruijn

On Fri, Feb 8, 2019 at 12:26 PM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
>
> On Wed, Feb 6, 2019 at 8:51 PM Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
> >
> > On Tue, Feb 5, 2019 at 11:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > > I've looked at this a little more.  The blackhole_dev is not IFF_UP |
> > > IFF_RUNNING, right?  May be that's why the packets are never getting
> > > to the xmit function?
> > Yes, so I added those two flags and ended up writing a test-module for
> > the device (which I will include while posting the patch-series).
> > However, adding those flags is also not sufficient since the qdisc is
> > initialized to noop_qdisc so qdisc enqueue will drop packets before
> > hitting the ndo_start_xmit().
>
> I have another version of the fix (with help from Eric) and this
> should hit the .ndo_start_xmit() of the blackhole_dev. I'm adding
> these flags during the setup and then calling dev_activate() to change
> noop qdisc to null qdisc. Please give this patch set a try and let me
> know if the blackhole_dev xmit path gets exercised in your test
> scenario.

The new version still works in the sense that no oversize packets are
seen in the NIC driver's xmit function.  But I still don't see any
packets hitting the blackhole's xmit function.  I'm not 100% sure but
I think the blackhole dev has no IP address and so the UDP packets are
dropped in ip_finish_output2() because there is no neigh.  Something
like that.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-02-12  8:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-20 22:26 Stack sends oversize UDP packet to the driver Michael Chan
2019-01-22  0:36 ` Daniel Axtens
2019-01-22  0:59   ` Michael Chan
2019-01-22 18:28     ` Mahesh Bandewar (महेश बंडेवार)
2019-01-22 20:09       ` David Miller
2019-01-30  9:07       ` Michael Chan
2019-01-31  1:00         ` Mahesh Bandewar (महेश बंडेवार)
2019-02-05 19:35           ` Michael Chan
2019-02-07  4:51             ` Mahesh Bandewar (महेश बंडेवार)
2019-02-08 20:26               ` Mahesh Bandewar (महेश बंडेवार)
2019-02-12  8:55                 ` Michael Chan
     [not found] ` <CAF2d9jgskHTb-nmbVo9A2CQhh9T3OnH_vbfGcMBii13oq1teCw@mail.gmail.com>
2019-01-22  0:45   ` Michael Chan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).