From: "Mahesh Bandewar (महेश बंडेवार)" <maheshb@google.com>
To: Michael Chan <michael.chan@broadcom.com>
Cc: Daniel Axtens <dja@axtens.net>, Netdev <netdev@vger.kernel.org>,
David Miller <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Willem de Bruijn <willemb@google.com>
Subject: Re: Stack sends oversize UDP packet to the driver
Date: Fri, 8 Feb 2019 12:26:15 -0800 [thread overview]
Message-ID: <CAF2d9jjwuT1WSz7P9QFbjdpp8GhhV-XBLzVC8H94Le_JCxL0fg@mail.gmail.com> (raw)
In-Reply-To: <CAF2d9jja7B+R8M7PXd+6jZBjifWs_NiR-7Ljn5kJvSOMaKBDYQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2315 bytes --]
On Wed, Feb 6, 2019 at 8:51 PM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
>
> On Tue, Feb 5, 2019 at 11:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 5:00 PM Mahesh Bandewar (महेश बंडेवार)
> > <maheshb@google.com> wrote:
> > >
> > > On Wed, Jan 30, 2019 at 1:07 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > > >
> > > > On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
> > > > <maheshb@google.com> wrote:
> > > >
> > > > >
> > > > > The idea behind the fix is very simple and it is to create a dst-only
> > > > > (unregistered) device with a very low MTU and use it instead of 'lo'
> > > > > while invalidating the dst. This would make it *not* forward packets
> > > > > to driver which might need fragmentation.
> > > > >
> > > >
> > > > We tested the 2 patches many times and including an overnight test. I
> > > > can confirm that the oversize UDP packets are no longer seen with the
> > > > patches applied. However, I don't see the blackhole xmit function
> > > > getting called to free the SKBs though.
> > > >
> > > Thanks for the confirmation Michael. The blackhole device mtu is
> > > really small, so I would assume the fragmentation code dropped those
> > > packets before calling the xmit function (in ip_fragment), you could
> > > verify that with icmp counters.
> > >
> >
> > I've looked at this a little more. The blackhole_dev is not IFF_UP |
> > IFF_RUNNING, right? May be that's why the packets are never getting
> > to the xmit function?
> Yes, so I added those two flags and ended up writing a test-module for
> the device (which I will include while posting the patch-series).
> However, adding those flags is also not sufficient since the qdisc is
> initialized to noop_qdisc so qdisc enqueue will drop packets before
> hitting the ndo_start_xmit().
I have another version of the fix (with help from Eric) and this
should hit the .ndo_start_xmit() of the blackhole_dev. I'm adding
these flags during the setup and then calling dev_activate() to change
noop qdisc to null qdisc. Please give this patch set a try and let me
know if the blackhole_dev xmit path gets exercised in your test
scenario.
[-- Attachment #2: 0001-loopback-create-blackhole-net-device-similar-to-loop.patch --]
[-- Type: application/octet-stream, Size: 4714 bytes --]
From 8aa152e0feb929479ef3f2f93fdd2edcf6f463b8 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Thu, 4 Oct 2018 13:33:38 -0700
Subject: [PATCH 1/3] loopback: create blackhole net device similar to loopack.
Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b) MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registed it wont have ifindex.
Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.
Change-Id: I19fb3dfd7a91c8cba6c29274e80269371fc6d54a
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
drivers/net/loopback.c | 76 ++++++++++++++++++++++++++++++++++-----
include/linux/netdevice.h | 2 ++
2 files changed, 69 insertions(+), 9 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2df7f60fe052..48d7842d5f9c 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -59,6 +59,13 @@
#include <net/net_namespace.h>
#include <linux/u64_stats_sync.h>
+/* blackhole_netdev - a device used for dsts that are marked expired!
+ * This is global device (instead of per-net-ns) since it's not needed
+ * to be per-ns and gets initialized at boot time.
+ */
+struct net_device *blackhole_netdev;
+EXPORT_SYMBOL(blackhole_netdev);
+
/* The higher levels take care of making this non-reentrant (it's
* called with bh's disabled).
*/
@@ -166,12 +173,14 @@ static const struct net_device_ops loopback_ops = {
.ndo_set_mac_address = eth_mac_addr,
};
-/* The loopback device is special. There is only one instance
- * per network namespace.
- */
-static void loopback_setup(struct net_device *dev)
+static void gen_lo_setup(struct net_device *dev,
+ unsigned int mtu,
+ const struct ethtool_ops *eth_ops,
+ const struct header_ops *hdr_ops,
+ const struct net_device_ops *dev_ops,
+ void (*dev_destructor)(struct net_device *dev))
{
- dev->mtu = 64 * 1024;
+ dev->mtu = mtu;
dev->hard_header_len = ETH_HLEN; /* 14 */
dev->min_header_len = ETH_HLEN; /* 14 */
dev->addr_len = ETH_ALEN; /* 6 */
@@ -190,11 +199,20 @@ static void loopback_setup(struct net_device *dev)
| NETIF_F_NETNS_LOCAL
| NETIF_F_VLAN_CHALLENGED
| NETIF_F_LOOPBACK;
- dev->ethtool_ops = &loopback_ethtool_ops;
- dev->header_ops = ð_header_ops;
- dev->netdev_ops = &loopback_ops;
+ dev->ethtool_ops = eth_ops;
+ dev->header_ops = hdr_ops;
+ dev->netdev_ops = dev_ops;
dev->needs_free_netdev = true;
- dev->priv_destructor = loopback_dev_free;
+ dev->priv_destructor = dev_destructor;
+}
+
+/* The loopback device is special. There is only one instance
+ * per network namespace.
+ */
+static void loopback_setup(struct net_device *dev)
+{
+ gen_lo_setup(dev, (64 * 1024), &loopback_ethtool_ops, ð_header_ops,
+ &loopback_ops, loopback_dev_free);
}
/* Setup and register the loopback device. */
@@ -229,3 +247,43 @@ static __net_init int loopback_net_init(struct net *net)
struct pernet_operations __net_initdata loopback_net_ops = {
.init = loopback_net_init,
};
+
+/* blackhole netdevice */
+static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ kfree_skb(skb);
+ net_warn_ratelimited("%s(): Dropping skb.\n", __func__);
+ return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops blackhole_netdev_ops = {
+ .ndo_start_xmit = blackhole_netdev_xmit,
+};
+
+/* This is a dst-dummy device used specifically for invalidated
+ * DSTs and unlike loopback, this is not per-ns.
+ */
+static void blackhole_netdev_setup(struct net_device *dev)
+{
+ gen_lo_setup(dev, ETH_MIN_MTU, NULL, NULL, &blackhole_netdev_ops, NULL);
+}
+
+/* Setup and register the blackhole_netdev. */
+static int __init blackhole_netdev_init(void)
+{
+ blackhole_netdev = alloc_netdev(0, "blackhole_dev", NET_NAME_UNKNOWN,
+ blackhole_netdev_setup);
+ if (!blackhole_netdev)
+ return -ENOMEM;
+
+ dev_init_scheduler(blackhole_netdev);
+ dev_activate(blackhole_netdev);
+
+ blackhole_netdev->flags |= IFF_UP | IFF_RUNNING;
+ dev_net_set(blackhole_netdev, &init_net);
+
+ return 0;
+}
+
+device_initcall(blackhole_netdev_init);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1d95e634f3fe..c10cba10388a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4848,4 +4848,6 @@ do { \
#define PTYPE_HASH_SIZE (16)
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)
+extern struct net_device *blackhole_netdev;
+
#endif /* _LINUX_NETDEVICE_H */
--
2.20.1.791.gb4d0f1c61a-goog
[-- Attachment #3: 0002-blackhole_netdev-use-blackhole_netdev-to-invalidate-.patch --]
[-- Type: application/octet-stream, Size: 2000 bytes --]
From 8ca028c8a51f0e8b9376c195fe96086d4dfa18ac Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Thu, 4 Oct 2018 13:40:31 -0700
Subject: [PATCH 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst
entries
Use blackhole_netdev instead of 'lo' device with lower MTU when marking
dst "dead".
Change-Id: I6c5cf4bb6aafe50ec3ad31defa03906b00b2cd8b
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
net/core/dst.c | 2 +-
net/ipv4/route.c | 3 +--
net/ipv6/route.c | 2 +-
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/core/dst.c b/net/core/dst.c
index a263309df115..a07b05d0a826 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -176,7 +176,7 @@ void dst_dev_put(struct dst_entry *dst)
dst->ops->ifdown(dst, dev, true);
dst->input = dst_discard;
dst->output = dst_discard_out;
- dst->dev = dev_net(dst->dev)->loopback_dev;
+ dst->dev = blackhole_netdev;
dev_hold(dst->dev);
dev_put(dev);
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 16259ea9df54..4e4f1b36e306 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1485,7 +1485,6 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
void rt_flush_dev(struct net_device *dev)
{
- struct net *net = dev_net(dev);
struct rtable *rt;
int cpu;
@@ -1496,7 +1495,7 @@ void rt_flush_dev(struct net_device *dev)
list_for_each_entry(rt, &ul->head, rt_uncached) {
if (rt->dst.dev != dev)
continue;
- rt->dst.dev = net->loopback_dev;
+ rt->dst.dev = blackhole_netdev;
dev_hold(rt->dst.dev);
dev_put(dev);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index dc066fdf7e46..aca0e5651f5b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -179,7 +179,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
}
if (rt_dev == dev) {
- rt->dst.dev = loopback_dev;
+ rt->dst.dev = blackhole_netdev;
dev_hold(rt->dst.dev);
dev_put(rt_dev);
}
--
2.20.1.791.gb4d0f1c61a-goog
[-- Attachment #4: 0003-blackhole_dev-add-a-selftest.patch --]
[-- Type: application/octet-stream, Size: 6342 bytes --]
From 992832040ffe08b106bbf21c0fa8a7317655d530 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Wed, 10 Oct 2018 15:25:01 -0700
Subject: [PATCH 3/3] blackhole_dev: add a selftest
Change-Id: Ia06c3edc124c652ee03833eb60a5c6229d4c4ec5
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
lib/Kconfig.debug | 9 ++
lib/Makefile | 1 +
lib/test_blackhole_dev.c | 100 ++++++++++++++++++
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/config | 1 +
.../selftests/net/test_blackhole_dev.sh | 11 ++
6 files changed, 124 insertions(+), 1 deletion(-)
create mode 100644 lib/test_blackhole_dev.c
create mode 100755 tools/testing/selftests/net/test_blackhole_dev.sh
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d4df5b24d75e..593ecd4bc815 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1901,6 +1901,15 @@ config TEST_BPF
If unsure, say N.
+config TEST_BLACKHOLE_DEV
+ tristate "Test BPF filter functionality"
+ depends on m && NET
+ help
+ This builds the "test_blackhole_dev" module that validates the
+ data path through this blackhole netdev.
+
+ If unsure, say N.
+
config FIND_BIT_BENCHMARK
tristate "Test find_bit functions"
help
diff --git a/lib/Makefile b/lib/Makefile
index e1b59da71418..733f66ceb51a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_TEST_KMOD) += test_kmod.o
obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o
obj-$(CONFIG_TEST_MEMCAT_P) += test_memcat_p.o
obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o
+obj-$(CONFIG_TEST_BLACKHOLE_DEV) += test_blackhole_dev.o
ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/test_blackhole_dev.c b/lib/test_blackhole_dev.c
new file mode 100644
index 000000000000..4c40580a99a3
--- /dev/null
+++ b/lib/test_blackhole_dev.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This module tests the blackhole_dev that is created during the
+ * net subsystem initialization. The test this module performs is
+ * by injecting an skb into the stack with skb->dev as the
+ * blackhole_dev and expects kernel to behave in a sane manner
+ * (in other words, *not crash*)!
+ *
+ * Copyright (c) 2018, Mahesh Bandewar <maheshb@google.com>
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/udp.h>
+#include <linux/ipv6.h>
+
+#include <net/dst.h>
+
+#define SKB_SIZE 256
+#define HEAD_SIZE (14+40+8) /* Ether + IPv6 + UDP */
+#define TAIL_SIZE 32 /* random tail-room */
+
+#define UDP_PORT 1234
+
+static int __init test_blackholedev_init(void)
+{
+ struct ipv6hdr *ip6h;
+ struct sk_buff *skb;
+ struct ethhdr *ethh;
+ struct udphdr *uh;
+ int data_len;
+ int ret;
+
+ skb = alloc_skb(SKB_SIZE, GFP_KERNEL);
+ if (!skb)
+ return -ENOMEM;
+
+ /* Reserve head-room for the headers */
+ skb_reserve(skb, HEAD_SIZE);
+
+ /* Add data to the skb */
+ data_len = SKB_SIZE - (HEAD_SIZE + TAIL_SIZE);
+ memset(__skb_put(skb, data_len), 0xf, data_len);
+
+ /* Add protocol data */
+ /* (Transport) UDP */
+ uh = (struct udphdr *)skb_push(skb, sizeof(struct udphdr));
+ skb_set_transport_header(skb, 0);
+ uh->source = uh->dest = htons(UDP_PORT);
+ uh->len = htons(data_len);
+ uh->check = 0;
+ /* (Network) IPv6 */
+ ip6h = (struct ipv6hdr *)skb_push(skb, sizeof(struct ipv6hdr));
+ skb_set_network_header(skb, 0);
+ ip6h->hop_limit = 32;
+ ip6h->payload_len = data_len + sizeof(struct udphdr);
+ ip6h->nexthdr = IPPROTO_UDP;
+ ip6h->saddr = in6addr_loopback;
+ ip6h->daddr = in6addr_loopback;
+ /* Ether */
+ ethh = (struct ethhdr *)skb_push(skb, sizeof(struct ethhdr));
+ skb_set_mac_header(skb, 0);
+
+ skb->protocol = htons(ETH_P_IPV6);
+ skb->pkt_type = PACKET_HOST;
+ skb->dev = blackhole_netdev;
+
+ /* Now attempt to send the packet */
+ ret = dev_queue_xmit(skb);
+
+ switch (ret) {
+ case NET_XMIT_SUCCESS:
+ pr_warn("dev_queue_xmit() returned NET_XMIT_SUCCESS\n");
+ break;
+ case NET_XMIT_DROP:
+ pr_warn("dev_queue_xmit() returned NET_XMIT_DROP\n");
+ break;
+ case NET_XMIT_CN:
+ pr_warn("dev_queue_xmit() returned NET_XMIT_CN\n");
+ break;
+ default:
+ pr_err("dev_queue_xmit() returned UNKNOWN(%d)\n", ret);
+ }
+
+ return 0;
+}
+
+static void __exit test_blackholedev_exit(void)
+{
+ pr_warn("test_blackholedev module terminating.\n");
+}
+
+module_init(test_blackholedev_init);
+module_exit(test_blackholedev_exit);
+
+MODULE_AUTHOR("Mahesh Bandewar <maheshb@google.com>");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index f8f3e90700c0..8952b2039fdc 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -4,8 +4,9 @@
CFLAGS = -Wall -Wl,--no-as-needed -O2 -g
CFLAGS += -I../../../../usr/include/
+<<<<<<< HEAD
TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh \
- rtnetlink.sh xfrm_policy.sh
+ rtnetlink.sh xfrm_policy.sh test_blackhole_dev.sh
TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_any.sh
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 5821bdd98d20..4553c7367aad 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -28,3 +28,4 @@ CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV4=m
+CONFIG_TEST_BLACKHOLE_DEV=m
diff --git a/tools/testing/selftests/net/test_blackhole_dev.sh b/tools/testing/selftests/net/test_blackhole_dev.sh
new file mode 100755
index 000000000000..3119b80e711f
--- /dev/null
+++ b/tools/testing/selftests/net/test_blackhole_dev.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# Runs blackhole-dev test using blackhole-dev kernel module
+
+if /sbin/modprobe -q test_blackhole_dev ; then
+ /sbin/modprobe -q -r test_blackhole_dev;
+ echo "test_blackhole_dev: ok";
+else
+ echo "test_blackhole_dev: [FAIL]";
+ exit 1;
+fi
--
2.20.1.791.gb4d0f1c61a-goog
next prev parent reply other threads:[~2019-02-08 20:26 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-20 22:26 Stack sends oversize UDP packet to the driver Michael Chan
2019-01-22 0:36 ` Daniel Axtens
2019-01-22 0:59 ` Michael Chan
2019-01-22 18:28 ` Mahesh Bandewar (महेश बंडेवार)
2019-01-22 20:09 ` David Miller
2019-01-30 9:07 ` Michael Chan
2019-01-31 1:00 ` Mahesh Bandewar (महेश बंडेवार)
2019-02-05 19:35 ` Michael Chan
2019-02-07 4:51 ` Mahesh Bandewar (महेश बंडेवार)
2019-02-08 20:26 ` Mahesh Bandewar (महेश बंडेवार) [this message]
2019-02-12 8:55 ` Michael Chan
[not found] ` <CAF2d9jgskHTb-nmbVo9A2CQhh9T3OnH_vbfGcMBii13oq1teCw@mail.gmail.com>
2019-01-22 0:45 ` Michael Chan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAF2d9jjwuT1WSz7P9QFbjdpp8GhhV-XBLzVC8H94Le_JCxL0fg@mail.gmail.com \
--to=maheshb@google.com \
--cc=davem@davemloft.net \
--cc=dja@axtens.net \
--cc=edumazet@google.com \
--cc=michael.chan@broadcom.com \
--cc=netdev@vger.kernel.org \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).