netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	netdev@vger.kernel.org, Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	marek@cloudflare.com, John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	eyal.birger@gmail.com
Subject: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
Date: Wed, 07 Oct 2020 18:23:00 +0200	[thread overview]
Message-ID: <160208778070.798237.16265441131909465819.stgit@firesoul> (raw)
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>

The MTU should only apply to transmitted packets.

When TC-ingress redirect packet to egress on another netdev, then the
normal netstack MTU checks are skipped (and driver level will not catch
any MTU violation, checked ixgbe).

This patch choose not to add MTU check in the egress code path of
skb_do_redirect() prior to calling dev_queue_xmit(), because it is still
possible to run another BPF egress program that will shrink/consume
headers, which will make packet comply with netdev MTU. This use-case
might already be in production use (if ingress MTU is larger than egress).

Instead do the MTU check after sch_handle_egress() step, for the cases
that require this.

The cases need a bit explaining. Ingress to egress redirected packets
could be detected via skb->tc_at_ingress bit, but it is not reliable,
because sch_handle_egress() could steal the packet and redirect this
(again) to another egress netdev, which will then have the
skb->tc_at_ingress cleared. There is also the case of TC-egress prog
increase packet size and then redirect it egress. Thus, it is more
reliable to do the MTU check for any redirected packet (both ingress and
egress), which is available via skb_is_redirected() in earlier patch.
Also handle case where egress BPF-prog increased size.

One advantage of this approach is that it ingress-to-egress BPF-prog can
send information via packet data. With the MTU checks removed in the
helpers, and also not done in skb_do_redirect() call, this allows for an
ingress BPF-prog to communicate with an egress BPF-prog via packet data,
as long as egress BPF-prog remove this prior to transmitting packet.

Troubleshooting: MTU violations are recorded in TX dropped counter, and
kprobe on dev_queue_xmit() have retval -EMSGSIZE.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b433098896b2..19406013f93e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+		*ret = NET_XMIT_SUCCESS;
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
@@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 {
 	struct net_device *dev = skb->dev;
 	struct netdev_queue *txq;
+#ifdef CONFIG_NET_CLS_ACT
+	bool mtu_check = false;
+#endif
+	bool again = false;
 	struct Qdisc *q;
 	int rc = -ENOMEM;
-	bool again = false;
 
 	skb_reset_mac_header(skb);
 
@@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
+	mtu_check = skb_is_redirected(skb);
 	skb->tc_at_ingress = 0;
 # ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
+		unsigned int len_orig = skb->len;
+
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
+		/* BPF-prog ran and could have changed packet size beyond MTU */
+		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
+			mtu_check = true;
 	}
 # endif
+	/* MTU-check only happens on "last" net_device in a redirect sequence
+	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
+	 * either ingress or egress to another device).
+	 */
+	if (mtu_check && !is_skb_forwardable(dev, skb)) {
+		rc = -EMSGSIZE;
+		goto drop;
+	}
 #endif
 	/* If device/qdisc don't need skb->dst, release it right now while
 	 * its hot in this cpu cache.
@@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	rc = -ENETDOWN;
 	rcu_read_unlock_bh();
-
+#ifdef CONFIG_NET_CLS_ACT
+drop:
+#endif
 	atomic_long_inc(&dev->tx_dropped);
 	kfree_skb_list(skb);
 	return rc;



  parent reply	other threads:[~2020-10-07 16:23 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-07 16:22 [PATCH bpf-next V2 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-07 21:26   ` Daniel Borkmann
2020-10-07 23:46   ` Maciej Żenczykowski
2020-10-08 11:06     ` Jesper Dangaard Brouer
2020-10-08 12:33       ` Willem de Bruijn
2020-10-08 14:07         ` Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 3/6] bpf: add BPF-helper for MTU checking Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
2020-10-07 16:23 ` Jesper Dangaard Brouer [this message]
2020-10-07 21:37   ` [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Daniel Borkmann
2020-10-07 22:36     ` John Fastabend
2020-10-07 23:52       ` Maciej Żenczykowski
2020-10-08  3:19         ` John Fastabend
2020-10-08  8:07           ` Jesper Dangaard Brouer
2020-10-08  8:25             ` Daniel Borkmann
2020-10-08  8:30     ` Jesper Dangaard Brouer
2020-10-07 16:23 ` [PATCH bpf-next V2 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=160208778070.798237.16265441131909465819.stgit@firesoul \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=eyal.birger@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=lorenzo@kernel.org \
    --cc=marek@cloudflare.com \
    --cc=maze@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=shaun@tigera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).