From: Alan Maguire <alan.maguire@oracle.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: Checksum behaviour of bpf_redirected packets
Date: Mon, 1 Jun 2020 18:48:20 +0100 (BST)
Message-ID: <alpine.LRH.2.21.2006011839430.623@localhost>
In-Reply-To: <CACAyw9_LPEOvHbmP8UrpwVkwYT57rKWRisai=Z7kbKxOPh5XNQ@mail.gmail.com>



On Wed, 13 May 2020, Lorenz Bauer wrote:

> > > Option 1: always downgrade UNNECESSARY to NONE
> > > - Easiest to back port
> > > - The helper is safe by default
> > > - Performance impact unclear
> > > - No escape hatch for Cilium
> > >
> > > Option 2: add a flag to force CHECKSUM_NONE
> > > - New UAPI, can this be backported?
> > > - The helper isn't safe by default, needs documentation
> > > - Escape hatch for Cilium
> > >
> > > Option 3: downgrade to CHECKSUM_NONE, add flag to skip this
> > > - New UAPI, can this be backported?
> > > - The helper is safe by default
> > > - Escape hatch for Cilium (though you'd need to detect availability of the
> > >    flag somehow)
> >
> > This seems most reasonable to me; I can try and cook a proposal for tomorrow as
> > potential fix. Even if we add a flag, this is still backportable to stable (as
> > long as the overall patch doesn't get too complex and the backport itself stays
> > compatible uapi-wise to latest kernels. We've done that before.). I happen to
> > have two ixgbe NICs on some of my test machines which seem to be setting the
> > CHECKSUM_UNNECESSARY, so I'll run some experiments from over here as well.
> 
> Great! I'm happy to test, of course.
> 

I had a go at implementing option 3, as a few colleagues ran into this
problem. They confirmed that the fix below resolves the issue. Daniel, is
this roughly what you had in mind? If the new flag is acceptable, I can
submit a patch for the bpf tree. Do we need a few tests as well?

From 7e0b0c78530f3800e5c40aa1fe87e5db82c5fb59 Mon Sep 17 00:00:00 2001
From: Alan Maguire <alan.maguire@oracle.com>
Date: Mon, 1 Jun 2020 13:10:37 +0200
Subject: [PATCH bpf-next 1/2] bpf: fix bpf_skb_adjust_room decap for
 CHECKSUM_UNNECESSARY skbs

When hardware verifies checksums for some of the headers, it
sets CHECKSUM_UNNECESSARY, and csum_level indicates the number
of consecutive checksums found.  If we de-encapsulate the data,
however, these values become invalid, since we have likely just
removed the checksum-validated headers.  The best option in
such cases is to revert to CHECKSUM_NONE, so that all checksums
are then checked in software.  If the skb were left marked
CHECKSUM_UNNECESSARY, those checks could be skipped.

Other checksum states are handled via skb_postpull_rcsum().

Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/uapi/linux/bpf.h |  7 +++++++
 net/core/filter.c        | 15 ++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 974ca6e..03ab70c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1646,6 +1646,12 @@ struct bpf_stack_build_id {
  *		* **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size.
  *		  Adjusting mss in this way is not allowed for datagrams.
  *
+ *		* **BPF_F_ADJ_ROOM_SKIP_CSUM_RESET**: When shrinking skbs
+ *		  marked CHECKSUM_UNNECESSARY, avoid default behavior which
+ *		  resets to CHECKSUM_NONE.  In most cases, this flag will
+ *		  not be needed as the default behavior ensures checksums
+ *		  will be verified in software.
+ *
  *		* **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4**,
  *		  **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6**:
  *		  Any new space is reserved to hold a tunnel header.
@@ -3431,6 +3437,7 @@ enum {
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
 	BPF_F_ADJ_ROOM_ENCAP_L4_GRE	= (1ULL << 3),
 	BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
+	BPF_F_ADJ_ROOM_SKIP_CSUM_RESET	= (1ULL << 5),
 };
 
 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index a6fc234..47c8a31 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3113,7 +3113,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 {
 	int ret;
 
-	if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO)
+	if (flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+		      BPF_F_ADJ_ROOM_SKIP_CSUM_RESET))
 		return -EINVAL;
 
 	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3143,6 +3144,18 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 		shinfo->gso_segs = 0;
 	}
 
+	/*
+	 * Decap should invalidate checksum checks done by hardware.
+	 * skb_csum_unnecessary() is not used as the other conditions
+	 * in that predicate do not need to be considered here; we only
+	 * wish to downgrade CHECKSUM_UNNECESSARY to CHECKSUM_NONE.
+	 */
+	if (unlikely(!(flags & BPF_F_ADJ_ROOM_SKIP_CSUM_RESET) &&
+		     skb->ip_summed == CHECKSUM_UNNECESSARY)) {
+		skb->ip_summed = CHECKSUM_NONE;
+		skb->csum_level = 0;
+	}
+
 	return 0;
 }
 
-- 
1.8.3.1
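
For anyone who wants to reason about the new semantics without a kernel
build, the logic added to bpf_skb_net_shrink() can be modelled in plain
userspace C. This is only a sketch under stated assumptions: struct
model_skb and model_net_shrink() are hypothetical names, not kernel
symbols; the CHECKSUM_* values are illustrative stand-ins; and the bit
position of BPF_F_ADJ_ROOM_FIXED_GSO is assumed, since the diff above only
shows bits 2 to 5. GSO handling and the actual header removal are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's ip_summed states. */
enum {
	CHECKSUM_NONE = 0,
	CHECKSUM_UNNECESSARY = 1,
	CHECKSUM_COMPLETE = 2,
};

/* Assumed bit position; not shown in the diff above. */
#define BPF_F_ADJ_ROOM_FIXED_GSO	(1ULL << 0)
/* Bit position taken from the patch. */
#define BPF_F_ADJ_ROOM_SKIP_CSUM_RESET	(1ULL << 5)

/* Hypothetical model of the two sk_buff fields the patch touches. */
struct model_skb {
	uint8_t ip_summed;
	uint8_t csum_level;
};

/*
 * Models only the flag validation and checksum reset from the patch.
 * Returns 0 on success, -EINVAL if an unsupported flag is passed.
 */
static int model_net_shrink(struct model_skb *skb, uint64_t flags)
{
	/* Reject any flag the shrink path does not understand. */
	if (flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
		      BPF_F_ADJ_ROOM_SKIP_CSUM_RESET))
		return -EINVAL;

	/*
	 * Default behaviour: hardware validation no longer applies after
	 * decap, so fall back to software verification unless the caller
	 * opted out with BPF_F_ADJ_ROOM_SKIP_CSUM_RESET.
	 */
	if (!(flags & BPF_F_ADJ_ROOM_SKIP_CSUM_RESET) &&
	    skb->ip_summed == CHECKSUM_UNNECESSARY) {
		skb->ip_summed = CHECKSUM_NONE;
		skb->csum_level = 0;
	}

	return 0;
}
```

In this model, a shrink with flags == 0 on a CHECKSUM_UNNECESSARY skb
leaves it at CHECKSUM_NONE with csum_level 0, while passing
BPF_F_ADJ_ROOM_SKIP_CSUM_RESET leaves both fields untouched. Other
checksum states are not modified here; in the real helper they are
handled via skb_postpull_rcsum(), as the commit message notes.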

