From: Alan Maguire <alan.maguire@oracle.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
bpf <bpf@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: Checksum behaviour of bpf_redirected packets
Date: Mon, 1 Jun 2020 18:48:20 +0100 (BST)
Message-ID: <alpine.LRH.2.21.2006011839430.623@localhost>
In-Reply-To: <CACAyw9_LPEOvHbmP8UrpwVkwYT57rKWRisai=Z7kbKxOPh5XNQ@mail.gmail.com>
On Wed, 13 May 2020, Lorenz Bauer wrote:
> > > Option 1: always downgrade UNNECESSARY to NONE
> > > - Easiest to back port
> > > - The helper is safe by default
> > > - Performance impact unclear
> > > - No escape hatch for Cilium
> > >
> > > Option 2: add a flag to force CHECKSUM_NONE
> > > - New UAPI, can this be backported?
> > > - The helper isn't safe by default, needs documentation
> > > - Escape hatch for Cilium
> > >
> > > Option 3: downgrade to CHECKSUM_NONE, add flag to skip this
> > > - New UAPI, can this be backported?
> > > - The helper is safe by default
> > > - Escape hatch for Cilium (though you'd need to detect availability of the
> > > flag somehow)
> >
> > This seems most reasonable to me; I can try and cook a proposal for tomorrow as
> > potential fix. Even if we add a flag, this is still backportable to stable (as
> > long as the overall patch doesn't get too complex and the backport itself stays
> > compatible uapi-wise to latest kernels. We've done that before.). I happen to
> > have two ixgbe NICs on some of my test machines which seem to be setting the
> > CHECKSUM_UNNECESSARY, so I'll run some experiments from over here as well.
>
> Great! I'm happy to test, of course.
>
I had a go at implementing option 3, as a few colleagues ran into this
problem. They confirmed that the fix below resolved the issue. Daniel, is
this roughly what you had in mind? If the new flag is acceptable, I can
submit a patch for the bpf tree. Do we need a few tests as well?
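
For anyone who wants to reason about the semantics without building a
kernel, here is a minimal userspace sketch of the checksum handling that
option 3 performs on shrink. It is only a model: struct model_skb and
model_shrink_csum() are made-up names for illustration, the CHECKSUM_*
values are copied from the kernel's ip_summed states, and the flag bit is
the one proposed in the patch below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; values mirror the kernel's ip_summed states. */
#define CHECKSUM_NONE        0
#define CHECKSUM_UNNECESSARY 1
#define CHECKSUM_COMPLETE    2
#define CHECKSUM_PARTIAL     3

/* Flag bit as proposed in the patch. */
#define BPF_F_ADJ_ROOM_SKIP_CSUM_RESET (1ULL << 5)

/* Hypothetical, minimal model of the two sk_buff fields involved. */
struct model_skb {
	uint8_t ip_summed;
	uint8_t csum_level;
};

/*
 * Model of the checksum handling at the end of a shrink (decap):
 * downgrade CHECKSUM_UNNECESSARY to CHECKSUM_NONE unless the caller
 * opted out with BPF_F_ADJ_ROOM_SKIP_CSUM_RESET. Other states are
 * left alone here (the real code handles them elsewhere).
 */
static void model_shrink_csum(struct model_skb *skb, uint64_t flags)
{
	assert(skb != NULL);

	if (!(flags & BPF_F_ADJ_ROOM_SKIP_CSUM_RESET) &&
	    skb->ip_summed == CHECKSUM_UNNECESSARY) {
		skb->ip_summed = CHECKSUM_NONE;
		skb->csum_level = 0;
	}
}
```

Calling model_shrink_csum() with flags == 0 on an skb marked
CHECKSUM_UNNECESSARY resets it to CHECKSUM_NONE; passing the new flag
leaves both fields untouched.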
From 7e0b0c78530f3800e5c40aa1fe87e5db82c5fb59 Mon Sep 17 00:00:00 2001
From: Alan Maguire <alan.maguire@oracle.com>
Date: Mon, 1 Jun 2020 13:10:37 +0200
Subject: [PATCH bpf-next 1/2] bpf: fix bpf_skb_adjust_room decap for
CHECKSUM_UNNECESSARY skbs
When hardware verifies checksums for some of the headers, it
sets CHECKSUM_UNNECESSARY, and csum_level indicates the
number of consecutive checksums found. If we de-encapsulate
data, however, these values become invalid, since we likely
just removed the checksum-validated headers. The best option
in such cases is to revert to CHECKSUM_NONE, as all checksums
will then be checked in software. Otherwise such checks can
be skipped.
Other checksum states are handled via skb_postpull_rcsum().
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
include/uapi/linux/bpf.h | 7 +++++++
net/core/filter.c | 15 ++++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 974ca6e..03ab70c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1646,6 +1646,12 @@ struct bpf_stack_build_id {
* * **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size.
* Adjusting mss in this way is not allowed for datagrams.
*
+ * * **BPF_F_ADJ_ROOM_SKIP_CSUM_RESET**: When shrinking skbs
+ * marked CHECKSUM_UNNECESSARY, avoid default behavior which
+ * resets to CHECKSUM_NONE. In most cases, this flag will
+ * not be needed as the default behavior ensures checksums
+ * will be verified in software.
+ *
* * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4**,
* **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6**:
* Any new space is reserved to hold a tunnel header.
@@ -3431,6 +3437,7 @@ enum {
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3),
BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4),
+ BPF_F_ADJ_ROOM_SKIP_CSUM_RESET = (1ULL << 5),
};
enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index a6fc234..47c8a31 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3113,7 +3113,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
{
int ret;
- if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO)
+ if (flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+ BPF_F_ADJ_ROOM_SKIP_CSUM_RESET))
return -EINVAL;
if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3143,6 +3144,18 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
shinfo->gso_segs = 0;
}
+ /*
+ * Decap should invalidate checksum checks done by hardware.
+ * skb_csum_unnecessary() is not used as the other conditions
+ * in that predicate do not need to be considered here; we only
+ * wish to downgrade CHECKSUM_UNNECESSARY to CHECKSUM_NONE.
+ */
+ if (unlikely(!(flags & BPF_F_ADJ_ROOM_SKIP_CSUM_RESET) &&
+ skb->ip_summed == CHECKSUM_UNNECESSARY)) {
+ skb->ip_summed = CHECKSUM_NONE;
+ skb->csum_level = 0;
+ }
+
return 0;
}
--
1.8.3.1
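
On the question of detecting whether the flag is available: since the
shrink path rejects unknown flag bits with -EINVAL, one possible approach
is to try the call with the flag and retry without it on -EINVAL. The
following is a rough userspace sketch of that idea, not something I have
run as a BPF program. adjust_room_fn, old_kernel_shrink(),
new_kernel_shrink() and shrink_keep_csum_if_possible() are hypothetical
names; the two *_kernel_shrink() functions only model the flag check
before and after this patch. It also assumes the flag check happens
before the skb is modified, and that -EINVAL is not returned for some
unrelated reason.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_F_ADJ_ROOM_FIXED_GSO       (1ULL << 0)
#define BPF_F_ADJ_ROOM_SKIP_CSUM_RESET (1ULL << 5) /* as proposed in the patch */

/* Hypothetical callback type standing in for the helper call. */
typedef int (*adjust_room_fn)(int32_t len_diff, uint64_t flags);

/* Model of the shrink flag check without the patch. */
static int old_kernel_shrink(int32_t len_diff, uint64_t flags)
{
	(void)len_diff;
	return (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO) ? -EINVAL : 0;
}

/* Model of the shrink flag check with the patch applied. */
static int new_kernel_shrink(int32_t len_diff, uint64_t flags)
{
	(void)len_diff;
	return (flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
			  BPF_F_ADJ_ROOM_SKIP_CSUM_RESET)) ? -EINVAL : 0;
}

/*
 * Try the shrink with the new flag; if it is rejected with -EINVAL,
 * retry without it. *used_flag is set to 1 if the flagged call
 * succeeded, 0 otherwise.
 */
static int shrink_keep_csum_if_possible(adjust_room_fn adjust,
					int32_t len_diff, int *used_flag)
{
	int ret;

	assert(adjust != NULL && used_flag != NULL);

	ret = adjust(len_diff, BPF_F_ADJ_ROOM_SKIP_CSUM_RESET);
	*used_flag = (ret == 0);
	if (ret == -EINVAL)
		ret = adjust(len_diff, 0);

	return ret;
}
```

With the old model the wrapper falls back to the unflagged call, so the
caller gets whatever that kernel does by default; with the new model the
flagged call succeeds on the first attempt.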