From: Daniel Borkmann <daniel@iogearbox.net>
To: Lorenz Bauer <lmb@cloudflare.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, kernel-team <kernel-team@cloudflare.com>
Subject: Re: Checksum behaviour of bpf_redirected packets
Date: Wed, 6 May 2020 23:55:10 +0200 [thread overview]
Message-ID: <39d3bee2-dcfc-8240-4c78-2110d639d386@iogearbox.net> (raw)
In-Reply-To: <CACAyw9_ygNV1J+PkBJ-i7ysU_Y=rN3Z5adKYExNXCic0gumaow@mail.gmail.com>
On 5/6/20 6:24 PM, Lorenz Bauer wrote:
> On Wed, 6 May 2020 at 02:28, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, May 4, 2020 at 9:12 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>>>
>>> In our TC classifier cls_redirect [1], we use the following sequence
>>> of helper calls to
>>> decapsulate a GUE (basically IP + UDP + custom header) encapsulated packet:
>>>
>>> skb_adjust_room(skb, -encap_len,
>>> BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
>>> bpf_redirect(skb->ifindex, BPF_F_INGRESS)
>>>
>>> It seems like some checksums of the inner headers are not validated in
>>> this case.
>>> For example, a TCP SYN packet with invalid TCP checksum is still accepted by the
>>> network stack and elicits a SYN ACK.
>>>
>>> Is this known but undocumented behaviour or a bug? In either case, is
>>> there a work
>>> around I'm not aware of?
>>
>> I thought inner and outer csums are covered by different flags and driver
>> suppose to set the right one depending on level of in-hw checking it did.
>
> I've figured out what the problem is. We receive the following packet from
> the driver:
>
> | ETH | IP | UDP | GUE | IP | TCP |
> skb->ip_summed == CHECKSUM_UNNECESSARY
>
> ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx
> checksum offloading. On this packet we run skb_adjust_room_mac(-encap),
> and get the following:
>
> | ETH | IP | TCP |
> skb->ip_summed == CHECKSUM_UNNECESSARY
>
> Note that ip_summed is still CHECKSUM_UNNECESSARY. After
> bpf_redirect()ing into the ingress, we end up in tcp_v4_rcv. There
> skb_checksum_init is turned into a no-op due to
> CHECKSUM_UNNECESSARY.
>
> I think this boils down to bpf_skb_generic_pop not adjusting ip_summed
> accordingly. Unfortunately I don't understand how checksums work
> sufficiently. Daniel, it seems like you wrote the helper, could you
> take a look?
Right, so in the skb_adjust_room() case we're not aware of protocol
specifics. We do handle the csum complete case via skb_postpull_rcsum(),
but not CHECKSUM_UNNECESSARY at the moment. I presume in your case the
skb->csum_level of the original skb prior to skb_adjust_room() call
might have been 0 (that is, covering UDP)? So if we'd add the possibility
to __skb_decr_checksum_unnecessary() via flag, then it would become
skb->ip_summed = CHECKSUM_NONE? And to be generic, we'd need to do the
same for the reverse case. Below is a quick hack (compile tested-only);
would this resolve your case ...
>>> skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO|BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
>>> bpf_redirect(skb->ifindex, BPF_F_INGRESS)
From 7439724fcfff7742223198c620349a4fc89d4835 Mon Sep 17 00:00:00 2001
Message-Id: <7439724fcfff7742223198c620349a4fc89d4835.1588801971.git.daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 6 May 2020 23:50:31 +0200
Subject: [PATCH bpf-next] bpf: inc/dec csum level for csum_unnecessary in skb_adjust_room
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
include/uapi/linux/bpf.h | 2 ++
net/core/filter.c | 23 ++++++++++++++++++++---
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b3643e27e264..9877807b8f28 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3279,6 +3279,8 @@ enum {
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3),
BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4),
+ BPF_F_ADJ_ROOM_INC_CSUM_LEVEL = (1ULL << 5),
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL = (1ULL << 6),
};
enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..10551dabb7b5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3008,7 +3008,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
BPF_F_ADJ_ROOM_ENCAP_L2( \
- BPF_ADJ_ROOM_ENCAP_L2_MASK))
+ BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
+ BPF_F_ADJ_ROOM_INC_CSUM_LEVEL | \
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
u64 flags)
@@ -3019,6 +3021,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
unsigned int gso_type = SKB_GSO_DODGY;
int ret;
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK &
+ ~(BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
+ return -EINVAL;
+
if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
/* udp gso_size delineates datagrams, only allow if fixed */
if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
@@ -3105,6 +3111,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
shinfo->gso_segs = 0;
}
+ if (flags & BPF_F_ADJ_ROOM_INC_CSUM_LEVEL)
+ __skb_incr_checksum_unnecessary(skb);
+
return 0;
}
@@ -3113,7 +3122,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
{
int ret;
- if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO)
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)))
return -EINVAL;
if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3143,6 +3153,9 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
shinfo->gso_segs = 0;
}
+ if (flags & BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
+ __skb_decr_checksum_unnecessary(skb);
+
return 0;
}
@@ -3163,7 +3176,11 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
u32 off;
int ret;
- if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
+ if (unlikely((flags & ~BPF_F_ADJ_ROOM_MASK) ||
+ ((flags & (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)) ==
+ (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
return -EINVAL;
if (unlikely(len_diff_abs > 0xfffU))
return -EFAULT;
--
2.21.0
next prev parent reply other threads:[~2020-05-06 21:55 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-04 16:11 Checksum behaviour of bpf_redirected packets Lorenz Bauer
2020-05-06 1:28 ` Alexei Starovoitov
2020-05-06 16:24 ` Lorenz Bauer
2020-05-06 17:26 ` Jakub Kicinski
2020-05-06 21:55 ` Daniel Borkmann [this message]
2020-05-07 15:54 ` Lorenz Bauer
2020-05-07 16:43 ` Daniel Borkmann
2020-05-07 21:25 ` Jakub Kicinski
2020-05-11 9:31 ` Lorenz Bauer
2020-05-11 9:29 ` Lorenz Bauer
2020-05-12 21:25 ` Daniel Borkmann
2020-05-13 14:14 ` Lorenz Bauer
2020-06-01 17:48 ` Alan Maguire
2020-06-01 20:13 ` Daniel Borkmann
2020-06-01 21:25 ` Alan Maguire
2020-06-02 10:13 ` Lorenz Bauer
2020-06-02 15:01 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=39d3bee2-dcfc-8240-4c78-2110d639d386@iogearbox.net \
--to=daniel@iogearbox.net \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=kernel-team@cloudflare.com \
--cc=lmb@cloudflare.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).