All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Lorenz Bauer <lmb@cloudflare.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, kernel-team <kernel-team@cloudflare.com>
Subject: Re: Checksum behaviour of bpf_redirected packets
Date: Wed, 6 May 2020 23:55:10 +0200	[thread overview]
Message-ID: <39d3bee2-dcfc-8240-4c78-2110d639d386@iogearbox.net> (raw)
In-Reply-To: <CACAyw9_ygNV1J+PkBJ-i7ysU_Y=rN3Z5adKYExNXCic0gumaow@mail.gmail.com>

On 5/6/20 6:24 PM, Lorenz Bauer wrote:
> On Wed, 6 May 2020 at 02:28, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, May 4, 2020 at 9:12 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>>>
>>> In our TC classifier cls_redirect [1], we use the following sequence
>>> of helper calls to
>>> decapsulate a GUE (basically IP + UDP + custom header) encapsulated packet:
>>>
>>>    skb_adjust_room(skb, -encap_len,
>>> BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
>>>    bpf_redirect(skb->ifindex, BPF_F_INGRESS)
>>>
>>> It seems like some checksums of the inner headers are not validated in
>>> this case.
>>> For example, a TCP SYN packet with invalid TCP checksum is still accepted by the
>>> network stack and elicits a SYN ACK.
>>>
>>> Is this known but undocumented behaviour or a bug? In either case, is
>>> there a work
>>> around I'm not aware of?
>>
>> I thought inner and outer csums are covered by different flags and driver
>> suppose to set the right one depending on level of in-hw checking it did.
> 
> I've figured out what the problem is. We receive the following packet from
> the driver:
> 
>      | ETH | IP | UDP | GUE | IP | TCP |
>      skb->ip_summed == CHECKSUM_UNNECESSARY
> 
> ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx
> checksum offloading. On this packet we run skb_adjust_room_mac(-encap),
> and get the following:
> 
>      | ETH | IP | TCP |
>      skb->ip_summed == CHECKSUM_UNNECESSARY
> 
> Note that ip_summed is still CHECKSUM_UNNECESSARY. After
> bpf_redirect()ing into the ingress, we end up in tcp_v4_rcv. There
> skb_checksum_init is turned into a no-op due to
> CHECKSUM_UNNECESSARY.
> 
> I think this boils down to bpf_skb_generic_pop not adjusting ip_summed
> accordingly. Unfortunately I don't understand how checksums work
> sufficiently. Daniel, it seems like you wrote the helper, could you
> take a look?

Right, so in the skb_adjust_room() case we're not aware of protocol
specifics. We do handle the csum complete case via skb_postpull_rcsum(),
but not CHECKSUM_UNNECESSARY at the moment. I presume in your case the
skb->csum_level of the original skb prior to skb_adjust_room() call
might have been 0 (that is, covering UDP)? So if we'd add the possibility
to __skb_decr_checksum_unnecessary() via flag, then it would become
skb->ip_summed = CHECKSUM_NONE? And to be generic, we'd need to do the
same for the reverse case. Below is a quick hack (compile tested-only);
would this resolve your case ...

 >>>    skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO|BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
 >>>    bpf_redirect(skb->ifindex, BPF_F_INGRESS)

 From 7439724fcfff7742223198c620349a4fc89d4835 Mon Sep 17 00:00:00 2001
Message-Id: <7439724fcfff7742223198c620349a4fc89d4835.1588801971.git.daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 6 May 2020 23:50:31 +0200
Subject: [PATCH bpf-next] bpf: inc/dec csum level for csum_unnecessary in skb_adjust_room

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
  include/uapi/linux/bpf.h |  2 ++
  net/core/filter.c        | 23 ++++++++++++++++++++---
  2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b3643e27e264..9877807b8f28 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3279,6 +3279,8 @@ enum {
  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
  	BPF_F_ADJ_ROOM_ENCAP_L4_GRE	= (1ULL << 3),
  	BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
+	BPF_F_ADJ_ROOM_INC_CSUM_LEVEL	= (1ULL << 5),
+	BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL	= (1ULL << 6),
  };

  enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..10551dabb7b5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3008,7 +3008,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
  					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
  					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
  					 BPF_F_ADJ_ROOM_ENCAP_L2( \
-					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
+					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
+					 BPF_F_ADJ_ROOM_INC_CSUM_LEVEL | \
+					 BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)

  static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
  			    u64 flags)
@@ -3019,6 +3021,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
  	unsigned int gso_type = SKB_GSO_DODGY;
  	int ret;

+	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK &
+			       ~(BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
+		return -EINVAL;
+
  	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
  		/* udp gso_size delineates datagrams, only allow if fixed */
  		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
@@ -3105,6 +3111,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
  		shinfo->gso_segs = 0;
  	}

+	if (flags & BPF_F_ADJ_ROOM_INC_CSUM_LEVEL)
+		__skb_incr_checksum_unnecessary(skb);
+
  	return 0;
  }

@@ -3113,7 +3122,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
  {
  	int ret;

-	if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO)
+	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+			       BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)))
  		return -EINVAL;

  	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3143,6 +3153,9 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
  		shinfo->gso_segs = 0;
  	}

+	if (flags & BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
+		__skb_decr_checksum_unnecessary(skb);
+
  	return 0;
  }

@@ -3163,7 +3176,11 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
  	u32 off;
  	int ret;

-	if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
+	if (unlikely((flags & ~BPF_F_ADJ_ROOM_MASK) ||
+		     ((flags & (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+				BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)) ==
+		      (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+		       BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
  		return -EINVAL;
  	if (unlikely(len_diff_abs > 0xfffU))
  		return -EFAULT;
-- 
2.21.0



  parent reply	other threads:[~2020-05-06 21:55 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 16:11 Checksum behaviour of bpf_redirected packets Lorenz Bauer
2020-05-06  1:28 ` Alexei Starovoitov
2020-05-06 16:24   ` Lorenz Bauer
2020-05-06 17:26     ` Jakub Kicinski
2020-05-06 21:55     ` Daniel Borkmann [this message]
2020-05-07 15:54       ` Lorenz Bauer
2020-05-07 16:43         ` Daniel Borkmann
2020-05-07 21:25           ` Jakub Kicinski
2020-05-11  9:31             ` Lorenz Bauer
2020-05-11  9:29           ` Lorenz Bauer
2020-05-12 21:25             ` Daniel Borkmann
2020-05-13 14:14               ` Lorenz Bauer
2020-06-01 17:48                 ` Alan Maguire
2020-06-01 20:13                   ` Daniel Borkmann
2020-06-01 21:25                     ` Alan Maguire
2020-06-02 10:13                       ` Lorenz Bauer
2020-06-02 15:01                         ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39d3bee2-dcfc-8240-4c78-2110d639d386@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=lmb@cloudflare.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.