bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lorenz Bauer <lmb@cloudflare.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: Checksum behaviour of bpf_redirected packets
Date: Mon, 11 May 2020 10:31:44 +0100	[thread overview]
Message-ID: <CACAyw9-Hf7977K3f6hM3gWawz2Y_KgkJ0URmivkAxX8kKH1iEA@mail.gmail.com> (raw)
In-Reply-To: <20200507142518.43c22a1b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Thu, 7 May 2020 at 22:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 7 May 2020 18:43:47 +0200 Daniel Borkmann wrote:
> > > Thanks for the patch, it indeed fixes our problem! I spent some more time
> > > trying to understand the checksum offload stuff, here is where I am:
> > >
> > > On NICs that don't support hardware offload ip_summed is CHECKSUM_NONE,
> > > everything works by default since the rest of the stack does checksumming in
> > > software.
> > >
> > > On NICs that support CHECKSUM_COMPLETE, skb_postpull_rcsum
> > > will adjust for the data that is being removed from the skb. The rest of the
> > > stack will use the correct value, all is well.
> > >
> > > However, we're out of luck on NICs that do CHECKSUM_UNNECESSARY:
> > > the API of skb_adjust_room doesn't tell us whether the user intends to
> > > remove headers or data, and how that will influence csum_level.
> > >  From my POV, skb_adjust_room currently does the wrong thing.
> > > I think we need to fix skb_adjust_room to do the right thing by default,
> > > rather than extending the API. We spent a lot of time on tracking this down,
> > > so hopefully we can spare others the pain.
> > >
> > > As Jakub alludes to, we don't know when and how often to call
> > > __skb_decr_checksum_unnecessary so we should just
> > > unconditionally downgrade a packet to CHECKSUM_NONE if we encounter
> > > CHECKSUM_UNNECESSARY in bpf_skb_generic_pop. It sounds simple
> > > enough to land as a fix via the bpf tree (which is important for our
> > > production kernel). As a follow up we could add the inverse of the flags you
> > > propose via bpf-next.
> > >
> > > What do you think?
> >
> > My concern with unconditionally downgrading a packet to CHECKSUM_NONE would
> > basically trash performance if we have to fallback to sw in fast-path, these
> > helpers are also used in our LB case for DSR, for example. I agree that it
> > sucks to expose these implementation details though. So eventually we'd end
> > up with 3 csum flags: inc/dec/reset to none. bpf_skb_adjust_room() is already
> > a complex to use helper with all its flags where you end up looking into the
> > implementation detail to understand what it is really doing. I'm not sure if
> > we make anything worse, but I do see your concern. :/ (We do have bpf_csum_update()
> > helper as well. I wonder whether we should split such control into a different
> > helper.)
>
> Probably stating the obvious but for decap of UDP tunnels which carry
> locally terminated flows - we'd probably also want the upgrade from
> UNNECESSARY to COMPLETE, like we do in the kernel
> (skb_checksum_try_convert()). Tricky.

I guess this is an argument in the direction that bpf_adjust_room is too
low level an API?

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

  reply	other threads:[~2020-05-11  9:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 16:11 Checksum behaviour of bpf_redirected packets Lorenz Bauer
2020-05-06  1:28 ` Alexei Starovoitov
2020-05-06 16:24   ` Lorenz Bauer
2020-05-06 17:26     ` Jakub Kicinski
2020-05-06 21:55     ` Daniel Borkmann
2020-05-07 15:54       ` Lorenz Bauer
2020-05-07 16:43         ` Daniel Borkmann
2020-05-07 21:25           ` Jakub Kicinski
2020-05-11  9:31             ` Lorenz Bauer [this message]
2020-05-11  9:29           ` Lorenz Bauer
2020-05-12 21:25             ` Daniel Borkmann
2020-05-13 14:14               ` Lorenz Bauer
2020-06-01 17:48                 ` Alan Maguire
2020-06-01 20:13                   ` Daniel Borkmann
2020-06-01 21:25                     ` Alan Maguire
2020-06-02 10:13                       ` Lorenz Bauer
2020-06-02 15:01                         ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACAyw9-Hf7977K3f6hM3gWawz2Y_KgkJ0URmivkAxX8kKH1iEA@mail.gmail.com \
    --to=lmb@cloudflare.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).