All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenz Bauer <lmb@cloudflare.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: Checksum behaviour of bpf_redirected packets
Date: Mon, 11 May 2020 10:31:44 +0100	[thread overview]
Message-ID: <CACAyw9-Hf7977K3f6hM3gWawz2Y_KgkJ0URmivkAxX8kKH1iEA@mail.gmail.com> (raw)
In-Reply-To: <20200507142518.43c22a1b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Thu, 7 May 2020 at 22:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 7 May 2020 18:43:47 +0200 Daniel Borkmann wrote:
> > > Thanks for the patch, it indeed fixes our problem! I spent some more time
> > > trying to understand the checksum offload stuff, here is where I am:
> > >
> > > On NICs that don't support hardware offload ip_summed is CHECKSUM_NONE,
> > > everything works by default since the rest of the stack does checksumming in
> > > software.
> > >
> > > On NICs that support CHECKSUM_COMPLETE, skb_postpull_rcsum
> > > will adjust for the data that is being removed from the skb. The rest of the
> > > stack will use the correct value, all is well.
> > >
> > > However, we're out of luck on NICs that do CHECKSUM_UNNECESSARY:
> > > the API of skb_adjust_room doesn't tell us whether the user intends to
> > > remove headers or data, and how that will influence csum_level.
> > >  From my POV, skb_adjust_room currently does the wrong thing.
> > > I think we need to fix skb_adjust_room to do the right thing by default,
> > > rather than extending the API. We spent a lot of time on tracking this down,
> > > so hopefully we can spare others the pain.
> > >
> > > As Jakub alludes to, we don't know when and how often to call
> > > __skb_decr_checksum_unnecessary so we should just
> > > unconditionally downgrade a packet to CHECKSUM_NONE if we encounter
> > > CHECKSUM_UNNECESSARY in bpf_skb_generic_pop. It sounds simple
> > > enough to land as a fix via the bpf tree (which is important for our
> > > production kernel). As a follow up we could add the inverse of the flags you
> > > propose via bpf-next.
> > >
> > > What do you think?
> >
> > My concern with unconditionally downgrading a packet to CHECKSUM_NONE would
> > basically trash performance if we have to fallback to sw in fast-path, these
> > helpers are also used in our LB case for DSR, for example. I agree that it
> > sucks to expose these implementation details though. So eventually we'd end
> > up with 3 csum flags: inc/dec/reset to none. bpf_skb_adjust_room() is already
> > a complex to use helper with all its flags where you end up looking into the
> > implementation detail to understand what it is really doing. I'm not sure if
> > we make anything worse, but I do see your concern. :/ (We do have bpf_csum_update()
> > helper as well. I wonder whether we should split such control into a different
> > helper.)
>
> Probably stating the obvious but for decap of UDP tunnels which carry
> locally terminated flows - we'd probably also want the upgrade from
> UNNECESSARY to COMPLETE, like we do in the kernel
> (skb_checksum_try_convert()). Tricky.

I guess this is an argument in the direction that bpf_adjust_room is too
low level an API?

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

  reply	other threads:[~2020-05-11  9:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 16:11 Checksum behaviour of bpf_redirected packets Lorenz Bauer
2020-05-06  1:28 ` Alexei Starovoitov
2020-05-06 16:24   ` Lorenz Bauer
2020-05-06 17:26     ` Jakub Kicinski
2020-05-06 21:55     ` Daniel Borkmann
2020-05-07 15:54       ` Lorenz Bauer
2020-05-07 16:43         ` Daniel Borkmann
2020-05-07 21:25           ` Jakub Kicinski
2020-05-11  9:31             ` Lorenz Bauer [this message]
2020-05-11  9:29           ` Lorenz Bauer
2020-05-12 21:25             ` Daniel Borkmann
2020-05-13 14:14               ` Lorenz Bauer
2020-06-01 17:48                 ` Alan Maguire
2020-06-01 20:13                   ` Daniel Borkmann
2020-06-01 21:25                     ` Alan Maguire
2020-06-02 10:13                       ` Lorenz Bauer
2020-06-02 15:01                         ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACAyw9-Hf7977K3f6hM3gWawz2Y_KgkJ0URmivkAxX8kKH1iEA@mail.gmail.com \
    --to=lmb@cloudflare.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.