Netdev Archive on lore.kernel.org
 help / color / Atom feed
From: "Maciej Żenczykowski" <maze@google.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
	Lorenz Bauer <lmb@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Shaun Crampton <shaun@tigera.io>,
	David Miller <davem@davemloft.net>,
	Marek Majkowski <marek@cloudflare.com>
Subject: Re: BPF redirect API design issue for BPF-prog MTU feedback?
Date: Thu, 17 Sep 2020 05:54:48 -0700
Message-ID: <CANP3RGcxM-Cno=Qw5Lut9DgmV=1suXqetnybA9RgxmW3KmwivQ@mail.gmail.com> (raw)
In-Reply-To: <20200917143846.37ce43a0@carbon>

On Thu, Sep 17, 2020 at 5:39 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
>
> As you likely know[1] I'm looking into moving the MTU check (for TC-BPF)
> in __bpf_skb_max_len() when e.g. called by bpf_skb_adjust_room(),
> because when redirecting packets to another netdev it is not correct to
> limit the MTU based on the incoming netdev.
>
> I was looking at doing the MTU check in bpf_redirect() helper, because
> at this point we know the redirect to netdev, and returning an
> indication/error that MTU was exceed, would allow the BPF-prog logic to
> react, e.g. sending ICMP (instead of packet getting silently dropped).
> BUT this is not possible because bpf_redirect(index, flags) helper
> don't provide the packet context-object (so I cannot lookup the packet
> length).
>
> Seeking input:
>
> Should/can we change the bpf_redirect API or create a new helper with
> packet-context?
>
>  Note: We have the same need for the packet context for XDP when
>  redirecting the new multi-buffer packets, as not all destination netdev
>  will support these new multi-buffer packets.
>
> I can of-cause do the MTU checks on kernel-side in skb_do_redirect, but
> then how do people debug this? as packet will basically be silently dropped.
>
>
>
> (Looking at how does BPF-prog logic handle MTU today)
>
> How do bpf_skb_adjust_room() report that the MTU was exceeded?
> Unfortunately it uses a common return code -ENOTSUPP which used for
> multiple cases (include MTU exceeded). Thus, the BPF-prog logic cannot
> use this reliably to know if this is a MTU exceeded event. (Looked
> BPF-prog code and they all simply exit with TC_ACT_SHOT for all error
> codes, cloudflare have the most advanced handling with
> metrics->errors_total_encap_adjust_failed++).
>
>
> [1] https://lore.kernel.org/bpf/159921182827.1260200.9699352760916903781.stgit@firesoul/
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

(a) the current state of the world seems very hard to use correctly,
so adding new apis,
or even changing existing ones seems ok to me.
especially if this just means changing what error code they return

(b) another complexity with bpf_redirect() is you can call it, it can succeed,
but then you can not return TC_ACT_REDIRECT from the bpf program,
which effectively makes the earlier *successful* bpf_redirect() call
an utter no-op.

(bpf_redirect() just determines what a future return TC_ACT_REDIRECT will do)

so if you bpf_redirect to interface with larger mtu, then increase packet size,
then return TC_ACT_OK, then you potentially end up with excessively large
packet egressing through original interface (with small mtu).

My vote would be to return a new distinct error from bpf_redirect()
based on then current
packet size and interface being redirected to, save this interface mtu
somewhere,
then in operations that increase packet size check against this saved mtu,
for correctness you still have to check mtu after the bpf program is done,
but this is then just to deal with braindead bpf code (that calls
bpf_redirect and returns TC_ACT_OK, or calls bpf_redirect() multiple
times, or something...).

Maciej Żenczykowski, Kernel Networking Developer @ Google

  reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17 12:38 Jesper Dangaard Brouer
2020-09-17 12:54 ` Maciej Żenczykowski [this message]
2020-09-17 19:11   ` Saeed Mahameed
2020-09-18 10:00     ` Jesper Dangaard Brouer
2020-09-18 10:34       ` Toke Høiland-Jørgensen
2020-09-18 23:06       ` Maciej Żenczykowski
2020-09-21 10:37         ` Lorenz Bauer
2020-09-21 12:49           ` Jesper Dangaard Brouer
2020-09-21 15:08             ` Daniel Borkmann
2020-09-21 16:21               ` Marek Zavodsky
2020-09-21 21:17                 ` Willem de Bruijn
2020-09-22  9:15                   ` Jesper Dangaard Brouer
2020-09-21 16:26               ` Jesper Dangaard Brouer
2020-09-22  6:56                 ` Eyal Birger
2020-09-21 18:04               ` John Fastabend
2020-10-06 11:45                 ` Jesper Dangaard Brouer
2020-09-21 10:42   ` Lorenz Bauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANP3RGcxM-Cno=Qw5Lut9DgmV=1suXqetnybA9RgxmW3KmwivQ@mail.gmail.com' \
    --to=maze@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=shaun@tigera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git