All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Maciej Żenczykowski" <maze@google.com>
Cc: bpf <bpf@vger.kernel.org>, Linux NetDev <netdev@vger.kernel.org>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Lorenz Bauer <lmb@cloudflare.com>,
	Shaun Crampton <shaun@tigera.io>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	Marek Majkowski <marek@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	brouer@redhat.com, David Ahern <dsahern@kernel.org>
Subject: Re: [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up
Date: Wed, 7 Oct 2020 09:42:28 +0200	[thread overview]
Message-ID: <20201007094228.5919998b@carbon> (raw)
In-Reply-To: <CANP3RGfeh=a=h2C4voLtfWtvKG7ezaPb7y6r0W1eOjA2ZoNHaw@mail.gmail.com>

On Tue, 6 Oct 2020 18:34:50 -0700
Maciej Żenczykowski <maze@google.com> wrote:

> On Tue, Oct 6, 2020 at 9:03 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
> > can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED.  The BPF-prog
> > don't know the MTU value that caused this rejection.
> >
> > If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
> > need to know this MTU value for the ICMP packet.
> >
> > Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
> > value as output via a union with 'tot_len' as this is the value used for
> > the MTU lookup.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  include/uapi/linux/bpf.h |   11 +++++++++--
> >  net/core/filter.c        |   17 ++++++++++++-----
> >  2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c446394135be..50ce65e37b16 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
[...]
> > @@ -4844,9 +4847,13 @@ struct bpf_fib_lookup {
> >         __be16  sport;
> >         __be16  dport;
> >
> > -       /* total length of packet from network header - used for MTU check */
> > -       __u16   tot_len;
> > +       union { /* used for MTU check */
> > +               /* input to lookup */
> > +               __u16   tot_len; /* total length of packet from network hdr */
> >
> > +               /* output: MTU value (if requested check_mtu) */
> > +               __u16   mtu;
> > +       };
> >         /* input: L3 device index for lookup
> >          * output: device index from FIB lookup
> >          */
[...]
> 
> It would be beneficial to be able to fetch the route advmss, initcwnd,
> etc as well...
> But I take it the struct can't be extended?

The struct bpf_fib_lookup is exactly 1 cache-line (64 bytes) for
performance reasons.  I do believe that it can be extended, as Ahern
designed the BPF-helper API cleverly via a plen (detail below signature).

For accessing other route metric information like advmss and initcwnd,
I would expect Daniel to suggest to use BTF to access info from
dst_entry, or actually dst->_metrics. But looking at the details for
accessing dst->_metrics is complicated by macros, thus I expect BTF
would have a hard time.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)

$ pahole -C bpf_fib_lookup
struct bpf_fib_lookup {
	__u8                       family;               /*     0     1 */
	__u8                       l4_protocol;          /*     1     1 */
	__be16                     sport;                /*     2     2 */
	__be16                     dport;                /*     4     2 */
	union {
		__u16              tot_len;              /*     6     2 */
		__u16              mtu;                  /*     6     2 */
	};                                               /*     6     2 */
	__u32                      ifindex;              /*     8     4 */
	union {
		__u8               tos;                  /*    12     1 */
		__be32             flowinfo;             /*    12     4 */
		__u32              rt_metric;            /*    12     4 */
	};                                               /*    12     4 */
	union {
		__be32             ipv4_src;             /*    16     4 */
		__u32              ipv6_src[4];          /*    16    16 */
	};                                               /*    16    16 */
	union {
		__be32             ipv4_dst;             /*    32     4 */
		__u32              ipv6_dst[4];          /*    32    16 */
	};                                               /*    32    16 */
	__be16                     h_vlan_proto;         /*    48     2 */
	__be16                     h_vlan_TCI;           /*    50     2 */
	__u8                       smac[6];              /*    52     6 */
	__u8                       dmac[6];              /*    58     6 */

	/* size: 64, cachelines: 1, members: 13 */
};


struct dst_metrics {
	u32		metrics[RTAX_MAX];
	refcount_t	refcnt;
} __aligned(4);		/* Low pointer bits contain DST_METRICS_FLAGS */

extern const struct dst_metrics dst_default_metrics;

#define DST_METRICS_READ_ONLY		0x1UL
#define DST_METRICS_REFCOUNTED		0x2UL
#define DST_METRICS_FLAGS		0x3UL
#define __DST_METRICS_PTR(Y)	\
	((u32 *)((Y) & ~DST_METRICS_FLAGS))
#define DST_METRICS_PTR(X)	__DST_METRICS_PTR((X)->_metrics)


static inline u32
dst_metric_raw(const struct dst_entry *dst, const int metric)
{
	u32 *p = DST_METRICS_PTR(dst);

	return p[metric-1];
}

static inline u32
dst_metric_advmss(const struct dst_entry *dst)
{
	u32 advmss = dst_metric_raw(dst, RTAX_ADVMSS);

	if (!advmss)
		advmss = dst->ops->default_advmss(dst);

	return advmss;
}


  reply	other threads:[~2020-10-07  7:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-07  1:34   ` Maciej Żenczykowski
2020-10-07  7:42     ` Jesper Dangaard Brouer [this message]
2020-10-07 16:38       ` David Ahern
2020-10-07  7:28   ` kernel test robot
2020-10-06 16:03 ` [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex Jesper Dangaard Brouer
2020-10-06 16:33   ` Jesper Dangaard Brouer
2020-10-07  1:18     ` Jakub Kicinski
2020-10-07  1:24       ` Maciej Żenczykowski
2020-10-07  7:53         ` Jesper Dangaard Brouer
2020-10-07 16:35         ` David Ahern
2020-10-07 17:44           ` Maciej Żenczykowski
2020-10-06 16:03 ` [PATCH bpf-next V1 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
2020-10-06 20:09   ` kernel test robot
2020-10-06 20:09     ` kernel test robot
2020-10-07  0:26   ` kernel test robot
2020-10-07  0:26     ` kernel test robot
2020-10-06 16:03 ` [PATCH bpf-next V1 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
2020-10-06 21:20 [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201007094228.5919998b@carbon \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=dsahern@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=lorenzo@kernel.org \
    --cc=marek@cloudflare.com \
    --cc=maze@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=shaun@tigera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.