BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "John Fastabend" <john.fastabend@gmail.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Network Development" <netdev@vger.kernel.org>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf] libbpf: fix compatibility for kernels without need_wakeup
Date: Mon, 14 Oct 2019 10:26:10 +0200
Message-ID: <CAJ8uoz2mzgwvxpE1jsXvPEU=830MeOEtx4T_CMK3pjexFyJdnw@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQKDr0u_YfB_eMYZEPKO1O=4hdzLye9FDMqjy4J3GL8Szg@mail.gmail.com>

On Sat, Oct 12, 2019 at 6:55 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Oct 11, 2019 at 1:58 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Magnus Karlsson wrote:
> > > On Tue, Oct 8, 2019 at 9:29 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > >
> > > > Magnus Karlsson wrote:
> > > > > When the need_wakeup flag was added to AF_XDP, the format of the
> > > > > XDP_MMAP_OFFSETS getsockopt was extended. Code was added to the kernel
> > > > > to take care of compatibility issues arrising from running
> > > > > applications using any of the two formats. However, libbpf was not
> > > > > extended to take care of the case when the application/libbpf uses the
> > > > > new format but the kernel only supports the old format. This patch
> > > > > adds support in libbpf for parsing the old format, before the
> > > > > need_wakeup flag was added, and emulating a set of static need_wakeup
> > > > > flags that will always work for the application.
> > > > >
> > > > > Fixes: a4500432c2587cb2a ("libbpf: add support for need_wakeup flag in AF_XDP part")
> > > > > Reported-by: Eloy Degen <degeneloy@gmail.com>
> > > > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > > > ---
> > > > >  tools/lib/bpf/xsk.c | 109 +++++++++++++++++++++++++++++++++++++---------------
> > > > >  1 file changed, 78 insertions(+), 31 deletions(-)
> > > > >
> > > > > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > > > > index a902838..46f9687 100644
> > > > > --- a/tools/lib/bpf/xsk.c
> > > > > +++ b/tools/lib/bpf/xsk.c
> > > > > @@ -44,6 +44,25 @@
> > > > >   #define PF_XDP AF_XDP
> > > > >  #endif
> > > > >
> > > > > +#define is_mmap_offsets_v1(optlen) \
> > > > > +     ((optlen) == sizeof(struct xdp_mmap_offsets_v1))
> > > > > +
> > > > > +#define get_prod_off(ring) \
> > > > > +     (is_mmap_offsets_v1(optlen) ? \
> > > > > +      ((struct xdp_mmap_offsets_v1 *)&off)->ring.producer : \
> > > > > +      off.ring.producer)
> > > > > +#define get_cons_off(ring) \
> > > > > +     (is_mmap_offsets_v1(optlen) ? \
> > > > > +      ((struct xdp_mmap_offsets_v1 *)&off)->ring.consumer : \
> > > > > +      off.ring.consumer)
> > > > > +#define get_desc_off(ring) \
> > > > > +     (is_mmap_offsets_v1(optlen) ? \
> > > > > +      ((struct xdp_mmap_offsets_v1 *)&off)->ring.desc : off.ring.desc)
> > > > > +#define get_flags_off(ring) \
> > > > > +     (is_mmap_offsets_v1(optlen) ? \
> > > > > +      ((struct xdp_mmap_offsets_v1 *)&off)->ring.consumer + sizeof(u32) : \
> > > > > +      off.ring.flags)
> > > > > +
> > > >
> > > > It seems the only thing added was flags right? If so seems we
> > > > only need the last one there, get_flags_off(). I think it would
> > > > be a bit cleaner to just use the macros where its actually
> > > > needed IMO.
> > >
> > > The flag is indeed added to the end of struct xdp_ring_offsets, but
> > > this struct is replicated four times in the struct xdp_mmap_offsets,
> > > so the added flags are present four time there at different offsets.
> > > This means that 3 out of the 4 prod, cons and desc variables are
> > > located at different offsets from the original. Do not know how I can
> > > get rid of these macros in this case. But it might just be me not
> > > seeing it, of course :-).
> >
> > Not sure I like it but not seeing a cleaner solution that doesn't cause
> > larger changes so...
> >
> > Acked-by: John Fastabend <john.fastabend.gmail.com>
>
> Frankly above hack looks awful.
> What is _v1 ?! Is it going to be _v2?
> What was _v0?
> I also don't see how this is a fix. imo bpf-next is more appropriate
> and if "large changes" are necessary then go ahead and do them.
> We're not doing fixes-branches in libbpf.
> The library always moves forward and compatible with all older kernels.

The fix in this patch is about making libbpf compatible with older
kernels (<=5.3). It is not at the moment in bpf. The current code in
bpf and bpf-next only works with the 5.3-rc kernels, which I think is
bad and a bug. But please let me know if this is bpf or bpf-next and I
will adjust accordingly.

As for the hack, I do not like it and neither did John, but no one
managed to come up with something better. But if this is a fix for bpf
(will not work at all for bpf-next for compatibility reasons), we
could potentially do something like this, as this is only present in
the 5.4-rc series. Currently the extension of the XDP_MMAP_OFFSETS in
5.4-rc is from:

struct xdp_ring_offset {
       __u64 producer;
       __u64 consumer;
       __u64 desc;
};

to:

struct xdp_ring_offset {
       __u64 producer;
       __u64 consumer;
       __u64 desc;
       __u64 flags;
};

while the overall struct provided to the getsockopt stayed the same:

struct xdp_mmap_offsets {
       struct xdp_ring_offset rx;
       struct xdp_ring_offset tx;
       struct xdp_ring_offset fr;
       struct xdp_ring_offset cr;
};

If we instead keep the original struct xdp_ring_offset and append the
new flag offsets at the end of struct xdp_mmap_offsets to something
like this:

struct xdp_mmap_offsets {
       struct xdp_ring_offset rx;
       struct xdp_ring_offset tx;
       struct xdp_ring_offset fr;
       struct xdp_ring_offset cr;
       __u64 rx_flag;
       __u64 tx_flag;
       __u64 fr_flag;
       __u64 cr_flag;
};

the implementation in both the kernel and libbpf becomes much cleaner.
The only change needed in libbpf and the kernel is to introduce a new
function for reading the flag field. The other offset reading and
setting code would stay the same, in contrast to the current scheme.
None of the current patch's macro crap would be needed. This would
also simplify the kernel implementation, improving maintainability.
But this would only be possible to change in bpf. So let me know what
you think. Do not know what the policy is here, so need some advice
please.

Cheers: Magnus

  reply index

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08 10:23 Magnus Karlsson
2019-10-08 19:28 ` John Fastabend
2019-10-09  7:31   ` Magnus Karlsson
2019-10-11 20:58     ` John Fastabend
2019-10-12 16:54       ` Alexei Starovoitov
2019-10-14  8:26         ` Magnus Karlsson [this message]
2019-10-16  3:42           ` Alexei Starovoitov

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJ8uoz2mzgwvxpE1jsXvPEU=830MeOEtx4T_CMK3pjexFyJdnw@mail.gmail.com' \
    --to=magnus.karlsson@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git