Re: [PATCH bpf-next] selftests/bpf: use getpagesize() to initialize ring buffer size

From: Hou Tao <hotforest@gmail.com>
To: andrii.nakryiko@gmail.com
Cc: andrii@kernel.org, ast@kernel.org, bpf@vger.kernel.org,
	daniel@iogearbox.net, davem@davemloft.net, hotforest@gmail.com,
	houtao1@huawei.com, kafai@fb.com, kuba@kernel.org,
	netdev@vger.kernel.org, yhs@fb.com
Subject: Re: [PATCH bpf-next] selftests/bpf: use getpagesize() to initialize ring buffer size
Date: Thu,  3 Feb 2022 19:12:45 +0800
Message-ID: <20220203111245.3495-1-houtao1@huawei.com>
In-Reply-To: <CAEf4BzY_BGV_8d8+gUMva6dpnHq=JSo8oU0p3tc_o=7ii2gU4A@mail.gmail.com>

Hi,

> On Tue, Feb 1, 2022 at 6:36 PM Hou Tao <hotforest@gmail.com> wrote:
> >
> > Hi,
> >
> > > >
> > > > Hi Andrii,
> > > >
> > > > > >
> > > > > > 4096 is OK for x86-64, but for other archs with greater than 4KB
> > > > > > page size (e.g. 64KB under arm64), test_verifier for test case
> > > > > > "check valid spill/fill, ptr to mem" will fail, so just use
> > > > > > getpagesize() to initialize the ring buffer size. Do this for
> > > > > > test_progs as well.
> > > > > >
> > > > [...]
> > > >
> > > > > > diff --git a/tools/testing/selftests/bpf/progs/ima.c b/tools/testing/selftests/bpf/progs/ima.c
> > > > > > index 96060ff4ffc6..e192a9f16aea 100644
> > > > > > --- a/tools/testing/selftests/bpf/progs/ima.c
> > > > > > +++ b/tools/testing/selftests/bpf/progs/ima.c
> > > > > > @@ -13,7 +13,6 @@ u32 monitored_pid = 0;
> > > > > >
> > > > > >  struct {
> > > > > >         __uint(type, BPF_MAP_TYPE_RINGBUF);
> > > > > > -       __uint(max_entries, 1 << 12);
> > > > >
> > > > > Should we just bump it to 64/128/256KB instead? It's quite annoying to
> > > > > do a split open and then load just due to this...
> > > > >
> > > > Agreed.
> > > >
> > > > > I'm also wondering if we should either teach kernel to round up to
> > > > > > closest power-of-2 of page_size internally, or teach libbpf to do this
> > > > > for RINGBUF maps. Thoughts?
> > > > >
[...]
> > >
> > > No, if you read BPF ringbuf code carefully you'll see that we map the
> > > entire ringbuf data twice in the memory (see [0] for lame ASCII
> > > diagram), so that records that are wrapped at the end of the ringbuf
> > > and go back to the start are still accessible as a linear array. It's
> > > a very important guarantee, so it has to be page size multiple. But
> > > auto-increasing it to the closest power-of-2 of page size seems like a
> > > pretty low-impact change. Hard to imagine breaking anything except
> > > some carefully crafted tests for ENOSPC behavior.
> > >
> >
> > Yes, I know the double map trick. What I tried to say is:
> > (1) remove the page-aligned constraint for max_entries
> > (2) still allocate page-aligned memory for ringbuf
> >
> > instead of rounding max_entries up to closest power-of-2 page size
> > directly, so max_entries from userspace is unchanged and double map trick
> > still works.
> 
> I don't see how. Knowing the correct and exact size of the ringbuf
> data area is mandatory for correctly consuming ringbuf data from
> user-space. But if I'm missing something, feel free to give it a try
> and see if it actually works.
> 
You are right. Userspace needs max_entries to mmap() the data area, so
max_entries must be page-size aligned.
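
To make the constraint concrete, here is a minimal sketch of the size
check and of the mapping length a consumer derives from max_entries. It
assumes the data area is mapped twice behind one producer page, as
described above. The helper names are hypothetical and exist in neither
the kernel nor libbpf.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: a ring buffer size is accepted only if it is a
 * power of 2 and a multiple of the page size.
 */
static bool ringbuf_size_ok(size_t max_entries, size_t page_size)
{
	bool pow2 = max_entries && !(max_entries & (max_entries - 1));

	return pow2 && (max_entries % page_size) == 0;
}

/*
 * Hypothetical helper: length of the consumer's read-only mapping,
 * i.e. one producer page followed by the data area mapped twice.
 */
static size_t ringbuf_ro_mmap_len(size_t max_entries, size_t page_size)
{
	return page_size + 2 * max_entries;
}
```

With 4KB pages, max_entries = 4096 passes the check; with 64KB pages it
does not, which is the failure the original patch was working around.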

If we want to do the automatic round-up, I think libbpf is the better
place. If the round-up is done in the kernel, a userspace program may
still call mmap() with the old max_entries; the consumer side will then
not work, which is confusing for users. If we do the auto round-up in
libbpf, the adjustment is hidden from the libbpf user. I will add the
auto round-up and its tests in libbpf.
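
A minimal, untested sketch of such a round-up, assuming the page size is
itself a power of 2. The function name adjust_ringbuf_sz() is
hypothetical and is not an existing libbpf API; the eventual patch may
look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round sz up to the smallest power-of-2 multiple
 * of page_size that is >= sz. Returns 0 if sz is 0 or if the result
 * would overflow, so the caller can leave error reporting to the kernel.
 */
static size_t adjust_ringbuf_sz(size_t sz, size_t page_size)
{
	size_t rounded = page_size;

	if (sz == 0)
		return 0;

	/* Doubling a power of 2 keeps it a power of 2 and page-aligned. */
	while (rounded < sz) {
		if (rounded > SIZE_MAX / 2)
			return 0;
		rounded <<= 1;
	}
	return rounded;
}
```

For example, with a 64KB page size a requested max_entries of 4096 would
become 65536, while on a 4KB-page system it would stay at 4096.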

Regards
Tao
> 
> >
> > > [0] https://github.com/torvalds/linux/blob/master/kernel/bpf/ringbuf.c#L73-L89
> >