From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: "John Fastabend" <john.fastabend@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
Date: Thu, 4 Apr 2024 15:09:52 +0200
Message-ID: <ef853ce2-4e44-4a07-8dfd-6b8a2d48ad62@intel.com>
In-Reply-To: <f53e02ef-ee33-4cd0-b045-a3efe7f0fae4@kernel.org>

From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: Thu, 4 Apr 2024 13:43:12 +0200

> 
> 
> On 03/04/2024 22.39, John Fastabend wrote:
>> Alexander Lobakin wrote:
>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>> Date: Tue, 20 Feb 2024 22:03:39 +0100
>>>
>>>> The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
>>>> each time it is called and uses that to allocate the frames used for
>>>> the XDP run. This works well if the syscall is used with a high
>>>> repetitions number, as it allows for efficient page recycling.
>>>> However, if used with a small number of repetitions, the overhead of
>>>> creating and tearing down the page pool is significant, and can even
>>>> lead to system stalls if the syscall is called in a tight loop.
>>>>
>>>> Now that we have a persistent system page pool instance, it becomes
>>>> pretty straight forward to change the test_run code to use it. The
>>>> only wrinkle is that we can no longer rely on a custom page init
>>>> callback from page_pool itself; instead, we change the test_run code
>>>> to write a random cookie value to the beginning of the page as an
>>>> indicator that the page has been initialised and can be re-used
>>>> without copying the initial data again.
>>>>
>>>> The cookie is a random 128-bit value, which means the probability
>>>> that we will get accidental collisions (which would lead to recycling
>>>> the wrong page values and reading garbage) is on the order of 2^-128.
>>>> This is in the "won't happen before the heat death of the universe"
>>>> range, so this marking is safe for the intended usage.
>>>>
>>>> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>> Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
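
For anyone reading this from the archive without the patch at hand, the
marking scheme described above boils down to something like the sketch
below. It is purely an illustration: the identifiers (live_frame_cookie,
live_frame_page_init(), ...) are mine, not the ones used in the patch,
and the actual page layout there may differ, but the cookie check is the
same idea.

#include <linux/mm.h>
#include <linux/random.h>
#include <linux/string.h>
#include <linux/types.h>

/* One random 128-bit cookie, generated once per test_run invocation. */
static u8 live_frame_cookie[16];

static void live_frame_cookie_init(void)
{
	get_random_bytes(live_frame_cookie, sizeof(live_frame_cookie));
}

/*
 * Pages come back from the shared pool in an unknown state.  If the
 * first 16 bytes still hold our cookie, the page was initialised by
 * this run and its contents can be reused as-is; otherwise copy the
 * initial frame data and stamp the cookie.
 */
static void live_frame_page_init(struct page *page, const void *data,
				 size_t len)
{
	void *addr = page_address(page);

	if (!memcmp(addr, live_frame_cookie, sizeof(live_frame_cookie)))
		return;

	memcpy(addr + sizeof(live_frame_cookie), data, len);
	memcpy(addr, live_frame_cookie, sizeof(live_frame_cookie));
}
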
>>>
>>> Hey,
>>>
>>> What's the status of this series, now that the window is open?
>>>
>>> Thanks,
>>> Olek
>>
>> Hi Toke,
>>
>> I read the thread from top to bottom, so it seems someone else already
>> noticed that 2^128 is the number of unique values, not the collision
>> probability. Anyway, I'm still a bit confused: what's the use case here?
>> Maybe I just need to understand this XDP live frame mode better?
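
To spell the nit out: 2^128 is the number of possible cookie values; the
figure the changelog needs is its reciprocal, i.e. the chance that a stale
page (whose leading bytes are independent of the cookie) happens to start
with the chosen value:

	P(false match per page checked) = 2^-128 ~= 2.9 * 10^-39

and since only a single cookie value is ever in play, this is a plain
per-page probability rather than a birthday-style bound.
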
>>
>> Could another solution be to avoid calling BPF_TEST_RUN multiple times
>> in a row? Or perhaps have a BPF_SETUP_RUN that does the config and lets
>> BPF_TEST_RUN skip the page allocation? Another idea: just have the first
>> run of BPF_TEST_RUN init a page pool and not destroy it.
>>
> 
> I like John's idea of "the first run of BPF_TEST_RUN init a page pool
> and not destroy it".  On exit we could start a work-queue that tries to
> "destroy" the PP (in the future) if it's not in use.
> 
> Page pool (PP) performance comes from having an association with a
> SINGLE RX-queue, which means no synchronization is needed when
> "allocating" new pages (from the alloc cache array).
> 
> Thus, IMHO each BPF_TEST_RUN should get its own PP instance, as then the
> lockless PP scheme works (and we don't have to depend on NAPI /
> BH-disable).  This still works with John's idea, as we could simply have
> a list of PP instances that can be reused.
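
A cached-instance scheme along those lines could be as simple as the
sketch below. It is only meant to make the idea concrete; every name in
it is made up here, and error handling and the reaping policy would need
more thought.

#include <linux/err.h>
#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/numa.h>
#include <linux/slab.h>
#include <linux/workqueue.h>
#include <net/page_pool/helpers.h>

struct test_run_pp {
	struct list_head	node;
	struct page_pool	*pp;
	bool			busy;
};

static LIST_HEAD(test_run_pps);
static DEFINE_MUTEX(test_run_pps_lock);

static void test_run_pp_reap(struct work_struct *work);
static DECLARE_DELAYED_WORK(test_run_pp_reaper, test_run_pp_reap);

/* Reuse an idle instance if there is one, otherwise create a new one. */
static struct page_pool *test_run_pp_get(void)
{
	struct page_pool_params params = {
		.order		= 0,
		.pool_size	= 256,
		.nid		= NUMA_NO_NODE,
	};
	struct test_run_pp *e;

	mutex_lock(&test_run_pps_lock);
	list_for_each_entry(e, &test_run_pps, node) {
		if (!e->busy) {
			e->busy = true;
			mutex_unlock(&test_run_pps_lock);
			return e->pp;
		}
	}
	mutex_unlock(&test_run_pps_lock);

	e = kzalloc(sizeof(*e), GFP_KERNEL);
	if (!e)
		return NULL;

	e->pp = page_pool_create(&params);
	if (IS_ERR(e->pp)) {
		kfree(e);
		return NULL;
	}
	e->busy = true;

	mutex_lock(&test_run_pps_lock);
	list_add(&e->node, &test_run_pps);
	mutex_unlock(&test_run_pps_lock);

	return e->pp;
}

/* Mark the instance idle and arm the deferred destroy Jesper mentions. */
static void test_run_pp_put(struct page_pool *pp)
{
	struct test_run_pp *e;

	mutex_lock(&test_run_pps_lock);
	list_for_each_entry(e, &test_run_pps, node)
		if (e->pp == pp)
			e->busy = false;
	mutex_unlock(&test_run_pps_lock);

	schedule_delayed_work(&test_run_pp_reaper, 10 * HZ);
}

static void test_run_pp_reap(struct work_struct *work)
{
	struct test_run_pp *e, *tmp;

	mutex_lock(&test_run_pps_lock);
	list_for_each_entry_safe(e, tmp, &test_run_pps, node) {
		if (e->busy)
			continue;
		list_del(&e->node);
		page_pool_destroy(e->pp);
		kfree(e);
	}
	mutex_unlock(&test_run_pps_lock);
}

The mutex is only taken on test_run setup/teardown, so the lockless fast
path inside each pool instance is untouched; whether the reaper should
keep one warm instance around is a policy question.
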

The lockless PP scheme works for percpu PPs as well via page_pool::cpuid;
it seems like you missed this change?
A percpu page_pool is CPU-local, which means it can't be accessed from
several threads simultaneously in the first place.
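
For completeness, the locality check that makes the percpu case safe is
roughly of the following shape (simplified, with a made-up helper name;
not the exact upstream code):

#include <linux/compiler.h>
#include <linux/preempt.h>
#include <linux/smp.h>
#include <net/page_pool/types.h>

/*
 * Direct (lockless) access to a pool's alloc cache is only safe from
 * softirq context on the CPU the pool belongs to; page_pool::cpuid
 * lets us verify that without taking any locks.
 */
static bool pp_local_to_this_cpu(const struct page_pool *pool)
{
	return in_softirq() && READ_ONCE(pool->cpuid) == smp_processor_id();
}
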

> 
> --Jesper

Thanks,
Olek

Thread overview: 14+ messages
2024-02-20 21:03 [PATCH net-next v2 0/4] Change BPF_TEST_RUN use the system page pool for live XDP frames Toke Høiland-Jørgensen
2024-02-20 21:03 ` [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model Toke Høiland-Jørgensen
2024-04-03 20:20   ` John Fastabend
2024-04-04  9:08     ` Alexander Lobakin
2024-02-20 21:03 ` [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode Toke Høiland-Jørgensen
2024-02-21 14:48   ` Toke Høiland-Jørgensen
2024-04-04 11:23     ` Jesper Dangaard Brouer
2024-04-04 13:34       ` Toke Høiland-Jørgensen
2024-04-03 16:34   ` Alexander Lobakin
2024-04-03 20:39     ` John Fastabend
2024-04-04 11:43       ` Jesper Dangaard Brouer
2024-04-04 13:09         ` Alexander Lobakin [this message]
2024-02-20 21:03 ` [PATCH net-next v2 3/4] bpf: test_run: Fix cacheline alignment of live XDP frame data structures Toke Høiland-Jørgensen
2024-02-21 14:45 ` [PATCH net-next v2 0/4] Change BPF_TEST_RUN use the system page pool for live XDP frames Toke Høiland-Jørgensen
