From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Joe Damato <jdamato@fastly.com>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: brouer@redhat.com, netdev@vger.kernel.org, kuba@kernel.org,
	davem@davemloft.net, ilias.apalodimas@linaro.org,
	hawk@kernel.org
Subject: Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
Date: Wed, 2 Feb 2022 15:04:33 +0100	[thread overview]
Message-ID: <33216f8e-0b9c-c701-0cdf-e96229bc03d6@redhat.com> (raw)
In-Reply-To: <CALALjgxwkRBFd86=kcOjeF4uvhCgB52gG7ka_e16=Qf-3LQiFQ@mail.gmail.com>

On 27/01/2022 22.08, Joe Damato wrote:
> On Thu, Jan 27, 2022 at 12:51 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>> On 26/01/2022 23.48, Joe Damato wrote:
>>> Greetings:
>>>
>>> This series adds some stat counters for the page_pool allocation path which
>>> help to track:
>>>
>>>        - fast path allocations
>>>        - slow path order-0 allocations
>>>        - slow path high order allocations
>>>        - refills which failed due to an empty ptr ring, forcing a slow
>>>          path allocation
>>>        - allocations fulfilled via successful refill
>>>        - pages which cannot be added to the cache because of numa mismatch
>>>          (i.e. waived)
>>>
>>> Some static inline wrappers are provided for accessing these stats. The
>>> intention is that drivers which use the page_pool API can, if they choose,
>>> use this stats API.
>>
>> You are adding (always-on) counters to a critical fast path that
>> drivers use for the XDP_DROP use-case.
> 
> If you prefer requiring users explicitly enable these stats, I am
> happy to add a kernel config option (e.g. CONFIG_PAGE_POOL_DEBUG or
> similar) in a v2.
> 
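For reference, compile-time gating could look roughly like the sketch
below. This is only an illustration; CONFIG_PAGE_POOL_STATS and the
field/helper names are placeholders I made up, not code from this series.

  #ifdef CONFIG_PAGE_POOL_STATS
  struct page_pool_alloc_stats {
  	u64 fast;	/* fast-path allocations */
  	u64 slow;	/* slow-path order-0 allocations */
  };

  #define page_pool_stat_inc(pool, __stat) \
  	((pool)->alloc_stats.__stat++)
  #else
  /* No struct member and no increment; the stats cost nothing when off. */
  #define page_pool_stat_inc(pool, __stat) do { } while (0)
  #endif
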
>> I want to see performance measurements as documentation, showing this is
>> not causing a slow-down.
>>
>> I have some performance tests here[1]:
>>    [1]
>> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/lib
>>
>> Look at:
>>    - bench_page_pool_simple.c and
>>    - bench_page_pool_cross_cpu.c
>>
>> How to use + build this[2]:
>>    [2]
>> https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html
> 
> Thanks for the pointers to the benchmarks.
> 
> In general, I noted that the benchmark results varied fairly
> substantially between repeated runs on the same system.
> 
> Results below suggest that:
>     - bench_page_pool_simple is faster on the test kernel, and
>     - bench_page_pool_cross_cpu is faster on the control
> 
> Subsequent runs of bench_page_pool_cross_cpu on the control, however,
> reveal *much* slower results than shown below.
> 
> Test system:
>    - 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>    - 2 numa zones, with 18 cores per zone and 2 threads per core
> 
> Control kernel: built from net-next at commit e2cf07654efb ("ptp:
> replace snprintf with sysfs_emit").
> Test kernel: This series applied on top of control kernel mentioned above.
> 
> Raw output from dmesg for control [1] and test [2] kernel summarized below:
> 
> bench_page_pool_simple
>    - run with default options (i.e. "sudo modprobe bench_page_pool_simple").
> 
> Control:
> 
> Type:for_loop Per elem: 0 cycles(tsc) 0.334 ns (step:0)
> Type:atomic_inc Per elem: 13 cycles(tsc) 6.021 ns (step:0)
> Type:lock Per elem: 31 cycles(tsc) 13.514 ns (step:0)
> 
> Type:no-softirq-page_pool01 Per elem: 44 cycles(tsc) 19.549 ns (step:0)
> Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 19.658 ns (step:0)
> Type:no-softirq-page_pool03 Per elem: 118 cycles(tsc) 51.638 ns (step:0)
> 
> Type:tasklet_page_pool01_fast_path Per elem: 17 cycles(tsc) 7.472 ns (step:0)
> Type:tasklet_page_pool02_ptr_ring Per elem: 42 cycles(tsc) 18.585 ns (step:0)
> Type:tasklet_page_pool03_slow Per elem: 109 cycles(tsc) 47.807 ns (step:0)
> 
> Test:
> 
> Type:for_loop Per elem: 0 cycles(tsc) 0.334 ns (step:0)
> Type:atomic_inc Per elem: 14 cycles(tsc) 6.195 ns (step:0)
> Type:lock Per elem: 31 cycles(tsc) 13.827 ns (step:0)
> 
> Type:no-softirq-page_pool01 Per elem: 44 cycles(tsc) 19.561 ns (step:0)
> Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 19.700 ns (step:0)
> Type:no-softirq-page_pool03 Per elem: 108 cycles(tsc) 47.186 ns (step:0)
> 
> Type:tasklet_page_pool01_fast_path Per elem: 12 cycles(tsc) 5.447 ns (step:0)

Watch out for the measurement period: time:0.054478253 (taken from [2])

If the measurement period becomes too small, you/we cannot use the
results.  Perhaps I've set the default 'loops' variable too low for
these modern systems.  Hint: it is adjustable via the module parameter
'loops'.
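E.g. rerun with a longer measurement period (the value below is just an
example; pick whatever gives a stable period on your system):

  sudo modprobe bench_page_pool_simple loops=100000000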



> Type:tasklet_page_pool02_ptr_ring Per elem: 42 cycles(tsc) 18.501 ns (step:0)
> Type:tasklet_page_pool03_slow Per elem: 106 cycles(tsc) 46.313 ns (step:0)
> 
> bench_page_pool_cross_cpu
>    - run with default options (i.e. "sudo modprobe bench_page_pool_cross_cpu").
> 
> Control:
> Type:page_pool_cross_cpu CPU(0) 1795 cycles(tsc) 782.567 ns (step:2)
> Type:page_pool_cross_cpu CPU(1) 1921 cycles(tsc) 837.435 ns (step:2)
> Type:page_pool_cross_cpu CPU(2) 960 cycles(tsc) 418.758 ns (step:2)
> Sum Type:page_pool_cross_cpu Average: 1558 cycles(tsc) CPUs:3 step:2
> 
> Test:
> Type:page_pool_cross_cpu CPU(0) 2411 cycles(tsc) 1051.037 ns (step:2)
> Type:page_pool_cross_cpu CPU(1) 2467 cycles(tsc) 1075.204 ns (step:2)
> Type:page_pool_cross_cpu CPU(2) 1233 cycles(tsc) 537.629 ns (step:2)
> Type:page_pool_cross_cpu Average: 2037 cycles(tsc) CPUs:3 step:2

I think the effect you are seeing here is because you placed your stats
struct on a cache line that is also used by remote CPUs
freeing/recycling pages back to the page_pool.
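One way to avoid that false sharing would be to force the stats onto
their own cache line (or make them per-CPU counters that are summed at
read time). Rough sketch only, field names invented:

  /* Keep the allocation-side stats away from fields that remote CPUs
   * dirty when recycling pages back into the ptr_ring.
   */
  struct page_pool_alloc_stats {
  	u64 fast;
  	u64 slow;
  	u64 refill;
  	u64 waive;
  } ____cacheline_aligned_in_smp;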


> 
> [1]: https://gist.githubusercontent.com/jdamato-fsly/385806f06cb95c61ff8cecf7a3645e75/raw/886e3208f5b9c47abdd59bdaa7ecf27994f477b1/page_pool_bench_control
> [2]: https://gist.githubusercontent.com/jdamato-fsly/385806f06cb95c61ff8cecf7a3645e75/raw/886e3208f5b9c47abdd59bdaa7ecf27994f477b1/page_pool_bench_TESTKERNEL
> 
> 
>>> It is assumed that the API consumer will ensure the page_pool is not destroyed
>>> during calls to the stats API.
>>>
>>> If this series is accepted, I'll submit a follow up patch which will export
>>> these stats per RX-ring via ethtool in a driver which uses the page_pool
>>> API.
>>>
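The usual pattern for that is .get_strings/.get_sset_count plus
.get_ethtool_stats walking the RX rings. Hypothetical sketch, all
driver-side names invented:

  /* Hypothetical driver glue: fold per-ring pool stats into 'ethtool -S'. */
  static void drv_get_ethtool_stats(struct net_device *dev,
  				    struct ethtool_stats *stats, u64 *data)
  {
  	struct drv_priv *priv = netdev_priv(dev);
  	int i;

  	for (i = 0; i < priv->num_rx_rings; i++) {
  		struct page_pool *pool = priv->rx_ring[i].page_pool;

  		/* Accessor names illustrative of the proposed stats API. */
  		*data++ = page_pool_stats_get_fast(pool);
  		*data++ = page_pool_stats_get_slow(pool);
  	}
  }
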
>>> Joe Damato (6):
>>>     net: page_pool: Add alloc stats and fast path stat
>>>     net: page_pool: Add a stat for the slow alloc path
>>>     net: page_pool: Add a high order alloc stat
>>>     net: page_pool: Add stat tracking empty ring
>>>     net: page_pool: Add stat tracking cache refills.
>>>     net: page_pool: Add a stat tracking waived pages.
>>>
>>>    include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>    net/core/page_pool.c    | 15 +++++++--
>>>    2 files changed, 94 insertions(+), 3 deletions(-)
>>>
>>
> 



Thread overview: 17+ messages
2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
2022-01-27 16:32   ` Jakub Kicinski
2022-01-27 21:11     ` Joe Damato
2022-02-02 14:14   ` Jesper Dangaard Brouer
2022-01-26 22:48 ` [PATCH net-next 2/6] net: page_pool: Add a stat for the slow alloc path Joe Damato
2022-01-26 22:48 ` [PATCH net-next 3/6] net: page_pool: Add a high order alloc stat Joe Damato
2022-01-26 22:48 ` [PATCH net-next 4/6] net: page_pool: Add stat tracking empty ring Joe Damato
2022-01-26 22:48 ` [PATCH net-next 5/6] net: page_pool: Add stat tracking cache refills Joe Damato
2022-01-26 22:48 ` [PATCH 6/6] net: page_pool: Add a stat tracking waived pages Joe Damato
2022-01-27  8:51 ` [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Jesper Dangaard Brouer
2022-01-27 21:08   ` Joe Damato
2022-02-02 14:04     ` Jesper Dangaard Brouer [this message]
2022-01-27  9:08 ` Ilias Apalodimas
2022-01-27 23:55   ` Joe Damato
2022-01-29 14:07     ` Ilias Apalodimas
2022-01-29 18:07       ` Joe Damato
