linux-kernel.vger.kernel.org archive mirror
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
To: Alexander Lobakin <alobakin@pm.me>
Cc: Matteo Croce <mcroce@linux.microsoft.com>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	David Ahern <dsahern@gmail.com>,
	Saeed Mahameed <saeed@kernel.org>, Andrew Lunn <andrew@lunn.ch>
Subject: Re: [PATCH net-next 0/6] page_pool: recycle buffers
Date: Wed, 24 Mar 2021 09:50:38 +0200	[thread overview]
Message-ID: <YFrvTtS8E/C5vYgo@enceladus> (raw)
In-Reply-To: <20210323200338.578264-1-alobakin@pm.me>

Hi Alexander,

On Tue, Mar 23, 2021 at 08:03:46PM +0000, Alexander Lobakin wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Tue, 23 Mar 2021 19:01:52 +0200
> 
> > On Tue, Mar 23, 2021 at 04:55:31PM +0000, Alexander Lobakin wrote:
> > > > > > > >
> >
> > [...]
> >
> > > > > > >
> > > > > > > Thanks for the testing!
> > > > > > > Any chance you can get a perf measurement on this?
> > > > > >
> > > > > > I guess you mean perf-report (--stdio) output, right?
> > > > > >
> > > > >
> > > > > Yea,
> > > > > As hinted below, I am just trying to figure out if, on Alexander's platform, the
> > > > > cost of syncing is bigger than the cost of free + allocate. I remember one armv7 where
> > > > > that was the case.
> > > > >
> > > > > > > Is DMA syncing taking a substantial amount of your cpu usage?
> > > > > >
> > > > > > (+1 this is an important question)
> > >
> > > Sure, I'll drop perf tools to my test env and share the results,
> > > maybe tomorrow or in a few days.
> 
> Oh we-e-e-ell...
> Looks like I've been fooled by I-cache misses or smth like that.
> That happens sometimes, not only on my machines, and not only on
> MIPS if I'm not mistaken.
> Sorry for confusing you guys.
> 
> I got drastically different numbers after I enabled CONFIG_KALLSYMS +
> CONFIG_PERF_EVENTS for perf tools.
> The only difference in code is that I rebased onto Mel's
> mm-bulk-rebase-v6r4.
> 
> (lunar is my WIP NIC driver)
> 
> 1. 5.12-rc3 baseline:
> 
> TCP: 566 Mbps
> UDP: 615 Mbps
> 
> perf top:
>      4.44%  [lunar]              [k] lunar_rx_poll_page_pool
>      3.56%  [kernel]             [k] r4k_wait_irqoff
>      2.89%  [kernel]             [k] free_unref_page
>      2.57%  [kernel]             [k] dma_map_page_attrs
>      2.32%  [kernel]             [k] get_page_from_freelist
>      2.28%  [lunar]              [k] lunar_start_xmit
>      1.82%  [kernel]             [k] __copy_user
>      1.75%  [kernel]             [k] dev_gro_receive
>      1.52%  [kernel]             [k] cpuidle_enter_state_coupled
>      1.46%  [kernel]             [k] tcp_gro_receive
>      1.35%  [kernel]             [k] __rmemcpy
>      1.33%  [nf_conntrack]       [k] nf_conntrack_tcp_packet
>      1.30%  [kernel]             [k] __dev_queue_xmit
>      1.22%  [kernel]             [k] pfifo_fast_dequeue
>      1.17%  [kernel]             [k] skb_release_data
>      1.17%  [kernel]             [k] skb_segment
> 
> free_unref_page() and get_page_from_freelist() consume a lot.
> 
> 2. 5.12-rc3 + Page Pool recycling by Matteo:
> TCP: 589 Mbps
> UDP: 633 Mbps
> 
> perf top:
>      4.27%  [lunar]              [k] lunar_rx_poll_page_pool
>      2.68%  [lunar]              [k] lunar_start_xmit
>      2.41%  [kernel]             [k] dma_map_page_attrs
>      1.92%  [kernel]             [k] r4k_wait_irqoff
>      1.89%  [kernel]             [k] __copy_user
>      1.62%  [kernel]             [k] dev_gro_receive
>      1.51%  [kernel]             [k] cpuidle_enter_state_coupled
>      1.44%  [kernel]             [k] tcp_gro_receive
>      1.40%  [kernel]             [k] __rmemcpy
>      1.38%  [nf_conntrack]       [k] nf_conntrack_tcp_packet
>      1.37%  [kernel]             [k] free_unref_page
>      1.35%  [kernel]             [k] __dev_queue_xmit
>      1.30%  [kernel]             [k] skb_segment
>      1.28%  [kernel]             [k] get_page_from_freelist
>      1.27%  [kernel]             [k] r4k_dma_cache_inv
> 
> +20 Mbps increase on both TCP and UDP. free_unref_page() and
> get_page_from_freelist() dropped down the list significantly.
> 
> 3. 5.12-rc3 + Page Pool recycling + PP bulk allocator (Mel & Jesper):
> TCP: 596 Mbps
> UDP: 641 Mbps
> 
> perf top:
>      4.38%  [lunar]              [k] lunar_rx_poll_page_pool
>      3.34%  [kernel]             [k] r4k_wait_irqoff
>      3.14%  [kernel]             [k] dma_map_page_attrs
>      2.49%  [lunar]              [k] lunar_start_xmit
>      1.85%  [kernel]             [k] dev_gro_receive
>      1.76%  [kernel]             [k] free_unref_page
>      1.76%  [kernel]             [k] __copy_user
>      1.65%  [kernel]             [k] inet_gro_receive
>      1.57%  [kernel]             [k] tcp_gro_receive
>      1.48%  [kernel]             [k] cpuidle_enter_state_coupled
>      1.43%  [nf_conntrack]       [k] nf_conntrack_tcp_packet
>      1.42%  [kernel]             [k] __rmemcpy
>      1.25%  [kernel]             [k] skb_segment
>      1.21%  [kernel]             [k] r4k_dma_cache_inv
> 
> +10 Mbps on top of recycling.
> get_page_from_freelist() is gone.
> NAPI polling, CPU idle cycle (r4k_wait_irqoff) and DMA mapping
> routine became the top consumers.

Again, thanks for the extensive testing.
I assume you don't use page_pool to map the buffers, right?
Because if the mapping is preserved, the only thing you have to do is sync it
after packet reception.
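
To spell out what I mean (a sketch, not from your driver; field values are
illustrative): if the pool is created with PP_FLAG_DMA_MAP, page_pool maps each
page once at allocation and keeps the mapping across recycles, and with
PP_FLAG_DMA_SYNC_DEV it syncs only the region the device may have touched:

```c
/* Sketch: let page_pool own the DMA mapping, so the driver never calls
 * dma_map_page()/dma_unmap_page() per packet. Values are illustrative
 * for a hypothetical driver.
 */
struct page_pool_params pp_params = {
	.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
	.order		= 0,
	.pool_size	= 256,
	.nid		= NUMA_NO_NODE,
	.dev		= &pdev->dev,		/* device doing the DMA */
	.dma_dir	= DMA_FROM_DEVICE,
	.max_len	= PAGE_SIZE,		/* sync at most this many bytes */
	.offset		= 0,			/* headroom before the data */
};
struct page_pool *pool = page_pool_create(&pp_params);
```

With that in place, a recycled page re-enters the RX ring already mapped, and
only the sync-for-device on refill remains in the hot path.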

> 
> 4-5. __always_inline for rmqueue_bulk() and __rmqueue_pcplist(),
> removing 'noinline' from net/core/page_pool.c etc.
> 
> ...makes absolutely no sense anymore.
> I see Mel took Jesper's patch to make __rmqueue_pcplist() inline into
> mm-bulk-rebase-v6r5, not sure if it's really needed now.
> 
> So I'm really glad we sorted things out and I can see the real
> performance improvements from both recycling and bulk allocations.
> 

Those improvements will probably be even bigger with an IOMMU/SMMU present.
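
For anyone following along, the RX-side change in the series is essentially a
one-liner per driver (signatures as in the posted patches; they may differ in
the final version):

```c
/* Sketch of the driver RX path with the recycling series applied.
 * Instead of unmapping the buffer before building the skb, the page is
 * left mapped and the skb is flagged so that, on kfree_skb()/consume_skb(),
 * the page goes back to 'pool' (hot cache or ptr_ring) instead of
 * free_unref_page().
 */
skb = build_skb(data, PAGE_SIZE);
skb_mark_for_recycle(skb, virt_to_page(data), pool);
```

That is why free_unref_page() and get_page_from_freelist() drop down the
perf profiles above once recycling is enabled.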

[...]

Cheers
/Ilias


Thread overview: 23+ messages
2021-03-22 17:02 [PATCH net-next 0/6] page_pool: recycle buffers Matteo Croce
2021-03-22 17:02 ` [PATCH net-next 1/6] xdp: reduce size of struct xdp_mem_info Matteo Croce
2021-03-22 17:02 ` [PATCH net-next 2/6] mm: add a signature in struct page Matteo Croce
2021-03-22 17:02 ` [PATCH net-next 3/6] page_pool: DMA handling and allow to recycles frames via SKB Matteo Croce
2021-03-22 19:38   ` Matteo Croce
2021-03-22 17:02 ` [PATCH net-next 4/6] net: change users of __skb_frag_unref() and add an extra argument Matteo Croce
2021-03-22 17:03 ` [PATCH net-next 5/6] mvpp2: recycle buffers Matteo Croce
2021-03-22 17:03 ` [PATCH net-next 6/6] mvneta: " Matteo Croce
2021-03-23 15:06   ` Jesper Dangaard Brouer
2021-03-24  9:28     ` Lorenzo Bianconi
2021-03-24 21:48       ` Ilias Apalodimas
2021-03-23 14:57 ` [PATCH net-next 0/6] page_pool: " David Ahern
2021-03-23 15:03   ` Ilias Apalodimas
2021-03-23 15:41 ` Alexander Lobakin
2021-03-23 15:47   ` Ilias Apalodimas
2021-03-23 16:04     ` Jesper Dangaard Brouer
2021-03-23 16:10       ` Ilias Apalodimas
2021-03-23 16:28         ` Matteo Croce
2021-03-23 16:55           ` Alexander Lobakin
2021-03-23 17:01             ` Ilias Apalodimas
2021-03-23 20:03               ` Alexander Lobakin
2021-03-24  7:50                 ` Ilias Apalodimas [this message]
2021-03-24 11:42                   ` Alexander Lobakin
