linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead
Date: Wed, 31 Mar 2021 08:38:05 +0100	[thread overview]
Message-ID: <20210331073805.GY3697@techsingularity.net> (raw)
In-Reply-To: <20210330205154.1fe1e479@carbon>

On Tue, Mar 30, 2021 at 08:51:54PM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 29 Mar 2021 13:06:42 +0100
> Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > This series requires patches in Andrew's tree so the series is also
> > available at
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-percpu-local_lock-v1r15
> > 
> > tldr: Jesper and Chuck, it would be nice to verify if this series helps
> > 	the allocation rate of the bulk page allocator. RT people, this
> > 	*partially* addresses some problems PREEMPT_RT has with the page
> > 	allocator but it needs review.
> 
> I've run a new micro-benchmark[1] which shows:
> (CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz)
> 
> <Editting to focus on arrays>
> BASELINE
>  single_page alloc+put: 194 cycles(tsc) 54.106 ns
> 
> ARRAY variant: time_bulk_page_alloc_free_array: step=bulk size
> 
>  Per elem: 195 cycles(tsc) 54.225 ns (step:1)
>  Per elem: 127 cycles(tsc) 35.492 ns (step:2)
>  Per elem: 117 cycles(tsc) 32.643 ns (step:3)
>  Per elem: 111 cycles(tsc) 30.992 ns (step:4)
>  Per elem: 106 cycles(tsc) 29.606 ns (step:8)
>  Per elem: 102 cycles(tsc) 28.532 ns (step:16)
>  Per elem: 99 cycles(tsc) 27.728 ns (step:32)
>  Per elem: 98 cycles(tsc) 27.252 ns (step:64)
>  Per elem: 97 cycles(tsc) 27.090 ns (step:128)
> 
> This should be seen in comparison with the older micro-benchmark[2]
> done on branch mm-bulk-rebase-v5r9.
> 
> BASELINE
>  single_page alloc+put: Per elem: 199 cycles(tsc) 55.472 ns
> 
> ARRAY variant: time_bulk_page_alloc_free_array: step=bulk size
> 
>  Per elem: 202 cycles(tsc) 56.383 ns (step:1)
>  Per elem: 144 cycles(tsc) 40.047 ns (step:2)
>  Per elem: 134 cycles(tsc) 37.339 ns (step:3)
>  Per elem: 128 cycles(tsc) 35.578 ns (step:4)
>  Per elem: 120 cycles(tsc) 33.592 ns (step:8)
>  Per elem: 116 cycles(tsc) 32.362 ns (step:16)
>  Per elem: 113 cycles(tsc) 31.476 ns (step:32)
>  Per elem: 110 cycles(tsc) 30.633 ns (step:64)
>  Per elem: 110 cycles(tsc) 30.596 ns (step:128)
> 

Ok, so bulk allocation is faster than allocating single pages, no surprise
there. Putting the array figures for bulk allocation into tabular format
and comparing we get;

Array variant (time to allocate a page in nanoseconds, lower is better)
        Baseline        Patched
1       56.383          54.225 (+3.83%)
2       40.047          35.492 (+11.38%)
3       37.339          32.643 (+12.58%)
4       35.578          30.992 (+12.89%)
8       33.592          29.606 (+11.87%)
16      32.362          28.532 (+11.85%)
32      31.476          27.728 (+11.91%)
64      30.633          27.252 (+11.04%)
128     30.596          27.090 (+11.46%)

The series is 11-12% faster when allocating multiple pages.  That's a
fairly positive outcome and I'll include this in the series leader if
you have no objections.

Thanks Jesper!

-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2021-03-31  7:38 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29 12:06 [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Mel Gorman
2021-03-29 12:06 ` [PATCH 1/6] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-03-29 12:06 ` [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-03-31  9:55   ` Thomas Gleixner
2021-03-31 11:01     ` Mel Gorman
2021-03-31 17:42       ` Thomas Gleixner
2021-03-31 17:46         ` Thomas Gleixner
2021-03-31 20:42         ` Mel Gorman
2021-03-29 12:06 ` [PATCH 3/6] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-03-29 12:06 ` [PATCH 4/6] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-03-29 12:06 ` [PATCH 5/6] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-03-29 12:06 ` [PATCH 6/6] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-03-30 18:51 ` [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Jesper Dangaard Brouer
2021-03-31  7:38   ` Mel Gorman [this message]
2021-03-31  8:17     ` Jesper Dangaard Brouer
2021-03-31  8:52 ` Mel Gorman
2021-03-31  9:51   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210331073805.GY3697@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=brouer@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).