From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
To: Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	<akpm@linux-foundation.org>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<frederic@kernel.org>, <tglx@linutronix.de>,
	<mtosatti@redhat.com>, <mgorman@suse.de>,
	<linux-rt-users@vger.kernel.org>, <vbabka@suse.cz>,
	<cl@linux.com>, <paulmck@kernel.org>, <willy@infradead.org>
Subject: Re: [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support
Date: Thu, 10 Feb 2022 18:59:46 +0800	[thread overview]
Message-ID: <8cf34eed-be5a-1aae-0523-c2de9e087cb6@huawei.com> (raw)
In-Reply-To: <20220208100750.1189808-1-nsaenzju@redhat.com>

Hi Nicolas,

I noticed there are several 'lru_add_drain_per_cpu()' worker threads running
on the NOHZ_FULL CPUs. I am just wondering: do you have plans to add support
for remote per-cpu LRU pagevec draining as well?
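
For context, a rough sketch of the mechanism I am referring to (simplified
from mm/swap.c, not the exact upstream code; cpu_needs_drain() stands in for
the real pagevec checks). lru_add_drain_all() queues a drain work item on
every CPU with pending LRU pagevecs, which wakes a kworker even on NOHZ_FULL
CPUs:

    static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);

    static void lru_add_drain_per_cpu(struct work_struct *dummy)
    {
            lru_add_drain();        /* drains this CPU's LRU pagevecs */
    }

    void lru_add_drain_all(void)
    {
            int cpu;

            for_each_online_cpu(cpu) {
                    struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

                    if (cpu_needs_drain(cpu)) {     /* pagevecs non-empty? */
                            INIT_WORK(work, lru_add_drain_per_cpu);
                            queue_work_on(cpu, mm_percpu_wq, work);
                    }
            }
            /* ... then flush_work() on each queued item ... */
    }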

Thanks,
Xiongfeng

On 2022/2/8 18:07, Nicolas Saenz Julienne wrote:
> This series replaces mm/page_alloc's per-cpu page lists drain mechanism with
> one that allows accessing the lists remotely. Currently, only the local CPU is
> permitted to change its per-cpu lists, and it is expected to do so, on demand,
> whenever a process requests a drain by queueing a drain task on that CPU. This
> causes problems for NOHZ_FULL CPUs and real-time systems that can't take any
> sort of interruption, and to a lesser extent inconveniences idle and
> virtualised systems.
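> 
> For reference, a minimal sketch of the drain mechanism being replaced
> (loosely based on mm/page_alloc.c; 'cpus_with_pcps' and the per-cpu work
> variable are illustrative names). The key point is that the drain work must
> run *on* the CPU that owns the lists:
> 
>     static DEFINE_PER_CPU(struct work_struct, pcpu_drain_work);
> 
>     static void drain_local_pages_wq(struct work_struct *work)
>     {
>             /* Only the local CPU may touch its pcplists. */
>             drain_local_pages(NULL);
>     }
> 
>     void drain_all_pages(struct zone *zone)
>     {
>             int cpu;
> 
>             for_each_cpu(cpu, cpus_with_pcps) {
>                     queue_work_on(cpu, mm_percpu_wq,
>                                   &per_cpu(pcpu_drain_work, cpu));
>             }
>             for_each_cpu(cpu, cpus_with_pcps)
>                     flush_work(&per_cpu(pcpu_drain_work, cpu));
>             /* The queued work wakes NOHZ_FULL and RT CPUs: the problem above. */
>     }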
> 
> The new algorithm atomically switches the pointer to the per-cpu page lists
> and uses RCU to make sure the old lists are no longer in use before draining
> them. Its main benefit is that it fixes the issue for good, avoiding the need
> for configuration-based heuristics or having to modify applications (i.e.
> using the isolation prctl being worked on by Marcelo Tosatti at the moment).
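> 
> To make the idea concrete, a minimal sketch of the pointer switch (field and
> helper names here are mine, not the patch's exact code):
> 
>     struct pcplists {
>             int count;
>             struct list_head lists[NR_PCP_LISTS];   /* NR_PCP_LISTS: illustrative */
>     };
> 
>     struct per_cpu_pages {
>             struct pcplists __rcu *lp;      /* dereferenced under rcu_read_lock() */
>             struct pcplists instances[2];   /* active + spare, swapped on drain */
>             /* ... */
>     };
> 
>     /* Remote drain: can run on any CPU, no work queued on the target. */
>     static void drain_cpu_pcplists(struct per_cpu_pages *pcp)
>     {
>             struct pcplists *old = rcu_dereference_protected(pcp->lp,
>                                     lockdep_is_held(&pcpu_drain_mutex));
>             struct pcplists *new = (old == &pcp->instances[0]) ?
>                                     &pcp->instances[1] : &pcp->instances[0];
> 
>             rcu_assign_pointer(pcp->lp, new);       /* atomic switch */
>             synchronize_rcu();                      /* wait out concurrent users */
>             /* 'old' is now private to us; its pages can be freed remotely. */
>     }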
> 
> All this comes with minimal performance implications: a page allocation
> microbenchmark was run on multiple systems and architectures, generally
> showing no performance difference; only the more extreme cases showed a 1-3%
> degradation. See the data below. Needless to say, I'd appreciate it if someone
> could validate my results independently.
> 
> The approach has been stress-tested: I forced 100 drains/s while running
> mmtests' pft in a loop for a full day on multiple machines and archs (arm64,
> x86_64, ppc64le).
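> 
> (For the curious, forcing the drains amounts to something like this
> hypothetical test kthread, not part of the series itself; drain_all_pages(NULL)
> drains the pcplists of all populated zones:)
> 
>     static int drain_thread_fn(void *data)
>     {
>             while (!kthread_should_stop()) {
>                     drain_all_pages(NULL);          /* NULL: all populated zones */
>                     usleep_range(9000, 11000);      /* ~100 drains/s */
>             }
>             return 0;
>     }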
> 
> Note that this is not the first attempt at fixing this per-cpu page lists
> issue:
>  - The first attempt[1] tried to conditionally change the pagesets locking
>    scheme based on the NOHZ_FULL config. It was deemed hard to maintain, as
>    the NOHZ_FULL code path would be rarely tested. Also, it only solves the
>    issue for NOHZ_FULL setups, which isn't ideal.
>  - The second[2] unconditionally switched the local_locks to per-cpu spinlocks
>    (see the sketch below). The performance degradation was too big.
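> 
>    The trade-off, roughly (simplified, not the actual patch; the spinlock in
>    struct per_cpu_pages is illustrative):
> 
>        /* Current scheme: local_lock, cheap, but local-CPU-only. */
>        local_lock_irqsave(&pagesets.lock, flags);
>        /* ... touch this_cpu_ptr(zone->per_cpu_pageset) ... */
>        local_unlock_irqrestore(&pagesets.lock, flags);
> 
>        /* Attempt [2]: per-cpu spinlock, remotely takeable, but it adds
>         * atomic operations to every allocation/free fast path. */
>        struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
> 
>        spin_lock_irqsave(&pcp->lock, flags);
>        /* ... lists now accessible from any CPU ... */
>        spin_unlock_irqrestore(&pcp->lock, flags);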
> 
> Previous RFC: https://lkml.org/lkml/2021/10/8/793
> 
> Thanks!
> 
> [1] https://lkml.org/lkml/2021/9/21/599
> [2] https://lkml.org/lkml/2021/11/3/644
> 
> ---
> 
> Changes since RFC:
>  - Get rid of aesthetic changes that affected performance
>  - Add more documentation
>  - Add better commit messages
>  - Pass sparse tests
>  - Verify this_cpu_*() usage
>  - Add performance measurements
> 
> Nicolas Saenz Julienne (2):
>   mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly
>   mm/page_alloc: Add remote draining support to per-cpu lists
> 
>  include/linux/mmzone.h |  28 +++++-
>  mm/page_alloc.c        | 212 ++++++++++++++++++++++++++---------------
>  mm/vmstat.c            |   6 +-
>  3 files changed, 162 insertions(+), 84 deletions(-)
> 
> 
> -------------------------Performance results-----------------------------
> 
> I'm focusing on mmtests' Page Fault Test (pft), as it's page allocator
> intensive.
> 
> - AMD Daytona Reference System, 2 sockets, AMD EPYC 7742, Zen 2, 64-Core,
>   4 NUMA nodes, x86_64
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1          58.52 (   0.00%)       58.92 *  -0.68%*
> Amean     system-4          61.00 (   0.00%)       61.41 *  -0.67%*
> Amean     system-7          61.55 (   0.00%)       61.74 *  -0.30%*
> Amean     system-12         64.91 (   0.00%)       64.94 *  -0.05%*
> Amean     system-21         98.80 (   0.00%)       99.92 *  -1.13%*
> Amean     system-30        147.68 (   0.00%)      145.83 *   1.25%*
> Amean     system-48        237.04 (   0.00%)      241.29 *  -1.79%*
> Amean     system-79        286.61 (   0.00%)      283.72 *   1.01%*
> Amean     system-110       303.40 (   0.00%)      299.91 *   1.15%*
> Amean     system-128       345.07 (   0.00%)      342.10 *   0.86%*
> Amean     elapsed-1         61.21 (   0.00%)       61.65 *  -0.71%*
> Amean     elapsed-4         15.94 (   0.00%)       16.05 *  -0.69%*
> Amean     elapsed-7          9.24 (   0.00%)        9.28 *  -0.47%*
> Amean     elapsed-12         5.70 (   0.00%)        5.70 *  -0.07%*
> Amean     elapsed-21         5.11 (   0.00%)        5.06 *   1.13%*
> Amean     elapsed-30         5.28 (   0.00%)        5.14 *   2.73%*
> Amean     elapsed-48         5.28 (   0.00%)        5.24 *   0.74%*
> Amean     elapsed-79         4.41 (   0.00%)        4.31 *   2.17%*
> Amean     elapsed-110        3.45 (   0.00%)        3.44 *   0.40%*
> Amean     elapsed-128        2.75 (   0.00%)        2.75 *  -0.28%*
> 
> - AMD Speedway Reference System, 2 sockets, AMD EPYC 7601, Zen 1, 64-core, 8
>   NUMA nodes, x86_64. There is a lot of variance between tests on this
>   platform; results easily swing ±5%.
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1          69.20 (   0.00%)       66.21 *   4.32%*
> Amean     system-4          70.79 (   0.00%)       69.01 *   2.52%*
> Amean     system-7          71.34 (   0.00%)       69.16 *   3.05%*
> Amean     system-12         74.00 (   0.00%)       72.74 *   1.70%*
> Amean     system-21         86.01 (   0.00%)       85.70 *   0.36%*
> Amean     system-30         89.21 (   0.00%)       89.93 *  -0.80%*
> Amean     system-48         92.39 (   0.00%)       92.43 *  -0.04%*
> Amean     system-79        120.19 (   0.00%)      121.30 *  -0.92%*
> Amean     system-110       172.79 (   0.00%)      179.37 *  -3.81%*
> Amean     system-128       201.70 (   0.00%)      212.57 *  -5.39%*
> Amean     elapsed-1         72.23 (   0.00%)       69.29 *   4.08%*
> Amean     elapsed-4         18.69 (   0.00%)       18.28 *   2.20%*
> Amean     elapsed-7         10.80 (   0.00%)       10.54 *   2.41%*
> Amean     elapsed-12         6.62 (   0.00%)        6.53 *   1.30%*
> Amean     elapsed-21         4.68 (   0.00%)        4.69 *  -0.14%*
> Amean     elapsed-30         3.44 (   0.00%)        3.50 *  -1.66%*
> Amean     elapsed-48         2.40 (   0.00%)        2.42 *  -1.00%*
> Amean     elapsed-79         2.05 (   0.00%)        2.09 *  -1.90%*
> Amean     elapsed-110        1.83 (   0.00%)        1.91 *  -4.60%*
> Amean     elapsed-128        1.75 (   0.00%)        1.85 *  -5.99%*
> 
> - IBM 9006-22C system, 2 sockets, POWER9, 64-Core, 1 NUMA node per CPU,
>   ppc64le.
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1           1.82 (   0.00%)        1.85 *  -1.43%*
> Amean     system-4           2.18 (   0.00%)        2.22 *  -2.02%*
> Amean     system-7           3.27 (   0.00%)        3.28 *  -0.15%*
> Amean     system-12          5.22 (   0.00%)        5.20 *   0.26%*
> Amean     system-21         10.10 (   0.00%)       10.20 *  -1.00%*
> Amean     system-30         15.00 (   0.00%)       14.52 *   3.20%*
> Amean     system-48         26.41 (   0.00%)       25.96 *   1.71%*
> Amean     system-79         29.35 (   0.00%)       29.70 *  -1.21%*
> Amean     system-110        24.01 (   0.00%)       23.40 *   2.54%*
> Amean     system-128        24.57 (   0.00%)       25.32 *  -3.06%*
> Amean     elapsed-1          1.85 (   0.00%)        1.87 *  -1.28%*
> Amean     elapsed-4          0.56 (   0.00%)        0.57 *  -1.72%*
> Amean     elapsed-7          0.51 (   0.00%)        0.50 *   0.07%*
> Amean     elapsed-12         0.51 (   0.00%)        0.51 *   0.06%*
> Amean     elapsed-21         0.54 (   0.00%)        0.54 *   0.06%*
> Amean     elapsed-30         0.54 (   0.00%)        0.53 *   2.22%*
> Amean     elapsed-48         0.58 (   0.00%)        0.57 *   1.73%*
> Amean     elapsed-79         0.49 (   0.00%)        0.48 *   0.89%*
> Amean     elapsed-110        0.37 (   0.00%)        0.37 *  -1.08%*
> Amean     elapsed-128        0.33 (   0.00%)        0.33 *   0.00%*
> 
> - Ampere MtSnow, 1 socket, Neoverse-N1, 80-Cores, 1 NUMA node, arm64.
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1         11.92 (   0.00%)       11.99 *  -0.61%*
> Amean     system-4         13.13 (   0.00%)       13.09 *   0.31%*
> Amean     system-7         13.91 (   0.00%)       13.94 *  -0.20%*
> Amean     system-12        15.77 (   0.00%)       15.69 *   0.48%*
> Amean     system-21        21.32 (   0.00%)       21.42 *  -0.46%*
> Amean     system-30        28.58 (   0.00%)       29.12 *  -1.90%*
> Amean     system-48        47.41 (   0.00%)       46.91 *   1.04%*
> Amean     system-79        76.76 (   0.00%)       77.16 *  -0.52%*
> Amean     system-80        77.98 (   0.00%)       78.23 *  -0.32%*
> Amean     elapsed-1        12.46 (   0.00%)       12.53 *  -0.58%*
> Amean     elapsed-4         3.47 (   0.00%)        3.46 *   0.34%*
> Amean     elapsed-7         2.18 (   0.00%)        2.21 *  -1.58%*
> Amean     elapsed-12        1.41 (   0.00%)        1.42 *  -0.80%*
> Amean     elapsed-21        1.09 (   0.00%)        1.12 *  -2.60%*
> Amean     elapsed-30        0.98 (   0.00%)        1.01 *  -3.08%*
> Amean     elapsed-48        1.08 (   0.00%)        1.10 *  -1.78%*
> Amean     elapsed-79        1.32 (   0.00%)        1.28 *   2.71%*
> Amean     elapsed-80        1.32 (   0.00%)        1.28 *   3.23%*
> 
> - Dell R430, 2 sockets, Intel Xeon E5-2640 v3, Haswell, 16-Cores, 2 NUMA
>   nodes, x86_64.
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1         11.10 (   0.00%)       11.07 *   0.24%*
> Amean     system-3         11.14 (   0.00%)       11.10 *   0.34%*
> Amean     system-5         11.18 (   0.00%)       11.13 *   0.47%*
> Amean     system-7         11.21 (   0.00%)       11.17 *   0.38%*
> Amean     system-12        11.28 (   0.00%)       11.28 (  -0.03%)
> Amean     system-18        13.24 (   0.00%)       13.25 *  -0.11%*
> Amean     system-24        17.12 (   0.00%)       17.14 (  -0.13%)
> Amean     system-30        21.10 (   0.00%)       21.23 *  -0.60%*
> Amean     system-32        22.31 (   0.00%)       22.47 *  -0.71%*
> Amean     elapsed-1        11.76 (   0.00%)       11.73 *   0.29%*
> Amean     elapsed-3         3.93 (   0.00%)        3.93 *   0.17%*
> Amean     elapsed-5         2.39 (   0.00%)        2.37 *   0.74%*
> Amean     elapsed-7         1.72 (   0.00%)        1.71 *   0.81%*
> Amean     elapsed-12        1.02 (   0.00%)        1.03 (  -0.42%)
> Amean     elapsed-18        1.13 (   0.00%)        1.14 (  -0.18%)
> Amean     elapsed-24        0.87 (   0.00%)        0.88 *  -0.65%*
> Amean     elapsed-30        0.77 (   0.00%)        0.78 *  -0.86%*
> Amean     elapsed-32        0.74 (   0.00%)        0.74 (   0.00%)
> 
> - HPE Apollo 70, 2 sockets, Cavium ThunderX2, 128-Cores, 2 NUMA nodes, arm64.
>   NOTE: the test here only goes up to 128 threads for some reason, although
>   there are 256 CPUs. Maybe an mmtests issue? I didn't investigate.
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1           4.42 (   0.00%)        4.36 *   1.29%*
> Amean     system-4           4.56 (   0.00%)        4.51 *   1.05%*
> Amean     system-7           4.63 (   0.00%)        4.65 *  -0.42%*
> Amean     system-12          5.96 (   0.00%)        6.02 *  -1.00%*
> Amean     system-21         10.97 (   0.00%)       11.01 *  -0.32%*
> Amean     system-30         16.01 (   0.00%)       16.04 *  -0.19%*
> Amean     system-48         26.81 (   0.00%)       26.78 *   0.09%*
> Amean     system-79         30.80 (   0.00%)       30.85 *  -0.16%*
> Amean     system-110        31.87 (   0.00%)       31.93 *  -0.19%*
> Amean     system-128        36.27 (   0.00%)       36.31 *  -0.10%*
> Amean     elapsed-1          4.88 (   0.00%)        4.85 *   0.60%*
> Amean     elapsed-4          1.27 (   0.00%)        1.26 *   1.00%*
> Amean     elapsed-7          0.73 (   0.00%)        0.74 *  -0.46%*
> Amean     elapsed-12         0.55 (   0.00%)        0.55 *   1.09%*
> Amean     elapsed-21         0.59 (   0.00%)        0.60 *  -0.96%*
> Amean     elapsed-30         0.60 (   0.00%)        0.60 *   0.28%*
> Amean     elapsed-48         0.60 (   0.00%)        0.60 *   0.44%*
> Amean     elapsed-79         0.49 (   0.00%)        0.49 *  -0.07%*
> Amean     elapsed-110        0.36 (   0.00%)        0.36 *   0.28%*
> Amean     elapsed-128        0.31 (   0.00%)        0.31 *  -0.43%*
> 
> - Raspberry Pi 4, 1 socket, bcm2711, Cortex-A72, 4-Cores, 1 NUMA node, arm64.
> 
> pft timings:
>                                      vanilla                    rcu
> Amean     system-1         0.67 (   0.00%)        0.67 *  -1.25%*
> Amean     system-3         1.30 (   0.00%)        1.29 *   0.62%*
> Amean     system-4         1.61 (   0.00%)        1.59 *   0.95%*
> Amean     elapsed-1        0.71 (   0.00%)        0.72 *  -1.17%*
> Amean     elapsed-3        0.45 (   0.00%)        0.45 *   0.88%*
> Amean     elapsed-4        0.42 (   0.00%)        0.42 *   1.19%*
> 
> 

Thread overview: 25+ messages
2022-02-08 10:07 [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support Nicolas Saenz Julienne
2022-02-08 10:07 ` [PATCH 1/2] mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly Nicolas Saenz Julienne
2022-03-03 14:33   ` Marcelo Tosatti
2022-02-08 10:07 ` [PATCH 2/2] mm/page_alloc: Add remote draining support to per-cpu lists Nicolas Saenz Julienne
2022-02-08 15:47   ` Marcelo Tosatti
2022-02-15  8:47     ` Nicolas Saenz Julienne
2022-02-15 17:32       ` Paul E. McKenney
2022-02-09  8:55 ` [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support Xiongfeng Wang
2022-02-09  9:45   ` Nicolas Saenz Julienne
2022-02-09 11:26     ` Xiongfeng Wang
2022-02-09 11:36       ` Nicolas Saenz Julienne
2022-02-10 10:59 ` Xiongfeng Wang [this message]
2022-02-10 11:04   ` Nicolas Saenz Julienne
2022-03-03 11:45 ` Mel Gorman
2022-03-07 13:57   ` Nicolas Saenz Julienne
2022-03-10 16:31     ` Mel Gorman
2022-03-07 20:47   ` Marcelo Tosatti
2022-03-24 18:59   ` Nicolas Saenz Julienne
2022-03-25 10:48     ` Mel Gorman
2022-03-28 13:51       ` Nicolas Saenz Julienne
2022-03-29  9:45         ` Mel Gorman
2022-03-30 11:29   ` Nicolas Saenz Julienne
2022-03-31 15:24     ` Mel Gorman
2022-03-03 13:27 ` Vlastimil Babka
2022-03-03 14:10   ` Nicolas Saenz Julienne
