All of lore.kernel.org
 help / color / mirror / Atom feed
* + mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch added to -mm tree
@ 2019-10-21 22:37 akpm
  0 siblings, 0 replies; 2+ messages in thread
From: akpm @ 2019-10-21 22:37 UTC (permalink / raw)
  To: mm-commits, vbabka, tglx, stable, mhocko, matt, cai, bp, mgorman


The patch titled
     Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
has been added to the -mm tree.  Its filename is
     mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes

Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page allocator
(pcpu) calculates the number of pages allocated/freed in batches as well
as the maximum number of pages allowed on a per-cpu list.  As
zone->managed_pages is not up to date yet, the pcpu initialisation
calculates inappropriately low batch and high values.

This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone.  A private report indicated that kernel build times were
excessive with extremely high system CPU usage.  A perf profile indicated
that a large chunk of time was lost on zone->lock contention.

This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system.  It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.

mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.

kernbench
                                5.4.0-rc3              5.4.0-rc3
                                  vanilla           resetpcpu-v2
Amean     user-256    13249.50 (   0.00%)    16401.31 * -23.79%*
Amean     syst-256    14760.30 (   0.00%)     4448.39 *  69.86%*
Amean     elsp-256      162.42 (   0.00%)      119.13 *  26.65%*
Stddev    user-256       42.97 (   0.00%)       19.15 (  55.43%)
Stddev    syst-256      336.87 (   0.00%)        6.71 (  98.01%)
Stddev    elsp-256        2.46 (   0.00%)        0.39 (  84.03%)

                   5.4.0-rc3    5.4.0-rc3
                     vanilla resetpcpu-v2
Duration User       39766.24     49221.79
Duration System     44298.10     13361.67
Duration Elapsed      519.11       388.87

The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.

Before, this was the breakdown of batch and high values over all zones was.

    256               batch: 1
    256               batch: 63
    512               batch: 7
    256               high:  0
    256               high:  378
    512               high:  42

512 pcpu pagesets had a batch limit of 7 and a high limit of 42.  After
the patch:

    256               batch: 1
    768               batch: 63
    256               high:  0
    768               high:  378

Link: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>	[4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/page_alloc.c~mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes
+++ a/mm/page_alloc.c
@@ -1948,6 +1948,14 @@ void __init page_alloc_init_late(void)
 	wait_for_completion(&pgdat_init_all_done_comp);
 
 	/*
+	 * The number of managed pages has changed due to the initialisation
+	 * so the pcpu batch and high limits needs to be updated or the limits
+	 * will be artificially small.
+	 */
+	for_each_populated_zone(zone)
+		zone_pcp_update(zone);
+
+	/*
 	 * We initialized the rest of the deferred pages.  Permanently disable
 	 * on-demand struct page initialization.
 	 */
_

Patches currently in -mm which might be from mgorman@techsingularity.net are

mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler.patch
mm-pcpu-make-zone-pcp-updates-and-reset-internal-to-the-mm.patch


^ permalink raw reply	[flat|nested] 2+ messages in thread

* + mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch added to -mm tree
@ 2019-10-19  1:33 akpm
  0 siblings, 0 replies; 2+ messages in thread
From: akpm @ 2019-10-19  1:33 UTC (permalink / raw)
  To: bp, matt, mgorman, mhocko, mm-commits, stable, tglx, vbabka


The patch titled
     Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
has been added to the -mm tree.  Its filename is
     mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes

Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page allocator
(pcpu) calculates the number of pages allocated/freed in batches as well
as the maximum number of pages allowed on a per-cpu list.  As
zone->managed_pages is not up to date yet, the pcpu initialisation
calculates inappropriately low batch and high values.

This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone.  A private report indicated that kernel build times were
excessive with extremely high system CPU usage.  A perf profile indicated
that a large chunk of time was lost on zone->lock contention.

This patch recalculates the pcpu batch and high values after deferred
initialisation completes on each node.  It was tested on a 2-socket AMD
EPYC 2 machine using a kernel compilation workload -- allmodconfig and all
available CPUs.

mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.

kernbench
                                5.4.0-rc3              5.4.0-rc3
                                  vanilla         resetpcpu-v1r1
Amean     user-256    13249.50 (   0.00%)    15928.40 * -20.22%*
Amean     syst-256    14760.30 (   0.00%)     4551.77 *  69.16%*
Amean     elsp-256      162.42 (   0.00%)      118.46 *  27.06%*
Stddev    user-256       42.97 (   0.00%)       50.83 ( -18.30%)
Stddev    syst-256      336.87 (   0.00%)       33.70 (  90.00%)
Stddev    elsp-256        2.46 (   0.00%)        0.81 (  67.01%)

                   5.4.0-rc3   5.4.0-rc3
                     vanillaresetpcpu-v1r1
Duration User       39766.24    47802.92
Duration System     44298.10    13671.93
Duration Elapsed      519.11      387.65

The patch reduces system CPU usage by 69.16% and total build time by
27.06%.  The variance of system CPU usage is also much reduced.

Link: http://lkml.kernel.org/r/20191018105606.3249-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>	[4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c~mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes
+++ a/mm/page_alloc.c
@@ -1818,6 +1818,14 @@ static int __init deferred_init_memmap(v
 	 */
 	while (spfn < epfn)
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+
+	/*
+	 * The number of managed pages has changed due to the initialisation
+	 * so the pcpu batch and high limits needs to be updated or the limits
+	 * will be artificially small.
+	 */
+	zone_pcp_update(zone);
+
 zone_empty:
 	pgdat_resize_unlock(pgdat, &flags);
 
@@ -8518,7 +8526,6 @@ void free_contig_range(unsigned long pfn
 	WARN(count != 0, "%d pages are still in use!\n", count);
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.
@@ -8529,7 +8536,6 @@ void __meminit zone_pcp_update(struct zo
 	__zone_pcp_update(zone);
 	mutex_unlock(&pcp_batch_high_lock);
 }
-#endif
 
 void zone_pcp_reset(struct zone *zone)
 {
_

Patches currently in -mm which might be from mgorman@techsingularity.net are

mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler.patch
mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
mm-pcpu-make-zone-pcp-updates-and-reset-internal-to-the-mm.patch


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-10-21 22:37 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-21 22:37 + mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch added to -mm tree akpm
  -- strict thread matches above, loose matches on Subject: below --
2019-10-19  1:33 akpm

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.