linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs
@ 2021-05-25  8:01 Mel Gorman
  2021-05-25  8:01 ` [PATCH 1/6] mm/page_alloc: Delete vm.percpu_pagelist_fraction Mel Gorman
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Mel Gorman @ 2021-05-25  8:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hillf Danton, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML,
	Linux-MM, Mel Gorman

Changelog since v1
o Clarification comments
o Sanity check pcp->high during reclaim				(dhansen)
o Handle vm.percpu_pagelist_high_fraction in zone_highsize	(hdanton)
o Sanity check pcp->batch versus pcp->high

This series has pre-requisites in mmotm so for convenience it is also
available at

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-pcpburst-v2r3

The per-cpu page allocator (PCP) is meant to reduce contention on the zone
lock but the sizing of batch and high is archaic and neither takes the zone
size into account or the number of CPUs local to a zone. With larger zones
and more CPUs per node, the contention is getting worse. Furthermore,
the fact that vm.percpu_pagelist_fraction adjusts both batch and high
values means that the sysctl can reduce zone lock contention but also
increase allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone with limited impact to
reclaim and accounting for active CPUs but leaves pcp->batch static.
It also adapts the number of pages that can be on the pcp list based on
recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but
is also driven by the fact that large batches of page freeing via
release_pages() often shows zone contention as a major part of the
problem. Another is a bug report based on an older kernel where a
multi-terabyte process can takes several minutes to exit. A workaround
was to use vm.percpu_pagelist_fraction to increase the pcp->high value
but testing indicated that a production workload could not use the same
values because of an increase in allocation latencies. Unfortunately,
I cannot reproduce this test case myself as the multi-terabyte machines
are in active use but it should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite. pcp
only works with order-0 which is useless for SLUB (when using high orders)
and THP (unconditionally). To store high-order pages on PCP, the pcp->high
values need to be increased first.

 Documentation/admin-guide/sysctl/vm.rst |  29 ++--
 include/linux/cpuhotplug.h              |   2 +-
 include/linux/mmzone.h                  |   8 +-
 kernel/sysctl.c                         |   8 +-
 mm/internal.h                           |   2 +-
 mm/memory_hotplug.c                     |   4 +-
 mm/page_alloc.c                         | 196 ++++++++++++++++++------
 mm/vmscan.c                             |  35 +++++
 8 files changed, 212 insertions(+), 72 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 34+ messages in thread
* [RFC PATCH 0/6] Calculate pcp->high based on zone sizes and active CPUs
@ 2021-05-21 10:28 Mel Gorman
  2021-05-21 10:28 ` [PATCH 4/6] mm/page_alloc: Scale the number of pages that are batch freed Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2021-05-21 10:28 UTC (permalink / raw)
  To: Linux-MM
  Cc: Dave Hansen, Matthew Wilcox, Vlastimil Babka, Michal Hocko,
	Nicholas Piggin, LKML, Mel Gorman

The per-cpu page allocator (PCP) is meant to reduce contention on the zone
lock but the sizing of batch and high is archaic and neither takes the zone
size into account or the number of CPUs local to a zone. Furthermore, the
fact that vm.percpu_pagelist_fraction adjusts both batch and high values
means that the sysctl can reduce zone lock contention but also increase
allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone with limited impact to
reclaim and accounting for active CPUs but leaves pcp->batch static.
It also adapts the number of pages that can be on the pcp list based on
recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but
is also driven by the fact that large batches of page freeing via
release_pages() often shows zone contention as a major part of the
problem. Another is a bug report based on an older kernel where a
multi-terabyte process can takes several minutes to exit. A workaround
was to use vm.percpu_pagelist_fraction to increase the pcp->high value
but testing indicated that a production workload could not use the same
values because of an increase in allocation latencies. Unfortunately,
I cannot reproduce this test case myself as the multi-terabyte machines
are in active use but it should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite. pcp
only works with order-0 which is useless for SLUB (when using high orders)
and THP (unconditionally). To store high-order pages on PCP, the pcp->high
values need to be increased first.

 Documentation/admin-guide/sysctl/vm.rst |  19 +--
 include/linux/cpuhotplug.h              |   2 +-
 include/linux/mmzone.h                  |   8 +-
 kernel/sysctl.c                         |   8 +-
 mm/internal.h                           |   2 +-
 mm/memory_hotplug.c                     |   4 +-
 mm/page_alloc.c                         | 166 +++++++++++++++++-------
 mm/vmscan.c                             |  35 +++++
 8 files changed, 179 insertions(+), 65 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-05-31 12:00 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-25  8:01 [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs Mel Gorman
2021-05-25  8:01 ` [PATCH 1/6] mm/page_alloc: Delete vm.percpu_pagelist_fraction Mel Gorman
2021-05-26 17:41   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 2/6] mm/page_alloc: Disassociate the pcp->high from pcp->batch Mel Gorman
2021-05-26 18:14   ` Vlastimil Babka
2021-05-27 10:52     ` Mel Gorman
2021-05-28 10:27       ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events Mel Gorman
2021-05-28 11:08   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 4/6] mm/page_alloc: Scale the number of pages that are batch freed Mel Gorman
2021-05-28 11:19   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 5/6] mm/page_alloc: Limit the number of pages on PCP lists when reclaim is active Mel Gorman
2021-05-28 11:43   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 6/6] mm/page_alloc: Introduce vm.percpu_pagelist_high_fraction Mel Gorman
2021-05-28 11:59   ` Vlastimil Babka
2021-05-28 12:53     ` Mel Gorman
2021-05-28 14:38       ` Vlastimil Babka
2021-05-27 19:36 ` [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs Dave Hansen
2021-05-28  8:55   ` Mel Gorman
2021-05-28  9:03     ` David Hildenbrand
2021-05-28  9:08       ` David Hildenbrand
2021-05-28  9:49         ` Mel Gorman
2021-05-28  9:52           ` David Hildenbrand
2021-05-28 10:09             ` Mel Gorman
2021-05-28 10:21               ` David Hildenbrand
2021-05-28 12:12     ` Vlastimil Babka
2021-05-28 12:37       ` Mel Gorman
2021-05-28 14:39     ` Dave Hansen
2021-05-28 15:18       ` Mel Gorman
2021-05-28 16:17         ` Dave Hansen
2021-05-31 12:00           ` Feng Tang
  -- strict thread matches above, loose matches on Subject: below --
2021-05-21 10:28 [RFC PATCH 0/6] " Mel Gorman
2021-05-21 10:28 ` [PATCH 4/6] mm/page_alloc: Scale the number of pages that are batch freed Mel Gorman
2021-05-21 22:36   ` Dave Hansen
2021-05-24  9:12     ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).