From: Mel Gorman <firstname.lastname@example.org>
To: Andrew Morton <email@example.com>
Cc: Hillf Danton <firstname.lastname@example.org>,
	Dave Hansen <email@example.com>,
	Vlastimil Babka <firstname.lastname@example.org>,
	Michal Hocko <email@example.com>,
	LKML <firstname.lastname@example.org>,
	Linux-MM <email@example.com>,
	Mel Gorman <firstname.lastname@example.org>
Subject: [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs
Date: Tue, 25 May 2021 09:01:13 +0100
Message-ID: <email@example.com>

Changelog since v1
o Clarification comments
o Sanity check pcp->high during reclaim (dhansen)
o Handle vm.percpu_pagelist_high_fraction in zone_highsize (hdanton)
o Sanity check pcp->batch versus pcp->high

This series has pre-requisites in mmotm so for convenience it is also
available at

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-pcpburst-v2r3

The per-cpu page allocator (PCP) is meant to reduce contention on the
zone lock but the sizing of batch and high is archaic and takes neither
the zone size nor the number of CPUs local to a zone into account. With
larger zones and more CPUs per node, the contention is getting worse.
Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both
batch and high values means that the sysctl can reduce zone lock
contention but also increase allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone with limited impact to
reclaim and accounting for active CPUs but leaves pcp->batch static.
It also adapts the number of pages that can be on the pcp list based
on recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but is
also driven by the fact that large batches of page freeing via
release_pages() often show zone contention as a major part of the
problem.
Another is a bug report based on an older kernel where a multi-terabyte
process can take several minutes to exit. A workaround was to use
vm.percpu_pagelist_fraction to increase the pcp->high value but testing
indicated that a production workload could not use the same values
because of an increase in allocation latencies. Unfortunately, I cannot
reproduce this test case myself as the multi-terabyte machines are in
active use but the series should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite.
pcp only works with order-0 which is useless for SLUB (when using high
orders) and THP (unconditionally). To store high-order pages on PCP,
the pcp->high values need to be increased first.

 Documentation/admin-guide/sysctl/vm.rst |  29 ++--
 include/linux/cpuhotplug.h              |   2 +-
 include/linux/mmzone.h                  |   8 +-
 kernel/sysctl.c                         |   8 +-
 mm/internal.h                           |   2 +-
 mm/memory_hotplug.c                     |   4 +-
 mm/page_alloc.c                         | 196 ++++++++++++++++++------
 mm/vmscan.c                             |  35 +++++
 8 files changed, 212 insertions(+), 72 deletions(-)

-- 
2.26.2
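For readers who want the scaling idea in concrete terms, here is a
minimal userspace sketch of dividing a zone-derived page budget across
the CPUs local to that zone. It is not the code from the series: the
function name, the zone_pages_budget parameter and the "at least four
batches" floor are all assumptions made for illustration. The cover
letter only says that pcp->high scales with zone size and active CPUs
while pcp->batch stays static, and that batch is sanity checked against
high.

```c
#include <assert.h>

/*
 * Hypothetical helper (not the kernel's implementation): derive a
 * per-CPU high mark from a zone-sized page budget.
 *
 * zone_pages_budget: pages the zone is willing to park on PCP lists
 *                    in total (assumed input, e.g. a slice of the zone)
 * nr_local_cpus:     active CPUs local to the zone
 * batch:             the static pcp->batch value
 */
static unsigned long sketch_pcp_high(unsigned long zone_pages_budget,
				     unsigned int nr_local_cpus,
				     unsigned long batch)
{
	unsigned long high;

	assert(batch > 0);

	/* A memory-only node has no local CPUs; avoid dividing by zero. */
	if (nr_local_cpus == 0)
		nr_local_cpus = 1;

	/* Split the budget evenly across the local CPUs. */
	high = zone_pages_budget / nr_local_cpus;

	/*
	 * Assumed sanity floor: keep high at a few multiples of batch so
	 * that a single refill or drain does not immediately hit the limit.
	 */
	if (high < batch * 4)
		high = batch * 4;

	return high;
}
```

With these assumptions, a budget of 8192 pages shared by 16 local CPUs
gives each CPU a high mark of 512 pages, while a small zone whose
per-CPU share falls below four batches is raised to that floor instead.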
Thread overview: 31+ messages

2021-05-25  8:01 Mel Gorman [this message]
2021-05-25  8:01 ` [PATCH 1/6] mm/page_alloc: Delete vm.percpu_pagelist_fraction Mel Gorman
2021-05-26 17:41 ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 2/6] mm/page_alloc: Disassociate the pcp->high from pcp->batch Mel Gorman
2021-05-26 18:14 ` Vlastimil Babka
2021-05-27 10:52 ` Mel Gorman
2021-05-28 10:27 ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events Mel Gorman
2021-05-28 11:08 ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 4/6] mm/page_alloc: Scale the number of pages that are batch freed Mel Gorman
2021-05-28 11:19 ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 5/6] mm/page_alloc: Limit the number of pages on PCP lists when reclaim is active Mel Gorman
2021-05-28 11:43 ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 6/6] mm/page_alloc: Introduce vm.percpu_pagelist_high_fraction Mel Gorman
2021-05-28 11:59 ` Vlastimil Babka
2021-05-28 12:53 ` Mel Gorman
2021-05-28 14:38 ` Vlastimil Babka
2021-05-27 19:36 ` [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs Dave Hansen
2021-05-28  8:55 ` Mel Gorman
2021-05-28  9:03 ` David Hildenbrand
2021-05-28  9:08 ` David Hildenbrand
2021-05-28  9:49 ` Mel Gorman
2021-05-28  9:52 ` David Hildenbrand
2021-05-28 10:09 ` Mel Gorman
2021-05-28 10:21 ` David Hildenbrand
2021-05-28 12:12 ` Vlastimil Babka
2021-05-28 12:37 ` Mel Gorman
2021-05-28 14:39 ` Dave Hansen
2021-05-28 15:18 ` Mel Gorman
2021-05-28 16:17 ` Dave Hansen
2021-05-31 12:00 ` Feng Tang