From: Ankur Arora <ankur.a.arora@oracle.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Cc: akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
rostedt@goodmis.org, tglx@linutronix.de, jon.grimm@amd.com,
bharata@amd.com, raghavendra.kt@amd.com,
boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
Ankur Arora <ankur.a.arora@oracle.com>
Subject: [PATCH v2 6/9] x86/clear_huge_page: multi-page clearing
Date: Wed, 30 Aug 2023 11:49:55 -0700
Message-ID: <20230830184958.2333078-7-ankur.a.arora@oracle.com>
In-Reply-To: <20230830184958.2333078-1-ankur.a.arora@oracle.com>

clear_pages_rep() and clear_pages_erms() clear memory using string
instructions. When clearing extents of more than a single page, we can
use these more effectively by explicitly advertising the region size to
the processor.

The processor uarch can use this as a hint to optimize the clearing
(e.g. to avoid polluting one or more levels of the data cache).

As a secondary benefit, string instructions are typically microcoded,
so it's a good idea to amortize the cost of the decode across larger
regions.
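
To make the contrast concrete, here is a minimal user-space sketch of
page-at-a-time versus whole-extent clearing. It is not the kernel's
clear_pages_rep()/clear_pages_erms(); the sketch_* names and the 4K
SKETCH_PAGE_SIZE are assumptions made for illustration, and builds that
are not x86-64 GCC/Clang fall back to memset().

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed base page size */

/* Zero len bytes at dst, using a single string instruction if possible. */
static void sketch_clear_extent(void *dst, size_t len)
{
#if defined(__x86_64__) && defined(__GNUC__)
	/* REP STOSB stores AL to [RDI] RCX times; RCX carries the length. */
	__asm__ __volatile__("rep stosb"
			     : "+D" (dst), "+c" (len)
			     : "a" (0)
			     : "memory");
#else
	memset(dst, 0, len);	/* portable fallback, no length hint implied */
#endif
}

/* Page-at-a-time: each string instruction only ever sees a 4K length. */
static void sketch_clear_by_page(void *base, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		sketch_clear_extent((char *)base + i * SKETCH_PAGE_SIZE,
				    SKETCH_PAGE_SIZE);
}

/* Multi-page: one instruction is issued with the full extent length. */
static void sketch_clear_multi(void *base, size_t npages)
{
	sketch_clear_extent(base, npages * SKETCH_PAGE_SIZE);
}
```

Both variants leave the same bytes zeroed; the only difference is the
length each instruction is issued with, which is the information the
text above proposes to expose to the processor.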

Accordingly, clear_huge_page() now clears a huge page in three parts:
the region to the left of the faulting address's neighbourhood, the
region to its right, and finally the neighbourhood itself.

The neighbourhood is cleared last to keep its cachelines hot.
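
The three-way split can be sketched in plain C as below. This mirrors
the index arithmetic in the patch, but struct sketch_range and
sketch_split() are hypothetical names used only for illustration. Ranges
are inclusive page indices, and a range with end < start is empty.

```c
#include <stddef.h>

/* Inclusive range of page indices; empty when end < start. */
struct sketch_range {
	long start;
	long end;
};

/*
 * Split a huge page of npages pages around the faulting page pgidx.
 * r[0] is the left region, r[1] the right region, and r[2] the
 * neighbourhood (up to width pages on either side of pgidx), which
 * the caller is expected to clear last.
 */
static void sketch_split(long pgidx, long npages, long width,
			 struct sketch_range r[3])
{
	const long first = 0, last = npages - 1;

	/* Neighbourhood, clamped to the bounds of the huge page. */
	r[2].start = (pgidx - width) < first ? first : (pgidx - width);
	r[2].end = (pgidx + width) > last ? last : (pgidx + width);

	/* Everything to the left of the neighbourhood. */
	r[0].start = first;
	r[0].end = r[2].start - 1;

	/* Everything to the right of the neighbourhood. */
	r[1].start = r[2].end + 1;
	r[1].end = last;
}
```

For a 2MB page (512 base pages) with a fault on page 100 and width 2,
this gives a left region of pages 0-97, a right region of pages 103-511
and a neighbourhood of pages 98-102. A fault on page 0 leaves the left
region empty.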
Performance
==
Use mmap(MAP_HUGETLB) to demand fault a 128GB region (on the local
NUMA node):
Milan (EPYC 7J13, boost=1):

             mm/clear_huge_page   x86/clear_huge_page   change
                   (GB/s)               (GB/s)

  pg-sz=2MB        14.55                19.29            +32.5%
  pg-sz=1GB        19.34                49.60           +156.4%

Milan uses a threshold of LLC-size (~32MB) for eliding cacheline
allocation, so we see a dropoff in cacheline-allocations for
pg-sz=1GB:
pg-sz=1GB:
-23,088,001,347 cycles # 3.487 GHz ( +- 0.08% ) (35.68%)
- 4,680,678,939 L1-dcache-loads # 706.831 M/sec ( +- 0.02% ) (35.74%)
- 2,150,395,280 L1-dcache-load-misses # 45.93% of all L1-dcache accesses ( +- 0.01% ) (35.74%)
+ 8,983,798,764 cycles # 3.489 GHz ( +- 0.05% ) (35.59%)
+ 18,294,725 L1-dcache-loads # 7.104 M/sec ( +- 18.88% ) (35.78%)
+ 6,677,565 L1-dcache-load-misses # 30.48% of all L1-dcache accesses ( +- 20.72% ) (35.78%)
That's not the case with pg-sz=2MB, where we perform better but the
number of cacheline allocations remains the same:
pg-sz=2MB:
-31,087,683,852 cycles # 3.494 GHz ( +- 0.17% ) (35.72%)
- 4,898,684,886 L1-dcache-loads # 550.627 M/sec ( +- 0.03% ) (35.71%)
- 2,161,434,236 L1-dcache-load-misses # 44.11% of all L1-dcache accesses ( +- 0.01% ) (35.71%)
+23,368,914,596 cycles # 3.480 GHz ( +- 0.27% ) (35.72%)
+ 4,481,808,430 L1-dcache-loads # 667.382 M/sec ( +- 0.03% ) (35.71%)
+ 2,170,453,309 L1-dcache-load-misses # 48.41% of all L1-dcache accesses ( +- 0.06% ) (35.71%)

Icelakex (Platinum 8358, no_turbo=0):

             mm/clear_huge_page   x86/clear_huge_page   change
                   (GB/s)               (GB/s)

  pg-sz=2MB         9.19                12.94            +40.8%
  pg-sz=1GB         9.36                12.97            +38.5%

For both page sizes, Icelakex behaves similarly to Milan pg-sz=2MB: we
see a drop in cycles but there's no drop in cacheline allocation.
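
For reference, the measurement described under Performance (demand
faulting an mmap() region and reporting GB/s) could be approximated with
a sketch like the one below. This is not the harness used for the
numbers above: sketch_fault_in() is a hypothetical helper, GB is assumed
to mean 1e9 bytes, and NUMA placement is not handled.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS, MAP_HUGETLB */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Map len bytes, touch one byte every stride bytes so that each page is
 * demand faulted (and therefore cleared by the kernel), and report the
 * rate in GB/s. extra_flags is OR-ed into the mmap() flags, e.g.
 * MAP_HUGETLB. Returns 0 on success, -1 if the arguments are invalid or
 * the mapping fails.
 */
static int sketch_fault_in(size_t len, size_t stride, int extra_flags,
			   double *gbps)
{
	struct timespec t0, t1;
	double secs;
	char *p;

	if (stride == 0 || gbps == NULL)
		return -1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t off = 0; off < len; off += stride)
		((volatile char *)p)[off] = 1;	/* first touch faults the page */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(p, len);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	*gbps = secs > 0.0 ? ((double)len / 1e9) / secs : 0.0;
	return 0;
}
```

With 2MB hugetlb pages reserved, a call such as
sketch_fault_in(len, 2UL << 20, MAP_HUGETLB, &gbps) would fault one huge
page per iteration. Pinning to the local NUMA node would have to be done
outside the program.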
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/x86/mm/hugetlbpage.c | 54 +++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 5804bbae4f01..0b9f7a6dad93 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -148,6 +148,60 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	return hugetlb_get_unmapped_area_topdown(file, addr, len,
 			pgoff, flags);
 }
+
+#ifndef CONFIG_HIGHMEM
+static void clear_contig_region(struct page *page, unsigned int npages)
+{
+	clear_pages(page_address(page), npages);
+}
+
+/*
+ * clear_huge_page(): multi-page clearing variant of clear_huge_page().
+ *
+ * Taking inspiration from the common code variant, we split the zeroing in
+ * three parts: left of the fault, right of the fault, and up to 5 pages
+ * in the immediate neighbourhood of the target page.
+ *
+ * Cleared in that order to keep cache lines of the target region hot.
+ *
+ * For gigantic pages, there is no expectation of cache locality so we do a
+ * straight zeroing.
+ */
+void clear_huge_page(struct page *page,
+		     unsigned long addr_hint, unsigned int pages_per_huge_page)
+{
+	unsigned long addr = addr_hint &
+		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
+	const long pgidx = (addr_hint - addr) / PAGE_SIZE;
+	const int first_pg = 0, last_pg = pages_per_huge_page - 1;
+	const int width = 2; /* pages cleared last on either side */
+	int sidx[3], eidx[3];
+	int i, n;
+
+	if (pages_per_huge_page > MAX_ORDER_NR_PAGES)
+		return clear_contig_region(page, pages_per_huge_page);
+
+	/*
+	 * Neighbourhood of the fault. Cleared at the end to ensure
+	 * it sticks around in the cache.
+	 */
+	n = 2;
+	sidx[n] = (pgidx - width) < first_pg ? first_pg : (pgidx - width);
+	eidx[n] = (pgidx + width) > last_pg ? last_pg : (pgidx + width);
+
+	sidx[0] = first_pg;	/* Region to the left of the fault */
+	eidx[0] = sidx[n] - 1;
+
+	sidx[1] = eidx[n] + 1;	/* Region to the right of the fault */
+	eidx[1] = last_pg;
+
+	for (i = 0; i <= 2; i++) {
+		if (eidx[i] >= sidx[i])
+			clear_contig_region(page + sidx[i],
+					    eidx[i] - sidx[i] + 1);
+	}
+}
+#endif /* CONFIG_HIGHMEM */
 #endif /* CONFIG_HUGETLB_PAGE */
 
 #ifdef CONFIG_X86_64
--
2.31.1