From: Raghavendra K T <raghavendra.kt@amd.com>
To: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Cc: akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
rostedt@goodmis.org, tglx@linutronix.de, jon.grimm@amd.com,
bharata@amd.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com
Subject: Re: [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing
Date: Tue, 5 Sep 2023 06:36:33 +0530
Message-ID: <2b79ab3b-56e7-926f-49f0-4c2584f6a72b@amd.com>
In-Reply-To: <20230830184958.2333078-1-ankur.a.arora@oracle.com>
On 8/31/2023 12:19 AM, Ankur Arora wrote:
> This series adds a multi-page clearing primitive, clear_pages(),
> which enables more effective use of x86 string instructions by
> advertising the real region-size to be cleared.
>
> Region-size can be used as a hint by uarchs to optimize the
> clearing.
>
> Also add allow_resched() which marks a code-section as allowing
> rescheduling in the irqentry_exit path. This allows clear_pages()
> to get by without having to call cond_resched() periodically.
> (preempt_model_full() already handles this via
> irqentry_exit_cond_resched(), so we handle this similarly for
> preempt_model_none() and preempt_model_voluntary().)
>
>
Hello Ankur,
Thanks for the patches.
I tried the patches; the improvements look similar to v1 (even without
the circuitous chunk optimizations).
We still see a similar 50-60% improvement for the 1G and 2M page sizes.
SUT: Bergamo
CPU family: 25
Model: 160
Thread(s) per core: 2
Core(s) per socket: 128
Socket(s): 2
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-127,256-383
NUMA node1 CPU(s): 128-255,384-511
Test: Use mmap(MAP_HUGETLB) to demand-fault a 64GB region (NUMA
node0), for both base-hugepage-size=2M and 1GB.
The current result is with thp=always, but thp=madvise also did not make
much difference.
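The test program itself is not included in this mail. Purely as a rough sketch, a demand-fault loop of the kind described above could look like the Python below. fault_in() and hugetlb_flags() are hypothetical helper names, and MAP_HUGETLB / MAP_HUGE_SHIFT are the Linux x86 values written out by hand, since Python's mmap module may not export them. It needs enough huge pages reserved and would be run under numactl to get the node0 binding.

```python
import mmap

# Linux x86 values, written out by hand (assumption: the mmap module of
# your Python may not export these constants).
MAP_HUGETLB = 0x40000
MAP_HUGE_SHIFT = 26

GIB = 1 << 30
MIB = 1 << 20


def hugetlb_flags(page_bytes):
    """mmap flags for an anonymous private mapping backed by huge pages."""
    page_shift = page_bytes.bit_length() - 1  # 21 for 2M, 30 for 1G
    return (mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_HUGETLB
            | (page_shift << MAP_HUGE_SHIFT))


def fault_in(region_bytes, page_bytes):
    """Map region_bytes of hugetlb memory and write one byte per huge page.

    The first write to each huge page takes a page fault, and the kernel
    clears the whole page before mapping it. Returns the pages touched.
    """
    buf = mmap.mmap(-1, region_bytes, flags=hugetlb_flags(page_bytes),
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    touched = 0
    try:
        for offset in range(0, region_bytes, page_bytes):
            buf[offset] = 1
            touched += 1
    finally:
        buf.close()
    return touched
```

Calling fault_in(64 * GIB, 2 * MIB) or fault_in(64 * GIB, GIB) would correspond to the two page sizes tested; mmap raises OSError if the huge page pool is too small.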
perf stat -r 10 -d -d numactl -m 0 -N 0 <test>
time in seconds elapsed (average of 10 runs) (lower = better)
Result:
base: mm/clear_huge_page
patched: x86/clear_huge_page
page-size   base (s)   patched (s)   Improvement %
2M          5.0779     2.50623       50.64
1G          2.51890    1.012439      59.81
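For reference, the improvement column is the relative reduction in elapsed time. A small sketch of that arithmetic, using the "seconds time elapsed" values from the perf stat runs in this mail (improvement_pct() is just an illustrative helper name):

```python
def improvement_pct(base_s, patched_s):
    """Percentage reduction in elapsed time relative to the base kernel."""
    return (base_s - patched_s) / base_s * 100.0


# "seconds time elapsed" values from the perf stat runs in this mail.
elapsed = {
    "2M": {"base": 5.0779, "patched": 2.50623},
    "1G": {"base": 2.51890, "patched": 1.012439},
}

for size, t in elapsed.items():
    pct = improvement_pct(t["base"], t["patched"])
    speedup = t["base"] / t["patched"]
    print(f"{size}: {pct:.2f}% less time, {speedup:.2f}x faster")
```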
More details:
Performance counter stats for 'mm/map_hugetlb' (10 runs):
          5,058.71 msec task-clock                # 0.996 CPUs utilized  ( +- 0.26% )
                 8      context-switches          # 1.576 /sec  ( +- 7.23% )
                 0      cpu-migrations            # 0.000 /sec
            32,917      page-faults               # 6.484 K/sec  ( +- 0.00% )
    15,797,804,067      cycles                    # 3.112 GHz  ( +- 0.26% ) (35.70%)
         2,073,754      stalled-cycles-frontend   # 0.01% frontend cycles idle  ( +- 1.25% ) (35.71%)
        27,508,977      stalled-cycles-backend    # 0.17% backend cycles idle  ( +- 9.48% ) (35.74%)
     1,143,710,651      instructions              # 0.07 insn per cycle
                                                  # 0.03 stalled cycles per insn  ( +- 0.15% ) (35.76%)
       243,817,330      branches                  # 48.028 M/sec  ( +- 0.12% ) (35.78%)
           357,760      branch-misses             # 0.15% of all branches  ( +- 1.52% ) (35.75%)
     2,540,733,497      L1-dcache-loads           # 500.483 M/sec  ( +- 0.04% ) (35.74%)
     1,093,660,557      L1-dcache-load-misses     # 42.98% of all L1-dcache accesses  ( +- 0.03% ) (35.71%)
        73,335,478      L1-icache-loads           # 14.446 M/sec  ( +- 0.08% ) (35.70%)
           878,378      L1-icache-load-misses     # 1.19% of all L1-icache accesses  ( +- 2.65% ) (35.68%)
         1,025,714      dTLB-loads                # 202.049 K/sec  ( +- 2.70% ) (35.69%)
           405,407      dTLB-load-misses          # 37.35% of all dTLB cache accesses  ( +- 1.59% ) (35.68%)
                 2      iTLB-loads                # 0.394 /sec  ( +- 41.63% ) (35.68%)
            40,356      iTLB-load-misses          # 1552153.85% of all iTLB cache accesses  ( +- 7.18% ) (35.68%)
            5.0779 +- 0.0132 seconds time elapsed  ( +- 0.26% )
Performance counter stats for 'numactl -m 0 -N 0 x86/map_hugetlb' (10 runs):
          2,538.40 msec task-clock                # 1.013 CPUs utilized  ( +- 0.27% )
                 4      context-switches          # 1.597 /sec  ( +- 6.51% )
                 1      cpu-migrations            # 0.399 /sec
            32,916      page-faults               # 13.140 K/sec  ( +- 0.00% )
     7,901,830,782      cycles                    # 3.154 GHz  ( +- 0.27% ) (35.67%)
         6,590,473      stalled-cycles-frontend   # 0.08% frontend cycles idle  ( +- 10.31% ) (35.71%)
       329,970,288      stalled-cycles-backend    # 4.23% backend cycles idle  ( +- 13.65% ) (35.74%)
       725,811,962      instructions              # 0.09 insn per cycle
                                                  # 0.80 stalled cycles per insn  ( +- 0.37% ) (35.78%)
       132,182,704      branches                  # 52.767 M/sec  ( +- 0.26% ) (35.82%)
           254,163      branch-misses             # 0.19% of all branches  ( +- 2.47% ) (35.81%)
     2,382,927,453      L1-dcache-loads           # 951.262 M/sec  ( +- 0.04% ) (35.77%)
     1,082,022,067      L1-dcache-load-misses     # 45.41% of all L1-dcache accesses  ( +- 0.02% ) (35.74%)
        47,164,491      L1-icache-loads           # 18.828 M/sec  ( +- 0.37% ) (35.70%)
           474,535      L1-icache-load-misses     # 0.99% of all L1-icache accesses  ( +- 2.93% ) (35.66%)
         1,477,334      dTLB-loads                # 589.750 K/sec  ( +- 5.12% ) (35.65%)
           624,125      dTLB-load-misses          # 56.24% of all dTLB cache accesses  ( +- 5.66% ) (35.65%)
                 0      iTLB-loads                # 0.000 /sec  (35.65%)
             1,626      iTLB-load-misses          # 7069.57% of all iTLB cache accesses  ( +-283.51% ) (35.65%)
           2.50623 +- 0.00691 seconds time elapsed  ( +- 0.28% )
Performance counter stats for 'numactl -m 0 -N 0 mm/map_hugetlb_1G' (10 runs):
          2,506.50 msec task-clock                # 0.995 CPUs utilized  ( +- 0.17% )
                 4      context-switches          # 1.589 /sec  ( +- 9.28% )
                 0      cpu-migrations            # 0.000 /sec
               214      page-faults               # 84.997 /sec  ( +- 0.13% )
     7,821,519,053      cycles                    # 3.107 GHz  ( +- 0.17% ) (35.72%)
         2,037,744      stalled-cycles-frontend   # 0.03% frontend cycles idle  ( +- 25.62% ) (35.73%)
         6,578,899      stalled-cycles-backend    # 0.08% backend cycles idle  ( +- 2.65% ) (35.73%)
       468,648,780      instructions              # 0.06 insn per cycle
                                                  # 0.01 stalled cycles per insn  ( +- 0.10% ) (35.73%)
       116,267,370      branches                  # 46.179 M/sec  ( +- 0.08% ) (35.73%)
           111,966      branch-misses             # 0.10% of all branches  ( +- 2.98% ) (35.72%)
     2,294,727,165      L1-dcache-loads           # 911.424 M/sec  ( +- 0.02% ) (35.71%)
     1,076,156,463      L1-dcache-load-misses     # 46.88% of all L1-dcache accesses  ( +- 0.01% ) (35.70%)
        26,093,151      L1-icache-loads           # 10.364 M/sec  ( +- 0.21% ) (35.71%)
           132,944      L1-icache-load-misses     # 0.51% of all L1-icache accesses  ( +- 0.55% ) (35.70%)
            30,925      dTLB-loads                # 12.283 K/sec  ( +- 5.70% ) (35.71%)
            27,437      dTLB-load-misses          # 86.22% of all dTLB cache accesses  ( +- 1.98% ) (35.70%)
                 0      iTLB-loads                # 0.000 /sec  (35.71%)
                11      iTLB-load-misses          # 62.50% of all iTLB cache accesses  ( +-140.21% ) (35.70%)
           2.51890 +- 0.00433 seconds time elapsed  ( +- 0.17% )
Performance counter stats for 'numactl -m 0 -N 0 x86/map_hugetlb_1G' (10 runs):
          1,013.59 msec task-clock                # 1.001 CPUs utilized  ( +- 0.07% )
                 2      context-switches          # 1.978 /sec  ( +- 12.91% )
                 1      cpu-migrations            # 0.989 /sec
               213      page-faults               # 210.634 /sec  ( +- 0.17% )
     3,169,391,694      cycles                    # 3.134 GHz  ( +- 0.07% ) (35.53%)
           109,925      stalled-cycles-frontend   # 0.00% frontend cycles idle  ( +- 5.56% ) (35.63%)
       950,638,913      stalled-cycles-backend    # 30.06% backend cycles idle  ( +- 5.06% ) (35.73%)
        51,189,571      instructions              # 0.02 insn per cycle
                                                  # 21.03 stalled cycles per insn  ( +- 1.22% ) (35.82%)
         9,545,941      branches                  # 9.440 M/sec  ( +- 1.50% ) (35.92%)
            86,836      branch-misses             # 0.88% of all branches  ( +- 3.74% ) (36.00%)
        46,109,587      L1-dcache-loads           # 45.597 M/sec  ( +- 3.92% ) (35.96%)
        13,796,172      L1-dcache-load-misses     # 41.77% of all L1-dcache accesses  ( +- 4.81% ) (35.85%)
         1,179,166      L1-icache-loads           # 1.166 M/sec  ( +- 1.22% ) (35.77%)
            21,528      L1-icache-load-misses     # 1.90% of all L1-icache accesses  ( +- 1.85% ) (35.66%)
            14,529      dTLB-loads                # 14.368 K/sec  ( +- 4.65% ) (35.57%)
             8,505      dTLB-load-misses          # 67.88% of all dTLB cache accesses  ( +- 5.61% ) (35.52%)
                 0      iTLB-loads                # 0.000 /sec  (35.52%)
                 8      iTLB-load-misses          # 0.00% of all iTLB cache accesses  ( +-267.99% ) (35.52%)
          1.012439 +- 0.000723 seconds time elapsed  ( +- 0.07% )
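As a quick sanity check on the page-faults counters above: touching every huge page of the region once should take region-size / page-size faults. The sketch below assumes that "64GB" means 64 GiB; hugepage_faults() is an illustrative helper name. The small remainder left over in each run is presumably ordinary process-startup faults, though that is a guess and was not measured here.

```python
GIB = 1 << 30
MIB = 1 << 20
REGION = 64 * GIB  # assumption: "64GB" in the test description means 64 GiB


def hugepage_faults(region_bytes, page_bytes):
    """Faults needed to touch every huge page of the region exactly once."""
    return region_bytes // page_bytes


# page-faults counters reported by perf stat: (base kernel, patched kernel).
observed = {2 * MIB: (32917, 32916), GIB: (214, 213)}

for page_bytes, (base, patched) in observed.items():
    expected = hugepage_faults(REGION, page_bytes)
    print(page_bytes // MIB, "MiB pages:", expected, "expected,",
          base - expected, "and", patched - expected, "extra faults")
```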
Please feel free to carry:
Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
for any minor changes.
Thanks and Regards
- Raghu