From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Perf and Hackbench results on my machine
Date: Mon, 11 Oct 2021 10:33:02 +0000
Message-ID: <20211011103302.GA65713@kvm.asia-northeast3-a.c.our-ratio-313919.internal>
In-Reply-To: <904b6e72-cc2e-2e4d-5601-dacab734bf15@suse.cz>
Hello Vlastimil.
On Mon, Oct 11, 2021 at 09:21:01AM +0200, Vlastimil Babka wrote:
> On 10/11/21 00:49, David Rientjes wrote:
> > On Fri, 8 Oct 2021, Hyeonggon Yoo wrote:
> >
> >> It's certain that an object will be not only read, but also
> >> written after allocation.
> >>
> >
> > Why is it certain? I think perhaps what you meant to say is that if we
> > are doing any prefetching here, then access will benefit from prefetchw
> > instead of prefetch. But it's not "certain" that allocated memory will be
> > accessed at all.
>
> I think the primary reason there's a prefetch is freelist traversal. The
> cacheline we prefetch will be read during the next allocation, so if we
> expect there to be one soon, prefetch might help.
I agree with that.
> That the freepointer is
> part of object itself and thus the cache line will be probably accessed also
> after the allocation, is secondary.
Right. It depends on the cache line size and on whether the first cache
line of an object is frequently accessed after allocation.
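To make the freelist point concrete, here is a minimal userspace sketch,
not the SLUB code itself. It assumes the free pointer sits at offset 0 of
each free object and that prefetch()/prefetchw() can be modelled with the
GCC/Clang __builtin_prefetch() builtin (rw = 0 for read, rw = 1 for write).
struct toy_cache, toy_cache_init() and toy_alloc() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumption): model the kernel helpers with the
 * compiler builtin. Real architectures may implement them differently. */
#define prefetch(p)  __builtin_prefetch((p), 0)	/* hint: will be read */
#define prefetchw(p) __builtin_prefetch((p), 1)	/* hint: will be written */

/* Hypothetical toy cache: a singly linked list of free objects. */
struct toy_cache {
	void *freelist;	/* first free object, or NULL when empty */
};

/* The free pointer is stored inside the free object, at offset 0. */
static void *get_freepointer(void *object)
{
	return *(void **)object;
}

/* Thread a freelist through a caller-provided, pointer-aligned buffer. */
static void toy_cache_init(struct toy_cache *c, void *buf,
			   size_t size, size_t count)
{
	char *p = buf;

	assert(count > 0 && size >= sizeof(void *));
	assert(size % sizeof(void *) == 0);

	for (size_t i = 0; i < count; i++)
		*(void **)(p + i * size) =
			(i + 1 < count) ? p + (i + 1) * size : NULL;
	c->freelist = buf;
}

/* Pop one object and prefetch the next one on the list. */
static void *toy_alloc(struct toy_cache *c)
{
	void *object = c->freelist;
	void *next;

	if (!object)
		return NULL;

	next = get_freepointer(object);
	c->freelist = next;

	/*
	 * The next allocation will read the free pointer stored in 'next',
	 * which is what the prefetch is primarily for. The caller will
	 * probably also write to that cache line once the object is handed
	 * out, which is the (secondary) argument for prefetchw().
	 */
	if (next)
		prefetchw(next);

	return object;
}
```

Swapping prefetchw(next) for prefetch(next) in toy_alloc() gives the other
variant. Whether the write hint matters depends on the CPU and on how soon
the next allocation happens, so this only illustrates where the hint is
issued, not what it costs or gains.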
> Yeah this might help some workloads, but
> perhaps hurt others - these things might look obvious in theory but be
> rather unpredictable in practice. At least some hackbench results would help...
>
Below are my measurements. It seems that prefetch(w) does not make things
worse, at least on hackbench.
Measured on 16 CPUs (ARM64) / 16G RAM
Without prefetch:
Time: 91.989
Performance counter stats for 'hackbench -g 100 -l 10000':
1467926.03 msec cpu-clock # 15.907 CPUs utilized
17782076 context-switches # 12.114 K/sec
957523 cpu-migrations # 652.296 /sec
104561 page-faults # 71.230 /sec
1622117569931 cycles # 1.105 GHz (54.54%)
2002981132267 instructions # 1.23 insn per cycle (54.32%)
5600876429 branch-misses (54.28%)
642657442307 cache-references # 437.800 M/sec (54.27%)
19404890844 cache-misses # 3.019 % of all cache refs (54.28%)
640413686039 L1-dcache-loads # 436.271 M/sec (46.85%)
19110650580 L1-dcache-load-misses # 2.98% of all L1-dcache accesses (46.83%)
651556334841 dTLB-loads # 443.862 M/sec (46.63%)
3193647402 dTLB-load-misses # 0.49% of all dTLB cache accesses (46.84%)
538927659684 iTLB-loads # 367.135 M/sec (54.31%)
118503839 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.35%)
625750168840 L1-icache-loads # 426.282 M/sec (46.80%)
24348083282 L1-icache-load-misses # 3.89% of all L1-icache accesses (46.78%)
92.284351157 seconds time elapsed
44.524693000 seconds user
1426.214006000 seconds sys
With prefetch:
Time: 91.677
Performance counter stats for 'hackbench -g 100 -l 10000':
1462938.07 msec cpu-clock # 15.908 CPUs utilized
18072550 context-switches # 12.354 K/sec
1018814 cpu-migrations # 696.416 /sec
104558 page-faults # 71.471 /sec
2003670016013 instructions # 1.27 insn per cycle (54.31%)
5702204863 branch-misses (54.28%)
643368500985 cache-references # 439.778 M/sec (54.26%)
18475582235 cache-misses # 2.872 % of all cache refs (54.28%)
642206796636 L1-dcache-loads # 438.984 M/sec (46.87%)
18215813147 L1-dcache-load-misses # 2.84% of all L1-dcache accesses (46.83%)
653842996501 dTLB-loads # 446.938 M/sec (46.63%)
3227179675 dTLB-load-misses # 0.49% of all dTLB cache accesses (46.85%)
537531951350 iTLB-loads # 367.433 M/sec (54.33%)
114750630 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.37%)
630135543177 L1-icache-loads # 430.733 M/sec (46.80%)
22923237620 L1-icache-load-misses # 3.64% of all L1-icache accesses (46.76%)
91.964452802 seconds time elapsed
43.416742000 seconds user
1422.441123000 seconds sys
With prefetchw:
Time: 90.220
Performance counter stats for 'hackbench -g 100 -l 10000':
1437418.48 msec cpu-clock # 15.880 CPUs utilized
17694068 context-switches # 12.310 K/sec
958257 cpu-migrations # 666.651 /sec
100604 page-faults # 69.989 /sec
1583259429428 cycles # 1.101 GHz (54.57%)
2004002484935 instructions # 1.27 insn per cycle (54.37%)
5594202389 branch-misses (54.36%)
643113574524 cache-references # 447.409 M/sec (54.39%)
18233791870 cache-misses # 2.835 % of all cache refs (54.37%)
640205852062 L1-dcache-loads # 445.386 M/sec (46.75%)
17968160377 L1-dcache-load-misses # 2.81% of all L1-dcache accesses (46.79%)
651747432274 dTLB-loads # 453.415 M/sec (46.59%)
3127124271 dTLB-load-misses # 0.48% of all dTLB cache accesses (46.75%)
535395273064 iTLB-loads # 372.470 M/sec (54.38%)
113500056 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.35%)
628871845924 L1-icache-loads # 437.501 M/sec (46.80%)
22585641203 L1-icache-load-misses # 3.59% of all L1-icache accesses (46.79%)
90.514819303 seconds time elapsed
43.877656000 seconds user
1397.176001000 seconds sys
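For reading the numbers above, a small helper like the following can turn
two counter values into a relative change against the no-prefetch baseline.
This is only a sketch for convenience; pct_change() and print_delta() are
hypothetical names, and each configuration was measured once here, so small
differences may well be noise.

```c
#include <stdio.h>

/* Relative change of 'value' against 'baseline', in percent.
 * A negative result means 'value' is lower than the baseline. */
double pct_change(double baseline, double value)
{
	return (value - baseline) * 100.0 / baseline;
}

/* Print one comparison row; 'name' is just a free-form label. */
void print_delta(const char *name, double baseline, double value)
{
	printf("%-28s %+.2f%%\n", name, pct_change(baseline, value));
}
```

For example, print_delta("elapsed, prefetchw", 92.284351157, 90.514819303)
compares the elapsed time with prefetchw against the run without prefetch,
and the same call works for cache-misses or L1-dcache-load-misses.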
Thanks,
Hyeonggon