linux-kernel.vger.kernel.org archive mirror
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Perf and Hackbench results on my machine
Date: Mon, 11 Oct 2021 10:33:02 +0000
Message-ID: <20211011103302.GA65713@kvm.asia-northeast3-a.c.our-ratio-313919.internal>
In-Reply-To: <904b6e72-cc2e-2e4d-5601-dacab734bf15@suse.cz>

Hello Vlastimil.

On Mon, Oct 11, 2021 at 09:21:01AM +0200, Vlastimil Babka wrote:
> On 10/11/21 00:49, David Rientjes wrote:
> > On Fri, 8 Oct 2021, Hyeonggon Yoo wrote:
> > 
> >> It's certain that an object will be not only read, but also
> >> written after allocation.
> >> 
> > 
> > Why is it certain?  I think perhaps what you meant to say is that if we 
> > are doing any prefetching here, then access will benefit from prefetchw 
> > instead of prefetch.  But it's not "certain" that allocated memory will be 
> > accessed at all.
> 
> I think the primary reason there's a prefetch is freelist traversal. The
> cacheline we prefetch will be read during the next allocation, so if we
> expect there to be one soon, prefetch might help.

I agree with that.

> That the freepointer is
> part of object itself and thus the cache line will be probably accessed also
> after the allocation, is secondary.

Right. It depends on the cache line size and on whether the first cache
line of an object is frequently accessed.
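
To make the freelist traversal point concrete, here is a minimal
user-space sketch. It is a toy, not the SLUB code: struct toy_cache,
toy_cache_init() and toy_alloc() are made-up names, the free pointer is
assumed to sit at offset 0 of each free object, and __builtin_prefetch()
(GCC/Clang) stands in for the kernel's prefetch()/prefetchw() helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy free list: every free object stores the pointer to the next free
 * object in its first bytes (hypothetical layout, offset 0). */
struct toy_cache {
    void *freelist;    /* first free object, or NULL when exhausted */
};

static void *get_freepointer(void *object)
{
    return *(void **)object;
}

static void set_freepointer(void *object, void *next)
{
    *(void **)object = next;
}

/* Chain nr objects of size bytes each, laid out back to back in buf. */
static void toy_cache_init(struct toy_cache *c, void *buf,
                           size_t size, size_t nr)
{
    char *p = buf;

    /* Each object must be able to hold an aligned free pointer. */
    assert(nr > 0 && size >= sizeof(void *) && size % sizeof(void *) == 0);

    for (size_t i = 0; i < nr; i++)
        set_freepointer(p + i * size,
                        i + 1 < nr ? p + (i + 1) * size : NULL);
    c->freelist = buf;
}

static void *toy_alloc(struct toy_cache *c)
{
    void *object = c->freelist;
    void *next;

    if (!object)
        return NULL;

    /* This load is the freelist traversal: it touches the cache line
     * that holds the free pointer of the object being handed out. */
    next = get_freepointer(object);
    c->freelist = next;

    /* Hint that the next object's first cache line will be needed soon.
     * rw = 1 asks for a prefetch for write (the prefetchw flavour),
     * rw = 0 would be a plain read prefetch. The hint does not fault
     * when next is NULL. */
    __builtin_prefetch(next, 1);

    return object;
}
```

The line that gets prefetched is the one the next toy_alloc() call reads
in get_freepointer(). Whether the caller later writes to that same line
depends on the object layout and the cache line size.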

> Yeah this might help some workloads, but
> perhaps hurt others - these things might look obvious in theory but be
> rather unpredictable in practice. At least some hackbench results would help...
>

Below are my measurements. It seems prefetch(w) is not making things
worse, at least on hackbench.

Measured on 16 CPUs (ARM64) / 16G RAM
Without prefetch:

Time: 91.989
 Performance counter stats for 'hackbench -g 100 -l 10000':
        1467926.03 msec cpu-clock                 #   15.907 CPUs utilized          
          17782076      context-switches          #   12.114 K/sec                  
            957523      cpu-migrations            #  652.296 /sec                   
            104561      page-faults               #   71.230 /sec                   
     1622117569931      cycles                    #    1.105 GHz                      (54.54%)
     2002981132267      instructions              #    1.23  insn per cycle           (54.32%)
        5600876429      branch-misses                                                 (54.28%)
      642657442307      cache-references          #  437.800 M/sec                    (54.27%)
       19404890844      cache-misses              #    3.019 % of all cache refs      (54.28%)
      640413686039      L1-dcache-loads           #  436.271 M/sec                    (46.85%)
       19110650580      L1-dcache-load-misses     #    2.98% of all L1-dcache accesses  (46.83%)
      651556334841      dTLB-loads                #  443.862 M/sec                    (46.63%)
        3193647402      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.84%)
      538927659684      iTLB-loads                #  367.135 M/sec                    (54.31%)
         118503839      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      625750168840      L1-icache-loads           #  426.282 M/sec                    (46.80%)
       24348083282      L1-icache-load-misses     #    3.89% of all L1-icache accesses  (46.78%)

      92.284351157 seconds time elapsed

      44.524693000 seconds user
    1426.214006000 seconds sys

With prefetch:

Time: 91.677

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1462938.07 msec cpu-clock                 #   15.908 CPUs utilized          
          18072550      context-switches          #   12.354 K/sec                  
           1018814      cpu-migrations            #  696.416 /sec                   
            104558      page-faults               #   71.471 /sec                   
     2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
        5702204863      branch-misses                                                 (54.28%)
      643368500985      cache-references          #  439.778 M/sec                    (54.26%)
       18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
      642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
       18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
      653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
        3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
         114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
       22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)
 
      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys
	
With prefetchw:

Time: 90.220

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1437418.48 msec cpu-clock                 #   15.880 CPUs utilized          
          17694068      context-switches          #   12.310 K/sec                  
            958257      cpu-migrations            #  666.651 /sec                   
            100604      page-faults               #   69.989 /sec                   
     1583259429428      cycles                    #    1.101 GHz                      (54.57%)
     2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
        5594202389      branch-misses                                                 (54.36%)
      643113574524      cache-references          #  447.409 M/sec                    (54.39%)
       18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
      640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
       17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
      651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
        3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
         113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
       22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)
 
      90.514819303 seconds time elapsed
 
      43.877656000 seconds user
    1397.176001000 seconds sys
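
For comparing the three runs, the relative differences can be worked out
with a small helper like the one below. This is only a sketch: the
struct and function names are made up, and the values are copied from
the "seconds time elapsed" and "cache-misses" lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative change of value against base, in percent. */
static double pct_change(double base, double value)
{
    assert(base != 0.0);
    return (value - base) / base * 100.0;
}

struct run {
    const char *name;
    double elapsed_sec;     /* "seconds time elapsed" */
    double cache_misses;    /* "cache-misses" */
};

/* Numbers copied from the three perf stat outputs above. */
static const struct run runs[] = {
    { "without prefetch", 92.284351157, 19404890844.0 },
    { "with prefetch",    91.964452802, 18475582235.0 },
    { "with prefetchw",   90.514819303, 18233791870.0 },
};

/* Print each run's change relative to the run without prefetch. */
static void print_deltas(void)
{
    for (size_t i = 1; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("%-16s elapsed %+.1f%%  cache-misses %+.1f%%\n",
               runs[i].name,
               pct_change(runs[0].elapsed_sec, runs[i].elapsed_sec),
               pct_change(runs[0].cache_misses, runs[i].cache_misses));
}
```

Relative to the run without prefetch, this works out to roughly -0.3%
elapsed time and -4.8% cache misses with prefetch, and roughly -1.9%
elapsed time and -6.0% cache misses with prefetchw. These are single
runs with multiplexed counters, so small differences may be noise.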

Thanks,
Hyeonggon

Thread overview: 8+ messages
2021-10-08 13:36 [PATCH] mm, slub: Use prefetchw instead of prefetch Hyeonggon Yoo
2021-10-10 22:49 ` David Rientjes
2021-10-11  7:21   ` Vlastimil Babka
2021-10-11 10:33     ` Hyeonggon Yoo [this message]
2021-10-11 13:49       ` Perf and Hackbench results on my machine Hyeonggon Yoo
2021-10-11  7:23   ` [PATCH] mm, slub: Use prefetchw instead of prefetch Hyeonggon Yoo
2021-10-11  7:20 ` Christoph Lameter
2021-10-11  7:32   ` Hyeonggon Yoo
