From: Vlastimil Babka <vbabka@suse.cz>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org
Cc: Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm, slub: Use prefetchw instead of prefetch
Date: Tue, 19 Oct 2021 09:11:54 +0200
Message-ID: <bf496398-d42f-05dc-927d-b4c601bd2d19@suse.cz>
In-Reply-To: <20211011144331.70084-1-42.hyeyoo@gmail.com>

On 10/11/21 16:43, Hyeonggon Yoo wrote:
> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> slab_alloc()") introduced prefetch_freepointer() because when other cpu(s)
> free objects into a page that the current cpu owns, the freelist link is
> hot on the cpu(s) that freed the objects and possibly very cold on the
> current cpu.
> 
> But if the freelist link chain is hot on the cpu(s) that freed the objects,
> it's better to invalidate their copies of that chain, because they're not
> going to access it again within a short time.
> 
> So use prefetchw instead of prefetch. On supported architectures like x86
> and arm, it invalidates other cached copies of a cache line when
> prefetching it.
> 
> Before:
> 
> Time: 91.677
> 
>  Performance counter stats for 'hackbench -g 100 -l 10000':
>         1462938.07 msec cpu-clock                 #   15.908 CPUs utilized
>           18072550      context-switches          #   12.354 K/sec
>            1018814      cpu-migrations            #  696.416 /sec
>             104558      page-faults               #   71.471 /sec
>      1580035699271      cycles                    #    1.080 GHz                      (54.51%)
>      2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
>         5702204863      branch-misses                                                 (54.28%)
>       643368500985      cache-references          #  439.778 M/sec                    (54.26%)
>        18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
>       642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
>        18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
>       653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
>         3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
>       537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
>          114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
>       630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
>        22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)
> 
>       91.964452802 seconds time elapsed
> 
>       43.416742000 seconds user
>     1422.441123000 seconds sys
> 
> After:
> 
> Time: 90.220
> 
>  Performance counter stats for 'hackbench -g 100 -l 10000':
>         1437418.48 msec cpu-clock                 #   15.880 CPUs utilized
>           17694068      context-switches          #   12.310 K/sec
>             958257      cpu-migrations            #  666.651 /sec
>             100604      page-faults               #   69.989 /sec
>      1583259429428      cycles                    #    1.101 GHz                      (54.57%)
>      2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
>         5594202389      branch-misses                                                 (54.36%)
>       643113574524      cache-references          #  447.409 M/sec                    (54.39%)
>        18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
>       640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
>        17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
>       651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
>         3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
>       535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
>          113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
>       628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
>        22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)
> 
>       90.514819303 seconds time elapsed
> 
>       43.877656000 seconds user
>     1397.176001000 seconds sys

I wouldn't expect such a noticeable difference. Maybe it would diminish when
repeating the runs and taking the average. But I guess it's at least not
worse with prefetchw, so...

> Link: https://lkml.org/lkml/2021/10/8/598 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 3d2025f7163b..ce3d8b11215c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -354,7 +354,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
>  
>  static void prefetch_freepointer(const struct kmem_cache *s, void *object)
>  {
> -	prefetch(object + s->offset);
> +	prefetchw(object + s->offset);
>  }
>  
>  static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
> 


