All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Jann Horn <jannh@google.com>
Subject: Re: [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible
Date: Fri, 06 Aug 2021 07:14:18 +0200	[thread overview]
Message-ID: <a36ad3a060da4f02808b2692891afbe35292e15b.camel@gmx.de> (raw)
In-Reply-To: <20210805164210.2zxpzn2sdogf4kx3@linutronix.de>

On Thu, 2021-08-05 at 18:42 +0200, Sebastian Andrzej Siewior wrote:
>
> There was throughput regression in RT compared to previous releases
> (without this series). The regression was (based on my testing) only
> visible in hackbench and was addressed by adding adaptiv spinning to
> RT-mutex. With that we almost back to what we had before :)

Numbers on my box say a throughput regression remains (silly fork bomb
scenario.. yawn), which can be recouped by either turning on all
SL[AU]B features or converting the list_lock to a raw lock.  They also
seem to be saying that if you turned on PREEMPT_RT because you care
about RT performance first and foremost (gee), you'll do neither of
those, because either will eliminate an RT performance progression.

	-Mike

numbers...

box is old i4790 desktop
perf stat -r10 hackbench -s4096 -l500
full warmup, record, repeat twice for elapsed

SLUB+SLUB_DEBUG only

begin previously reported numbers
5.14.0.g79e92006-tip-rt (5.12-rt based as before, 5.13-rt didn't yet exist)
          7,984.52 msec task-clock                #    7.565 CPUs utilized            ( +-  0.66% )
           353,566      context-switches          #   44.281 K/sec                    ( +-  2.77% )
            37,685      cpu-migrations            #    4.720 K/sec                    ( +-  6.37% )
            12,939      page-faults               #    1.620 K/sec                    ( +-  0.67% )
    29,901,079,227      cycles                    #    3.745 GHz                      ( +-  0.71% )
    14,550,797,818      instructions              #    0.49  insn per cycle           ( +-  0.47% )
     3,056,685,643      branches                  #  382.826 M/sec                    ( +-  0.51% )
         9,598,083      branch-misses             #    0.31% of all branches          ( +-  2.11% )

           1.05542 +- 0.00409 seconds time elapsed  ( +-  0.39% )
           1.05990 +- 0.00244 seconds time elapsed  ( +-  0.23% ) (repeat)
           1.05367 +- 0.00303 seconds time elapsed  ( +-  0.29% ) (repeat)

5.14.0.g79e92006-tip-rt +slub-local-lock-v2r3 -0034-mm-slub-convert-kmem_cpu_slab-protection-to-local_lock.patch
          6,899.35 msec task-clock                #    5.637 CPUs utilized            ( +-  0.53% )
           420,304      context-switches          #   60.919 K/sec                    ( +-  2.83% )
           187,130      cpu-migrations            #   27.123 K/sec                    ( +-  1.81% )
            13,206      page-faults               #    1.914 K/sec                    ( +-  0.96% )
    25,110,362,933      cycles                    #    3.640 GHz                      ( +-  0.49% )
    15,853,643,635      instructions              #    0.63  insn per cycle           ( +-  0.64% )
     3,366,261,524      branches                  #  487.910 M/sec                    ( +-  0.70% )
        14,839,618      branch-misses             #    0.44% of all branches          ( +-  2.01% )

           1.22390 +- 0.00744 seconds time elapsed  ( +-  0.61% )
           1.21813 +- 0.00907 seconds time elapsed  ( +-  0.74% ) (repeat)
           1.22097 +- 0.00952 seconds time elapsed  ( +-  0.78% ) (repeat)

repeat of above with raw list_lock
          8,072.62 msec task-clock                #    7.605 CPUs utilized            ( +-  0.49% )
           359,514      context-switches          #   44.535 K/sec                    ( +-  4.95% )
            35,285      cpu-migrations            #    4.371 K/sec                    ( +-  5.82% )
            13,503      page-faults               #    1.673 K/sec                    ( +-  0.96% )
    30,247,989,681      cycles                    #    3.747 GHz                      ( +-  0.52% )
    14,580,011,391      instructions              #    0.48  insn per cycle           ( +-  0.81% )
     3,063,743,405      branches                  #  379.523 M/sec                    ( +-  0.85% )
         8,907,160      branch-misses             #    0.29% of all branches          ( +-  3.99% )

           1.06150 +- 0.00427 seconds time elapsed  ( +-  0.40% )
           1.05041 +- 0.00176 seconds time elapsed  ( +-  0.17% ) (repeat)
           1.06086 +- 0.00237 seconds time elapsed  ( +-  0.22% ) (repeat)

5.14.0.g79e92006-rt3-tip-rt +slub-local-lock-v2r3 full set
          7,598.44 msec task-clock                #    5.813 CPUs utilized            ( +-  0.85% )
           488,161      context-switches          #   64.245 K/sec                    ( +-  4.29% )
           196,866      cpu-migrations            #   25.909 K/sec                    ( +-  1.49% )
            13,042      page-faults               #    1.716 K/sec                    ( +-  0.73% )
    27,695,116,746      cycles                    #    3.645 GHz                      ( +-  0.79% )
    18,423,934,168      instructions              #    0.67  insn per cycle           ( +-  0.88% )
     3,969,540,695      branches                  #  522.415 M/sec                    ( +-  0.92% )
        15,493,482      branch-misses             #    0.39% of all branches          ( +-  2.15% )

           1.30709 +- 0.00890 seconds time elapsed  ( +-  0.68% )
           1.3205 +- 0.0134 seconds time elapsed  ( +-  1.02% ) (repeat)
           1.3083 +- 0.0132 seconds time elapsed  ( +-  1.01% ) (repeat)
end previously reported numbers

5.14.0.gf6a71a5-rt6-tip-rt (same config, full slub set.. obviously)
          7,707.63 msec task-clock                #    5.880 CPUs utilized            ( +-  1.46% )
           562,533      context-switches          #   72.984 K/sec                    ( +-  7.46% )
           208,475      cpu-migrations            #   27.048 K/sec                    ( +-  2.26% )
            13,022      page-faults               #    1.689 K/sec                    ( +-  0.80% )
    28,025,004,779      cycles                    #    3.636 GHz                      ( +-  1.34% )
    18,487,135,489      instructions              #    0.66  insn per cycle           ( +-  1.58% )
     3,997,110,493      branches                  #  518.591 M/sec                    ( +-  1.65% )
        16,078,322      branch-misses             #    0.40% of all branches          ( +-  4.23% )

            1.3108 +- 0.0135 seconds time elapsed  ( +-  1.03% )
            1.2997 +- 0.0138 seconds time elapsed  ( +-  1.06% ) (repeat)
            1.3009 +- 0.0166 seconds time elapsed  ( +-  1.28% ) (repeat)

5.14.0.gf6a71a5-rt6-tip-rt +list_lock=raw_spinlock_t
          8,252.59 msec task-clock                #    7.584 CPUs utilized            ( +-  0.27% )
           400,991      context-switches          #   48.590 K/sec                    ( +-  6.15% )
            35,979      cpu-migrations            #    4.360 K/sec                    ( +-  5.63% )
            13,261      page-faults               #    1.607 K/sec                    ( +-  0.73% )
    30,910,310,737      cycles                    #    3.746 GHz                      ( +-  0.31% )
    16,522,383,240      instructions              #    0.53  insn per cycle           ( +-  0.92% )
     3,535,219,839      branches                  #  428.377 M/sec                    ( +-  0.96% )
        10,115,967      branch-misses             #    0.29% of all branches          ( +-  4.32% )

           1.08817 +- 0.00238 seconds time elapsed  ( +-  0.22% )
           1.08583 +- 0.00243 seconds time elapsed  ( +-  0.22% ) (repeat)
           1.09003 +- 0.00164 seconds time elapsed  ( +-  0.15% ) (repeat)

5.14.0.g251a152-rt6-master-rt (+SLAB_MERGE_DEFAULT,SLUB_CPU_PARTIAL,SLAB_FREELIST_RANDOM/HARDENED)
          8,170.48 msec task-clock                #    7.390 CPUs utilized            ( +-  0.43% )
           449,994      context-switches          #   55.076 K/sec                    ( +-  4.20% )
            55,912      cpu-migrations            #    6.843 K/sec                    ( +-  4.28% )
            13,144      page-faults               #    1.609 K/sec                    ( +-  0.53% )
    30,484,114,812      cycles                    #    3.731 GHz                      ( +-  0.44% )
    17,554,521,787      instructions              #    0.58  insn per cycle           ( +-  0.76% )
     3,751,725,852      branches                  #  459.181 M/sec                    ( +-  0.81% )
        13,421,985      branch-misses             #    0.36% of all branches          ( +-  2.40% )

           1.10563 +- 0.00382 seconds time elapsed  ( +-  0.35% )
           1.1098 +- 0.0147 seconds time elapsed  ( +-  1.32% ) (repeat)
           1.11308 +- 0.00883 seconds time elapsed  ( +-  0.79% ) (repeat)

5.14.0.gf6a71a5-rt6-tip-rt +SLAB_MERGE_DEFAULT,SLUB_CPU_PARTIAL,SLAB_FREELIST_RANDOM/HARDENED
          8,026.39 msec task-clock                #    7.320 CPUs utilized            ( +-  0.70% )
           496,579      context-switches          #   61.868 K/sec                    ( +-  6.78% )
            65,022      cpu-migrations            #    8.101 K/sec                    ( +-  8.29% )
            13,161      page-faults               #    1.640 K/sec                    ( +-  0.51% )
    29,870,954,733      cycles                    #    3.722 GHz                      ( +-  0.67% )
    17,617,522,235      instructions              #    0.59  insn per cycle           ( +-  1.36% )
     3,760,346,459      branches                  #  468.498 M/sec                    ( +-  1.45% )
        12,863,520      branch-misses             #    0.34% of all branches          ( +-  4.45% )

            1.0965 +- 0.0103 seconds time elapsed  ( +-  0.94% )
            1.08149 +- 0.00362 seconds time elapsed  ( +-  0.33% ) (repeat)
            1.10027 +- 0.00916 seconds time elapsed  ( +-  0.83% )

yup, perf delta == config delta, lets have a peek at jitter

cyclictest -Smqp99& perf stat -r100 hackbench -s4096 -l500 && killall cyclictest

5.14.0.gf6a71a5-rt6-tip-rt
SLUB+SLUB_DEBUG
T: 1 ( 5903) P:99 I:1500 C:  92330 Min:      1 Act:    2 Avg:    6 Max:      19
T: 2 ( 5904) P:99 I:2000 C:  69247 Min:      1 Act:    2 Avg:    6 Max:      21
T: 3 ( 5905) P:99 I:2500 C:  55395 Min:      1 Act:    3 Avg:    6 Max:      22
T: 4 ( 5906) P:99 I:3000 C:  46163 Min:      1 Act:    4 Avg:    7 Max:      22
T: 5 ( 5907) P:99 I:3500 C:  39568 Min:      1 Act:    3 Avg:    6 Max:      23
T: 6 ( 5909) P:99 I:4000 C:  34621 Min:      1 Act:    2 Avg:    7 Max:      22
T: 7 ( 5910) P:99 I:4500 C:  30774 Min:      1 Act:    3 Avg:    7 Max:      18

SLUB+SLUB_DEBUG+list_lock=raw_spinlock_t
T: 1 ( 4044) P:99 I:1500 C:  73340 Min:      1 Act:    3 Avg:   10 Max:      28
T: 2 ( 4045) P:99 I:2000 C:  55004 Min:      1 Act:    4 Avg:   10 Max:      33
T: 3 ( 4046) P:99 I:2500 C:  44002 Min:      1 Act:    2 Avg:   10 Max:      26
T: 4 ( 4047) P:99 I:3000 C:  36668 Min:      1 Act:    3 Avg:   10 Max:      24
T: 5 ( 4048) P:99 I:3500 C:  31429 Min:      1 Act:    3 Avg:   10 Max:      27
T: 6 ( 4049) P:99 I:4000 C:  27500 Min:      1 Act:    3 Avg:   11 Max:      30
T: 7 ( 4050) P:99 I:4500 C:  24444 Min:      1 Act:    4 Avg:   11 Max:      25

SLUB+SLUB_DEBUG+SLAB_MERGE_DEFAULT,SLUB_CPU_PARTIAL,SLAB_FREELIST_RANDOM/HARDENED
T: 1 ( 4036) P:99 I:1500 C:  74039 Min:      1 Act:    3 Avg:    9 Max:      31
T: 2 ( 4037) P:99 I:2000 C:  55528 Min:      1 Act:    3 Avg:   10 Max:      29
T: 3 ( 4038) P:99 I:2500 C:  44422 Min:      1 Act:    2 Avg:   10 Max:      31
T: 4 ( 4039) P:99 I:3000 C:  37017 Min:      1 Act:    2 Avg:    9 Max:      23
T: 5 ( 4040) P:99 I:3500 C:  31729 Min:      1 Act:    3 Avg:   10 Max:      29
T: 6 ( 4041) P:99 I:4000 C:  27762 Min:      1 Act:    2 Avg:    8 Max:      26
T: 7 ( 4042) P:99 I:4500 C:  24677 Min:      1 Act:    3 Avg:    9 Max:      27

conclusion: gee, pi both works and ain't free - ditto add more stuff=cycles :)





  reply	other threads:[~2021-08-06  5:15 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 15:19 [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 01/35] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 02/35] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 03/35] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 04/35] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 05/35] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 06/35] mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 07/35] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 08/35] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 09/35] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 10/35] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 11/35] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 12/35] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 13/35] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
2021-08-15 10:14   ` Vlastimil Babka
2021-08-15 10:22     ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 14/35] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 15/35] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 16/35] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 17/35] mm, slub: check new pages with restored irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 18/35] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 19/35] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 20/35] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 21/35] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 22/35] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 23/35] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 24/35] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 25/35] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 26/35] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 27/35] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 28/35] mm, slab: make flush_slab() possible to call with irqs enabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
2021-08-09 13:41   ` Qian Cai
2021-08-09 18:44     ` Mike Galbraith
2021-08-09 18:44       ` Mike Galbraith
2021-08-09 20:08       ` Vlastimil Babka
2021-08-09 22:13         ` Qian Cai
2021-08-10  1:07         ` Mike Galbraith
2021-08-10  1:07           ` Mike Galbraith
2021-08-10  9:03     ` Vlastimil Babka
2021-08-10 11:47       ` Mike Galbraith
2021-08-10 11:47         ` Mike Galbraith
2021-08-10 20:31         ` Paul E. McKenney
2021-08-10 22:36           ` Vlastimil Babka
2021-08-10 23:53             ` Paul E. McKenney
2021-08-11 14:17               ` Paul E. McKenney
2021-08-10 20:25       ` Paul E. McKenney
2021-08-10 14:33     ` Vlastimil Babka
2021-08-11  1:42       ` Qian Cai
2021-08-11  8:55       ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 30/35] mm: slub: Make object_map_lock a raw_spinlock_t Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 31/35] mm, slub: optionally save/restore irqs in slab_[un]lock()/ Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 32/35] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 34/35] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
2021-08-05 15:20 ` [PATCH v4 35/35] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
2021-08-15 12:27   ` Sven Eckelmann
2021-08-17  8:37     ` Vlastimil Babka
2021-08-17  9:12       ` Sebastian Andrzej Siewior
2021-08-17  9:17         ` Vlastimil Babka
2021-08-17  9:31           ` Sebastian Andrzej Siewior
2021-08-17  9:31         ` Vlastimil Babka
2021-08-17  9:34           ` Sebastian Andrzej Siewior
2021-08-17  9:13     ` Vlastimil Babka
2021-08-17 10:14   ` Vlastimil Babka
2021-08-17 19:53     ` Andrew Morton
2021-08-18 11:52       ` Vlastimil Babka
2021-08-23 20:36         ` Thomas Gleixner
2021-08-17 15:39   ` Sebastian Andrzej Siewior
2021-08-17 15:41     ` Vlastimil Babka
2021-08-17 15:49       ` Sebastian Andrzej Siewior
2021-08-17 15:56   ` Vlastimil Babka
2021-08-05 16:42 ` [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Sebastian Andrzej Siewior
2021-08-06  5:14   ` Mike Galbraith [this message]
2021-08-06  5:14     ` Mike Galbraith
2021-08-06  7:45     ` Vlastimil Babka
2021-08-10 14:36 ` Vlastimil Babka
2021-08-15 10:18   ` Vlastimil Babka
2021-08-17 10:23     ` Vlastimil Babka
2021-08-17 15:59       ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a36ad3a060da4f02808b2692891afbe35292e15b.camel@gmx.de \
    --to=efault@gmx.de \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.