linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Jann Horn <jannh@google.com>
Subject: Re: [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible
Date: Fri, 06 Aug 2021 07:14:18 +0200	[thread overview]
Message-ID: <a36ad3a060da4f02808b2692891afbe35292e15b.camel@gmx.de> (raw)
In-Reply-To: <20210805164210.2zxpzn2sdogf4kx3@linutronix.de>

On Thu, 2021-08-05 at 18:42 +0200, Sebastian Andrzej Siewior wrote:
>
> There was throughput regression in RT compared to previous releases
> (without this series). The regression was (based on my testing) only
> visible in hackbench and was addressed by adding adaptiv spinning to
> RT-mutex. With that we almost back to what we had before :)

Numbers on my box say a throughput regression remains (silly fork bomb
scenario.. yawn), which can be recouped by either turning on all
SL[AU]B features or converting the list_lock to a raw lock.  They also
seem to be saying that if you turned on PREEMPT_RT because you care
about RT performance first and foremost (gee), you'll do neither of
those, because either will eliminate an RT performance progression.

	-Mike

numbers...

box is old i4790 desktop
perf stat -r10 hackbench -s4096 -l500
full warmup, record, repeat twice for elapsed

SLUB+SLUB_DEBUG only

begin previously reported numbers
5.14.0.g79e92006-tip-rt (5.12-rt based as before, 5.13-rt didn't yet exist)
          7,984.52 msec task-clock                #    7.565 CPUs utilized            ( +-  0.66% )
           353,566      context-switches          #   44.281 K/sec                    ( +-  2.77% )
            37,685      cpu-migrations            #    4.720 K/sec                    ( +-  6.37% )
            12,939      page-faults               #    1.620 K/sec                    ( +-  0.67% )
    29,901,079,227      cycles                    #    3.745 GHz                      ( +-  0.71% )
    14,550,797,818      instructions              #    0.49  insn per cycle           ( +-  0.47% )
     3,056,685,643      branches                  #  382.826 M/sec                    ( +-  0.51% )
         9,598,083      branch-misses             #    0.31% of all branches          ( +-  2.11% )

           1.05542 +- 0.00409 seconds time elapsed  ( +-  0.39% )
           1.05990 +- 0.00244 seconds time elapsed  ( +-  0.23% ) (repeat)
           1.05367 +- 0.00303 seconds time elapsed  ( +-  0.29% ) (repeat)

5.14.0.g79e92006-tip-rt +slub-local-lock-v2r3 -0034-mm-slub-convert-kmem_cpu_slab-protection-to-local_lock.patch
          6,899.35 msec task-clock                #    5.637 CPUs utilized            ( +-  0.53% )
           420,304      context-switches          #   60.919 K/sec                    ( +-  2.83% )
           187,130      cpu-migrations            #   27.123 K/sec                    ( +-  1.81% )
            13,206      page-faults               #    1.914 K/sec                    ( +-  0.96% )
    25,110,362,933      cycles                    #    3.640 GHz                      ( +-  0.49% )
    15,853,643,635      instructions              #    0.63  insn per cycle           ( +-  0.64% )
     3,366,261,524      branches                  #  487.910 M/sec                    ( +-  0.70% )
        14,839,618      branch-misses             #    0.44% of all branches          ( +-  2.01% )

           1.22390 +- 0.00744 seconds time elapsed  ( +-  0.61% )
           1.21813 +- 0.00907 seconds time elapsed  ( +-  0.74% ) (repeat)
           1.22097 +- 0.00952 seconds time elapsed  ( +-  0.78% ) (repeat)

repeat of above with raw list_lock
          8,072.62 msec task-clock                #    7.605 CPUs utilized            ( +-  0.49% )
           359,514      context-switches          #   44.535 K/sec                    ( +-  4.95% )
            35,285      cpu-migrations            #    4.371 K/sec                    ( +-  5.82% )
            13,503      page-faults               #    1.673 K/sec                    ( +-  0.96% )
    30,247,989,681      cycles                    #    3.747 GHz                      ( +-  0.52% )
    14,580,011,391      instructions              #    0.48  insn per cycle           ( +-  0.81% )
     3,063,743,405      branches                  #  379.523 M/sec                    ( +-  0.85% )
         8,907,160      branch-misses             #    0.29% of all branches          ( +-  3.99% )

           1.06150 +- 0.00427 seconds time elapsed  ( +-  0.40% )
           1.05041 +- 0.00176 seconds time elapsed  ( +-  0.17% ) (repeat)
           1.06086 +- 0.00237 seconds time elapsed  ( +-  0.22% ) (repeat)

5.14.0.g79e92006-rt3-tip-rt +slub-local-lock-v2r3 full set
          7,598.44 msec task-clock                #    5.813 CPUs utilized            ( +-  0.85% )
           488,161      context-switches          #   64.245 K/sec                    ( +-  4.29% )
           196,866      cpu-migrations            #   25.909 K/sec                    ( +-  1.49% )
            13,042      page-faults               #    1.716 K/sec                    ( +-  0.73% )
    27,695,116,746      cycles                    #    3.645 GHz                      ( +-  0.79% )
    18,423,934,168      instructions              #    0.67  insn per cycle           ( +-  0.88% )
     3,969,540,695      branches                  #  522.415 M/sec                    ( +-  0.92% )
        15,493,482      branch-misses             #    0.39% of all branches          ( +-  2.15% )

           1.30709 +- 0.00890 seconds time elapsed  ( +-  0.68% )
           1.3205 +- 0.0134 seconds time elapsed  ( +-  1.02% ) (repeat)
           1.3083 +- 0.0132 seconds time elapsed  ( +-  1.01% ) (repeat)
end previously reported numbers

5.14.0.gf6a71a5-rt6-tip-rt (same config, full slub set.. obviously)
          7,707.63 msec task-clock                #    5.880 CPUs utilized            ( +-  1.46% )
           562,533      context-switches          #   72.984 K/sec                    ( +-  7.46% )
           208,475      cpu-migrations            #   27.048 K/sec                    ( +-  2.26% )
            13,022      page-faults               #    1.689 K/sec                    ( +-  0.80% )
    28,025,004,779      cycles                    #    3.636 GHz                      ( +-  1.34% )
    18,487,135,489      instructions              #    0.66  insn per cycle           ( +-  1.58% )
     3,997,110,493      branches                  #  518.591 M/sec                    ( +-  1.65% )
        16,078,322      branch-misses             #    0.40% of all branches          ( +-  4.23% )

            1.3108 +- 0.0135 seconds time elapsed  ( +-  1.03% )
            1.2997 +- 0.0138 seconds time elapsed  ( +-  1.06% ) (repeat)
            1.3009 +- 0.0166 seconds time elapsed  ( +-  1.28% ) (repeat)

5.14.0.gf6a71a5-rt6-tip-rt +list_lock=raw_spinlock_t
          8,252.59 msec task-clock                #    7.584 CPUs utilized            ( +-  0.27% )
           400,991      context-switches          #   48.590 K/sec                    ( +-  6.15% )
            35,979      cpu-migrations            #    4.360 K/sec                    ( +-  5.63% )
            13,261      page-faults               #    1.607 K/sec                    ( +-  0.73% )
    30,910,310,737      cycles                    #    3.746 GHz                      ( +-  0.31% )
    16,522,383,240      instructions              #    0.53  insn per cycle           ( +-  0.92% )
     3,535,219,839      branches                  #  428.377 M/sec                    ( +-  0.96% )
        10,115,967      branch-misses             #    0.29% of all branches          ( +-  4.32% )

           1.08817 +- 0.00238 seconds time elapsed  ( +-  0.22% )
           1.08583 +- 0.00243 seconds time elapsed  ( +-  0.22% ) (repeat)
           1.09003 +- 0.00164 seconds time elapsed  ( +-  0.15% ) (repeat)

5.14.0.g251a152-rt6-master-rt (+SLAB_MERGE_DEFAULT,SLUB_CPU_PARTIAL,SLAB_FREELIST_RANDOM/HARDENED)
          8,170.48 msec task-clock                #    7.390 CPUs utilized            ( +-  0.43% )
           449,994      context-switches          #   55.076 K/sec                    ( +-  4.20% )
            55,912      cpu-migrations            #    6.843 K/sec                    ( +-  4.28% )
            13,144      page-faults               #    1.609 K/sec                    ( +-  0.53% )
    30,484,114,812      cycles                    #    3.731 GHz                      ( +-  0.44% )
    17,554,521,787      instructions              #    0.58  insn per cycle           ( +-  0.76% )
     3,751,725,852      branches                  #  459.181 M/sec                    ( +-  0.81% )
        13,421,985      branch-misses             #    0.36% of all branches          ( +-  2.40% )

           1.10563 +- 0.00382 seconds time elapsed  ( +-  0.35% )
           1.1098 +- 0.0147 seconds time elapsed  ( +-  1.32% ) (repeat)
           1.11308 +- 0.00883 seconds time elapsed  ( +-  0.79% ) (repeat)

5.14.0.gf6a71a5-rt6-tip-rt +SLAB_MERGE_DEFAULT,SLUB_CPU_PARTIAL,SLAB_FREELIST_RANDOM/HARDENED
          8,026.39 msec task-clock                #    7.320 CPUs utilized            ( +-  0.70% )
           496,579      context-switches          #   61.868 K/sec                    ( +-  6.78% )
            65,022      cpu-migrations            #    8.101 K/sec                    ( +-  8.29% )
            13,161      page-faults               #    1.640 K/sec                    ( +-  0.51% )
    29,870,954,733      cycles                    #    3.722 GHz                      ( +-  0.67% )
    17,617,522,235      instructions              #    0.59  insn per cycle           ( +-  1.36% )
     3,760,346,459      branches                  #  468.498 M/sec                    ( +-  1.45% )
        12,863,520      branch-misses             #    0.34% of all branches          ( +-  4.45% )

            1.0965 +- 0.0103 seconds time elapsed  ( +-  0.94% )
            1.08149 +- 0.00362 seconds time elapsed  ( +-  0.33% ) (repeat)
            1.10027 +- 0.00916 seconds time elapsed  ( +-  0.83% )

yup, perf delta == config delta, lets have a peek at jitter

cyclictest -Smqp99& perf stat -r100 hackbench -s4096 -l500 && killall cyclictest

5.14.0.gf6a71a5-rt6-tip-rt
SLUB+SLUB_DEBUG
T: 1 ( 5903) P:99 I:1500 C:  92330 Min:      1 Act:    2 Avg:    6 Max:      19
T: 2 ( 5904) P:99 I:2000 C:  69247 Min:      1 Act:    2 Avg:    6 Max:      21
T: 3 ( 5905) P:99 I:2500 C:  55395 Min:      1 Act:    3 Avg:    6 Max:      22
T: 4 ( 5906) P:99 I:3000 C:  46163 Min:      1 Act:    4 Avg:    7 Max:      22
T: 5 ( 5907) P:99 I:3500 C:  39568 Min:      1 Act:    3 Avg:    6 Max:      23
T: 6 ( 5909) P:99 I:4000 C:  34621 Min:      1 Act:    2 Avg:    7 Max:      22
T: 7 ( 5910) P:99 I:4500 C:  30774 Min:      1 Act:    3 Avg:    7 Max:      18

SLUB+SLUB_DEBUG+list_lock=raw_spinlock_t
T: 1 ( 4044) P:99 I:1500 C:  73340 Min:      1 Act:    3 Avg:   10 Max:      28
T: 2 ( 4045) P:99 I:2000 C:  55004 Min:      1 Act:    4 Avg:   10 Max:      33
T: 3 ( 4046) P:99 I:2500 C:  44002 Min:      1 Act:    2 Avg:   10 Max:      26
T: 4 ( 4047) P:99 I:3000 C:  36668 Min:      1 Act:    3 Avg:   10 Max:      24
T: 5 ( 4048) P:99 I:3500 C:  31429 Min:      1 Act:    3 Avg:   10 Max:      27
T: 6 ( 4049) P:99 I:4000 C:  27500 Min:      1 Act:    3 Avg:   11 Max:      30
T: 7 ( 4050) P:99 I:4500 C:  24444 Min:      1 Act:    4 Avg:   11 Max:      25

SLUB+SLUB_DEBUG+SLAB_MERGE_DEFAULT,SLUB_CPU_PARTIAL,SLAB_FREELIST_RANDOM/HARDENED
T: 1 ( 4036) P:99 I:1500 C:  74039 Min:      1 Act:    3 Avg:    9 Max:      31
T: 2 ( 4037) P:99 I:2000 C:  55528 Min:      1 Act:    3 Avg:   10 Max:      29
T: 3 ( 4038) P:99 I:2500 C:  44422 Min:      1 Act:    2 Avg:   10 Max:      31
T: 4 ( 4039) P:99 I:3000 C:  37017 Min:      1 Act:    2 Avg:    9 Max:      23
T: 5 ( 4040) P:99 I:3500 C:  31729 Min:      1 Act:    3 Avg:   10 Max:      29
T: 6 ( 4041) P:99 I:4000 C:  27762 Min:      1 Act:    2 Avg:    8 Max:      26
T: 7 ( 4042) P:99 I:4500 C:  24677 Min:      1 Act:    3 Avg:    9 Max:      27

conclusion: gee, pi both works and ain't free - ditto add more stuff=cycles :)





  reply	other threads:[~2021-08-06  5:15 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 15:19 [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 01/35] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 02/35] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 03/35] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 04/35] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 05/35] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 06/35] mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 07/35] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 08/35] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 09/35] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 10/35] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 11/35] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 12/35] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 13/35] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
2021-08-15 10:14   ` Vlastimil Babka
2021-08-15 10:22     ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 14/35] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 15/35] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 16/35] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 17/35] mm, slub: check new pages with restored irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 18/35] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 19/35] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 20/35] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 21/35] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 22/35] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 23/35] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 24/35] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 25/35] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 26/35] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 27/35] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 28/35] mm, slab: make flush_slab() possible to call with irqs enabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
2021-08-09 13:41   ` Qian Cai
2021-08-09 18:44     ` Mike Galbraith
2021-08-09 20:08       ` Vlastimil Babka
2021-08-09 22:13         ` Qian Cai
2021-08-10  1:07         ` Mike Galbraith
2021-08-10  9:03     ` Vlastimil Babka
2021-08-10 11:47       ` Mike Galbraith
2021-08-10 20:31         ` Paul E. McKenney
2021-08-10 22:36           ` Vlastimil Babka
2021-08-10 23:53             ` Paul E. McKenney
2021-08-11 14:17               ` Paul E. McKenney
2021-08-10 20:25       ` Paul E. McKenney
2021-08-10 14:33     ` Vlastimil Babka
2021-08-11  1:42       ` Qian Cai
2021-08-11  8:55       ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 30/35] mm: slub: Make object_map_lock a raw_spinlock_t Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 31/35] mm, slub: optionally save/restore irqs in slab_[un]lock()/ Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 32/35] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 34/35] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
2021-08-05 15:20 ` [PATCH v4 35/35] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
2021-08-15 12:27   ` Sven Eckelmann
2021-08-17  8:37     ` Vlastimil Babka
2021-08-17  9:12       ` Sebastian Andrzej Siewior
2021-08-17  9:17         ` Vlastimil Babka
2021-08-17  9:31           ` Sebastian Andrzej Siewior
2021-08-17  9:31         ` Vlastimil Babka
2021-08-17  9:34           ` Sebastian Andrzej Siewior
2021-08-17  9:13     ` Vlastimil Babka
2021-08-17 10:14   ` Vlastimil Babka
2021-08-17 19:53     ` Andrew Morton
2021-08-18 11:52       ` Vlastimil Babka
2021-08-23 20:36         ` Thomas Gleixner
2021-08-17 15:39   ` Sebastian Andrzej Siewior
2021-08-17 15:41     ` Vlastimil Babka
2021-08-17 15:49       ` Sebastian Andrzej Siewior
2021-08-17 15:56   ` Vlastimil Babka
2021-08-05 16:42 ` [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Sebastian Andrzej Siewior
2021-08-06  5:14   ` Mike Galbraith [this message]
2021-08-06  7:45     ` Vlastimil Babka
2021-08-10 14:36 ` Vlastimil Babka
2021-08-15 10:18   ` Vlastimil Babka
2021-08-17 10:23     ` Vlastimil Babka
2021-08-17 15:59       ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a36ad3a060da4f02808b2692891afbe35292e15b.camel@gmx.de \
    --to=efault@gmx.de \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).