linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ANNOUNCE] v5.13-rt1
@ 2021-07-08  9:42 Thomas Gleixner
  2021-07-09  5:20 ` [patch] mm/slub: Fix kmem_cache_alloc_bulk() error path Mike Galbraith
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Thomas Gleixner @ 2021-07-08  9:42 UTC (permalink / raw)
  To: LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

Dear RT folks!

I'm pleased to announce the v5.13-rt1 patch set.

Changes since v5.12-rc3-rt3:

  - Fast forward to v5.13

  - Rework of the locking core bits

  - Rework of large parts of the mm bits. Thanks to Mel Gorman and
    Vlastimil Babka for picking this up and polishing it with -mm
    wizardry.

  - The latest respin of the printk overhaul from John Ogness

  - Patch queue reordered

Known issues
  - config dependent build fails on ARM (also in plain v5.13)
  - netconsole triggers WARN.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v5.13-rt1

The RT patch against v5.13 can be found here:

  https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.13/patch-5.13-rt1.patch.xz

The split quilt queue is available at:

  https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.13/patches-5.13-rt1.tar.xz

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch] mm/slub: Fix kmem_cache_alloc_bulk() error path
  2021-07-08  9:42 [ANNOUNCE] v5.13-rt1 Thomas Gleixner
@ 2021-07-09  5:20 ` Mike Galbraith
  2021-07-09  5:20 ` [patch] mm/slub: Replace do_slab_free() local_lock_irqsave/restore() calls in PREEMPT_RT scope Mike Galbraith
  2021-07-09  5:21 ` [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope Mike Galbraith
  2 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-09  5:20 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior


The kmem_cache_alloc_bulk() error exit path double unlocks cpu_slab->lock
instead of making the required slub_put_cpu_ptr() call.  Fix that.

Boring details:
1. 12c69bab1ece ("mm, slub: move disabling/enabling irqs to ___slab_alloc()")
adds local_irq_enable() above goto error, leaving the one at error: intact.
2. 2180da7ea70a0 ("mm, slub: use migrate_disable() on PREEMPT_RT") adds
slub_get/put_cpu_ptr() calls, but misses the already broken error path,
leaving a slub_get_cpu_ptr() without its matching slub_put_cpu_ptr().
3. 340e7c4136c3 ("mm, slub: convert kmem_cpu_slab protection to local_lock")
converts local_irq_enable() to local_unlock_irq(), culminating in a
double unlock and an unpaired slub_get_cpu_ptr()/slub_put_cpu_ptr().

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3605,7 +3605,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
 				slab_want_init_on_alloc(flags, s));
 	return i;
 error:
-	local_unlock_irq(&s->cpu_slab->lock);
+	slub_put_cpu_ptr(s->cpu_slab);
 	slab_post_alloc_hook(s, objcg, flags, i, p, false);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch] mm/slub: Replace do_slab_free() local_lock_irqsave/restore() calls in PREEMPT_RT scope
  2021-07-08  9:42 [ANNOUNCE] v5.13-rt1 Thomas Gleixner
  2021-07-09  5:20 ` [patch] mm/slub: Fix kmem_cache_alloc_bulk() error path Mike Galbraith
@ 2021-07-09  5:20 ` Mike Galbraith
  2021-07-09  5:21 ` [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope Mike Galbraith
  2 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-09  5:20 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior


IRQ-affecting local lock operations are intended for code with dual
RT/!RT scope, where they only actually affect IRQs in !RT scope.  This
do_slab_free() section is PREEMPT_RT-only, so replace its misplaced
local_lock_irqsave()/local_unlock_irqrestore() calls with plain
local_lock()/local_unlock().  Purely cosmetic.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/slub.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3362,13 +3362,12 @@ static __always_inline void do_slab_free
 		 * we need to take the local_lock. We shouldn't simply defer to
 		 * __slab_free() as that wouldn't use the cpu freelist at all.
 		 */
-		unsigned long flags;
 		void **freelist;

-		local_lock_irqsave(&s->cpu_slab->lock, flags);
+		local_lock(&s->cpu_slab->lock);
 		c = this_cpu_ptr(s->cpu_slab);
 		if (unlikely(page != c->page)) {
-			local_unlock_irqrestore(&s->cpu_slab->lock, flags);
+			local_unlock(&s->cpu_slab->lock);
 			goto redo;
 		}
 		tid = c->tid;
@@ -3378,7 +3377,7 @@ static __always_inline void do_slab_free
 		c->freelist = head;
 		c->tid = next_tid(tid);

-		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
+		local_unlock(&s->cpu_slab->lock);
 #endif
 		stat(s, FREE_FASTPATH);
 	} else


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-08  9:42 [ANNOUNCE] v5.13-rt1 Thomas Gleixner
  2021-07-09  5:20 ` [patch] mm/slub: Fix kmem_cache_alloc_bulk() error path Mike Galbraith
  2021-07-09  5:20 ` [patch] mm/slub: Replace do_slab_free() local_lock_irqsave/restore() calls in PREEMPT_RT scope Mike Galbraith
@ 2021-07-09  5:21 ` Mike Galbraith
  2021-07-09  5:25   ` Mike Galbraith
  2021-07-09 19:28   ` Thomas Gleixner
  2 siblings, 2 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-09  5:21 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

Well, bug report in patch for actually.

Empirical evidence that PREEMPT_RT exclusion around unfreeze_partials()
is buggy lies in the readily reproducible PREEMPT_RT+SLUB_CPU_PARTIAL=y
explosion below, which this patch precludes.  Slub expertise required.

Dirt simple reproduction method:

terminal1:
while ! [ -f "/stop" ]; do
	./runltp -f zram; hackbench.sh; ./runltp -f controllers;
done

terminal2:
while ! [ -f "/stop" ]; do
	ccache -C; make clean; make -j8;
done

Wait for it...

[ 1321.540157] general protection fault, maybe for address 0xffffea0004624ea8: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 1321.540162] CPU: 3 PID: 18442 Comm: dd Kdump: loaded Tainted: G            E     5.13.1-rt1-rt #5
[ 1321.540165] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 1321.540166] RIP: 0010:___slab_alloc.constprop.95+0x102/0xb00
[ 1321.540172] Code: 18 4d 85 db 66 89 55 b8 0f 95 c2 83 e0 7f c1 e2 07 09 d0 88 45 bb 41 f6 45 0b 40 4c 8b 65 b8 74 19 4c 89 d8 4c 89 f2 4c 89 e1 <f0> 49 0f c7 4f 20 0f 84 c6 00 00 00 f3 90 eb ab 4c 89 5d 80 9c 8f
[ 1321.540174] RSP: 0018:ffff8883008b3b70 EFLAGS: 00010202
[ 1321.540176] RAX: 0000000000190015 RBX: 0000000000000000 RCX: 00000001ffff7fff
[ 1321.540177] RDX: 00000001ffffffff RSI: 0000000000000023 RDI: ffffffff81e52c48
[ 1321.540179] RBP: ffff8883008b3c50 R08: ffffffffffffffff R09: 0000000000000000
[ 1321.540180] R10: ffff8883008b3c70 R11: 0000000000190015 R12: 00000001ffff7fff
[ 1321.540181] R13: ffff88810019dd00 R14: 00000001ffffffff R15: ffffea0004624e88
[ 1321.540182] FS:  00007fb763c4b580(0000) GS:ffff88840ecc0000(0000) knlGS:0000000000000000
[ 1321.540184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1321.540185] CR2: 00007fb763c8b000 CR3: 0000000193f2c002 CR4: 00000000001706e0
[ 1321.540187] Call Trace:
[ 1321.540191]  ? __alloc_file+0x26/0xe0
[ 1321.540196]  ? migrate_enable+0x9c/0x100
[ 1321.540201]  ? __alloc_file+0x26/0xe0
[ 1321.540204]  ? __slab_alloc.isra.81.constprop.94+0x3d/0x50
[ 1321.540206]  ? __alloc_file+0x26/0xe0
[ 1321.540208]  __slab_alloc.isra.81.constprop.94+0x3d/0x50
[ 1321.540210]  ? __alloc_file+0x26/0xe0
[ 1321.540212]  kmem_cache_alloc+0xba/0x450
[ 1321.540215]  __alloc_file+0x26/0xe0
[ 1321.540218]  alloc_empty_file+0x43/0xe0
[ 1321.540221]  path_openat+0x35/0xe30
[ 1321.540224]  ? ___slab_alloc.constprop.95+0x48e/0xb00
[ 1321.540227]  ? filemap_map_pages+0xf0/0x3e0
[ 1321.540230]  ? getname_flags+0x32/0x170
[ 1321.540233]  do_filp_open+0xa2/0x100
[ 1321.540237]  ? getname_flags+0x32/0x170
[ 1321.540240]  ? migrate_enable+0x9c/0x100
[ 1321.540242]  ? __slab_alloc.isra.81.constprop.94+0x45/0x50
[ 1321.540245]  ? alloc_fd+0xe2/0x1b0
[ 1321.540249]  ? do_sys_openat2+0x248/0x310
[ 1321.540250]  do_sys_openat2+0x248/0x310
[ 1321.540253]  do_sys_open+0x47/0x60
[ 1321.540255]  do_syscall_64+0x39/0x80
[ 1321.540259]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1321.540262] RIP: 0033:0x7fb76378cb41
[ 1321.540264] Code: 41 83 e2 40 48 89 54 24 30 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 4c 24 18 64 48 33 0c 25 28 00 00 00
[ 1321.540266] RSP: 002b:00007fff10f2ad30 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[ 1321.540268] RAX: ffffffffffffffda RBX: 0000563c1c09e170 RCX: 00007fb76378cb41
[ 1321.540269] RDX: 0000000000080000 RSI: 0000563c1c09e140 RDI: 00000000ffffff9c
[ 1321.540271] RBP: 00007fff10f2ae70 R08: 0000000000000000 R09: 00007fff10f2ae83
[ 1321.540272] R10: 0000000000000000 R11: 0000000000000287 R12: 0000563c1c09df48
[ 1321.540273] R13: 000000000000000a R14: 0000563c1c09df20 R15: 000000000000000a
[ 1321.540275] Modules linked in: zram(E) sr_mod(E) cdrom(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) xfs(E) libcrc32c(E) af_packet(E) ip6table_mangle(E) ip6table_raw(E) iptable_raw(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nfnetlink(E) ebtable_filter(E) rfkill(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) bpfilter(E) joydev(E) usblp(E) intel_rapl_msr(E) mei_hdcp(E) at24(E) regmap_i2c(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) nls_iso8859_1(E) nls_cp437(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) irqbypass(E) snd_hda_intel(E) crct10dif_pclmul(E) snd_intel_dspcfg(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_hda_codec(E) aesni_intel(E) snd_hwdep(E) crypto_simd(E) snd_hda_core(E) cryptd(E) r8169(E) snd_pcm(E) snd_timer(E) realtek(E) mei_me(E) mdio_devres(E) i2c_i801(E)
[ 1321.540320]  lpc_ich(E) snd(E) pcspkr(E) i2c_smbus(E) mfd_core(E) libphy(E) soundcore(E) mei(E) fan(E) thermal(E) intel_smartconnect(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sch_fq_codel(E) sunrpc(E) fuse(E) configfs(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) drm_ttm_helper(E) ttm(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) cec(E) xhci_pci(E) libahci(E) rc_core(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E) libata(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) t10_pi(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_ring(E) virtio(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) msr(E) autofs4(E) [last unloaded: zram]
[ 1321.540366] Dumping ftrace buffer:
[ 1321.540369]    (ftrace buffer empty)

Not-signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/slub.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2497,7 +2497,9 @@ static void put_cpu_partial(struct kmem_
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
+				local_lock(&s->cpu_slab->lock);
 				unfreeze_partials(s);
+				local_unlock(&s->cpu_slab->lock);
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2579,7 +2581,9 @@ static void flush_cpu_slab(struct work_s
 	if (c->page)
 		flush_slab(s, c, true);
 
+	local_lock(&s->cpu_slab->lock);
 	unfreeze_partials(s);
+	local_unlock(&s->cpu_slab->lock);
 }
 
 static bool has_cpu_slab(int cpu, struct kmem_cache *s)
@@ -2632,8 +2636,11 @@ static int slub_cpu_dead(unsigned int cp
 	struct kmem_cache *s;
 
 	mutex_lock(&slab_mutex);
-	list_for_each_entry(s, &slab_caches, list)
+	list_for_each_entry(s, &slab_caches, list) {
+		local_lock(&s->cpu_slab->lock);
 		__flush_cpu_slab(s, cpu);
+		local_unlock(&s->cpu_slab->lock);
+	}
 	mutex_unlock(&slab_mutex);
 	return 0;
 }


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-09  5:21 ` [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope Mike Galbraith
@ 2021-07-09  5:25   ` Mike Galbraith
  2021-07-09 19:28   ` Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-09  5:25 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

On Fri, 2021-07-09 at 07:21 +0200, Mike Galbraith wrote:
> Well, bug report in patch for actually.
                            ^^^form


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-09  5:21 ` [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope Mike Galbraith
  2021-07-09  5:25   ` Mike Galbraith
@ 2021-07-09 19:28   ` Thomas Gleixner
  2021-07-10  1:12     ` Mike Galbraith
  1 sibling, 1 reply; 36+ messages in thread
From: Thomas Gleixner @ 2021-07-09 19:28 UTC (permalink / raw)
  To: Mike Galbraith, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

On Fri, Jul 09 2021 at 07:21, Mike Galbraith wrote:
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2497,7 +2497,9 @@ static void put_cpu_partial(struct kmem_
>  				 * partial array is full. Move the existing
>  				 * set to the per node partial list.
>  				 */
> +				local_lock(&s->cpu_slab->lock);
>  				unfreeze_partials(s);
> +				local_unlock(&s->cpu_slab->lock);
>  				oldpage = NULL;
>  				pobjects = 0;
>  				pages = 0;
> @@ -2579,7 +2581,9 @@ static void flush_cpu_slab(struct work_s
>  	if (c->page)
>  		flush_slab(s, c, true);
>  
> +	local_lock(&s->cpu_slab->lock);
>  	unfreeze_partials(s);
> +	local_unlock(&s->cpu_slab->lock);
>  }
>  
>  static bool has_cpu_slab(int cpu, struct kmem_cache *s)
> @@ -2632,8 +2636,11 @@ static int slub_cpu_dead(unsigned int cp
>  	struct kmem_cache *s;
>  
>  	mutex_lock(&slab_mutex);
> -	list_for_each_entry(s, &slab_caches, list)
> +	list_for_each_entry(s, &slab_caches, list) {
> +		local_lock(&s->cpu_slab->lock);

This one is odd. It locks the cpu_slab lock of the CPU which runs this
callback and then flushes the slab of the dead CPU.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-09 19:28   ` Thomas Gleixner
@ 2021-07-10  1:12     ` Mike Galbraith
  2021-07-15 16:34       ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-10  1:12 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

On Fri, 2021-07-09 at 21:28 +0200, Thomas Gleixner wrote:
> On Fri, Jul 09 2021 at 07:21, Mike Galbraith wrote:
> > static bool has_cpu_slab(int cpu, struct kmem_cache *s)
> > @@ -2632,8 +2636,11 @@ static int slub_cpu_dead(unsigned int cp
> >  	struct kmem_cache *s;
> >
> >  	mutex_lock(&slab_mutex);
> > -	list_for_each_entry(s, &slab_caches, list)
> > +	list_for_each_entry(s, &slab_caches, list) {
> > +		local_lock(&s->cpu_slab->lock);
>
> This one is odd. It locks the cpu_slab lock of the CPU which runs
> this callback and then flushes the slab of the dead CPU.

That spot I put back only because it used to exist, ergo I documented
it as an afterthought.  Yeah, it clearly has nada to do with the
explosions, those were stopped by the other two as previously reported.

	-Mike


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-10  1:12     ` Mike Galbraith
@ 2021-07-15 16:34       ` Mike Galbraith
  2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
  2021-07-20  8:56         ` [rfc/patch] " Vlastimil Babka
  0 siblings, 2 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-15 16:34 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

Greetings crickets,

Methinks the problem is the hole these patches opened only for RT.

static void put_cpu_partial(struct kmem_cache *s, struct page *page,
int drain)
{
#ifdef CONFIG_SLUB_CPU_PARTIAL
	struct page *oldpage;
	int pages;
	int pobjects;

	slub_get_cpu_ptr(s->cpu_slab);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That is an assertion that the stuff under it is preempt safe for RT and
ONLY RT.  My box says the RT portion of that assertion is horse pookey.
What it does is let kmem_cache_free()/kfree() blast straight through
___slab_alloc() critical sections, swapping out ->partial underneath
it.  Sprinkling percpu fingerprint powder about, I saw lots of
___slab_alloc() preempting put_cpu_partial() and vice versa.  I don't
think it's a coincidence that ___slab_alloc() and __unfreeze_partials()
both explode trying to access page->freelist.  You've seen the former,
here's one of the latter.

crash> bt -sx
PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
 #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
 #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
 #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
 #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
 #4 [ffff88818f8ffab8] exc_general_protection+0x195 at ffffffff8179fdb5
 #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at ffffffff81800a0e
    [exception RIP: __unfreeze_partials+156]
    RIP: ffffffff81248bac  RSP: ffff88818f8ffc00  RFLAGS: 00010202
    RAX: 0000000000200002  RBX: 0000000000200002  RCX: 000000017fffffff
    RDX: 00000001ffffffff  RSI: 0000000000000202  RDI: ffff888100040b80
    RBP: ffff88818f8ffca0   R8: ffff88818f9cba30   R9: 0000000000000001
    R10: ffff88818f8ffcc0  R11: 0000000000000000  R12: ffff888100043700
    R13: ffff888100040b80  R14: 00000001ffffffff  R15: ffffea00046c0808
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88818f8ffcb8] put_cpu_partial+0x8e at ffffffff81248ece
 #7 [ffff88818f8ffcd8] consume_skb+0x2b at ffffffff815eddeb
 #8 [ffff88818f8ffce8] unix_stream_read_generic+0x788 at ffffffff8170b238
 #9 [ffff88818f8ffdc0] unix_stream_recvmsg+0x43 at ffffffff8170b433
#10 [ffff88818f8ffdf8] sock_read_iter+0x104 at ffffffff815dd554
#11 [ffff88818f8ffe68] new_sync_read+0x16a at ffffffff812674fa
#12 [ffff88818f8ffee8] vfs_read+0x1ae at ffffffff81269c8e
#13 [ffff88818f8fff00] ksys_read+0x40 at ffffffff8126a070
#14 [ffff88818f8fff38] do_syscall_64+0x37 at ffffffff8179f5e7
#15 [ffff88818f8fff50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8180007c
    RIP: 00007fbda4438f2e  RSP: 00007fff3bf9d798  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007fff3bf9e7a0  RCX: 00007fbda4438f2e
    RDX: 0000000000001000  RSI: 00007fff3bf9d7a0  RDI: 0000000000000007
    RBP: 00007fff3bf9e800   R8: 00007fff3bf9e6e0   R9: 00007fbda4641010
    R10: 00007fbda4428028  R11: 0000000000000246  R12: 0000000000001000
crash> dis __unfreeze_partials+156
0xffffffff81248bac <__unfreeze_partials+156>:   lock cmpxchg16b 0x20(%r15)
crash> gdb list *__unfreeze_partials+156
0xffffffff81248bac is in __unfreeze_partials (mm/slub.c:475).
470             if (!disable_irqs)
471                     lockdep_assert_irqs_disabled();
472     #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
473         defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
474             if (s->flags & __CMPXCHG_DOUBLE) {
475                     if (cmpxchg_double(&page->freelist, &page->counters,
476                                        freelist_old, counters_old,
477                                        freelist_new, counters_new))
478                             return true;
479             } else
crash> kmem ffffea00046c0808
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea00104b3000 412cc0000                0        c  2 8000000000003000 reserved,private
crash>

Regarding $subject, Lockdep thinks my hole plugging skills suck rocks
(shrug, I've given it plentiful cause) but that's ok, I sometimes think
the same of its bug reporting skills :)

[    2.459456] ============================================
[    2.459456] WARNING: possible recursive locking detected
[    2.459457] 5.14.0.g79e92006-tip-rt #83 Tainted: G            E
[    2.459458] --------------------------------------------
[    2.459458] rcuc/7/56 is trying to acquire lock:
[    2.459459] ffff88840edf01c0 (&(&c->lock)->lock){+.+.}-{0:0}, at: kfree+0x280/0x670
[    2.459466]
               but task is already holding lock:
[    2.459466] ffff88840edf4d60 (&(&c->lock)->lock){+.+.}-{0:0}, at: __slab_free+0x4d6/0x600
[    2.459469]

live kernel snooping....

crash> ps | grep UN
crash> bt -sx 56
PID: 56     TASK: ffff888100c19a40  CPU: 7   COMMAND: "rcuc/7"
 #0 [ffff888100c63e40] __schedule+0x2eb at ffffffff818969fb
 #1 [ffff888100c63ec8] schedule+0x3b at ffffffff8189723b
 #2 [ffff888100c63ee0] smpboot_thread_fn+0x18c at ffffffff810a90dc
 #3 [ffff888100c63f18] kthread+0x1af at ffffffff810a27bf
 #4 [ffff888100c63f50] ret_from_fork+0x1f at ffffffff81001cbf
crash> task_struct ffff888100c19a40 | grep __state
  __state = 1,
crash> gdb list *__slab_free+0x4d6
0xffffffff812923c6 is in __slab_free (mm/slub.c:2568).
2563                                    /*
2564                                     * partial array is full. Move the existing
2565                                     * set to the per node partial list.
2566                                     */
2567                                    local_lock(&s->cpu_slab->lock);
2568                                    unfreeze_partials(s);
2569                                    local_unlock(&s->cpu_slab->lock);
2570                                    oldpage = NULL;
2571                                    pobjects = 0;
2572                                    pages = 0;
crash> gdb list *kfree+0x280
0xffffffff81292770 is in kfree (mm/slub.c:3442).
3437                     * __slab_free() as that wouldn't use the cpu freelist at all.
3438                     */
3439                    void **freelist;
3440
3441                    local_lock(&s->cpu_slab->lock);
3442                    c = this_cpu_ptr(s->cpu_slab);
3443                    if (unlikely(page != c->page)) {
3444                            local_unlock(&s->cpu_slab->lock);
3445                            goto redo;
3446                    }
crash>

Check, those are what's in the stacktrace below... but the allegedly
deadlocked-for-real task is very much alive, as is the rest of the box.

               other info that might help us debug this:
[    2.459470]  Possible unsafe locking scenario:

[    2.459470]        CPU0
[    2.459470]        ----
[    2.459471]   lock(&(&c->lock)->lock);
[    2.459471]   lock(&(&c->lock)->lock);
[    2.459472]
                *** DEADLOCK ***

[    2.459472]  May be due to missing lock nesting notation

[    2.459472] 6 locks held by rcuc/7/56:
[    2.459473]  #0: ffff88840edd9820 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xc3/0x190
[    2.459479]  #1: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xd0
[    2.459484]  #2: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0xa0/0x190
[    2.459487]  #3: ffffffff82382ac0 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x195/0x520
[    2.459490]  #4: ffff88840edf4d60 (&(&c->lock)->lock){+.+.}-{0:0}, at: __slab_free+0x4d6/0x600
[    2.459493]  #5: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xd0
[    2.459496]
               stack backtrace:
[    2.459497] CPU: 7 PID: 56 Comm: rcuc/7 Tainted: G            E     5.14.0.g79e92006-tip-rt #83
[    2.459498] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    2.459499] Call Trace:
[    2.459501]  dump_stack_lvl+0x44/0x57
[    2.459504]  __lock_acquire+0xb7f/0x1ab0
[    2.459508]  lock_acquire+0x2a6/0x340
[    2.459510]  ? kfree+0x280/0x670
[    2.459513]  ? __free_slab+0xa4/0x1b0
[    2.459515]  rt_spin_lock+0x2a/0xd0
[    2.459516]  ? kfree+0x280/0x670                   somewhere in the multiverse lies a dead rcuc/7
[    2.459518]  kfree+0x280/0x670                 <== local_lock(&s->cpu_slab->lock); #2
[    2.459521]  __free_slab+0xa4/0x1b0            ==> kfree(page_objcgs(page))
[    2.459523]  __unfreeze_partials+0x1d8/0x330   ==> discard_slab(s, page);
[    2.459526]  ? _raw_spin_unlock_irqrestore+0x30/0x80
[    2.459530]  ? __slab_free+0x4de/0x600
[    2.459532]  __slab_free+0x4de/0x600           <== local_lock(&s->cpu_slab->lock); #1
[    2.459534]  ? find_held_lock+0x2d/0x90
[    2.459536]  ? kmem_cache_free+0x276/0x630
[    2.459539]  ? rcu_do_batch+0x1c3/0x520
[    2.459540]  ? kmem_cache_free+0x364/0x630
[    2.459542]  kmem_cache_free+0x364/0x630
[    2.459544]  ? rcu_do_batch+0x195/0x520
[    2.459546]  rcu_do_batch+0x1c3/0x520
[    2.459547]  ? rcu_do_batch+0x195/0x520
[    2.459550]  ? rcu_cpu_kthread+0x7e/0x4b0
[    2.459552]  ? rcu_cpu_kthread+0x57/0x4b0
[    2.459553]  rcu_core+0x2c3/0x590
[    2.459555]  ? rcu_cpu_kthread+0x78/0x4b0
[    2.459557]  ? rcu_cpu_kthread+0x7e/0x4b0
[    2.459558]  ? rcu_cpu_kthread+0x57/0x4b0
[    2.459560]  rcu_cpu_kthread+0xc2/0x4b0
[    2.459562]  ? smpboot_thread_fn+0x23/0x300
[    2.459565]  smpboot_thread_fn+0x249/0x300
[    2.459567]  ? smpboot_register_percpu_thread+0xe0/0xe0
[    2.459569]  kthread+0x1af/0x1d0
[    2.459571]  ? set_kthread_struct+0x40/0x40
[    2.459574]  ret_from_fork+0x1f/0x30


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-15 16:34       ` Mike Galbraith
@ 2021-07-17 14:58         ` Mike Galbraith
  2021-07-18  7:58           ` Vlastimil Babka
                             ` (3 more replies)
  2021-07-20  8:56         ` [rfc/patch] " Vlastimil Babka
  1 sibling, 4 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-17 14:58 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

On Thu, 2021-07-15 at 18:34 +0200, Mike Galbraith wrote:
> Greetings crickets,
>
> Methinks the problem is the hole these patches opened only for RT.
>
> static void put_cpu_partial(struct kmem_cache *s, struct page *page,
> int drain)
> {
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> 	struct page *oldpage;
> 	int pages;
> 	int pobjects;
>
> 	slub_get_cpu_ptr(s->cpu_slab);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bah, I'm tired of waiting to see what if anything mm folks do about
this little bugger, so I'm gonna step on it my damn self and be done
with it.  Fly or die little patchlet.

mm/slub: restore/expand unfreeze_partials() local exclusion scope

2180da7ea70a ("mm, slub: use migrate_disable() on PREEMPT_RT") replaced
preempt_disable() in put_cpu_partial() with migrate_disable(), which when
combined with ___slab_alloc() having become preemptible, leads to
kmem_cache_free()/kfree() blowing through ___slab_alloc() unimpeded,
and vice versa, resulting in PREEMPT_RT exclusive explosions in both
paths while stress testing with both SLUB_CPU_PARTIAL/MEMCG enabled,
___slab_alloc() during allocation (duh), and __unfreeze_partials()
during free, both while accessing an unmapped page->freelist.

Serialize put_cpu_partial()/unfreeze_partials() on cpu_slab->lock to
ensure that alloc/free paths cannot pluck cpu_slab->partial out from
underneath each other unconstrained.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: 2180da7ea70a ("mm, slub: use migrate_disable() on PREEMPT_RT")
---
 mm/slub.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2418,6 +2418,17 @@ static void __unfreeze_partials(struct k
 	if (n)
 		spin_unlock_irqrestore(&n->list_lock, flags);

+	/*
+	 * If we got here via __slab_free() -> put_cpu_partial(),
+	 * memcg_free_page_obj_cgroups() -> kfree() may send us
+	 * back to __slab_free() -> put_cpu_partial() for an
+	 * unfortunate second encounter with cpu_slab->lock.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled()) {
+		lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
+		local_unlock(&s->cpu_slab->lock);
+	}
+
 	while (discard_page) {
 		page = discard_page;
 		discard_page = discard_page->next;
@@ -2426,6 +2437,9 @@ static void __unfreeze_partials(struct k
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
 	}
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled())
+		local_lock(&s->cpu_slab->lock);
 }

 /*
@@ -2497,7 +2511,9 @@ static void put_cpu_partial(struct kmem_
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
+				local_lock(&s->cpu_slab->lock);
 				unfreeze_partials(s);
+				local_unlock(&s->cpu_slab->lock);
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2579,7 +2595,9 @@ static void flush_cpu_slab(struct work_s
 	if (c->page)
 		flush_slab(s, c, true);

+	local_lock(&s->cpu_slab->lock);
 	unfreeze_partials(s);
+	local_unlock(&s->cpu_slab->lock);
 }

 static bool has_cpu_slab(int cpu, struct kmem_cache *s)
@@ -2632,8 +2650,11 @@ static int slub_cpu_dead(unsigned int cp
 	struct kmem_cache *s;

 	mutex_lock(&slab_mutex);
-	list_for_each_entry(s, &slab_caches, list)
+	list_for_each_entry(s, &slab_caches, list) {
+		local_lock(&s->cpu_slab->lock);
 		__flush_cpu_slab(s, cpu);
+		local_unlock(&s->cpu_slab->lock);
+	}
 	mutex_unlock(&slab_mutex);
 	return 0;
 }


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
@ 2021-07-18  7:58           ` Vlastimil Babka
  2021-07-18  8:11             ` Mike Galbraith
  2021-07-18 15:43           ` Mike Galbraith
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-18  7:58 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/17/21 4:58 PM, Mike Galbraith wrote:
> On Thu, 2021-07-15 at 18:34 +0200, Mike Galbraith wrote:
>> Greetings crickets,
>>
>> Methinks the problem is the hole these patches opened only for RT.
>>
>> static void put_cpu_partial(struct kmem_cache *s, struct page *page,
>> int drain)
>> {
>> #ifdef CONFIG_SLUB_CPU_PARTIAL
>> 	struct page *oldpage;
>> 	int pages;
>> 	int pobjects;
>>
>> 	slub_get_cpu_ptr(s->cpu_slab);
>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Bah, I'm tired of waiting to see what if anything mm folks do about
> this little bugger, so I'm gonna step on it my damn self and be done
> with it.  Fly or die little patchlet.

Sorry, Mike. Got some badly timed sickness. Will check ASAP.

> mm/slub: restore/expand unfreeze_partials() local exclusion scope
> 
> 2180da7ea70a ("mm, slub: use migrate_disable() on PREEMPT_RT") replaced
> preempt_disable() in put_cpu_partial() with migrate_disable(), which when
> combined with ___slab_alloc() having become preemptible, leads to
> kmem_cache_free()/kfree() blowing through ___slab_alloc() unimpeded,
> and vice versa, resulting in PREEMPT_RT exclusive explosions in both
> paths while stress testing with both SLUB_CPU_PARTIAL/MEMCG enabled,
> ___slab_alloc() during allocation (duh), and __unfreeze_partials()
> during free, both while accessing an unmapped page->freelist.
> 
> Serialize put_cpu_partial()/unfreeze_partials() on cpu_slab->lock to
> ensure that alloc/free paths cannot pluck cpu_slab->partial out from
> underneath each other unconstrained.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Fixes: 2180da7ea70a ("mm, slub: use migrate_disable() on PREEMPT_RT")
> ---
>  mm/slub.c |   23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2418,6 +2418,17 @@ static void __unfreeze_partials(struct k
>  	if (n)
>  		spin_unlock_irqrestore(&n->list_lock, flags);
> 
> +	/*
> +	 * If we got here via __slab_free() -> put_cpu_partial(),
> +	 * memcg_free_page_obj_cgroups() -> kfree() may send us
> +	 * back to __slab_free() -> put_cpu_partial() for an
> +	 * unfortunate second encounter with cpu_slab->lock.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled()) {
> +		lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
> +		local_unlock(&s->cpu_slab->lock);
> +	}
> +
>  	while (discard_page) {
>  		page = discard_page;
>  		discard_page = discard_page->next;
> @@ -2426,6 +2437,9 @@ static void __unfreeze_partials(struct k
>  		discard_slab(s, page);
>  		stat(s, FREE_SLAB);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled())
> +		local_lock(&s->cpu_slab->lock);
>  }
> 
>  /*
> @@ -2497,7 +2511,9 @@ static void put_cpu_partial(struct kmem_
>  				 * partial array is full. Move the existing
>  				 * set to the per node partial list.
>  				 */
> +				local_lock(&s->cpu_slab->lock);
>  				unfreeze_partials(s);
> +				local_unlock(&s->cpu_slab->lock);
>  				oldpage = NULL;
>  				pobjects = 0;
>  				pages = 0;
> @@ -2579,7 +2595,9 @@ static void flush_cpu_slab(struct work_s
>  	if (c->page)
>  		flush_slab(s, c, true);
> 
> +	local_lock(&s->cpu_slab->lock);
>  	unfreeze_partials(s);
> +	local_unlock(&s->cpu_slab->lock);
>  }
> 
>  static bool has_cpu_slab(int cpu, struct kmem_cache *s)
> @@ -2632,8 +2650,11 @@ static int slub_cpu_dead(unsigned int cp
>  	struct kmem_cache *s;
> 
>  	mutex_lock(&slab_mutex);
> -	list_for_each_entry(s, &slab_caches, list)
> +	list_for_each_entry(s, &slab_caches, list) {
> +		local_lock(&s->cpu_slab->lock);
>  		__flush_cpu_slab(s, cpu);
> +		local_unlock(&s->cpu_slab->lock);
> +	}
>  	mutex_unlock(&slab_mutex);
>  	return 0;
>  }
> 



* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-18  7:58           ` Vlastimil Babka
@ 2021-07-18  8:11             ` Mike Galbraith
  0 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-18  8:11 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Sun, 2021-07-18 at 09:58 +0200, Vlastimil Babka wrote:
> On 7/17/21 4:58 PM, Mike Galbraith wrote:
> > On Thu, 2021-07-15 at 18:34 +0200, Mike Galbraith wrote:
> > > Greetings crickets,
> > >
> > > Methinks the problem is the hole these patches opened only for RT.
> > >
> > > static void put_cpu_partial(struct kmem_cache *s, struct page
> > > *page,
> > > int drain)
> > > {
> > > #ifdef CONFIG_SLUB_CPU_PARTIAL
> > > 	struct page *oldpage;
> > > 	int pages;
> > > 	int pobjects;
> > >
> > > 	slub_get_cpu_ptr(s->cpu_slab);
> > >         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Bah, I'm tired of waiting to see what if anything mm folks do about
> > this little bugger, so I'm gonna step on it my damn self and be
> > done
> > with it.  Fly or die little patchlet.
>
> Sorry, Mike. Got some badly timed sickness. Will check ASAP.

Hey, no apology necessary.  I have no way of knowing what's going on
behind the scenes, but I also didn't like the notion of just walking
away from a busted thing, so given the silence, figured it's better to
post the working something I've got (wart and all) than to let the bug
go unchallenged.  (take that damn bug.. pokes it with sharp stick;)

	-Mike



* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
  2021-07-18  7:58           ` Vlastimil Babka
@ 2021-07-18 15:43           ` Mike Galbraith
  2021-07-18 21:19           ` Vlastimil Babka
  2021-07-20  2:46           ` kernel test robot
  3 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-18 15:43 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Vlastimil Babka, Sebastian Andrzej Siewior

It's moot, but for the record...

> @@ -2418,6 +2418,17 @@ static void __unfreeze_partials(struct k
>  	if (n)
>  		spin_unlock_irqrestore(&n->list_lock, flags);
>
> +	/*
> +	 * If we got here via __slab_free() -> put_cpu_partial(),
> +	 * memcg_free_page_obj_cgroups() -> kfree() may send us
> +	 * back to __slab_free() -> put_cpu_partial() for an
> +	 * unfortunate second encounter with cpu_slab->lock.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled()) {
> +		lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));

...that assert needs to hide behind something less transparent.

	-Mike



* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
  2021-07-18  7:58           ` Vlastimil Babka
  2021-07-18 15:43           ` Mike Galbraith
@ 2021-07-18 21:19           ` Vlastimil Babka
  2021-07-19  4:01             ` Mike Galbraith
  2021-07-20  2:46           ` kernel test robot
  3 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-18 21:19 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/17/21 4:58 PM, Mike Galbraith wrote:
> On Thu, 2021-07-15 at 18:34 +0200, Mike Galbraith wrote:
>> Greetings crickets,
>>
>> Methinks the problem is the hole these patches opened only for RT.
>>
>> static void put_cpu_partial(struct kmem_cache *s, struct page *page,
>> int drain)
>> {
>> #ifdef CONFIG_SLUB_CPU_PARTIAL
>> 	struct page *oldpage;
>> 	int pages;
>> 	int pobjects;
>>
>> 	slub_get_cpu_ptr(s->cpu_slab);
>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Bah, I'm tired of waiting to see what if anything mm folks do about
> this little bugger, so I'm gonna step on it my damn self and be done
> with it.  Fly or die little patchlet.
> 
> mm/slub: restore/expand unfreeze_partials() local exclusion scope
> 
> 2180da7ea70a ("mm, slub: use migrate_disable() on PREEMPT_RT") replaced
> preempt_disable() in put_cpu_partial() with migrate_disable(), which when
> combined with ___slab_alloc() having become preemptible, leads to
> kmem_cache_free()/kfree() blowing through ___slab_alloc() unimpeded,
> and vice versa, resulting in PREEMPT_RT exclusive explosions in both
> paths while stress testing with both SLUB_CPU_PARTIAL/MEMCG enabled,
> ___slab_alloc() during allocation (duh), and __unfreeze_partials()
> during free, both while accessing an unmapped page->freelist.
> 
> Serialize put_cpu_partial()/unfreeze_partials() on cpu_slab->lock to

Hm, you mention put_cpu_partial(), but your patch handles only the
unfreeze_partials() call from that function? If I understand the problem
correctly, all modifications of cpu_slab->partial have to be protected
on RT after the local_lock conversion, thus also the one that
put_cpu_partial() does by itself (via this_cpu_cmpxchg).

On the other hand, the slub_cpu_dead() part should really be unnecessary,
as tglx pointed out.

How about the patch below? It also handles the recursion issue
differently, by not locking around __unfreeze_partials().
If that works, I can think of making it less ugly :/

----8<----
diff --git a/mm/slub.c b/mm/slub.c
index 581004a5aca9..1c7a41460941 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2437,6 +2437,9 @@ static void unfreeze_partials(struct kmem_cache *s)
 {
 	struct page *partial_page;
 
+#ifdef CONFIG_PREEMPT_RT
+	local_lock(&s->cpu_slab->lock);
+#endif
 	do {
 		partial_page = this_cpu_read(s->cpu_slab->partial);
 
@@ -2444,6 +2447,9 @@ static void unfreeze_partials(struct kmem_cache *s)
 		 this_cpu_cmpxchg(s->cpu_slab->partial, partial_page, NULL)
 				  != partial_page);
 
+#ifdef CONFIG_PREEMPT_RT
+	local_unlock(&s->cpu_slab->lock);
+#endif
 	if (partial_page)
 		__unfreeze_partials(s, partial_page);
 }
@@ -2482,7 +2488,11 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 	int pages;
 	int pobjects;
 
-	slub_get_cpu_ptr(s->cpu_slab);
+#ifndef CONFIG_PREEMPT_RT
+	get_cpu_ptr(s->cpu_slab);
+#else
+	local_lock(&s->cpu_slab->lock);
+#endif
 	do {
 		pages = 0;
 		pobjects = 0;
@@ -2496,7 +2506,15 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
+#ifndef CONFIG_PREEMPT_RT
 				unfreeze_partials(s);
+#else
+				this_cpu_write(s->cpu_slab->partial, NULL);
+				local_unlock(&s->cpu_slab->lock);
+				__unfreeze_partials(s, oldpage);
+				local_lock(&s->cpu_slab->lock);
+#endif
+
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2513,7 +2531,11 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 
 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
 								!= oldpage);
-	slub_put_cpu_ptr(s->cpu_slab);
+#ifndef CONFIG_PREEMPT_RT
+	put_cpu_ptr(s->cpu_slab);
+#else
+	local_unlock(&s->cpu_slab->lock);
+#endif
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }
 


* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-18 21:19           ` Vlastimil Babka
@ 2021-07-19  4:01             ` Mike Galbraith
  2021-07-19 13:15               ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-19  4:01 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

(a pox on whoever broke evolution text quoting yet again)

On Sun, 2021-07-18 at 23:19 +0200, Vlastimil Babka wrote:
>
> Hm you mention put_cpu_partial() but your patch handles only the
> unfreeze_partials() call from that function?

Indeed, and that's because when I poked that slub_get_cpu_ptr(), it
poked me back by deadlocking for real on boot, so I posted the
restoration of the previously existing exclusion points, with the
unlock/relock bit to appease lockdep (big hairy wart), instead.

>  If I understand the problem
> correctly, all modifications of cpu_slab->partial have to be protected
> on RT after the local_lock conversion, thus also the one that
> put_cpu_partial() does by itself (via this_cpu_cmpxchg).

Yup, that's what I concluded too, but the box disagreed by not booting
when I did that, while seemingly being perfectly fine with the one in
put_cpu_partial() itself.  I figured my slub_get_cpu_ptr() replacement
had failed due to list_lock/local_lock ordering woes that I didn't
notice until after the box bricked on me, but despite the unlock/relock
in this patch, it still manages to be a -ENOBOOT.

	-Mike



* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-19  4:01             ` Mike Galbraith
@ 2021-07-19 13:15               ` Mike Galbraith
  0 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-19 13:15 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Mon, 2021-07-19 at 06:01 +0200, Mike Galbraith wrote:
>
> ... despite the unlock/relock in this patch, it still manages to be a
> -ENOBOOT.

The somewhat bent-up half of your patch below has been chugging away
long enough for me to speculate that __slab_free() vs ___slab_alloc()
is the *only* player in the explosion varieties I've been seeing.

---
 mm/slub.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2484,6 +2484,9 @@ static void put_cpu_partial(struct kmem_
 	int pobjects;

 	slub_get_cpu_ptr(s->cpu_slab);
+	/* __slab_free() vs ___slab_alloc() critical sections */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+		local_lock(&s->cpu_slab->lock);
 	do {
 		pages = 0;
 		pobjects = 0;
@@ -2497,7 +2500,13 @@ static void put_cpu_partial(struct kmem_
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
-				unfreeze_partials(s);
+				this_cpu_write(s->cpu_slab->partial, NULL);
+				if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+					local_unlock(&s->cpu_slab->lock);
+				__unfreeze_partials(s, oldpage);
+				if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+					local_lock(&s->cpu_slab->lock);
+
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2514,6 +2523,8 @@ static void put_cpu_partial(struct kmem_

 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
 								!= oldpage);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+		local_unlock(&s->cpu_slab->lock);
 	slub_put_cpu_ptr(s->cpu_slab);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }



* Re: [patch] v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
                             ` (2 preceding siblings ...)
  2021-07-18 21:19           ` Vlastimil Babka
@ 2021-07-20  2:46           ` kernel test robot
  3 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2021-07-20  2:46 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: kbuild-all, linux-rt-users, Mel Gorman, Vlastimil Babka,
	Sebastian Andrzej Siewior

[-- Attachment #1: Type: text/plain, Size: 8002 bytes --]

Hi Mike,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux-rt-devel/linux-5.13.y-rt-testing]
[cannot apply to hnaz-linux-mm/master linux/master linus/master v5.14-rc2 next-20210719]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Galbraith/v2-mm-slub-restore-expand-unfreeze_partials-local-exclusion-scope/20210718-092133
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-5.13.y-rt-testing
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/9398c1a5075cbaa4a610319cfceeab417d1f3e12
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Galbraith/v2-mm-slub-restore-expand-unfreeze_partials-local-exclusion-scope/20210718-092133
        git checkout 9398c1a5075cbaa4a610319cfceeab417d1f3e12
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/bug.h:84,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from mm/slub.c:13:
   mm/slub.c: In function '__unfreeze_partials':
>> mm/slub.c:2428:54: error: 'local_lock_t' {aka 'struct <anonymous>'} has no member named 'lock'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |                                                      ^
   include/asm-generic/bug.h:119:25: note: in definition of macro 'WARN_ON'
     119 |  int __ret_warn_on = !!(condition);    \
         |                         ^~~~~~~~~
   include/linux/lockdep.h:311:4: note: in expansion of macro 'lockdep_is_held'
     311 |    lockdep_is_held(l) == LOCK_STATE_NOT_HELD); \
         |    ^~~~~~~~~~~~~~~
   mm/slub.c:2428:3: note: in expansion of macro 'lockdep_assert_held'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |   ^~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:241:2: note: in expansion of macro '__verify_pcpu_ptr'
     241 |  __verify_pcpu_ptr(ptr);      \
         |  ^~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:252:27: note: in expansion of macro 'raw_cpu_ptr'
     252 | #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
         |                           ^~~~~~~~~~~
   mm/slub.c:2428:23: note: in expansion of macro 'this_cpu_ptr'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |                       ^~~~~~~~~~~~
>> mm/slub.c:2428:54: error: 'local_lock_t' {aka 'struct <anonymous>'} has no member named 'lock'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |                                                      ^
   include/asm-generic/bug.h:119:25: note: in definition of macro 'WARN_ON'
     119 |  int __ret_warn_on = !!(condition);    \
         |                         ^~~~~~~~~
   include/linux/lockdep.h:311:4: note: in expansion of macro 'lockdep_is_held'
     311 |    lockdep_is_held(l) == LOCK_STATE_NOT_HELD); \
         |    ^~~~~~~~~~~~~~~
   mm/slub.c:2428:3: note: in expansion of macro 'lockdep_assert_held'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |   ^~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:242:2: note: in expansion of macro 'arch_raw_cpu_ptr'
     242 |  arch_raw_cpu_ptr(ptr);      \
         |  ^~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:252:27: note: in expansion of macro 'raw_cpu_ptr'
     252 | #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
         |                           ^~~~~~~~~~~
   mm/slub.c:2428:23: note: in expansion of macro 'this_cpu_ptr'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |                       ^~~~~~~~~~~~
>> mm/slub.c:2428:54: error: 'local_lock_t' {aka 'struct <anonymous>'} has no member named 'lock'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |                                                      ^
   include/asm-generic/bug.h:119:25: note: in definition of macro 'WARN_ON'
     119 |  int __ret_warn_on = !!(condition);    \
         |                         ^~~~~~~~~
   include/linux/lockdep.h:311:4: note: in expansion of macro 'lockdep_is_held'
     311 |    lockdep_is_held(l) == LOCK_STATE_NOT_HELD); \
         |    ^~~~~~~~~~~~~~~
   mm/slub.c:2428:3: note: in expansion of macro 'lockdep_assert_held'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |   ^~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:242:2: note: in expansion of macro 'arch_raw_cpu_ptr'
     242 |  arch_raw_cpu_ptr(ptr);      \
         |  ^~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:252:27: note: in expansion of macro 'raw_cpu_ptr'
     252 | #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
         |                           ^~~~~~~~~~~
   mm/slub.c:2428:23: note: in expansion of macro 'this_cpu_ptr'
    2428 |   lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
         |                       ^~~~~~~~~~~~


vim +2428 mm/slub.c

  2369	
  2370	#ifdef CONFIG_SLUB_CPU_PARTIAL
  2371	static void __unfreeze_partials(struct kmem_cache *s, struct page *partial_page)
  2372	{
  2373		struct kmem_cache_node *n = NULL, *n2 = NULL;
  2374		struct page *page, *discard_page = NULL;
  2375		unsigned long flags = 0;
  2376	
  2377		while (partial_page) {
  2378			struct page new;
  2379			struct page old;
  2380	
  2381			page = partial_page;
  2382			partial_page = page->next;
  2383	
  2384			n2 = get_node(s, page_to_nid(page));
  2385			if (n != n2) {
  2386				if (n)
  2387					spin_unlock_irqrestore(&n->list_lock, flags);
  2388	
  2389				n = n2;
  2390				spin_lock_irqsave(&n->list_lock, flags);
  2391			}
  2392	
  2393			do {
  2394	
  2395				old.freelist = page->freelist;
  2396				old.counters = page->counters;
  2397				VM_BUG_ON(!old.frozen);
  2398	
  2399				new.counters = old.counters;
  2400				new.freelist = old.freelist;
  2401	
  2402				new.frozen = 0;
  2403	
  2404			} while (!__cmpxchg_double_slab(s, page,
  2405					old.freelist, old.counters,
  2406					new.freelist, new.counters,
  2407					"unfreezing slab"));
  2408	
  2409			if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
  2410				page->next = discard_page;
  2411				discard_page = page;
  2412			} else {
  2413				add_partial(n, page, DEACTIVATE_TO_TAIL);
  2414				stat(s, FREE_ADD_PARTIAL);
  2415			}
  2416		}
  2417	
  2418		if (n)
  2419			spin_unlock_irqrestore(&n->list_lock, flags);
  2420	
  2421		/*
  2422		 * If we got here via __slab_free() -> put_cpu_partial(),
  2423		 * memcg_free_page_obj_cgroups() ->kfree() may send us
  2424		 * back to __slab_free() -> put_cpu_partial() for an
  2425		 * unfortunate second encounter with cpu_slab->lock.
  2426		 */
  2427		if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled()) {
> 2428			lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
  2429			local_unlock(&s->cpu_slab->lock);
  2430		}
  2431	
  2432		while (discard_page) {
  2433			page = discard_page;
  2434			discard_page = discard_page->next;
  2435	
  2436			stat(s, DEACTIVATE_EMPTY);
  2437			discard_slab(s, page);
  2438			stat(s, FREE_SLAB);
  2439		}
  2440	
  2441		if (IS_ENABLED(CONFIG_PREEMPT_RT) && memcg_kmem_enabled())
  2442			local_lock(&s->cpu_slab->lock);
  2443	}
  2444	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65639 bytes --]


* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-15 16:34       ` Mike Galbraith
  2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
@ 2021-07-20  8:56         ` Vlastimil Babka
  2021-07-20 11:26           ` Mike Galbraith
  1 sibling, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-20  8:56 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/15/21 6:34 PM, Mike Galbraith wrote:
> Greetings crickets,
> 
> Methinks the problem is the hole these patches opened only for RT.
> 
> static void put_cpu_partial(struct kmem_cache *s, struct page *page,
> int drain)
> {
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> 	struct page *oldpage;
> 	int pages;
> 	int pobjects;
> 
> 	slub_get_cpu_ptr(s->cpu_slab);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> That is an assertion that the stuff under it is preempt safe for RT and
> ONLY RT.  My box says the RT portion of that assertion is horse pookey.
> What it does is let kmem_cache_free()/kfree() blast straight through
> ___slab_alloc() critical sections, swapping out ->partial underneath
> it.  Sprinkling percpu fingerprint powder about, I saw lots of
> ___slab_alloc() preempts put_cpu_partial().. and vice versa.  I don't
> think it's a coincidence that ___slab_alloc() and __unfreeze_partials()
> both explode trying to access page->freelist.  You've seen the former,
> here's one of the later.
> 
> crash> bt -sx
> PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
>  #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
>  #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
>  #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
>  #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
>  #4 [ffff88818f8ffab8] exc_general_protection+0x195 at ffffffff8179fdb5
>  #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at ffffffff81800a0e
>     [exception RIP: __unfreeze_partials+156]

Hm, going back to this report: if that's a direct result of
__slab_alloc() preempting (interrupting) put_cpu_partial(), then I
think that's something that could happen even on !RT, as all we do in
put_cpu_partial() there is disable preemption, and the allocation could
come in an irq handler.
It would mean the whole idea of "mm, slub: detach percpu partial list in
unfreeze_partials() using this_cpu_cmpxchg()" isn't safe. I'll have to
ponder it.
But we haven't seen crashes on !RT yet. So could it be that it was
still put_cpu_partial() preempting __slab_alloc() and messing up the
partial list, but for some reason the put_cpu_partial() side crashed
this time?

>     RIP: ffffffff81248bac  RSP: ffff88818f8ffc00  RFLAGS: 00010202
>     RAX: 0000000000200002  RBX: 0000000000200002  RCX: 000000017fffffff
>     RDX: 00000001ffffffff  RSI: 0000000000000202  RDI: ffff888100040b80
>     RBP: ffff88818f8ffca0   R8: ffff88818f9cba30   R9: 0000000000000001
>     R10: ffff88818f8ffcc0  R11: 0000000000000000  R12: ffff888100043700
>     R13: ffff888100040b80  R14: 00000001ffffffff  R15: ffffea00046c0808
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff88818f8ffcb8] put_cpu_partial+0x8e at ffffffff81248ece
>  #7 [ffff88818f8ffcd8] consume_skb+0x2b at ffffffff815eddeb
>  #8 [ffff88818f8ffce8] unix_stream_read_generic+0x788 at ffffffff8170b238
>  #9 [ffff88818f8ffdc0] unix_stream_recvmsg+0x43 at ffffffff8170b433
> #10 [ffff88818f8ffdf8] sock_read_iter+0x104 at ffffffff815dd554
> #11 [ffff88818f8ffe68] new_sync_read+0x16a at ffffffff812674fa
> #12 [ffff88818f8ffee8] vfs_read+0x1ae at ffffffff81269c8e
> #13 [ffff88818f8fff00] ksys_read+0x40 at ffffffff8126a070
> #14 [ffff88818f8fff38] do_syscall_64+0x37 at ffffffff8179f5e7
> #15 [ffff88818f8fff50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8180007c
>     RIP: 00007fbda4438f2e  RSP: 00007fff3bf9d798  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: 00007fff3bf9e7a0  RCX: 00007fbda4438f2e
>     RDX: 0000000000001000  RSI: 00007fff3bf9d7a0  RDI: 0000000000000007
>     RBP: 00007fff3bf9e800   R8: 00007fff3bf9e6e0   R9: 00007fbda4641010
>     R10: 00007fbda4428028  R11: 0000000000000246  R12: 0000000000001000
> crash> dis __unfreeze_partials+156
> 0xffffffff81248bac <__unfreeze_partials+156>:   lock cmpxchg16b 0x20(%r15)
> crash> gdb list *__unfreeze_partials+156
> 0xffffffff81248bac is in __unfreeze_partials (mm/slub.c:475).
> 470             if (!disable_irqs)
> 471                     lockdep_assert_irqs_disabled();
> 472     #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
> 473         defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
> 474             if (s->flags & __CMPXCHG_DOUBLE) {
> 475                     if (cmpxchg_double(&page->freelist, &page->counters,
> 476                                        freelist_old, counters_old,
> 477                                        freelist_new, counters_new))
> 478                             return true;
> 479             } else
> crash> kmem ffffea00046c0808
>       PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> ffffea00104b3000 412cc0000                0        c  2 8000000000003000 reserved,private
> crash>
> 
> Regarding $subject, Lockdep thinks my hole plugging skills suck rocks
> (shrug, I've given it plentiful cause) but that's ok, I sometimes think
> the same of its bug reporting skills :)
> 
> [    2.459456] ============================================
> [    2.459456] WARNING: possible recursive locking detected
> [    2.459457] 5.14.0.g79e92006-tip-rt #83 Tainted: G            E
> [    2.459458] --------------------------------------------
> [    2.459458] rcuc/7/56 is trying to acquire lock:
> [    2.459459] ffff88840edf01c0 (&(&c->lock)->lock){+.+.}-{0:0}, at: kfree+0x280/0x670
> [    2.459466]
>                but task is already holding lock:
> [    2.459466] ffff88840edf4d60 (&(&c->lock)->lock){+.+.}-{0:0}, at: __slab_free+0x4d6/0x600
> [    2.459469]
> 
> live kernel snooping....
> 
> crash> ps | grep UN
> crash> bt -sx 56
> PID: 56     TASK: ffff888100c19a40  CPU: 7   COMMAND: "rcuc/7"
>  #0 [ffff888100c63e40] __schedule+0x2eb at ffffffff818969fb
>  #1 [ffff888100c63ec8] schedule+0x3b at ffffffff8189723b
>  #2 [ffff888100c63ee0] smpboot_thread_fn+0x18c at ffffffff810a90dc
>  #3 [ffff888100c63f18] kthread+0x1af at ffffffff810a27bf
>  #4 [ffff888100c63f50] ret_from_fork+0x1f at ffffffff81001cbf
> crash> task_struct ffff888100c19a40 | grep __state
>   __state = 1,
> crash> gdb list *__slab_free+0x4d6
> 0xffffffff812923c6 is in __slab_free (mm/slub.c:2568).
> 2563                                    /*
> 2564                                     * partial array is full. Move the existing
> 2565                                     * set to the per node partial list.
> 2566                                     */
> 2567                                    local_lock(&s->cpu_slab->lock);
> 2568                                    unfreeze_partials(s);
> 2569                                    local_unlock(&s->cpu_slab->lock);
> 2570                                    oldpage = NULL;
> 2571                                    pobjects = 0;
> 2572                                    pages = 0;
> crash> gdb list *kfree+0x280
> 0xffffffff81292770 is in kfree (mm/slub.c:3442).
> 3437                     * __slab_free() as that wouldn't use the cpu freelist at all.
> 3438                     */
> 3439                    void **freelist;
> 3440
> 3441                    local_lock(&s->cpu_slab->lock);
> 3442                    c = this_cpu_ptr(s->cpu_slab);
> 3443                    if (unlikely(page != c->page)) {
> 3444                            local_unlock(&s->cpu_slab->lock);
> 3445                            goto redo;
> 3446                    }
> crash>
> 
> Check, those are what's in the stacktrace below... but the allegedly
> deadlocked-for-real task is very much alive, as is the rest of the box.
> 
>                other info that might help us debug this:
> [    2.459470]  Possible unsafe locking scenario:
> 
> [    2.459470]        CPU0
> [    2.459470]        ----
> [    2.459471]   lock(&(&c->lock)->lock);
> [    2.459471]   lock(&(&c->lock)->lock);
> [    2.459472]
>                 *** DEADLOCK ***
> 
> [    2.459472]  May be due to missing lock nesting notation
> 
> [    2.459472] 6 locks held by rcuc/7/56:
> [    2.459473]  #0: ffff88840edd9820 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xc3/0x190
> [    2.459479]  #1: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xd0
> [    2.459484]  #2: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0xa0/0x190
> [    2.459487]  #3: ffffffff82382ac0 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x195/0x520
> [    2.459490]  #4: ffff88840edf4d60 (&(&c->lock)->lock){+.+.}-{0:0}, at: __slab_free+0x4d6/0x600
> [    2.459493]  #5: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xd0
> [    2.459496]
>                stack backtrace:
> [    2.459497] CPU: 7 PID: 56 Comm: rcuc/7 Tainted: G            E     5.14.0.g79e92006-tip-rt #83
> [    2.459498] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [    2.459499] Call Trace:
> [    2.459501]  dump_stack_lvl+0x44/0x57
> [    2.459504]  __lock_acquire+0xb7f/0x1ab0
> [    2.459508]  lock_acquire+0x2a6/0x340
> [    2.459510]  ? kfree+0x280/0x670
> [    2.459513]  ? __free_slab+0xa4/0x1b0
> [    2.459515]  rt_spin_lock+0x2a/0xd0
> [    2.459516]  ? kfree+0x280/0x670                   somewhere in the multiverse lies a dead rcuc/7
> [    2.459518]  kfree+0x280/0x670                 <== local_lock(&s->cpu_slab->lock); #2
> [    2.459521]  __free_slab+0xa4/0x1b0            ==> kfree(page_objcgs(page))
> [    2.459523]  __unfreeze_partials+0x1d8/0x330   ==> discard_slab(s, page);
> [    2.459526]  ? _raw_spin_unlock_irqrestore+0x30/0x80
> [    2.459530]  ? __slab_free+0x4de/0x600
> [    2.459532]  __slab_free+0x4de/0x600           <== local_lock(&s->cpu_slab->lock); #1
> [    2.459534]  ? find_held_lock+0x2d/0x90
> [    2.459536]  ? kmem_cache_free+0x276/0x630
> [    2.459539]  ? rcu_do_batch+0x1c3/0x520
> [    2.459540]  ? kmem_cache_free+0x364/0x630
> [    2.459542]  kmem_cache_free+0x364/0x630
> [    2.459544]  ? rcu_do_batch+0x195/0x520
> [    2.459546]  rcu_do_batch+0x1c3/0x520
> [    2.459547]  ? rcu_do_batch+0x195/0x520
> [    2.459550]  ? rcu_cpu_kthread+0x7e/0x4b0
> [    2.459552]  ? rcu_cpu_kthread+0x57/0x4b0
> [    2.459553]  rcu_core+0x2c3/0x590
> [    2.459555]  ? rcu_cpu_kthread+0x78/0x4b0
> [    2.459557]  ? rcu_cpu_kthread+0x7e/0x4b0
> [    2.459558]  ? rcu_cpu_kthread+0x57/0x4b0
> [    2.459560]  rcu_cpu_kthread+0xc2/0x4b0
> [    2.459562]  ? smpboot_thread_fn+0x23/0x300
> [    2.459565]  smpboot_thread_fn+0x249/0x300
> [    2.459567]  ? smpboot_register_percpu_thread+0xe0/0xe0
> [    2.459569]  kthread+0x1af/0x1d0
> [    2.459571]  ? set_kthread_struct+0x40/0x40
> [    2.459574]  ret_from_fork+0x1f/0x30
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-20  8:56         ` [rfc/patch] " Vlastimil Babka
@ 2021-07-20 11:26           ` Mike Galbraith
  2021-07-21  4:56             ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-20 11:26 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Tue, 2021-07-20 at 10:56 +0200, Vlastimil Babka wrote:
>
> > crash> bt -sx
> > PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
> >  #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
> >  #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
> >  #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
> >  #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
> >  #4 [ffff88818f8ffab8] exc_general_protection+0x195 at ffffffff8179fdb5
> >  #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at ffffffff81800a0e
> >     [exception RIP: __unfreeze_partials+156]
>
> Hm going back to this report...

> So could it be that it was still put_cpu_partial() preempting
> __slab_alloc() messing up the partial list, but for some reason the
> put_cpu_partial() side crashed this time?

Thinking this bug is toast, I emptied the trash bin, so no can peek.

If need be, and nobody beats me to it (which ~guarantees nobody will
beat me to it;), I can make the thing go boom again easily enough.

	-Mike


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-20 11:26           ` Mike Galbraith
@ 2021-07-21  4:56             ` Mike Galbraith
  2021-07-21  8:44               ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-21  4:56 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Tue, 2021-07-20 at 13:26 +0200, Mike Galbraith wrote:
> On Tue, 2021-07-20 at 10:56 +0200, Vlastimil Babka wrote:
> > > crash> bt -sx
> > > PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
> > >  #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
> > >  #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
> > >  #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
> > >  #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
> > >  #4 [ffff88818f8ffab8] exc_general_protection+0x195 at
> > > ffffffff8179fdb5
> > >  #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at
> > > ffffffff81800a0e
> > >     [exception RIP: __unfreeze_partials+156]
> >
> > Hm going back to this report...
> > So could it be that it was still put_cpu_partial() preempting
> > __slab_alloc() messing up the partial list, but for some reason the
> > put_cpu_partial() side crashed this time?
>
> Thinking this bug is toast, I emptied the trash bin, so no can peek.

I made fireworks while waiting for bike riding time, boom #10 was
finally the right flavor, but...

crash> bt -sx
PID: 32     TASK: ffff888100a56000  CPU: 3   COMMAND: "rcuc/3"
 #0 [ffff888100aa7a90] machine_kexec+0x14f at ffffffff81051c8f
 #1 [ffff888100aa7ad8] __crash_kexec+0xd2 at ffffffff81120612
 #2 [ffff888100aa7b98] crash_kexec+0x30 at ffffffff811213b0
 #3 [ffff888100aa7ba8] oops_end+0xd3 at ffffffff810267e3
 #4 [ffff888100aa7bc8] exc_general_protection+0x195 at ffffffff817a2cc5
 #5 [ffff888100aa7c60] asm_exc_general_protection+0x1e at ffffffff81800a0e
    [exception RIP: __unfreeze_partials+149]
    RIP: ffffffff8124a295  RSP: ffff888100aa7d10  RFLAGS: 00010202
    RAX: 0000000000190016  RBX: 0000000000190016  RCX: 000000017fffffff
    RDX: 00000001ffffffff  RSI: 0000000000000023  RDI: ffffffff81e58b10
    RBP: ffff888100aa7da0   R8: 0000000000000000   R9: 0000000000190018
    R10: ffff888100aa7db8  R11: 000000000002d9e4  R12: ffff888100190500
    R13: ffff88810018c980  R14: 00000001ffffffff  R15: ffffea0004571588
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff888100aa7db0] put_cpu_partial+0x8e at ffffffff8124a56e
 #7 [ffff888100aa7dd0] kmem_cache_free+0x3a8 at ffffffff8124d238
 #8 [ffff888100aa7e08] rcu_do_batch+0x186 at ffffffff810eb246
 #9 [ffff888100aa7e70] rcu_core+0x25f at ffffffff810eeb2f
#10 [ffff888100aa7eb0] rcu_cpu_kthread+0x94 at ffffffff810eed24
#11 [ffff888100aa7ee0] smpboot_thread_fn+0x249 at ffffffff8109e559
#12 [ffff888100aa7f18] kthread+0x1ac at ffffffff810984dc
#13 [ffff888100aa7f50] ret_from_fork+0x1f at ffffffff81001b1f
crash> runq
...
CPU 3 RUNQUEUE: ffff88840ece9980
  CURRENT: PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
  RT PRIO_ARRAY: ffff88840ece9bc0
     [ 94] PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
  CFS RB_ROOT: ffff88840ece9a40
     [120] PID: 33     TASK: ffff888100a51000  COMMAND: "ksoftirqd/3"
...
crash> bt -sx 33
PID: 33     TASK: ffff888100a51000  CPU: 3   COMMAND: "ksoftirqd/3"
 #0 [ffff888100aabdf0] __schedule+0x2d7 at ffffffff817ad3a7
 #1 [ffff888100aabec8] schedule+0x3b at ffffffff817ae4eb
 #2 [ffff888100aabee0] smpboot_thread_fn+0x18c at ffffffff8109e49c
 #3 [ffff888100aabf18] kthread+0x1ac at ffffffff810984dc
 #4 [ffff888100aabf50] ret_from_fork+0x1f at ffffffff81001b1f
crash>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-21  4:56             ` Mike Galbraith
@ 2021-07-21  8:44               ` Vlastimil Babka
  2021-07-21  9:33                 ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-21  8:44 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/21/21 6:56 AM, Mike Galbraith wrote:
> On Tue, 2021-07-20 at 13:26 +0200, Mike Galbraith wrote:
>> On Tue, 2021-07-20 at 10:56 +0200, Vlastimil Babka wrote:
>> > > crash> bt -sx
>> > > PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
>> > >  #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
>> > >  #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
>> > >  #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
>> > >  #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
>> > >  #4 [ffff88818f8ffab8] exc_general_protection+0x195 at
>> > > ffffffff8179fdb5
>> > >  #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at
>> > > ffffffff81800a0e
>> > >     [exception RIP: __unfreeze_partials+156]
>> >
>> > Hm going back to this report...
>> > So could it be that it was still put_cpu_partial() preempting
>> > __slab_alloc() messing up the partial list, but for some reason the
>> > put_cpu_partial() side crashed this time?
>>
>> Thinking this bug is toast, I emptied the trash bin, so no can peek.
> 
> I made fireworks while waiting for bike riding time, boom #10 was
> finally the right flavor, but...
> 
> crash> bt -sx
> PID: 32     TASK: ffff888100a56000  CPU: 3   COMMAND: "rcuc/3"
>  #0 [ffff888100aa7a90] machine_kexec+0x14f at ffffffff81051c8f
>  #1 [ffff888100aa7ad8] __crash_kexec+0xd2 at ffffffff81120612
>  #2 [ffff888100aa7b98] crash_kexec+0x30 at ffffffff811213b0
>  #3 [ffff888100aa7ba8] oops_end+0xd3 at ffffffff810267e3
>  #4 [ffff888100aa7bc8] exc_general_protection+0x195 at ffffffff817a2cc5
>  #5 [ffff888100aa7c60] asm_exc_general_protection+0x1e at ffffffff81800a0e
>     [exception RIP: __unfreeze_partials+149]
>     RIP: ffffffff8124a295  RSP: ffff888100aa7d10  RFLAGS: 00010202
>     RAX: 0000000000190016  RBX: 0000000000190016  RCX: 000000017fffffff
>     RDX: 00000001ffffffff  RSI: 0000000000000023  RDI: ffffffff81e58b10
>     RBP: ffff888100aa7da0   R8: 0000000000000000   R9: 0000000000190018
>     R10: ffff888100aa7db8  R11: 000000000002d9e4  R12: ffff888100190500
>     R13: ffff88810018c980  R14: 00000001ffffffff  R15: ffffea0004571588
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff888100aa7db0] put_cpu_partial+0x8e at ffffffff8124a56e
>  #7 [ffff888100aa7dd0] kmem_cache_free+0x3a8 at ffffffff8124d238
>  #8 [ffff888100aa7e08] rcu_do_batch+0x186 at ffffffff810eb246
>  #9 [ffff888100aa7e70] rcu_core+0x25f at ffffffff810eeb2f
> #10 [ffff888100aa7eb0] rcu_cpu_kthread+0x94 at ffffffff810eed24
> #11 [ffff888100aa7ee0] smpboot_thread_fn+0x249 at ffffffff8109e559
> #12 [ffff888100aa7f18] kthread+0x1ac at ffffffff810984dc
> #13 [ffff888100aa7f50] ret_from_fork+0x1f at ffffffff81001b1f
> crash> runq
> ...
> CPU 3 RUNQUEUE: ffff88840ece9980
>   CURRENT: PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
>   RT PRIO_ARRAY: ffff88840ece9bc0
>      [ 94] PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
>   CFS RB_ROOT: ffff88840ece9a40
>      [120] PID: 33     TASK: ffff888100a51000  COMMAND: "ksoftirqd/3"
> ...
> crash> bt -sx 33
> PID: 33     TASK: ffff888100a51000  CPU: 3   COMMAND: "ksoftirqd/3"
>  #0 [ffff888100aabdf0] __schedule+0x2d7 at ffffffff817ad3a7
>  #1 [ffff888100aabec8] schedule+0x3b at ffffffff817ae4eb
>  #2 [ffff888100aabee0] smpboot_thread_fn+0x18c at ffffffff8109e49c
>  #3 [ffff888100aabf18] kthread+0x1ac at ffffffff810984dc
>  #4 [ffff888100aabf50] ret_from_fork+0x1f at ffffffff81001b1f
> crash>

So this doesn't look like our put_cpu_partial() preempted a __slab_alloc() on
the same cpu, right? There might have been __slab_alloc() in irq handler
preempting us, but we won't see that anymore. I don't immediately see the
root cause, and this scenario should be possible on !RT too, yet there we
didn't see these explosions.

BTW did my ugly patch work?

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-21  8:44               ` Vlastimil Babka
@ 2021-07-21  9:33                 ` Mike Galbraith
  2021-07-23 22:39                   ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-21  9:33 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Wed, 2021-07-21 at 10:44 +0200, Vlastimil Babka wrote:
>
> So this doesn't look like our put_cpu_partial() preempted a
> __slab_alloc() on the same cpu, right?

No, likely it was the one preempted by someone long gone, but we'll
never know without setting a trap.

> BTW did my ugly patch work?

Nope.  I guess you missed my reporting it to have been a -ENOBOOT, and
that cutting it in half, ie snagging only __slab_free() does boot, and
seems to cure all of the RT fireworks.

(chainsaw noises...)

---
 mm/slub.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2551,6 +2551,8 @@ static void put_cpu_partial(struct kmem_
 	int pobjects;

 	slub_get_cpu_ptr(s->cpu_slab);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+		local_lock(&s->cpu_slab->lock);
 	do {
 		pages = 0;
 		pobjects = 0;
@@ -2564,7 +2566,13 @@ static void put_cpu_partial(struct kmem_
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
-				unfreeze_partials(s);
+				this_cpu_write(s->cpu_slab->partial, NULL);
+				if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+					local_unlock(&s->cpu_slab->lock);
+				__unfreeze_partials(s, oldpage);
+				if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+					local_lock(&s->cpu_slab->lock);
+
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2581,6 +2589,8 @@ static void put_cpu_partial(struct kmem_

 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
 								!= oldpage);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
+		local_unlock(&s->cpu_slab->lock);
 	slub_put_cpu_ptr(s->cpu_slab);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-21  9:33                 ` Mike Galbraith
@ 2021-07-23 22:39                   ` Vlastimil Babka
  2021-07-24  2:25                     ` Mike Galbraith
  2021-07-25 14:09                     ` Mike Galbraith
  0 siblings, 2 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-23 22:39 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/21/21 11:33 AM, Mike Galbraith wrote:
> On Wed, 2021-07-21 at 10:44 +0200, Vlastimil Babka wrote:
>>
>> So this doesn't look like our put_cpu_partial() preempted a
>> __slab_alloc() on the same cpu, right?
> 
> No, likely it was the one preempted by someone long gone, but we'll
> never know without setting a trap.
> 
>> BTW did my ugly patch work?
> 
> Nope.  I guess you missed my reporting it to have been a -ENOBOOT, and

Indeed, I misunderstood it as you talking about your patch.

> that cutting it in half, ie snagging only __slab_free() does boot, and
> seems to cure all of the RT fireworks.

OK, so depending on drain=1 makes this apply only to put_cpu_partial()
called from __slab_free and not get_partial_node(). One notable
difference is that in __slab_free we don't have n->list_lock locked and
in get_partial_node() we do. I guess in case your list_lock is made raw
again by another patch, that explains a local_lock can't nest under it.
If not, then I would expect this to work (I don't think they ever nest
in the opposite order, also lockdep should tell us instead of
-ENOBOOT?), but might be missing something...

I'd rather not nest those locks in any case. I just need to convince
myself that the scenario the half-fix fixes is indeed the only one
that's needed and we're not leaving there other races that are just
harder to trigger...

> (chainsaw noises...)
> 
> ---
>  mm/slub.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2551,6 +2551,8 @@ static void put_cpu_partial(struct kmem_
>  	int pobjects;
> 
>  	slub_get_cpu_ptr(s->cpu_slab);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
> +		local_lock(&s->cpu_slab->lock);
>  	do {
>  		pages = 0;
>  		pobjects = 0;
> @@ -2564,7 +2566,13 @@ static void put_cpu_partial(struct kmem_
>  				 * partial array is full. Move the existing
>  				 * set to the per node partial list.
>  				 */
> -				unfreeze_partials(s);
> +				this_cpu_write(s->cpu_slab->partial, NULL);
> +				if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
> +					local_unlock(&s->cpu_slab->lock);
> +				__unfreeze_partials(s, oldpage);
> +				if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
> +					local_lock(&s->cpu_slab->lock);
> +
>  				oldpage = NULL;
>  				pobjects = 0;
>  				pages = 0;
> @@ -2581,6 +2589,8 @@ static void put_cpu_partial(struct kmem_
> 
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
>  								!= oldpage);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && drain)
> +		local_unlock(&s->cpu_slab->lock);
>  	slub_put_cpu_ptr(s->cpu_slab);
>  #endif	/* CONFIG_SLUB_CPU_PARTIAL */
>  }
> 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-23 22:39                   ` Vlastimil Babka
@ 2021-07-24  2:25                     ` Mike Galbraith
  2021-07-25 14:09                     ` Mike Galbraith
  1 sibling, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-24  2:25 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Sat, 2021-07-24 at 00:39 +0200, Vlastimil Babka wrote:
> On 7/21/21 11:33 AM, Mike Galbraith wrote:
> > On Wed, 2021-07-21 at 10:44 +0200, Vlastimil Babka wrote:
> > >
> > > So this doesn't look like our put_cpu_partial() preempted a
> > > __slab_alloc() on the same cpu, right?
> >
> > No, likely it was the one preempted by someone long gone, but we'll
> > never know without setting a trap.
> >
> > > BTW did my ugly patch work?
> >
> > Nope.  I guess you missed my reporting it to have been a -ENOBOOT, and
>
> Indeed, I misunderstood it as you talking about your patch.
>
> > that cutting it in half, ie snagging only __slab_free() does boot, and
> > seems to cure all of the RT fireworks.
>
> OK, so depending on drain=1 makes this apply only to put_cpu_partial()
> called from __slab_free and not get_partial_node(). One notable
> difference is that in __slab_free we don't have n->list_lock locked and
> in get_partial_node() we do. I guess in case your list_lock is made raw
> again by another patch, that explains a local_lock can't nest under it.
> If not, then I would expect this to work (I don't think they ever nest
> in the opposite order, also lockdep should tell us instead of
> -ENOBOOT?), but might be missing something...

RT used to convert list_lock to raw_spinlock_t, but no longer does.
Whatever is going on, box does not emit a single sign of life with the
full patch.

> I'd rather not nest those locks in any case. I just need to convince
> myself that the scenario the half-fix fixes is indeed the only one
> that's needed and we're not leaving there other races that are just
> harder to trigger...

Yup.  I can only state with confidence that the trouble I was able to
easily reproduce was fixed up by serializing __slab_free(). Hopefully
you'll find that's the only hole in need of plugging.

	-Mike


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-23 22:39                   ` Vlastimil Babka
  2021-07-24  2:25                     ` Mike Galbraith
@ 2021-07-25 14:09                     ` Mike Galbraith
  2021-07-25 14:16                       ` Vlastimil Babka
  1 sibling, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-25 14:09 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Sat, 2021-07-24 at 00:39 +0200, Vlastimil Babka wrote:
>
> If not, then I would expect this to work (I don't think they ever nest
> in the opposite order, also lockdep should tell us instead of
> -ENOBOOT?), but might be missing something...

Yeah, like #ifndef CONFIG_PREMPT_RT at the bottom of the loop that our
useless damn eyeballs auto-correct instead of reporting :)

	-Mike


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 14:09                     ` Mike Galbraith
@ 2021-07-25 14:16                       ` Vlastimil Babka
  2021-07-25 15:02                         ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-25 14:16 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/25/21 4:09 PM, Mike Galbraith wrote:
> On Sat, 2021-07-24 at 00:39 +0200, Vlastimil Babka wrote:
>>
>> If not, then I would expect this to work (I don't think they ever nest
>> in the opposite order, also lockdep should tell us instead of
>> -ENOBOOT?), but might be missing something...
> 
> Yeah, like #ifndef CONFIG_PREMPT_RT at the bottom of the loop that our
> useless damn eyeballs auto-correct instead of reporting :)

Well doh, good catch. Hope fixing that helps then?
I've so far convinced myself that the full patch will be needed - in my
theory the ->partial cmpxchg in put_cpu_partial() itself can race with
assignment in ___slab_alloc() in a way that will not cause crashes, but
will leak slab pages.

> 	-Mike
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 14:16                       ` Vlastimil Babka
@ 2021-07-25 15:02                         ` Mike Galbraith
  2021-07-25 16:27                           ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-25 15:02 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Sun, 2021-07-25 at 16:16 +0200, Vlastimil Babka wrote:
> On 7/25/21 4:09 PM, Mike Galbraith wrote:
> > On Sat, 2021-07-24 at 00:39 +0200, Vlastimil Babka wrote:
> > >
> > > If not, then I would expect this to work (I don't think they ever nest
> > > in the opposite order, also lockdep should tell us instead of
> > > -ENOBOOT?), but might be missing something...
> >
> > Yeah, like #ifndef CONFIG_PREMPT_RT at the bottom of the loop that our
> > useless damn eyeballs auto-correct instead of reporting :)
>
> Well doh, good catch.

I never did see it.  I got sick of saying "but but but", and did make
mm/slub.i, which made it glow.

> Hope fixing that helps then?

Yeah, though RT should perhaps be pinned across release/re-acquire?

Actually, local locks should rediscover the recursion handling skills
they long had so such RT specific hole poking isn't necessary.  There
previously would have been no ifdef+typo there for eyeballs to miss and
miss and miss.

	-Mike


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 15:02                         ` Mike Galbraith
@ 2021-07-25 16:27                           ` Vlastimil Babka
  2021-07-25 19:12                             ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-25 16:27 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/25/21 5:02 PM, Mike Galbraith wrote:
> On Sun, 2021-07-25 at 16:16 +0200, Vlastimil Babka wrote:
>> On 7/25/21 4:09 PM, Mike Galbraith wrote:
>>> On Sat, 2021-07-24 at 00:39 +0200, Vlastimil Babka wrote:
>>>>
>>>> If not, then I would expect this to work (I don't think they ever nest
>>>> in the opposite order, also lockdep should tell us instead of
>>>> -ENOBOOT?), but might be missing something...
>>>
>>> Yeah, like #ifndef CONFIG_PREMPT_RT at the bottom of the loop that our
>>> useless damn eyeballs auto-correct instead of reporting :)
>>
>> Well doh, good catch.
> 
> I never did see it.  I got sick of saying "but but but", and did make
> mm/slub.i, which made it glow.

Glad you did.

>> Hope fixing that helps then?
> 
> Yeah, though RT should perhaps be pinned across release/re-acquire?

Probably not necessary, this_cpu_cmpxchg() will effectively recognize
being moved to a different CPU.
Might also move __unfreeze_partials() out of the whole loop to avoid the
relock. Yeah that should be better.

> Actually, local locks should rediscover the recursion handling skills
> they long had so such RT specific hole poking isn't necessary.  There
> previously would have been no ifdef+typo there for eyeballs to miss and
> miss and miss.

Hm, now I'm realizing that local_lock() on !RT is just
preempt_disable(), i.e. equivalent to get_cpu_ptr(), so some of the
ifdeffery could go away?

> 	-Mike
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 16:27                           ` Vlastimil Babka
@ 2021-07-25 19:12                             ` Vlastimil Babka
  2021-07-25 19:34                               ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-25 19:12 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/25/21 6:27 PM, Vlastimil Babka wrote:
>>> Hope fixing that helps then?
>>
>> Yeah, though RT should perhaps be pinned across release/re-acquire?
> 
> Probably not necessary, this_cpu_cmpxchg() will effectively recognize
> being moved to a different CPU.
> Might also move __unfreeze_partials() out of the whole loop to avoid the
> relock. Yeah that should be better.
> 
>> Actually, local locks should rediscover the recursion handling skills
>> they long had so such RT specific hole poking isn't necessary.  There
>> previously would have been no ifdef+typo there for eyeballs to miss and
>> miss and miss.
> 
> Hm, now I'm realizing that local_lock() on !RT is just
> preempt_disable(), i.e. equivalent to get_cpu_ptr(), so some of the
> ifdeffery could go away?

How much will this explode? Thanks.

----8<----
From 99808b198bdf867951131bb9d1ca1bd1cd12b8c4 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 23 Jul 2021 23:17:18 +0200
Subject: [PATCH] PREEMPT_RT+SLUB_CPU_PARTIAL fix attempt

---
 mm/slub.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 581004a5aca9..d12a50b5ee6f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2437,13 +2437,19 @@ static void unfreeze_partials(struct kmem_cache *s)
 {
 	struct page *partial_page;
 
+#ifndef CONFIG_PREEMPT_RT
 	do {
 		partial_page = this_cpu_read(s->cpu_slab->partial);
 
 	} while (partial_page &&
 		 this_cpu_cmpxchg(s->cpu_slab->partial, partial_page, NULL)
 				  != partial_page);
-
+#else
+	local_lock(&s->cpu_slab->lock);
+	partial_page = this_cpu_read(s->cpu_slab->partial);
+	this_cpu_write(s->cpu_slab->partial, NULL);
+	local_unlock(&s->cpu_slab->lock);
+#endif
 	if (partial_page)
 		__unfreeze_partials(s, partial_page);
 }
@@ -2479,10 +2485,15 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
+	struct page *page_to_unfreeze = NULL;
 	int pages;
 	int pobjects;
 
-	slub_get_cpu_ptr(s->cpu_slab);
+	/*
+	 * On !RT we just want to disable preemption, on RT we need the lock
+	 * for real. This happens to match local_lock() semantics.
+	 */
+	local_lock(&s->cpu_slab->lock);
 	do {
 		pages = 0;
 		pobjects = 0;
@@ -2496,7 +2507,16 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
+#ifndef CONFIG_PREEMPT_RT
 				unfreeze_partials(s);
+#else
+				/*
+				 * Postpone unfreezing until we drop the local
+				 * lock to avoid relocking.
+				 */
+				page_to_unfreeze = oldpage;
+#endif
+
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2511,9 +2531,17 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 		page->pobjects = pobjects;
 		page->next = oldpage;
 
+#ifndef CONFIG_PREEMPT_RT
 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
 								!= oldpage);
-	slub_put_cpu_ptr(s->cpu_slab);
+#else
+		this_cpu_write(s->cpu_slab->partial, page);
+	} while (false);
+#endif
+
+	local_unlock(&s->cpu_slab->lock);
+	if (page_to_unfreeze)
+		__unfreeze_partials(s, page_to_unfreeze);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }
 
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 19:12                             ` Vlastimil Babka
@ 2021-07-25 19:34                               ` Vlastimil Babka
  2021-07-26 10:04                                 ` Mike Galbraith
  2021-07-26 17:00                                 ` Mike Galbraith
  0 siblings, 2 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-25 19:34 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/25/21 9:12 PM, Vlastimil Babka wrote:
> +	/*
> +	 * On !RT we just want to disable preemption, on RT we need the lock
> +	 * for real. This happens to match local_lock() semantics.
> +	 */
> +	local_lock(&s->cpu_slab->lock);

OK I realized (and tglx confirmed) that this will blow up on !RT +
lockdep if interrupted by ___slab_alloc() that will do
local_lock_irqsave(). So back to #ifdefs it is. But should work as-is
for RT testing.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 19:34                               ` Vlastimil Babka
@ 2021-07-26 10:04                                 ` Mike Galbraith
  2021-07-26 17:00                                 ` Mike Galbraith
  1 sibling, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-07-26 10:04 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Sun, 2021-07-25 at 21:34 +0200, Vlastimil Babka wrote:
> On 7/25/21 9:12 PM, Vlastimil Babka wrote:
> > +       /*
> > +        * On !RT we just want to disable preemption, on RT we need
> > the lock
> > +        * for real. This happens to match local_lock() semantics.
> > +        */
> > +       local_lock(&s->cpu_slab->lock);
>
> OK I realized (and tglx confirmed) that this will blow up on !RT +
> lockdep if interrupted by ___slab_alloc() that will do
> local_lock_irqsave(). So back to #ifdefs it is. But should work as-is
> for RT testing.

I called it off after 3 1/2 hours.  Bug blew box outta the water 10
times in about half of that, so methinks it's safe to call it dead.

	-Mike


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-25 19:34                               ` Vlastimil Babka
  2021-07-26 10:04                                 ` Mike Galbraith
@ 2021-07-26 17:00                                 ` Mike Galbraith
  2021-07-26 21:26                                   ` Vlastimil Babka
  1 sibling, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-26 17:00 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Sun, 2021-07-25 at 21:34 +0200, Vlastimil Babka wrote:
> On 7/25/21 9:12 PM, Vlastimil Babka wrote:
> > +       /*
> > +        * On !RT we just want to disable preemption, on RT we need
> > the lock
> > +        * for real. This happens to match local_lock() semantics.
> > +        */
> > +       local_lock(&s->cpu_slab->lock);
>
> OK I realized (and tglx confirmed) that this will blow up on !RT +
> lockdep if interrupted by ___slab_alloc() that will do
> local_lock_irqsave(). So back to #ifdefs it is. But should work as-is
> for RT testing.

Speaking of that local_lock_irqsave(), and some unloved ifdefs..

Why not do something like the below?  When I look at new_slab:, I see
cpu_slab->partial assignment protected by IRQs being disabled, which
implies to me it should probably be so protected everywhere.  There
used to be another slub_set_percpu_partial() call in
unfreeze_partials(), which was indeed called with IRQs disabled, quite
sane looking to an mm outsider looking in. The odd man out ->partial
assignment was the preempt disabled put_cpu_partial() cmpxchg loop,
which contained an IRQ disabled region to accommodate the
aforementioned unfreeze_partials().

Is there real world benefit to the cmpxchg loops whacked below (ala
monkey see monkey do) over everyone just taking the straight shot you
laid down for RT?  It's easier on the eye (mine anyway), and neither
PREEMPT nor PREEMPT_RT seems inclined to complain... tick... tock...

---
 mm/slub.c |   79 ++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 41 insertions(+), 38 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2437,13 +2437,12 @@ static void __unfreeze_partials(struct k
 static void unfreeze_partials(struct kmem_cache *s)
 {
 	struct page *partial_page;
+	unsigned long flags;

-	do {
-		partial_page = this_cpu_read(s->cpu_slab->partial);
-
-	} while (partial_page &&
-		 this_cpu_cmpxchg(s->cpu_slab->partial, partial_page, NULL)
-				  != partial_page);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
+	partial_page = this_cpu_read(s->cpu_slab->partial);
+	this_cpu_write(s->cpu_slab->partial, NULL);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);

 	if (partial_page)
 		__unfreeze_partials(s, partial_page);
@@ -2480,41 +2479,45 @@ static void put_cpu_partial(struct kmem_
 {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
-
-	slub_get_cpu_ptr(s->cpu_slab);
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
-
-		if (oldpage) {
-			pobjects = oldpage->pobjects;
-			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0, pobjects = 0;
+
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
+
+	if ((oldpage = this_cpu_read(s->cpu_slab->partial))) {
+		pobjects = oldpage->pobjects;
+		pages = oldpage->pages;
+		if (drain && pobjects > slub_cpu_partial(s)) {
+			/*
+			 * partial array is full. Move the existing
+			 * set to the per node partial list.
+			 *
+			 * Postpone unfreezing until we drop the local
+			 * lock to avoid an RT unlock/relock requirement
+			 * due to MEMCG __slab_free() recursion.
+			 */
+			page_to_unfreeze = oldpage;
+
+			oldpage = NULL;
+			pobjects = 0;
+			pages = 0;
+			stat(s, CPU_PARTIAL_DRAIN);
 		}
+	}
+
+	pages++;
+	pobjects += page->objects - page->inuse;
+
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;

-		pages++;
-		pobjects += page->objects - page->inuse;
+	this_cpu_write(s->cpu_slab->partial, page);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);

-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
-
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-								!= oldpage);
-	slub_put_cpu_ptr(s->cpu_slab);
+	if (page_to_unfreeze)
+		__unfreeze_partials(s, page_to_unfreeze);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }





* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-26 17:00                                 ` Mike Galbraith
@ 2021-07-26 21:26                                   ` Vlastimil Babka
  2021-07-27  4:09                                     ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-26 21:26 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/26/21 7:00 PM, Mike Galbraith wrote:
> On Sun, 2021-07-25 at 21:34 +0200, Vlastimil Babka wrote:
>> On 7/25/21 9:12 PM, Vlastimil Babka wrote:
>>> +       /*
>>> +        * On !RT we just want to disable preemption, on RT we need
>>> the lock
>>> +        * for real. This happens to match local_lock() semantics.
>>> +        */
>>> +       local_lock(&s->cpu_slab->lock);
>>
>> OK I realized (and tglx confirmed) that this will blow up on !RT +
>> lockdep if interrupted by ___slab_alloc() that will do
>> local_lock_irqsave(). So back to #ifdefs it is. But should work as-is
>> for RT testing.
> 
> Speaking of that local_lock_irqsave(), and some unloved ifdefs..
> 
> Why not do something like the below?  When I look at new_slab:, I see
> cpu_slab->partial assignment protected by IRQs being disabled, which
> implies to me it should probably be so protected everywhere.  There
> used to be another slub_set_percpu_partial() call in
> unfreeze_partials(), which was indeed called with IRQs disabled, quite
> sane looking to an mm outsider looking in. The odd man out ->partial
> assignment was the preempt disabled put_cpu_partial() cmpxchg loop,
> which contained an IRQ disabled region to accommodate the
> aforementioned unfreeze_partials().
> 
> Is there real world benefit to the cmpxchg loops whacked below (ala
> monkey see monkey do) over everyone just taking the straight shot you
> laid down for RT?  It's easier on the eye (mine anyway), and neither
> PREEMPT nor PREEMPT_RT seems inclined to complain... tick... tock...

Yep, sounds like a good approach, thanks. Percpu partial is not *the*
SLUB fast path, so it should be sufficient without the lockless cmpxchg
tricks. Will incorporate in updated series.

> 
> ---
>  mm/slub.c |   79 ++++++++++++++++++++++++++++++++------------------------------
>  1 file changed, 41 insertions(+), 38 deletions(-)
> 
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2437,13 +2437,12 @@ static void __unfreeze_partials(struct k
>  static void unfreeze_partials(struct kmem_cache *s)
>  {
>  	struct page *partial_page;
> +	unsigned long flags;
> 
> -	do {
> -		partial_page = this_cpu_read(s->cpu_slab->partial);
> -
> -	} while (partial_page &&
> -		 this_cpu_cmpxchg(s->cpu_slab->partial, partial_page, NULL)
> -				  != partial_page);
> +	local_lock_irqsave(&s->cpu_slab->lock, flags);
> +	partial_page = this_cpu_read(s->cpu_slab->partial);
> +	this_cpu_write(s->cpu_slab->partial, NULL);
> +	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> 
>  	if (partial_page)
>  		__unfreeze_partials(s, partial_page);
> @@ -2480,41 +2479,45 @@ static void put_cpu_partial(struct kmem_
>  {
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *oldpage;
> -	int pages;
> -	int pobjects;
> -
> -	slub_get_cpu_ptr(s->cpu_slab);
> -	do {
> -		pages = 0;
> -		pobjects = 0;
> -		oldpage = this_cpu_read(s->cpu_slab->partial);
> -
> -		if (oldpage) {
> -			pobjects = oldpage->pobjects;
> -			pages = oldpage->pages;
> -			if (drain && pobjects > slub_cpu_partial(s)) {
> -				/*
> -				 * partial array is full. Move the existing
> -				 * set to the per node partial list.
> -				 */
> -				unfreeze_partials(s);
> -				oldpage = NULL;
> -				pobjects = 0;
> -				pages = 0;
> -				stat(s, CPU_PARTIAL_DRAIN);
> -			}
> +	struct page *page_to_unfreeze = NULL;
> +	unsigned long flags;
> +	int pages = 0, pobjects = 0;
> +
> +	local_lock_irqsave(&s->cpu_slab->lock, flags);
> +
> +	if ((oldpage = this_cpu_read(s->cpu_slab->partial))) {
> +		pobjects = oldpage->pobjects;
> +		pages = oldpage->pages;
> +		if (drain && pobjects > slub_cpu_partial(s)) {
> +			/*
> +			 * partial array is full. Move the existing
> +			 * set to the per node partial list.
> +			 *
> +			 * Postpone unfreezing until we drop the local
> +			 * lock to avoid an RT unlock/relock requirement
> +			 * due to MEMCG __slab_free() recursion.
> +			 */
> +			page_to_unfreeze = oldpage;
> +
> +			oldpage = NULL;
> +			pobjects = 0;
> +			pages = 0;
> +			stat(s, CPU_PARTIAL_DRAIN);
>  		}
> +	}
> +
> +	pages++;
> +	pobjects += page->objects - page->inuse;
> +
> +	page->pages = pages;
> +	page->pobjects = pobjects;
> +	page->next = oldpage;
> 
> -		pages++;
> -		pobjects += page->objects - page->inuse;
> +	this_cpu_write(s->cpu_slab->partial, page);
> +	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> 
> -		page->pages = pages;
> -		page->pobjects = pobjects;
> -		page->next = oldpage;
> -
> -	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
> -								!= oldpage);
> -	slub_put_cpu_ptr(s->cpu_slab);
> +	if (page_to_unfreeze)
> +		__unfreeze_partials(s, page_to_unfreeze);
>  #endif	/* CONFIG_SLUB_CPU_PARTIAL */
>  }
> 
> 
> 



* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-26 21:26                                   ` Vlastimil Babka
@ 2021-07-27  4:09                                     ` Mike Galbraith
  2021-07-28 16:59                                       ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-27  4:09 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Mon, 2021-07-26 at 23:26 +0200, Vlastimil Babka wrote:
> On 7/26/21 7:00 PM, Mike Galbraith wrote:
> >
> > Why not do something like the below?...
>
> Yep, sounds like a good approach, thanks. Percpu partial is not *the*
> SLUB fast path, so it should be sufficient without the lockless cmpxchg
> tricks. Will incorporate in updated series.

Great, my >= 5.13 trees will meanwhile wear it like so:

From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 23 Jul 2021 23:17:18 +0200

mm, slub: Fix PREEMPT_RT plus SLUB_CPU_PARTIAL local exclusion

See https://lkml.org/lkml/2021/7/25/185

Mike: Remove ifdefs, make all configs take the straight-line path laid
out for RT by Vlastimil in his prospective (now confirmed) fix.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/slub.c |   79 ++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 41 insertions(+), 38 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2437,13 +2437,12 @@ static void __unfreeze_partials(struct k
 static void unfreeze_partials(struct kmem_cache *s)
 {
 	struct page *partial_page;
+	unsigned long flags;

-	do {
-		partial_page = this_cpu_read(s->cpu_slab->partial);
-
-	} while (partial_page &&
-		 this_cpu_cmpxchg(s->cpu_slab->partial, partial_page, NULL)
-				  != partial_page);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
+	partial_page = this_cpu_read(s->cpu_slab->partial);
+	this_cpu_write(s->cpu_slab->partial, NULL);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);

 	if (partial_page)
 		__unfreeze_partials(s, partial_page);
@@ -2480,41 +2479,45 @@ static void put_cpu_partial(struct kmem_
 {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
-
-	slub_get_cpu_ptr(s->cpu_slab);
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
-
-		if (oldpage) {
-			pobjects = oldpage->pobjects;
-			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0, pobjects = 0;
+
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
+
+	if ((oldpage = this_cpu_read(s->cpu_slab->partial))) {
+		pobjects = oldpage->pobjects;
+		pages = oldpage->pages;
+		if (drain && pobjects > slub_cpu_partial(s)) {
+			/*
+			 * partial array is full. Move the existing
+			 * set to the per node partial list.
+			 *
+			 * Postpone unfreezing until we drop the local
+			 * lock to avoid an RT unlock/relock requirement
+			 * due to MEMCG __slab_free() recursion.
+			 */
+			page_to_unfreeze = oldpage;
+
+			oldpage = NULL;
+			pobjects = 0;
+			pages = 0;
+			stat(s, CPU_PARTIAL_DRAIN);
 		}
+	}
+
+	pages++;
+	pobjects += page->objects - page->inuse;
+
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;

-		pages++;
-		pobjects += page->objects - page->inuse;
+	this_cpu_write(s->cpu_slab->partial, page);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);

-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
-
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-								!= oldpage);
-	slub_put_cpu_ptr(s->cpu_slab);
+	if (page_to_unfreeze)
+		__unfreeze_partials(s, page_to_unfreeze);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }





* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-27  4:09                                     ` Mike Galbraith
@ 2021-07-28 16:59                                       ` Vlastimil Babka
  2021-07-29  4:51                                         ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-28 16:59 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/27/21 6:09 AM, Mike Galbraith wrote:
> On Mon, 2021-07-26 at 23:26 +0200, Vlastimil Babka wrote:
>> On 7/26/21 7:00 PM, Mike Galbraith wrote:
>>>
>>> Why not do something like the below?...
>>
>> Yep, sounds like a good approach, thanks. Percpu partial is not *the*
>> SLUB fast path, so it should be sufficient without the lockless cmpxchg
>> tricks. Will incorporate in updated series.

The updated series incorporating hopefully all fixes from Mike and
bigeasy, and rebased to 5.14-rc3 (Thomas told me RT is moving to it), is
here:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r0

I'll post it to linux-mm tomorrow.


* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-28 16:59                                       ` Vlastimil Babka
@ 2021-07-29  4:51                                         ` Mike Galbraith
  2021-07-29  9:51                                           ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2021-07-29  4:51 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On Wed, 2021-07-28 at 18:59 +0200, Vlastimil Babka wrote:
> On 7/27/21 6:09 AM, Mike Galbraith wrote:
> > On Mon, 2021-07-26 at 23:26 +0200, Vlastimil Babka wrote:
> > > On 7/26/21 7:00 PM, Mike Galbraith wrote:
> > > >
> > > > Why not do something like the below?...
> > >
> > > Yep, sounds like a good approach, thanks. Percpu partial is not *the*
> > > SLUB fast path, so it should be sufficient without the lockless cmpxchg
> > > tricks. Will incorporate in updated series.
>
> The updated series incorporating hopefully all fixes from Mike and
> bigeasy, and rebased to 5.14-rc3 (Thomas told me RT is moving to it), is
> here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r0

I had to resurrect the hunk below to build with lockdep, but modulo
dinky speedbump, the same RT testdrive that previously exploded was as
entertainment free as such testing is supposed to be.

---
 mm/slub.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2890,7 +2890,11 @@ static void *___slab_alloc(struct kmem_c

 load_freelist:

+#ifdef CONFIG_PREEMPT_RT
+	lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
+#else
 	lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock));
+#endif

 	/*
 	 * freelist is pointing to the list of objects to be used.




* Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
  2021-07-29  4:51                                         ` Mike Galbraith
@ 2021-07-29  9:51                                           ` Vlastimil Babka
  0 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-07-29  9:51 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner, LKML
  Cc: linux-rt-users, Mel Gorman, Sebastian Andrzej Siewior

On 7/29/21 6:51 AM, Mike Galbraith wrote:
> On Wed, 2021-07-28 at 18:59 +0200, Vlastimil Babka wrote:
>> On 7/27/21 6:09 AM, Mike Galbraith wrote:
>> > On Mon, 2021-07-26 at 23:26 +0200, Vlastimil Babka wrote:
>> > > On 7/26/21 7:00 PM, Mike Galbraith wrote:
>> > > >
>> > > > Why not do something like the below?...
>> > >
>> > > Yep, sounds like a good approach, thanks. Percpu partial is not *the*
>> > > SLUB fast path, so it should be sufficient without the lockless cmpxchg
>> > > tricks. Will incorporate in updated series.
>>
>> The updated series incorporating hopefully all fixes from Mike and
>> bigeasy, and rebased to 5.14-rc3 (Thomas told me RT is moving to it), is
>> here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r0
> 
> I had to resurrect the hunk below to build with lockdep, but modulo
> dinky speedbump, the same RT testdrive that previously exploded was as
> entertainment free as such testing is supposed to be.

Ah forgot about that, I'll include it too. Thanks for testing!

> ---
>  mm/slub.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2890,7 +2890,11 @@ static void *___slab_alloc(struct kmem_c
> 
>  load_freelist:
> 
> +#ifdef CONFIG_PREEMPT_RT
> +	lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock.lock));
> +#else
>  	lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock));
> +#endif
> 
>  	/*
>  	 * freelist is pointing to the list of objects to be used.
> 
> 



end of thread, other threads:[~2021-07-29  9:51 UTC | newest]

Thread overview: 36+ messages
2021-07-08  9:42 [ANNOUNCE] v5.13-rt1 Thomas Gleixner
2021-07-09  5:20 ` [patch] mm/slub: Fix kmem_cache_alloc_bulk() error path Mike Galbraith
2021-07-09  5:20 ` [patch] mm/slub: Replace do_slab_free() local_lock_irqsave/restore() calls in PREEMPT_RT scope Mike Galbraith
2021-07-09  5:21 ` [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope Mike Galbraith
2021-07-09  5:25   ` Mike Galbraith
2021-07-09 19:28   ` Thomas Gleixner
2021-07-10  1:12     ` Mike Galbraith
2021-07-15 16:34       ` Mike Galbraith
2021-07-17 14:58         ` [patch] v2 " Mike Galbraith
2021-07-18  7:58           ` Vlastimil Babka
2021-07-18  8:11             ` Mike Galbraith
2021-07-18 15:43           ` Mike Galbraith
2021-07-18 21:19           ` Vlastimil Babka
2021-07-19  4:01             ` Mike Galbraith
2021-07-19 13:15               ` Mike Galbraith
2021-07-20  2:46           ` kernel test robot
2021-07-20  8:56         ` [rfc/patch] " Vlastimil Babka
2021-07-20 11:26           ` Mike Galbraith
2021-07-21  4:56             ` Mike Galbraith
2021-07-21  8:44               ` Vlastimil Babka
2021-07-21  9:33                 ` Mike Galbraith
2021-07-23 22:39                   ` Vlastimil Babka
2021-07-24  2:25                     ` Mike Galbraith
2021-07-25 14:09                     ` Mike Galbraith
2021-07-25 14:16                       ` Vlastimil Babka
2021-07-25 15:02                         ` Mike Galbraith
2021-07-25 16:27                           ` Vlastimil Babka
2021-07-25 19:12                             ` Vlastimil Babka
2021-07-25 19:34                               ` Vlastimil Babka
2021-07-26 10:04                                 ` Mike Galbraith
2021-07-26 17:00                                 ` Mike Galbraith
2021-07-26 21:26                                   ` Vlastimil Babka
2021-07-27  4:09                                     ` Mike Galbraith
2021-07-28 16:59                                       ` Vlastimil Babka
2021-07-29  4:51                                         ` Mike Galbraith
2021-07-29  9:51                                           ` Vlastimil Babka
