From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
Pekka Enberg <penberg@kernel.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Mike Galbraith <efault@gmx.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Thomas Gleixner <tglx@linutronix.de>,
Mel Gorman <mgorman@techsingularity.net>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Jann Horn <jannh@google.com>
Subject: Re: [PATCH v4 35/35] mm, slub: convert kmem_cpu_slab protection to local_lock
Date: Tue, 17 Aug 2021 12:14:58 +0200
Message-ID: <e907c2b6-6df1-8038-8c6c-aa9c1fd11259@suse.cz>
In-Reply-To: <20210805152000.12817-36-vbabka@suse.cz>
On 8/5/21 5:20 PM, Vlastimil Babka wrote:
> Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of
> local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's
> equivalent, with better lockdep visibility. On PREEMPT_RT that means better
> preemption.
>
> However, the cost on PREEMPT_RT is the loss of lockless fast paths which only
> work with cpu freelist. Those are designed to detect and recover from being
> preempted by other conflicting operations (both fast or slow path), but the
> slow path operations assume they cannot be preempted by a fast path operation,
> which is guaranteed naturally with disabled irqs. With local locks on
> PREEMPT_RT, the fast paths now also need to take the local lock to avoid races.
>
> In the allocation fastpath slab_alloc_node() we can just defer to the slowpath
> __slab_alloc() which also works with cpu freelist, but under the local lock.
> In the free fastpath do_slab_free() we have to add a new local lock protected
> version of freeing to the cpu freelist, as the existing slowpath only works
> with the page freelist.
>
> Also update the comment about locking scheme in SLUB to reflect changes done
> by this series.
>
> [ Mike Galbraith <efault@gmx.de>: use local_lock() without irq in PREEMPT_RT
> scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ]
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
Another fixup. Are there too many by now, and should we replace it all with a v5?
----8<----
From b13291ca13effc2b22a55619aada688ad5defa4b Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 17 Aug 2021 11:47:16 +0200
Subject: [PATCH] mm, slub: fix kmem_cache_cpu fields alignment for double cmpxchg
Sven Eckelmann reports [1] that the addition of local_lock to kmem_cache_cpu
breaks a config with 64BIT+LOCK_STAT:
general protection fault, maybe for address 0xffff888007fcf1c8: 0000 [#1] NOPTI
CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.0-rc5+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:kmem_cache_alloc+0x81/0x180
Code: 79 48 00 4c 8b 41 38 0f 84 89 00 00 00 4d 85 c0 0f 84 80 00 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 01 49 8b 1c 00 4c 89 c0 <48> 0f c7 4f 38 0f 943
RSP: 0000:ffffffff81803c10 EFLAGS: 00000286
RAX: ffff88800244e7c0 RBX: ffff88800244e800 RCX: 0000000000000024
RDX: 0000000000000023 RSI: 0000000000000100 RDI: ffff888007fcf190
RBP: ffffffff81803c38 R08: ffff88800244e7c0 R09: 0000000000000dc0
R10: 0000000000004000 R11: 0000000000000000 R12: ffff8880024413c0
R13: ffffffff810d18f4 R14: 0000000000000dc0 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffffffff81840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888002001000 CR3: 0000000001824000 CR4: 00000000000006b0
Call Trace:
__get_vm_area_node.constprop.0.isra.0+0x74/0x150
__vmalloc_node_range+0x5a/0x2b0
? kernel_clone+0x88/0x390
? copy_process+0x1ac/0x17e0
copy_process+0x768/0x17e0
? kernel_clone+0x88/0x390
kernel_clone+0x88/0x390
? _vm_unmap_aliases.part.0+0xe9/0x110
? change_page_attr_set_clr+0x10d/0x180
kernel_thread+0x43/0x50
? rest_init+0x100/0x100
rest_init+0x1e/0x100
arch_call_rest_init+0x9/0xc
start_kernel+0x481/0x493
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x80/0x84
secondary_startup_64_no_verify+0xc2/0xcb
random: get_random_bytes called from oops_exit+0x34/0x60 with crng_init=0
---[ end trace 2cac18ac38f640c1 ]---
RIP: 0010:kmem_cache_alloc+0x81/0x180
Code: 79 48 00 4c 8b 41 38 0f 84 89 00 00 00 4d 85 c0 0f 84 80 00 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 01 49 8b 1c 00 4c 89 c0 <48> 0f c7 4f 38 0f 943
RSP: 0000:ffffffff81803c10 EFLAGS: 00000286
RAX: ffff88800244e7c0 RBX: ffff88800244e800 RCX: 0000000000000024
RDX: 0000000000000023 RSI: 0000000000000100 RDI: ffff888007fcf190
RBP: ffffffff81803c38 R08: ffff88800244e7c0 R09: 0000000000000dc0
R10: 0000000000004000 R11: 0000000000000000 R12: ffff8880024413c0
R13: ffffffff810d18f4 R14: 0000000000000dc0 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffffffff81840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888002001000 CR3: 0000000001824000 CR4: 00000000000006b0
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
Decoding the RIP points to the this_cpu_cmpxchg_double() call in
slab_alloc_node(). The problem is the particular size of local_lock_t with
LOCK_STAT, which results in the following layout:
struct kmem_cache_cpu {
        local_lock_t               lock;                 /*     0    56 */
        void * *                   freelist;             /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        long unsigned int          tid;                  /*    64     8 */
        struct page *              page;                 /*    72     8 */
        struct page *              partial;              /*    80     8 */

        /* size: 88, cachelines: 2, members: 5 */
        /* last cacheline: 24 bytes */
};
As pointed out by Sebastian Andrzej Siewior, this_cpu_cmpxchg_double()
needs the freelist and tid fields to be aligned to the sum of their sizes
(16 bytes), but in this configuration they are not: freelist sits at
offset 56. This did not happen with non-debug RT and !RT configs, nor
with lockdep.
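
The alignment problem can be reproduced outside the kernel with a small
userspace sketch. It is only an illustration, not the kernel's code: the
struct and function names are hypothetical, the lock is modelled as an
opaque 56-byte blob (the size reported in the layout above), struct page *
is modelled as void *, and an LP64 target is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the 56-byte size is taken from the layout above. */
struct mock_lock_stat_lock {
	unsigned char pad[56];
};
static_assert(sizeof(struct mock_lock_stat_lock) == 56, "stand-in must be 56 bytes");

/* Same field order as the reported layout; struct page * is modelled as void *. */
struct mock_cpu_slab_before {
	struct mock_lock_stat_lock lock;
	void **freelist;
	unsigned long tid;
	void *page;
	void *partial;
};

/* freelist and tid are updated as one unit, so the pair needs this alignment. */
#define PAIR_ALIGN (2 * sizeof(void *))

/* Remainder of the freelist offset; non-zero means the pair is misaligned. */
static size_t freelist_misalignment_before(void)
{
	return offsetof(struct mock_cpu_slab_before, freelist) % PAIR_ALIGN;
}

/* Same check applied to the address printed in the general protection fault. */
static unsigned long long fault_address_misalignment(void)
{
	return 0xffff888007fcf1c8ULL % 16;
}
```

Under those assumptions both helpers return 8: the freelist/tid pair
starts 8 bytes past a 16-byte boundary, which matches the faulting address
in the oops above.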
To fix this, move the lock field below the partial field, so that its size
no longer affects the offsets of freelist and tid.
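
A layout requirement like this could also be checked at build time. The
following is a userspace sketch under the same assumptions as before
(hypothetical names, a 56-byte stand-in for the lock, void * in place of
struct page *, LP64); it is not part of this patch, which only adds a
comment above the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a local_lock_t that is 56 bytes with LOCK_STAT. */
struct mock_big_lock {
	unsigned char pad[56];
};

/* Field order after the fix: the lock follows the fields it protects. */
struct mock_cpu_slab_after {
	void **freelist;
	unsigned long tid;
	void *page;
	void *partial;
	struct mock_big_lock lock;
};

/* The freelist/tid pair must start on a boundary equal to its total size. */
static_assert(offsetof(struct mock_cpu_slab_after, freelist) %
	      (2 * sizeof(void *)) == 0,
	      "freelist must be aligned for a double-word cmpxchg");

/* tid must directly follow freelist so the two form one contiguous pair. */
static_assert(offsetof(struct mock_cpu_slab_after, tid) ==
	      offsetof(struct mock_cpu_slab_after, freelist) + sizeof(void *),
	      "tid must immediately follow freelist");

/* With the lock last, its size only changes where the struct ends. */
static size_t lock_offset_after(void)
{
	return offsetof(struct mock_cpu_slab_after, lock);
}
```

With the lock at the end, changing the size of the stand-in moves nothing
that the double cmpxchg touches; the asserts would fail to compile if a
later reordering broke the pair's alignment or contiguity.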
[1] https://lore.kernel.org/linux-mm/2666777.vCjUEy5FO1@sven-desktop/
This is a fixup for mmotm patch
mm-slub-convert-kmem_cpu_slab-protection-to-local_lock.patch
Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/linux/slub_def.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index b5bcac29b979..85499f0586b0 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -41,14 +41,18 @@ enum stat_item {
 	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
 	NR_SLUB_STAT_ITEMS };
 
+/*
+ * When changing the layout, make sure freelist and tid are still compatible
+ * with this_cpu_cmpxchg_double() alignment requirements.
+ */
 struct kmem_cache_cpu {
-	local_lock_t lock;	/* Protects the fields below except stat */
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *partial;	/* Partially allocated frozen slabs */
 #endif
+	local_lock_t lock;	/* Protects the fields above */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
--
2.32.0