All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mike Galbraith <efault@gmx.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Jann Horn <jannh@google.com>
Subject: Re: [PATCH v4 35/35] mm, slub: convert kmem_cpu_slab protection to local_lock
Date: Tue, 17 Aug 2021 12:14:58 +0200	[thread overview]
Message-ID: <e907c2b6-6df1-8038-8c6c-aa9c1fd11259@suse.cz> (raw)
In-Reply-To: <20210805152000.12817-36-vbabka@suse.cz>

On 8/5/21 5:20 PM, Vlastimil Babka wrote:
> Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of
> local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's
> equivalent, with better lockdep visibility. On PREEMPT_RT that means better
> preemption.
> 
> However, the cost on PREEMPT_RT is the loss of lockless fast paths which only
> work with cpu freelist. Those are designed to detect and recover from being
> preempted by other conflicting operations (both fast or slow path), but the
> slow path operations assume they cannot be preempted by a fast path operation,
> which is guaranteed naturally with disabled irqs. With local locks on
> PREEMPT_RT, the fast paths now also need to take the local lock to avoid races.
> 
> In the allocation fastpath slab_alloc_node() we can just defer to the slowpath
> __slab_alloc() which also works with cpu freelist, but under the local lock.
> In the free fastpath do_slab_free() we have to add a new local lock protected
> version of freeing to the cpu freelist, as the existing slowpath only works
> with the page freelist.
> 
> Also update the comment about locking scheme in SLUB to reflect changes done
> by this series.
> 
> [ Mike Galbraith <efault@gmx.de>: use local_lock() without irq in PREEMPT_RT
>   scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ]
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---

Another fixup. Is it too many and should we replace it all with a v5?
----8<----
From b13291ca13effc2b22a55619aada688ad5defa4b Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 17 Aug 2021 11:47:16 +0200
Subject: [PATCH] mm, slub: fix kmem_cache_cpu fields alignment for double
 cmpxchg

Sven Eckelmann reports [1] that the addition of local_lock to kmem_cache_cpu
breaks a config with 64BIT+LOCK_STAT:

    general protection fault, maybe for address 0xffff888007fcf1c8: 0000 [#1] NOPTI
    CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.0-rc5+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    RIP: 0010:kmem_cache_alloc+0x81/0x180
    Code: 79 48 00 4c 8b 41 38 0f 84 89 00 00 00 4d 85 c0 0f 84 80 00 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 01 49 8b 1c 00 4c 89 c0 <48> 0f c7 4f 38 0f 943
    RSP: 0000:ffffffff81803c10 EFLAGS: 00000286
    RAX: ffff88800244e7c0 RBX: ffff88800244e800 RCX: 0000000000000024
    RDX: 0000000000000023 RSI: 0000000000000100 RDI: ffff888007fcf190
    RBP: ffffffff81803c38 R08: ffff88800244e7c0 R09: 0000000000000dc0
    R10: 0000000000004000 R11: 0000000000000000 R12: ffff8880024413c0
    R13: ffffffff810d18f4 R14: 0000000000000dc0 R15: 0000000000000100
    FS:  0000000000000000(0000) GS:ffffffff81840000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888002001000 CR3: 0000000001824000 CR4: 00000000000006b0
    Call Trace:
     __get_vm_area_node.constprop.0.isra.0+0x74/0x150
     __vmalloc_node_range+0x5a/0x2b0
     ? kernel_clone+0x88/0x390
     ? copy_process+0x1ac/0x17e0
     copy_process+0x768/0x17e0
     ? kernel_clone+0x88/0x390
     kernel_clone+0x88/0x390
     ? _vm_unmap_aliases.part.0+0xe9/0x110
     ? change_page_attr_set_clr+0x10d/0x180
     kernel_thread+0x43/0x50
     ? rest_init+0x100/0x100
     rest_init+0x1e/0x100
     arch_call_rest_init+0x9/0xc
     start_kernel+0x481/0x493
     x86_64_start_reservations+0x24/0x26
     x86_64_start_kernel+0x80/0x84
     secondary_startup_64_no_verify+0xc2/0xcb
    random: get_random_bytes called from oops_exit+0x34/0x60 with crng_init=0
    ---[ end trace 2cac18ac38f640c1 ]---
    RIP: 0010:kmem_cache_alloc+0x81/0x180
    Code: 79 48 00 4c 8b 41 38 0f 84 89 00 00 00 4d 85 c0 0f 84 80 00 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 01 49 8b 1c 00 4c 89 c0 <48> 0f c7 4f 38 0f 943
    RSP: 0000:ffffffff81803c10 EFLAGS: 00000286
    RAX: ffff88800244e7c0 RBX: ffff88800244e800 RCX: 0000000000000024
    RDX: 0000000000000023 RSI: 0000000000000100 RDI: ffff888007fcf190
    RBP: ffffffff81803c38 R08: ffff88800244e7c0 R09: 0000000000000dc0
    R10: 0000000000004000 R11: 0000000000000000 R12: ffff8880024413c0
    R13: ffffffff810d18f4 R14: 0000000000000dc0 R15: 0000000000000100
    FS:  0000000000000000(0000) GS:ffffffff81840000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888002001000 CR3: 0000000001824000 CR4: 00000000000006b0
    Kernel panic - not syncing: Attempted to kill the idle task!
    ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Decoding the RIP points to this_cpu_cmpxchg_double() call in slab_alloc_node().

The problem is the particular size of local_lock_t with LOCK_STAT resulting
in the following layout:

struct kmem_cache_cpu {
        local_lock_t               lock;                 /*     0    56 */
        void * *                   freelist;             /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        long unsigned int          tid;                  /*    64     8 */
        struct page *              page;                 /*    72     8 */
        struct page *              partial;              /*    80     8 */

        /* size: 88, cachelines: 2, members: 5 */
        /* last cacheline: 24 bytes */
};

As pointed out by Sebastian Andrzej Siewior, this_cpu_cmpxchg_double()
needs the freelist and tid fields to be aligned to sum of their sizes
(16 bytes) but they are not in this configuration. This didn't happen
with non-debug RT and !RT configs as well as lockdep.

To fix this, move the lock field below partial field, so that it doesn't
affect the layout.

[1] https://lore.kernel.org/linux-mm/2666777.vCjUEy5FO1@sven-desktop/

This is a fixup for mmotm patch
mm-slub-convert-kmem_cpu_slab-protection-to-local_lock.patch

Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slub_def.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index b5bcac29b979..85499f0586b0 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -41,14 +41,18 @@ enum stat_item {
 	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
 	NR_SLUB_STAT_ITEMS };
 
+/*
+ * When changing the layout, make sure freelist and tid are still compatible
+ * with this_cpu_cmpxchg_double() alignment requirements.
+ */
 struct kmem_cache_cpu {
-	local_lock_t lock;	/* Protects the fields below except stat */
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *partial;	/* Partially allocated frozen slabs */
 #endif
+	local_lock_t lock;	/* Protects the fields above */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
-- 
2.32.0



  parent reply	other threads:[~2021-08-17 10:15 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 15:19 [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 01/35] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 02/35] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 03/35] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 04/35] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 05/35] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 06/35] mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 07/35] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 08/35] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 09/35] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 10/35] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 11/35] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 12/35] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 13/35] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
2021-08-15 10:14   ` Vlastimil Babka
2021-08-15 10:22     ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 14/35] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 15/35] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 16/35] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 17/35] mm, slub: check new pages with restored irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 18/35] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 19/35] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 20/35] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 21/35] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 22/35] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 23/35] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 24/35] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 25/35] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 26/35] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 27/35] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 28/35] mm, slab: make flush_slab() possible to call with irqs enabled Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 29/35] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
2021-08-09 13:41   ` Qian Cai
2021-08-09 18:44     ` Mike Galbraith
2021-08-09 18:44       ` Mike Galbraith
2021-08-09 20:08       ` Vlastimil Babka
2021-08-09 22:13         ` Qian Cai
2021-08-10  1:07         ` Mike Galbraith
2021-08-10  1:07           ` Mike Galbraith
2021-08-10  9:03     ` Vlastimil Babka
2021-08-10 11:47       ` Mike Galbraith
2021-08-10 11:47         ` Mike Galbraith
2021-08-10 20:31         ` Paul E. McKenney
2021-08-10 22:36           ` Vlastimil Babka
2021-08-10 23:53             ` Paul E. McKenney
2021-08-11 14:17               ` Paul E. McKenney
2021-08-10 20:25       ` Paul E. McKenney
2021-08-10 14:33     ` Vlastimil Babka
2021-08-11  1:42       ` Qian Cai
2021-08-11  8:55       ` Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 30/35] mm: slub: Make object_map_lock a raw_spinlock_t Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 31/35] mm, slub: optionally save/restore irqs in slab_[un]lock()/ Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 32/35] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
2021-08-05 15:19 ` [PATCH v4 34/35] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
2021-08-05 15:20 ` [PATCH v4 35/35] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
2021-08-15 12:27   ` Sven Eckelmann
2021-08-17  8:37     ` Vlastimil Babka
2021-08-17  9:12       ` Sebastian Andrzej Siewior
2021-08-17  9:17         ` Vlastimil Babka
2021-08-17  9:31           ` Sebastian Andrzej Siewior
2021-08-17  9:31         ` Vlastimil Babka
2021-08-17  9:34           ` Sebastian Andrzej Siewior
2021-08-17  9:13     ` Vlastimil Babka
2021-08-17 10:14   ` Vlastimil Babka [this message]
2021-08-17 19:53     ` Andrew Morton
2021-08-18 11:52       ` Vlastimil Babka
2021-08-23 20:36         ` Thomas Gleixner
2021-08-17 15:39   ` Sebastian Andrzej Siewior
2021-08-17 15:41     ` Vlastimil Babka
2021-08-17 15:49       ` Sebastian Andrzej Siewior
2021-08-17 15:56   ` Vlastimil Babka
2021-08-05 16:42 ` [PATCH v4 00/35] SLUB: reduce irq disabled scope and make it RT compatible Sebastian Andrzej Siewior
2021-08-06  5:14   ` Mike Galbraith
2021-08-06  5:14     ` Mike Galbraith
2021-08-06  7:45     ` Vlastimil Babka
2021-08-10 14:36 ` Vlastimil Babka
2021-08-15 10:18   ` Vlastimil Babka
2021-08-17 10:23     ` Vlastimil Babka
2021-08-17 15:59       ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e907c2b6-6df1-8038-8c6c-aa9c1fd11259@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=cl@linux.com \
    --cc=efault@gmx.de \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.