linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible
@ 2021-09-04 10:49 Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 01/33] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
                   ` (34 more replies)
  0 siblings, 35 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg,
	Joonsoo Kim, Linus Torvalds, Stephen Rothwell
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

(Linus and Stephen only To:'d on the cover letter for the git part)

Hi,

this new version addresses feedback from Linus wrt use of conditional locking.
The flush_slab() one is fixed as was discussed, per Mike Galbraith's
suggestion [14]. More instances in the series occuring in cmpxchg_double_slab()
and slab_[un]lock() were removed too. As such, the cleanest way to do this
without churn was to rebase, drop some patches and adjust others (see changelog
below).  As there was rebase anyway, bumped the base tag from v5.14-rc6 to
v5.14 as well. The resulting differences from v5 should be only how the code is
organized, not functional. I will of course continue testing, as I hope Mel and
the RT guys would soon too.

As Linus mentioned the possibility of taking this directly from git [15] I have
created a signed tag for a later (near end of merge window?) pull request
assuming nothing blows up:
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git tags/mm-slub-5.15-rc1

I assume that would mean dropping the series from mmotm now and instead adding
the tree to linux-next? AFAIK there are no further slub patches pending anywhere
so no conflicts.

Changes since v5 [11]
* Rebased from v5.14-rc6 to v5.14
* Removed conditional locking from flush_slab(), instead the non-locking variant
  is moved to its only caller (patches 27-28)
* In the same vein, removed patches "mm, slub: unify cmpxchg_double_slab() and
  __cmpxchg_double_slab()" [12] and "mm, slub: optionally save/restore irqs in
  slab_[un]lock()" [13] that introduced conditional locking elsewhere. Patch 30
  "mm, slub: make slab_lock() disable irqs with PREEMPT_RT" adjusted
  accordingly

Links to rarlier versions (changelog trimmed):
v4 [10], v3 [8], v2 [5],  RFC v1 [1]:

This series was initially inspired by Mel's pcplist local_lock rewrite, and
also interest to better understand SLUB's locking and the new primitives and RT
variants and implications. It makes SLUB compatible with PREEMPT_RT and
generally more preemption-friendly, apparently without significant regressions,
as the fast paths are not affected.

The main changes to SLUB by this series:

* irq disabling is now only done for minimum amount of time needed to protect
  the strict kmem_cache_cpu fields, and as part of spin lock, local lock and
  bit lock operations to make them irq-safe

* SLUB is fully PREEMPT_RT compatible

Series is based on 5.14 and also available as a git branch:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v6r2

And signed tag:
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git tags/mm-slub-5.15-rc1

The series should now be sufficiently tested in both RT and !RT configs, mainly
thanks to Mike.

The RFC/v1 version also got basic performance screening by Mel that didn't show
major regressions. Mike's testing with hackbench of v2 on !RT reported
negligible differences [6]:

virgin(ish) tip
5.13.0.g60ab3ed-tip
          7,320.67 msec task-clock                #    7.792 CPUs utilized            ( +-  0.31% )
           221,215      context-switches          #    0.030 M/sec                    ( +-  3.97% )
            16,234      cpu-migrations            #    0.002 M/sec                    ( +-  4.07% )
            13,233      page-faults               #    0.002 M/sec                    ( +-  0.91% )
    27,592,205,252      cycles                    #    3.769 GHz                      ( +-  0.32% )
     8,309,495,040      instructions              #    0.30  insn per cycle           ( +-  0.37% )
     1,555,210,607      branches                  #  212.441 M/sec                    ( +-  0.42% )
         5,484,209      branch-misses             #    0.35% of all branches          ( +-  2.13% )

           0.93949 +- 0.00423 seconds time elapsed  ( +-  0.45% )
           0.94608 +- 0.00384 seconds time elapsed  ( +-  0.41% ) (repeat)
           0.94422 +- 0.00410 seconds time elapsed  ( +-  0.43% )

5.13.0.g60ab3ed-tip +slub-local-lock-v2r3
          7,343.57 msec task-clock                #    7.776 CPUs utilized            ( +-  0.44% )
           223,044      context-switches          #    0.030 M/sec                    ( +-  3.02% )
            16,057      cpu-migrations            #    0.002 M/sec                    ( +-  4.03% )
            13,164      page-faults               #    0.002 M/sec                    ( +-  0.97% )
    27,684,906,017      cycles                    #    3.770 GHz                      ( +-  0.45% )
     8,323,273,871      instructions              #    0.30  insn per cycle           ( +-  0.28% )
     1,556,106,680      branches                  #  211.901 M/sec                    ( +-  0.31% )
         5,463,468      branch-misses             #    0.35% of all branches          ( +-  1.33% )

           0.94440 +- 0.00352 seconds time elapsed  ( +-  0.37% )
           0.94830 +- 0.00228 seconds time elapsed  ( +-  0.24% ) (repeat)
           0.93813 +- 0.00440 seconds time elapsed  ( +-  0.47% ) (repeat)

RT configs showed some throughput regressions, but that's expected tradeoff for
the preemption improvements through the RT mutex. It didn't prevent the v2 to
be incorporated to the 5.13 RT tree [7], leading to testing exposure and
bugfixes.

Before the series, SLUB is lockless in both allocation and free fast paths, but
elsewhere, it's disabling irqs for considerable periods of time - especially in
allocation slowpath and the bulk allocation, where IRQs are re-enabled only
when a new page from the page allocator is needed, and the context allows
blocking. The irq disabled sections can then include deactivate_slab() which
walks a full freelist and frees the slab back to page allocator or
unfreeze_partials() going through a list of percpu partial slabs. The RT tree
currently has some patches mitigating these, but we can do much better in
mainline too.

Patches 1-6 are straightforward improvements or cleanups that could exist
outside of this series too, but are prerequsities.

Patches 7-9 are also preparatory code changes without functional changes, but
not so useful without the rest of the series.

Patch 10 simplifies the fast paths on systems with preemption, based on
(hopefully correct) observation that the current loops to verify tid are
unnecessary.

Patches 11-20 focus on reducing irq disabled scope in the allocation slowpath.

Patch 11 moves disabling of irqs into ___slab_alloc() from its callers, which
are the allocation slowpath, and bulk allocation. Instead these callers only
disable preemption to stabilize the cpu. The following patches then gradually
reduce the scope of disabled irqs in ___slab_alloc() and the functions called
from there. As of patch 14, the re-enabling of irqs based on gfp flags before
calling the page allocator is removed from allocate_slab(). As of patch 17,
it's possible to reach the page allocator (in case of existing slabs depleted)
without disabling and re-enabling irqs a single time.

Pathces 21-26 reduce the scope of disabled irqs in functions related to
unfreezing percpu partial slab.

Patch 27 is preparatory. Patch 28 is adopted from the RT tree and converts the
flushing of percpu slabs on all cpus from using IPI to workqueue, so that the
processing isn't happening with irqs disabled in the IPI handler. The flushing
is not performance critical so it should be acceptable.

Patch 29 also comes from RT tree and makes object_map_lock RT compatible.

Patch 30 make slab_lock irq-safe on RT where we cannot rely on having
irq disabled from the list_lock spin lock usage.

Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial() from
cmpxchg loop to a short irq disabled section, which is used by all other code
modifying the field. This addresses a theoretical race scenario pointed out by
Jann, and makes the critical section safe wrt with RT local_lock semantics
after the conversion in patch 35.

Patch 32 changes preempt disable to migrate disable, so that the nested
list_lock spinlock is safe to take on RT. Because migrate_disable() is a
function call even on !RT, a small set of private wrappers is introduced
to keep using the cheaper preempt_disable() on !PREEMPT_RT configurations.
As of this patch, SLUB should be already compatible with RT's lock semantics.

Finally, patch 33 changes irq disabled sections that protect kmem_cache_cpu
fields in the slow paths, with a local lock. However on PREEMPT_RT it means the
lockless fast paths can now preempt slow paths which don't expect that, so the
local lock has to be taken also in the fast paths and they are no longer
lockless. RT folks seem to not mind this tradeoff. The patch also updates the
locking documentation in the file's comment.

[1] https://lore.kernel.org/lkml/20210524233946.20352-1-vbabka@suse.cz/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0001-mm-sl-au-b-Change-list_lock-to-raw_spinlock_t.patch?h=linux-5.12.y-rt-patches
[3] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0004-mm-slub-Move-discard_slab-invocations-out-of-IRQ-off.patch?h=linux-5.12.y-rt-patches
[4] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0005-mm-slub-Move-flush_cpu_slab-invocations-__free_slab-.patch?h=linux-5.12.y-rt-patches
[5] https://lore.kernel.org/lkml/20210609113903.1421-1-vbabka@suse.cz/
[6] https://lore.kernel.org/lkml/891dc24e38106f8542f4c72831d52dc1a1863ae8.camel@gmx.de
[7] https://lore.kernel.org/linux-rt-users/87tul5p2fa.ffs@nanos.tec.linutronix.de/
[8] https://lore.kernel.org/lkml/20210729132132.19691-1-vbabka@suse.cz/
[9] https://lore.kernel.org/lkml/20210804120522.GD6464@techsingularity.net/
[10] https://lore.kernel.org/lkml/20210805152000.12817-1-vbabka@suse.cz/
[11] https://lore.kernel.org/all/20210823145826.3857-1-vbabka@suse.cz/
[12] https://lore.kernel.org/all/20210823145826.3857-7-vbabka@suse.cz/
[13] https://lore.kernel.org/all/20210823145826.3857-32-vbabka@suse.cz/
[14] https://lore.kernel.org/linux-mm/1ae902f7-c500-f9e8-1b4f-077beade0f42@suse.cz/
[15] https://lore.kernel.org/linux-mm/CAHk-=wjRfFtnQ5p42s_5Uv8i0U5YKSBpTH++_ZMKZyyvYicYmQ@mail.gmail.com/
[16] https://lore.kernel.org/all/871r6j526m.ffs@tglx/

Sebastian Andrzej Siewior (2):
  mm: slub: move flush_cpu_slab() invocations __free_slab() invocations
    out of IRQ context
  mm: slub: make object_map_lock a raw_spinlock_t

Vlastimil Babka (31):
  mm, slub: don't call flush_all() from slab_debug_trace_open()
  mm, slub: allocate private object map for debugfs listings
  mm, slub: allocate private object map for validate_slab_cache()
  mm, slub: don't disable irq for debug_check_no_locks_freed()
  mm, slub: remove redundant unfreeze_partials() from put_cpu_partial()
  mm, slub: extract get_partial() from new_slab_objects()
  mm, slub: dissolve new_slab_objects() into ___slab_alloc()
  mm, slub: return slab page from get_partial() and set c->page
    afterwards
  mm, slub: restructure new page checks in ___slab_alloc()
  mm, slub: simplify kmem_cache_cpu and tid setup
  mm, slub: move disabling/enabling irqs to ___slab_alloc()
  mm, slub: do initial checks in ___slab_alloc() with irqs enabled
  mm, slub: move disabling irqs closer to get_partial() in
    ___slab_alloc()
  mm, slub: restore irqs around calling new_slab()
  mm, slub: validate slab from partial list or page allocator before
    making it cpu slab
  mm, slub: check new pages with restored irqs
  mm, slub: stop disabling irqs around get_partial()
  mm, slub: move reset of c->page and freelist out of deactivate_slab()
  mm, slub: make locking in deactivate_slab() irq-safe
  mm, slub: call deactivate_slab() without disabling irqs
  mm, slub: move irq control into unfreeze_partials()
  mm, slub: discard slabs in unfreeze_partials() without irqs disabled
  mm, slub: detach whole partial list at once in unfreeze_partials()
  mm, slub: separate detaching of partial list in unfreeze_partials()
    from unfreezing
  mm, slub: only disable irq with spin_lock in __unfreeze_partials()
  mm, slub: don't disable irqs in slub_cpu_dead()
  mm, slab: split out the cpu offline variant of flush_slab()
  mm, slub: make slab_lock() disable irqs with PREEMPT_RT
  mm, slub: protect put_cpu_partial() with disabled irqs instead of
    cmpxchg
  mm, slub: use migrate_disable() on PREEMPT_RT
  mm, slub: convert kmem_cpu_slab protection to local_lock

 include/linux/page-flags.h |   9 +
 include/linux/slub_def.h   |   6 +
 mm/slab_common.c           |   2 +
 mm/slub.c                  | 797 +++++++++++++++++++++++++------------
 4 files changed, 563 insertions(+), 251 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 01/33] mm, slub: don't call flush_all() from slab_debug_trace_open()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 02/33] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

slab_debug_trace_open() can only be called on caches with SLAB_STORE_USER flag
and as with all slub debugging flags, such caches avoid cpu or percpu partial
slabs altogether, so there's nothing to flush.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/slub.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f77d8cd79ef7..f6063ec97a55 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5825,9 +5825,6 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
 	if (!alloc_loc_track(t, PAGE_SIZE / sizeof(struct location), GFP_KERNEL))
 		return -ENOMEM;
 
-	/* Push back cpu slabs */
-	flush_all(s);
-
 	for_each_kmem_cache_node(s, node, n) {
 		unsigned long flags;
 		struct page *page;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 02/33] mm, slub: allocate private object map for debugfs listings
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 01/33] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 03/33] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Slub has a static spinlock protected bitmap for marking which objects are on
freelist when it wants to list them, for situations where dynamically
allocating such map can lead to recursion or locking issues, and on-stack
bitmap would be too large.

The handlers of debugfs files alloc_traces and free_traces also currently use this
shared bitmap, but their syscall context makes it straightforward to allocate a
private map before entering locked sections, so switch these processing paths
to use a private bitmap.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/slub.c | 44 +++++++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f6063ec97a55..fb603fdf58cb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -454,6 +454,18 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
 static DEFINE_SPINLOCK(object_map_lock);
 
+static void __fill_map(unsigned long *obj_map, struct kmem_cache *s,
+		       struct page *page)
+{
+	void *addr = page_address(page);
+	void *p;
+
+	bitmap_zero(obj_map, page->objects);
+
+	for (p = page->freelist; p; p = get_freepointer(s, p))
+		set_bit(__obj_to_index(s, addr, p), obj_map);
+}
+
 #if IS_ENABLED(CONFIG_KUNIT)
 static bool slab_add_kunit_errors(void)
 {
@@ -483,17 +495,11 @@ static inline bool slab_add_kunit_errors(void) { return false; }
 static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 	__acquires(&object_map_lock)
 {
-	void *p;
-	void *addr = page_address(page);
-
 	VM_BUG_ON(!irqs_disabled());
 
 	spin_lock(&object_map_lock);
 
-	bitmap_zero(object_map, page->objects);
-
-	for (p = page->freelist; p; p = get_freepointer(s, p))
-		set_bit(__obj_to_index(s, addr, p), object_map);
+	__fill_map(object_map, s, page);
 
 	return object_map;
 }
@@ -4879,17 +4885,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 }
 
 static void process_slab(struct loc_track *t, struct kmem_cache *s,
-		struct page *page, enum track_item alloc)
+		struct page *page, enum track_item alloc,
+		unsigned long *obj_map)
 {
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map;
 
-	map = get_map(s, page);
+	__fill_map(obj_map, s, page);
+
 	for_each_object(p, s, addr, page->objects)
-		if (!test_bit(__obj_to_index(s, addr, p), map))
+		if (!test_bit(__obj_to_index(s, addr, p), obj_map))
 			add_location(t, s, get_track(s, p, alloc));
-	put_map(map);
 }
 #endif  /* CONFIG_DEBUG_FS   */
 #endif	/* CONFIG_SLUB_DEBUG */
@@ -5816,14 +5822,21 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
 	struct loc_track *t = __seq_open_private(filep, &slab_debugfs_sops,
 						sizeof(struct loc_track));
 	struct kmem_cache *s = file_inode(filep)->i_private;
+	unsigned long *obj_map;
+
+	obj_map = bitmap_alloc(oo_objects(s->oo), GFP_KERNEL);
+	if (!obj_map)
+		return -ENOMEM;
 
 	if (strcmp(filep->f_path.dentry->d_name.name, "alloc_traces") == 0)
 		alloc = TRACK_ALLOC;
 	else
 		alloc = TRACK_FREE;
 
-	if (!alloc_loc_track(t, PAGE_SIZE / sizeof(struct location), GFP_KERNEL))
+	if (!alloc_loc_track(t, PAGE_SIZE / sizeof(struct location), GFP_KERNEL)) {
+		bitmap_free(obj_map);
 		return -ENOMEM;
+	}
 
 	for_each_kmem_cache_node(s, node, n) {
 		unsigned long flags;
@@ -5834,12 +5847,13 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
 
 		spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, slab_list)
-			process_slab(t, s, page, alloc);
+			process_slab(t, s, page, alloc, obj_map);
 		list_for_each_entry(page, &n->full, slab_list)
-			process_slab(t, s, page, alloc);
+			process_slab(t, s, page, alloc, obj_map);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
+	bitmap_free(obj_map);
 	return 0;
 }
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 03/33] mm, slub: allocate private object map for validate_slab_cache()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 01/33] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 02/33] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 04/33] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

validate_slab_cache() is called either to handle a sysfs write, or from a
self-test context. In both situations it's straightforward to preallocate a
private object bitmap instead of grabbing the shared static one meant for
critical sections, so let's do that.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/slub.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index fb603fdf58cb..4697280130f2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4679,11 +4679,11 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page)
+static void validate_slab(struct kmem_cache *s, struct page *page,
+			  unsigned long *obj_map)
 {
 	void *p;
 	void *addr = page_address(page);
-	unsigned long *map;
 
 	slab_lock(page);
 
@@ -4691,21 +4691,20 @@ static void validate_slab(struct kmem_cache *s, struct page *page)
 		goto unlock;
 
 	/* Now we know that a valid freelist exists */
-	map = get_map(s, page);
+	__fill_map(obj_map, s, page);
 	for_each_object(p, s, addr, page->objects) {
-		u8 val = test_bit(__obj_to_index(s, addr, p), map) ?
+		u8 val = test_bit(__obj_to_index(s, addr, p), obj_map) ?
 			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
 
 		if (!check_object(s, page, p, val))
 			break;
 	}
-	put_map(map);
 unlock:
 	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-		struct kmem_cache_node *n)
+		struct kmem_cache_node *n, unsigned long *obj_map)
 {
 	unsigned long count = 0;
 	struct page *page;
@@ -4714,7 +4713,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, slab_list) {
-		validate_slab(s, page);
+		validate_slab(s, page, obj_map);
 		count++;
 	}
 	if (count != n->nr_partial) {
@@ -4727,7 +4726,7 @@ static int validate_slab_node(struct kmem_cache *s,
 		goto out;
 
 	list_for_each_entry(page, &n->full, slab_list) {
-		validate_slab(s, page);
+		validate_slab(s, page, obj_map);
 		count++;
 	}
 	if (count != atomic_long_read(&n->nr_slabs)) {
@@ -4746,10 +4745,17 @@ long validate_slab_cache(struct kmem_cache *s)
 	int node;
 	unsigned long count = 0;
 	struct kmem_cache_node *n;
+	unsigned long *obj_map;
+
+	obj_map = bitmap_alloc(oo_objects(s->oo), GFP_KERNEL);
+	if (!obj_map)
+		return -ENOMEM;
 
 	flush_all(s);
 	for_each_kmem_cache_node(s, node, n)
-		count += validate_slab_node(s, n);
+		count += validate_slab_node(s, n, obj_map);
+
+	bitmap_free(obj_map);
 
 	return count;
 }
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 04/33] mm, slub: don't disable irq for debug_check_no_locks_freed()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (2 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 03/33] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 05/33] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

In slab_free_hook() we disable irqs around the debug_check_no_locks_freed()
call, which is unnecessary, as irqs are already being disabled inside the call.
This seems to be leftover from the past where there were more calls inside the
irq disabled sections. Remove the irq disable/enable operations.

Mel noted:
> Looks like it was needed for kmemcheck which went away back in 4.15

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/slub.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4697280130f2..fee093db2bfd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1591,20 +1591,8 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s,
 {
 	kmemleak_free_recursive(x, s->flags);
 
-	/*
-	 * Trouble is that we may no longer disable interrupts in the fast path
-	 * So in order to make the debug calls that expect irqs to be
-	 * disabled we need to disable interrupts temporarily.
-	 */
-#ifdef CONFIG_LOCKDEP
-	{
-		unsigned long flags;
+	debug_check_no_locks_freed(x, s->object_size);
 
-		local_irq_save(flags);
-		debug_check_no_locks_freed(x, s->object_size);
-		local_irq_restore(flags);
-	}
-#endif
 	if (!(s->flags & SLAB_DEBUG_OBJECTS))
 		debug_check_no_obj_freed(x, s->object_size);
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 05/33] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (3 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 04/33] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 06/33] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Commit d6e0b7fa1186 ("slub: make dead caches discard free slabs immediately")
introduced cpu partial flushing for kmemcg caches, based on setting the target
cpu_partial to 0 and adding a flushing check in put_cpu_partial().
This code that sets cpu_partial to 0 was later moved by c9fc586403e7 ("slab:
introduce __kmemcg_cache_deactivate()") and ultimately removed by 9855609bde03
("mm: memcg/slab: use a single set of kmem_caches for all accounted
allocations"). However the check and flush in put_cpu_partial() was never
removed, although it's effectively a dead code. So this patch removes it.

Note that d6e0b7fa1186 also added preempt_disable()/enable() to
unfreeze_partials() which could be thus also considered unnecessary. But
further patches will rely on it, so keep it.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index fee093db2bfd..79e53303844c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2466,13 +2466,6 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 
 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
 								!= oldpage);
-	if (unlikely(!slub_cpu_partial(s))) {
-		unsigned long flags;
-
-		local_irq_save(flags);
-		unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
-		local_irq_restore(flags);
-	}
 	preempt_enable();
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 06/33] mm, slub: extract get_partial() from new_slab_objects()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (4 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 05/33] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 07/33] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

The later patches will need more fine grained control over individual actions
in ___slab_alloc(), the only caller of new_slab_objects(), so this is a first
preparatory step with no functional change.

This adds a goto label that appears unnecessary at this point, but will be
useful for later changes.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/slub.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 79e53303844c..cd6aeeec4417 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2613,17 +2613,12 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
 static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
 			int node, struct kmem_cache_cpu **pc)
 {
-	void *freelist;
+	void *freelist = NULL;
 	struct kmem_cache_cpu *c = *pc;
 	struct page *page;
 
 	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
 
-	freelist = get_partial(s, flags, node, c);
-
-	if (freelist)
-		return freelist;
-
 	page = new_slab(s, flags, node);
 	if (page) {
 		c = raw_cpu_ptr(s->cpu_slab);
@@ -2787,6 +2782,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto redo;
 	}
 
+	freelist = get_partial(s, gfpflags, node, c);
+	if (freelist)
+		goto check_new_page;
+
 	freelist = new_slab_objects(s, gfpflags, node, &c);
 
 	if (unlikely(!freelist)) {
@@ -2794,6 +2793,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		return NULL;
 	}
 
+check_new_page:
 	page = c->page;
 	if (likely(!kmem_cache_debug(s) && pfmemalloc_match(page, gfpflags)))
 		goto load_freelist;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 07/33] mm, slub: dissolve new_slab_objects() into ___slab_alloc()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (5 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 06/33] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 08/33] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

The later patches will need more fine grained control over individual actions
in ___slab_alloc(), the only caller of new_slab_objects(), so dissolve it
there. This is a preparatory step with no functional change.

The only minor change is moving WARN_ON_ONCE() for using a constructor together
with __GFP_ZERO to new_slab(), which makes it somewhat less frequent, but still
able to catch a development change introducing a systematic misuse.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/slub.c | 50 ++++++++++++++++++--------------------------------
 1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index cd6aeeec4417..0c645b0e96d9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1885,6 +1885,8 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (unlikely(flags & GFP_SLAB_BUG_MASK))
 		flags = kmalloc_fix_flags(flags);
 
+	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
+
 	return allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
 }
@@ -2610,36 +2612,6 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
 #endif
 }
 
-static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
-			int node, struct kmem_cache_cpu **pc)
-{
-	void *freelist = NULL;
-	struct kmem_cache_cpu *c = *pc;
-	struct page *page;
-
-	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
-
-	page = new_slab(s, flags, node);
-	if (page) {
-		c = raw_cpu_ptr(s->cpu_slab);
-		if (c->page)
-			flush_slab(s, c);
-
-		/*
-		 * No other reference to the page yet so we can
-		 * muck around with it freely without cmpxchg
-		 */
-		freelist = page->freelist;
-		page->freelist = NULL;
-
-		stat(s, ALLOC_SLAB);
-		c->page = page;
-		*pc = c;
-	}
-
-	return freelist;
-}
-
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags)
 {
 	if (unlikely(PageSlabPfmemalloc(page)))
@@ -2786,13 +2758,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	if (freelist)
 		goto check_new_page;
 
-	freelist = new_slab_objects(s, gfpflags, node, &c);
+	page = new_slab(s, gfpflags, node);
 
-	if (unlikely(!freelist)) {
+	if (unlikely(!page)) {
 		slab_out_of_memory(s, gfpflags, node);
 		return NULL;
 	}
 
+	c = raw_cpu_ptr(s->cpu_slab);
+	if (c->page)
+		flush_slab(s, c);
+
+	/*
+	 * No other reference to the page yet so we can
+	 * muck around with it freely without cmpxchg
+	 */
+	freelist = page->freelist;
+	page->freelist = NULL;
+
+	stat(s, ALLOC_SLAB);
+	c->page = page;
+
 check_new_page:
 	page = c->page;
 	if (likely(!kmem_cache_debug(s) && pfmemalloc_match(page, gfpflags)))
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 08/33] mm, slub: return slab page from get_partial() and set c->page afterwards
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (6 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 07/33] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 09/33] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

The function get_partial() finds a suitable page on a partial list, acquires
and returns its freelist and assigns the page pointer to kmem_cache_cpu.
In later patch we will need more control over the kmem_cache_cpu.page
assignment, so instead of passing a kmem_cache_cpu pointer, pass a pointer to a
pointer to a page that get_partial() can fill and the caller can assign the
kmem_cache_cpu.page pointer. No functional change as all of this still happens
with disabled IRQs.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0c645b0e96d9..e9d582eee7d7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2017,7 +2017,7 @@ static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
  * Try to allocate a partial slab from a specific node.
  */
 static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
-				struct kmem_cache_cpu *c, gfp_t flags)
+			      struct page **ret_page, gfp_t flags)
 {
 	struct page *page, *page2;
 	void *object = NULL;
@@ -2046,7 +2046,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 
 		available += objects;
 		if (!object) {
-			c->page = page;
+			*ret_page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
 		} else {
@@ -2066,7 +2066,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
  * Get a page from somewhere. Search in increasing NUMA distances.
  */
 static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
-		struct kmem_cache_cpu *c)
+			     struct page **ret_page)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
@@ -2108,7 +2108,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 
 			if (n && cpuset_zone_allowed(zone, flags) &&
 					n->nr_partial > s->min_partial) {
-				object = get_partial_node(s, n, c, flags);
+				object = get_partial_node(s, n, ret_page, flags);
 				if (object) {
 					/*
 					 * Don't check read_mems_allowed_retry()
@@ -2130,7 +2130,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
  * Get a partial page, lock it and return it.
  */
 static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
-		struct kmem_cache_cpu *c)
+			 struct page **ret_page)
 {
 	void *object;
 	int searchnode = node;
@@ -2138,11 +2138,11 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
 
-	object = get_partial_node(s, get_node(s, searchnode), c, flags);
+	object = get_partial_node(s, get_node(s, searchnode), ret_page, flags);
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
-	return get_any_partial(s, flags, c);
+	return get_any_partial(s, flags, ret_page);
 }
 
 #ifdef CONFIG_PREEMPTION
@@ -2754,9 +2754,11 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto redo;
 	}
 
-	freelist = get_partial(s, gfpflags, node, c);
-	if (freelist)
+	freelist = get_partial(s, gfpflags, node, &page);
+	if (freelist) {
+		c->page = page;
 		goto check_new_page;
+	}
 
 	page = new_slab(s, gfpflags, node);
 
@@ -2780,7 +2782,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	c->page = page;
 
 check_new_page:
-	page = c->page;
 	if (likely(!kmem_cache_debug(s) && pfmemalloc_match(page, gfpflags)))
 		goto load_freelist;
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 09/33] mm, slub: restructure new page checks in ___slab_alloc()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (7 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 08/33] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 10/33] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

When we allocate slab object from a newly acquired page (from node's partial
list or page allocator), we usually also retain the page as a new percpu slab.
There are two exceptions - when pfmemalloc status of the page doesn't match our
gfp flags, or when the cache has debugging enabled.

The current code for these decisions is not easy to follow, so restructure it
and add comments. The new structure will also help with the following changes.
No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/slub.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index e9d582eee7d7..9607ce37e661 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2782,13 +2782,29 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	c->page = page;
 
 check_new_page:
-	if (likely(!kmem_cache_debug(s) && pfmemalloc_match(page, gfpflags)))
-		goto load_freelist;
 
-	/* Only entered in the debug case */
-	if (kmem_cache_debug(s) &&
-			!alloc_debug_processing(s, page, freelist, addr))
-		goto new_slab;	/* Slab failed checks. Next slab needed */
+	if (kmem_cache_debug(s)) {
+		if (!alloc_debug_processing(s, page, freelist, addr))
+			/* Slab failed checks. Next slab needed */
+			goto new_slab;
+		else
+			/*
+			 * For debug case, we don't load freelist so that all
+			 * allocations go through alloc_debug_processing()
+			 */
+			goto return_single;
+	}
+
+	if (unlikely(!pfmemalloc_match(page, gfpflags)))
+		/*
+		 * For !pfmemalloc_match() case we don't load freelist so that
+		 * we don't make further mismatched allocations easier.
+		 */
+		goto return_single;
+
+	goto load_freelist;
+
+return_single:
 
 	deactivate_slab(s, page, get_freepointer(s, freelist), c);
 	return freelist;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 10/33] mm, slub: simplify kmem_cache_cpu and tid setup
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (8 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 09/33] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 11/33] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

In slab_alloc_node() and do_slab_free() fastpaths we need to guarantee that
our kmem_cache_cpu pointer is from the same cpu as the tid value. Currently
that's done by reading the tid first using this_cpu_read(), then the
kmem_cache_cpu pointer and verifying we read the same tid using the pointer and
plain READ_ONCE().

This can be simplified to just fetching kmem_cache_cpu pointer and then reading
tid using the pointer. That guarantees they are from the same cpu. We don't
need to read the tid using this_cpu_read() because the value will be validated
by this_cpu_cmpxchg_double(), making sure we are on the correct cpu and the
freelist didn't change by anyone preempting us since reading the tid.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/slub.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9607ce37e661..c0dc5968223c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2882,15 +2882,14 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 	 * reading from one cpu area. That does not matter as long
 	 * as we end up on the original cpu again when doing the cmpxchg.
 	 *
-	 * We should guarantee that tid and kmem_cache are retrieved on
-	 * the same cpu. It could be different if CONFIG_PREEMPTION so we need
-	 * to check if it is matched or not.
+	 * We must guarantee that tid and kmem_cache_cpu are retrieved on the
+	 * same cpu. We read first the kmem_cache_cpu pointer and use it to read
+	 * the tid. If we are preempted and switched to another cpu between the
+	 * two reads, it's OK as the two are still associated with the same cpu
+	 * and cmpxchg later will validate the cpu.
 	 */
-	do {
-		tid = this_cpu_read(s->cpu_slab->tid);
-		c = raw_cpu_ptr(s->cpu_slab);
-	} while (IS_ENABLED(CONFIG_PREEMPTION) &&
-		 unlikely(tid != READ_ONCE(c->tid)));
+	c = raw_cpu_ptr(s->cpu_slab);
+	tid = READ_ONCE(c->tid);
 
 	/*
 	 * Irqless object alloc/free algorithm used here depends on sequence
@@ -3164,11 +3163,8 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
 	 * data is retrieved via this pointer. If we are on the same cpu
 	 * during the cmpxchg then the free will succeed.
 	 */
-	do {
-		tid = this_cpu_read(s->cpu_slab->tid);
-		c = raw_cpu_ptr(s->cpu_slab);
-	} while (IS_ENABLED(CONFIG_PREEMPTION) &&
-		 unlikely(tid != READ_ONCE(c->tid)));
+	c = raw_cpu_ptr(s->cpu_slab);
+	tid = READ_ONCE(c->tid);
 
 	/* Same with comment on barrier() in slab_alloc_node() */
 	barrier();
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 11/33] mm, slub: move disabling/enabling irqs to ___slab_alloc()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (9 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 10/33] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 12/33] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Currently __slab_alloc() disables irqs around the whole ___slab_alloc().  This
includes cases where this is not needed, such as when the allocation ends up in
the page allocator and has to awkwardly enable irqs back based on gfp flags.
Also the whole kmem_cache_alloc_bulk() is executed with irqs disabled even when
it hits the __slab_alloc() slow path, and long periods with disabled interrupts
are undesirable.

As a first step towards reducing irq disabled periods, move irq handling into
___slab_alloc(). Callers will instead prevent the s->cpu_slab percpu pointer
from becoming invalid via get_cpu_ptr(), thus preempt_disable(). This does not
protect against modification by an irq handler, which is still done by disabled
irq for most of ___slab_alloc(). As a small immediate benefit,
slab_out_of_memory() from ___slab_alloc() is now called with irqs enabled.

kmem_cache_alloc_bulk() disables irqs for its fastpath and then re-enables them
before calling ___slab_alloc(), which then disables them at its discretion. The
whole kmem_cache_alloc_bulk() operation also disables preemption.

When  ___slab_alloc() calls new_slab() to allocate a new page, re-enable
preemption, because new_slab() will re-enable interrupts in contexts that allow
blocking (this will be improved by later patches).

The patch itself will thus increase overhead a bit due to disabled preemption
(on configs where it matters) and increased disabling/enabling irqs in
kmem_cache_alloc_bulk(), but that will be gradually improved in the following
patches.

Note in __slab_alloc() we need to change the #ifdef CONFIG_PREEMPT guard to
CONFIG_PREEMPT_COUNT to make sure preempt disable/enable is properly paired in
all configurations. On configs without involuntary preemption and debugging
the re-read of kmem_cache_cpu pointer is still compiled out as it was before.

[ Mike Galbraith <efault@gmx.de>: Fix kmem_cache_alloc_bulk() error path ]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c0dc5968223c..dda05cc83eef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2670,7 +2670,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
  * we need to allocate a new slab. This is the slowest path since it involves
  * a call to the page allocator and the setup of a new slab.
  *
- * Version of __slab_alloc to use when we know that interrupts are
+ * Version of __slab_alloc to use when we know that preemption is
  * already disabled (which is the case for bulk allocation).
  */
 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
@@ -2678,9 +2678,11 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 {
 	void *freelist;
 	struct page *page;
+	unsigned long flags;
 
 	stat(s, ALLOC_SLOWPATH);
 
+	local_irq_save(flags);
 	page = c->page;
 	if (!page) {
 		/*
@@ -2743,6 +2745,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	VM_BUG_ON(!c->page->frozen);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
+	local_irq_restore(flags);
 	return freelist;
 
 new_slab:
@@ -2760,14 +2763,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto check_new_page;
 	}
 
+	put_cpu_ptr(s->cpu_slab);
 	page = new_slab(s, gfpflags, node);
+	c = get_cpu_ptr(s->cpu_slab);
 
 	if (unlikely(!page)) {
+		local_irq_restore(flags);
 		slab_out_of_memory(s, gfpflags, node);
 		return NULL;
 	}
 
-	c = raw_cpu_ptr(s->cpu_slab);
 	if (c->page)
 		flush_slab(s, c);
 
@@ -2807,31 +2812,33 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 return_single:
 
 	deactivate_slab(s, page, get_freepointer(s, freelist), c);
+	local_irq_restore(flags);
 	return freelist;
 }
 
 /*
- * Another one that disabled interrupt and compensates for possible
- * cpu changes by refetching the per cpu area pointer.
+ * A wrapper for ___slab_alloc() for contexts where preemption is not yet
+ * disabled. Compensates for possible cpu changes by refetching the per cpu area
+ * pointer.
  */
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
 	void *p;
-	unsigned long flags;
 
-	local_irq_save(flags);
-#ifdef CONFIG_PREEMPTION
+#ifdef CONFIG_PREEMPT_COUNT
 	/*
 	 * We may have been preempted and rescheduled on a different
-	 * cpu before disabling interrupts. Need to reload cpu area
+	 * cpu before disabling preemption. Need to reload cpu area
 	 * pointer.
 	 */
-	c = this_cpu_ptr(s->cpu_slab);
+	c = get_cpu_ptr(s->cpu_slab);
 #endif
 
 	p = ___slab_alloc(s, gfpflags, node, addr, c);
-	local_irq_restore(flags);
+#ifdef CONFIG_PREEMPT_COUNT
+	put_cpu_ptr(s->cpu_slab);
+#endif
 	return p;
 }
 
@@ -3359,8 +3366,8 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * IRQs, which protects against PREEMPT and interrupts
 	 * handlers invoking normal fastpath.
 	 */
+	c = get_cpu_ptr(s->cpu_slab);
 	local_irq_disable();
-	c = this_cpu_ptr(s->cpu_slab);
 
 	for (i = 0; i < size; i++) {
 		void *object = kfence_alloc(s, s->object_size, flags);
@@ -3381,6 +3388,8 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			 */
 			c->tid = next_tid(c->tid);
 
+			local_irq_enable();
+
 			/*
 			 * Invoking slow path likely have side-effect
 			 * of re-populating per CPU c->freelist
@@ -3393,6 +3402,8 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			c = this_cpu_ptr(s->cpu_slab);
 			maybe_wipe_obj_freeptr(s, p[i]);
 
+			local_irq_disable();
+
 			continue; /* goto for-loop */
 		}
 		c->freelist = get_freepointer(s, object);
@@ -3401,6 +3412,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	}
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
+	put_cpu_ptr(s->cpu_slab);
 
 	/*
 	 * memcg and kmem_cache debug support and memory initialization.
@@ -3410,7 +3422,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 				slab_want_init_on_alloc(flags, s));
 	return i;
 error:
-	local_irq_enable();
+	put_cpu_ptr(s->cpu_slab);
 	slab_post_alloc_hook(s, objcg, flags, i, p, false);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 12/33] mm, slub: do initial checks in ___slab_alloc() with irqs enabled
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (10 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 11/33] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 13/33] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

As another step of shortening irq disabled sections in ___slab_alloc(), delay
disabling irqs until we pass the initial checks if there is a cached percpu
slab and it's suitable for our allocation.

Now we have to recheck c->page after actually disabling irqs as an allocation
in irq handler might have replaced it.

Because we call pfmemalloc_match() as one of the checks, we might hit
VM_BUG_ON_PAGE(!PageSlab(page)) in PageSlabPfmemalloc in case we get
interrupted and the page is freed. Thus introduce a pfmemalloc_match_unsafe()
variant that lacks the PageSlab check.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/page-flags.h |  9 +++++++
 mm/slub.c                  | 54 +++++++++++++++++++++++++++++++-------
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5922031ffab6..7fda4fb85bdc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -815,6 +815,15 @@ static inline int PageSlabPfmemalloc(struct page *page)
 	return PageActive(page);
 }
 
+/*
+ * A version of PageSlabPfmemalloc() for opportunistic checks where the page
+ * might have been freed under us and not be a PageSlab anymore.
+ */
+static inline int __PageSlabPfmemalloc(struct page *page)
+{
+	return PageActive(page);
+}
+
 static inline void SetPageSlabPfmemalloc(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageSlab(page), page);
diff --git a/mm/slub.c b/mm/slub.c
index dda05cc83eef..6295695d8515 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2620,6 +2620,19 @@ static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags)
 	return true;
 }
 
+/*
+ * A variant of pfmemalloc_match() that tests page flags without asserting
+ * PageSlab. Intended for opportunistic checks before taking a lock and
+ * rechecking that nobody else freed the page under us.
+ */
+static inline bool pfmemalloc_match_unsafe(struct page *page, gfp_t gfpflags)
+{
+	if (unlikely(__PageSlabPfmemalloc(page)))
+		return gfp_pfmemalloc_allowed(gfpflags);
+
+	return true;
+}
+
 /*
  * Check the page->freelist of a page and either transfer the freelist to the
  * per cpu freelist or deactivate the page.
@@ -2682,8 +2695,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 	stat(s, ALLOC_SLOWPATH);
 
-	local_irq_save(flags);
-	page = c->page;
+reread_page:
+
+	page = READ_ONCE(c->page);
 	if (!page) {
 		/*
 		 * if the node is not online or has no normal memory, just
@@ -2692,6 +2706,11 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		if (unlikely(node != NUMA_NO_NODE &&
 			     !node_isset(node, slab_nodes)))
 			node = NUMA_NO_NODE;
+		local_irq_save(flags);
+		if (unlikely(c->page)) {
+			local_irq_restore(flags);
+			goto reread_page;
+		}
 		goto new_slab;
 	}
 redo:
@@ -2706,8 +2725,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			goto redo;
 		} else {
 			stat(s, ALLOC_NODE_MISMATCH);
-			deactivate_slab(s, page, c->freelist, c);
-			goto new_slab;
+			goto deactivate_slab;
 		}
 	}
 
@@ -2716,12 +2734,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	 * PFMEMALLOC but right now, we are losing the pfmemalloc
 	 * information when the page leaves the per-cpu allocator
 	 */
-	if (unlikely(!pfmemalloc_match(page, gfpflags))) {
-		deactivate_slab(s, page, c->freelist, c);
-		goto new_slab;
-	}
+	if (unlikely(!pfmemalloc_match_unsafe(page, gfpflags)))
+		goto deactivate_slab;
 
-	/* must check again c->freelist in case of cpu migration or IRQ */
+	/* must check again c->page in case IRQ handler changed it */
+	local_irq_save(flags);
+	if (unlikely(page != c->page)) {
+		local_irq_restore(flags);
+		goto reread_page;
+	}
 	freelist = c->freelist;
 	if (freelist)
 		goto load_freelist;
@@ -2737,6 +2758,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	stat(s, ALLOC_REFILL);
 
 load_freelist:
+
+	lockdep_assert_irqs_disabled();
+
 	/*
 	 * freelist is pointing to the list of objects to be used.
 	 * page is pointing to the page from which the objects are obtained.
@@ -2748,11 +2772,23 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	local_irq_restore(flags);
 	return freelist;
 
+deactivate_slab:
+
+	local_irq_save(flags);
+	if (page != c->page) {
+		local_irq_restore(flags);
+		goto reread_page;
+	}
+	deactivate_slab(s, page, c->freelist, c);
+
 new_slab:
 
+	lockdep_assert_irqs_disabled();
+
 	if (slub_percpu_partial(c)) {
 		page = c->page = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, page);
+		local_irq_restore(flags);
 		stat(s, CPU_PARTIAL_ALLOC);
 		goto redo;
 	}
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 13/33] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (11 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 12/33] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 14/33] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Continue reducing the irq disabled scope. Check for per-cpu partial slabs with
first with irqs enabled and then recheck with irqs disabled before grabbing
the slab page. Mostly preparatory for the following patches.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 6295695d8515..4d1f3e4a5951 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2706,11 +2706,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		if (unlikely(node != NUMA_NO_NODE &&
 			     !node_isset(node, slab_nodes)))
 			node = NUMA_NO_NODE;
-		local_irq_save(flags);
-		if (unlikely(c->page)) {
-			local_irq_restore(flags);
-			goto reread_page;
-		}
 		goto new_slab;
 	}
 redo:
@@ -2751,6 +2746,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 	if (!freelist) {
 		c->page = NULL;
+		local_irq_restore(flags);
 		stat(s, DEACTIVATE_BYPASS);
 		goto new_slab;
 	}
@@ -2780,12 +2776,19 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto reread_page;
 	}
 	deactivate_slab(s, page, c->freelist, c);
+	local_irq_restore(flags);
 
 new_slab:
 
-	lockdep_assert_irqs_disabled();
-
 	if (slub_percpu_partial(c)) {
+		local_irq_save(flags);
+		if (unlikely(c->page)) {
+			local_irq_restore(flags);
+			goto reread_page;
+		}
+		if (unlikely(!slub_percpu_partial(c)))
+			goto new_objects; /* stolen by an IRQ handler */
+
 		page = c->page = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, page);
 		local_irq_restore(flags);
@@ -2793,6 +2796,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto redo;
 	}
 
+	local_irq_save(flags);
+	if (unlikely(c->page)) {
+		local_irq_restore(flags);
+		goto reread_page;
+	}
+
+new_objects:
+
+	lockdep_assert_irqs_disabled();
+
 	freelist = get_partial(s, gfpflags, node, &page);
 	if (freelist) {
 		c->page = page;
@@ -2825,15 +2838,18 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 check_new_page:
 
 	if (kmem_cache_debug(s)) {
-		if (!alloc_debug_processing(s, page, freelist, addr))
+		if (!alloc_debug_processing(s, page, freelist, addr)) {
 			/* Slab failed checks. Next slab needed */
+			c->page = NULL;
+			local_irq_restore(flags);
 			goto new_slab;
-		else
+		} else {
 			/*
 			 * For debug case, we don't load freelist so that all
 			 * allocations go through alloc_debug_processing()
 			 */
 			goto return_single;
+		}
 	}
 
 	if (unlikely(!pfmemalloc_match(page, gfpflags)))
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 14/33] mm, slub: restore irqs around calling new_slab()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (12 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 13/33] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 15/33] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

allocate_slab() currently re-enables irqs before calling to the page allocator.
It depends on gfpflags_allow_blocking() to determine if it's safe to do so.
Now we can instead simply restore irq before calling it through new_slab().
The other caller early_kmem_cache_node_alloc() is unaffected by this.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4d1f3e4a5951..7798ba1c614f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1809,9 +1809,6 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	flags &= gfp_allowed_mask;
 
-	if (gfpflags_allow_blocking(flags))
-		local_irq_enable();
-
 	flags |= s->allocflags;
 
 	/*
@@ -1870,8 +1867,6 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	page->frozen = 1;
 
 out:
-	if (gfpflags_allow_blocking(flags))
-		local_irq_disable();
 	if (!page)
 		return NULL;
 
@@ -2812,16 +2807,17 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto check_new_page;
 	}
 
+	local_irq_restore(flags);
 	put_cpu_ptr(s->cpu_slab);
 	page = new_slab(s, gfpflags, node);
 	c = get_cpu_ptr(s->cpu_slab);
 
 	if (unlikely(!page)) {
-		local_irq_restore(flags);
 		slab_out_of_memory(s, gfpflags, node);
 		return NULL;
 	}
 
+	local_irq_save(flags);
 	if (c->page)
 		flush_slab(s, c);
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 15/33] mm, slub: validate slab from partial list or page allocator before making it cpu slab
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (13 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 14/33] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 16/33] mm, slub: check new pages with restored irqs Vlastimil Babka
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

When we obtain a new slab page from node partial list or page allocator, we
assign it to kmem_cache_cpu, perform some checks, and if they fail, we undo
the assignment.

In order to allow doing the checks without irq disabled, restructure the code
so that the checks are done first, and kmem_cache_cpu.page assignment only
after they pass.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7798ba1c614f..a5e974defcb7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2802,10 +2802,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	lockdep_assert_irqs_disabled();
 
 	freelist = get_partial(s, gfpflags, node, &page);
-	if (freelist) {
-		c->page = page;
+	if (freelist)
 		goto check_new_page;
-	}
 
 	local_irq_restore(flags);
 	put_cpu_ptr(s->cpu_slab);
@@ -2818,9 +2816,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	}
 
 	local_irq_save(flags);
-	if (c->page)
-		flush_slab(s, c);
-
 	/*
 	 * No other reference to the page yet so we can
 	 * muck around with it freely without cmpxchg
@@ -2829,14 +2824,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	page->freelist = NULL;
 
 	stat(s, ALLOC_SLAB);
-	c->page = page;
 
 check_new_page:
 
 	if (kmem_cache_debug(s)) {
 		if (!alloc_debug_processing(s, page, freelist, addr)) {
 			/* Slab failed checks. Next slab needed */
-			c->page = NULL;
 			local_irq_restore(flags);
 			goto new_slab;
 		} else {
@@ -2855,10 +2848,18 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		 */
 		goto return_single;
 
+	if (unlikely(c->page))
+		flush_slab(s, c);
+	c->page = page;
+
 	goto load_freelist;
 
 return_single:
 
+	if (unlikely(c->page))
+		flush_slab(s, c);
+	c->page = page;
+
 	deactivate_slab(s, page, get_freepointer(s, freelist), c);
 	local_irq_restore(flags);
 	return freelist;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 16/33] mm, slub: check new pages with restored irqs
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (14 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 15/33] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 17/33] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Building on top of the previous patch, re-enable irqs before checking new
pages. alloc_debug_processing() is now called with enabled irqs so we need to
remove VM_BUG_ON(!irqs_disabled()); in check_slab() - there doesn't seem to be
a need for it anyway.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a5e974defcb7..b5788040d92e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1009,8 +1009,6 @@ static int check_slab(struct kmem_cache *s, struct page *page)
 {
 	int maxobj;
 
-	VM_BUG_ON(!irqs_disabled());
-
 	if (!PageSlab(page)) {
 		slab_err(s, page, "Not a valid slab page");
 		return 0;
@@ -2802,10 +2800,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	lockdep_assert_irqs_disabled();
 
 	freelist = get_partial(s, gfpflags, node, &page);
+	local_irq_restore(flags);
 	if (freelist)
 		goto check_new_page;
 
-	local_irq_restore(flags);
 	put_cpu_ptr(s->cpu_slab);
 	page = new_slab(s, gfpflags, node);
 	c = get_cpu_ptr(s->cpu_slab);
@@ -2815,7 +2813,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		return NULL;
 	}
 
-	local_irq_save(flags);
 	/*
 	 * No other reference to the page yet so we can
 	 * muck around with it freely without cmpxchg
@@ -2830,7 +2827,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	if (kmem_cache_debug(s)) {
 		if (!alloc_debug_processing(s, page, freelist, addr)) {
 			/* Slab failed checks. Next slab needed */
-			local_irq_restore(flags);
 			goto new_slab;
 		} else {
 			/*
@@ -2848,6 +2844,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		 */
 		goto return_single;
 
+	local_irq_save(flags);
 	if (unlikely(c->page))
 		flush_slab(s, c);
 	c->page = page;
@@ -2856,6 +2853,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 return_single:
 
+	local_irq_save(flags);
 	if (unlikely(c->page))
 		flush_slab(s, c);
 	c->page = page;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 17/33] mm, slub: stop disabling irqs around get_partial()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (15 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 16/33] mm, slub: check new pages with restored irqs Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 18/33] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

The function get_partial() does not need to have irqs disabled as a whole. It's
sufficient to convert spin_lock operations to their irq saving/restoring
versions.

As a result, it's now possible to reach the page allocator from the slab
allocator without disabling and re-enabling interrupts on the way.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b5788040d92e..8433e50d2c8e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2010,11 +2010,12 @@ static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
  * Try to allocate a partial slab from a specific node.
  */
 static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
-			      struct page **ret_page, gfp_t flags)
+			      struct page **ret_page, gfp_t gfpflags)
 {
 	struct page *page, *page2;
 	void *object = NULL;
 	unsigned int available = 0;
+	unsigned long flags;
 	int objects;
 
 	/*
@@ -2026,11 +2027,11 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	spin_lock(&n->list_lock);
+	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry_safe(page, page2, &n->partial, slab_list) {
 		void *t;
 
-		if (!pfmemalloc_match(page, flags))
+		if (!pfmemalloc_match(page, gfpflags))
 			continue;
 
 		t = acquire_slab(s, n, page, object == NULL, &objects);
@@ -2051,7 +2052,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 			break;
 
 	}
-	spin_unlock(&n->list_lock);
+	spin_unlock_irqrestore(&n->list_lock, flags);
 	return object;
 }
 
@@ -2779,8 +2780,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			local_irq_restore(flags);
 			goto reread_page;
 		}
-		if (unlikely(!slub_percpu_partial(c)))
+		if (unlikely(!slub_percpu_partial(c))) {
+			local_irq_restore(flags);
 			goto new_objects; /* stolen by an IRQ handler */
+		}
 
 		page = c->page = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, page);
@@ -2789,18 +2792,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto redo;
 	}
 
-	local_irq_save(flags);
-	if (unlikely(c->page)) {
-		local_irq_restore(flags);
-		goto reread_page;
-	}
-
 new_objects:
 
-	lockdep_assert_irqs_disabled();
-
 	freelist = get_partial(s, gfpflags, node, &page);
-	local_irq_restore(flags);
 	if (freelist)
 		goto check_new_page;
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 18/33] mm, slub: move reset of c->page and freelist out of deactivate_slab()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (16 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 17/33] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 19/33] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

deactivate_slab() removes the cpu slab by merging the cpu freelist with slab's
freelist and putting the slab on the proper node's list. It also sets the
respective kmem_cache_cpu pointers to NULL.

By extracting the kmem_cache_cpu operations from the function, we can make it
not dependent on disabled irqs.

Also if we return a single free pointer from ___slab_alloc, we no longer have
to assign kmem_cache_cpu.page before deactivation or care if somebody preempted
us and assigned a different page to our kmem_cache_cpu in the process.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8433e50d2c8e..cea7a2ad9e3e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2209,10 +2209,13 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
 }
 
 /*
- * Remove the cpu slab
+ * Finishes removing the cpu slab. Merges cpu's freelist with page's freelist,
+ * unfreezes the slabs and puts it on the proper list.
+ * Assumes the slab has been already safely taken away from kmem_cache_cpu
+ * by the caller.
  */
 static void deactivate_slab(struct kmem_cache *s, struct page *page,
-				void *freelist, struct kmem_cache_cpu *c)
+			    void *freelist)
 {
 	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
@@ -2341,9 +2344,6 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
 	}
-
-	c->page = NULL;
-	c->freelist = NULL;
 }
 
 /*
@@ -2468,10 +2468,16 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
-	stat(s, CPUSLAB_FLUSH);
-	deactivate_slab(s, c->page, c->freelist, c);
+	void *freelist = c->freelist;
+	struct page *page = c->page;
 
+	c->page = NULL;
+	c->freelist = NULL;
 	c->tid = next_tid(c->tid);
+
+	deactivate_slab(s, page, freelist);
+
+	stat(s, CPUSLAB_FLUSH);
 }
 
 /*
@@ -2769,7 +2775,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		local_irq_restore(flags);
 		goto reread_page;
 	}
-	deactivate_slab(s, page, c->freelist, c);
+	freelist = c->freelist;
+	c->page = NULL;
+	c->freelist = NULL;
+	deactivate_slab(s, page, freelist);
 	local_irq_restore(flags);
 
 new_slab:
@@ -2848,11 +2857,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 return_single:
 
 	local_irq_save(flags);
-	if (unlikely(c->page))
-		flush_slab(s, c);
-	c->page = page;
-
-	deactivate_slab(s, page, get_freepointer(s, freelist), c);
+	deactivate_slab(s, page, get_freepointer(s, freelist));
 	local_irq_restore(flags);
 	return freelist;
 }
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 19/33] mm, slub: make locking in deactivate_slab() irq-safe
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (17 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 18/33] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 20/33] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

dectivate_slab() now no longer touches the kmem_cache_cpu structure, so it will
be possible to call it with irqs enabled. Just convert the spin_lock calls to
their irq saving/restoring variants to make it irq-safe.

Note we now have to use cmpxchg_double_slab() for irq-safe slab_lock(), because
in some situations we don't take the list_lock, which would disable irqs.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index cea7a2ad9e3e..6deb4080ef54 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2223,6 +2223,7 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 	enum slab_modes l = M_NONE, m = M_NONE;
 	void *nextfree, *freelist_iter, *freelist_tail;
 	int tail = DEACTIVATE_TO_HEAD;
+	unsigned long flags = 0;
 	struct page new;
 	struct page old;
 
@@ -2298,7 +2299,7 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 			 * that acquire_slab() will see a slab page that
 			 * is frozen
 			 */
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 	} else {
 		m = M_FULL;
@@ -2309,7 +2310,7 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 			 * slabs from diagnostic functions will not see
 			 * any frozen slabs.
 			 */
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 	}
 
@@ -2326,14 +2327,14 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 	}
 
 	l = m;
-	if (!__cmpxchg_double_slab(s, page,
+	if (!cmpxchg_double_slab(s, page,
 				old.freelist, old.counters,
 				new.freelist, new.counters,
 				"unfreezing slab"))
 		goto redo;
 
 	if (lock)
-		spin_unlock(&n->list_lock);
+		spin_unlock_irqrestore(&n->list_lock, flags);
 
 	if (m == M_PARTIAL)
 		stat(s, tail);
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 20/33] mm, slub: call deactivate_slab() without disabling irqs
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (18 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 19/33] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 21/33] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

The function is now safe to be called with irqs enabled, so move the calls
outside of irq disabled sections.

When called from ___slab_alloc() -> flush_slab() we have irqs disabled, so to
reenable them before deactivate_slab() we need to open-code flush_slab() in
___slab_alloc() and reenable irqs after modifying the kmem_cache_cpu fields.
But that means a IRQ handler meanwhile might have assigned a new page to
kmem_cache_cpu.page so we have to retry the whole check.

The remaining callers of flush_slab() are the IPI handler which has disabled
irqs anyway, and slub_cpu_dead() which will be dealt with in the following
patch.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 6deb4080ef54..cb12a077c61c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2779,8 +2779,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	freelist = c->freelist;
 	c->page = NULL;
 	c->freelist = NULL;
-	deactivate_slab(s, page, freelist);
 	local_irq_restore(flags);
+	deactivate_slab(s, page, freelist);
 
 new_slab:
 
@@ -2848,18 +2848,32 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		 */
 		goto return_single;
 
+retry_load_page:
+
 	local_irq_save(flags);
-	if (unlikely(c->page))
-		flush_slab(s, c);
+	if (unlikely(c->page)) {
+		void *flush_freelist = c->freelist;
+		struct page *flush_page = c->page;
+
+		c->page = NULL;
+		c->freelist = NULL;
+		c->tid = next_tid(c->tid);
+
+		local_irq_restore(flags);
+
+		deactivate_slab(s, flush_page, flush_freelist);
+
+		stat(s, CPUSLAB_FLUSH);
+
+		goto retry_load_page;
+	}
 	c->page = page;
 
 	goto load_freelist;
 
 return_single:
 
-	local_irq_save(flags);
 	deactivate_slab(s, page, get_freepointer(s, freelist));
-	local_irq_restore(flags);
 	return freelist;
 }
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 21/33] mm, slub: move irq control into unfreeze_partials()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (19 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 20/33] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 22/33] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

unfreeze_partials() can be optimized so that it doesn't need irqs disabled for
the whole time. As the first step, move irq control into the function and
remove it from the put_cpu_partial() caller.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index cb12a077c61c..1c4bd45d66a1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2350,9 +2350,8 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 /*
  * Unfreeze all the cpu partial slabs.
  *
- * This function must be called with interrupts disabled
- * for the cpu using c (or some other guarantee must be there
- * to guarantee no concurrent accesses).
+ * This function must be called with preemption or migration
+ * disabled with c local to the cpu.
  */
 static void unfreeze_partials(struct kmem_cache *s,
 		struct kmem_cache_cpu *c)
@@ -2360,6 +2359,9 @@ static void unfreeze_partials(struct kmem_cache *s,
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct kmem_cache_node *n = NULL, *n2 = NULL;
 	struct page *page, *discard_page = NULL;
+	unsigned long flags;
+
+	local_irq_save(flags);
 
 	while ((page = slub_percpu_partial(c))) {
 		struct page new;
@@ -2412,6 +2414,8 @@ static void unfreeze_partials(struct kmem_cache *s,
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
 	}
+
+	local_irq_restore(flags);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }
 
@@ -2439,14 +2443,11 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
 			if (drain && pobjects > slub_cpu_partial(s)) {
-				unsigned long flags;
 				/*
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
-				local_irq_save(flags);
 				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
-				local_irq_restore(flags);
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 22/33] mm, slub: discard slabs in unfreeze_partials() without irqs disabled
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (20 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 21/33] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 23/33] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

No need for disabled irqs when discarding slabs, so restore them before
discarding.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1c4bd45d66a1..0a1e048d0db7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2406,6 +2406,8 @@ static void unfreeze_partials(struct kmem_cache *s,
 	if (n)
 		spin_unlock(&n->list_lock);
 
+	local_irq_restore(flags);
+
 	while (discard_page) {
 		page = discard_page;
 		discard_page = discard_page->next;
@@ -2415,7 +2417,6 @@ static void unfreeze_partials(struct kmem_cache *s,
 		stat(s, FREE_SLAB);
 	}
 
-	local_irq_restore(flags);
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 }
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 23/33] mm, slub: detach whole partial list at once in unfreeze_partials()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (21 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 22/33] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 24/33] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Instead of iterating through the live percpu partial list, detach it from the
kmem_cache_cpu at once. This is simpler and will allow further optimization.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0a1e048d0db7..b31e00eb9561 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2358,16 +2358,20 @@ static void unfreeze_partials(struct kmem_cache *s,
 {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct kmem_cache_node *n = NULL, *n2 = NULL;
-	struct page *page, *discard_page = NULL;
+	struct page *page, *partial_page, *discard_page = NULL;
 	unsigned long flags;
 
 	local_irq_save(flags);
 
-	while ((page = slub_percpu_partial(c))) {
+	partial_page = slub_percpu_partial(c);
+	c->partial = NULL;
+
+	while (partial_page) {
 		struct page new;
 		struct page old;
 
-		slub_set_percpu_partial(c, page);
+		page = partial_page;
+		partial_page = page->next;
 
 		n2 = get_node(s, page_to_nid(page));
 		if (n != n2) {
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 24/33] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (22 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 23/33] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 25/33] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Unfreezing partial list can be split to two phases - detaching the list from
struct kmem_cache_cpu, and processing the list. The whole operation does not
need to be protected by disabled irqs. Restructure the code to separate the
detaching (with disabled irqs) and unfreezing (with irq disabling to be reduced
in the next patch).

Also, unfreeze_partials() can be called from another cpu on behalf of a cpu
that is being offlined, where disabling irqs on the local cpu has no sense, so
restructure the code as follows:

- __unfreeze_partials() is the bulk of unfreeze_partials() that processes the
  detached percpu partial list
- unfreeze_partials() detaches list from current cpu with irqs disabled and
  calls __unfreeze_partials()
- unfreeze_partials_cpu() is to be called for the offlined cpu so it needs no
  irq disabling, and is called from __flush_cpu_slab()
- flush_cpu_slab() is for the local cpu thus it needs to call
  unfreeze_partials(). So it can't simply call
  __flush_cpu_slab(smp_processor_id()) anymore and we have to open-code the
  proper calls.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 73 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 51 insertions(+), 22 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b31e00eb9561..9b46d9b9c879 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2347,25 +2347,15 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 	}
 }
 
-/*
- * Unfreeze all the cpu partial slabs.
- *
- * This function must be called with preemption or migration
- * disabled with c local to the cpu.
- */
-static void unfreeze_partials(struct kmem_cache *s,
-		struct kmem_cache_cpu *c)
-{
 #ifdef CONFIG_SLUB_CPU_PARTIAL
+static void __unfreeze_partials(struct kmem_cache *s, struct page *partial_page)
+{
 	struct kmem_cache_node *n = NULL, *n2 = NULL;
-	struct page *page, *partial_page, *discard_page = NULL;
+	struct page *page, *discard_page = NULL;
 	unsigned long flags;
 
 	local_irq_save(flags);
 
-	partial_page = slub_percpu_partial(c);
-	c->partial = NULL;
-
 	while (partial_page) {
 		struct page new;
 		struct page old;
@@ -2420,10 +2410,45 @@ static void unfreeze_partials(struct kmem_cache *s,
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
 	}
+}
 
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+/*
+ * Unfreeze all the cpu partial slabs.
+ */
+static void unfreeze_partials(struct kmem_cache *s)
+{
+	struct page *partial_page;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	partial_page = this_cpu_read(s->cpu_slab->partial);
+	this_cpu_write(s->cpu_slab->partial, NULL);
+	local_irq_restore(flags);
+
+	if (partial_page)
+		__unfreeze_partials(s, partial_page);
 }
 
+static void unfreeze_partials_cpu(struct kmem_cache *s,
+				  struct kmem_cache_cpu *c)
+{
+	struct page *partial_page;
+
+	partial_page = slub_percpu_partial(c);
+	c->partial = NULL;
+
+	if (partial_page)
+		__unfreeze_partials(s, partial_page);
+}
+
+#else	/* CONFIG_SLUB_CPU_PARTIAL */
+
+static inline void unfreeze_partials(struct kmem_cache *s) { }
+static inline void unfreeze_partials_cpu(struct kmem_cache *s,
+				  struct kmem_cache_cpu *c) { }
+
+#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+
 /*
  * Put a page that was just frozen (in __slab_free|get_partial_node) into a
  * partial page slot if available.
@@ -2452,7 +2477,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 * partial array is full. Move the existing
 				 * set to the per node partial list.
 				 */
-				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
+				unfreeze_partials(s);
 				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
@@ -2487,11 +2512,6 @@ static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 	stat(s, CPUSLAB_FLUSH);
 }
 
-/*
- * Flush cpu slab.
- *
- * Called from IPI handler with interrupts disabled.
- */
 static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 {
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
@@ -2499,14 +2519,23 @@ static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 	if (c->page)
 		flush_slab(s, c);
 
-	unfreeze_partials(s, c);
+	unfreeze_partials_cpu(s, c);
 }
 
+/*
+ * Flush cpu slab.
+ *
+ * Called from IPI handler with interrupts disabled.
+ */
 static void flush_cpu_slab(void *d)
 {
 	struct kmem_cache *s = d;
+	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
 
-	__flush_cpu_slab(s, smp_processor_id());
+	if (c->page)
+		flush_slab(s, c);
+
+	unfreeze_partials(s);
 }
 
 static bool has_cpu_slab(int cpu, void *info)
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 25/33] mm, slub: only disable irq with spin_lock in __unfreeze_partials()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (23 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 24/33] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 26/33] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

__unfreeze_partials() no longer needs to have irqs disabled, except for making
the spin_lock operations irq-safe, so convert the spin_locks operations and
remove the separate irq handling.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9b46d9b9c879..c5766e504e73 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2352,9 +2352,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct page *partial_page)
 {
 	struct kmem_cache_node *n = NULL, *n2 = NULL;
 	struct page *page, *discard_page = NULL;
-	unsigned long flags;
-
-	local_irq_save(flags);
+	unsigned long flags = 0;
 
 	while (partial_page) {
 		struct page new;
@@ -2366,10 +2364,10 @@ static void __unfreeze_partials(struct kmem_cache *s, struct page *partial_page)
 		n2 = get_node(s, page_to_nid(page));
 		if (n != n2) {
 			if (n)
-				spin_unlock(&n->list_lock);
+				spin_unlock_irqrestore(&n->list_lock, flags);
 
 			n = n2;
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 
 		do {
@@ -2398,9 +2396,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct page *partial_page)
 	}
 
 	if (n)
-		spin_unlock(&n->list_lock);
-
-	local_irq_restore(flags);
+		spin_unlock_irqrestore(&n->list_lock, flags);
 
 	while (discard_page) {
 		page = discard_page;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 26/33] mm, slub: don't disable irqs in slub_cpu_dead()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (24 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 25/33] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 27/33] mm, slab: split out the cpu offline variant of flush_slab() Vlastimil Babka
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

slub_cpu_dead() cleans up for an offlined cpu from another cpu and calls only
functions that are now irq safe, so we don't need to disable irqs anymore.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c5766e504e73..c4a9b8901576 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2554,14 +2554,10 @@ static void flush_all(struct kmem_cache *s)
 static int slub_cpu_dead(unsigned int cpu)
 {
 	struct kmem_cache *s;
-	unsigned long flags;
 
 	mutex_lock(&slab_mutex);
-	list_for_each_entry(s, &slab_caches, list) {
-		local_irq_save(flags);
+	list_for_each_entry(s, &slab_caches, list)
 		__flush_cpu_slab(s, cpu);
-		local_irq_restore(flags);
-	}
 	mutex_unlock(&slab_mutex);
 	return 0;
 }
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 27/33] mm, slab: split out the cpu offline variant of flush_slab()
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (25 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 26/33] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 28/33] mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

flush_slab() is called either as part IPI handler on given live cpu, or as a
cleanup on behalf of another cpu that went offline. The first case needs to
protect updating the kmem_cache_cpu fields with disabled irqs. Currently the
whole call happens with irqs disabled by the IPI handler, but the following
patch will change from IPI to workqueue, and flush_slab() will have to disable
irqs (to be replaced with a local lock later) in the critical part.

To prepare for this change, replace the call to flush_slab() for the dead cpu
handling with an opencoded variant that will not disable irqs nor take a local
lock.

Suggested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c4a9b8901576..fa9a366d2d9c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2511,9 +2511,17 @@ static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 {
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+	void *freelist = c->freelist;
+	struct page *page = c->page;
 
-	if (c->page)
-		flush_slab(s, c);
+	c->page = NULL;
+	c->freelist = NULL;
+	c->tid = next_tid(c->tid);
+
+	if (page) {
+		deactivate_slab(s, page, freelist);
+		stat(s, CPUSLAB_FLUSH);
+	}
 
 	unfreeze_partials_cpu(s, c);
 }
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 28/33] mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (26 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 27/33] mm, slab: split out the cpu offline variant of flush_slab() Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:49 ` [PATCH v6 29/33] mm: slub: make object_map_lock a raw_spinlock_t Vlastimil Babka
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

flush_all() flushes a specific SLAB cache on each CPU (where the cache
is present). The deactivate_slab()/__free_slab() invocation happens
within IPI handler and is problematic for PREEMPT_RT.

The flush operation is not a frequent operation or a hot path. The
per-CPU flush operation can be moved to within a workqueue.

Because a workqueue handler, unlike IPI handler, does not disable irqs,
flush_slab() now has to disable them for working with the kmem_cache_cpu
fields. deactivate_slab() is safe to call with irqs enabled.

[vbabka@suse.cz: adapt to new SLUB changes]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c |  2 ++
 mm/slub.c        | 94 +++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1c673c323baf..ec2bb0beed75 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -502,6 +502,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	if (unlikely(!s))
 		return;
 
+	cpus_read_lock();
 	mutex_lock(&slab_mutex);
 
 	s->refcount--;
@@ -516,6 +517,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	}
 out_unlock:
 	mutex_unlock(&slab_mutex);
+	cpus_read_unlock();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
diff --git a/mm/slub.c b/mm/slub.c
index fa9a366d2d9c..b7f8b9d34e46 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2496,16 +2496,25 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
-	void *freelist = c->freelist;
-	struct page *page = c->page;
+	unsigned long flags;
+	struct page *page;
+	void *freelist;
+
+	local_irq_save(flags);
+
+	page = c->page;
+	freelist = c->freelist;
 
 	c->page = NULL;
 	c->freelist = NULL;
 	c->tid = next_tid(c->tid);
 
-	deactivate_slab(s, page, freelist);
+	local_irq_restore(flags);
 
-	stat(s, CPUSLAB_FLUSH);
+	if (page) {
+		deactivate_slab(s, page, freelist);
+		stat(s, CPUSLAB_FLUSH);
+	}
 }
 
 static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
@@ -2526,15 +2535,27 @@ static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 	unfreeze_partials_cpu(s, c);
 }
 
+struct slub_flush_work {
+	struct work_struct work;
+	struct kmem_cache *s;
+	bool skip;
+};
+
 /*
  * Flush cpu slab.
  *
- * Called from IPI handler with interrupts disabled.
+ * Called from CPU work handler with migration disabled.
  */
-static void flush_cpu_slab(void *d)
+static void flush_cpu_slab(struct work_struct *w)
 {
-	struct kmem_cache *s = d;
-	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
+	struct kmem_cache *s;
+	struct kmem_cache_cpu *c;
+	struct slub_flush_work *sfw;
+
+	sfw = container_of(w, struct slub_flush_work, work);
+
+	s = sfw->s;
+	c = this_cpu_ptr(s->cpu_slab);
 
 	if (c->page)
 		flush_slab(s, c);
@@ -2542,17 +2563,51 @@ static void flush_cpu_slab(void *d)
 	unfreeze_partials(s);
 }
 
-static bool has_cpu_slab(int cpu, void *info)
+static bool has_cpu_slab(int cpu, struct kmem_cache *s)
 {
-	struct kmem_cache *s = info;
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 
 	return c->page || slub_percpu_partial(c);
 }
 
+static DEFINE_MUTEX(flush_lock);
+static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);
+
+static void flush_all_cpus_locked(struct kmem_cache *s)
+{
+	struct slub_flush_work *sfw;
+	unsigned int cpu;
+
+	lockdep_assert_cpus_held();
+	mutex_lock(&flush_lock);
+
+	for_each_online_cpu(cpu) {
+		sfw = &per_cpu(slub_flush, cpu);
+		if (!has_cpu_slab(cpu, s)) {
+			sfw->skip = true;
+			continue;
+		}
+		INIT_WORK(&sfw->work, flush_cpu_slab);
+		sfw->skip = false;
+		sfw->s = s;
+		schedule_work_on(cpu, &sfw->work);
+	}
+
+	for_each_online_cpu(cpu) {
+		sfw = &per_cpu(slub_flush, cpu);
+		if (sfw->skip)
+			continue;
+		flush_work(&sfw->work);
+	}
+
+	mutex_unlock(&flush_lock);
+}
+
 static void flush_all(struct kmem_cache *s)
 {
-	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
+	cpus_read_lock();
+	flush_all_cpus_locked(s);
+	cpus_read_unlock();
 }
 
 /*
@@ -4097,7 +4152,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	int node;
 	struct kmem_cache_node *n;
 
-	flush_all(s);
+	flush_all_cpus_locked(s);
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
 		free_partial(s, n);
@@ -4373,7 +4428,7 @@ EXPORT_SYMBOL(kfree);
  * being allocated from last increasing the chance that the last objects
  * are freed in them.
  */
-int __kmem_cache_shrink(struct kmem_cache *s)
+static int __kmem_cache_do_shrink(struct kmem_cache *s)
 {
 	int node;
 	int i;
@@ -4385,7 +4440,6 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 	unsigned long flags;
 	int ret = 0;
 
-	flush_all(s);
 	for_each_kmem_cache_node(s, node, n) {
 		INIT_LIST_HEAD(&discard);
 		for (i = 0; i < SHRINK_PROMOTE_MAX; i++)
@@ -4435,13 +4489,21 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 	return ret;
 }
 
+int __kmem_cache_shrink(struct kmem_cache *s)
+{
+	flush_all(s);
+	return __kmem_cache_do_shrink(s);
+}
+
 static int slab_mem_going_offline_callback(void *arg)
 {
 	struct kmem_cache *s;
 
 	mutex_lock(&slab_mutex);
-	list_for_each_entry(s, &slab_caches, list)
-		__kmem_cache_shrink(s);
+	list_for_each_entry(s, &slab_caches, list) {
+		flush_all_cpus_locked(s);
+		__kmem_cache_do_shrink(s);
+	}
 	mutex_unlock(&slab_mutex);
 
 	return 0;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 29/33] mm: slub: make object_map_lock a raw_spinlock_t
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (27 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 28/33] mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
@ 2021-09-04 10:49 ` Vlastimil Babka
  2021-09-04 10:50 ` [PATCH v6 30/33] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:49 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The variable object_map is protected by object_map_lock. The lock is always
acquired in debug code and within already atomic context

Make object_map_lock a raw_spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b7f8b9d34e46..76e56ce55d21 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -452,7 +452,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 
 #ifdef CONFIG_SLUB_DEBUG
 static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
-static DEFINE_SPINLOCK(object_map_lock);
+static DEFINE_RAW_SPINLOCK(object_map_lock);
 
 static void __fill_map(unsigned long *obj_map, struct kmem_cache *s,
 		       struct page *page)
@@ -497,7 +497,7 @@ static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 {
 	VM_BUG_ON(!irqs_disabled());
 
-	spin_lock(&object_map_lock);
+	raw_spin_lock(&object_map_lock);
 
 	__fill_map(object_map, s, page);
 
@@ -507,7 +507,7 @@ static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 static void put_map(unsigned long *map) __releases(&object_map_lock)
 {
 	VM_BUG_ON(map != object_map);
-	spin_unlock(&object_map_lock);
+	raw_spin_unlock(&object_map_lock);
 }
 
 static inline unsigned int size_from_object(struct kmem_cache *s)
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 30/33] mm, slub: make slab_lock() disable irqs with PREEMPT_RT
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (28 preceding siblings ...)
  2021-09-04 10:49 ` [PATCH v6 29/33] mm: slub: make object_map_lock a raw_spinlock_t Vlastimil Babka
@ 2021-09-04 10:50 ` Vlastimil Babka
  2021-09-04 10:50 ` [PATCH v6 31/33] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:50 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

We need to disable irqs around slab_lock() (a bit spinlock) to make it
irq-safe. Most calls to slab_lock() are nested under spin_lock_irqsave() which
doesn't disable irqs on PREEMPT_RT, so add explicit disabling with PREEMPT_RT.
The exception is cmpxchg_double_slab() which already disables irqs, so use a
__slab_[un]lock() variant without irq disable there.

slab_[un]lock() thus needs a flags pointer parameter, which is unused on !RT.
free_debug_processing() now has two flags variables, which looks odd, but only
one is actually used - the one used in spin_lock_irqsave() on !RT and the one
used in slab_lock() on RT.

As a result, __cmpxchg_double_slab() and cmpxchg_double_slab() become
effectively identical on RT, as both will disable irqs, which is necessary on
RT as most callers of this function also rely on irqsaving lock operations.
Thus, assert that irqs are already disabled in __cmpxchg_double_slab() only on
!RT and also change the VM_BUG_ON assertion to the more standard lockdep_assert
one.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 58 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 76e56ce55d21..a04c36e173c0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -359,25 +359,44 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
 /*
  * Per slab locking using the pagelock
  */
-static __always_inline void slab_lock(struct page *page)
+static __always_inline void __slab_lock(struct page *page)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
 	bit_spin_lock(PG_locked, &page->flags);
 }
 
-static __always_inline void slab_unlock(struct page *page)
+static __always_inline void __slab_unlock(struct page *page)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
 	__bit_spin_unlock(PG_locked, &page->flags);
 }
 
-/* Interrupts must be disabled (for the fallback code to work right) */
+static __always_inline void slab_lock(struct page *page, unsigned long *flags)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_save(*flags);
+	__slab_lock(page);
+}
+
+static __always_inline void slab_unlock(struct page *page, unsigned long *flags)
+{
+	__slab_unlock(page);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_restore(*flags);
+}
+
+/*
+ * Interrupts must be disabled (for the fallback code to work right), typically
+ * by an _irqsave() lock variant. Except on PREEMPT_RT where locks are different
+ * so we disable interrupts as part of slab_[un]lock().
+ */
 static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		void *freelist_old, unsigned long counters_old,
 		void *freelist_new, unsigned long counters_new,
 		const char *n)
 {
-	VM_BUG_ON(!irqs_disabled());
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		lockdep_assert_irqs_disabled();
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
@@ -388,15 +407,18 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
 	} else
 #endif
 	{
-		slab_lock(page);
+		/* init to 0 to prevent spurious warnings */
+		unsigned long flags = 0;
+
+		slab_lock(page, &flags);
 		if (page->freelist == freelist_old &&
 					page->counters == counters_old) {
 			page->freelist = freelist_new;
 			page->counters = counters_new;
-			slab_unlock(page);
+			slab_unlock(page, &flags);
 			return true;
 		}
-		slab_unlock(page);
+		slab_unlock(page, &flags);
 	}
 
 	cpu_relax();
@@ -427,16 +449,16 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		unsigned long flags;
 
 		local_irq_save(flags);
-		slab_lock(page);
+		__slab_lock(page);
 		if (page->freelist == freelist_old &&
 					page->counters == counters_old) {
 			page->freelist = freelist_new;
 			page->counters = counters_new;
-			slab_unlock(page);
+			__slab_unlock(page);
 			local_irq_restore(flags);
 			return true;
 		}
-		slab_unlock(page);
+		__slab_unlock(page);
 		local_irq_restore(flags);
 	}
 
@@ -1269,11 +1291,11 @@ static noinline int free_debug_processing(
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 	void *object = head;
 	int cnt = 0;
-	unsigned long flags;
+	unsigned long flags, flags2;
 	int ret = 0;
 
 	spin_lock_irqsave(&n->list_lock, flags);
-	slab_lock(page);
+	slab_lock(page, &flags2);
 
 	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
 		if (!check_slab(s, page))
@@ -1306,7 +1328,7 @@ static noinline int free_debug_processing(
 		slab_err(s, page, "Bulk freelist count(%d) invalid(%d)\n",
 			 bulk_cnt, cnt);
 
-	slab_unlock(page);
+	slab_unlock(page, &flags2);
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	if (!ret)
 		slab_fix(s, "Object at 0x%p not freed", object);
@@ -4087,11 +4109,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 {
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
+	unsigned long flags;
 	unsigned long *map;
 	void *p;
 
 	slab_err(s, page, text, s->name);
-	slab_lock(page);
+	slab_lock(page, &flags);
 
 	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
@@ -4102,7 +4125,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 		}
 	}
 	put_map(map);
-	slab_unlock(page);
+	slab_unlock(page, &flags);
 #endif
 }
 
@@ -4834,8 +4857,9 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
 {
 	void *p;
 	void *addr = page_address(page);
+	unsigned long flags;
 
-	slab_lock(page);
+	slab_lock(page, &flags);
 
 	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
 		goto unlock;
@@ -4850,7 +4874,7 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
 			break;
 	}
 unlock:
-	slab_unlock(page);
+	slab_unlock(page, &flags);
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 31/33] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (29 preceding siblings ...)
  2021-09-04 10:50 ` [PATCH v6 30/33] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
@ 2021-09-04 10:50 ` Vlastimil Babka
  2021-09-04 10:50 ` [PATCH v6 32/33] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:50 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka, Jann Horn

Jann Horn reported [1] the following theoretically possible race:

  task A: put_cpu_partial() calls preempt_disable()
  task A: oldpage = this_cpu_read(s->cpu_slab->partial)
  interrupt: kfree() reaches unfreeze_partials() and discards the page
  task B (on another CPU): reallocates page as page cache
  task A: reads page->pages and page->pobjects, which are actually
  halves of the pointer page->lru.prev
  task B (on another CPU): frees page
  interrupt: allocates page as SLUB page and places it on the percpu partial list
  task A: this_cpu_cmpxchg() succeeds

  which would cause page->pages and page->pobjects to end up containing
  halves of pointers that would then influence when put_cpu_partial()
  happens and show up in root-only sysfs files. Maybe that's acceptable,
  I don't know. But there should probably at least be a comment for now
  to point out that we're reading union fields of a page that might be
  in a completely different state.

Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only safe
against s->cpu_slab->partial manipulation in ___slab_alloc() if the latter
disables irqs, otherwise a __slab_free() in an irq handler could call
put_cpu_partial() in the middle of ___slab_alloc() manipulating ->partial
and corrupt it. This becomes an issue on RT after a local_lock is introduced
in later patch. The fix means taking the local_lock also in put_cpu_partial()
on RT.

After debugging this issue, Mike Galbraith suggested [2] that to avoid
different locking schemes on RT and !RT, we can just protect put_cpu_partial()
with disabled irqs (to be converted to local_lock_irqsave() later) everywhere.
This should be acceptable as it's not a fast path, and moving the actual
partial unfreezing outside of the irq disabled section makes it short, and with
the retry loop gone the code can be also simplified. In addition, the race
reported by Jann should no longer be possible.

[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 81 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a04c36e173c0..f4b33d2fddc1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2025,7 +2025,12 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	return freelist;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#else
+static inline void put_cpu_partial(struct kmem_cache *s, struct page *page,
+				   int drain) { }
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
@@ -2459,14 +2464,6 @@ static void unfreeze_partials_cpu(struct kmem_cache *s,
 		__unfreeze_partials(s, partial_page);
 }
 
-#else	/* CONFIG_SLUB_CPU_PARTIAL */
-
-static inline void unfreeze_partials(struct kmem_cache *s) { }
-static inline void unfreeze_partials_cpu(struct kmem_cache *s,
-				  struct kmem_cache_cpu *c) { }
-
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
-
 /*
  * Put a page that was just frozen (in __slab_free|get_partial_node) into a
  * partial page slot if available.
@@ -2476,46 +2473,56 @@ static inline void unfreeze_partials_cpu(struct kmem_cache *s,
  */
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0;
+	int pobjects = 0;
 
-	preempt_disable();
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
+	local_irq_save(flags);
+
+	oldpage = this_cpu_read(s->cpu_slab->partial);
 
-		if (oldpage) {
+	if (oldpage) {
+		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
+			/*
+			 * Partial array is full. Move the existing set to the
+			 * per node partial list. Postpone the actual unfreezing
+			 * outside of the critical section.
+			 */
+			page_to_unfreeze = oldpage;
+			oldpage = NULL;
+		} else {
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
 		}
+	}
 
-		pages++;
-		pobjects += page->objects - page->inuse;
+	pages++;
+	pobjects += page->objects - page->inuse;
 
-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;
 
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-								!= oldpage);
-	preempt_enable();
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+	this_cpu_write(s->cpu_slab->partial, page);
+
+	local_irq_restore(flags);
+
+	if (page_to_unfreeze) {
+		__unfreeze_partials(s, page_to_unfreeze);
+		stat(s, CPU_PARTIAL_DRAIN);
+	}
 }
 
+#else	/* CONFIG_SLUB_CPU_PARTIAL */
+
+static inline void unfreeze_partials(struct kmem_cache *s) { }
+static inline void unfreeze_partials_cpu(struct kmem_cache *s,
+				  struct kmem_cache_cpu *c) { }
+
+#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	unsigned long flags;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 32/33] mm, slub: use migrate_disable() on PREEMPT_RT
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (30 preceding siblings ...)
  2021-09-04 10:50 ` [PATCH v6 31/33] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
@ 2021-09-04 10:50 ` Vlastimil Babka
  2021-09-04 10:50 ` [PATCH v6 33/33] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:50 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

We currently use preempt_disable() (directly or via get_cpu_ptr()) to stabilize
the pointer to kmem_cache_cpu. On PREEMPT_RT this would be incompatible with
the list_lock spinlock. We can use migrate_disable() instead, but that
increases overhead on !PREEMPT_RT as it's an unconditional function call.

In order to get the best available mechanism on both PREEMPT_RT and
!PREEMPT_RT, introduce private slub_get_cpu_ptr() and slub_put_cpu_ptr()
wrappers and use them.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f4b33d2fddc1..38d4cc51e880 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -118,6 +118,26 @@
  * 			the fast path and disables lockless freelists.
  */
 
+/*
+ * We could simply use migrate_disable()/enable() but as long as it's a
+ * function call even on !PREEMPT_RT, use inline preempt_disable() there.
+ */
+#ifndef CONFIG_PREEMPT_RT
+#define slub_get_cpu_ptr(var)	get_cpu_ptr(var)
+#define slub_put_cpu_ptr(var)	put_cpu_ptr(var)
+#else
+#define slub_get_cpu_ptr(var)		\
+({					\
+	migrate_disable();		\
+	this_cpu_ptr(var);		\
+})
+#define slub_put_cpu_ptr(var)		\
+do {					\
+	(void)(var);			\
+	migrate_enable();		\
+} while (0)
+#endif
+
 #ifdef CONFIG_SLUB_DEBUG
 #ifdef CONFIG_SLUB_DEBUG_ON
 DEFINE_STATIC_KEY_TRUE(slub_debug_enabled);
@@ -2852,7 +2872,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	if (unlikely(!pfmemalloc_match_unsafe(page, gfpflags)))
 		goto deactivate_slab;
 
-	/* must check again c->page in case IRQ handler changed it */
+	/* must check again c->page in case we got preempted and it changed */
 	local_irq_save(flags);
 	if (unlikely(page != c->page)) {
 		local_irq_restore(flags);
@@ -2911,7 +2931,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		}
 		if (unlikely(!slub_percpu_partial(c))) {
 			local_irq_restore(flags);
-			goto new_objects; /* stolen by an IRQ handler */
+			/* we were preempted and partial list got empty */
+			goto new_objects;
 		}
 
 		page = c->page = slub_percpu_partial(c);
@@ -2927,9 +2948,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	if (freelist)
 		goto check_new_page;
 
-	put_cpu_ptr(s->cpu_slab);
+	slub_put_cpu_ptr(s->cpu_slab);
 	page = new_slab(s, gfpflags, node);
-	c = get_cpu_ptr(s->cpu_slab);
+	c = slub_get_cpu_ptr(s->cpu_slab);
 
 	if (unlikely(!page)) {
 		slab_out_of_memory(s, gfpflags, node);
@@ -3012,12 +3033,12 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	 * cpu before disabling preemption. Need to reload cpu area
 	 * pointer.
 	 */
-	c = get_cpu_ptr(s->cpu_slab);
+	c = slub_get_cpu_ptr(s->cpu_slab);
 #endif
 
 	p = ___slab_alloc(s, gfpflags, node, addr, c);
 #ifdef CONFIG_PREEMPT_COUNT
-	put_cpu_ptr(s->cpu_slab);
+	slub_put_cpu_ptr(s->cpu_slab);
 #endif
 	return p;
 }
@@ -3546,7 +3567,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * IRQs, which protects against PREEMPT and interrupts
 	 * handlers invoking normal fastpath.
 	 */
-	c = get_cpu_ptr(s->cpu_slab);
+	c = slub_get_cpu_ptr(s->cpu_slab);
 	local_irq_disable();
 
 	for (i = 0; i < size; i++) {
@@ -3592,7 +3613,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	}
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
-	put_cpu_ptr(s->cpu_slab);
+	slub_put_cpu_ptr(s->cpu_slab);
 
 	/*
 	 * memcg and kmem_cache debug support and memory initialization.
@@ -3602,7 +3623,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 				slab_want_init_on_alloc(flags, s));
 	return i;
 error:
-	put_cpu_ptr(s->cpu_slab);
+	slub_put_cpu_ptr(s->cpu_slab);
 	slab_post_alloc_hook(s, objcg, flags, i, p, false);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 33/33] mm, slub: convert kmem_cpu_slab protection to local_lock
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (31 preceding siblings ...)
  2021-09-04 10:50 ` [PATCH v6 32/33] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
@ 2021-09-04 10:50 ` Vlastimil Babka
  2021-09-05 14:16 ` [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Mike Galbraith
  2021-09-07  8:20 ` Mel Gorman
  34 siblings, 0 replies; 36+ messages in thread
From: Vlastimil Babka @ 2021-09-04 10:50 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Vlastimil Babka

Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of
local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's
equivalent, with better lockdep visibility. On PREEMPT_RT that means better
preemption.

However, the cost on PREEMPT_RT is the loss of lockless fast paths which only
work with cpu freelist. Those are designed to detect and recover from being
preempted by other conflicting operations (both fast or slow path), but the
slow path operations assume they cannot be preempted by a fast path operation,
which is guaranteed naturally with disabled irqs. With local locks on
PREEMPT_RT, the fast paths now also need to take the local lock to avoid races.

In the allocation fastpath slab_alloc_node() we can just defer to the slowpath
__slab_alloc() which also works with cpu freelist, but under the local lock.
In the free fastpath do_slab_free() we have to add a new local lock protected
version of freeing to the cpu freelist, as the existing slowpath only works
with the page freelist.

Also update the comment about locking scheme in SLUB to reflect changes done
by this series.

[ Mike Galbraith <efault@gmx.de>: use local_lock() without irq in PREEMPT_RT
  scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slub_def.h |   6 ++
 mm/slub.c                | 146 +++++++++++++++++++++++++++++----------
 2 files changed, 117 insertions(+), 35 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index dcde82a4434c..85499f0586b0 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -10,6 +10,7 @@
 #include <linux/kfence.h>
 #include <linux/kobject.h>
 #include <linux/reciprocal_div.h>
+#include <linux/local_lock.h>
 
 enum stat_item {
 	ALLOC_FASTPATH,		/* Allocation from cpu slab */
@@ -40,6 +41,10 @@ enum stat_item {
 	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
 	NR_SLUB_STAT_ITEMS };
 
+/*
+ * When changing the layout, make sure freelist and tid are still compatible
+ * with this_cpu_cmpxchg_double() alignment requirements.
+ */
 struct kmem_cache_cpu {
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
@@ -47,6 +52,7 @@ struct kmem_cache_cpu {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *partial;	/* Partially allocated frozen slabs */
 #endif
+	local_lock_t lock;	/* Protects the fields above */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 38d4cc51e880..3d2025f7163b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -46,13 +46,21 @@
 /*
  * Lock order:
  *   1. slab_mutex (Global Mutex)
- *   2. node->list_lock
- *   3. slab_lock(page) (Only on some arches and for debugging)
+ *   2. node->list_lock (Spinlock)
+ *   3. kmem_cache->cpu_slab->lock (Local lock)
+ *   4. slab_lock(page) (Only on some arches or for debugging)
+ *   5. object_map_lock (Only for debugging)
  *
  *   slab_mutex
  *
  *   The role of the slab_mutex is to protect the list of all the slabs
  *   and to synchronize major metadata changes to slab cache structures.
+ *   Also synchronizes memory hotplug callbacks.
+ *
+ *   slab_lock
+ *
+ *   The slab_lock is a wrapper around the page lock, thus it is a bit
+ *   spinlock.
  *
  *   The slab_lock is only used for debugging and on arches that do not
  *   have the ability to do a cmpxchg_double. It only protects:
@@ -61,6 +69,8 @@
  *	C. page->objects	-> Number of objects in page
  *	D. page->frozen		-> frozen state
  *
+ *   Frozen slabs
+ *
  *   If a slab is frozen then it is exempt from list management. It is not
  *   on any list except per cpu partial list. The processor that froze the
  *   slab is the one who can perform list operations on the page. Other
@@ -68,6 +78,8 @@
  *   froze the slab is the only one that can retrieve the objects from the
  *   page's freelist.
  *
+ *   list_lock
+ *
  *   The list_lock protects the partial and full list on each node and
  *   the partial slab counter. If taken then no new slabs may be added or
  *   removed from the lists nor make the number of partial slabs be modified.
@@ -79,10 +91,36 @@
  *   slabs, operations can continue without any centralized lock. F.e.
  *   allocating a long series of objects that fill up slabs does not require
  *   the list lock.
- *   Interrupts are disabled during allocation and deallocation in order to
- *   make the slab allocator safe to use in the context of an irq. In addition
- *   interrupts are disabled to ensure that the processor does not change
- *   while handling per_cpu slabs, due to kernel preemption.
+ *
+ *   cpu_slab->lock local lock
+ *
+ *   This locks protect slowpath manipulation of all kmem_cache_cpu fields
+ *   except the stat counters. This is a percpu structure manipulated only by
+ *   the local cpu, so the lock protects against being preempted or interrupted
+ *   by an irq. Fast path operations rely on lockless operations instead.
+ *   On PREEMPT_RT, the local lock does not actually disable irqs (and thus
+ *   prevent the lockless operations), so fastpath operations also need to take
+ *   the lock and are no longer lockless.
+ *
+ *   lockless fastpaths
+ *
+ *   The fast path allocation (slab_alloc_node()) and freeing (do_slab_free())
+ *   are fully lockless when satisfied from the percpu slab (and when
+ *   cmpxchg_double is possible to use, otherwise slab_lock is taken).
+ *   They also don't disable preemption or migration or irqs. They rely on
+ *   the transaction id (tid) field to detect being preempted or moved to
+ *   another cpu.
+ *
+ *   irq, preemption, migration considerations
+ *
+ *   Interrupts are disabled as part of list_lock or local_lock operations, or
+ *   around the slab_lock operation, in order to make the slab allocator safe
+ *   to use in the context of an irq.
+ *
+ *   In addition, preemption (or migration on PREEMPT_RT) is disabled in the
+ *   allocation slowpath, bulk allocation, and put_cpu_partial(), so that the
+ *   local cpu doesn't change in the process and e.g. the kmem_cache_cpu pointer
+ *   doesn't have to be revalidated in each section protected by the local lock.
  *
  * SLUB assigns one slab for allocation to each processor.
  * Allocations only occur from these slabs called cpu slabs.
@@ -2250,9 +2288,13 @@ static inline void note_cmpxchg_failure(const char *n,
 static void init_kmem_cache_cpus(struct kmem_cache *s)
 {
 	int cpu;
+	struct kmem_cache_cpu *c;
 
-	for_each_possible_cpu(cpu)
-		per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu);
+	for_each_possible_cpu(cpu) {
+		c = per_cpu_ptr(s->cpu_slab, cpu);
+		local_lock_init(&c->lock);
+		c->tid = init_tid(cpu);
+	}
 }
 
 /*
@@ -2463,10 +2505,10 @@ static void unfreeze_partials(struct kmem_cache *s)
 	struct page *partial_page;
 	unsigned long flags;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
 	partial_page = this_cpu_read(s->cpu_slab->partial);
 	this_cpu_write(s->cpu_slab->partial, NULL);
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 
 	if (partial_page)
 		__unfreeze_partials(s, partial_page);
@@ -2499,7 +2541,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 	int pages = 0;
 	int pobjects = 0;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
 
 	oldpage = this_cpu_read(s->cpu_slab->partial);
 
@@ -2527,7 +2569,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 
 	this_cpu_write(s->cpu_slab->partial, page);
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 
 	if (page_to_unfreeze) {
 		__unfreeze_partials(s, page_to_unfreeze);
@@ -2549,7 +2591,7 @@ static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 	struct page *page;
 	void *freelist;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
 
 	page = c->page;
 	freelist = c->freelist;
@@ -2558,7 +2600,7 @@ static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 	c->freelist = NULL;
 	c->tid = next_tid(c->tid);
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 
 	if (page) {
 		deactivate_slab(s, page, freelist);
@@ -2780,8 +2822,6 @@ static inline bool pfmemalloc_match_unsafe(struct page *page, gfp_t gfpflags)
  * The page is still frozen if the return value is not NULL.
  *
  * If this function returns NULL then the page has been unfrozen.
- *
- * This function must be called with interrupt disabled.
  */
 static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 {
@@ -2789,6 +2829,8 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 	unsigned long counters;
 	void *freelist;
 
+	lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock));
+
 	do {
 		freelist = page->freelist;
 		counters = page->counters;
@@ -2873,9 +2915,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto deactivate_slab;
 
 	/* must check again c->page in case we got preempted and it changed */
-	local_irq_save(flags);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
 	if (unlikely(page != c->page)) {
-		local_irq_restore(flags);
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 		goto reread_page;
 	}
 	freelist = c->freelist;
@@ -2886,7 +2928,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 	if (!freelist) {
 		c->page = NULL;
-		local_irq_restore(flags);
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 		stat(s, DEACTIVATE_BYPASS);
 		goto new_slab;
 	}
@@ -2895,7 +2937,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 load_freelist:
 
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock));
 
 	/*
 	 * freelist is pointing to the list of objects to be used.
@@ -2905,39 +2947,39 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	VM_BUG_ON(!c->page->frozen);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 	return freelist;
 
 deactivate_slab:
 
-	local_irq_save(flags);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
 	if (page != c->page) {
-		local_irq_restore(flags);
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 		goto reread_page;
 	}
 	freelist = c->freelist;
 	c->page = NULL;
 	c->freelist = NULL;
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 	deactivate_slab(s, page, freelist);
 
 new_slab:
 
 	if (slub_percpu_partial(c)) {
-		local_irq_save(flags);
+		local_lock_irqsave(&s->cpu_slab->lock, flags);
 		if (unlikely(c->page)) {
-			local_irq_restore(flags);
+			local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 			goto reread_page;
 		}
 		if (unlikely(!slub_percpu_partial(c))) {
-			local_irq_restore(flags);
+			local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 			/* we were preempted and partial list got empty */
 			goto new_objects;
 		}
 
 		page = c->page = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, page);
-		local_irq_restore(flags);
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 		stat(s, CPU_PARTIAL_ALLOC);
 		goto redo;
 	}
@@ -2990,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 retry_load_page:
 
-	local_irq_save(flags);
+	local_lock_irqsave(&s->cpu_slab->lock, flags);
 	if (unlikely(c->page)) {
 		void *flush_freelist = c->freelist;
 		struct page *flush_page = c->page;
@@ -2999,7 +3041,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		c->freelist = NULL;
 		c->tid = next_tid(c->tid);
 
-		local_irq_restore(flags);
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 
 		deactivate_slab(s, flush_page, flush_freelist);
 
@@ -3118,7 +3160,15 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 
 	object = c->freelist;
 	page = c->page;
-	if (unlikely(!object || !page || !node_match(page, node))) {
+	/*
+	 * We cannot use the lockless fastpath on PREEMPT_RT because if a
+	 * slowpath has taken the local_lock_irqsave(), it is not protected
+	 * against a fast path operation in an irq handler. So we need to take
+	 * the slow path which uses local_lock. It is still relatively fast if
+	 * there is a suitable cpu freelist.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) ||
+	    unlikely(!object || !page || !node_match(page, node))) {
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 	} else {
 		void *next_object = get_freepointer_safe(s, object);
@@ -3378,6 +3428,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
 	barrier();
 
 	if (likely(page == c->page)) {
+#ifndef CONFIG_PREEMPT_RT
 		void **freelist = READ_ONCE(c->freelist);
 
 		set_freepointer(s, tail_obj, freelist);
@@ -3390,6 +3441,31 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
 			note_cmpxchg_failure("slab_free", s, tid);
 			goto redo;
 		}
+#else /* CONFIG_PREEMPT_RT */
+		/*
+		 * We cannot use the lockless fastpath on PREEMPT_RT because if
+		 * a slowpath has taken the local_lock_irqsave(), it is not
+		 * protected against a fast path operation in an irq handler. So
+		 * we need to take the local_lock. We shouldn't simply defer to
+		 * __slab_free() as that wouldn't use the cpu freelist at all.
+		 */
+		void **freelist;
+
+		local_lock(&s->cpu_slab->lock);
+		c = this_cpu_ptr(s->cpu_slab);
+		if (unlikely(page != c->page)) {
+			local_unlock(&s->cpu_slab->lock);
+			goto redo;
+		}
+		tid = c->tid;
+		freelist = c->freelist;
+
+		set_freepointer(s, tail_obj, freelist);
+		c->freelist = head;
+		c->tid = next_tid(tid);
+
+		local_unlock(&s->cpu_slab->lock);
+#endif
 		stat(s, FREE_FASTPATH);
 	} else
 		__slab_free(s, page, head, tail_obj, cnt, addr);
@@ -3568,7 +3644,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * handlers invoking normal fastpath.
 	 */
 	c = slub_get_cpu_ptr(s->cpu_slab);
-	local_irq_disable();
+	local_lock_irq(&s->cpu_slab->lock);
 
 	for (i = 0; i < size; i++) {
 		void *object = kfence_alloc(s, s->object_size, flags);
@@ -3589,7 +3665,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			 */
 			c->tid = next_tid(c->tid);
 
-			local_irq_enable();
+			local_unlock_irq(&s->cpu_slab->lock);
 
 			/*
 			 * Invoking slow path likely have side-effect
@@ -3603,7 +3679,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			c = this_cpu_ptr(s->cpu_slab);
 			maybe_wipe_obj_freeptr(s, p[i]);
 
-			local_irq_disable();
+			local_lock_irq(&s->cpu_slab->lock);
 
 			continue; /* goto for-loop */
 		}
@@ -3612,7 +3688,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 		maybe_wipe_obj_freeptr(s, p[i]);
 	}
 	c->tid = next_tid(c->tid);
-	local_irq_enable();
+	local_unlock_irq(&s->cpu_slab->lock);
 	slub_put_cpu_ptr(s->cpu_slab);
 
 	/*
-- 
2.33.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (32 preceding siblings ...)
  2021-09-04 10:50 ` [PATCH v6 33/33] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
@ 2021-09-05 14:16 ` Mike Galbraith
  2021-09-07  8:20 ` Mel Gorman
  34 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2021-09-05 14:16 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Christoph Lameter, David Rientjes,
	Pekka Enberg, Joonsoo Kim, Linus Torvalds, Stephen Rothwell
  Cc: Andrew Morton, linux-kernel, Sebastian Andrzej Siewior,
	Thomas Gleixner, Mel Gorman

On Sat, 2021-09-04 at 12:49 +0200, Vlastimil Babka wrote:
> I will of course continue testing, as I hope Mel and
> the RT guys would soon too.

I stuffed the set into tip, tip-rt and the raspberrypi 5.14 branch
kernel+RT, and let my boxen run full LTP for exercise. As expected,
nothing the least bit interesting happened, just a bit of run to run
result diff noise due to.. deterministically challenged testcases.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible
  2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
                   ` (33 preceding siblings ...)
  2021-09-05 14:16 ` [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Mike Galbraith
@ 2021-09-07  8:20 ` Mel Gorman
  34 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2021-09-07  8:20 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Christoph Lameter, David Rientjes, Pekka Enberg,
	Joonsoo Kim, Linus Torvalds, Stephen Rothwell, Andrew Morton,
	linux-kernel, Mike Galbraith, Sebastian Andrzej Siewior,
	Thomas Gleixner

On Sat, Sep 04, 2021 at 12:49:30PM +0200, Vlastimil Babka wrote:
> The RFC/v1 version also got basic performance screening by Mel that didn't show
> major regressions. Mike's testing with hackbench of v2 on !RT reported
> negligible differences [6]:
> 

FWIW, this version didn't show any major differences in terms of
performance and it didn't functionally fail either.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-09-07  8:20 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-04 10:49 [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 01/33] mm, slub: don't call flush_all() from slab_debug_trace_open() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 02/33] mm, slub: allocate private object map for debugfs listings Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 03/33] mm, slub: allocate private object map for validate_slab_cache() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 04/33] mm, slub: don't disable irq for debug_check_no_locks_freed() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 05/33] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 06/33] mm, slub: extract get_partial() from new_slab_objects() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 07/33] mm, slub: dissolve new_slab_objects() into ___slab_alloc() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 08/33] mm, slub: return slab page from get_partial() and set c->page afterwards Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 09/33] mm, slub: restructure new page checks in ___slab_alloc() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 10/33] mm, slub: simplify kmem_cache_cpu and tid setup Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 11/33] mm, slub: move disabling/enabling irqs to ___slab_alloc() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 12/33] mm, slub: do initial checks in ___slab_alloc() with irqs enabled Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 13/33] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 14/33] mm, slub: restore irqs around calling new_slab() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 15/33] mm, slub: validate slab from partial list or page allocator before making it cpu slab Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 16/33] mm, slub: check new pages with restored irqs Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 17/33] mm, slub: stop disabling irqs around get_partial() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 18/33] mm, slub: move reset of c->page and freelist out of deactivate_slab() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 19/33] mm, slub: make locking in deactivate_slab() irq-safe Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 20/33] mm, slub: call deactivate_slab() without disabling irqs Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 21/33] mm, slub: move irq control into unfreeze_partials() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 22/33] mm, slub: discard slabs in unfreeze_partials() without irqs disabled Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 23/33] mm, slub: detach whole partial list at once in unfreeze_partials() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 24/33] mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 25/33] mm, slub: only disable irq with spin_lock in __unfreeze_partials() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 26/33] mm, slub: don't disable irqs in slub_cpu_dead() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 27/33] mm, slab: split out the cpu offline variant of flush_slab() Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 28/33] mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context Vlastimil Babka
2021-09-04 10:49 ` [PATCH v6 29/33] mm: slub: make object_map_lock a raw_spinlock_t Vlastimil Babka
2021-09-04 10:50 ` [PATCH v6 30/33] mm, slub: make slab_lock() disable irqs with PREEMPT_RT Vlastimil Babka
2021-09-04 10:50 ` [PATCH v6 31/33] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg Vlastimil Babka
2021-09-04 10:50 ` [PATCH v6 32/33] mm, slub: use migrate_disable() on PREEMPT_RT Vlastimil Babka
2021-09-04 10:50 ` [PATCH v6 33/33] mm, slub: convert kmem_cpu_slab protection to local_lock Vlastimil Babka
2021-09-05 14:16 ` [PATCH v6 00/33] SLUB: reduce irq disabled scope and make it RT compatible Mike Galbraith
2021-09-07  8:20 ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).