From: Vlastimil Babka <vbabka@suse.cz>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Galbraith <efault@gmx.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Jann Horn <jannh@google.com>, Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH v3 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
Date: Thu, 29 Jul 2021 15:21:30 +0200
Message-ID: <20210729132132.19691-34-vbabka@suse.cz>
In-Reply-To: <20210729132132.19691-1-vbabka@suse.cz>

Jann Horn reported [1] the following theoretically possible race:

  task A: put_cpu_partial() calls preempt_disable()
  task A: oldpage = this_cpu_read(s->cpu_slab->partial)
  interrupt: kfree() reaches unfreeze_partials() and discards the page
  task B (on another CPU): reallocates page as page cache
  task A: reads page->pages and page->pobjects, which are actually
  halves of the pointer page->lru.prev
  task B (on another CPU): frees page
  interrupt: allocates page as SLUB page and places it on the percpu partial list
  task A: this_cpu_cmpxchg() succeeds

  which would cause page->pages and page->pobjects to end up containing
  halves of pointers that would then influence when put_cpu_partial()
  happens and show up in root-only sysfs files. Maybe that's acceptable,
  I don't know. But there should probably at least be a comment for now
  to point out that we're reading union fields of a page that might be
  in a completely different state.
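
To illustrate why such a read can return pointer halves, here is a minimal
userspace sketch. It is not the kernel's struct page: struct page_model, its
field layout and counters_alias_prev() are hypothetical stand-ins that assume
a 64-bit build, where the two counters together occupy the same eight bytes
as the second list pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model, NOT the kernel's struct page. It assumes a
 * 64-bit layout: two 8-byte list pointers in one view, and a next pointer
 * followed by two 4-byte counters in the other view.
 */
struct page_model {
	union {
		struct {		/* view used while on an LRU list */
			uint64_t next;
			uint64_t prev;
		} lru;
		struct {		/* view used while on a percpu partial list */
			uint64_t next;
			int32_t pages;
			int32_t pobjects;
		} partial;
	};
};

/* Both views must cover exactly the same bytes for the aliasing to hold. */
static_assert(sizeof(((struct page_model *)0)->lru) ==
	      sizeof(((struct page_model *)0)->partial),
	      "lru and partial views must have the same size");

/*
 * Returns 1 if the two counters, taken together, are byte for byte the
 * value stored in lru.prev. memcpy() keeps the check endian-independent.
 */
static int counters_alias_prev(const struct page_model *p)
{
	int32_t halves[2] = { p->partial.pages, p->partial.pobjects };
	uint64_t joined;

	memcpy(&joined, halves, sizeof(joined));
	return joined == p->lru.prev;
}
```

In this model, any value stored through lru.prev is what a later reader of
pages and pobjects sees, split across the two counters.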

Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only safe
against s->cpu_slab->partial manipulation in ___slab_alloc() if the latter
disables irqs. Otherwise a __slab_free() in an irq handler could call
put_cpu_partial() while ___slab_alloc() is in the middle of manipulating
->partial, and corrupt it. This becomes an issue on RT once a local_lock is
introduced in a later patch. The fix means taking the local_lock in
put_cpu_partial() as well on RT.
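
One possible interleaving can be sketched in single-threaded userspace C, with
the interrupt simulated by a direct call between the load and the store. This
is only a model under stated assumptions: struct node, struct cpu_list,
push_atomic(), pop_plain() and on_list() are hypothetical names, and the pop
is assumed to be a plain "read head, store head->next" sequence.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct cpu_list {
	struct node *partial;	/* models s->cpu_slab->partial */
};

/* Push the way a successful cmpxchg would: atomic as seen by the list. */
static void push_atomic(struct cpu_list *c, struct node *n)
{
	assert(n != NULL);
	n->next = c->partial;
	c->partial = n;
}

/*
 * Pop with a plain load and a plain store. If irq_page is non-NULL, a
 * simulated irq handler pushes it between the load and the store.
 */
static struct node *pop_plain(struct cpu_list *c, struct node *irq_page)
{
	struct node *head = c->partial;

	if (!head)
		return NULL;
	if (irq_page)
		push_atomic(c, irq_page);	/* "interrupt" arrives here */
	c->partial = head->next;		/* overwrites the irq's push */
	return head;
}

/* Returns 1 if n is reachable from the list head. */
static int on_list(const struct cpu_list *c, const struct node *n)
{
	const struct node *it;

	for (it = c->partial; it; it = it->next)
		if (it == n)
			return 1;
	return 0;
}
```

With a list a -> b, calling pop_plain(&c, &x) returns a and leaves only b on
the list. The push of x succeeded from the handler's point of view, yet x is
no longer reachable, because the cmpxchg cannot protect against a plain store
that was already in flight.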

After debugging this issue, Mike Galbraith suggested [2] that, to avoid
different locking schemes on RT and !RT, we can simply protect
put_cpu_partial() with disabled irqs everywhere (to be converted to
local_lock_irqsave() later). This should be acceptable because it is not a
fast path, and moving the actual partial unfreezing outside of the
irq-disabled section keeps that section short. With the retry loop gone, the
code can also be simplified. In addition, the race reported by Jann should no
longer be possible.
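
The resulting control flow can be modelled in userspace roughly as follows.
This is a sketch, not the kernel code: struct slab_model, struct cpu_model and
put_cpu_partial_model() are hypothetical names, the irq-disabled section is
marked only by comments, and the drain check compares the existing list head's
pobjects against the limit, which is this sketch's reading of the intended
condition. The detached list is returned so that the caller can unfreeze it
outside the critical section.

```c
#include <assert.h>
#include <stddef.h>

struct slab_model {
	struct slab_model *next;
	int pages;	/* pages on the list, counted from this head */
	int pobjects;	/* approximate free objects on the list */
	int objects;	/* total objects in this slab */
	int inuse;	/* objects currently allocated */
};

struct cpu_model {
	struct slab_model *partial;	/* models s->cpu_slab->partial */
	int limit;			/* models slub_cpu_partial(s) */
};

/*
 * Put page at the head of the percpu partial list. Returns the old list if
 * it had to be detached for draining, or NULL otherwise.
 */
static struct slab_model *put_cpu_partial_model(struct cpu_model *c,
						struct slab_model *page,
						int drain)
{
	struct slab_model *oldpage;
	struct slab_model *page_to_unfreeze = NULL;
	int pages = 0;
	int pobjects = 0;

	assert(page != NULL);

	/* critical section starts (kernel: local_irq_save(flags)) */
	oldpage = c->partial;
	if (oldpage) {
		if (drain && oldpage->pobjects > c->limit) {
			/* List is full: detach it, unfreeze later. */
			page_to_unfreeze = oldpage;
			oldpage = NULL;
		} else {
			pobjects = oldpage->pobjects;
			pages = oldpage->pages;
		}
	}

	pages++;
	pobjects += page->objects - page->inuse;

	page->pages = pages;
	page->pobjects = pobjects;
	page->next = oldpage;
	c->partial = page;
	/* critical section ends (kernel: local_irq_restore(flags)) */

	return page_to_unfreeze;
}
```

Because nothing can interleave between reading the old head and writing the
new one, no retry loop is needed, and the counters are only ever read from a
page that is still on the list.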

[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 81 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4f7218797603..0fd60d9ca27e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2002,7 +2002,12 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	return freelist;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#else
+static inline void put_cpu_partial(struct kmem_cache *s, struct page *page,
+				   int drain) { }
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
@@ -2436,14 +2441,6 @@ static void unfreeze_partials_cpu(struct kmem_cache *s,
 		__unfreeze_partials(s, partial_page);
 }
 
-#else	/* CONFIG_SLUB_CPU_PARTIAL */
-
-static inline void unfreeze_partials(struct kmem_cache *s) { }
-static inline void unfreeze_partials_cpu(struct kmem_cache *s,
-				  struct kmem_cache_cpu *c) { }
-
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
-
 /*
  * Put a page that was just frozen (in __slab_free|get_partial_node) into a
  * partial page slot if available.
@@ -2453,46 +2450,56 @@ static inline void unfreeze_partials_cpu(struct kmem_cache *s,
  */
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0;
+	int pobjects = 0;
 
-	preempt_disable();
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
+	local_irq_save(flags);
+
+	oldpage = this_cpu_read(s->cpu_slab->partial);
 
-		if (oldpage) {
+	if (oldpage) {
+		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
+			/*
+			 * Partial array is full. Move the existing set to the
+			 * per node partial list. Postpone the actual unfreezing
+			 * outside of the critical section.
+			 */
+			page_to_unfreeze = oldpage;
+			oldpage = NULL;
+		} else {
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
 		}
+	}
 
-		pages++;
-		pobjects += page->objects - page->inuse;
+	pages++;
+	pobjects += page->objects - page->inuse;
 
-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;
 
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-								!= oldpage);
-	preempt_enable();
-#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+	this_cpu_write(s->cpu_slab->partial, page);
+
+	local_irq_restore(flags);
+
+	if (page_to_unfreeze) {
+		__unfreeze_partials(s, page_to_unfreeze);
+		stat(s, CPU_PARTIAL_DRAIN);
+	}
 }
 
+#else	/* CONFIG_SLUB_CPU_PARTIAL */
+
+static inline void unfreeze_partials(struct kmem_cache *s) { }
+static inline void unfreeze_partials_cpu(struct kmem_cache *s,
+				  struct kmem_cache_cpu *c) { }
+
+#endif	/* CONFIG_SLUB_CPU_PARTIAL */
+
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c,
 			      bool lock)
 {
-- 
2.32.0

