Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] mm: avoid slub allocation while holding list_lock
@ 2019-09-09  6:10 Yu Zhao
  2019-09-09 16:00 ` Kirill A. Shutemov
  2019-09-11 14:13 ` Andrew Morton
  0 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-09  6:10 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: linux-mm, linux-kernel, Yu Zhao

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Instead, allocate pages directly. Given currently page->objects has
15 bits, we only need 1 page. We may waste some memory but we only do
so when slub debug is on.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..574a53ee31e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3683,7 +3683,11 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+	int order;
+	unsigned long *map;
+
+	order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE));
+	map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
 	if (!map)
 		return;
 	slab_err(s, page, text, s->name);
@@ -3698,7 +3702,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 		}
 	}
 	slab_unlock(page);
-	bitmap_free(map);
+	free_pages((unsigned long)map, order);
 #endif
 }
 
-- 
2.23.0.187.g17f5b7556c-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm: avoid slub allocation while holding list_lock
  2019-09-09  6:10 [PATCH] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-09-09 16:00 ` Kirill A. Shutemov
       [not found]   ` <e5e25aa3-651d-92b4-ac82-c5011c66a7cb@I-love.SAKURA.ne.jp>
  2019-09-11 14:13 ` Andrew Morton
  1 sibling, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-09 16:00 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel

On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
> 
> Instead, allocate pages directly. Given currently page->objects has
> 15 bits, we only need 1 page. We may waste some memory but we only do
> so when slub debug is on.
> 
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   mount-encrypted/4921 is trying to acquire lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> 
>   but task is already holding lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(&(&n->list_lock)->rlock);
>     lock(&(&n->list_lock)->rlock);
> 
>    *** DEADLOCK ***
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Looks sane to me:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm: avoid slub allocation while holding list_lock
       [not found]   ` <e5e25aa3-651d-92b4-ac82-c5011c66a7cb@I-love.SAKURA.ne.jp>
@ 2019-09-09 21:39     ` Yu Zhao
       [not found]       ` <201909100141.x8A1fVdu048305@www262.sakura.ne.jp>
  2019-09-10  9:16       ` Kirill A. Shutemov
  0 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-09 21:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kirill A. Shutemov, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm,
	linux-kernel

On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> >> will run into deadlock because kmalloc() also tries to grab the same
> >> lock.
> >>
> >> Instead, allocate pages directly. Given currently page->objects has
> >> 15 bits, we only need 1 page. We may waste some memory but we only do
> >> so when slub debug is on.
> >>
> >>   WARNING: possible recursive locking detected
> >>   --------------------------------------------
> >>   mount-encrypted/4921 is trying to acquire lock:
> >>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >>
> >>   but task is already holding lock:
> >>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >>
> >>   other info that might help us debug this:
> >>    Possible unsafe locking scenario:
> >>
> >>          CPU0
> >>          ----
> >>     lock(&(&n->list_lock)->rlock);
> >>     lock(&(&n->list_lock)->rlock);
> >>
> >>    *** DEADLOCK ***
> >>
> >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> > 
> > Looks sane to me:
> > 
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > 
> 
> Really?
> 
> Since page->objects is handled as bitmap, alignment should be BITS_PER_LONG
> than BITS_PER_BYTE (though in this particular case, get_order() would
> implicitly align BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is an
> undefined behavior.

I think we can safely assume PAGE_SIZE is unsigned long aligned and
page->objects is non-zero. But if you don't feel comfortable with these
assumptions, I'd be happy to ensure them explicitly.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm: avoid slub allocation while holding list_lock
       [not found]       ` <201909100141.x8A1fVdu048305@www262.sakura.ne.jp>
@ 2019-09-10  2:16         ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-10  2:16 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kirill A. Shutemov, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm,
	linux-kernel

On Tue, Sep 10, 2019 at 10:41:31AM +0900, Tetsuo Handa wrote:
> Yu Zhao wrote:
> > I think we can safely assume PAGE_SIZE is unsigned long aligned and
> > page->objects is non-zero. But if you don't feel comfortable with these
> > assumptions, I'd be happy to ensure them explicitly.
> 
> I know PAGE_SIZE is unsigned long aligned. If someone by chance happens to
> change from "dynamic allocation" to "on stack", get_order() will no longer
> be called and the bug will show up.
> 
> I don't know whether __get_free_page(GFP_ATOMIC) can temporarily consume more
> than 4096 bytes, but if it can, we might want to avoid "dynamic allocation".

With GFP_ATOMIC and ~~__GFP_HIGHMEM, it shouldn't.

> By the way, if "struct kmem_cache_node" is object which won't have many thousands
> of instances, can't we embed that buffer into "struct kmem_cache_node" because
> max size of that buffer is only 4096 bytes?

It seems to me allocation in error path is better than always keeping
a page around. But the latter may still be acceptable given it's done
only when debug is on and, of course, on a per-node scale.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm: avoid slub allocation while holding list_lock
  2019-09-09 21:39     ` Yu Zhao
       [not found]       ` <201909100141.x8A1fVdu048305@www262.sakura.ne.jp>
@ 2019-09-10  9:16       ` Kirill A. Shutemov
  1 sibling, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-10  9:16 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Tetsuo Handa, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel

On Mon, Sep 09, 2019 at 03:39:38PM -0600, Yu Zhao wrote:
> On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> > On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> > >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> > >> will run into deadlock because kmalloc() also tries to grab the same
> > >> lock.
> > >>
> > >> Instead, allocate pages directly. Given currently page->objects has
> > >> 15 bits, we only need 1 page. We may waste some memory but we only do
> > >> so when slub debug is on.
> > >>
> > >>   WARNING: possible recursive locking detected
> > >>   --------------------------------------------
> > >>   mount-encrypted/4921 is trying to acquire lock:
> > >>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> > >>
> > >>   but task is already holding lock:
> > >>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> > >>
> > >>   other info that might help us debug this:
> > >>    Possible unsafe locking scenario:
> > >>
> > >>          CPU0
> > >>          ----
> > >>     lock(&(&n->list_lock)->rlock);
> > >>     lock(&(&n->list_lock)->rlock);
> > >>
> > >>    *** DEADLOCK ***
> > >>
> > >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> > > 
> > > Looks sane to me:
> > > 
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > 
> > 
> > Really?
> > 
> > Since page->objects is handled as bitmap, alignment should be BITS_PER_LONG
> > than BITS_PER_BYTE (though in this particular case, get_order() would
> > implicitly align BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is an
> > undefined behavior.
> 
> I think we can safely assume PAGE_SIZE is unsigned long aligned and
> page->objects is non-zero.

I think it's better to handle page->objects == 0 gracefully. It should not
happen, but this code handles situation that should not happen.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm: avoid slub allocation while holding list_lock
  2019-09-09  6:10 [PATCH] mm: avoid slub allocation while holding list_lock Yu Zhao
  2019-09-09 16:00 ` Kirill A. Shutemov
@ 2019-09-11 14:13 ` Andrew Morton
  2019-09-12  0:29   ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao
  1 sibling, 1 reply; 38+ messages in thread
From: Andrew Morton @ 2019-09-11 14:13 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, linux-kernel

On Mon,  9 Sep 2019 00:10:16 -0600 Yu Zhao <yuzhao@google.com> wrote:

> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
> 
> Instead, allocate pages directly. Given currently page->objects has
> 15 bits, we only need 1 page. We may waste some memory but we only do
> so when slub debug is on.
> 
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   mount-encrypted/4921 is trying to acquire lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> 
>   but task is already holding lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(&(&n->list_lock)->rlock);
>     lock(&(&n->list_lock)->rlock);
> 
>    *** DEADLOCK ***
> 

It would be better if a silly low-level debug function like this
weren't to try to allocate memory at all.  Did you consider simply
using a statically allocated buffer?

{
	static char buffer[something large enough];
	static spinlock_t lock_to_protect_it;


Alternatively, do we need to call get_map() at all in there?  We could
simply open-code the get_map() functionality inside
list_slab_objects().  It would be slower, but printk is already slow. 
Potentially extremely slow.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/3] mm: correct mask size for slub page->objects
  2019-09-11 14:13 ` Andrew Morton
@ 2019-09-12  0:29   ` Yu Zhao
  2019-09-12  0:29     ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao
  2019-09-12  0:29     ` [PATCH 3/3] mm: lock slub page when listing objects Yu Zhao
  0 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  0:29 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

Mask of slub objects per page shouldn't be larger than what
page->objects can hold.

It requires more than 2^15 objects to hit the problem, and I don't
think anybody would. It'd be nice to have the mask fixed, but not
really worth cc'ing the stable.

Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..62053ceb4464 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -187,7 +187,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
  */
 #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
 
-#define OO_SHIFT	16
+#define OO_SHIFT	15
 #define OO_MASK		((1 << OO_SHIFT) - 1)
 #define MAX_OBJS_PER_PAGE	32767 /* since page.objects is u15 */
 
@@ -343,6 +343,8 @@ static inline unsigned int oo_order(struct kmem_cache_order_objects x)
 
 static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
 {
+	BUILD_BUG_ON(OO_MASK > MAX_OBJS_PER_PAGE);
+
 	return x.x & OO_MASK;
 }
 
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 2/3] mm: avoid slub allocation while holding list_lock
  2019-09-12  0:29   ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao
@ 2019-09-12  0:29     ` Yu Zhao
  2019-09-12  0:44       ` Kirill A. Shutemov
  2019-09-12  0:29     ` [PATCH 3/3] mm: lock slub page when listing objects Yu Zhao
  1 sibling, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  0:29 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Instead, statically allocate bitmap in struct kmem_cache_node. Given
currently page->objects has 15 bits, we bloat the per-node struct by
4K. So we waste some memory but only do so when slub debug is on.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/slub_def.h |  4 ++++
 mm/slab.h                |  1 +
 mm/slub.c                | 44 ++++++++++++++--------------------------
 3 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..719d43574360 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -9,6 +9,10 @@
  */
 #include <linux/kobject.h>
 
+#define OO_SHIFT	15
+#define OO_MASK		((1 << OO_SHIFT) - 1)
+#define MAX_OBJS_PER_PAGE	32767 /* since page.objects is u15 */
+
 enum stat_item {
 	ALLOC_FASTPATH,		/* Allocation from cpu slab */
 	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
diff --git a/mm/slab.h b/mm/slab.h
index 9057b8056b07..2d8639835db1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -556,6 +556,7 @@ struct kmem_cache_node {
 	atomic_long_t nr_slabs;
 	atomic_long_t total_objects;
 	struct list_head full;
+	unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
 #endif
 #endif
 
diff --git a/mm/slub.c b/mm/slub.c
index 62053ceb4464..f28072c9f2ce 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -187,10 +187,6 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
  */
 #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
 
-#define OO_SHIFT	15
-#define OO_MASK		((1 << OO_SHIFT) - 1)
-#define MAX_OBJS_PER_PAGE	32767 /* since page.objects is u15 */
-
 /* Internal SLUB flags */
 /* Poison object */
 #define __OBJECT_POISON		((slab_flags_t __force)0x80000000U)
@@ -454,6 +450,8 @@ static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
 	void *p;
 	void *addr = page_address(page);
 
+	bitmap_zero(map, page->objects);
+
 	for (p = page->freelist; p; p = get_freepointer(s, p))
 		set_bit(slab_index(p, s, addr), map);
 }
@@ -3680,14 +3678,12 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 }
 
 static void list_slab_objects(struct kmem_cache *s, struct page *page,
-							const char *text)
+			      unsigned long *map, const char *text)
 {
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
-	if (!map)
-		return;
+
 	slab_err(s, page, text, s->name);
 	slab_lock(page);
 
@@ -3699,8 +3695,8 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 			print_tracking(s, p);
 		}
 	}
+
 	slab_unlock(page);
-	bitmap_free(map);
 #endif
 }
 
@@ -3721,7 +3717,7 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
 			remove_partial(n, page);
 			list_add(&page->slab_list, &discard);
 		} else {
-			list_slab_objects(s, page,
+			list_slab_objects(s, page, n->object_map,
 			"Objects remaining in %s on __kmem_cache_shutdown()");
 		}
 	}
@@ -4397,7 +4393,6 @@ static int validate_slab(struct kmem_cache *s, struct page *page,
 		return 0;
 
 	/* Now we know that a valid freelist exists */
-	bitmap_zero(map, page->objects);
 
 	get_map(s, page, map);
 	for_each_object(p, s, addr, page->objects) {
@@ -4422,7 +4417,7 @@ static void validate_slab_slab(struct kmem_cache *s, struct page *page,
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, unsigned long *map)
+		struct kmem_cache_node *n)
 {
 	unsigned long count = 0;
 	struct page *page;
@@ -4431,7 +4426,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab_slab(s, page, n->object_map);
 		count++;
 	}
 	if (count != n->nr_partial)
@@ -4442,7 +4437,7 @@ static int validate_slab_node(struct kmem_cache *s,
 		goto out;
 
 	list_for_each_entry(page, &n->full, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab_slab(s, page, n->object_map);
 		count++;
 	}
 	if (count != atomic_long_read(&n->nr_slabs))
@@ -4459,15 +4454,11 @@ static long validate_slab_cache(struct kmem_cache *s)
 	int node;
 	unsigned long count = 0;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
-	if (!map)
-		return -ENOMEM;
 
 	flush_all(s);
 	for_each_kmem_cache_node(s, node, n)
-		count += validate_slab_node(s, n, map);
-	bitmap_free(map);
+		count += validate_slab_node(s, n);
+
 	return count;
 }
 /*
@@ -4603,9 +4594,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
 	void *addr = page_address(page);
 	void *p;
 
-	bitmap_zero(map, page->objects);
 	get_map(s, page, map);
-
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
@@ -4619,11 +4608,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	struct loc_track t = { 0, 0, NULL };
 	int node;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
 
-	if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
-				     GFP_KERNEL)) {
-		bitmap_free(map);
+	if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+			     GFP_KERNEL)) {
 		return sprintf(buf, "Out of memory\n");
 	}
 	/* Push back cpu slabs */
@@ -4638,9 +4625,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 
 		spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc, n->object_map);
 		list_for_each_entry(page, &n->full, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc, n->object_map);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
@@ -4689,7 +4676,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	}
 
 	free_loc_track(&t);
-	bitmap_free(map);
 	if (!t.count)
 		len += sprintf(buf, "No data\n");
 	return len;
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 3/3] mm: lock slub page when listing objects
  2019-09-12  0:29   ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao
  2019-09-12  0:29     ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-09-12  0:29     ` Yu Zhao
  1 sibling, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  0:29 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

Though I have no idea what the side effect of a race would be,
apparently we want to prevent the free list from being changed
while debugging objects in general.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index f28072c9f2ce..2734a092bbff 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4594,10 +4594,14 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
 	void *addr = page_address(page);
 	void *p;
 
+	slab_lock(page);
+
 	get_map(s, page, map);
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
+
+	slab_unlock(page);
 }
 
 static int list_locations(struct kmem_cache *s, char *buf,
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/3] mm: avoid slub allocation while holding list_lock
  2019-09-12  0:29     ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-09-12  0:44       ` Kirill A. Shutemov
  2019-09-12  1:31         ` Yu Zhao
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
  0 siblings, 2 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12  0:44 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Wed, Sep 11, 2019 at 06:29:28PM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
> 
> Instead, statically allocate bitmap in struct kmem_cache_node. Given
> currently page->objects has 15 bits, we bloat the per-node struct by
> 4K. So we waste some memory but only do so when slub debug is on.

Why not have single page total protected by a lock?

Listing object from two pages at the same time doesn't make sense anyway.
Cuncurent validating is not something sane to do.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/3] mm: avoid slub allocation while holding list_lock
  2019-09-12  0:44       ` Kirill A. Shutemov
@ 2019-09-12  1:31         ` Yu Zhao
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
  1 sibling, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  1:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Thu, Sep 12, 2019 at 03:44:01AM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 06:29:28PM -0600, Yu Zhao wrote:
> > If we are already under list_lock, don't call kmalloc(). Otherwise we
> > will run into deadlock because kmalloc() also tries to grab the same
> > lock.
> > 
> > Instead, statically allocate bitmap in struct kmem_cache_node. Given
> > currently page->objects has 15 bits, we bloat the per-node struct by
> > 4K. So we waste some memory but only do so when slub debug is on.
> 
> Why not have single page total protected by a lock?
> 
> Listing object from two pages at the same time doesn't make sense anyway.
> Cuncurent validating is not something sane to do.

Okay, cutting down to static global bitmap.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 1/4] mm: correct mask size for slub page->objects
  2019-09-12  0:44       ` Kirill A. Shutemov
  2019-09-12  1:31         ` Yu Zhao
@ 2019-09-12  2:31         ` Yu Zhao
  2019-09-12  2:31           ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao
                             ` (4 more replies)
  1 sibling, 5 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  2:31 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

Mask of slub objects per page shouldn't be larger than what
page->objects can hold.

It requires more than 2^15 objects to hit the problem, and I don't
think anybody would. It'd be nice to have the mask fixed, but not
really worth cc'ing the stable.

Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..62053ceb4464 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -187,7 +187,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
  */
 #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
 
-#define OO_SHIFT	16
+#define OO_SHIFT	15
 #define OO_MASK		((1 << OO_SHIFT) - 1)
 #define MAX_OBJS_PER_PAGE	32767 /* since page.objects is u15 */
 
@@ -343,6 +343,8 @@ static inline unsigned int oo_order(struct kmem_cache_order_objects x)
 
 static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
 {
+	BUILD_BUG_ON(OO_MASK > MAX_OBJS_PER_PAGE);
+
 	return x.x & OO_MASK;
 }
 
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 2/4] mm: clean up validate_slab()
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
@ 2019-09-12  2:31           ` Yu Zhao
  2019-09-12  9:46             ` Kirill A. Shutemov
  2019-09-12  2:31           ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  2:31 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

The function doesn't need to return any value, and the check can be
done in one pass.

There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 62053ceb4464..7b7e1ee264ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4386,31 +4386,26 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
 	void *p;
 	void *addr = page_address(page);
 
-	if (!check_slab(s, page) ||
-			!on_freelist(s, page, NULL))
-		return 0;
+	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+		return;
 
 	/* Now we know that a valid freelist exists */
 	bitmap_zero(map, page->objects);
 
 	get_map(s, page, map);
 	for_each_object(p, s, addr, page->objects) {
-		if (test_bit(slab_index(p, s, addr), map))
-			if (!check_object(s, page, p, SLUB_RED_INACTIVE))
-				return 0;
-	}
+		u8 val = test_bit(slab_index(p, s, addr), map) ?
+			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
 
-	for_each_object(p, s, addr, page->objects)
-		if (!test_bit(slab_index(p, s, addr), map))
-			if (!check_object(s, page, p, SLUB_RED_ACTIVE))
-				return 0;
-	return 1;
+		if (!check_object(s, page, p, val))
+			break;
+	}
 }
 
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
  2019-09-12  2:31           ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao
@ 2019-09-12  2:31           ` Yu Zhao
  2019-09-12 10:04             ` Kirill A. Shutemov
  2019-09-12  2:31           ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  2:31 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Fixing the problem by using a static bitmap instead.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7b7e1ee264ef..baa60dd73942 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -443,19 +443,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 }
 
 #ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
 /*
  * Determine a map of object in use on a page.
  *
  * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
 
+	VM_BUG_ON(!irqs_disabled());
+
+	spin_lock(&object_map_lock);
+
+	bitmap_zero(object_map, page->objects);
+
 	for (p = page->freelist; p; p = get_freepointer(s, p))
-		set_bit(slab_index(p, s, addr), map);
+		set_bit(slab_index(p, s, addr), object_map);
+
+	return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+	VM_BUG_ON(map != object_map);
+	lockdep_assert_held(&object_map_lock);
+
+	spin_unlock(&object_map_lock);
 }
 
 static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3685,13 +3704,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
-	if (!map)
-		return;
+	unsigned long *map;
+
 	slab_err(s, page, text, s->name);
 	slab_lock(page);
 
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3699,8 +3717,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 			print_tracking(s, p);
 		}
 	}
+	put_map(map);
+
 	slab_unlock(page);
-	bitmap_free(map);
 #endif
 }
 
@@ -4386,19 +4405,19 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
+	unsigned long *map;
+
+	slab_lock(page);
 
 	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
-		return;
+		goto unlock;
 
 	/* Now we know that a valid freelist exists */
-	bitmap_zero(map, page->objects);
-
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 		u8 val = test_bit(slab_index(p, s, addr), map) ?
 			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4406,18 +4425,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
 		if (!check_object(s, page, p, val))
 			break;
 	}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
-{
-	slab_lock(page);
-	validate_slab(s, page, map);
+	put_map(map);
+unlock:
 	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, unsigned long *map)
+		struct kmem_cache_node *n)
 {
 	unsigned long count = 0;
 	struct page *page;
@@ -4426,7 +4440,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != n->nr_partial)
@@ -4437,7 +4451,7 @@ static int validate_slab_node(struct kmem_cache *s,
 		goto out;
 
 	list_for_each_entry(page, &n->full, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != atomic_long_read(&n->nr_slabs))
@@ -4454,15 +4468,11 @@ static long validate_slab_cache(struct kmem_cache *s)
 	int node;
 	unsigned long count = 0;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
-	if (!map)
-		return -ENOMEM;
 
 	flush_all(s);
 	for_each_kmem_cache_node(s, node, n)
-		count += validate_slab_node(s, n, map);
-	bitmap_free(map);
+		count += validate_slab_node(s, n);
+
 	return count;
 }
 /*
@@ -4592,18 +4602,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 }
 
 static void process_slab(struct loc_track *t, struct kmem_cache *s,
-		struct page *page, enum track_item alloc,
-		unsigned long *map)
+		struct page *page, enum track_item alloc)
 {
 	void *addr = page_address(page);
 	void *p;
+	unsigned long *map;
 
-	bitmap_zero(map, page->objects);
-	get_map(s, page, map);
-
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
+	put_map(map);
 }
 
 static int list_locations(struct kmem_cache *s, char *buf,
@@ -4614,11 +4623,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	struct loc_track t = { 0, 0, NULL };
 	int node;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
 
-	if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
-				     GFP_KERNEL)) {
-		bitmap_free(map);
+	if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+			     GFP_KERNEL)) {
 		return sprintf(buf, "Out of memory\n");
 	}
 	/* Push back cpu slabs */
@@ -4633,9 +4640,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 
 		spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		list_for_each_entry(page, &n->full, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
@@ -4684,7 +4691,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	}
 
 	free_loc_track(&t);
-	bitmap_free(map);
 	if (!t.count)
 		len += sprintf(buf, "No data\n");
 	return len;
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 4/4] mm: lock slub page when listing objects
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
  2019-09-12  2:31           ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao
  2019-09-12  2:31           ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-09-12  2:31           ` Yu Zhao
  2019-09-12 10:06             ` Kirill A. Shutemov
  2019-09-13 14:58             ` Christopher Lameter
  2019-09-12  9:40           ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov
  2019-09-14  0:07           ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao
  4 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-12  2:31 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

Though I have no idea what the side effect of such race would be,
apparently we want to prevent the free list from being changed
while debugging the objects.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index baa60dd73942..1c9726c28f0b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4608,11 +4608,15 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
 	void *p;
 	unsigned long *map;
 
+	slab_lock(page);
+
 	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
 	put_map(map);
+
+	slab_unlock(page);
 }
 
 static int list_locations(struct kmem_cache *s, char *buf,
-- 
2.23.0.162.g0b9fbb3734-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
                             ` (2 preceding siblings ...)
  2019-09-12  2:31           ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao
@ 2019-09-12  9:40           ` Kirill A. Shutemov
  2019-09-12 21:11             ` Yu Zhao
  2019-09-14  0:07           ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao
  4 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12  9:40 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> Mask of slub objects per page shouldn't be larger than what
> page->objects can hold.
> 
> It requires more than 2^15 objects to hit the problem, and I don't
> think anybody would. It'd be nice to have the mask fixed, but not
> really worth cc'ing the stable.
> 
> Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> Signed-off-by: Yu Zhao <yuzhao@google.com>

I don't think the patch fixes anything.

Yes, we have one spare bit between order and number of object that is not
used and always zero. So what?

I can imagine for some microarchitecures accessing higher 16 bits of int
is cheaper than shifting by 15.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] mm: clean up validate_slab()
  2019-09-12  2:31           ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao
@ 2019-09-12  9:46             ` Kirill A. Shutemov
  0 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12  9:46 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Wed, Sep 11, 2019 at 08:31:09PM -0600, Yu Zhao wrote:
> The function doesn't need to return any value, and the check can be
> done in one pass.
> 
> There is a behavior change: before the patch, we stop at the first
> invalid free object; after the patch, we stop at the first invalid
> object, free or in use. This shouldn't matter because the original
> behavior isn't intended anyway.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  mm/slub.c | 21 ++++++++-------------
>  1 file changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 62053ceb4464..7b7e1ee264ef 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4386,31 +4386,26 @@ static int count_total(struct page *page)
>  #endif
>  
>  #ifdef CONFIG_SLUB_DEBUG
> -static int validate_slab(struct kmem_cache *s, struct page *page,
> +static void validate_slab(struct kmem_cache *s, struct page *page,
>  						unsigned long *map)
>  {
>  	void *p;
>  	void *addr = page_address(page);
>  
> -	if (!check_slab(s, page) ||
> -			!on_freelist(s, page, NULL))
> -		return 0;
> +	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
> +		return;
>  
>  	/* Now we know that a valid freelist exists */
>  	bitmap_zero(map, page->objects);
>  
>  	get_map(s, page, map);
>  	for_each_object(p, s, addr, page->objects) {
> -		if (test_bit(slab_index(p, s, addr), map))
> -			if (!check_object(s, page, p, SLUB_RED_INACTIVE))
> -				return 0;
> -	}
> +		u8 val = test_bit(slab_index(p, s, addr), map) ?
> +			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;

Proper 'if' would be more readable.

Other than that look fine to me.

>  
> -	for_each_object(p, s, addr, page->objects)
> -		if (!test_bit(slab_index(p, s, addr), map))
> -			if (!check_object(s, page, p, SLUB_RED_ACTIVE))
> -				return 0;
> -	return 1;
> +		if (!check_object(s, page, p, val))
> +			break;
> +	}
>  }
>  
>  static void validate_slab_slab(struct kmem_cache *s, struct page *page,
> -- 
> 2.23.0.162.g0b9fbb3734-goog
> 

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock
  2019-09-12  2:31           ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-09-12 10:04             ` Kirill A. Shutemov
  0 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 10:04 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Wed, Sep 11, 2019 at 08:31:10PM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
> 
> Fixing the problem by using a static bitmap instead.
> 
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   mount-encrypted/4921 is trying to acquire lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> 
>   but task is already holding lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(&(&n->list_lock)->rlock);
>     lock(&(&n->list_lock)->rlock);
> 
>    *** DEADLOCK ***
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 4/4] mm: lock slub page when listing objects
  2019-09-12  2:31           ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao
@ 2019-09-12 10:06             ` Kirill A. Shutemov
  2019-09-12 21:12               ` Yu Zhao
  2019-09-13 14:58             ` Christopher Lameter
  1 sibling, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 10:06 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote:
> Though I have no idea what the side effect of such race would be,
> apparently we want to prevent the free list from being changed
> while debugging the objects.

Codewise looks good to me. But commit message can be better.

Can we repharase it to indicate that slab_lock is required to protect
page->objects?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects
  2019-09-12  9:40           ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov
@ 2019-09-12 21:11             ` Yu Zhao
  2019-09-12 22:03               ` Kirill A. Shutemov
  0 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-09-12 21:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> > Mask of slub objects per page shouldn't be larger than what
> > page->objects can hold.
> > 
> > It requires more than 2^15 objects to hit the problem, and I don't
> > think anybody would. It'd be nice to have the mask fixed, but not
> > really worth cc'ing the stable.
> > 
> > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> 
> I don't think the patch fixes anything.

Technically it does. It makes no sense for a mask to have more bits
than the variable that holds the masked value. I had to look up the
commit history to find out why and go through the code to make sure
it doesn't actually cause any problem.

My hope is that nobody else would have to go through the same trouble.

> Yes, we have one spare bit between order and number of object that is not
> used and always zero. So what?
> 
> I can imagine for some microarchitecures accessing higher 16 bits of int
> is cheaper than shifting by 15.

Well, I highly doubt the inconsistency is intended for such
optimization, even it existed.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 4/4] mm: lock slub page when listing objects
  2019-09-12 10:06             ` Kirill A. Shutemov
@ 2019-09-12 21:12               ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-12 21:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Thu, Sep 12, 2019 at 01:06:42PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote:
> > Though I have no idea what the side effect of such race would be,
> > apparently we want to prevent the free list from being changed
> > while debugging the objects.
> 
> Codewise looks good to me. But commit message can be better.
> 
> Can we repharase it to indicate that slab_lock is required to protect
> page->objects?

Will do.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects
  2019-09-12 21:11             ` Yu Zhao
@ 2019-09-12 22:03               ` Kirill A. Shutemov
  0 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 22:03 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Thu, Sep 12, 2019 at 03:11:14PM -0600, Yu Zhao wrote:
> On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> > > Mask of slub objects per page shouldn't be larger than what
> > > page->objects can hold.
> > > 
> > > It requires more than 2^15 objects to hit the problem, and I don't
> > > think anybody would. It'd be nice to have the mask fixed, but not
> > > really worth cc'ing the stable.
> > > 
> > > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > 
> > I don't think the patch fixes anything.
> 
> Technically it does. It makes no sense for a mask to have more bits
> than the variable that holds the masked value. I had to look up the
> commit history to find out why and go through the code to make sure
> it doesn't actually cause any problem.
> 
> My hope is that nobody else would have to go through the same trouble.

Just put some comments then.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 4/4] mm: lock slub page when listing objects
  2019-09-12  2:31           ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao
  2019-09-12 10:06             ` Kirill A. Shutemov
@ 2019-09-13 14:58             ` Christopher Lameter
  1 sibling, 0 replies; 38+ messages in thread
From: Christopher Lameter @ 2019-09-13 14:58 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel

On Wed, 11 Sep 2019, Yu Zhao wrote:

> Though I have no idea what the side effect of such race would be,
> apparently we want to prevent the free list from being changed
> while debugging the objects.

process_slab() is called under the list_lock which prevents any allocation
from the free list in the slab page. This means that new objects can be
added to the freelist which occurs by updating the freelist pointer in the
slab page with a pointer to the newly free object which in turn contains
the old freelist pointr.

It is therefore possible to safely traverse the objects on the freelist
after the pointer has been retrieved

NAK.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 1/2] mm: clean up validate_slab()
  2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
                             ` (3 preceding siblings ...)
  2019-09-12  9:40           ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov
@ 2019-09-14  0:07           ` Yu Zhao
  2019-09-14  0:07             ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
                               ` (2 more replies)
  4 siblings, 3 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-14  0:07 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao

The function doesn't need to return any value, and the check can be
done in one pass.

There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..445ef8b2aec0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4384,31 +4384,26 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
 	void *p;
 	void *addr = page_address(page);
 
-	if (!check_slab(s, page) ||
-			!on_freelist(s, page, NULL))
-		return 0;
+	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+		return;
 
 	/* Now we know that a valid freelist exists */
 	bitmap_zero(map, page->objects);
 
 	get_map(s, page, map);
 	for_each_object(p, s, addr, page->objects) {
-		if (test_bit(slab_index(p, s, addr), map))
-			if (!check_object(s, page, p, SLUB_RED_INACTIVE))
-				return 0;
-	}
+		u8 val = test_bit(slab_index(p, s, addr), map) ?
+			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
 
-	for_each_object(p, s, addr, page->objects)
-		if (!test_bit(slab_index(p, s, addr), map))
-			if (!check_object(s, page, p, SLUB_RED_ACTIVE))
-				return 0;
-	return 1;
+		if (!check_object(s, page, p, val))
+			break;
+	}
 }
 
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-- 
2.23.0.237.gc6a4ce50a0-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock
  2019-09-14  0:07           ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao
@ 2019-09-14  0:07             ` Yu Zhao
  2019-09-16  8:39             ` [PATCH v3 1/2] mm: clean up validate_slab() Kirill A. Shutemov
  2019-11-08 19:39             ` [PATCH v4 " Yu Zhao
  2 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-09-14  0:07 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Fixing the problem by using a static bitmap instead.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 445ef8b2aec0..c1de01730648 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 }
 
 #ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
 /*
  * Determine a map of object in use on a page.
  *
  * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
 
+	VM_BUG_ON(!irqs_disabled());
+
+	spin_lock(&object_map_lock);
+
+	bitmap_zero(object_map, page->objects);
+
 	for (p = page->freelist; p; p = get_freepointer(s, p))
-		set_bit(slab_index(p, s, addr), map);
+		set_bit(slab_index(p, s, addr), object_map);
+
+	return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+	VM_BUG_ON(map != object_map);
+	lockdep_assert_held(&object_map_lock);
+
+	spin_unlock(&object_map_lock);
 }
 
 static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3683,13 +3702,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
-	if (!map)
-		return;
+	unsigned long *map;
+
 	slab_err(s, page, text, s->name);
 	slab_lock(page);
 
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3697,8 +3715,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 			print_tracking(s, p);
 		}
 	}
+	put_map(map);
+
 	slab_unlock(page);
-	bitmap_free(map);
 #endif
 }
 
@@ -4384,19 +4403,19 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
+	unsigned long *map;
+
+	slab_lock(page);
 
 	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
-		return;
+		goto unlock;
 
 	/* Now we know that a valid freelist exists */
-	bitmap_zero(map, page->objects);
-
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 		u8 val = test_bit(slab_index(p, s, addr), map) ?
 			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4404,18 +4423,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
 		if (!check_object(s, page, p, val))
 			break;
 	}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
-{
-	slab_lock(page);
-	validate_slab(s, page, map);
+	put_map(map);
+unlock:
 	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, unsigned long *map)
+		struct kmem_cache_node *n)
 {
 	unsigned long count = 0;
 	struct page *page;
@@ -4424,7 +4438,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != n->nr_partial)
@@ -4435,7 +4449,7 @@ static int validate_slab_node(struct kmem_cache *s,
 		goto out;
 
 	list_for_each_entry(page, &n->full, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != atomic_long_read(&n->nr_slabs))
@@ -4452,15 +4466,11 @@ static long validate_slab_cache(struct kmem_cache *s)
 	int node;
 	unsigned long count = 0;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
-	if (!map)
-		return -ENOMEM;
 
 	flush_all(s);
 	for_each_kmem_cache_node(s, node, n)
-		count += validate_slab_node(s, n, map);
-	bitmap_free(map);
+		count += validate_slab_node(s, n);
+
 	return count;
 }
 /*
@@ -4590,18 +4600,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 }
 
 static void process_slab(struct loc_track *t, struct kmem_cache *s,
-		struct page *page, enum track_item alloc,
-		unsigned long *map)
+		struct page *page, enum track_item alloc)
 {
 	void *addr = page_address(page);
 	void *p;
+	unsigned long *map;
 
-	bitmap_zero(map, page->objects);
-	get_map(s, page, map);
-
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
+	put_map(map);
 }
 
 static int list_locations(struct kmem_cache *s, char *buf,
@@ -4612,11 +4621,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	struct loc_track t = { 0, 0, NULL };
 	int node;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
 
-	if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
-				     GFP_KERNEL)) {
-		bitmap_free(map);
+	if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+			     GFP_KERNEL)) {
 		return sprintf(buf, "Out of memory\n");
 	}
 	/* Push back cpu slabs */
@@ -4631,9 +4638,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 
 		spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		list_for_each_entry(page, &n->full, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
@@ -4682,7 +4689,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	}
 
 	free_loc_track(&t);
-	bitmap_free(map);
 	if (!t.count)
 		len += sprintf(buf, "No data\n");
 	return len;
-- 
2.23.0.237.gc6a4ce50a0-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/2] mm: clean up validate_slab()
  2019-09-14  0:07           ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao
  2019-09-14  0:07             ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-09-16  8:39             ` Kirill A. Shutemov
  2019-11-08 19:39             ` [PATCH v4 " Yu Zhao
  2 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2019-09-16  8:39 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel

On Fri, Sep 13, 2019 at 06:07:42PM -0600, Yu Zhao wrote:
> The function doesn't need to return any value, and the check can be
> done in one pass.
> 
> There is a behavior change: before the patch, we stop at the first
> invalid free object; after the patch, we stop at the first invalid
> object, free or in use. This shouldn't matter because the original
> behavior isn't intended anyway.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v4 1/2] mm: clean up validate_slab()
  2019-09-14  0:07           ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao
  2019-09-14  0:07             ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
  2019-09-16  8:39             ` [PATCH v3 1/2] mm: clean up validate_slab() Kirill A. Shutemov
@ 2019-11-08 19:39             ` " Yu Zhao
  2019-11-08 19:39               ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
  2 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov

The function doesn't need to return any value, and the check can be
done in one pass.

There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b25c807a111f..6930c3febad7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4404,31 +4404,26 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
 	void *p;
 	void *addr = page_address(page);
 
-	if (!check_slab(s, page) ||
-			!on_freelist(s, page, NULL))
-		return 0;
+	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+		return;
 
 	/* Now we know that a valid freelist exists */
 	bitmap_zero(map, page->objects);
 
 	get_map(s, page, map);
 	for_each_object(p, s, addr, page->objects) {
-		if (test_bit(slab_index(p, s, addr), map))
-			if (!check_object(s, page, p, SLUB_RED_INACTIVE))
-				return 0;
-	}
+		u8 val = test_bit(slab_index(p, s, addr), map) ?
+			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
 
-	for_each_object(p, s, addr, page->objects)
-		if (!test_bit(slab_index(p, s, addr), map))
-			if (!check_object(s, page, p, SLUB_RED_ACTIVE))
-				return 0;
-	return 1;
+		if (!check_object(s, page, p, val))
+			break;
+	}
 }
 
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-08 19:39             ` [PATCH v4 " Yu Zhao
@ 2019-11-08 19:39               ` Yu Zhao
  2019-11-09 20:52                 ` Christopher Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
  Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Fixing the problem by using a static bitmap instead.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 6930c3febad7..7a4ec3c4b4d9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 }
 
 #ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
 /*
  * Determine a map of object in use on a page.
  *
  * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
 
+	VM_BUG_ON(!irqs_disabled());
+
+	spin_lock(&object_map_lock);
+
+	bitmap_zero(object_map, page->objects);
+
 	for (p = page->freelist; p; p = get_freepointer(s, p))
-		set_bit(slab_index(p, s, addr), map);
+		set_bit(slab_index(p, s, addr), object_map);
+
+	return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+	VM_BUG_ON(map != object_map);
+	lockdep_assert_held(&object_map_lock);
+
+	spin_unlock(&object_map_lock);
 }
 
 static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3695,13 +3714,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
-	if (!map)
-		return;
+	unsigned long *map;
+
 	slab_err(s, page, text, s->name);
 	slab_lock(page);
 
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3709,8 +3727,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 			print_tracking(s, p);
 		}
 	}
+	put_map(map);
+
 	slab_unlock(page);
-	bitmap_free(map);
 #endif
 }
 
@@ -4404,19 +4423,19 @@ static int count_total(struct page *page)
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
+	unsigned long *map;
+
+	slab_lock(page);
 
 	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
-		return;
+		goto unlock;
 
 	/* Now we know that a valid freelist exists */
-	bitmap_zero(map, page->objects);
-
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 		u8 val = test_bit(slab_index(p, s, addr), map) ?
 			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4424,18 +4443,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
 		if (!check_object(s, page, p, val))
 			break;
 	}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
-{
-	slab_lock(page);
-	validate_slab(s, page, map);
+	put_map(map);
+unlock:
 	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, unsigned long *map)
+		struct kmem_cache_node *n)
 {
 	unsigned long count = 0;
 	struct page *page;
@@ -4444,7 +4458,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != n->nr_partial)
@@ -4455,7 +4469,7 @@ static int validate_slab_node(struct kmem_cache *s,
 		goto out;
 
 	list_for_each_entry(page, &n->full, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != atomic_long_read(&n->nr_slabs))
@@ -4472,15 +4486,11 @@ static long validate_slab_cache(struct kmem_cache *s)
 	int node;
 	unsigned long count = 0;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
-	if (!map)
-		return -ENOMEM;
 
 	flush_all(s);
 	for_each_kmem_cache_node(s, node, n)
-		count += validate_slab_node(s, n, map);
-	bitmap_free(map);
+		count += validate_slab_node(s, n);
+
 	return count;
 }
 /*
@@ -4610,18 +4620,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 }
 
 static void process_slab(struct loc_track *t, struct kmem_cache *s,
-		struct page *page, enum track_item alloc,
-		unsigned long *map)
+		struct page *page, enum track_item alloc)
 {
 	void *addr = page_address(page);
 	void *p;
+	unsigned long *map;
 
-	bitmap_zero(map, page->objects);
-	get_map(s, page, map);
-
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
+	put_map(map);
 }
 
 static int list_locations(struct kmem_cache *s, char *buf,
@@ -4632,11 +4641,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	struct loc_track t = { 0, 0, NULL };
 	int node;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
 
-	if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
-				     GFP_KERNEL)) {
-		bitmap_free(map);
+	if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+			     GFP_KERNEL)) {
 		return sprintf(buf, "Out of memory\n");
 	}
 	/* Push back cpu slabs */
@@ -4651,9 +4658,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
 
 		spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		list_for_each_entry(page, &n->full, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
@@ -4702,7 +4709,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	}
 
 	free_loc_track(&t);
-	bitmap_free(map);
 	if (!t.count)
 		len += sprintf(buf, "No data\n");
 	return len;
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-08 19:39               ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
@ 2019-11-09 20:52                 ` Christopher Lameter
  2019-11-09 23:01                   ` Yu Zhao
  0 siblings, 1 reply; 38+ messages in thread
From: Christopher Lameter @ 2019-11-09 20:52 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Fri, 8 Nov 2019, Yu Zhao wrote:

> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.

How did this happen? The kmalloc needs to be always done before the
list_lock is taken.

> Fixing the problem by using a static bitmap instead.
>
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   mount-encrypted/4921 is trying to acquire lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
>   but task is already holding lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
>
>          CPU0
>          ----
>     lock(&(&n->list_lock)->rlock);
>     lock(&(&n->list_lock)->rlock);
>
>    *** DEADLOCK ***


Ahh. list_slab_objects() in shutdown?

There is a much easier fix for this:



[FIX] slub: Remove kmalloc under list_lock from list_slab_objects()

list_slab_objects() is called when a slab is destroyed and there are objects still left
to list the objects in the syslog. This is a pretty rare event.

And there it seems we take the list_lock and call kmalloc while holding that lock.

Perform the allocation in free_partial() before the list_lock is taken.

Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2019-10-15 13:54:57.032655296 +0000
+++ linux/mm/slub.c	2019-11-09 20:43:52.374187381 +0000
@@ -3690,14 +3690,11 @@ error:
 }

 static void list_slab_objects(struct kmem_cache *s, struct page *page,
-							const char *text)
+					const char *text, unsigned long *map)
 {
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
-	if (!map)
-		return;
 	slab_err(s, page, text, s->name);
 	slab_lock(page);

@@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
 {
 	LIST_HEAD(discard);
 	struct page *page, *h;
+	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
+
+	if (!map)
+		return;

 	BUG_ON(irqs_disabled());
 	spin_lock_irq(&n->list_lock);
@@ -3732,7 +3733,8 @@ static void free_partial(struct kmem_cac
 			list_add(&page->slab_list, &discard);
 		} else {
 			list_slab_objects(s, page,
-			"Objects remaining in %s on __kmem_cache_shutdown()");
+			"Objects remaining in %s on __kmem_cache_shutdown()",
+			map);
 		}
 	}
 	spin_unlock_irq(&n->list_lock);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-09 20:52                 ` Christopher Lameter
@ 2019-11-09 23:01                   ` Yu Zhao
  2019-11-09 23:16                     ` Christopher Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-11-09 23:01 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Sat, Nov 09, 2019 at 08:52:29PM +0000, Christopher Lameter wrote:
> On Fri, 8 Nov 2019, Yu Zhao wrote:
> 
> > If we are already under list_lock, don't call kmalloc(). Otherwise we
> > will run into deadlock because kmalloc() also tries to grab the same
> > lock.
> 
> How did this happen? The kmalloc needs to be always done before the
> list_lock is taken.
> 
> > Fixing the problem by using a static bitmap instead.
> >
> >   WARNING: possible recursive locking detected
> >   --------------------------------------------
> >   mount-encrypted/4921 is trying to acquire lock:
> >   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >
> >   but task is already holding lock:
> >   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >
> >   other info that might help us debug this:
> >    Possible unsafe locking scenario:
> >
> >          CPU0
> >          ----
> >     lock(&(&n->list_lock)->rlock);
> >     lock(&(&n->list_lock)->rlock);
> >
> >    *** DEADLOCK ***
> 
> 
> Ahh. list_slab_objects() in shutdown?
> 
> There is a much easier fix for this:
> 
> 
> 
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects()
> 
> list_slab_objects() is called when a slab is destroyed and there are objects still left
> to list the objects in the syslog. This is a pretty rare event.
> 
> And there it seems we take the list_lock and call kmalloc while holding that lock.
> 
> Perform the allocation in free_partial() before the list_lock is taken.
> 
> Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c	2019-11-09 20:43:52.374187381 +0000
> @@ -3690,14 +3690,11 @@ error:
>  }
> 
>  static void list_slab_objects(struct kmem_cache *s, struct page *page,
> -							const char *text)
> +					const char *text, unsigned long *map)
>  {
>  #ifdef CONFIG_SLUB_DEBUG
>  	void *addr = page_address(page);
>  	void *p;
> -	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> -	if (!map)
> -		return;
>  	slab_err(s, page, text, s->name);
>  	slab_lock(page);
> 
> @@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
>  {
>  	LIST_HEAD(discard);
>  	struct page *page, *h;
> +	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +
> +	if (!map)
> +		return;

What would happen if we are trying to allocate from the slab that is
being shut down? And shouldn't the allocation be conditional (i.e.,
only when CONFIG_SLUB_DEBUG=y)?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-09 23:01                   ` Yu Zhao
@ 2019-11-09 23:16                     ` Christopher Lameter
  2019-11-10 18:47                       ` Yu Zhao
  0 siblings, 1 reply; 38+ messages in thread
From: Christopher Lameter @ 2019-11-09 23:16 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Sat, 9 Nov 2019, Yu Zhao wrote:

> >  	struct page *page, *h;
> > +	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > +
> > +	if (!map)
> > +		return;
>
> What would happen if we are trying to allocate from the slab that is
> being shut down? And shouldn't the allocation be conditional (i.e.,
> only when CONFIG_SLUB_DEBUG=y)?

Kmalloc slabs are never shut down.

The allocation does not hurt and CONFIG_SLUB_DEBUG is on in most
configurations.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-09 23:16                     ` Christopher Lameter
@ 2019-11-10 18:47                       ` Yu Zhao
  2019-11-11 15:47                         ` Christopher Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-11-10 18:47 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> On Sat, 9 Nov 2019, Yu Zhao wrote:
> 
> > >  	struct page *page, *h;
> > > +	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > +
> > > +	if (!map)
> > > +		return;
> >
> > What would happen if we are trying to allocate from the slab that is
> > being shut down? And shouldn't the allocation be conditional (i.e.,
> > only when CONFIG_SLUB_DEBUG=y)?
> 
> Kmalloc slabs are never shut down.

Maybe I'm not thinking straight -- isn't it what caused the deadlock in
the first place?

Kmalloc slabs can be shut down when memcg is on.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-10 18:47                       ` Yu Zhao
@ 2019-11-11 15:47                         ` Christopher Lameter
  2019-11-11 15:55                           ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter
  2019-11-11 18:15                           ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Shakeel Butt
  0 siblings, 2 replies; 38+ messages in thread
From: Christopher Lameter @ 2019-11-11 15:47 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Sun, 10 Nov 2019, Yu Zhao wrote:

> On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> > On Sat, 9 Nov 2019, Yu Zhao wrote:
> >
> > > >  	struct page *page, *h;
> > > > +	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > > +
> > > > +	if (!map)
> > > > +		return;
> > >
> > > What would happen if we are trying to allocate from the slab that is
> > > being shut down? And shouldn't the allocation be conditional (i.e.,
> > > only when CONFIG_SLUB_DEBUG=y)?
> >
> > Kmalloc slabs are never shut down.
>
> Maybe I'm not thinking straight -- isn't it what caused the deadlock in
> the first place?

Well if kmalloc allocations become a problem then we have numerous
issues all over the kernel to fix.

> Kmalloc slabs can be shut down when memcg is on.

Kmalloc needs to work even during shutdown of a memcg.

Maybe we need to fix memcg to not allocate from the current memcg during
shutdown?




^ permalink raw reply	[flat|nested] 38+ messages in thread

* [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
  2019-11-11 15:47                         ` Christopher Lameter
@ 2019-11-11 15:55                           ` Christopher Lameter
  2019-11-30 23:09                             ` Andrew Morton
  2019-11-11 18:15                           ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Shakeel Butt
  1 sibling, 1 reply; 38+ messages in thread
From: Christopher Lameter @ 2019-11-11 15:55 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

Regardless of the issue with memcgs allowing allocations from its
kmalloc array during shutdown: This patch cleans things up and properly
allocates the bitmap outside of the list_lock.


[FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly.

list_slab_objects() is called when a slab is destroyed and there are objects still left
to list the objects in the syslog. This is a pretty rare event.

And there it seems we take the list_lock and call kmalloc while holding that lock.

Perform the allocation in free_partial() before the list_lock is taken.

Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
Signed-off-by: Christoph Lameter

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2019-10-15 13:54:57.032655296 +0000
+++ linux/mm/slub.c	2019-11-11 15:52:11.616397853 +0000
@@ -3690,14 +3690,15 @@ error:
 }

 static void list_slab_objects(struct kmem_cache *s, struct page *page,
-							const char *text)
+					const char *text, unsigned long *map)
 {
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+
 	if (!map)
 		return;
+
 	slab_err(s, page, text, s->name);
 	slab_lock(page);

@@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme
 		}
 	}
 	slab_unlock(page);
-	bitmap_free(map);
 #endif
 }

@@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
 {
 	LIST_HEAD(discard);
 	struct page *page, *h;
+	unsigned long *map = NULL;
+
+#ifdef CONFIG_SLUB_DEBUG
+	map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
+#endif

 	BUG_ON(irqs_disabled());
 	spin_lock_irq(&n->list_lock);
@@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac
 			list_add(&page->slab_list, &discard);
 		} else {
 			list_slab_objects(s, page,
-			"Objects remaining in %s on __kmem_cache_shutdown()");
+			"Objects remaining in %s on __kmem_cache_shutdown()",
+			map);
 		}
 	}
 	spin_unlock_irq(&n->list_lock);

+#ifdef CONFIG_SLUB_DEBUG
+	bitmap_free(map);
+#endif
+
 	list_for_each_entry_safe(page, h, &discard, slab_list)
 		discard_slab(s, page);
 }


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
  2019-11-11 15:47                         ` Christopher Lameter
  2019-11-11 15:55                           ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter
@ 2019-11-11 18:15                           ` Shakeel Butt
  1 sibling, 0 replies; 38+ messages in thread
From: Shakeel Butt @ 2019-11-11 18:15 UTC (permalink / raw)
  To: Christopher Lameter, Roman Gushchin
  Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, Linux MM, LKML,
	Kirill A . Shutemov

+Roman Gushchin

On Mon, Nov 11, 2019 at 7:47 AM Christopher Lameter <cl@linux.com> wrote:
>
> On Sun, 10 Nov 2019, Yu Zhao wrote:
>
> > On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> > > On Sat, 9 Nov 2019, Yu Zhao wrote:
> > >
> > > > >         struct page *page, *h;
> > > > > +       unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > > > +
> > > > > +       if (!map)
> > > > > +               return;
> > > >
> > > > What would happen if we are trying to allocate from the slab that is
> > > > being shut down? And shouldn't the allocation be conditional (i.e.,
> > > > only when CONFIG_SLUB_DEBUG=y)?
> > >
> > > Kmalloc slabs are never shut down.
> >
> > Maybe I'm not thinking straight -- isn't it what caused the deadlock in
> > the first place?
>
> Well if kmalloc allocations become a problem then we have numerous
> issues all over the kernel to fix.
>
> > Kmalloc slabs can be shut down when memcg is on.
>
> Kmalloc needs to work even during shutdown of a memcg.
>
> Maybe we need to fix memcg to not allocate from the current memcg during
> shutdown?
>
>

Roman recently added reparenting of memcg kmem caches on memcg offline
and can comment in more detail but we don't shutdown a kmem cache
until all the in-fly memcg allocations are resolved. Also the
allocation here does not look like a __GFP_ACCOUNT allocation.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
  2019-11-11 15:55                           ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter
@ 2019-11-30 23:09                             ` Andrew Morton
  2019-12-02 15:12                               ` Christopher Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: Andrew Morton @ 2019-11-30 23:09 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Mon, 11 Nov 2019 15:55:05 +0000 (UTC) Christopher Lameter <cl@linux.com> wrote:

> Regardless of the issue with memcgs allowing allocations from its
> kmalloc array during shutdown: This patch cleans things up and properly
> allocates the bitmap outside of the list_lock.
> 
> 
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
> 
> V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly.
> 
> list_slab_objects() is called when a slab is destroyed and there are objects still left
> to list the objects in the syslog. This is a pretty rare event.
> 
> And there it seems we take the list_lock and call kmalloc while holding that lock.
> 
> Perform the allocation in free_partial() before the list_lock is taken.

No response here?  It looks a lot simpler than the originally proposed
patch?

> --- linux.orig/mm/slub.c	2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c	2019-11-11 15:52:11.616397853 +0000
> @@ -3690,14 +3690,15 @@ error:
>  }
> 
>  static void list_slab_objects(struct kmem_cache *s, struct page *page,
> -							const char *text)
> +					const char *text, unsigned long *map)
>  {
>  #ifdef CONFIG_SLUB_DEBUG
>  	void *addr = page_address(page);
>  	void *p;
> -	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> +
>  	if (!map)
>  		return;
> +
>  	slab_err(s, page, text, s->name);
>  	slab_lock(page);
> 
> @@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme
>  		}
>  	}
>  	slab_unlock(page);
> -	bitmap_free(map);
>  #endif
>  }
> 
> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
>  {
>  	LIST_HEAD(discard);
>  	struct page *page, *h;
> +	unsigned long *map = NULL;
> +
> +#ifdef CONFIG_SLUB_DEBUG
> +	map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +#endif
> 
>  	BUG_ON(irqs_disabled());
>  	spin_lock_irq(&n->list_lock);
> @@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac
>  			list_add(&page->slab_list, &discard);
>  		} else {
>  			list_slab_objects(s, page,
> -			"Objects remaining in %s on __kmem_cache_shutdown()");
> +			"Objects remaining in %s on __kmem_cache_shutdown()",
> +			map);
>  		}
>  	}
>  	spin_unlock_irq(&n->list_lock);
> 
> +#ifdef CONFIG_SLUB_DEBUG
> +	bitmap_free(map);
> +#endif
> +
>  	list_for_each_entry_safe(page, h, &discard, slab_list)
>  		discard_slab(s, page);
>  }


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
  2019-11-30 23:09                             ` Andrew Morton
@ 2019-12-02 15:12                               ` Christopher Lameter
  2019-12-07 22:03                                 ` Yu Zhao
  0 siblings, 1 reply; 38+ messages in thread
From: Christopher Lameter @ 2019-12-02 15:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Sat, 30 Nov 2019, Andrew Morton wrote:

> > Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here?  It looks a lot simpler than the originally proposed
> patch?

Yup. I prefer this one but its my own patch so I cannot Ack this.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
  2019-12-02 15:12                               ` Christopher Lameter
@ 2019-12-07 22:03                                 ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-12-07 22:03 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
	Kirill A . Shutemov

On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
> On Sat, 30 Nov 2019, Andrew Morton wrote:
> 
> > > Perform the allocation in free_partial() before the list_lock is taken.
> >
> > No response here?  It looks a lot simpler than the originally proposed
> > patch?
> 
> Yup. I prefer this one but its my own patch so I cannot Ack this.

Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
once it's address.


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, back to index

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-09  6:10 [PATCH] mm: avoid slub allocation while holding list_lock Yu Zhao
2019-09-09 16:00 ` Kirill A. Shutemov
     [not found]   ` <e5e25aa3-651d-92b4-ac82-c5011c66a7cb@I-love.SAKURA.ne.jp>
2019-09-09 21:39     ` Yu Zhao
     [not found]       ` <201909100141.x8A1fVdu048305@www262.sakura.ne.jp>
2019-09-10  2:16         ` Yu Zhao
2019-09-10  9:16       ` Kirill A. Shutemov
2019-09-11 14:13 ` Andrew Morton
2019-09-12  0:29   ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao
2019-09-12  0:29     ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao
2019-09-12  0:44       ` Kirill A. Shutemov
2019-09-12  1:31         ` Yu Zhao
2019-09-12  2:31         ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao
2019-09-12  2:31           ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao
2019-09-12  9:46             ` Kirill A. Shutemov
2019-09-12  2:31           ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao
2019-09-12 10:04             ` Kirill A. Shutemov
2019-09-12  2:31           ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao
2019-09-12 10:06             ` Kirill A. Shutemov
2019-09-12 21:12               ` Yu Zhao
2019-09-13 14:58             ` Christopher Lameter
2019-09-12  9:40           ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov
2019-09-12 21:11             ` Yu Zhao
2019-09-12 22:03               ` Kirill A. Shutemov
2019-09-14  0:07           ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao
2019-09-14  0:07             ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
2019-09-16  8:39             ` [PATCH v3 1/2] mm: clean up validate_slab() Kirill A. Shutemov
2019-11-08 19:39             ` [PATCH v4 " Yu Zhao
2019-11-08 19:39               ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao
2019-11-09 20:52                 ` Christopher Lameter
2019-11-09 23:01                   ` Yu Zhao
2019-11-09 23:16                     ` Christopher Lameter
2019-11-10 18:47                       ` Yu Zhao
2019-11-11 15:47                         ` Christopher Lameter
2019-11-11 15:55                           ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter
2019-11-30 23:09                             ` Andrew Morton
2019-12-02 15:12                               ` Christopher Lameter
2019-12-07 22:03                                 ` Yu Zhao
2019-11-11 18:15                           ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Shakeel Butt
2019-09-12  0:29     ` [PATCH 3/3] mm: lock slub page when listing objects Yu Zhao

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git