* [PATCH] mm: avoid slub allocation while holding list_lock @ 2019-09-09 6:10 Yu Zhao 2019-09-09 16:00 ` Kirill A. Shutemov 2019-09-11 14:13 ` Andrew Morton 0 siblings, 2 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-09 6:10 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton Cc: linux-mm, linux-kernel, Yu Zhao If we are already under list_lock, don't call kmalloc(). Otherwise we will run into deadlock because kmalloc() also tries to grab the same lock. Instead, allocate pages directly. Given currently page->objects has 15 bits, we only need 1 page. We may waste some memory but we only do so when slub debug is on. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 8834563cdb4b..574a53ee31e1 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3683,7 +3683,11 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); + int order; + unsigned long *map; + + order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE)); + map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order); if (!map) return; slab_err(s, page, text, s->name); @@ -3698,7 +3702,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, } } slab_unlock(page); - bitmap_free(map); + free_pages((unsigned long)map, order); #endif } -- 2.23.0.187.g17f5b7556c-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-09 6:10 [PATCH] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-09-09 16:00 ` Kirill A. Shutemov 2019-09-09 20:57 ` Tetsuo Handa 2019-09-11 14:13 ` Andrew Morton 1 sibling, 1 reply; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-09 16:00 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote: > If we are already under list_lock, don't call kmalloc(). Otherwise we > will run into deadlock because kmalloc() also tries to grab the same > lock. > > Instead, allocate pages directly. Given currently page->objects has > 15 bits, we only need 1 page. We may waste some memory but we only do > so when slub debug is on. > > WARNING: possible recursive locking detected > -------------------------------------------- > mount-encrypted/4921 is trying to acquire lock: > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > > but task is already holding lock: > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&(&n->list_lock)->rlock); > lock(&(&n->list_lock)->rlock); > > *** DEADLOCK *** > > Signed-off-by: Yu Zhao <yuzhao@google.com> Looks sane to me: Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-09 16:00 ` Kirill A. Shutemov @ 2019-09-09 20:57 ` Tetsuo Handa 2019-09-09 21:39 ` Yu Zhao 0 siblings, 1 reply; 44+ messages in thread From: Tetsuo Handa @ 2019-09-09 20:57 UTC (permalink / raw) To: Kirill A. Shutemov, Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel On 2019/09/10 1:00, Kirill A. Shutemov wrote: > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote: >> If we are already under list_lock, don't call kmalloc(). Otherwise we >> will run into deadlock because kmalloc() also tries to grab the same >> lock. >> >> Instead, allocate pages directly. Given currently page->objects has >> 15 bits, we only need 1 page. We may waste some memory but we only do >> so when slub debug is on. >> >> WARNING: possible recursive locking detected >> -------------------------------------------- >> mount-encrypted/4921 is trying to acquire lock: >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 >> >> but task is already holding lock: >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb >> >> other info that might help us debug this: >> Possible unsafe locking scenario: >> >> CPU0 >> ---- >> lock(&(&n->list_lock)->rlock); >> lock(&(&n->list_lock)->rlock); >> >> *** DEADLOCK *** >> >> Signed-off-by: Yu Zhao <yuzhao@google.com> > > Looks sane to me: > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Really? Since page->objects is handled as a bitmap, the alignment should be BITS_PER_LONG rather than BITS_PER_BYTE (though in this particular case, get_order() would implicitly align to BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is undefined behavior. ^ permalink raw reply [flat|nested] 44+ messages in thread
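A minimal sketch (not posted in the thread, reusing the local variables of the proposed patch) of a size computation that would address both points: round the bitmap up to whole unsigned longs rather than bytes, and never pass 0 to get_order():

	unsigned int nbits = max_t(unsigned int, page->objects, 1); /* get_order(0) is undefined */
	int order = get_order(BITS_TO_LONGS(nbits) * sizeof(unsigned long));
	unsigned long *map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);

Since page->objects is at most 2^15 - 1, BITS_TO_LONGS(nbits) * sizeof(unsigned long) never exceeds PAGE_SIZE here, so order stays 0.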
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-09 20:57 ` Tetsuo Handa @ 2019-09-09 21:39 ` Yu Zhao 2019-09-10 1:41 ` Tetsuo Handa 2019-09-10 9:16 ` Kirill A. Shutemov 0 siblings, 2 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-09 21:39 UTC (permalink / raw) To: Tetsuo Handa Cc: Kirill A. Shutemov, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote: > On 2019/09/10 1:00, Kirill A. Shutemov wrote: > > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote: > >> If we are already under list_lock, don't call kmalloc(). Otherwise we > >> will run into deadlock because kmalloc() also tries to grab the same > >> lock. > >> > >> Instead, allocate pages directly. Given currently page->objects has > >> 15 bits, we only need 1 page. We may waste some memory but we only do > >> so when slub debug is on. > >> > >> WARNING: possible recursive locking detected > >> -------------------------------------------- > >> mount-encrypted/4921 is trying to acquire lock: > >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > >> > >> but task is already holding lock: > >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > >> > >> other info that might help us debug this: > >> Possible unsafe locking scenario: > >> > >> CPU0 > >> ---- > >> lock(&(&n->list_lock)->rlock); > >> lock(&(&n->list_lock)->rlock); > >> > >> *** DEADLOCK *** > >> > >> Signed-off-by: Yu Zhao <yuzhao@google.com> > > > > Looks sane to me: > > > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > > > Really? > > Since page->objects is handled as bitmap, alignment should be BITS_PER_LONG > than BITS_PER_BYTE (though in this particular case, get_order() would > implicitly align BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is an > undefined behavior. I think we can safely assume PAGE_SIZE is unsigned long aligned and page->objects is non-zero. But if you don't feel comfortable with these assumptions, I'd be happy to ensure them explicitly. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-09 21:39 ` Yu Zhao @ 2019-09-10 1:41 ` Tetsuo Handa 2019-09-10 2:16 ` Yu Zhao 0 siblings, 1 reply; 44+ messages in thread From: Tetsuo Handa @ 2019-09-10 1:41 UTC (permalink / raw) To: Yu Zhao Cc: Kirill A. Shutemov, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel Yu Zhao wrote: > I think we can safely assume PAGE_SIZE is unsigned long aligned and > page->objects is non-zero. But if you don't feel comfortable with these > assumptions, I'd be happy to ensure them explicitly. I know PAGE_SIZE is unsigned long aligned. If someone by chance happens to change from "dynamic allocation" to "on stack", get_order() will no longer be called and the bug will show up. I don't know whether __get_free_page(GFP_ATOMIC) can temporarily consume more than 4096 bytes, but if it can, we might want to avoid "dynamic allocation". By the way, if "struct kmem_cache_node" is an object which won't have many thousands of instances, can't we embed that buffer into "struct kmem_cache_node", because the max size of that buffer is only 4096 bytes? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-10 1:41 ` Tetsuo Handa @ 2019-09-10 2:16 ` Yu Zhao 0 siblings, 0 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-10 2:16 UTC (permalink / raw) To: Tetsuo Handa Cc: Kirill A. Shutemov, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel On Tue, Sep 10, 2019 at 10:41:31AM +0900, Tetsuo Handa wrote: > Yu Zhao wrote: > > I think we can safely assume PAGE_SIZE is unsigned long aligned and > > page->objects is non-zero. But if you don't feel comfortable with these > > assumptions, I'd be happy to ensure them explicitly. > > I know PAGE_SIZE is unsigned long aligned. If someone by chance happens to > change from "dynamic allocation" to "on stack", get_order() will no longer > be called and the bug will show up. > > I don't know whether __get_free_page(GFP_ATOMIC) can temporarily consume more > than 4096 bytes, but if it can, we might want to avoid "dynamic allocation". With GFP_ATOMIC and without __GFP_HIGHMEM, it shouldn't. > By the way, if "struct kmem_cache_node" is object which won't have many thousands > of instances, can't we embed that buffer into "struct kmem_cache_node" because > max size of that buffer is only 4096 bytes? It seems to me that allocation in the error path is better than always keeping a page around. But the latter may still be acceptable given it's done only when debug is on and, of course, on a per-node scale. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-09 21:39 ` Yu Zhao 2019-09-10 1:41 ` Tetsuo Handa @ 2019-09-10 9:16 ` Kirill A. Shutemov 1 sibling, 0 replies; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-10 9:16 UTC (permalink / raw) To: Yu Zhao Cc: Tetsuo Handa, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel On Mon, Sep 09, 2019 at 03:39:38PM -0600, Yu Zhao wrote: > On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote: > > On 2019/09/10 1:00, Kirill A. Shutemov wrote: > > > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote: > > >> If we are already under list_lock, don't call kmalloc(). Otherwise we > > >> will run into deadlock because kmalloc() also tries to grab the same > > >> lock. > > >> > > >> Instead, allocate pages directly. Given currently page->objects has > > >> 15 bits, we only need 1 page. We may waste some memory but we only do > > >> so when slub debug is on. > > >> > > >> WARNING: possible recursive locking detected > > >> -------------------------------------------- > > >> mount-encrypted/4921 is trying to acquire lock: > > >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > > >> > > >> but task is already holding lock: > > >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > > >> > > >> other info that might help us debug this: > > >> Possible unsafe locking scenario: > > >> > > >> CPU0 > > >> ---- > > >> lock(&(&n->list_lock)->rlock); > > >> lock(&(&n->list_lock)->rlock); > > >> > > >> *** DEADLOCK *** > > >> > > >> Signed-off-by: Yu Zhao <yuzhao@google.com> > > > > > > Looks sane to me: > > > > > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > > > > > > Really? > > > > Since page->objects is handled as bitmap, alignment should be BITS_PER_LONG > > than BITS_PER_BYTE (though in this particular case, get_order() would > > implicitly align BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is an > > undefined behavior. > > I think we can safely assume PAGE_SIZE is unsigned long aligned and > page->objects is non-zero. I think it's better to handle page->objects == 0 gracefully. It should not happen, but this code exists to handle situations that should not happen. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] mm: avoid slub allocation while holding list_lock 2019-09-09 6:10 [PATCH] mm: avoid slub allocation while holding list_lock Yu Zhao 2019-09-09 16:00 ` Kirill A. Shutemov @ 2019-09-11 14:13 ` Andrew Morton 2019-09-12 0:29 ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao 1 sibling, 1 reply; 44+ messages in thread From: Andrew Morton @ 2019-09-11 14:13 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, linux-mm, linux-kernel On Mon, 9 Sep 2019 00:10:16 -0600 Yu Zhao <yuzhao@google.com> wrote: > If we are already under list_lock, don't call kmalloc(). Otherwise we > will run into deadlock because kmalloc() also tries to grab the same > lock. > > Instead, allocate pages directly. Given currently page->objects has > 15 bits, we only need 1 page. We may waste some memory but we only do > so when slub debug is on. > > WARNING: possible recursive locking detected > -------------------------------------------- > mount-encrypted/4921 is trying to acquire lock: > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > > but task is already holding lock: > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&(&n->list_lock)->rlock); > lock(&(&n->list_lock)->rlock); > > *** DEADLOCK *** > It would be better if a silly low-level debug function like this weren't to try to allocate memory at all. Did you consider simply using a statically allocated buffer? { static char buffer[something large enough]; static spinlock_t lock_to_protect_it; Alternatively, do we need to call get_map() at all in there? We could simply open-code the get_map() functionality inside list_slab_objects(). It would be slower, but printk is already slow. Potentially extremely slow. ^ permalink raw reply [flat|nested] 44+ messages in thread
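For reference, the second alternative could look roughly like this (a sketch only, not posted in the thread, reusing get_freepointer() from mm/slub.c); it trades the bitmap for a direct walk of the freelist, quadratic in the number of objects, but only on this error path:

	static bool object_on_freelist(struct kmem_cache *s, struct page *page,
				       void *object)
	{
		void *p;

		/* membership test: is this object currently on the page's freelist? */
		for (p = page->freelist; p; p = get_freepointer(s, p))
			if (p == object)
				return true;
		return false;
	}

list_slab_objects() would then call print_tracking() for every object for which object_on_freelist() returns false, with no allocation at all.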
* [PATCH 1/3] mm: correct mask size for slub page->objects 2019-09-11 14:13 ` Andrew Morton @ 2019-09-12 0:29 ` Yu Zhao 2019-09-12 0:29 ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao 2019-09-12 0:29 ` [PATCH 3/3] mm: lock slub page when listing objects Yu Zhao 0 siblings, 2 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-12 0:29 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao Mask of slub objects per page shouldn't be larger than what page->objects can hold. It requires more than 2^15 objects to hit the problem, and I don't think anybody would. It'd be nice to have the mask fixed, but not really worth cc'ing the stable. Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters") Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index 8834563cdb4b..62053ceb4464 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -187,7 +187,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s) */ #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) -#define OO_SHIFT 16 +#define OO_SHIFT 15 #define OO_MASK ((1 << OO_SHIFT) - 1) #define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */ @@ -343,6 +343,8 @@ static inline unsigned int oo_order(struct kmem_cache_order_objects x) static inline unsigned int oo_objects(struct kmem_cache_order_objects x) { + BUILD_BUG_ON(OO_MASK > MAX_OBJS_PER_PAGE); + return x.x & OO_MASK; } -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/3] mm: avoid slub allocation while holding list_lock 2019-09-12 0:29 ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao @ 2019-09-12 0:29 ` Yu Zhao 2019-09-12 0:44 ` Kirill A. Shutemov 2019-09-12 0:29 ` [PATCH 3/3] mm: lock slub page when listing objects Yu Zhao 1 sibling, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-09-12 0:29 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao If we are already under list_lock, don't call kmalloc(). Otherwise we will run into deadlock because kmalloc() also tries to grab the same lock. Instead, statically allocate bitmap in struct kmem_cache_node. Given currently page->objects has 15 bits, we bloat the per-node struct by 4K. So we waste some memory but only do so when slub debug is on. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Signed-off-by: Yu Zhao <yuzhao@google.com> --- include/linux/slub_def.h | 4 ++++ mm/slab.h | 1 + mm/slub.c | 44 ++++++++++++++-------------------------- 3 files changed, 20 insertions(+), 29 deletions(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index d2153789bd9f..719d43574360 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -9,6 +9,10 @@ */ #include <linux/kobject.h> +#define OO_SHIFT 15 +#define OO_MASK ((1 << OO_SHIFT) - 1) +#define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */ + enum stat_item { ALLOC_FASTPATH, /* Allocation from cpu slab */ ALLOC_SLOWPATH, /* Allocation by getting a new cpu slab */ diff --git a/mm/slab.h b/mm/slab.h index 9057b8056b07..2d8639835db1 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -556,6 +556,7 @@ struct kmem_cache_node { atomic_long_t nr_slabs; atomic_long_t total_objects; struct list_head full; + unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)]; #endif #endif diff --git a/mm/slub.c b/mm/slub.c index 62053ceb4464..f28072c9f2ce 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -187,10 +187,6 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s) */ #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) -#define OO_SHIFT 15 -#define OO_MASK ((1 << OO_SHIFT) - 1) -#define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */ - /* Internal SLUB flags */ /* Poison object */ #define __OBJECT_POISON ((slab_flags_t __force)0x80000000U) @@ -454,6 +450,8 @@ static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map) void *p; void *addr = page_address(page); + bitmap_zero(map, page->objects); + for (p = page->freelist; p; p = get_freepointer(s, p)) set_bit(slab_index(p, s, addr), map); } @@ -3680,14 +3678,12 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags) } static void list_slab_objects(struct kmem_cache *s, struct page *page, - const char *text) + unsigned long *map, const char *text) { #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); - if (!map) - return; + slab_err(s, page, text, s->name); slab_lock(page); @@ 
-3699,8 +3695,8 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, print_tracking(s, p); } } + slab_unlock(page); - bitmap_free(map); #endif } @@ -3721,7 +3717,7 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n) remove_partial(n, page); list_add(&page->slab_list, &discard); } else { - list_slab_objects(s, page, + list_slab_objects(s, page, n->object_map, "Objects remaining in %s on __kmem_cache_shutdown()"); } } @@ -4397,7 +4393,6 @@ static int validate_slab(struct kmem_cache *s, struct page *page, return 0; /* Now we know that a valid freelist exists */ - bitmap_zero(map, page->objects); get_map(s, page, map); for_each_object(p, s, addr, page->objects) { @@ -4422,7 +4417,7 @@ static void validate_slab_slab(struct kmem_cache *s, struct page *page, } static int validate_slab_node(struct kmem_cache *s, - struct kmem_cache_node *n, unsigned long *map) + struct kmem_cache_node *n) { unsigned long count = 0; struct page *page; @@ -4431,7 +4426,7 @@ static int validate_slab_node(struct kmem_cache *s, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) { - validate_slab_slab(s, page, map); + validate_slab_slab(s, page, n->object_map); count++; } if (count != n->nr_partial) @@ -4442,7 +4437,7 @@ static int validate_slab_node(struct kmem_cache *s, goto out; list_for_each_entry(page, &n->full, slab_list) { - validate_slab_slab(s, page, map); + validate_slab_slab(s, page, n->object_map); count++; } if (count != atomic_long_read(&n->nr_slabs)) @@ -4459,15 +4454,11 @@ static long validate_slab_cache(struct kmem_cache *s) int node; unsigned long count = 0; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - - if (!map) - return -ENOMEM; flush_all(s); for_each_kmem_cache_node(s, node, n) - count += validate_slab_node(s, n, map); - bitmap_free(map); + count += validate_slab_node(s, n); + return count; } /* @@ -4603,9 +4594,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s, void *addr = page_address(page); void *p; - bitmap_zero(map, page->objects); get_map(s, page, map); - for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); @@ -4619,11 +4608,9 @@ static int list_locations(struct kmem_cache *s, char *buf, struct loc_track t = { 0, 0, NULL }; int node; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), - GFP_KERNEL)) { - bitmap_free(map); + if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), + GFP_KERNEL)) { return sprintf(buf, "Out of memory\n"); } /* Push back cpu slabs */ @@ -4638,9 +4625,9 @@ static int list_locations(struct kmem_cache *s, char *buf, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc, n->object_map); list_for_each_entry(page, &n->full, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc, n->object_map); spin_unlock_irqrestore(&n->list_lock, flags); } @@ -4689,7 +4676,6 @@ static int list_locations(struct kmem_cache *s, char *buf, } free_loc_track(&t); - bitmap_free(map); if (!t.count) len += sprintf(buf, "No data\n"); return len; -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/3] mm: avoid slub allocation while holding list_lock 2019-09-12 0:29 ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-09-12 0:44 ` Kirill A. Shutemov 2019-09-12 1:31 ` Yu Zhao 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao 0 siblings, 2 replies; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-12 0:44 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Wed, Sep 11, 2019 at 06:29:28PM -0600, Yu Zhao wrote: > If we are already under list_lock, don't call kmalloc(). Otherwise we > will run into deadlock because kmalloc() also tries to grab the same > lock. > > Instead, statically allocate bitmap in struct kmem_cache_node. Given > currently page->objects has 15 bits, we bloat the per-node struct by > 4K. So we waste some memory but only do so when slub debug is on. Why not have a single page total, protected by a lock? Listing objects from two pages at the same time doesn't make sense anyway. Concurrent validation is not something sane to do. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/3] mm: avoid slub allocation while holding list_lock 2019-09-12 0:44 ` Kirill A. Shutemov @ 2019-09-12 1:31 ` Yu Zhao 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao 1 sibling, 0 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-12 1:31 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Thu, Sep 12, 2019 at 03:44:01AM +0300, Kirill A. Shutemov wrote: > On Wed, Sep 11, 2019 at 06:29:28PM -0600, Yu Zhao wrote: > > If we are already under list_lock, don't call kmalloc(). Otherwise we > > will run into deadlock because kmalloc() also tries to grab the same > > lock. > > > > Instead, statically allocate bitmap in struct kmem_cache_node. Given > > currently page->objects has 15 bits, we bloat the per-node struct by > > 4K. So we waste some memory but only do so when slub debug is on. > > Why not have single page total protected by a lock? > > Listing object from two pages at the same time doesn't make sense anyway. > Cuncurent validating is not something sane to do. Okay, cutting down to static global bitmap. ^ permalink raw reply [flat|nested] 44+ messages in thread
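For reference, the static global bitmap referred to here ends up with the following shape in patch 3/4 of v2 below: one worst-case-sized map plus a dedicated spinlock, so only one debug walker can use it at a time.

	static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
	static DEFINE_SPINLOCK(object_map_lock);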
* [PATCH v2 1/4] mm: correct mask size for slub page->objects 2019-09-12 0:44 ` Kirill A. Shutemov 2019-09-12 1:31 ` Yu Zhao @ 2019-09-12 2:31 ` Yu Zhao 2019-09-12 2:31 ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao ` (4 more replies) 1 sibling, 5 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao Mask of slub objects per page shouldn't be larger than what page->objects can hold. It requires more than 2^15 objects to hit the problem, and I don't think anybody would. It'd be nice to have the mask fixed, but not really worth cc'ing the stable. Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters") Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index 8834563cdb4b..62053ceb4464 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -187,7 +187,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s) */ #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) -#define OO_SHIFT 16 +#define OO_SHIFT 15 #define OO_MASK ((1 << OO_SHIFT) - 1) #define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */ @@ -343,6 +343,8 @@ static inline unsigned int oo_order(struct kmem_cache_order_objects x) static inline unsigned int oo_objects(struct kmem_cache_order_objects x) { + BUILD_BUG_ON(OO_MASK > MAX_OBJS_PER_PAGE); + return x.x & OO_MASK; } -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v2 2/4] mm: clean up validate_slab() 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao @ 2019-09-12 2:31 ` Yu Zhao 2019-09-12 9:46 ` Kirill A. Shutemov 2019-09-12 2:31 ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao ` (3 subsequent siblings) 4 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao The function doesn't need to return any value, and the check can be done in one pass. There is a behavior change: before the patch, we stop at the first invalid free object; after the patch, we stop at the first invalid object, free or in use. This shouldn't matter because the original behavior isn't intended anyway. Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 62053ceb4464..7b7e1ee264ef 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4386,31 +4386,26 @@ static int count_total(struct page *page) #endif #ifdef CONFIG_SLUB_DEBUG -static int validate_slab(struct kmem_cache *s, struct page *page, +static void validate_slab(struct kmem_cache *s, struct page *page, unsigned long *map) { void *p; void *addr = page_address(page); - if (!check_slab(s, page) || - !on_freelist(s, page, NULL)) - return 0; + if (!check_slab(s, page) || !on_freelist(s, page, NULL)) + return; /* Now we know that a valid freelist exists */ bitmap_zero(map, page->objects); get_map(s, page, map); for_each_object(p, s, addr, page->objects) { - if (test_bit(slab_index(p, s, addr), map)) - if (!check_object(s, page, p, SLUB_RED_INACTIVE)) - return 0; - } + u8 val = test_bit(slab_index(p, s, addr), map) ? + SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; - for_each_object(p, s, addr, page->objects) - if (!test_bit(slab_index(p, s, addr), map)) - if (!check_object(s, page, p, SLUB_RED_ACTIVE)) - return 0; - return 1; + if (!check_object(s, page, p, val)) + break; + } } static void validate_slab_slab(struct kmem_cache *s, struct page *page, -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] mm: clean up validate_slab() 2019-09-12 2:31 ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao @ 2019-09-12 9:46 ` Kirill A. Shutemov 0 siblings, 0 replies; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-12 9:46 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Wed, Sep 11, 2019 at 08:31:09PM -0600, Yu Zhao wrote: > The function doesn't need to return any value, and the check can be > done in one pass. > > There is a behavior change: before the patch, we stop at the first > invalid free object; after the patch, we stop at the first invalid > object, free or in use. This shouldn't matter because the original > behavior isn't intended anyway. > > Signed-off-by: Yu Zhao <yuzhao@google.com> > --- > mm/slub.c | 21 ++++++++------------- > 1 file changed, 8 insertions(+), 13 deletions(-) > > diff --git a/mm/slub.c b/mm/slub.c > index 62053ceb4464..7b7e1ee264ef 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -4386,31 +4386,26 @@ static int count_total(struct page *page) > #endif > > #ifdef CONFIG_SLUB_DEBUG > -static int validate_slab(struct kmem_cache *s, struct page *page, > +static void validate_slab(struct kmem_cache *s, struct page *page, > unsigned long *map) > { > void *p; > void *addr = page_address(page); > > - if (!check_slab(s, page) || > - !on_freelist(s, page, NULL)) > - return 0; > + if (!check_slab(s, page) || !on_freelist(s, page, NULL)) > + return; > > /* Now we know that a valid freelist exists */ > bitmap_zero(map, page->objects); > > get_map(s, page, map); > for_each_object(p, s, addr, page->objects) { > - if (test_bit(slab_index(p, s, addr), map)) > - if (!check_object(s, page, p, SLUB_RED_INACTIVE)) > - return 0; > - } > + u8 val = test_bit(slab_index(p, s, addr), map) ? > + SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; Proper 'if' would be more readable. Other than that look fine to me. > > - for_each_object(p, s, addr, page->objects) > - if (!test_bit(slab_index(p, s, addr), map)) > - if (!check_object(s, page, p, SLUB_RED_ACTIVE)) > - return 0; > - return 1; > + if (!check_object(s, page, p, val)) > + break; > + } > } > > static void validate_slab_slab(struct kmem_cache *s, struct page *page, > -- > 2.23.0.162.g0b9fbb3734-goog > -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao 2019-09-12 2:31 ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao @ 2019-09-12 2:31 ` Yu Zhao 2019-09-12 10:04 ` Kirill A. Shutemov 2019-09-12 2:31 ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao ` (2 subsequent siblings) 4 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao If we are already under list_lock, don't call kmalloc(). Otherwise we will run into deadlock because kmalloc() also tries to grab the same lock. Fixing the problem by using a static bitmap instead. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 88 +++++++++++++++++++++++++++++-------------------------- 1 file changed, 47 insertions(+), 41 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 7b7e1ee264ef..baa60dd73942 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -443,19 +443,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page, } #ifdef CONFIG_SLUB_DEBUG +static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)]; +static DEFINE_SPINLOCK(object_map_lock); + /* * Determine a map of object in use on a page. * * Node listlock must be held to guarantee that the page does * not vanish from under us. 
*/ -static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map) +static unsigned long *get_map(struct kmem_cache *s, struct page *page) { void *p; void *addr = page_address(page); + VM_BUG_ON(!irqs_disabled()); + + spin_lock(&object_map_lock); + + bitmap_zero(object_map, page->objects); + for (p = page->freelist; p; p = get_freepointer(s, p)) - set_bit(slab_index(p, s, addr), map); + set_bit(slab_index(p, s, addr), object_map); + + return object_map; +} + +static void put_map(unsigned long *map) +{ + VM_BUG_ON(map != object_map); + lockdep_assert_held(&object_map_lock); + + spin_unlock(&object_map_lock); } static inline unsigned int size_from_object(struct kmem_cache *s) @@ -3685,13 +3704,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); - if (!map) - return; + unsigned long *map; + slab_err(s, page, text, s->name); slab_lock(page); - get_map(s, page, map); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) { if (!test_bit(slab_index(p, s, addr), map)) { @@ -3699,8 +3717,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, print_tracking(s, p); } } + put_map(map); + slab_unlock(page); - bitmap_free(map); #endif } @@ -4386,19 +4405,19 @@ static int count_total(struct page *page) #endif #ifdef CONFIG_SLUB_DEBUG -static void validate_slab(struct kmem_cache *s, struct page *page, - unsigned long *map) +static void validate_slab(struct kmem_cache *s, struct page *page) { void *p; void *addr = page_address(page); + unsigned long *map; + + slab_lock(page); if (!check_slab(s, page) || !on_freelist(s, page, NULL)) - return; + goto unlock; /* Now we know that a valid freelist exists */ - bitmap_zero(map, page->objects); - - get_map(s, page, map); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) { u8 val = test_bit(slab_index(p, s, addr), map) ? 
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; @@ -4406,18 +4425,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page, if (!check_object(s, page, p, val)) break; } -} - -static void validate_slab_slab(struct kmem_cache *s, struct page *page, - unsigned long *map) -{ - slab_lock(page); - validate_slab(s, page, map); + put_map(map); +unlock: slab_unlock(page); } static int validate_slab_node(struct kmem_cache *s, - struct kmem_cache_node *n, unsigned long *map) + struct kmem_cache_node *n) { unsigned long count = 0; struct page *page; @@ -4426,7 +4440,7 @@ static int validate_slab_node(struct kmem_cache *s, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) { - validate_slab_slab(s, page, map); + validate_slab(s, page); count++; } if (count != n->nr_partial) @@ -4437,7 +4451,7 @@ static int validate_slab_node(struct kmem_cache *s, goto out; list_for_each_entry(page, &n->full, slab_list) { - validate_slab_slab(s, page, map); + validate_slab(s, page); count++; } if (count != atomic_long_read(&n->nr_slabs)) @@ -4454,15 +4468,11 @@ static long validate_slab_cache(struct kmem_cache *s) int node; unsigned long count = 0; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - - if (!map) - return -ENOMEM; flush_all(s); for_each_kmem_cache_node(s, node, n) - count += validate_slab_node(s, n, map); - bitmap_free(map); + count += validate_slab_node(s, n); + return count; } /* @@ -4592,18 +4602,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s, } static void process_slab(struct loc_track *t, struct kmem_cache *s, - struct page *page, enum track_item alloc, - unsigned long *map) + struct page *page, enum track_item alloc) { void *addr = page_address(page); void *p; + unsigned long *map; - bitmap_zero(map, page->objects); - get_map(s, page, map); - + map = get_map(s, page); for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); + put_map(map); } static int list_locations(struct kmem_cache *s, char *buf, @@ -4614,11 +4623,9 @@ static int list_locations(struct kmem_cache *s, char *buf, struct loc_track t = { 0, 0, NULL }; int node; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), - GFP_KERNEL)) { - bitmap_free(map); + if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), + GFP_KERNEL)) { return sprintf(buf, "Out of memory\n"); } /* Push back cpu slabs */ @@ -4633,9 +4640,9 @@ static int list_locations(struct kmem_cache *s, char *buf, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc); list_for_each_entry(page, &n->full, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc); spin_unlock_irqrestore(&n->list_lock, flags); } @@ -4684,7 +4691,6 @@ static int list_locations(struct kmem_cache *s, char *buf, } free_loc_track(&t); - bitmap_free(map); if (!t.count) len += sprintf(buf, "No data\n"); return len; -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock 2019-09-12 2:31 ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-09-12 10:04 ` Kirill A. Shutemov 0 siblings, 0 replies; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-12 10:04 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Wed, Sep 11, 2019 at 08:31:10PM -0600, Yu Zhao wrote: > If we are already under list_lock, don't call kmalloc(). Otherwise we > will run into deadlock because kmalloc() also tries to grab the same > lock. > > Fixing the problem by using a static bitmap instead. > > WARNING: possible recursive locking detected > -------------------------------------------- > mount-encrypted/4921 is trying to acquire lock: > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > > but task is already holding lock: > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&(&n->list_lock)->rlock); > lock(&(&n->list_lock)->rlock); > > *** DEADLOCK *** > > Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 4/4] mm: lock slub page when listing objects 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao 2019-09-12 2:31 ` [PATCH v2 2/4] mm: clean up validate_slab() Yu Zhao 2019-09-12 2:31 ` [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-09-12 2:31 ` Yu Zhao 2019-09-12 10:06 ` Kirill A. Shutemov 2019-09-13 14:58 ` Christopher Lameter 2019-09-12 9:40 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov 2019-09-14 0:07 ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao 4 siblings, 2 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao Though I have no idea what the side effect of such race would be, apparently we want to prevent the free list from being changed while debugging the objects. Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index baa60dd73942..1c9726c28f0b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4608,11 +4608,15 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s, void *p; unsigned long *map; + slab_lock(page); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); put_map(map); + + slab_unlock(page); } static int list_locations(struct kmem_cache *s, char *buf, -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] mm: lock slub page when listing objects 2019-09-12 2:31 ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao @ 2019-09-12 10:06 ` Kirill A. Shutemov 2019-09-12 21:12 ` Yu Zhao 0 siblings, 1 reply; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-12 10:06 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote: > Though I have no idea what the side effect of such race would be, > apparently we want to prevent the free list from being changed > while debugging the objects. Codewise it looks good to me, but the commit message can be better. Can we rephrase it to indicate that slab_lock is required to protect page->objects? -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] mm: lock slub page when listing objects 2019-09-12 10:06 ` Kirill A. Shutemov @ 2019-09-12 21:12 ` Yu Zhao 0 siblings, 0 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-12 21:12 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Thu, Sep 12, 2019 at 01:06:42PM +0300, Kirill A. Shutemov wrote: > On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote: > > Though I have no idea what the side effect of such race would be, > > apparently we want to prevent the free list from being changed > > while debugging the objects. > > Codewise looks good to me. But commit message can be better. > > Can we repharase it to indicate that slab_lock is required to protect > page->objects? Will do. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] mm: lock slub page when listing objects 2019-09-12 2:31 ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao 2019-09-12 10:06 ` Kirill A. Shutemov @ 2019-09-13 14:58 ` Christopher Lameter 1 sibling, 0 replies; 44+ messages in thread From: Christopher Lameter @ 2019-09-13 14:58 UTC (permalink / raw) To: Yu Zhao Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel On Wed, 11 Sep 2019, Yu Zhao wrote: > Though I have no idea what the side effect of such race would be, > apparently we want to prevent the free list from being changed > while debugging the objects. process_slab() is called under the list_lock, which prevents any allocation from the free list in the slab page. The only change that can still happen is that new objects are added to the freelist; this occurs by updating the freelist pointer in the slab page with a pointer to the newly freed object, which in turn contains the old freelist pointer. It is therefore possible to safely traverse the objects on the freelist after the pointer has been retrieved. NAK. ^ permalink raw reply [flat|nested] 44+ messages in thread
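An illustration of the invariant being relied on here (a standalone sketch, not kernel code): concurrent frees only push a new object onto the head of the list, so a walker that has already read a pointer into the chain never sees the part behind that pointer change.

	struct obj { struct obj *next; };

	/* what a concurrent free does: prepend, leaving the old chain intact */
	static void push(struct obj **head, struct obj *o)
	{
		o->next = *head;
		*head = o;	/* only the head moves */
	}

	/* what process_slab()/get_map() effectively do: walk a snapshot */
	static unsigned int count_free(const struct obj *head)
	{
		unsigned int n = 0;

		while (head) {
			n++;
			head = head->next;
		}
		return n;
	}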
* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao ` (2 preceding siblings ...) 2019-09-12 2:31 ` [PATCH v2 4/4] mm: lock slub page when listing objects Yu Zhao @ 2019-09-12 9:40 ` Kirill A. Shutemov 2019-09-12 21:11 ` Yu Zhao 2019-09-14 0:07 ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao 4 siblings, 1 reply; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-12 9:40 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote: > Mask of slub objects per page shouldn't be larger than what > page->objects can hold. > > It requires more than 2^15 objects to hit the problem, and I don't > think anybody would. It'd be nice to have the mask fixed, but not > really worth cc'ing the stable. > > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters") > Signed-off-by: Yu Zhao <yuzhao@google.com> I don't think the patch fixes anything. Yes, we have one spare bit between the order and the number of objects that is not used and is always zero. So what? I can imagine that for some microarchitectures accessing the higher 16 bits of an int is cheaper than shifting by 15. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects 2019-09-12 9:40 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov @ 2019-09-12 21:11 ` Yu Zhao 2019-09-12 22:03 ` Kirill A. Shutemov 0 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-09-12 21:11 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote: > On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote: > > Mask of slub objects per page shouldn't be larger than what > > page->objects can hold. > > > > It requires more than 2^15 objects to hit the problem, and I don't > > think anybody would. It'd be nice to have the mask fixed, but not > > really worth cc'ing the stable. > > > > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters") > > Signed-off-by: Yu Zhao <yuzhao@google.com> > > I don't think the patch fixes anything. Technically it does. It makes no sense for a mask to have more bits than the variable that holds the masked value. I had to look up the commit history to find out why and go through the code to make sure it doesn't actually cause any problem. My hope is that nobody else would have to go through the same trouble. > Yes, we have one spare bit between order and number of object that is not > used and always zero. So what? > > I can imagine for some microarchitecures accessing higher 16 bits of int > is cheaper than shifting by 15. Well, I highly doubt the inconsistency is intended for such an optimization, even if it existed. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects 2019-09-12 21:11 ` Yu Zhao @ 2019-09-12 22:03 ` Kirill A. Shutemov 0 siblings, 0 replies; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-12 22:03 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Thu, Sep 12, 2019 at 03:11:14PM -0600, Yu Zhao wrote: > On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote: > > On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote: > > > Mask of slub objects per page shouldn't be larger than what > > > page->objects can hold. > > > > > > It requires more than 2^15 objects to hit the problem, and I don't > > > think anybody would. It'd be nice to have the mask fixed, but not > > > really worth cc'ing the stable. > > > > > > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters") > > > Signed-off-by: Yu Zhao <yuzhao@google.com> > > > > I don't think the patch fixes anything. > > Technically it does. It makes no sense for a mask to have more bits > than the variable that holds the masked value. I had to look up the > commit history to find out why and go through the code to make sure > it doesn't actually cause any problem. > > My hope is that nobody else would have to go through the same trouble. Just put some comments then. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
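One way such a comment could read (the wording is an assumption, not something posted in the thread), placed next to OO_SHIFT in mm/slub.c instead of changing the value:

	/*
	 * OO_SHIFT is 16 although page->objects is only a 15-bit field, so
	 * the top bit covered by OO_MASK is never set in practice; keeping
	 * the wider shift puts the order in the upper half-word of x.x.
	 */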
* [PATCH v3 1/2] mm: clean up validate_slab() 2019-09-12 2:31 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Yu Zhao ` (3 preceding siblings ...) 2019-09-12 9:40 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov @ 2019-09-14 0:07 ` Yu Zhao 2019-09-14 0:07 ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao ` (2 more replies) 4 siblings, 3 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-14 0:07 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao The function doesn't need to return any value, and the check can be done in one pass. There is a behavior change: before the patch, we stop at the first invalid free object; after the patch, we stop at the first invalid object, free or in use. This shouldn't matter because the original behavior isn't intended anyway. Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 8834563cdb4b..445ef8b2aec0 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4384,31 +4384,26 @@ static int count_total(struct page *page) #endif #ifdef CONFIG_SLUB_DEBUG -static int validate_slab(struct kmem_cache *s, struct page *page, +static void validate_slab(struct kmem_cache *s, struct page *page, unsigned long *map) { void *p; void *addr = page_address(page); - if (!check_slab(s, page) || - !on_freelist(s, page, NULL)) - return 0; + if (!check_slab(s, page) || !on_freelist(s, page, NULL)) + return; /* Now we know that a valid freelist exists */ bitmap_zero(map, page->objects); get_map(s, page, map); for_each_object(p, s, addr, page->objects) { - if (test_bit(slab_index(p, s, addr), map)) - if (!check_object(s, page, p, SLUB_RED_INACTIVE)) - return 0; - } + u8 val = test_bit(slab_index(p, s, addr), map) ? + SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; - for_each_object(p, s, addr, page->objects) - if (!test_bit(slab_index(p, s, addr), map)) - if (!check_object(s, page, p, SLUB_RED_ACTIVE)) - return 0; - return 1; + if (!check_object(s, page, p, val)) + break; + } } static void validate_slab_slab(struct kmem_cache *s, struct page *page, -- 2.23.0.237.gc6a4ce50a0-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock 2019-09-14 0:07 ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao @ 2019-09-14 0:07 ` Yu Zhao 2019-09-16 8:39 ` [PATCH v3 1/2] mm: clean up validate_slab() Kirill A. Shutemov 2019-11-08 19:39 ` [PATCH v4 " Yu Zhao 2 siblings, 0 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-14 0:07 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov If we are already under list_lock, don't call kmalloc(). Otherwise we will run into deadlock because kmalloc() also tries to grab the same lock. Fixing the problem by using a static bitmap instead. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 88 +++++++++++++++++++++++++++++-------------------------- 1 file changed, 47 insertions(+), 41 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 445ef8b2aec0..c1de01730648 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page, } #ifdef CONFIG_SLUB_DEBUG +static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)]; +static DEFINE_SPINLOCK(object_map_lock); + /* * Determine a map of object in use on a page. * * Node listlock must be held to guarantee that the page does * not vanish from under us. 
*/ -static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map) +static unsigned long *get_map(struct kmem_cache *s, struct page *page) { void *p; void *addr = page_address(page); + VM_BUG_ON(!irqs_disabled()); + + spin_lock(&object_map_lock); + + bitmap_zero(object_map, page->objects); + for (p = page->freelist; p; p = get_freepointer(s, p)) - set_bit(slab_index(p, s, addr), map); + set_bit(slab_index(p, s, addr), object_map); + + return object_map; +} + +static void put_map(unsigned long *map) +{ + VM_BUG_ON(map != object_map); + lockdep_assert_held(&object_map_lock); + + spin_unlock(&object_map_lock); } static inline unsigned int size_from_object(struct kmem_cache *s) @@ -3683,13 +3702,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); - if (!map) - return; + unsigned long *map; + slab_err(s, page, text, s->name); slab_lock(page); - get_map(s, page, map); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) { if (!test_bit(slab_index(p, s, addr), map)) { @@ -3697,8 +3715,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, print_tracking(s, p); } } + put_map(map); + slab_unlock(page); - bitmap_free(map); #endif } @@ -4384,19 +4403,19 @@ static int count_total(struct page *page) #endif #ifdef CONFIG_SLUB_DEBUG -static void validate_slab(struct kmem_cache *s, struct page *page, - unsigned long *map) +static void validate_slab(struct kmem_cache *s, struct page *page) { void *p; void *addr = page_address(page); + unsigned long *map; + + slab_lock(page); if (!check_slab(s, page) || !on_freelist(s, page, NULL)) - return; + goto unlock; /* Now we know that a valid freelist exists */ - bitmap_zero(map, page->objects); - - get_map(s, page, map); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) { u8 val = test_bit(slab_index(p, s, addr), map) ? 
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; @@ -4404,18 +4423,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page, if (!check_object(s, page, p, val)) break; } -} - -static void validate_slab_slab(struct kmem_cache *s, struct page *page, - unsigned long *map) -{ - slab_lock(page); - validate_slab(s, page, map); + put_map(map); +unlock: slab_unlock(page); } static int validate_slab_node(struct kmem_cache *s, - struct kmem_cache_node *n, unsigned long *map) + struct kmem_cache_node *n) { unsigned long count = 0; struct page *page; @@ -4424,7 +4438,7 @@ static int validate_slab_node(struct kmem_cache *s, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) { - validate_slab_slab(s, page, map); + validate_slab(s, page); count++; } if (count != n->nr_partial) @@ -4435,7 +4449,7 @@ static int validate_slab_node(struct kmem_cache *s, goto out; list_for_each_entry(page, &n->full, slab_list) { - validate_slab_slab(s, page, map); + validate_slab(s, page); count++; } if (count != atomic_long_read(&n->nr_slabs)) @@ -4452,15 +4466,11 @@ static long validate_slab_cache(struct kmem_cache *s) int node; unsigned long count = 0; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - - if (!map) - return -ENOMEM; flush_all(s); for_each_kmem_cache_node(s, node, n) - count += validate_slab_node(s, n, map); - bitmap_free(map); + count += validate_slab_node(s, n); + return count; } /* @@ -4590,18 +4600,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s, } static void process_slab(struct loc_track *t, struct kmem_cache *s, - struct page *page, enum track_item alloc, - unsigned long *map) + struct page *page, enum track_item alloc) { void *addr = page_address(page); void *p; + unsigned long *map; - bitmap_zero(map, page->objects); - get_map(s, page, map); - + map = get_map(s, page); for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); + put_map(map); } static int list_locations(struct kmem_cache *s, char *buf, @@ -4612,11 +4621,9 @@ static int list_locations(struct kmem_cache *s, char *buf, struct loc_track t = { 0, 0, NULL }; int node; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), - GFP_KERNEL)) { - bitmap_free(map); + if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), + GFP_KERNEL)) { return sprintf(buf, "Out of memory\n"); } /* Push back cpu slabs */ @@ -4631,9 +4638,9 @@ static int list_locations(struct kmem_cache *s, char *buf, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc); list_for_each_entry(page, &n->full, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc); spin_unlock_irqrestore(&n->list_lock, flags); } @@ -4682,7 +4689,6 @@ static int list_locations(struct kmem_cache *s, char *buf, } free_loc_track(&t); - bitmap_free(map); if (!t.count) len += sprintf(buf, "No data\n"); return len; -- 2.23.0.237.gc6a4ce50a0-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v3 1/2] mm: clean up validate_slab() 2019-09-14 0:07 ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao 2019-09-14 0:07 ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-09-16 8:39 ` Kirill A. Shutemov 2019-11-08 19:39 ` [PATCH v4 " Yu Zhao 2 siblings, 0 replies; 44+ messages in thread From: Kirill A. Shutemov @ 2019-09-16 8:39 UTC (permalink / raw) To: Yu Zhao Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel On Fri, Sep 13, 2019 at 06:07:42PM -0600, Yu Zhao wrote: > The function doesn't need to return any value, and the check can be > done in one pass. > > There is a behavior change: before the patch, we stop at the first > invalid free object; after the patch, we stop at the first invalid > object, free or in use. This shouldn't matter because the original > behavior isn't intended anyway. > > Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v4 1/2] mm: clean up validate_slab() 2019-09-14 0:07 ` [PATCH v3 1/2] mm: clean up validate_slab() Yu Zhao 2019-09-14 0:07 ` [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao 2019-09-16 8:39 ` [PATCH v3 1/2] mm: clean up validate_slab() Kirill A. Shutemov @ 2019-11-08 19:39 ` Yu Zhao 2019-11-08 19:39 ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao 2 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov The function doesn't need to return any value, and the check can be done in one pass. There is a behavior change: before the patch, we stop at the first invalid free object; after the patch, we stop at the first invalid object, free or in use. This shouldn't matter because the original behavior isn't intended anyway. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index b25c807a111f..6930c3febad7 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4404,31 +4404,26 @@ static int count_total(struct page *page) #endif #ifdef CONFIG_SLUB_DEBUG -static int validate_slab(struct kmem_cache *s, struct page *page, +static void validate_slab(struct kmem_cache *s, struct page *page, unsigned long *map) { void *p; void *addr = page_address(page); - if (!check_slab(s, page) || - !on_freelist(s, page, NULL)) - return 0; + if (!check_slab(s, page) || !on_freelist(s, page, NULL)) + return; /* Now we know that a valid freelist exists */ bitmap_zero(map, page->objects); get_map(s, page, map); for_each_object(p, s, addr, page->objects) { - if (test_bit(slab_index(p, s, addr), map)) - if (!check_object(s, page, p, SLUB_RED_INACTIVE)) - return 0; - } + u8 val = test_bit(slab_index(p, s, addr), map) ? + SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; - for_each_object(p, s, addr, page->objects) - if (!test_bit(slab_index(p, s, addr), map)) - if (!check_object(s, page, p, SLUB_RED_ACTIVE)) - return 0; - return 1; + if (!check_object(s, page, p, val)) + break; + } } static void validate_slab_slab(struct kmem_cache *s, struct page *page, -- 2.24.0.rc1.363.gb1bccd3e3d-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-08 19:39 ` [PATCH v4 " Yu Zhao @ 2019-11-08 19:39 ` Yu Zhao 2019-11-09 20:52 ` Christopher Lameter 0 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov If we are already under list_lock, don't call kmalloc(). Otherwise we will run into deadlock because kmalloc() also tries to grab the same lock. Fixing the problem by using a static bitmap instead. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 88 +++++++++++++++++++++++++++++-------------------------- 1 file changed, 47 insertions(+), 41 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 6930c3febad7..7a4ec3c4b4d9 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page, } #ifdef CONFIG_SLUB_DEBUG +static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)]; +static DEFINE_SPINLOCK(object_map_lock); + /* * Determine a map of object in use on a page. * * Node listlock must be held to guarantee that the page does * not vanish from under us. 
*/ -static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map) +static unsigned long *get_map(struct kmem_cache *s, struct page *page) { void *p; void *addr = page_address(page); + VM_BUG_ON(!irqs_disabled()); + + spin_lock(&object_map_lock); + + bitmap_zero(object_map, page->objects); + for (p = page->freelist; p; p = get_freepointer(s, p)) - set_bit(slab_index(p, s, addr), map); + set_bit(slab_index(p, s, addr), object_map); + + return object_map; +} + +static void put_map(unsigned long *map) +{ + VM_BUG_ON(map != object_map); + lockdep_assert_held(&object_map_lock); + + spin_unlock(&object_map_lock); } static inline unsigned int size_from_object(struct kmem_cache *s) @@ -3695,13 +3714,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); - if (!map) - return; + unsigned long *map; + slab_err(s, page, text, s->name); slab_lock(page); - get_map(s, page, map); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) { if (!test_bit(slab_index(p, s, addr), map)) { @@ -3709,8 +3727,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page, print_tracking(s, p); } } + put_map(map); + slab_unlock(page); - bitmap_free(map); #endif } @@ -4404,19 +4423,19 @@ static int count_total(struct page *page) #endif #ifdef CONFIG_SLUB_DEBUG -static void validate_slab(struct kmem_cache *s, struct page *page, - unsigned long *map) +static void validate_slab(struct kmem_cache *s, struct page *page) { void *p; void *addr = page_address(page); + unsigned long *map; + + slab_lock(page); if (!check_slab(s, page) || !on_freelist(s, page, NULL)) - return; + goto unlock; /* Now we know that a valid freelist exists */ - bitmap_zero(map, page->objects); - - get_map(s, page, map); + map = get_map(s, page); for_each_object(p, s, addr, page->objects) { u8 val = test_bit(slab_index(p, s, addr), map) ? 
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; @@ -4424,18 +4443,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page, if (!check_object(s, page, p, val)) break; } -} - -static void validate_slab_slab(struct kmem_cache *s, struct page *page, - unsigned long *map) -{ - slab_lock(page); - validate_slab(s, page, map); + put_map(map); +unlock: slab_unlock(page); } static int validate_slab_node(struct kmem_cache *s, - struct kmem_cache_node *n, unsigned long *map) + struct kmem_cache_node *n) { unsigned long count = 0; struct page *page; @@ -4444,7 +4458,7 @@ static int validate_slab_node(struct kmem_cache *s, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) { - validate_slab_slab(s, page, map); + validate_slab(s, page); count++; } if (count != n->nr_partial) @@ -4455,7 +4469,7 @@ static int validate_slab_node(struct kmem_cache *s, goto out; list_for_each_entry(page, &n->full, slab_list) { - validate_slab_slab(s, page, map); + validate_slab(s, page); count++; } if (count != atomic_long_read(&n->nr_slabs)) @@ -4472,15 +4486,11 @@ static long validate_slab_cache(struct kmem_cache *s) int node; unsigned long count = 0; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - - if (!map) - return -ENOMEM; flush_all(s); for_each_kmem_cache_node(s, node, n) - count += validate_slab_node(s, n, map); - bitmap_free(map); + count += validate_slab_node(s, n); + return count; } /* @@ -4610,18 +4620,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s, } static void process_slab(struct loc_track *t, struct kmem_cache *s, - struct page *page, enum track_item alloc, - unsigned long *map) + struct page *page, enum track_item alloc) { void *addr = page_address(page); void *p; + unsigned long *map; - bitmap_zero(map, page->objects); - get_map(s, page, map); - + map = get_map(s, page); for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); + put_map(map); } static int list_locations(struct kmem_cache *s, char *buf, @@ -4632,11 +4641,9 @@ static int list_locations(struct kmem_cache *s, char *buf, struct loc_track t = { 0, 0, NULL }; int node; struct kmem_cache_node *n; - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); - if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), - GFP_KERNEL)) { - bitmap_free(map); + if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), + GFP_KERNEL)) { return sprintf(buf, "Out of memory\n"); } /* Push back cpu slabs */ @@ -4651,9 +4658,9 @@ static int list_locations(struct kmem_cache *s, char *buf, spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc); list_for_each_entry(page, &n->full, slab_list) - process_slab(&t, s, page, alloc, map); + process_slab(&t, s, page, alloc); spin_unlock_irqrestore(&n->list_lock, flags); } @@ -4702,7 +4709,6 @@ static int list_locations(struct kmem_cache *s, char *buf, } free_loc_track(&t); - bitmap_free(map); if (!t.count) len += sprintf(buf, "No data\n"); return len; -- 2.24.0.rc1.363.gb1bccd3e3d-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
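The mechanism of the v4 fix above is worth spelling out: a single statically allocated bitmap (object_map, sized for MAX_OBJS_PER_PAGE) is shared by all the debug walkers and guarded by its own spinlock, so the debug paths no longer allocate anything while list_lock is held; the price is that only one walker can borrow the scratch bitmap at a time, which is acceptable on these rarely used debug-only paths. A rough user-space model of that idiom -- not kernel code; the size, the pthread mutex and main() are purely illustrative -- looks like this:

	#include <assert.h>
	#include <limits.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <string.h>

	#define MAX_OBJS 32768	/* stands in for the kernel's MAX_OBJS_PER_PAGE */
	#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
	#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

	/* One scratch bitmap for everyone, guarded by one lock. */
	static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS)];
	static pthread_mutex_t object_map_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Borrow the shared bitmap; the caller must hand it back with put_map(). */
	static unsigned long *get_map(unsigned int nr_objects)
	{
		pthread_mutex_lock(&object_map_lock);
		memset(object_map, 0, BITS_TO_LONGS(nr_objects) * sizeof(unsigned long));
		/* the kernel code marks the free objects here by walking page->freelist */
		return object_map;
	}

	static void put_map(unsigned long *map)
	{
		/* only the shared bitmap may ever be handed back */
		assert(map == object_map);
		pthread_mutex_unlock(&object_map_lock);
	}

	int main(void)
	{
		unsigned long *map = get_map(128);
		printf("first word of the scratch map: %lx\n", map[0]);
		put_map(map);
		return 0;
	}

The kernel version additionally asserts irqs_disabled() before taking object_map_lock, documenting that its callers are expected to already run with interrupts off under list_lock or slab_lock.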
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-08 19:39 ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-11-09 20:52 ` Christopher Lameter 2019-11-09 23:01 ` Yu Zhao 0 siblings, 1 reply; 44+ messages in thread From: Christopher Lameter @ 2019-11-09 20:52 UTC (permalink / raw) To: Yu Zhao Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Fri, 8 Nov 2019, Yu Zhao wrote: > If we are already under list_lock, don't call kmalloc(). Otherwise we > will run into deadlock because kmalloc() also tries to grab the same > lock. How did this happen? The kmalloc needs to be always done before the list_lock is taken. > Fixing the problem by using a static bitmap instead. > > WARNING: possible recursive locking detected > -------------------------------------------- > mount-encrypted/4921 is trying to acquire lock: > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > > but task is already holding lock: > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&(&n->list_lock)->rlock); > lock(&(&n->list_lock)->rlock); > > *** DEADLOCK *** Ahh. list_slab_objects() in shutdown? There is a much easier fix for this: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() list_slab_objects() is called when a slab is destroyed and there are objects still left to list the objects in the syslog. This is a pretty rare event. And there it seems we take the list_lock and call kmalloc while holding that lock. Perform the allocation in free_partial() before the list_lock is taken. Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow") Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/slub.c =================================================================== --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000 +++ linux/mm/slub.c 2019-11-09 20:43:52.374187381 +0000 @@ -3690,14 +3690,11 @@ error: } static void list_slab_objects(struct kmem_cache *s, struct page *page, - const char *text) + const char *text, unsigned long *map) { #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); - if (!map) - return; slab_err(s, page, text, s->name); slab_lock(page); @@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac { LIST_HEAD(discard); struct page *page, *h; + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); + + if (!map) + return; BUG_ON(irqs_disabled()); spin_lock_irq(&n->list_lock); @@ -3732,7 +3733,8 @@ static void free_partial(struct kmem_cac list_add(&page->slab_list, &discard); } else { list_slab_objects(s, page, - "Objects remaining in %s on __kmem_cache_shutdown()"); + "Objects remaining in %s on __kmem_cache_shutdown()", + map); } } spin_unlock_irq(&n->list_lock); ^ permalink raw reply [flat|nested] 44+ messages in thread
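To make the recursion Christoph is pointing at concrete, the chain behind the lockdep report quoted earlier reconstructs roughly as follows (function names taken from the splat; the exact inlining and the lock instance hit by the nested allocation depend on the configuration):

	kmem_cache_destroy()
	  __kmem_cache_shutdown()
	    free_partial()
	      spin_lock_irq(&n->list_lock);        <-- first acquisition
	      list_slab_objects()
	        bitmap_zalloc(..., GFP_ATOMIC)     <-- i.e. kmalloc()
	          ___slab_alloc()                  <-- slow path, may pull a partial slab off the node list
	            spin_lock(&n->list_lock);      <-- same lock class: lockdep splat

Whether the nested allocation can actually hit the very cache being shut down is what the memcg discussion below turns on.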
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-09 20:52 ` Christopher Lameter @ 2019-11-09 23:01 ` Yu Zhao 2019-11-09 23:16 ` Christopher Lameter 0 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-11-09 23:01 UTC (permalink / raw) To: Christopher Lameter Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Sat, Nov 09, 2019 at 08:52:29PM +0000, Christopher Lameter wrote: > On Fri, 8 Nov 2019, Yu Zhao wrote: > > > If we are already under list_lock, don't call kmalloc(). Otherwise we > > will run into deadlock because kmalloc() also tries to grab the same > > lock. > > How did this happen? The kmalloc needs to be always done before the > list_lock is taken. > > > Fixing the problem by using a static bitmap instead. > > > > WARNING: possible recursive locking detected > > -------------------------------------------- > > mount-encrypted/4921 is trying to acquire lock: > > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 > > > > but task is already holding lock: > > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb > > > > other info that might help us debug this: > > Possible unsafe locking scenario: > > > > CPU0 > > ---- > > lock(&(&n->list_lock)->rlock); > > lock(&(&n->list_lock)->rlock); > > > > *** DEADLOCK *** > > > Ahh. list_slab_objects() in shutdown? > > There is a much easier fix for this: > > > > [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() > > list_slab_objects() is called when a slab is destroyed and there are objects still left > to list the objects in the syslog. This is a pretty rare event. > > And there it seems we take the list_lock and call kmalloc while holding that lock. > > Perform the allocation in free_partial() before the list_lock is taken. > > Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow") > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/slub.c > =================================================================== > --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000 > +++ linux/mm/slub.c 2019-11-09 20:43:52.374187381 +0000 > @@ -3690,14 +3690,11 @@ error: > } > > static void list_slab_objects(struct kmem_cache *s, struct page *page, > - const char *text) > + const char *text, unsigned long *map) > { > #ifdef CONFIG_SLUB_DEBUG > void *addr = page_address(page); > void *p; > - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); > - if (!map) > - return; > slab_err(s, page, text, s->name); > slab_lock(page); > > @@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac > { > LIST_HEAD(discard); > struct page *page, *h; > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); > + > + if (!map) > + return; What would happen if we are trying to allocate from the slab that is being shut down? And shouldn't the allocation be conditional (i.e., only when CONFIG_SLUB_DEBUG=y)? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-09 23:01 ` Yu Zhao @ 2019-11-09 23:16 ` Christopher Lameter 2019-11-10 18:47 ` Yu Zhao 0 siblings, 1 reply; 44+ messages in thread From: Christopher Lameter @ 2019-11-09 23:16 UTC (permalink / raw) To: Yu Zhao Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Sat, 9 Nov 2019, Yu Zhao wrote: > > struct page *page, *h; > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); > > + > > + if (!map) > > + return; > > What would happen if we are trying to allocate from the slab that is > being shut down? And shouldn't the allocation be conditional (i.e., > only when CONFIG_SLUB_DEBUG=y)? Kmalloc slabs are never shut down. The allocation does not hurt and CONFIG_SLUB_DEBUG is on in most configurations. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-09 23:16 ` Christopher Lameter @ 2019-11-10 18:47 ` Yu Zhao 2019-11-11 15:47 ` Christopher Lameter 0 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-11-10 18:47 UTC (permalink / raw) To: Christopher Lameter Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote: > On Sat, 9 Nov 2019, Yu Zhao wrote: > > > > struct page *page, *h; > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); > > > + > > > + if (!map) > > > + return; > > > > What would happen if we are trying to allocate from the slab that is > > being shut down? And shouldn't the allocation be conditional (i.e., > > only when CONFIG_SLUB_DEBUG=y)? > > Kmalloc slabs are never shut down. Maybe I'm not thinking straight -- isn't it what caused the deadlock in the first place? Kmalloc slabs can be shut down when memcg is on. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-10 18:47 ` Yu Zhao @ 2019-11-11 15:47 ` Christopher Lameter 2019-11-11 15:55 ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter 2019-11-11 18:15 ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Shakeel Butt 0 siblings, 2 replies; 44+ messages in thread From: Christopher Lameter @ 2019-11-11 15:47 UTC (permalink / raw) To: Yu Zhao Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Sun, 10 Nov 2019, Yu Zhao wrote: > On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote: > > On Sat, 9 Nov 2019, Yu Zhao wrote: > > > > > > struct page *page, *h; > > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); > > > > + > > > > + if (!map) > > > > + return; > > > > > > What would happen if we are trying to allocate from the slab that is > > > being shut down? And shouldn't the allocation be conditional (i.e., > > > only when CONFIG_SLUB_DEBUG=y)? > > > > Kmalloc slabs are never shut down. > > Maybe I'm not thinking straight -- isn't it what caused the deadlock in > the first place? Well if kmalloc allocations become a problem then we have numerous issues all over the kernel to fix. > Kmalloc slabs can be shut down when memcg is on. Kmalloc needs to work even during shutdown of a memcg. Maybe we need to fix memcg to not allocate from the current memcg during shutdown? ^ permalink raw reply [flat|nested] 44+ messages in thread
* [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2019-11-11 15:47 ` Christopher Lameter @ 2019-11-11 15:55 ` Christopher Lameter 2019-11-30 23:09 ` Andrew Morton 2019-11-11 18:15 ` [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock Shakeel Butt 1 sibling, 1 reply; 44+ messages in thread From: Christopher Lameter @ 2019-11-11 15:55 UTC (permalink / raw) To: Yu Zhao Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov Regardless of the issue with memcgs allowing allocations from its kmalloc array during shutdown: This patch cleans things up and properly allocates the bitmap outside of the list_lock. [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly. list_slab_objects() is called when a slab is destroyed and there are objects still left to list the objects in the syslog. This is a pretty rare event. And there it seems we take the list_lock and call kmalloc while holding that lock. Perform the allocation in free_partial() before the list_lock is taken. Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow") Signed-off-by: Christoph Lameter Index: linux/mm/slub.c =================================================================== --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000 +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000 @@ -3690,14 +3690,15 @@ error: } static void list_slab_objects(struct kmem_cache *s, struct page *page, - const char *text) + const char *text, unsigned long *map) { #ifdef CONFIG_SLUB_DEBUG void *addr = page_address(page); void *p; - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); + if (!map) return; + slab_err(s, page, text, s->name); slab_lock(page); @@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme } } slab_unlock(page); - bitmap_free(map); #endif } @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac { LIST_HEAD(discard); struct page *page, *h; + unsigned long *map = NULL; + +#ifdef CONFIG_SLUB_DEBUG + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); +#endif BUG_ON(irqs_disabled()); spin_lock_irq(&n->list_lock); @@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac list_add(&page->slab_list, &discard); } else { list_slab_objects(s, page, - "Objects remaining in %s on __kmem_cache_shutdown()"); + "Objects remaining in %s on __kmem_cache_shutdown()", + map); } } spin_unlock_irq(&n->list_lock); +#ifdef CONFIG_SLUB_DEBUG + bitmap_free(map); +#endif + list_for_each_entry_safe(page, h, &discard, slab_list) discard_slab(s, page); } ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2019-11-11 15:55 ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter @ 2019-11-30 23:09 ` Andrew Morton 2019-12-01 1:17 ` Tetsuo Handa 2019-12-02 15:12 ` Christopher Lameter 0 siblings, 2 replies; 44+ messages in thread From: Andrew Morton @ 2019-11-30 23:09 UTC (permalink / raw) To: Christopher Lameter Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Mon, 11 Nov 2019 15:55:05 +0000 (UTC) Christopher Lameter <cl@linux.com> wrote: > Regardless of the issue with memcgs allowing allocations from its > kmalloc array during shutdown: This patch cleans things up and properly > allocates the bitmap outside of the list_lock. > > > [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 > > V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly. > > list_slab_objects() is called when a slab is destroyed and there are objects still left > to list the objects in the syslog. This is a pretty rare event. > > And there it seems we take the list_lock and call kmalloc while holding that lock. > > Perform the allocation in free_partial() before the list_lock is taken. No response here? It looks a lot simpler than the originally proposed patch? > --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000 > +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000 > @@ -3690,14 +3690,15 @@ error: > } > > static void list_slab_objects(struct kmem_cache *s, struct page *page, > - const char *text) > + const char *text, unsigned long *map) > { > #ifdef CONFIG_SLUB_DEBUG > void *addr = page_address(page); > void *p; > - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); > + > if (!map) > return; > + > slab_err(s, page, text, s->name); > slab_lock(page); > > @@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme > } > } > slab_unlock(page); > - bitmap_free(map); > #endif > } > > @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac > { > LIST_HEAD(discard); > struct page *page, *h; > + unsigned long *map = NULL; > + > +#ifdef CONFIG_SLUB_DEBUG > + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); > +#endif > > BUG_ON(irqs_disabled()); > spin_lock_irq(&n->list_lock); > @@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac > list_add(&page->slab_list, &discard); > } else { > list_slab_objects(s, page, > - "Objects remaining in %s on __kmem_cache_shutdown()"); > + "Objects remaining in %s on __kmem_cache_shutdown()", > + map); > } > } > spin_unlock_irq(&n->list_lock); > > +#ifdef CONFIG_SLUB_DEBUG > + bitmap_free(map); > +#endif > + > list_for_each_entry_safe(page, h, &discard, slab_list) > discard_slab(s, page); > } ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2019-11-30 23:09 ` Andrew Morton @ 2019-12-01 1:17 ` Tetsuo Handa 2019-12-02 15:12 ` Christopher Lameter 1 sibling, 0 replies; 44+ messages in thread From: Tetsuo Handa @ 2019-12-01 1:17 UTC (permalink / raw) To: Andrew Morton, Christopher Lameter Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, linux-mm, linux-kernel, Kirill A . Shutemov On 2019/12/01 8:09, Andrew Morton wrote: >> Perform the allocation in free_partial() before the list_lock is taken. > > No response here? It looks a lot simpler than the originally proposed > patch? > >> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000 >> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000 >> @@ -3690,14 +3690,15 @@ error: >> } >> >> static void list_slab_objects(struct kmem_cache *s, struct page *page, >> - const char *text) >> + const char *text, unsigned long *map) >> { >> #ifdef CONFIG_SLUB_DEBUG >> void *addr = page_address(page); >> void *p; >> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); Changing from !(__GFP_IO | __GFP_FS) allocation to >> + >> if (!map) >> return; >> + >> slab_err(s, page, text, s->name); >> slab_lock(page); >> >> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac >> { >> LIST_HEAD(discard); >> struct page *page, *h; >> + unsigned long *map = NULL; >> + >> +#ifdef CONFIG_SLUB_DEBUG >> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); __GFP_IO | __GFP_FS allocation. How is this path guaranteed to be safe to perform __GFP_IO | __GFP_FS reclaim? >> +#endif >> >> BUG_ON(irqs_disabled()); >> spin_lock_irq(&n->list_lock); ^ permalink raw reply [flat|nested] 44+ messages in thread
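For readers less familiar with the gfp flags Tetsuo is contrasting: in kernels of this era the relevant definitions in include/linux/gfp.h read approximately as follows (simplified, __force casts omitted):

	#define GFP_ATOMIC	(__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)
	#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
	#define __GFP_RECLAIM	(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

So the original GFP_ATOMIC allocation could not enter direct reclaim and carried neither __GFP_IO nor __GFP_FS, whereas the GFP_KERNEL allocation moved in front of the lock may sleep and may recurse into the I/O and filesystem layers -- which is exactly the property being asked about here.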
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2019-11-30 23:09 ` Andrew Morton 2019-12-01 1:17 ` Tetsuo Handa @ 2019-12-02 15:12 ` Christopher Lameter 2019-12-07 22:03 ` Yu Zhao 1 sibling, 1 reply; 44+ messages in thread From: Christopher Lameter @ 2019-12-02 15:12 UTC (permalink / raw) To: Andrew Morton Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Sat, 30 Nov 2019, Andrew Morton wrote: > > Perform the allocation in free_partial() before the list_lock is taken. > > No response here? It looks a lot simpler than the originally proposed > patch? Yup. I prefer this one but its my own patch so I cannot Ack this. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2019-12-02 15:12 ` Christopher Lameter @ 2019-12-07 22:03 ` Yu Zhao 2020-01-10 14:11 ` Vlastimil Babka 0 siblings, 1 reply; 44+ messages in thread From: Yu Zhao @ 2019-12-07 22:03 UTC (permalink / raw) To: Christopher Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote: > On Sat, 30 Nov 2019, Andrew Morton wrote: > > > > Perform the allocation in free_partial() before the list_lock is taken. > > > > No response here? It looks a lot simpler than the originally proposed > > patch? > > Yup. I prefer this one but its my own patch so I cannot Ack this. Hi, there is a pending question from Tetsuo-san. I'd be happy to ack once it's addressed. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2019-12-07 22:03 ` Yu Zhao @ 2020-01-10 14:11 ` Vlastimil Babka 2020-01-12 11:03 ` Tetsuo Handa 0 siblings, 1 reply; 44+ messages in thread From: Vlastimil Babka @ 2020-01-10 14:11 UTC (permalink / raw) To: Yu Zhao, Christopher Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel, Kirill A . Shutemov On 12/7/19 11:03 PM, Yu Zhao wrote: > On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote: >> On Sat, 30 Nov 2019, Andrew Morton wrote: >> >>>> Perform the allocation in free_partial() before the list_lock is taken. >>> >>> No response here? It looks a lot simpler than the originally proposed >>> patch? >> >> Yup. I prefer this one but its my own patch so I cannot Ack this. > > Hi, there is a pending question from Tetsuo-san. I'd be happy to ack > once it's address. Tetsuo's mails don't reach linux-mm for a while and he has given up trying to do something about it. It's hard to discuss anything outside the direct CC group then. I don't know what's the pending question, for example. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2020-01-10 14:11 ` Vlastimil Babka @ 2020-01-12 11:03 ` Tetsuo Handa 2020-01-13 1:34 ` Christopher Lameter 0 siblings, 1 reply; 44+ messages in thread From: Tetsuo Handa @ 2020-01-12 11:03 UTC (permalink / raw) To: Vlastimil Babka, Yu Zhao, Christopher Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, linux-mm, linux-kernel, Kirill A . Shutemov On 2020/01/10 23:11, Vlastimil Babka wrote: > On 12/7/19 11:03 PM, Yu Zhao wrote: >> On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote: >>> On Sat, 30 Nov 2019, Andrew Morton wrote: >>> >>>>> Perform the allocation in free_partial() before the list_lock is taken. >>>> >>>> No response here? It looks a lot simpler than the originally proposed >>>> patch? >>> >>> Yup. I prefer this one but its my own patch so I cannot Ack this. >> >> Hi, there is a pending question from Tetsuo-san. I'd be happy to ack >> once it's address. > > Tetsuo's mails don't reach linux-mm for a while and he has given up > trying to do something about it. It's hard to discuss anything outside > the direct CC group then. I don't know what's the pending question, for > example. > Hmm, this one? Even non-ML destinations are sometimes rejected (e.g. 554 5.7.1 Service unavailable; Client host [202.181.97.72] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=202.181.97.72 ). Anyway, I just worried whether it is really safe to do memory allocation which might involve memory reclaim. You MM guys know better... -------- Forwarded Message -------- Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Message-ID: <54b6c6a1-f9e4-5002-c828-3084c5203489@i-love.sakura.ne.jp> Date: Sun, 1 Dec 2019 10:17:38 +0900 On 2019/12/01 8:09, Andrew Morton wrote: >> Perform the allocation in free_partial() before the list_lock is taken. > > No response here? It looks a lot simpler than the originally proposed > patch? > >> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000 >> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000 >> @@ -3690,14 +3690,15 @@ error: >> } >> >> static void list_slab_objects(struct kmem_cache *s, struct page *page, >> - const char *text) >> + const char *text, unsigned long *map) >> { >> #ifdef CONFIG_SLUB_DEBUG >> void *addr = page_address(page); >> void *p; >> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); Changing from !(__GFP_IO | __GFP_FS) allocation to >> + >> if (!map) >> return; >> + >> slab_err(s, page, text, s->name); >> slab_lock(page); >> >> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac >> { >> LIST_HEAD(discard); >> struct page *page, *h; >> + unsigned long *map = NULL; >> + >> +#ifdef CONFIG_SLUB_DEBUG >> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); __GFP_IO | __GFP_FS allocation. How is this path guaranteed to be safe to perform __GFP_IO | __GFP_FS reclaim? >> +#endif >> >> BUG_ON(irqs_disabled()); >> spin_lock_irq(&n->list_lock); ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 2020-01-12 11:03 ` Tetsuo Handa @ 2020-01-13 1:34 ` Christopher Lameter 0 siblings, 0 replies; 44+ messages in thread From: Christopher Lameter @ 2020-01-13 1:34 UTC (permalink / raw) To: Tetsuo Handa Cc: Vlastimil Babka, Yu Zhao, Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Kirill A . Shutemov, linux-mm, linux-kernel, Kirill A . Shutemov On Sun, 12 Jan 2020, Tetsuo Handa wrote: > On 2020/01/10 23:11, Vlastimil Babka wrote: > Hmm, this one? Even non-ML destinations are sometimes rejected (e.g. > 554 5.7.1 Service unavailable; Client host [202.181.97.72] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=202.181.97.72 > ). Anyway, I just worried whether it is really safe to do memory allocation > which might involve memory reclaim. You MM guys know better... We are talking about a call to destroy the kmem_cache. This is not done under any lock. The lock was taken inside that function before the call to list_slab_objects. That can be avoided. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock 2019-11-11 15:47 ` Christopher Lameter 2019-11-11 15:55 ` [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2 Christopher Lameter @ 2019-11-11 18:15 ` Shakeel Butt 1 sibling, 0 replies; 44+ messages in thread From: Shakeel Butt @ 2019-11-11 18:15 UTC (permalink / raw) To: Christopher Lameter, Roman Gushchin Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, Linux MM, LKML, Kirill A . Shutemov +Roman Gushchin On Mon, Nov 11, 2019 at 7:47 AM Christopher Lameter <cl@linux.com> wrote: > > On Sun, 10 Nov 2019, Yu Zhao wrote: > > > On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote: > > > On Sat, 9 Nov 2019, Yu Zhao wrote: > > > > > > > > struct page *page, *h; > > > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); > > > > > + > > > > > + if (!map) > > > > > + return; > > > > > > > > What would happen if we are trying to allocate from the slab that is > > > > being shut down? And shouldn't the allocation be conditional (i.e., > > > > only when CONFIG_SLUB_DEBUG=y)? > > > > > > Kmalloc slabs are never shut down. > > > > Maybe I'm not thinking straight -- isn't it what caused the deadlock in > > the first place? > > Well if kmalloc allocations become a problem then we have numerous > issues all over the kernel to fix. > > > Kmalloc slabs can be shut down when memcg is on. > > Kmalloc needs to work even during shutdown of a memcg. > > Maybe we need to fix memcg to not allocate from the current memcg during > shutdown? > > Roman recently added reparenting of memcg kmem caches on memcg offline and can comment in more detail but we don't shutdown a kmem cache until all the in-fly memcg allocations are resolved. Also the allocation here does not look like a __GFP_ACCOUNT allocation. ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 3/3] mm: lock slub page when listing objects 2019-09-12 0:29 ` [PATCH 1/3] mm: correct mask size for slub page->objects Yu Zhao 2019-09-12 0:29 ` [PATCH 2/3] mm: avoid slub allocation while holding list_lock Yu Zhao @ 2019-09-12 0:29 ` Yu Zhao 1 sibling, 0 replies; 44+ messages in thread From: Yu Zhao @ 2019-09-12 0:29 UTC (permalink / raw) To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Kirill A . Shutemov, Tetsuo Handa Cc: linux-mm, linux-kernel, Yu Zhao Though I have no idea what the side effect of a race would be, apparently we want to prevent the free list from being changed while debugging objects in general. Signed-off-by: Yu Zhao <yuzhao@google.com> --- mm/slub.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index f28072c9f2ce..2734a092bbff 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4594,10 +4594,14 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s, void *addr = page_address(page); void *p; + slab_lock(page); + get_map(s, page, map); for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); + + slab_unlock(page); } static int list_locations(struct kmem_cache *s, char *buf, -- 2.23.0.162.g0b9fbb3734-goog ^ permalink raw reply related [flat|nested] 44+ messages in thread
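For context on what the extra locking buys: slab_lock()/slab_unlock() are per-slab-page locks, implemented in mm/slub.c of this era approximately as a bit spinlock on the page flags:

	static __always_inline void slab_lock(struct page *page)
	{
		VM_BUG_ON_PAGE(PageTail(page), page);
		bit_spin_lock(PG_locked, &page->flags);
	}

	static __always_inline void slab_unlock(struct page *page)
	{
		VM_BUG_ON_PAGE(PageTail(page), page);
		__bit_spin_unlock(PG_locked, &page->flags);
	}

Taking it here brings process_slab() in line with the other get_map() callers -- list_slab_objects() and the validate_slab() path -- which already walk the object map with slab_lock() held.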