* [PATCH v2 1/4] mm: correct mask size for slub page->objects
@ 2019-09-12 2:31 ` Yu Zhao
0 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao
The mask of slub objects per page shouldn't be larger than what
page->objects can hold.
It requires more than 2^15 objects per page to hit the problem, and I
don't think anybody would ever get there. It'd be nice to have the mask
fixed, but it's not really worth cc'ing stable.
Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..62053ceb4464 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -187,7 +187,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
*/
#define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
-#define OO_SHIFT 16
+#define OO_SHIFT 15
#define OO_MASK ((1 << OO_SHIFT) - 1)
#define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */
@@ -343,6 +343,8 @@ static inline unsigned int oo_order(struct kmem_cache_order_objects x)
static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
{
+ BUILD_BUG_ON(OO_MASK > MAX_OBJS_PER_PAGE);
+
return x.x & OO_MASK;
}
--
2.23.0.162.g0b9fbb3734-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread
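The packing this patch adjusts can be sketched as a standalone userspace analogue (the names mirror SLUB's, but this is an illustrative sketch, not the kernel code): the low OO_SHIFT bits of kmem_cache_order_objects hold the object count and the bits above hold the page order, so the mask must not be wider than the 15 bits that page->objects actually has.

```c
#include <assert.h>

/* Userspace sketch of SLUB's kmem_cache_order_objects packing: the low
 * OO_SHIFT bits hold the object count, the bits above hold the page
 * order. page->objects is a 15-bit field, so the mask must not exceed
 * 15 bits -- hence the OO_SHIFT 16 -> 15 change in the patch. */
#define OO_SHIFT 15
#define OO_MASK  ((1U << OO_SHIFT) - 1)
#define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */

struct kmem_cache_order_objects { unsigned int x; };

static struct kmem_cache_order_objects oo_make(unsigned int order,
					       unsigned int objects)
{
	struct kmem_cache_order_objects x = {
		(order << OO_SHIFT) + objects
	};
	return x;
}

static unsigned int oo_order(struct kmem_cache_order_objects x)
{
	return x.x >> OO_SHIFT;
}

static unsigned int oo_objects(struct kmem_cache_order_objects x)
{
	return x.x & OO_MASK;
}
```

With OO_SHIFT at 16, OO_MASK would be 0xffff, one bit wider than what page->objects can hold; the BUILD_BUG_ON added by the patch guards exactly that invariant.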
* [PATCH v2 2/4] mm: clean up validate_slab()
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 2:31 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao
The function doesn't need to return any value, and the check can be
done in one pass.
There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior wasn't intentional anyway.
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 62053ceb4464..7b7e1ee264ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4386,31 +4386,26 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
unsigned long *map)
{
void *p;
void *addr = page_address(page);
- if (!check_slab(s, page) ||
- !on_freelist(s, page, NULL))
- return 0;
+ if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+ return;
/* Now we know that a valid freelist exists */
bitmap_zero(map, page->objects);
get_map(s, page, map);
for_each_object(p, s, addr, page->objects) {
- if (test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_INACTIVE))
- return 0;
- }
+ u8 val = test_bit(slab_index(p, s, addr), map) ?
+ SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
- for_each_object(p, s, addr, page->objects)
- if (!test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_ACTIVE))
- return 0;
- return 1;
+ if (!check_object(s, page, p, val))
+ break;
+ }
}
static void validate_slab_slab(struct kmem_cache *s, struct page *page,
--
2.23.0.162.g0b9fbb3734-goog
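The rewrite above folds two loops into one by deriving each object's expected state from the free-object bitmap. The same shape in a minimal, self-contained analogue (all names here are hypothetical stand-ins, not the actual SLUB functions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: validate every slot in one pass, picking the
 * expected state from the "free" bitmap instead of walking the slots
 * twice (once for free objects, once for in-use ones). */
enum state { ACTIVE, INACTIVE };

static bool check_object(enum state expected, enum state actual)
{
	return expected == actual;
}

/* Returns the index of the first invalid slot, or n if all are valid. */
static unsigned validate(const enum state *actual, const bool *is_free,
			 unsigned n)
{
	for (unsigned i = 0; i < n; i++) {
		enum state expected = is_free[i] ? INACTIVE : ACTIVE;

		if (!check_object(expected, actual[i]))
			return i;
	}
	return n;
}
```

Like the patched validate_slab(), this stops at the first invalid object regardless of whether it is free or in use.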
* Re: [PATCH v2 2/4] mm: clean up validate_slab()
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 9:46 ` Kirill A. Shutemov
-1 siblings, 0 replies; 64+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 9:46 UTC (permalink / raw)
To: Yu Zhao
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Wed, Sep 11, 2019 at 08:31:09PM -0600, Yu Zhao wrote:
> The function doesn't need to return any value, and the check can be
> done in one pass.
>
> There is a behavior change: before the patch, we stop at the first
> invalid free object; after the patch, we stop at the first invalid
> object, free or in use. This shouldn't matter because the original
> behavior isn't intended anyway.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
> mm/slub.c | 21 ++++++++-------------
> 1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 62053ceb4464..7b7e1ee264ef 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4386,31 +4386,26 @@ static int count_total(struct page *page)
> #endif
>
> #ifdef CONFIG_SLUB_DEBUG
> -static int validate_slab(struct kmem_cache *s, struct page *page,
> +static void validate_slab(struct kmem_cache *s, struct page *page,
> unsigned long *map)
> {
> void *p;
> void *addr = page_address(page);
>
> - if (!check_slab(s, page) ||
> - !on_freelist(s, page, NULL))
> - return 0;
> + if (!check_slab(s, page) || !on_freelist(s, page, NULL))
> + return;
>
> /* Now we know that a valid freelist exists */
> bitmap_zero(map, page->objects);
>
> get_map(s, page, map);
> for_each_object(p, s, addr, page->objects) {
> - if (test_bit(slab_index(p, s, addr), map))
> - if (!check_object(s, page, p, SLUB_RED_INACTIVE))
> - return 0;
> - }
> + u8 val = test_bit(slab_index(p, s, addr), map) ?
> + SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
Proper 'if' would be more readable.
Other than that, it looks fine to me.
>
> - for_each_object(p, s, addr, page->objects)
> - if (!test_bit(slab_index(p, s, addr), map))
> - if (!check_object(s, page, p, SLUB_RED_ACTIVE))
> - return 0;
> - return 1;
> + if (!check_object(s, page, p, val))
> + break;
> + }
> }
>
> static void validate_slab_slab(struct kmem_cache *s, struct page *page,
> --
> 2.23.0.162.g0b9fbb3734-goog
>
--
Kirill A. Shutemov
* [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 2:31 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.
Fix the problem by using a static bitmap instead.
WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);
*** DEADLOCK ***
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 7b7e1ee264ef..baa60dd73942 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -443,19 +443,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}
#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}
static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3685,13 +3704,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3699,8 +3717,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}
@@ -4386,19 +4405,19 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);
if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;
/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4406,18 +4425,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}
static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4426,7 +4440,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4437,7 +4451,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;
list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4454,15 +4468,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;
flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4592,18 +4602,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}
static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;
- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}
static int list_locations(struct kmem_cache *s, char *buf,
@@ -4614,11 +4623,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4633,9 +4640,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}
@@ -4684,7 +4691,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}
free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.23.0.162.g0b9fbb3734-goog
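The get_map()/put_map() pairing introduced by this patch is an instance of a general pattern: one statically allocated scratch buffer, serialized by its own lock, so that debug paths already holding list_lock never call an allocator. A userspace sketch with pthreads (an illustration under assumed names, not the kernel implementation, which uses a spinlock and requires interrupts to be disabled):

```c
#include <pthread.h>
#include <string.h>
#include <assert.h>

#define MAX_OBJS 128

/* One static scratch bitmap, serialized by its own lock, so callers
 * that already hold other locks never have to allocate memory. */
static unsigned char object_map[MAX_OBJS];
static pthread_mutex_t object_map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Acquire exclusive use of the scratch buffer, zeroed for the caller. */
static unsigned char *get_map(void)
{
	pthread_mutex_lock(&object_map_lock);
	memset(object_map, 0, sizeof(object_map));
	return object_map;
}

/* Release the buffer; callers must hand back what they were given. */
static void put_map(unsigned char *map)
{
	assert(map == object_map);
	pthread_mutex_unlock(&object_map_lock);
}
```

The trade-off is serialization: only one user of the scratch bitmap at a time, which is acceptable for debug-only paths but would not be for hot paths.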
* Re: [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 10:04 ` Kirill A. Shutemov
-1 siblings, 0 replies; 64+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 10:04 UTC (permalink / raw)
To: Yu Zhao
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Wed, Sep 11, 2019 at 08:31:10PM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Fixing the problem by using a static bitmap instead.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&n->list_lock)->rlock);
> lock(&(&n->list_lock)->rlock);
>
> *** DEADLOCK ***
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* [PATCH v2 4/4] mm: lock slub page when listing objects
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 2:31 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-12 2:31 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao
Though I have no idea what the side effect of such a race would be,
apparently we want to prevent the free list from being changed
while debugging the objects.
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c
index baa60dd73942..1c9726c28f0b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4608,11 +4608,15 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
void *p;
unsigned long *map;
+ slab_lock(page);
+
map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
put_map(map);
+
+ slab_unlock(page);
}
static int list_locations(struct kmem_cache *s, char *buf,
--
2.23.0.162.g0b9fbb3734-goog
* Re: [PATCH v2 4/4] mm: lock slub page when listing objects
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 10:06 ` Kirill A. Shutemov
2019-09-12 21:12 ` Yu Zhao
-1 siblings, 1 reply; 64+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 10:06 UTC (permalink / raw)
To: Yu Zhao
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote:
> Though I have no idea what the side effect of such race would be,
> apparently we want to prevent the free list from being changed
> while debugging the objects.
Codewise looks good to me. But commit message can be better.
Can we rephrase it to indicate that slab_lock is required to protect
page->objects?
--
Kirill A. Shutemov
* Re: [PATCH v2 4/4] mm: lock slub page when listing objects
2019-09-12 10:06 ` Kirill A. Shutemov
@ 2019-09-12 21:12 ` Yu Zhao
0 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-12 21:12 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Thu, Sep 12, 2019 at 01:06:42PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote:
> > Though I have no idea what the side effect of such race would be,
> > apparently we want to prevent the free list from being changed
> > while debugging the objects.
>
> Codewise looks good to me. But commit message can be better.
>
> Can we rephrase it to indicate that slab_lock is required to protect
> page->objects?
Will do.
* Re: [PATCH v2 4/4] mm: lock slub page when listing objects
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-13 14:58 ` Christopher Lameter
-1 siblings, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2019-09-13 14:58 UTC (permalink / raw)
To: Yu Zhao
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel
On Wed, 11 Sep 2019, Yu Zhao wrote:
> Though I have no idea what the side effect of such race would be,
> apparently we want to prevent the free list from being changed
> while debugging the objects.
process_slab() is called under the list_lock, which prevents any
allocation from the freelist in the slab page. New objects can still be
added to the freelist, but that happens by updating the freelist pointer
in the slab page with a pointer to the newly freed object, which in turn
contains the old freelist pointer.
It is therefore possible to safely traverse the objects on the freelist
after the pointer has been retrieved.
NAK.
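The safety argument in the NAK rests on frees prepending to a singly linked freelist: a head pointer snapshotted earlier still begins a chain whose links never change afterwards. A generic C illustration (not SLUB's actual structures):

```c
#include <assert.h>
#include <stddef.h>

struct obj { struct obj *next; };

/* Freeing prepends: the new head points at the old head, so any
 * previously saved head still starts a valid, unchanged chain. */
static void freelist_push(struct obj **head, struct obj *o)
{
	o->next = *head;
	*head = o;
}

static size_t freelist_len(const struct obj *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

A snapshot taken before a later free sees a shorter but fully consistent list, which is why traversal after retrieving the pointer is safe as long as allocations (which would unlink objects) are excluded.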
* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-12 9:40 ` Kirill A. Shutemov
2019-09-12 21:11 ` Yu Zhao
-1 siblings, 1 reply; 64+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 9:40 UTC (permalink / raw)
To: Yu Zhao
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> Mask of slub objects per page shouldn't be larger than what
> page->objects can hold.
>
> It requires more than 2^15 objects to hit the problem, and I don't
> think anybody would. It'd be nice to have the mask fixed, but not
> really worth cc'ing the stable.
>
> Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> Signed-off-by: Yu Zhao <yuzhao@google.com>
I don't think the patch fixes anything.
Yes, we have one spare bit between the order and the number of objects
that is not used and is always zero. So what?
I can imagine that for some microarchitectures accessing the higher 16
bits of an int is cheaper than shifting by 15.
--
Kirill A. Shutemov
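The micro-architecture remark can be made concrete: with a shift of 16 the order occupies exactly the upper halfword of the 32-bit word, which a compiler may fetch as a plain 16-bit load, while a shift of 15 requires an actual shift instruction. A hedged sketch of the two extractions (illustrative only; neither function is kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* With a shift of 16 the order occupies the upper 16 bits exactly, so
 * extraction can be a halfword access; with a shift of 15 it needs a
 * real shift. Both recover the same packed values. */
static unsigned order_shift16(uint32_t x) { return x >> 16; }
static unsigned order_shift15(uint32_t x) { return x >> 15; }

static uint32_t pack(unsigned order, unsigned objects, unsigned shift)
{
	return ((uint32_t)order << shift) | objects;
}
```

Whether the halfword access is actually cheaper depends on the target; the point is only that the unused bit is not automatically free to reclaim.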
* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects
2019-09-12 9:40 ` [PATCH v2 1/4] mm: correct mask size for slub page->objects Kirill A. Shutemov
@ 2019-09-12 21:11 ` Yu Zhao
2019-09-12 22:03 ` Kirill A. Shutemov
0 siblings, 1 reply; 64+ messages in thread
From: Yu Zhao @ 2019-09-12 21:11 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> > Mask of slub objects per page shouldn't be larger than what
> > page->objects can hold.
> >
> > It requires more than 2^15 objects to hit the problem, and I don't
> > think anybody would. It'd be nice to have the mask fixed, but not
> > really worth cc'ing the stable.
> >
> > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
>
> I don't think the patch fixes anything.
Technically it does. It makes no sense for a mask to have more bits
than the variable that holds the masked value. I had to look up the
commit history to find out why and go through the code to make sure
it doesn't actually cause any problem.
My hope is that nobody else would have to go through the same trouble.
> Yes, we have one spare bit between order and number of objects that is not
> used and always zero. So what?
>
> I can imagine that for some microarchitectures accessing the higher 16 bits
> of an int is cheaper than shifting by 15.
Well, I highly doubt the inconsistency was intended for such an
optimization, even if it existed.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects
2019-09-12 21:11 ` Yu Zhao
@ 2019-09-12 22:03 ` Kirill A. Shutemov
0 siblings, 0 replies; 64+ messages in thread
From: Kirill A. Shutemov @ 2019-09-12 22:03 UTC (permalink / raw)
To: Yu Zhao
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Thu, Sep 12, 2019 at 03:11:14PM -0600, Yu Zhao wrote:
> On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> > > Mask of slub objects per page shouldn't be larger than what
> > > page->objects can hold.
> > >
> > > It requires more than 2^15 objects to hit the problem, and I don't
> > > think anybody would. It'd be nice to have the mask fixed, but not
> > > really worth cc'ing the stable.
> > >
> > > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> >
> > I don't think the patch fixes anything.
>
> Technically it does. It makes no sense for a mask to have more bits
> than the variable that holds the masked value. I had to look up the
> commit history to find out why and go through the code to make sure
> it doesn't actually cause any problem.
>
> My hope is that nobody else would have to go through the same trouble.
Just put some comments then.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v3 1/2] mm: clean up validate_slab()
2019-09-12 2:31 ` Yu Zhao
@ 2019-09-14 0:07 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-14 0:07 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao
The function doesn't need to return any value, and the check can be
done in one pass.
There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..445ef8b2aec0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4384,31 +4384,26 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
unsigned long *map)
{
void *p;
void *addr = page_address(page);
- if (!check_slab(s, page) ||
- !on_freelist(s, page, NULL))
- return 0;
+ if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+ return;
/* Now we know that a valid freelist exists */
bitmap_zero(map, page->objects);
get_map(s, page, map);
for_each_object(p, s, addr, page->objects) {
- if (test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_INACTIVE))
- return 0;
- }
+ u8 val = test_bit(slab_index(p, s, addr), map) ?
+ SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
- for_each_object(p, s, addr, page->objects)
- if (!test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_ACTIVE))
- return 0;
- return 1;
+ if (!check_object(s, page, p, val))
+ break;
+ }
}
static void validate_slab_slab(struct kmem_cache *s, struct page *page,
--
2.23.0.237.gc6a4ce50a0-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock
2019-09-14 0:07 ` Yu Zhao
@ 2019-09-14 0:07 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-09-14 0:07 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.
Fix the problem by using a static bitmap instead.
WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);
*** DEADLOCK ***
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 445ef8b2aec0..c1de01730648 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}
#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}
static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3683,13 +3702,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3697,8 +3715,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}
@@ -4384,19 +4403,19 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);
if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;
/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4404,18 +4423,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}
static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4424,7 +4438,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4435,7 +4449,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;
list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4452,15 +4466,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;
flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4590,18 +4600,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}
static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;
- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}
static int list_locations(struct kmem_cache *s, char *buf,
@@ -4612,11 +4621,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4631,9 +4638,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}
@@ -4682,7 +4689,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}
free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.23.0.237.gc6a4ce50a0-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread
* Re: [PATCH v3 1/2] mm: clean up validate_slab()
2019-09-14 0:07 ` Yu Zhao
@ 2019-09-16 8:39 ` Kirill A. Shutemov
-1 siblings, 0 replies; 64+ messages in thread
From: Kirill A. Shutemov @ 2019-09-16 8:39 UTC (permalink / raw)
To: Yu Zhao
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Tetsuo Handa, linux-mm, linux-kernel
On Fri, Sep 13, 2019 at 06:07:42PM -0600, Yu Zhao wrote:
> The function doesn't need to return any value, and the check can be
> done in one pass.
>
> There is a behavior change: before the patch, we stop at the first
> invalid free object; after the patch, we stop at the first invalid
> object, free or in use. This shouldn't matter because the original
> behavior isn't intended anyway.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 1/2] mm: clean up validate_slab()
2019-09-14 0:07 ` Yu Zhao
@ 2019-11-08 19:39 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov
The function doesn't need to return any value, and the check can be
done in one pass.
There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index b25c807a111f..6930c3febad7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4404,31 +4404,26 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
unsigned long *map)
{
void *p;
void *addr = page_address(page);
- if (!check_slab(s, page) ||
- !on_freelist(s, page, NULL))
- return 0;
+ if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+ return;
/* Now we know that a valid freelist exists */
bitmap_zero(map, page->objects);
get_map(s, page, map);
for_each_object(p, s, addr, page->objects) {
- if (test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_INACTIVE))
- return 0;
- }
+ u8 val = test_bit(slab_index(p, s, addr), map) ?
+ SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
- for_each_object(p, s, addr, page->objects)
- if (!test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_ACTIVE))
- return 0;
- return 1;
+ if (!check_object(s, page, p, val))
+ break;
+ }
}
static void validate_slab_slab(struct kmem_cache *s, struct page *page,
--
2.24.0.rc1.363.gb1bccd3e3d-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-08 19:39 ` Yu Zhao
@ 2019-11-08 19:39 ` Yu Zhao
-1 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.
Fix the problem by using a static bitmap instead.
WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);
*** DEADLOCK ***
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 6930c3febad7..7a4ec3c4b4d9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}
#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}
static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3695,13 +3714,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3709,8 +3727,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}
@@ -4404,19 +4423,19 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);
if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;
/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4424,18 +4443,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}
static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4444,7 +4458,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4455,7 +4469,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;
list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4472,15 +4486,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;
flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4610,18 +4620,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}
static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;
- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}
static int list_locations(struct kmem_cache *s, char *buf,
@@ -4632,11 +4641,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4651,9 +4658,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}
@@ -4702,7 +4709,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}
free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.24.0.rc1.363.gb1bccd3e3d-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
@ 2019-11-08 19:39 ` Yu Zhao
0 siblings, 0 replies; 64+ messages in thread
From: Yu Zhao @ 2019-11-08 19:39 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa
Cc: linux-mm, linux-kernel, Yu Zhao, Kirill A . Shutemov
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into a deadlock because kmalloc() also tries to grab the same
lock.
Fix the problem by using a static bitmap instead.
WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);
*** DEADLOCK ***
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 6930c3febad7..7a4ec3c4b4d9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}
#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}
static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3695,13 +3714,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3709,8 +3727,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}
@@ -4404,19 +4423,19 @@ static int count_total(struct page *page)
#endif
#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);
if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;
/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4424,18 +4443,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}
static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4444,7 +4458,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4455,7 +4469,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;
list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4472,15 +4486,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;
flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4610,18 +4620,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}
static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;
- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}
static int list_locations(struct kmem_cache *s, char *buf,
@@ -4632,11 +4641,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4651,9 +4658,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}
@@ -4702,7 +4709,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}
free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.24.0.rc1.363.gb1bccd3e3d-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread
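The core of the patch above is the get_map()/put_map() pairing: one static bitmap serialized by its own dedicated lock, so the debug paths never allocate memory while list_lock is held. A rough userspace sketch of that shape, using a pthread mutex in place of the kernel spinlock (the sizes, the free-index input, and all names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <string.h>

/* Illustrative stand-ins for the kernel's MAX_OBJS_PER_PAGE bitmap. */
#define MAX_OBJS 128
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* One static map shared by all callers, serialized by its own lock --
 * no allocator is entered while a caller already holds list_lock. */
static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS)];
static pthread_mutex_t object_map_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long *get_map(const int *free_indices, int nfree)
{
	pthread_mutex_lock(&object_map_lock);
	memset(object_map, 0, sizeof(object_map));
	/* Mark free objects, as the kernel's freelist walk does. */
	for (int i = 0; i < nfree; i++)
		object_map[free_indices[i] / BITS_PER_LONG] |=
			1UL << (free_indices[i] % BITS_PER_LONG);
	return object_map;
}

static void put_map(unsigned long *map)
{
	assert(map == object_map);      /* only the static map is valid */
	pthread_mutex_unlock(&object_map_lock);
}

/* Example user: count the bits set in the map, then release it. */
int count_free(const int *free_indices, int nfree)
{
	unsigned long *map = get_map(free_indices, nfree);
	int count = 0;

	for (int i = 0; i < MAX_OBJS; i++)
		if (map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			count++;
	put_map(map);
	return count;
}
```

The trade-off is serialization: only one debug walker can hold the map at a time, which is acceptable because these are rare diagnostic paths.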
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-08 19:39 ` Yu Zhao
@ 2019-11-09 20:52 ` Christopher Lameter
-1 siblings, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2019-11-09 20:52 UTC (permalink / raw)
To: Yu Zhao
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Fri, 8 Nov 2019, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
How did this happen? The kmalloc needs to be always done before the
list_lock is taken.
> Fixing the problem by using a static bitmap instead.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&n->list_lock)->rlock);
> lock(&(&n->list_lock)->rlock);
>
> *** DEADLOCK ***
Ahh. list_slab_objects() in shutdown?
There is a much easier fix for this:
[FIX] slub: Remove kmalloc under list_lock from list_slab_objects()
list_slab_objects() is called when a slab is destroyed and there are objects still left,
to list those objects in the syslog. This is a pretty rare event.
And there it seems we take the list_lock and call kmalloc() while holding that lock.
Perform the allocation in free_partial() before the list_lock is taken.
Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
+++ linux/mm/slub.c 2019-11-09 20:43:52.374187381 +0000
@@ -3690,14 +3690,11 @@ error:
}
static void list_slab_objects(struct kmem_cache *s, struct page *page,
- const char *text)
+ const char *text, unsigned long *map)
{
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
slab_err(s, page, text, s->name);
slab_lock(page);
@@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
{
LIST_HEAD(discard);
struct page *page, *h;
+ unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
+
+ if (!map)
+ return;
BUG_ON(irqs_disabled());
spin_lock_irq(&n->list_lock);
@@ -3732,7 +3733,8 @@ static void free_partial(struct kmem_cac
list_add(&page->slab_list, &discard);
} else {
list_slab_objects(s, page,
- "Objects remaining in %s on __kmem_cache_shutdown()");
+ "Objects remaining in %s on __kmem_cache_shutdown()",
+ map);
}
}
spin_unlock_irq(&n->list_lock);
^ permalink raw reply [flat|nested] 64+ messages in thread
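The pattern in Christoph's alternative fix -- perform the allocation before taking the lock and pass the buffer into the function that runs under it -- can be sketched generically. This is a hedged illustration with hypothetical names and a pthread mutex standing in for list_lock, not the kernel code itself:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for list_slab_objects(): it used to allocate
 * its scratch buffer internally; now the caller supplies it. */
static int report_remaining(const int *objects, int n, unsigned char *scratch)
{
	int remaining = 0;

	/* scratch was allocated by the caller BEFORE the lock was taken,
	 * so no allocator (and no allocator lock) is entered here. */
	memset(scratch, 0, n);
	for (int i = 0; i < n; i++)
		if (objects[i]) {       /* nonzero == object still live */
			scratch[i] = 1;
			remaining++;
		}
	return remaining;
}

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for free_partial(): allocate outside the
 * critical section, use inside it, free afterwards. */
int free_partial(const int *objects, int n)
{
	unsigned char *scratch = malloc(n);

	if (!scratch)
		return -1;

	/* Holding list_lock can no longer recurse into the allocator. */
	pthread_mutex_lock(&list_lock);
	int remaining = report_remaining(objects, n, scratch);
	pthread_mutex_unlock(&list_lock);

	free(scratch);
	return remaining;
}
```

The cost is one allocation per shutdown even when no objects remain, which is cheap for such a rare path.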
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-09 20:52 ` Christopher Lameter
(?)
@ 2019-11-09 23:01 ` Yu Zhao
2019-11-09 23:16 ` Christopher Lameter
-1 siblings, 1 reply; 64+ messages in thread
From: Yu Zhao @ 2019-11-09 23:01 UTC (permalink / raw)
To: Christopher Lameter
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Sat, Nov 09, 2019 at 08:52:29PM +0000, Christopher Lameter wrote:
> On Fri, 8 Nov 2019, Yu Zhao wrote:
>
> > If we are already under list_lock, don't call kmalloc(). Otherwise we
> > will run into deadlock because kmalloc() also tries to grab the same
> > lock.
>
> How did this happen? The kmalloc needs to be always done before the
> list_lock is taken.
>
> > Fixing the problem by using a static bitmap instead.
> >
> > WARNING: possible recursive locking detected
> > --------------------------------------------
> > mount-encrypted/4921 is trying to acquire lock:
> > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >
> > but task is already holding lock:
> > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&(&n->list_lock)->rlock);
> > lock(&(&n->list_lock)->rlock);
> >
> > *** DEADLOCK ***
>
>
> Ahh. list_slab_objects() in shutdown?
>
> There is a much easier fix for this:
>
>
>
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects()
>
> list_slab_objects() is called when a slab is destroyed and there are objects still left
> to list the objects in the syslog. This is a pretty rare event.
>
> And there it seems we take the list_lock and call kmalloc while holding that lock.
>
> Perform the allocation in free_partial() before the list_lock is taken.
>
> Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c 2019-11-09 20:43:52.374187381 +0000
> @@ -3690,14 +3690,11 @@ error:
> }
>
> static void list_slab_objects(struct kmem_cache *s, struct page *page,
> - const char *text)
> + const char *text, unsigned long *map)
> {
> #ifdef CONFIG_SLUB_DEBUG
> void *addr = page_address(page);
> void *p;
> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> - if (!map)
> - return;
> slab_err(s, page, text, s->name);
> slab_lock(page);
>
> @@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
> {
> LIST_HEAD(discard);
> struct page *page, *h;
> + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +
> + if (!map)
> + return;
What would happen if we are trying to allocate from the slab that is
being shut down? And shouldn't the allocation be conditional (i.e.,
only when CONFIG_SLUB_DEBUG=y)?
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-09 23:01 ` Yu Zhao
@ 2019-11-09 23:16 ` Christopher Lameter
0 siblings, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2019-11-09 23:16 UTC (permalink / raw)
To: Yu Zhao
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Sat, 9 Nov 2019, Yu Zhao wrote:
> > struct page *page, *h;
> > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > +
> > + if (!map)
> > + return;
>
> What would happen if we are trying to allocate from the slab that is
> being shut down? And shouldn't the allocation be conditional (i.e.,
> only when CONFIG_SLUB_DEBUG=y)?
Kmalloc slabs are never shut down.
The allocation does not hurt and CONFIG_SLUB_DEBUG is on in most
configurations.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-09 23:16 ` Christopher Lameter
(?)
@ 2019-11-10 18:47 ` Yu Zhao
2019-11-11 15:47 ` Christopher Lameter
-1 siblings, 1 reply; 64+ messages in thread
From: Yu Zhao @ 2019-11-10 18:47 UTC (permalink / raw)
To: Christopher Lameter
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> On Sat, 9 Nov 2019, Yu Zhao wrote:
>
> > > struct page *page, *h;
> > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > +
> > > + if (!map)
> > > + return;
> >
> > What would happen if we are trying to allocate from the slab that is
> > being shut down? And shouldn't the allocation be conditional (i.e.,
> > only when CONFIG_SLUB_DEBUG=y)?
>
> Kmalloc slabs are never shut down.
Maybe I'm not thinking straight -- isn't that what caused the deadlock in
the first place?
Kmalloc slabs can be shut down when memcg is on.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-10 18:47 ` Yu Zhao
@ 2019-11-11 15:47 ` Christopher Lameter
0 siblings, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2019-11-11 15:47 UTC (permalink / raw)
To: Yu Zhao
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Sun, 10 Nov 2019, Yu Zhao wrote:
> On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> > On Sat, 9 Nov 2019, Yu Zhao wrote:
> >
> > > > struct page *page, *h;
> > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > > +
> > > > + if (!map)
> > > > + return;
> > >
> > > What would happen if we are trying to allocate from the slab that is
> > > being shut down? And shouldn't the allocation be conditional (i.e.,
> > > only when CONFIG_SLUB_DEBUG=y)?
> >
> > Kmalloc slabs are never shut down.
>
> Maybe I'm not thinking straight -- isn't it what caused the deadlock in
> the first place?
Well if kmalloc allocations become a problem then we have numerous
issues all over the kernel to fix.
> Kmalloc slabs can be shut down when memcg is on.
Kmalloc needs to work even during shutdown of a memcg.
Maybe we need to fix memcg to not allocate from the current memcg during
shutdown?
^ permalink raw reply [flat|nested] 64+ messages in thread
* [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2019-11-11 15:47 ` Christopher Lameter
@ 2019-11-11 15:55 ` Christopher Lameter
-1 siblings, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2019-11-11 15:55 UTC (permalink / raw)
To: Yu Zhao
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
Regardless of the issue with memcgs allowing allocations from their
kmalloc arrays during shutdown: this patch cleans things up and properly
allocates the bitmap outside of the list_lock.
[FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly.
list_slab_objects() is called when a slab is destroyed and there are objects still left,
to list those objects in the syslog. This is a pretty rare event.
And there it seems we take the list_lock and call kmalloc() while holding that lock.
Perform the allocation in free_partial() before the list_lock is taken.
Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
Signed-off-by: Christoph Lameter
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
+++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
@@ -3690,14 +3690,15 @@ error:
}
static void list_slab_objects(struct kmem_cache *s, struct page *page,
- const char *text)
+ const char *text, unsigned long *map)
{
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+
if (!map)
return;
+
slab_err(s, page, text, s->name);
slab_lock(page);
@@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme
}
}
slab_unlock(page);
- bitmap_free(map);
#endif
}
@@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
{
LIST_HEAD(discard);
struct page *page, *h;
+ unsigned long *map = NULL;
+
+#ifdef CONFIG_SLUB_DEBUG
+ map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
+#endif
BUG_ON(irqs_disabled());
spin_lock_irq(&n->list_lock);
@@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac
list_add(&page->slab_list, &discard);
} else {
list_slab_objects(s, page,
- "Objects remaining in %s on __kmem_cache_shutdown()");
+ "Objects remaining in %s on __kmem_cache_shutdown()",
+ map);
}
}
spin_unlock_irq(&n->list_lock);
+#ifdef CONFIG_SLUB_DEBUG
+ bitmap_free(map);
+#endif
+
list_for_each_entry_safe(page, h, &discard, slab_list)
discard_slab(s, page);
}
^ permalink raw reply [flat|nested] 64+ messages in thread
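V2 of the patch only allocates the bitmap when CONFIG_SLUB_DEBUG is compiled in, and list_slab_objects() now tolerates a NULL map (the report is simply skipped if the allocation failed). A minimal userspace sketch of that conditional-buffer shape -- DEBUG stands in for CONFIG_SLUB_DEBUG, and every name here is illustrative:

```c
#include <stddef.h>
#include <stdlib.h>

#define DEBUG 1  /* stand-in for CONFIG_SLUB_DEBUG */

/* Tolerates map == NULL: the debug report is skipped, mirroring
 * list_slab_objects() bailing out when !map. */
static int list_objects(const int *objects, int n, unsigned char *map)
{
	int reported = 0;

	if (!map)
		return 0;       /* allocation failed: report nothing */
	for (int i = 0; i < n; i++)
		if (objects[i]) {
			map[i] = 1;
			reported++;
		}
	return reported;
}

/* Stand-in for the shutdown path: the map exists only in debug builds
 * and is allocated before any lock would be taken. */
int shutdown_cache(const int *objects, int n)
{
	unsigned char *map = NULL;

#if DEBUG
	map = calloc(n, 1);
#endif
	/* ... take list_lock, walk the partial list, drop the lock ... */
	int reported = list_objects(objects, n, map);
#if DEBUG
	free(map);              /* free(NULL) is safe if calloc failed */
#endif
	return reported;
}
```

With DEBUG set to 0 the buffer is never allocated and list_objects() degrades to a no-op, which is the behavior the #ifdefs in the V2 patch aim for.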
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2019-11-11 15:55 ` Christopher Lameter
(?)
@ 2019-11-30 23:09 ` Andrew Morton
2019-12-01 1:17 ` Tetsuo Handa
2019-12-02 15:12 ` Christopher Lameter
-1 siblings, 2 replies; 64+ messages in thread
From: Andrew Morton @ 2019-11-30 23:09 UTC (permalink / raw)
To: Christopher Lameter
Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Mon, 11 Nov 2019 15:55:05 +0000 (UTC) Christopher Lameter <cl@linux.com> wrote:
> Regardless of the issue with memcgs allowing allocations from its
> kmalloc array during shutdown: This patch cleans things up and properly
> allocates the bitmap outside of the list_lock.
>
>
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
>
> V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly.
>
> list_slab_objects() is called when a slab is destroyed and there are objects still left
> to list the objects in the syslog. This is a pretty rare event.
>
> And there it seems we take the list_lock and call kmalloc while holding that lock.
>
> Perform the allocation in free_partial() before the list_lock is taken.
No response here? It looks a lot simpler than the originally proposed
patch?
> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
> @@ -3690,14 +3690,15 @@ error:
> }
>
> static void list_slab_objects(struct kmem_cache *s, struct page *page,
> - const char *text)
> + const char *text, unsigned long *map)
> {
> #ifdef CONFIG_SLUB_DEBUG
> void *addr = page_address(page);
> void *p;
> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> +
> if (!map)
> return;
> +
> slab_err(s, page, text, s->name);
> slab_lock(page);
>
> @@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme
> }
> }
> slab_unlock(page);
> - bitmap_free(map);
> #endif
> }
>
> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
> {
> LIST_HEAD(discard);
> struct page *page, *h;
> + unsigned long *map = NULL;
> +
> +#ifdef CONFIG_SLUB_DEBUG
> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +#endif
>
> BUG_ON(irqs_disabled());
> spin_lock_irq(&n->list_lock);
> @@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac
> list_add(&page->slab_list, &discard);
> } else {
> list_slab_objects(s, page,
> - "Objects remaining in %s on __kmem_cache_shutdown()");
> + "Objects remaining in %s on __kmem_cache_shutdown()",
> + map);
> }
> }
> spin_unlock_irq(&n->list_lock);
>
> +#ifdef CONFIG_SLUB_DEBUG
> + bitmap_free(map);
> +#endif
> +
> list_for_each_entry_safe(page, h, &discard, slab_list)
> discard_slab(s, page);
> }
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2019-11-30 23:09 ` Andrew Morton
@ 2019-12-01 1:17 ` Tetsuo Handa
2019-12-02 15:12 ` Christopher Lameter
1 sibling, 0 replies; 64+ messages in thread
From: Tetsuo Handa @ 2019-12-01 1:17 UTC (permalink / raw)
To: Andrew Morton, Christopher Lameter
Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
Kirill A . Shutemov, linux-mm, linux-kernel, Kirill A . Shutemov
On 2019/12/01 8:09, Andrew Morton wrote:
>> Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here? It looks a lot simpler than the originally proposed
> patch?
>
>> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
>> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
>> @@ -3690,14 +3690,15 @@ error:
>> }
>>
>> static void list_slab_objects(struct kmem_cache *s, struct page *page,
>> - const char *text)
>> + const char *text, unsigned long *map)
>> {
>> #ifdef CONFIG_SLUB_DEBUG
>> void *addr = page_address(page);
>> void *p;
>> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
Changing from !(__GFP_IO | __GFP_FS) allocation to
>> +
>> if (!map)
>> return;
>> +
>> slab_err(s, page, text, s->name);
>> slab_lock(page);
>>
>> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
>> {
>> LIST_HEAD(discard);
>> struct page *page, *h;
>> + unsigned long *map = NULL;
>> +
>> +#ifdef CONFIG_SLUB_DEBUG
>> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
__GFP_IO | __GFP_FS allocation.
How is this path guaranteed to be safe to perform __GFP_IO | __GFP_FS reclaim?
>> +#endif
>>
>> BUG_ON(irqs_disabled());
>> spin_lock_irq(&n->list_lock);
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2019-11-30 23:09 ` Andrew Morton
@ 2019-12-02 15:12 ` Christopher Lameter
2019-12-02 15:12 ` Christopher Lameter
1 sibling, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2019-12-02 15:12 UTC (permalink / raw)
To: Andrew Morton
Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Sat, 30 Nov 2019, Andrew Morton wrote:
> > Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here? It looks a lot simpler than the originally proposed
> patch?
Yup. I prefer this one, but it's my own patch so I cannot Ack this.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2019-12-02 15:12 ` Christopher Lameter
(?)
@ 2019-12-07 22:03 ` Yu Zhao
2020-01-10 14:11 ` Vlastimil Babka
-1 siblings, 1 reply; 64+ messages in thread
From: Yu Zhao @ 2019-12-07 22:03 UTC (permalink / raw)
To: Christopher Lameter
Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
> On Sat, 30 Nov 2019, Andrew Morton wrote:
>
> > > Perform the allocation in free_partial() before the list_lock is taken.
> >
> > No response here? It looks a lot simpler than the originally proposed
> > patch?
>
> Yup. I prefer this one, but it's my own patch so I cannot Ack this.
Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
once it's addressed.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2019-12-07 22:03 ` Yu Zhao
@ 2020-01-10 14:11 ` Vlastimil Babka
2020-01-12 11:03 ` Tetsuo Handa
0 siblings, 1 reply; 64+ messages in thread
From: Vlastimil Babka @ 2020-01-10 14:11 UTC (permalink / raw)
To: Yu Zhao, Christopher Lameter
Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
Kirill A . Shutemov, Tetsuo Handa, linux-mm, linux-kernel,
Kirill A . Shutemov
On 12/7/19 11:03 PM, Yu Zhao wrote:
> On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
>> On Sat, 30 Nov 2019, Andrew Morton wrote:
>>
>>>> Perform the allocation in free_partial() before the list_lock is taken.
>>>
>>> No response here? It looks a lot simpler than the originally proposed
>>> patch?
>>
>> Yup. I prefer this one, but it's my own patch so I cannot Ack this.
>
> Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
> once it's addressed.
Tetsuo's mails haven't been reaching linux-mm for a while, and he has
given up trying to do something about it. That makes it hard to discuss
anything outside the direct CC group; I don't know, for example, what
the pending question is.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2020-01-10 14:11 ` Vlastimil Babka
@ 2020-01-12 11:03 ` Tetsuo Handa
2020-01-13 1:34 ` Christopher Lameter
0 siblings, 1 reply; 64+ messages in thread
From: Tetsuo Handa @ 2020-01-12 11:03 UTC (permalink / raw)
To: Vlastimil Babka, Yu Zhao, Christopher Lameter
Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim,
Kirill A . Shutemov, linux-mm, linux-kernel, Kirill A . Shutemov
On 2020/01/10 23:11, Vlastimil Babka wrote:
> On 12/7/19 11:03 PM, Yu Zhao wrote:
>> On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
>>> On Sat, 30 Nov 2019, Andrew Morton wrote:
>>>
>>>>> Perform the allocation in free_partial() before the list_lock is taken.
>>>>
>>>> No response here? It looks a lot simpler than the originally proposed
>>>> patch?
>>>
>>> Yup. I prefer this one, but it's my own patch so I cannot Ack this.
>>
>> Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
>> once it's addressed.
>
> Tetsuo's mails haven't been reaching linux-mm for a while, and he has
> given up trying to do something about it. That makes it hard to discuss
> anything outside the direct CC group; I don't know, for example, what
> the pending question is.
>
Hmm, this one? Even non-ML destinations are sometimes rejected (e.g.
554 5.7.1 Service unavailable; Client host [202.181.97.72] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=202.181.97.72
). Anyway, I was just worried about whether it is really safe to do a memory
allocation here that might involve memory reclaim. You MM guys know better...
-------- Forwarded Message --------
Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
Message-ID: <54b6c6a1-f9e4-5002-c828-3084c5203489@i-love.sakura.ne.jp>
Date: Sun, 1 Dec 2019 10:17:38 +0900
On 2019/12/01 8:09, Andrew Morton wrote:
>> Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here? It looks a lot simpler than the originally proposed
> patch?
>
>> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
>> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
>> @@ -3690,14 +3690,15 @@ error:
>> }
>>
>> static void list_slab_objects(struct kmem_cache *s, struct page *page,
>> - const char *text)
>> + const char *text, unsigned long *map)
>> {
>> #ifdef CONFIG_SLUB_DEBUG
>> void *addr = page_address(page);
>> void *p;
>> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
Changing from !(__GFP_IO | __GFP_FS) allocation to
>> +
>> if (!map)
>> return;
>> +
>> slab_err(s, page, text, s->name);
>> slab_lock(page);
>>
>> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
>> {
>> LIST_HEAD(discard);
>> struct page *page, *h;
>> + unsigned long *map = NULL;
>> +
>> +#ifdef CONFIG_SLUB_DEBUG
>> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
__GFP_IO | __GFP_FS allocation.
How is this path guaranteed to be safe to perform __GFP_IO | __GFP_FS reclaim?
>> +#endif
>>
>> BUG_ON(irqs_disabled());
>> spin_lock_irq(&n->list_lock);
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
2020-01-12 11:03 ` Tetsuo Handa
@ 2020-01-13 1:34 ` Christopher Lameter
0 siblings, 0 replies; 64+ messages in thread
From: Christopher Lameter @ 2020-01-13 1:34 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Vlastimil Babka, Yu Zhao, Andrew Morton, Pekka Enberg,
David Rientjes, Joonsoo Kim, Kirill A . Shutemov, linux-mm,
linux-kernel, Kirill A . Shutemov
On Sun, 12 Jan 2020, Tetsuo Handa wrote:
> On 2020/01/10 23:11, Vlastimil Babka wrote:
> Hmm, this one? Even non-ML destinations are sometimes rejected (e.g.
> 554 5.7.1 Service unavailable; Client host [202.181.97.72] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=202.181.97.72
> ). Anyway, I was just worried about whether it is really safe to do a memory
> allocation here that might involve memory reclaim. You MM guys know better...
We are talking about a call to destroy the kmem_cache. This is not done
under any lock. The lock was taken inside that function before the call to
list_slab_objects. That can be avoided.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock
2019-11-11 15:47 ` Christopher Lameter
@ 2019-11-11 18:15 ` Shakeel Butt
-1 siblings, 0 replies; 64+ messages in thread
From: Shakeel Butt @ 2019-11-11 18:15 UTC (permalink / raw)
To: Christopher Lameter, Roman Gushchin
Cc: Yu Zhao, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Kirill A . Shutemov, Tetsuo Handa, Linux MM, LKML,
Kirill A . Shutemov
+Roman Gushchin
On Mon, Nov 11, 2019 at 7:47 AM Christopher Lameter <cl@linux.com> wrote:
>
> On Sun, 10 Nov 2019, Yu Zhao wrote:
>
> > On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> > > On Sat, 9 Nov 2019, Yu Zhao wrote:
> > >
> > > > > struct page *page, *h;
> > > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > > > +
> > > > > + if (!map)
> > > > > + return;
> > > >
> > > > What would happen if we are trying to allocate from the slab that is
> > > > being shut down? And shouldn't the allocation be conditional (i.e.,
> > > > only when CONFIG_SLUB_DEBUG=y)?
> > >
> > > Kmalloc slabs are never shut down.
> >
> > Maybe I'm not thinking straight -- isn't it what caused the deadlock in
> > the first place?
>
> Well if kmalloc allocations become a problem then we have numerous
> issues all over the kernel to fix.
>
> > Kmalloc slabs can be shut down when memcg is on.
>
> Kmalloc needs to work even during shutdown of a memcg.
>
> Maybe we need to fix memcg to not allocate from the current memcg during
> shutdown?
>
>
Roman recently added reparenting of memcg kmem caches on memcg offline
and can comment in more detail, but we don't shut down a kmem cache
until all the in-flight memcg allocations are resolved. Also, the
allocation here does not look like a __GFP_ACCOUNT allocation.
^ permalink raw reply [flat|nested] 64+ messages in thread