* [PATCH v4] kmemleak: survive in a low-memory situation
@ 2019-03-27  0:59 Qian Cai
  2019-03-27  8:44 ` Michal Hocko
  2019-03-28  6:05 ` Pekka Enberg
  0 siblings, 2 replies; 22+ messages in thread

From: Qian Cai @ 2019-03-27  0:59 UTC (permalink / raw)
To: akpm
Cc: catalin.marinas, cl, mhocko, willy, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel, Qian Cai

Kmemleak can quickly fail to allocate an object structure and then
disable itself in a low-memory situation, for example when running an
mmap() workload that triggers swapping and OOM. This is especially
problematic when running something like the LTP test suite, where a
single OOM test case disables kmemleak entirely and leaves the rest of
the test cases running without kmemleak watching for leaks.

A kmemleak allocation can fail even though the allocation of the
tracked memory itself succeeded. Hence, kmemleak could still try to
start a direct reclaim if it is not executing in an atomic context
(spinlock, irq-handler etc.), or fall back to a high-priority
allocation in an atomic context as a last-ditch effort. Since kmemleak
is a debug feature, it is unlikely to be used in production systems
where memory resources are so scarce that direct reclaim or
high-priority atomic allocations should not be granted lightly.

Unless a brave soul reimplements kmemleak to embed its metadata into
the tracked memory itself in the foreseeable future, this provides a
good balance between keeping kmemleak enabled in a low-memory
situation and not introducing too much hackiness into the existing
code for now. Another approach would be to fail the original
allocation once kmemleak_alloc() fails, but there are too many call
sites to deal with, which makes it error-prone.

kmemleak: Cannot allocate a kmemleak_object structure
kmemleak: Kernel memory leak detector disabled
kmemleak: Automatic memory scanning thread ended

RIP: 0010:__alloc_pages_nodemask+0x242a/0x2ab0
Call Trace:
 allocate_slab+0x4d9/0x930
 new_slab+0x46/0x70
 ___slab_alloc+0x5d3/0x9c0
 __slab_alloc+0x12/0x20
 kmem_cache_alloc+0x30a/0x360
 create_object+0x96/0x9a0
 kmemleak_alloc+0x71/0xa0
 kmem_cache_alloc+0x254/0x360
 mempool_alloc_slab+0x3f/0x60
 mempool_alloc+0x120/0x329
 bio_alloc_bioset+0x1a8/0x510
 get_swap_bio+0x107/0x470
 __swap_writepage+0xab4/0x1650
 swap_writepage+0x86/0xe0

Signed-off-by: Qian Cai <cai@lca.pw>
---

v4: Update the commit log.
    Fix a typo in comments per Christoph.
    Consolidate the allocation.
v3: Update the commit log.
    Simplify the code inspired by graph_trace_open() from ftrace.
v2: Remove the needless checking for NULL objects in
    slab_post_alloc_hook() per Catalin.

 mm/kmemleak.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index a2d894d3de07..7f4545ab1f84 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -580,7 +580,16 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 	struct rb_node **link, *rb_parent;
 	unsigned long untagged_ptr;
 
-	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+	/*
+	 * The tracked memory was allocated successfully; if the kmemleak
+	 * object fails to allocate for some reason, the whole of kmemleak
+	 * ends up disabled, so try harder.
+	 */
+	gfp = (in_atomic() || irqs_disabled()) ?
+	      gfp_kmemleak_mask(gfp) | GFP_ATOMIC :
+	      gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;
+
+	object = kmem_cache_alloc(object_cache, gfp);
 	if (!object) {
 		pr_warn("Cannot allocate a kmemleak_object structure\n");
 		kmemleak_disable();
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27  0:59 [PATCH v4] kmemleak: survive in a low-memory situation Qian Cai
@ 2019-03-27  8:44 ` Michal Hocko
  2019-03-27 11:34   ` Qian Cai
  2019-03-27 17:29   ` Catalin Marinas
  2019-03-28  6:05 ` Pekka Enberg
  1 sibling, 2 replies; 22+ messages in thread

From: Michal Hocko @ 2019-03-27  8:44 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, catalin.marinas, cl, willy, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel

On Tue 26-03-19 20:59:48, Qian Cai wrote:
[...]
> Unless a brave soul reimplements kmemleak to embed its metadata into
> the tracked memory itself in the foreseeable future, this provides a
> good balance between keeping kmemleak enabled in a low-memory
> situation and not introducing too much hackiness into the existing
> code for now. Another approach would be to fail the original
> allocation once kmemleak_alloc() fails, but there are too many call
> sites to deal with, which makes it error-prone.

As long as there is an implicit __GFP_NOFAIL then kmemleak is simply
broken no matter what other gfp flags you play with. Has anybody looked
at some sort of preallocation where gfpflags_allow_blocking contexts
allocate objects into a pool that non-sleeping allocations can eat from?

> kmemleak: Cannot allocate a kmemleak_object structure
> kmemleak: Kernel memory leak detector disabled
> kmemleak: Automatic memory scanning thread ended
>
> RIP: 0010:__alloc_pages_nodemask+0x242a/0x2ab0
> Call Trace:
>  allocate_slab+0x4d9/0x930
>  new_slab+0x46/0x70
>  ___slab_alloc+0x5d3/0x9c0
>  __slab_alloc+0x12/0x20
>  kmem_cache_alloc+0x30a/0x360
>  create_object+0x96/0x9a0
>  kmemleak_alloc+0x71/0xa0
>  kmem_cache_alloc+0x254/0x360
>  mempool_alloc_slab+0x3f/0x60
>  mempool_alloc+0x120/0x329
>  bio_alloc_bioset+0x1a8/0x510
>  get_swap_bio+0x107/0x470
>  __swap_writepage+0xab4/0x1650
>  swap_writepage+0x86/0xe0
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>
> v4: Update the commit log.
>     Fix a typo in comments per Christoph.
>     Consolidate the allocation.
> v3: Update the commit log.
>     Simplify the code inspired by graph_trace_open() from ftrace.
> v2: Remove the needless checking for NULL objects in
>     slab_post_alloc_hook() per Catalin.
>
>  mm/kmemleak.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index a2d894d3de07..7f4545ab1f84 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -580,7 +580,16 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
>  	struct rb_node **link, *rb_parent;
>  	unsigned long untagged_ptr;
>  
> -	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> +	/*
> +	 * The tracked memory was allocated successfully; if the kmemleak
> +	 * object fails to allocate for some reason, the whole of kmemleak
> +	 * ends up disabled, so try harder.
> +	 */
> +	gfp = (in_atomic() || irqs_disabled()) ?
> +	      gfp_kmemleak_mask(gfp) | GFP_ATOMIC :
> +	      gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;

The comment for in_atomic says:

 * Are we running in atomic context? WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels. Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27  8:44 ` Michal Hocko
@ 2019-03-27 11:34   ` Qian Cai
  2019-03-27 11:44     ` Michal Hocko
  2019-03-27 17:29   ` Catalin Marinas
  1 sibling, 1 reply; 22+ messages in thread

From: Qian Cai @ 2019-03-27 11:34 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, catalin.marinas, cl, willy, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel

On 3/27/19 4:44 AM, Michal Hocko wrote:
>> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
>> index a2d894d3de07..7f4545ab1f84 100644
>> --- a/mm/kmemleak.c
>> +++ b/mm/kmemleak.c
>> @@ -580,7 +580,16 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
>>  	struct rb_node **link, *rb_parent;
>>  	unsigned long untagged_ptr;
>>  
>> -	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
>> +	/*
>> +	 * The tracked memory was allocated successfully; if the kmemleak
>> +	 * object fails to allocate for some reason, the whole of kmemleak
>> +	 * ends up disabled, so try harder.
>> +	 */
>> +	gfp = (in_atomic() || irqs_disabled()) ?
>> +	      gfp_kmemleak_mask(gfp) | GFP_ATOMIC :
>> +	      gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;
>
> The comment for in_atomic says:
>
>  * Are we running in atomic context? WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels. Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.

That is why it needs both in_atomic() and irqs_disabled(), so
irqs_disabled() can detect kernel functions holding spinlocks even in
non-preemptible kernels.

According to [1],

"This [2] is useful if you know that the data in question is only ever
manipulated from a "process context", ie no interrupts involved."

Since kmemleak only deals with kernel context, if a spinlock is held,
local interrupts are always disabled.

ftrace is in the same boat; this commit, merged a while back, has the
same check:

ef99b88b16be tracing: Handle ftrace_dump() atomic context in graph_trace_open()

[1] https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
[2] spin_lock(&lock);
    ...
    spin_unlock(&lock);

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 11:34 ` Qian Cai
@ 2019-03-27 11:44   ` Michal Hocko
  2019-03-27 13:05     ` Qian Cai
  0 siblings, 1 reply; 22+ messages in thread

From: Michal Hocko @ 2019-03-27 11:44 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, catalin.marinas, cl, willy, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel

On Wed 27-03-19 07:34:32, Qian Cai wrote:
> On 3/27/19 4:44 AM, Michal Hocko wrote:
> >> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> >> index a2d894d3de07..7f4545ab1f84 100644
> >> --- a/mm/kmemleak.c
> >> +++ b/mm/kmemleak.c
> >> @@ -580,7 +580,16 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
> >>  	struct rb_node **link, *rb_parent;
> >>  	unsigned long untagged_ptr;
> >>  
> >> -	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> >> +	/*
> >> +	 * The tracked memory was allocated successfully; if the kmemleak
> >> +	 * object fails to allocate for some reason, the whole of kmemleak
> >> +	 * ends up disabled, so try harder.
> >> +	 */
> >> +	gfp = (in_atomic() || irqs_disabled()) ?
> >> +	      gfp_kmemleak_mask(gfp) | GFP_ATOMIC :
> >> +	      gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;
> >
> > The comment for in_atomic says:
> >
> >  * Are we running in atomic context? WARNING: this macro cannot
> >  * always detect atomic context; in particular, it cannot know about
> >  * held spinlocks in non-preemptible kernels. Thus it should not be
> >  * used in the general case to determine whether sleeping is possible.
> >  * Do not use in_atomic() in driver code.
>
> That is why it needs both in_atomic() and irqs_disabled(), so
> irqs_disabled() can detect kernel functions holding spinlocks even in
> non-preemptible kernels.
>
> According to [1],
>
> "This [2] is useful if you know that the data in question is only ever
> manipulated from a "process context", ie no interrupts involved."
>
> Since kmemleak only deals with kernel context, if a spinlock is held,
> local interrupts are always disabled.

What? A normal spin lock implementation doesn't disable interrupts. So
either I misunderstand what you are saying or you seem to be confused.
The thing is that in_atomic() relies on preempt_count to work properly,
and if you have CONFIG_PREEMPT_COUNT=n then you simply never know
whether preemption is disabled, so you do not know that a spin_lock is
held. irqs_disabled(), on the other hand, checks whether the
arch-specific flag for IRQ handling is set (or cleared). So you would
only catch irq-safe spin locks with the above check.

> ftrace is in the same boat; this commit, merged a while back, has the
> same check:
>
> ef99b88b16be tracing: Handle ftrace_dump() atomic context in graph_trace_open()
>
> [1] https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
> [2] spin_lock(&lock);
>     ...
>     spin_unlock(&lock);

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 11:44 ` Michal Hocko
@ 2019-03-27 13:05   ` Qian Cai
  2019-03-27 13:17     ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread

From: Qian Cai @ 2019-03-27 13:05 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, catalin.marinas, cl, willy, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel

On 3/27/19 7:44 AM, Michal Hocko wrote:
> What? A normal spin lock implementation doesn't disable interrupts. So
> either I misunderstand what you are saying or you seem to be confused.
> The thing is that in_atomic() relies on preempt_count to work properly,
> and if you have CONFIG_PREEMPT_COUNT=n then you simply never know
> whether preemption is disabled, so you do not know that a spin_lock is
> held. irqs_disabled(), on the other hand, checks whether the
> arch-specific flag for IRQ handling is set (or cleared). So you would
> only catch irq-safe spin locks with the above check.

Exactly. Because kmemleak_alloc() is only called in a few call sites --
slab allocation, neigh_hash_alloc(), alloc_page_ext(), sg_kmalloc(),
early_amd_iommu_init() and blk_mq_alloc_rqs() -- my review does not
yield any of those holding irq-unsafe spinlocks. Could future code
changes suddenly call kmemleak_alloc() with an irq-unsafe spinlock
held? Always possible, but it is unlikely to happen. I could put some
comments on kmemleak_alloc() about this though.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 13:05 ` Qian Cai
@ 2019-03-27 13:17   ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread

From: Michal Hocko @ 2019-03-27 13:17 UTC (permalink / raw)
To: Qian Cai
Cc: akpm, catalin.marinas, cl, willy, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel

On Wed 27-03-19 09:05:31, Qian Cai wrote:
> On 3/27/19 7:44 AM, Michal Hocko wrote:
> > What? A normal spin lock implementation doesn't disable interrupts.
> > So either I misunderstand what you are saying or you seem to be
> > confused. The thing is that in_atomic() relies on preempt_count to
> > work properly, and if you have CONFIG_PREEMPT_COUNT=n then you simply
> > never know whether preemption is disabled, so you do not know that a
> > spin_lock is held. irqs_disabled(), on the other hand, checks whether
> > the arch-specific flag for IRQ handling is set (or cleared). So you
> > would only catch irq-safe spin locks with the above check.
>
> Exactly. Because kmemleak_alloc() is only called in a few call sites --
> slab allocation, neigh_hash_alloc(), alloc_page_ext(), sg_kmalloc(),
> early_amd_iommu_init() and blk_mq_alloc_rqs() -- my review does not
> yield any of those holding irq-unsafe spinlocks.

I do not understand. What about a regular kmalloc(GFP_NOWAIT) caller
with a simple spinlock held?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27  8:44 ` Michal Hocko
  2019-03-27 11:34   ` Qian Cai
@ 2019-03-27 17:29   ` Catalin Marinas
  2019-03-27 18:02     ` Qian Cai
  ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread

From: Catalin Marinas @ 2019-03-27 17:29 UTC (permalink / raw)
To: Michal Hocko
Cc: Qian Cai, akpm, cl, willy, penberg, rientjes, iamjoonsoo.kim,
	linux-mm, linux-kernel

On Wed, Mar 27, 2019 at 09:44:32AM +0100, Michal Hocko wrote:
> On Tue 26-03-19 20:59:48, Qian Cai wrote:
> [...]
> > Unless a brave soul reimplements kmemleak to embed its metadata into
> > the tracked memory itself in the foreseeable future,

Revisiting the kmemleak memory scanning code, that's not actually
possible without some long periods with kmemleak_lock held. The scanner
relies on the kmemleak_object (refcounted) being around even when the
actual memory block has been freed.

> > this provides a good balance between keeping kmemleak enabled in a
> > low-memory situation and not introducing too much hackiness into the
> > existing code for now. Another approach would be to fail the
> > original allocation once kmemleak_alloc() fails, but there are too
> > many call sites to deal with, which makes it error-prone.
>
> As long as there is an implicit __GFP_NOFAIL then kmemleak is simply
> broken no matter what other gfp flags you play with. Has anybody looked
> at some sort of preallocation where gfpflags_allow_blocking contexts
> allocate objects into a pool that non-sleeping allocations can eat from?

Quick attempt below and it needs some more testing (pretty random pick
of the EMERGENCY_POOL_SIZE value). Also, with __GFP_NOFAIL removed, are
the other flags safe or should we trim them further?

---------------8<-------------------------------
From dc4194539f8191bb754901cea74c86e7960886f8 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 27 Mar 2019 17:20:57 +0000
Subject: [PATCH] mm: kmemleak: Add an emergency allocation pool for kmemleak
 objects

This patch adds an emergency pool for struct kmemleak_object in case the
normal kmem_cache_alloc() fails under the gfp constraints passed by the
slab allocation caller. The patch also removes __GFP_NOFAIL which does
not play well with other gfp flags (introduced by commit d9570ee3bd1d,
"kmemleak: allow to coexist with fault injection").

Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 mm/kmemleak.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 6c318f5ac234..366a680cff7c 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -127,7 +127,7 @@
 /* GFP bitmask for kmemleak internal allocations */
 #define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
 				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
-				 __GFP_NOWARN | __GFP_NOFAIL)
+				 __GFP_NOWARN)
 
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
@@ -191,11 +191,16 @@ struct kmemleak_object {
 #define HEX_ASCII		1
 /* max number of lines to be printed */
 #define HEX_MAX_LINES		2
+/* minimum emergency pool size */
+#define EMERGENCY_POOL_SIZE	(NR_CPUS * 4)
 
 /* the list of all allocated objects */
 static LIST_HEAD(object_list);
 /* the list of gray-colored objects (see color_gray comment below) */
 static LIST_HEAD(gray_list);
+/* emergency pool allocation */
+static LIST_HEAD(emergency_list);
+static int emergency_pool_size;
 /* search tree for object boundaries */
 static struct rb_root object_tree_root = RB_ROOT;
 /* rw_lock protecting the access to object_list and object_tree_root */
@@ -467,6 +472,43 @@ static int get_object(struct kmemleak_object *object)
 	return atomic_inc_not_zero(&object->use_count);
 }
 
+/*
+ * Emergency pool allocation and freeing. kmemleak_lock must not be held.
+ */
+static struct kmemleak_object *emergency_alloc(void)
+{
+	unsigned long flags;
+	struct kmemleak_object *object;
+
+	write_lock_irqsave(&kmemleak_lock, flags);
+	object = list_first_entry_or_null(&emergency_list, typeof(*object), object_list);
+	if (object) {
+		list_del(&object->object_list);
+		emergency_pool_size--;
+	}
+	write_unlock_irqrestore(&kmemleak_lock, flags);
+
+	return object;
+}
+
+/*
+ * Return true if object added to the emergency pool, false otherwise.
+ */
+static bool emergency_free(struct kmemleak_object *object)
+{
+	unsigned long flags;
+
+	if (emergency_pool_size >= EMERGENCY_POOL_SIZE)
+		return false;
+
+	write_lock_irqsave(&kmemleak_lock, flags);
+	list_add(&object->object_list, &emergency_list);
+	emergency_pool_size++;
+	write_unlock_irqrestore(&kmemleak_lock, flags);
+
+	return true;
+}
+
 /*
  * RCU callback to free a kmemleak_object.
  */
@@ -485,7 +527,8 @@ static void free_object_rcu(struct rcu_head *rcu)
 		hlist_del(&area->node);
 		kmem_cache_free(scan_area_cache, area);
 	}
-	kmem_cache_free(object_cache, object);
+	if (!emergency_free(object))
+		kmem_cache_free(object_cache, object);
 }
 
 /*
@@ -577,6 +620,8 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 	unsigned long untagged_ptr;
 
 	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+	if (!object)
+		object = emergency_alloc();
 	if (!object) {
 		pr_warn("Cannot allocate a kmemleak_object structure\n");
 		kmemleak_disable();
@@ -2127,6 +2172,16 @@ void __init kmemleak_init(void)
 			kmemleak_warning = 0;
 		}
 	}
+
+	/* populate the emergency allocation pool */
+	while (emergency_pool_size < EMERGENCY_POOL_SIZE) {
+		struct kmemleak_object *object;
+
+		object = kmem_cache_alloc(object_cache, GFP_KERNEL);
+		if (!object)
+			break;
+		emergency_free(object);
+	}
 }
 
 /*

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 17:29 ` Catalin Marinas
@ 2019-03-27 18:02   ` Qian Cai
  2019-03-28 15:05     ` Catalin Marinas
  0 siblings, 1 reply; 22+ messages in thread

From: Qian Cai @ 2019-03-27 18:02 UTC (permalink / raw)
To: Catalin Marinas, Michal Hocko
Cc: akpm, cl, willy, penberg, rientjes, iamjoonsoo.kim, linux-mm,
	linux-kernel

On 3/27/19 1:29 PM, Catalin Marinas wrote:
> From dc4194539f8191bb754901cea74c86e7960886f8 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Wed, 27 Mar 2019 17:20:57 +0000
> Subject: [PATCH] mm: kmemleak: Add an emergency allocation pool for kmemleak
>  objects
>
> This patch adds an emergency pool for struct kmemleak_object in case the
> normal kmem_cache_alloc() fails under the gfp constraints passed by the
> slab allocation caller. The patch also removes __GFP_NOFAIL which does
> not play well with other gfp flags (introduced by commit d9570ee3bd1d,
> "kmemleak: allow to coexist with fault injection").
>
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

It takes 2 runs of the LTP oom01 test to disable kmemleak.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 18:02 ` Qian Cai
@ 2019-03-28 15:05   ` Catalin Marinas
  2019-03-28 15:41     ` Qian Cai
  0 siblings, 1 reply; 22+ messages in thread

From: Catalin Marinas @ 2019-03-28 15:05 UTC (permalink / raw)
To: Qian Cai
Cc: Michal Hocko, akpm, cl, willy, penberg, rientjes, iamjoonsoo.kim,
	linux-mm, linux-kernel

On Wed, Mar 27, 2019 at 02:02:27PM -0400, Qian Cai wrote:
> On 3/27/19 1:29 PM, Catalin Marinas wrote:
> > From dc4194539f8191bb754901cea74c86e7960886f8 Mon Sep 17 00:00:00 2001
> > From: Catalin Marinas <catalin.marinas@arm.com>
> > Date: Wed, 27 Mar 2019 17:20:57 +0000
> > Subject: [PATCH] mm: kmemleak: Add an emergency allocation pool for
> >  kmemleak objects
> >
> > This patch adds an emergency pool for struct kmemleak_object in case the
> > normal kmem_cache_alloc() fails under the gfp constraints passed by the
> > slab allocation caller. The patch also removes __GFP_NOFAIL which does
> > not play well with other gfp flags (introduced by commit d9570ee3bd1d,
> > "kmemleak: allow to coexist with fault injection").
> >
> > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> It takes 2 runs of the LTP oom01 test to disable kmemleak.

What configuration are you using (number of CPUs, RAM)? I tried this on
an arm64 guest under kvm with 4 CPUs and 512MB of RAM, together with
fault injection on the kmemleak_object cache, and ran oom01 several
times without any failures.

-- 
Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-28 15:05 ` Catalin Marinas
@ 2019-03-28 15:41   ` Qian Cai
  0 siblings, 0 replies; 22+ messages in thread

From: Qian Cai @ 2019-03-28 15:41 UTC (permalink / raw)
To: Catalin Marinas
Cc: Michal Hocko, akpm, cl, willy, penberg, rientjes, iamjoonsoo.kim,
	linux-mm, linux-kernel

On Thu, 2019-03-28 at 15:05 +0000, Catalin Marinas wrote:
> > It takes 2 runs of the LTP oom01 test to disable kmemleak.
>
> What configuration are you using (number of CPUs, RAM)? I tried this on
> an arm64 guest under kvm with 4 CPUs and 512MB of RAM, together with
> fault injection on the kmemleak_object cache, and ran oom01 several
> times without any failures.

Apparently, the CPUs are so fast and the disk is so slow (swapping)
that it ends up taking a long time for the OOM killer to kick in.

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6126T CPU @ 2.60GHz
Stepping:            4
CPU MHz:             3300.002
BogoMIPS:            5200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            19712K
NUMA node0 CPU(s):   0-11,24-35
NUMA node1 CPU(s):   12-23,36-47

# free -m
              total        used        free      shared  buff/cache   available
Mem:         166206       31737      134063          33         406      133584
Swap:          4095           0        4095

# lspci | grep -i sata
00:11.5 SATA controller: Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode] (rev 08)
00:17.0 SATA controller: Intel Corporation C620 Series Chipset Family SATA Controller [AHCI mode] (rev 08)

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 17:29 ` Catalin Marinas
  2019-03-27 18:02   ` Qian Cai
@ 2019-03-27 18:17   ` Michal Hocko
  2019-03-27 18:21   ` Matthew Wilcox
  0 siblings, 0 replies; 22+ messages in thread

From: Michal Hocko @ 2019-03-27 18:17 UTC (permalink / raw)
To: Catalin Marinas
Cc: Qian Cai, akpm, cl, willy, penberg, rientjes, iamjoonsoo.kim,
	linux-mm, linux-kernel

On Wed 27-03-19 17:29:57, Catalin Marinas wrote:
[...]
> Quick attempt below and it needs some more testing (pretty random pick
> of the EMERGENCY_POOL_SIZE value). Also, with __GFP_NOFAIL removed, are
> the other flags safe or should we trim them further?

I would still be careful about __GFP_NORETRY. I presume that the
primary purpose is to prevent the OOM killer but this makes the
allocation failure much more likely. So if anything,
__GFP_RETRY_MAYFAIL would suit that purpose better. But I am not really
sure that this is worth bothering with.

> ---------------8<-------------------------------
> >From dc4194539f8191bb754901cea74c86e7960886f8 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Wed, 27 Mar 2019 17:20:57 +0000
> Subject: [PATCH] mm: kmemleak: Add an emergency allocation pool for
>  kmemleak objects
>
> This patch adds an emergency pool for struct kmemleak_object in case the
> normal kmem_cache_alloc() fails under the gfp constraints passed by the
> slab allocation caller. The patch also removes __GFP_NOFAIL which does
> not play well with other gfp flags (introduced by commit d9570ee3bd1d,
> "kmemleak: allow to coexist with fault injection").

Thank you! This is definitely a step in the right direction. Maybe the
pool allocation logic will need some tuning - e.g. does it make sense
to allocate into the pool from sleepable allocations, or is it
sufficient to refill on free? Something for the real workloads to
tell, I guess.

> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  mm/kmemleak.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 6c318f5ac234..366a680cff7c 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -127,7 +127,7 @@
>  /* GFP bitmask for kmemleak internal allocations */
>  #define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
>  				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
> -				 __GFP_NOWARN | __GFP_NOFAIL)
> +				 __GFP_NOWARN)
>  
>  /* scanning area inside a memory block */
>  struct kmemleak_scan_area {
> @@ -191,11 +191,16 @@ struct kmemleak_object {
>  #define HEX_ASCII		1
>  /* max number of lines to be printed */
>  #define HEX_MAX_LINES		2
> +/* minimum emergency pool size */
> +#define EMERGENCY_POOL_SIZE	(NR_CPUS * 4)
>  
>  /* the list of all allocated objects */
>  static LIST_HEAD(object_list);
>  /* the list of gray-colored objects (see color_gray comment below) */
>  static LIST_HEAD(gray_list);
> +/* emergency pool allocation */
> +static LIST_HEAD(emergency_list);
> +static int emergency_pool_size;
>  /* search tree for object boundaries */
>  static struct rb_root object_tree_root = RB_ROOT;
>  /* rw_lock protecting the access to object_list and object_tree_root */
> @@ -467,6 +472,43 @@ static int get_object(struct kmemleak_object *object)
>  	return atomic_inc_not_zero(&object->use_count);
>  }
>  
> +/*
> + * Emergency pool allocation and freeing. kmemleak_lock must not be held.
> + */
> +static struct kmemleak_object *emergency_alloc(void)
> +{
> +	unsigned long flags;
> +	struct kmemleak_object *object;
> +
> +	write_lock_irqsave(&kmemleak_lock, flags);
> +	object = list_first_entry_or_null(&emergency_list, typeof(*object), object_list);
> +	if (object) {
> +		list_del(&object->object_list);
> +		emergency_pool_size--;
> +	}
> +	write_unlock_irqrestore(&kmemleak_lock, flags);
> +
> +	return object;
> +}
> +
> +/*
> + * Return true if object added to the emergency pool, false otherwise.
> + */
> +static bool emergency_free(struct kmemleak_object *object)
> +{
> +	unsigned long flags;
> +
> +	if (emergency_pool_size >= EMERGENCY_POOL_SIZE)
> +		return false;
> +
> +	write_lock_irqsave(&kmemleak_lock, flags);
> +	list_add(&object->object_list, &emergency_list);
> +	emergency_pool_size++;
> +	write_unlock_irqrestore(&kmemleak_lock, flags);
> +
> +	return true;
> +}
> +
>  /*
>   * RCU callback to free a kmemleak_object.
>   */
> @@ -485,7 +527,8 @@ static void free_object_rcu(struct rcu_head *rcu)
>  		hlist_del(&area->node);
>  		kmem_cache_free(scan_area_cache, area);
>  	}
> -	kmem_cache_free(object_cache, object);
> +	if (!emergency_free(object))
> +		kmem_cache_free(object_cache, object);
>  }
>  
>  /*
> @@ -577,6 +620,8 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
>  	unsigned long untagged_ptr;
>  
>  	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> +	if (!object)
> +		object = emergency_alloc();
>  	if (!object) {
>  		pr_warn("Cannot allocate a kmemleak_object structure\n");
>  		kmemleak_disable();
> @@ -2127,6 +2172,16 @@ void __init kmemleak_init(void)
>  			kmemleak_warning = 0;
>  		}
>  	}
> +
> +	/* populate the emergency allocation pool */
> +	while (emergency_pool_size < EMERGENCY_POOL_SIZE) {
> +		struct kmemleak_object *object;
> +
> +		object = kmem_cache_alloc(object_cache, GFP_KERNEL);
> +		if (!object)
> +			break;
> +		emergency_free(object);
> +	}
>  }
>  
>  /*

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation
  2019-03-27 17:29 ` Catalin Marinas
  2019-03-27 18:02   ` Qian Cai
  2019-03-27 18:17   ` Michal Hocko
@ 2019-03-27 18:21   ` Matthew Wilcox
  2019-03-28 14:59     ` Catalin Marinas
  2 siblings, 1 reply; 22+ messages in thread

From: Matthew Wilcox @ 2019-03-27 18:21 UTC (permalink / raw)
To: Catalin Marinas
Cc: Michal Hocko, Qian Cai, akpm, cl, penberg, rientjes,
	iamjoonsoo.kim, linux-mm, linux-kernel

On Wed, Mar 27, 2019 at 05:29:57PM +0000, Catalin Marinas wrote:
> On Wed, Mar 27, 2019 at 09:44:32AM +0100, Michal Hocko wrote:
> > As long as there is an implicit __GFP_NOFAIL then kmemleak is simply
> > broken no matter what other gfp flags you play with. Has anybody looked
> > at some sort of preallocation where gfpflags_allow_blocking contexts
> > allocate objects into a pool that non-sleeping allocations can eat from?
>
> Quick attempt below and it needs some more testing (pretty random pick
> of the EMERGENCY_POOL_SIZE value). Also, with __GFP_NOFAIL removed, are
> the other flags safe or should we trim them further?

Why not use mempool?

> #define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
> 				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
> -				 __GFP_NOWARN | __GFP_NOFAIL)
> +				 __GFP_NOWARN)

Why GFP_NORETRY? And if I have specified one of the other retry
policies in my gfp flags, you should presumably clear that off before
setting GFP_NORETRY.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-27 18:21 ` Matthew Wilcox @ 2019-03-28 14:59 ` Catalin Marinas 2019-03-28 15:24 ` Qian Cai 2019-03-29 12:02 ` Michal Hocko 0 siblings, 2 replies; 22+ messages in thread From: Catalin Marinas @ 2019-03-28 14:59 UTC (permalink / raw) To: Matthew Wilcox Cc: Michal Hocko, Qian Cai, akpm, cl, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Wed, Mar 27, 2019 at 11:21:58AM -0700, Matthew Wilcox wrote: > On Wed, Mar 27, 2019 at 05:29:57PM +0000, Catalin Marinas wrote: > > On Wed, Mar 27, 2019 at 09:44:32AM +0100, Michal Hocko wrote: > > > As long as there is an implicit __GFP_NOFAIL then kmemleak is simply > > > broken no matter what other gfp flags you play with. Has anybody looked > > > at some sort of preallocation where gfpflags_allow_blocking context > > > allocate objects into a pool that non-sleeping allocations can eat from? > > > > Quick attempt below and it needs some more testing (pretty random pick > > of the EMERGENCY_POOL_SIZE value). Also, with __GFP_NOFAIL removed, are > > the other flags safe or we should trim them further? > > Why not use mempool? I had the wrong impression that it could sleep but it's only if __GFP_DIRECT_RECLAIM is passed. See below for an updated patch. > > #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ > > __GFP_NORETRY | __GFP_NOMEMALLOC | \ > > - __GFP_NOWARN | __GFP_NOFAIL) > > + __GFP_NOWARN) > > Why GFP_NORETRY? And if I have specified one of the other retry policies > in my gfp flags, you should presumably clear that off before setting > GFP_NORETRY. It only preserves GFP_KERNEL|GFP_ATOMIC from the original flags while setting the NOWARN|NORETRY|NOMEMALLOC (the same flags seem to be set by mempool_alloc()). Anyway, with the changes below, I'll let mempool add the relevant flags while kmemleak only passes GFP_KERNEL|GFP_ATOMIC from the original caller. 
-----------------------8<------------------------------------------ From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001 From: Catalin Marinas <catalin.marinas@arm.com> Date: Thu, 28 Mar 2019 13:26:07 +0000 Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects This patch adds mempool allocations for struct kmemleak_object and kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() under memory pressure. The patch also masks out all the gfp flags passed to kmemleak other than GFP_KERNEL|GFP_ATOMIC. Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- mm/kmemleak.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 6c318f5ac234..9755678e83b9 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -82,6 +82,7 @@ #include <linux/kthread.h> #include <linux/rbtree.h> #include <linux/fs.h> +#include <linux/mempool.h> #include <linux/debugfs.h> #include <linux/seq_file.h> #include <linux/cpumask.h> @@ -125,9 +126,7 @@ #define BYTES_PER_POINTER sizeof(void *) /* GFP bitmask for kmemleak internal allocations */ -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ - __GFP_NORETRY | __GFP_NOMEMALLOC | \ - __GFP_NOWARN | __GFP_NOFAIL) +#define gfp_kmemleak_mask(gfp) ((gfp) & (GFP_KERNEL | GFP_ATOMIC)) /* scanning area inside a memory block */ struct kmemleak_scan_area { @@ -191,6 +190,9 @@ struct kmemleak_object { #define HEX_ASCII 1 /* max number of lines to be printed */ #define HEX_MAX_LINES 2 +/* minimum memory pool sizes */ +#define MIN_OBJECT_POOL (NR_CPUS * 4) +#define MIN_SCAN_AREA_POOL (NR_CPUS * 1) /* the list of all allocated objects */ static LIST_HEAD(object_list); @@ -203,7 +205,9 @@ static DEFINE_RWLOCK(kmemleak_lock); /* allocation caches for kmemleak internal data */ static struct kmem_cache *object_cache; +static mempool_t *object_mempool; static struct 
kmem_cache *scan_area_cache; +static mempool_t *scan_area_mempool; /* set if tracing memory operations is enabled */ static int kmemleak_enabled; @@ -483,9 +487,9 @@ static void free_object_rcu(struct rcu_head *rcu) */ hlist_for_each_entry_safe(area, tmp, &object->area_list, node) { hlist_del(&area->node); - kmem_cache_free(scan_area_cache, area); + mempool_free(area, scan_area_mempool); } - kmem_cache_free(object_cache, object); + mempool_free(object, object_mempool); } /* @@ -576,7 +580,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, struct rb_node **link, *rb_parent; unsigned long untagged_ptr; - object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); + object = mempool_alloc(object_mempool, gfp_kmemleak_mask(gfp)); if (!object) { pr_warn("Cannot allocate a kmemleak_object structure\n"); kmemleak_disable(); @@ -640,7 +644,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, * be freed while the kmemleak_lock is held. 
*/ dump_object_info(parent); - kmem_cache_free(object_cache, object); + mempool_free(object, object_mempool); object = NULL; goto out; } @@ -798,7 +802,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) return; } - area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp)); + area = mempool_alloc(scan_area_mempool, gfp_kmemleak_mask(gfp)); if (!area) { pr_warn("Cannot allocate a scan area\n"); goto out; @@ -810,7 +814,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) } else if (ptr + size > object->pointer + object->size) { kmemleak_warn("Scan area larger than object 0x%08lx\n", ptr); dump_object_info(object); - kmem_cache_free(scan_area_cache, area); + mempool_free(area, scan_area_mempool); goto out_unlock; } @@ -2049,6 +2053,18 @@ void __init kmemleak_init(void) object_cache = KMEM_CACHE(kmemleak_object, SLAB_NOLEAKTRACE); scan_area_cache = KMEM_CACHE(kmemleak_scan_area, SLAB_NOLEAKTRACE); + if (!object_cache || !scan_area_cache) { + kmemleak_disable(); + return; + } + object_mempool = mempool_create_slab_pool(MIN_OBJECT_POOL, + object_cache); + scan_area_mempool = mempool_create_slab_pool(MIN_SCAN_AREA_POOL, + scan_area_cache); + if (!object_mempool || !scan_area_mempool) { + kmemleak_disable(); + return; + } if (crt_early_log > ARRAY_SIZE(early_log)) pr_warn("Early log buffer exceeded (%d), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE\n", ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-28 14:59 ` Catalin Marinas @ 2019-03-28 15:24 ` Qian Cai 2019-03-29 12:02 ` Michal Hocko 0 siblings, 0 replies; 22+ messages in thread From: Qian Cai @ 2019-03-28 15:24 UTC (permalink / raw) To: Catalin Marinas, Matthew Wilcox Cc: Michal Hocko, akpm, cl, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Thu, 2019-03-28 at 14:59 +0000, Catalin Marinas wrote: > +/* minimum memory pool sizes */ > +#define MIN_OBJECT_POOL (NR_CPUS * 4) > +#define MIN_SCAN_AREA_POOL (NR_CPUS * 1) I am thinking about making those tunable, so people could have a bigger pool depending on their workloads. Also, I would like to see a way to refill this emergency pool, for example once the OOM killer has freed up more memory. Maybe when the kmemleak_scan kthread kicks in, it could monitor and refill the pool whenever possible. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-28 14:59 ` Catalin Marinas 2019-03-28 15:24 ` Qian Cai @ 2019-03-29 12:02 ` Michal Hocko 2019-03-29 16:16 ` Catalin Marinas 1 sibling, 1 reply; 22+ messages in thread From: Michal Hocko @ 2019-03-29 12:02 UTC (permalink / raw) To: Catalin Marinas Cc: Matthew Wilcox, Qian Cai, akpm, cl, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Thu 28-03-19 14:59:17, Catalin Marinas wrote: [...] > >From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001 > From: Catalin Marinas <catalin.marinas@arm.com> > Date: Thu, 28 Mar 2019 13:26:07 +0000 > Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects > > This patch adds mempool allocations for struct kmemleak_object and > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > under memory pressure. The patch also masks out all the gfp flags passed > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. Using the mempool allocator is better than inventing its own implementation but there is one thing to be slightly careful/worried about. This allocator expects that somebody will refill the pool in a finite time. Most users are OK with that because objects in flight are going to return to the pool in a relatively short time (think of an IO) but kmemleak is not guaranteed to comply with that AFAIU. Sure ephemeral allocations are happening all the time so there should be some churn in the pool all the time but if we go to an extreme where there is a serious memory leak then I suspect we might get stuck here without any way forward. Page/slab allocator would eventually back off even though small allocations never fail because a user context would get killed sooner or later but there is no fatal_signal_pending backoff in the mempool alloc path. 
Anyway, I believe this is a step in the right direction and should the above ever materializes as a relevant problem we can tune the mempool to backoff for _some_ callers or do something similar. Btw. there is kmemleak_update_trace call in mempool_alloc, is this ok for the kmemleak allocation path? > Suggested-by: Michal Hocko <mhocko@kernel.org> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > --- > mm/kmemleak.c | 34 +++++++++++++++++++++++++--------- > 1 file changed, 25 insertions(+), 9 deletions(-) > > diff --git a/mm/kmemleak.c b/mm/kmemleak.c > index 6c318f5ac234..9755678e83b9 100644 > --- a/mm/kmemleak.c > +++ b/mm/kmemleak.c > @@ -82,6 +82,7 @@ > #include <linux/kthread.h> > #include <linux/rbtree.h> > #include <linux/fs.h> > +#include <linux/mempool.h> > #include <linux/debugfs.h> > #include <linux/seq_file.h> > #include <linux/cpumask.h> > @@ -125,9 +126,7 @@ > #define BYTES_PER_POINTER sizeof(void *) > > /* GFP bitmask for kmemleak internal allocations */ > -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ > - __GFP_NORETRY | __GFP_NOMEMALLOC | \ > - __GFP_NOWARN | __GFP_NOFAIL) > +#define gfp_kmemleak_mask(gfp) ((gfp) & (GFP_KERNEL | GFP_ATOMIC)) > > /* scanning area inside a memory block */ > struct kmemleak_scan_area { > @@ -191,6 +190,9 @@ struct kmemleak_object { > #define HEX_ASCII 1 > /* max number of lines to be printed */ > #define HEX_MAX_LINES 2 > +/* minimum memory pool sizes */ > +#define MIN_OBJECT_POOL (NR_CPUS * 4) > +#define MIN_SCAN_AREA_POOL (NR_CPUS * 1) > > /* the list of all allocated objects */ > static LIST_HEAD(object_list); > @@ -203,7 +205,9 @@ static DEFINE_RWLOCK(kmemleak_lock); > > /* allocation caches for kmemleak internal data */ > static struct kmem_cache *object_cache; > +static mempool_t *object_mempool; > static struct kmem_cache *scan_area_cache; > +static mempool_t *scan_area_mempool; > > /* set if tracing memory operations is enabled */ > static int kmemleak_enabled; > @@ -483,9 
+487,9 @@ static void free_object_rcu(struct rcu_head *rcu) > */ > hlist_for_each_entry_safe(area, tmp, &object->area_list, node) { > hlist_del(&area->node); > - kmem_cache_free(scan_area_cache, area); > + mempool_free(area, scan_area_mempool); > } > - kmem_cache_free(object_cache, object); > + mempool_free(object, object_mempool); > } > > /* > @@ -576,7 +580,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, > struct rb_node **link, *rb_parent; > unsigned long untagged_ptr; > > - object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); > + object = mempool_alloc(object_mempool, gfp_kmemleak_mask(gfp)); > if (!object) { > pr_warn("Cannot allocate a kmemleak_object structure\n"); > kmemleak_disable(); > @@ -640,7 +644,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, > * be freed while the kmemleak_lock is held. > */ > dump_object_info(parent); > - kmem_cache_free(object_cache, object); > + mempool_free(object, object_mempool); > object = NULL; > goto out; > } > @@ -798,7 +802,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) > return; > } > > - area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp)); > + area = mempool_alloc(scan_area_mempool, gfp_kmemleak_mask(gfp)); > if (!area) { > pr_warn("Cannot allocate a scan area\n"); > goto out; > @@ -810,7 +814,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) > } else if (ptr + size > object->pointer + object->size) { > kmemleak_warn("Scan area larger than object 0x%08lx\n", ptr); > dump_object_info(object); > - kmem_cache_free(scan_area_cache, area); > + mempool_free(area, scan_area_mempool); > goto out_unlock; > } > > @@ -2049,6 +2053,18 @@ void __init kmemleak_init(void) > > object_cache = KMEM_CACHE(kmemleak_object, SLAB_NOLEAKTRACE); > scan_area_cache = KMEM_CACHE(kmemleak_scan_area, SLAB_NOLEAKTRACE); > + if (!object_cache || !scan_area_cache) { > + kmemleak_disable(); > + return; > + } > 
+ object_mempool = mempool_create_slab_pool(MIN_OBJECT_POOL, > + object_cache); > + scan_area_mempool = mempool_create_slab_pool(MIN_SCAN_AREA_POOL, > + scan_area_cache); > + if (!object_mempool || !scan_area_mempool) { > + kmemleak_disable(); > + return; > + } > > if (crt_early_log > ARRAY_SIZE(early_log)) > pr_warn("Early log buffer exceeded (%d), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE\n", -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-29 12:02 ` Michal Hocko @ 2019-03-29 16:16 ` Catalin Marinas 2019-04-01 20:12 ` Michal Hocko 0 siblings, 1 reply; 22+ messages in thread From: Catalin Marinas @ 2019-03-29 16:16 UTC (permalink / raw) To: Michal Hocko Cc: Matthew Wilcox, Qian Cai, akpm, cl, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Fri, Mar 29, 2019 at 01:02:37PM +0100, Michal Hocko wrote: > On Thu 28-03-19 14:59:17, Catalin Marinas wrote: > [...] > > >From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001 > > From: Catalin Marinas <catalin.marinas@arm.com> > > Date: Thu, 28 Mar 2019 13:26:07 +0000 > > Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects > > > > This patch adds mempool allocations for struct kmemleak_object and > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > under memory pressure. The patch also masks out all the gfp flags passed > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > Using mempool allocator is better than inventing its own implementation > but there is one thing to be slightly careful/worried about. > > This allocator expects that somebody will refill the pool in a finit > time. Most users are OK with that because objects in flight are going > to return in the pool in a relatively short time (think of an IO) but > kmemleak is not guaranteed to comply with that AFAIU. Sure ephemeral > allocations are happening all the time so there should be some churn > in the pool all the time but if we go to an extreme where there is a > serious memory leak then I suspect we might get stuck here without any > way forward. Page/slab allocator would eventually back off even though > small allocations never fail because a user context would get killed > sooner or later but there is no fatal_signal_pending backoff in the > mempool alloc path. 
We could improve the mempool code slightly to refill itself (from some workqueue or during a mempool_alloc() which allows blocking) but it's really just a best effort for a debug tool under OOM conditions. It may be sufficient just to make the mempool size tunable (via /sys/kernel/debug/kmemleak). > Anyway, I believe this is a step in the right direction and should the > above ever materializes as a relevant problem we can tune the mempool > to backoff for _some_ callers or do something similar. > > Btw. there is kmemleak_update_trace call in mempool_alloc, is this ok > for the kmemleak allocation path? It's not a problem, maybe only a small overhead in searching an rbtree in kmemleak but it cannot find anything since the kmemleak metadata is not tracked. And this only happens if a normal allocation fails and takes an existing object from the pool. I thought about passing the mempool back into kmemleak and checking whether it's one of the two pools it uses but concluded that it's not worth it. -- Catalin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-29 16:16 ` Catalin Marinas @ 2019-04-01 20:12 ` Michal Hocko 2019-04-05 16:43 ` Catalin Marinas 0 siblings, 1 reply; 22+ messages in thread From: Michal Hocko @ 2019-04-01 20:12 UTC (permalink / raw) To: Catalin Marinas Cc: Matthew Wilcox, Qian Cai, akpm, cl, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Fri 29-03-19 16:16:38, Catalin Marinas wrote: > On Fri, Mar 29, 2019 at 01:02:37PM +0100, Michal Hocko wrote: > > On Thu 28-03-19 14:59:17, Catalin Marinas wrote: > > [...] > > > >From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001 > > > From: Catalin Marinas <catalin.marinas@arm.com> > > > Date: Thu, 28 Mar 2019 13:26:07 +0000 > > > Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects > > > > > > This patch adds mempool allocations for struct kmemleak_object and > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > > under memory pressure. The patch also masks out all the gfp flags passed > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > Using mempool allocator is better than inventing its own implementation > > but there is one thing to be slightly careful/worried about. > > > > This allocator expects that somebody will refill the pool in a finit > > time. Most users are OK with that because objects in flight are going > > to return in the pool in a relatively short time (think of an IO) but > > kmemleak is not guaranteed to comply with that AFAIU. Sure ephemeral > > allocations are happening all the time so there should be some churn > > in the pool all the time but if we go to an extreme where there is a > > serious memory leak then I suspect we might get stuck here without any > > way forward. 
Page/slab allocator would eventually back off even though > > small allocations never fail because a user context would get killed > > sooner or later but there is no fatal_signal_pending backoff in the > > mempool alloc path. > > We could improve the mempool code slightly to refill itself (from some > workqueue or during a mempool_alloc() which allows blocking) but it's > really just a best effort for a debug tool under OOM conditions. It may > be sufficient just to make the mempool size tunable (via > /sys/kernel/debug/kmemleak). The point I've tried to make is that you really have to fail at some point but mempool is fundamentally about non-failing as long as the allocation is sleepable. And we cannot really break that assumptions because existing users really depend on it. But as I've said I would try it out and see. This is just a debugging feature and I assume that a really fatal oom caused by a real memory leak would be detected sooner than the whole thing just blows up. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-04-01 20:12 ` Michal Hocko @ 2019-04-05 16:43 ` Catalin Marinas 0 siblings, 0 replies; 22+ messages in thread From: Catalin Marinas @ 2019-04-05 16:43 UTC (permalink / raw) To: Michal Hocko Cc: Matthew Wilcox, Qian Cai, akpm, cl, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Mon, Apr 01, 2019 at 10:12:01PM +0200, Michal Hocko wrote: > On Fri 29-03-19 16:16:38, Catalin Marinas wrote: > > On Fri, Mar 29, 2019 at 01:02:37PM +0100, Michal Hocko wrote: > > > On Thu 28-03-19 14:59:17, Catalin Marinas wrote: > > > [...] > > > > >From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001 > > > > From: Catalin Marinas <catalin.marinas@arm.com> > > > > Date: Thu, 28 Mar 2019 13:26:07 +0000 > > > > Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects > > > > > > > > This patch adds mempool allocations for struct kmemleak_object and > > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > > > under memory pressure. The patch also masks out all the gfp flags passed > > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > > > Using mempool allocator is better than inventing its own implementation > > > but there is one thing to be slightly careful/worried about. > > > > > > This allocator expects that somebody will refill the pool in a finit > > > time. Most users are OK with that because objects in flight are going > > > to return in the pool in a relatively short time (think of an IO) but > > > kmemleak is not guaranteed to comply with that AFAIU. Sure ephemeral > > > allocations are happening all the time so there should be some churn > > > in the pool all the time but if we go to an extreme where there is a > > > serious memory leak then I suspect we might get stuck here without any > > > way forward. 
Page/slab allocator would eventually back off even though > > > small allocations never fail because a user context would get killed > > > sooner or later but there is no fatal_signal_pending backoff in the > > > mempool alloc path. > > > > We could improve the mempool code slightly to refill itself (from some > > workqueue or during a mempool_alloc() which allows blocking) but it's > > really just a best effort for a debug tool under OOM conditions. It may > > be sufficient just to make the mempool size tunable (via > > /sys/kernel/debug/kmemleak). > > The point I've tried to make is that you really have to fail at some > point but mempool is fundamentally about non-failing as long as the > allocation is sleepable. And we cannot really break that assumptions > because existing users really depend on it. But as I've said I would try > it out and see. This is just a debugging feature and I assume that a > really fatal oom caused by a real memory leak would be detected sooner > than the whole thing just blows up. I'll first push a patch to use mempool as it is, with a tunable size via /sys/kernel/debug/kmemleak. I think the better solution would be a rewrite of the metadata handling in kmemleak to embed it into the slab object (as per Pekka's suggestion). However, I'll be on holiday until the 15th, so cannot look into this. Thanks. -- Catalin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-27 0:59 [PATCH v4] kmemleak: survive in a low-memory situation Qian Cai 2019-03-27 8:44 ` Michal Hocko @ 2019-03-28 6:05 ` Pekka Enberg 2019-03-28 10:30 ` Catalin Marinas 1 sibling, 1 reply; 22+ messages in thread From: Pekka Enberg @ 2019-03-28 6:05 UTC (permalink / raw) To: Qian Cai, akpm Cc: catalin.marinas, cl, mhocko, willy, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel Hi, On 27/03/2019 2.59, Qian Cai wrote: > Unless there is a brave soul to reimplement the kmemleak to embed it's > metadata into the tracked memory itself in a foreseeable future, this > provides a good balance between enabling kmemleak in a low-memory > situation and not introducing too much hackiness into the existing > code for now. Unfortunately I am not that brave soul, but I'm wondering what the complication here is? It shouldn't be too hard to teach calculate_sizes() in SLUB about a new SLAB_KMEMLEAK flag that reserves space for the metadata. - Pekka ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-28 6:05 ` Pekka Enberg @ 2019-03-28 10:30 ` Catalin Marinas 2019-03-28 11:50 ` Pekka Enberg 0 siblings, 1 reply; 22+ messages in thread From: Catalin Marinas @ 2019-03-28 10:30 UTC (permalink / raw) To: Pekka Enberg Cc: Qian Cai, akpm, cl, mhocko, willy, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Thu, Mar 28, 2019 at 08:05:31AM +0200, Pekka Enberg wrote: > On 27/03/2019 2.59, Qian Cai wrote: > > Unless there is a brave soul to reimplement the kmemleak to embed it's > > metadata into the tracked memory itself in a foreseeable future, this > > provides a good balance between enabling kmemleak in a low-memory > > situation and not introducing too much hackiness into the existing > > code for now. > > Unfortunately I am not that brave soul, but I'm wondering what the > complication here is? It shouldn't be too hard to teach calculate_sizes() in > SLUB about a new SLAB_KMEMLEAK flag that reserves spaces for the metadata. I don't think it's the calculate_sizes() that's the hard part. The way kmemleak is designed assumes that the metadata has a longer lifespan than the slab object it is tracking (and refcounted via get_object/put_object()). We'd have to replace some of the rcu_read_(un)lock() regions with a full kmemleak_lock together with a few more tweaks to allow the release of kmemleak_lock during memory scanning (which can take minutes; so it needs to be safe w.r.t. metadata freeing, currently relying on a deferred RCU freeing). Anyway, I think it is possible, just not straightforward. -- Catalin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-28 10:30 ` Catalin Marinas @ 2019-03-28 11:50 ` Pekka Enberg 2019-03-28 15:28 ` Catalin Marinas 0 siblings, 1 reply; 22+ messages in thread From: Pekka Enberg @ 2019-03-28 11:50 UTC (permalink / raw) To: Catalin Marinas Cc: Qian Cai, akpm, cl, mhocko, willy, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel Hi Catalin, On 27/03/2019 2.59, Qian Cai wrote: >>> Unless there is a brave soul to reimplement the kmemleak to embed it's >>> metadata into the tracked memory itself in a foreseeable future, this >>> provides a good balance between enabling kmemleak in a low-memory >>> situation and not introducing too much hackiness into the existing >>> code for now. On Thu, Mar 28, 2019 at 08:05:31AM +0200, Pekka Enberg wrote: >> Unfortunately I am not that brave soul, but I'm wondering what the >> complication here is? It shouldn't be too hard to teach calculate_sizes() in >> SLUB about a new SLAB_KMEMLEAK flag that reserves spaces for the metadata. On 28/03/2019 12.30, Catalin Marinas wrote:> I don't think it's the calculate_sizes() that's the hard part. The way > kmemleak is designed assumes that the metadata has a longer lifespan > than the slab object it is tracking (and refcounted via > get_object/put_object()). We'd have to replace some of the > rcu_read_(un)lock() regions with a full kmemleak_lock together with a > few more tweaks to allow the release of kmemleak_lock during memory > scanning (which can take minutes; so it needs to be safe w.r.t. metadata > freeing, currently relying on a deferred RCU freeing). Right. I think SLUB already supports delaying object freeing because of KASAN (see the slab_free_freelist_hook() function) so the issue with metadata outliving object is solvable (although will consume more memory). I can't say I remember enough details from kmemleak to comment on the locking complications you point out, though. 
- Pekka ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] kmemleak: survive in a low-memory situation 2019-03-28 11:50 ` Pekka Enberg @ 2019-03-28 15:28 ` Catalin Marinas 0 siblings, 0 replies; 22+ messages in thread From: Catalin Marinas @ 2019-03-28 15:28 UTC (permalink / raw) To: Pekka Enberg Cc: Qian Cai, akpm, cl, mhocko, willy, penberg, rientjes, iamjoonsoo.kim, linux-mm, linux-kernel On Thu, Mar 28, 2019 at 01:50:29PM +0200, Pekka Enberg wrote: > On 27/03/2019 2.59, Qian Cai wrote: > > > > Unless there is a brave soul to reimplement the kmemleak to embed it's > > > > metadata into the tracked memory itself in a foreseeable future, this > > > > provides a good balance between enabling kmemleak in a low-memory > > > > situation and not introducing too much hackiness into the existing > > > > code for now. > > On Thu, Mar 28, 2019 at 08:05:31AM +0200, Pekka Enberg wrote: > > > Unfortunately I am not that brave soul, but I'm wondering what the > > > complication here is? It shouldn't be too hard to teach calculate_sizes() in > > > SLUB about a new SLAB_KMEMLEAK flag that reserves spaces for the metadata. > > On 28/03/2019 12.30, Catalin Marinas wrote:> I don't think it's the > calculate_sizes() that's the hard part. The way > > kmemleak is designed assumes that the metadata has a longer lifespan > > than the slab object it is tracking (and refcounted via > > get_object/put_object()). We'd have to replace some of the > > rcu_read_(un)lock() regions with a full kmemleak_lock together with a > > few more tweaks to allow the release of kmemleak_lock during memory > > scanning (which can take minutes; so it needs to be safe w.r.t. metadata > > freeing, currently relying on a deferred RCU freeing). > > Right. > > I think SLUB already supports delaying object freeing because of KASAN (see > the slab_free_freelist_hook() function) so the issue with metadata outliving > object is solvable (although will consume more memory). Thanks. That's definitely an area to investigate. 
> I can't say I remember enough details from kmemleak to comment on the > locking complications you point out, though. They are not too bad, I'd just need some quiet couple of days to go through them (which I don't have at the moment). -- Catalin ^ permalink raw reply [flat|nested] 22+ messages in thread