WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages

All of lore.kernel.org
 help / color / mirror / Atom feed

* WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec
@ 2018-05-26  7:14 Mathieu Malaterre
  2018-05-28  8:34 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Malaterre @ 2018-05-26  7:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm

Hi Michal,

For the last couple of days, I am seeing the following appearing in
dmesg (*). I am a happy kmemleak user on an oldish Mac Mini G4
(ppc32), it has been working great. What does this new warning checks:

    /*
     * All existing users of the __GFP_NOFAIL are blockable, so warn
     * of any new users that actually require GFP_NOWAIT
     */
    if (WARN_ON_ONCE(!can_direct_reclaim))
      goto fail;

Thanks,

(*)
[  269.038911] WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258
__alloc_pages_nodemask+0xa88/0xfec
[  269.038931] Modules linked in: ctr ccm uinput arc4 b43 bcma
mac80211 sha256_generic snd_aoa_codec_toonie cfg80211
snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm
evdev snd_timer snd sg ssb soundcore usb_storage autofs4 ext4
crc32c_generic crc16 mbcache jbd2 fscrypto usbhid ohci_pci ohci_hcd
ehci_pci ehci_hcd usbcore firewire_ohci sd_mod sr_mod cdrom
firewire_core sungem sungem_phy crc_itu_t nls_base usb_common
[  269.039026] CPU: 0 PID: 21 Comm: kswapd0 Tainted: G        W
 4.17.0-rc6+ #18
[  269.039032] NIP:  c020e8f8 LR: c020e2e0 CTR: c020b514
[  269.039038] REGS: dde3b6a0 TRAP: 0700   Tainted: G        W
 (4.17.0-rc6+)
[  269.039042] MSR:  00021032 <ME,IR,DR,RI>  CR: 22224484  XER: 00000000
[  269.039056]
               GPR00: c020e2e0 dde3b750 df6ab480 00000000 00000001
00000000 00000004 00000040
               GPR08: 00000800 22224484 00000040 01ffffff 42224482
00000000 00000040 01011a00
               GPR16: c0a864bc 00000040 c0c5a730 01011a00 c0c2f5cc
00000000 00000000 c0c318c4
               GPR24: c0c6ba20 00000001 c0a84158 01011a00 c0ce065c
c0a84158 00000000 00000000
[  269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec
[  269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec
[  269.039128] Call Trace:
[  269.039136] [dde3b750] [c020e2e0]
__alloc_pages_nodemask+0x470/0xfec (unreliable)
[  269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970
[  269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468
[  269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc
[  269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0
[  269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc
[  269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164
[  269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298
[  269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8
[  269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644
[  269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c
[  269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768
[  269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34
[  269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34
[  269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c
[  269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c
[  269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0
[  269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64
[  269.039304] Instruction dump:
[  269.039311] 7f44d378 7fa3eb78 4802bd95 4bfff9f4 485d7309 4bfff998
7f03c378 7fc5f378
[  269.039326] 7f44d378 4802bd79 7c781b78 4bfffd48 <0fe00000> 8081002c
3ca0c08b 7fe6fb78
[  269.039343] ---[ end trace c255e24f03e28d77 ]---
[  269.039351] kmemleak: Cannot allocate a kmemleak_object structure
[  269.039373] kmemleak: Kernel memory leak detector disabled
[  269.039412] kmemleak: Automatic memory scanning thread ended

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec
  2018-05-26  7:14 WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec Mathieu Malaterre
@ 2018-05-28  8:34 ` Michal Hocko
  2018-05-28 13:05   ` [PATCH] kmemleak: don't use __GFP_NOFAIL Tetsuo Handa
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-05-28  8:34 UTC (permalink / raw)
  To: Mathieu Malaterre; +Cc: linux-mm

On Sat 26-05-18 09:14:35, Mathieu Malaterre wrote:
> Hi Michal,
> 
> For the last couple of days, I am seeing the following appearing in
> dmesg (*). I am a happy kmemleak user on an oldish Mac Mini G4
> (ppc32), it has been working great. What does this new warning checks:
> 
>     /*
>      * All existing users of the __GFP_NOFAIL are blockable, so warn
>      * of any new users that actually require GFP_NOWAIT
>      */
>     if (WARN_ON_ONCE(!can_direct_reclaim))
>       goto fail;

Interesting. Where does this path get GFP_NOFAIL from? I am looking at
the current upstream code and 
get_swap_bio(GFP_NOIO)
  bio_alloc(gfp_mask)
    bio_alloc_bioset(gfp_mask)
      mempool_alloc(gfp_msk & ~__GFP_DIRECT_RECLAIM)

mempool_alloc does play some tricks with the gfp_mask but it doesn't add
GFP_NOFAIL AFAICS.
 
> Thanks,
> 
> (*)
> [  269.038911] WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258
> __alloc_pages_nodemask+0xa88/0xfec
[...]
> [  269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec
> [  269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec
> [  269.039128] Call Trace:
> [  269.039136] [dde3b750] [c020e2e0]  __alloc_pages_nodemask+0x470/0xfec (unreliable)
> [  269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970
> [  269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468
> [  269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc
> [  269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0
> [  269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc
> [  269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164
> [  269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298
> [  269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8
> [  269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644
> [  269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c
> [  269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768
> [  269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34
> [  269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34
> [  269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c
> [  269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c
> [  269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0
> [  269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64
> [  269.039304] Instruction dump:
> [  269.039311] 7f44d378 7fa3eb78 4802bd95 4bfff9f4 485d7309 4bfff998 7f03c378 7fc5f378
> [  269.039326] 7f44d378 4802bd79 7c781b78 4bfffd48 <0fe00000> 8081002c 3ca0c08b 7fe6fb78
> [  269.039343] ---[ end trace c255e24f03e28d77 ]---
> [  269.039351] kmemleak: Cannot allocate a kmemleak_object structure
> [  269.039373] kmemleak: Kernel memory leak detector disabled
> [  269.039412] kmemleak: Automatic memory scanning thread ended

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-28  8:34 ` Michal Hocko
@ 2018-05-28 13:05   ` Tetsuo Handa
  2018-05-28 13:24     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Tetsuo Handa @ 2018-05-28 13:05 UTC (permalink / raw)
  To: Michal Hocko, Mathieu Malaterre, Dmitry Vyukov; +Cc: linux-mm, Catalin Marinas



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-28 13:05   ` [PATCH] kmemleak: don't use __GFP_NOFAIL Tetsuo Handa
@ 2018-05-28 13:24     ` Michal Hocko
  2018-05-28 21:05       ` Tetsuo Handa
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-05-28 13:24 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Mathieu Malaterre, Dmitry Vyukov, linux-mm, Catalin Marinas, Chunyu Hu

I've found the previous report [1] finally. Adding Chunyu Hu to the CC
list. The report which triggered this one is [2]

[1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
[2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com

I am not really familiar with the kmemleak code but the expectation that
you can make a forward progress in an unknown allocation context seems
broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
and refill it from a reasonably strong contexts (e.g. in a sleepable
context)?

On Mon 28-05-18 22:05:21, Tetsuo Handa wrote:
> >From f0b7f6c2146f693fec6706bf9e3c34687c73f21a Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Mon, 28 May 2018 21:49:51 +0900
> Subject: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> Commit d9570ee3bd1d4f20 ("kmemleak: allow to coexist with fault injection")
> added __GFP_NOFAIL to gfp_kmemleak_mask() macro because memory allocation
> fault injection trivially disables kmemleak.
> 
> But since !__GFP_DIRECT_RECLAIM && __GFP_NOFAIL memory allocation is not
> supported, Mathieu Malaterre is observing warning messages upon
> mempool_alloc(gfp_msk & ~__GFP_DIRECT_RECLAIM) allocation request.
> 
> [  269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec
> [  269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec
> [  269.039128] Call Trace:
> [  269.039136] [dde3b750] [c020e2e0]  __alloc_pages_nodemask+0x470/0xfec (unreliable)
> [  269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970
> [  269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468
> [  269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc
> [  269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0
> [  269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc
> [  269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164
> [  269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298
> [  269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8
> [  269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644
> [  269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c
> [  269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768
> [  269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34
> [  269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34
> [  269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c
> [  269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c
> [  269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0
> [  269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64
> 
> Since the intent of adding __GFP_NOFAIL is not to disable kmemleak by
> failing the N'th allocation request, it should be possible to workaround
> it by simply retrying N'th allocation request. Thus, this patch changes
> callers of gfp_kmemleak_mask() macro to retry for several times.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  mm/kmemleak.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 9a085d5..973998b 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -126,7 +126,7 @@
>  /* GFP bitmask for kmemleak internal allocations */
>  #define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
>  				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
> -				 __GFP_NOWARN | __GFP_NOFAIL)
> +				 __GFP_NOWARN)
>  
>  /* scanning area inside a memory block */
>  struct kmemleak_scan_area {
> @@ -548,10 +548,12 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
>  					     int min_count, gfp_t gfp)
>  {
>  	unsigned long flags;
> -	struct kmemleak_object *object, *parent;
> +	struct kmemleak_object *object = NULL, *parent;
>  	struct rb_node **link, *rb_parent;
> +	unsigned int i;
>  
> -	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> +	for (i = 0; i < 10 && !object; i++)
> +		object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
>  	if (!object) {
>  		pr_warn("Cannot allocate a kmemleak_object structure\n");
>  		kmemleak_disable();
> @@ -763,7 +765,8 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>  {
>  	unsigned long flags;
>  	struct kmemleak_object *object;
> -	struct kmemleak_scan_area *area;
> +	struct kmemleak_scan_area *area = NULL;
> +	unsigned int i;
>  
>  	object = find_and_get_object(ptr, 1);
>  	if (!object) {
> @@ -772,7 +775,9 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>  		return;
>  	}
>  
> -	area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
> +	for (i = 0; i < 10 && !area; i++)
> +		area = kmem_cache_alloc(scan_area_cache,
> +					gfp_kmemleak_mask(gfp));
>  	if (!area) {
>  		pr_warn("Cannot allocate a scan area\n");
>  		goto out;
> -- 
> 1.8.3.1
> 
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-28 13:24     ` Michal Hocko
@ 2018-05-28 21:05       ` Tetsuo Handa
  2018-05-29 13:27         ` Chunyu Hu
  0 siblings, 1 reply; 23+ messages in thread
From: Tetsuo Handa @ 2018-05-28 21:05 UTC (permalink / raw)
  To: mhocko; +Cc: malat, dvyukov, linux-mm, catalin.marinas, chuhu

Michal Hocko wrote:
> I've found the previous report [1] finally. Adding Chunyu Hu to the CC
> list. The report which triggered this one is [2]
> 
> [1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
> [2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com
> 
> I am not really familiar with the kmemleak code but the expectation that
> you can make a forward progress in an unknown allocation context seems
> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
> and refill it from a reasonably strong contexts (e.g. in a sleepable
> context)?

Or, we can undo the original allocation if the kmemleak allocation failed?

kmalloc(size, gfp) {
  ptr = do_kmalloc(size, gfp);
  if (ptr) {
    object = do_kmalloc(size, gfp_kmemleak_mask(gfp));
    if (!object) {
      kfree(ptr);
      return NULL;
    }
    // Store information of ptr into object.
  }
  return ptr;
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-28 21:05       ` Tetsuo Handa
@ 2018-05-29 13:27         ` Chunyu Hu
  2018-05-29 13:46           ` Tetsuo Handa
  0 siblings, 1 reply; 23+ messages in thread
From: Chunyu Hu @ 2018-05-29 13:27 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, malat, dvyukov, linux-mm, catalin marinas



----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>
> To: mhocko@suse.com
> Cc: malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, "catalin marinas" <catalin.marinas@arm.com>,
> chuhu@redhat.com
> Sent: Tuesday, May 29, 2018 5:05:45 AM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> Michal Hocko wrote:
> > I've found the previous report [1] finally. Adding Chunyu Hu to the CC
> > list. The report which triggered this one is [2]
> > 
> > [1]
> > http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
> > [2]
> > http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com
> > 
> > I am not really familiar with the kmemleak code but the expectation that
> > you can make a forward progress in an unknown allocation context seems
> > broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
> > and refill it from a reasonably strong contexts (e.g. in a sleepable
> > context)?
> 
> Or, we can undo the original allocation if the kmemleak allocation failed?

If so, you are making kmemleak a fault injection trigger. But the original
purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault injection.
(discussion in [1])

I'm trying with per task way for fault injection, and did some tries. In my 
try, I removed this from NOFAIL kmemleak and kmemleak survived with the per
task fault injection helper (disable/enable of task). Maybe I can send another
RFC for the api. 

> 
> kmalloc(size, gfp) {
>   ptr = do_kmalloc(size, gfp);
>   if (ptr) {
>     object = do_kmalloc(size, gfp_kmemleak_mask(gfp));
>     if (!object) {
>       kfree(ptr);
>       return NULL;
>     }
>     // Store information of ptr into object.
>   }
>   return ptr;
> }
> 

-- 
Regards,
Chunyu Hu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-29 13:27         ` Chunyu Hu
@ 2018-05-29 13:46           ` Tetsuo Handa
  2018-05-30  9:35             ` Chunyu Hu
  0 siblings, 1 reply; 23+ messages in thread
From: Tetsuo Handa @ 2018-05-29 13:46 UTC (permalink / raw)
  To: Chunyu Hu; +Cc: mhocko, malat, dvyukov, linux-mm, catalin marinas

On 2018/05/29 22:27, Chunyu Hu wrote:
>>> I am not really familiar with the kmemleak code but the expectation that
>>> you can make a forward progress in an unknown allocation context seems
>>> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
>>> and refill it from a reasonably strong contexts (e.g. in a sleepable
>>> context)?
>>
>> Or, we can undo the original allocation if the kmemleak allocation failed?
> 
> If so, you are making kmemleak a fault injection trigger. But the original
> purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault injection.
> (discussion in [1])

I don't think that applying fault injection to kmemleak allocations is bad
(except that fault injection messages might be noisy).

> 
> I'm trying with per task way for fault injection, and did some tries. In my 
> try, I removed this from NOFAIL kmemleak and kmemleak survived with the per
> task fault injection helper (disable/enable of task). Maybe I can send another
> RFC for the api. 

You could carry __GFP_NO_FAULT_INJECTION using per "struct task_struct" flag.

But I think that undoing the original allocation if the kmemleak allocation failed
has an advantage that it does not disable kmemleak when the system is under memory
pressure (i.e. about to invoke the OOM killer); allowing us to test memory pressure
conditions.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-29 13:46           ` Tetsuo Handa
@ 2018-05-30  9:35             ` Chunyu Hu
  2018-05-30 10:46               ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Chunyu Hu @ 2018-05-30  9:35 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, malat, dvyukov, linux-mm, catalin marinas



----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: mhocko@suse.com, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, "catalin marinas"
> <catalin.marinas@arm.com>
> Sent: Tuesday, May 29, 2018 9:46:59 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On 2018/05/29 22:27, Chunyu Hu wrote:
> >>> I am not really familiar with the kmemleak code but the expectation that
> >>> you can make a forward progress in an unknown allocation context seems
> >>> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
> >>> and refill it from a reasonably strong contexts (e.g. in a sleepable
> >>> context)?
> >>
> >> Or, we can undo the original allocation if the kmemleak allocation failed?
> > 
> > If so, you are making kmemleak a fault injection trigger. But the original
> > purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault
> > injection.
> > (discussion in [1])
> 
> I don't think that applying fault injection to kmemleak allocations is bad
> (except that fault injection messages might be noisy).

Maybe we provide a way for user decide to apply fault inject to kmemleak or not 
is better, by adding another sys file in /sys/kernel/debug/failslab and the
fail_page_alloc.

> 
> > 
> > I'm trying with per task way for fault injection, and did some tries. In my
> > try, I removed this from NOFAIL kmemleak and kmemleak survived with the per
> > task fault injection helper (disable/enable of task). Maybe I can send
> > another
> > RFC for the api.
> 
> You could carry __GFP_NO_FAULT_INJECTION using per "struct task_struct" flag.

Thanks for this suggestion. 

I'm trying to reuse the make_it_fail field in task for fault injection. As adding
an extra memory alloc flag is not thought so good,  I think adding task flag
is either? 

> 
> But I think that undoing the original allocation if the kmemleak allocation
> failed
> has an advantage that it does not disable kmemleak when the system is under
> memory
> pressure (i.e. about to invoke the OOM killer); allowing us to test memory
> pressure
> conditions.

There should be benefit, this is redefining kmemleak's principle, currently it
won't affect other allocation directly, but if we free the other user's mem
alloc without thinking about the context seems also risk. 

> 
> 

-- 
Regards,
Chunyu Hu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-30  9:35             ` Chunyu Hu
@ 2018-05-30 10:46               ` Michal Hocko
  2018-05-30 11:42                 ` Chunyu Hu
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-05-30 10:46 UTC (permalink / raw)
  To: Chunyu Hu; +Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas

On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
[...]
> I'm trying to reuse the make_it_fail field in task for fault injection. As adding
> an extra memory alloc flag is not thought so good,  I think adding task flag
> is either? 

Yeah, task flag will be reduced to KMEMLEAK enabled configurations
without an additional maint. overhead. Anyway, you should really think
about how to guarantee trackability for atomic allocation requests. You
cannot simply assume that GFP_NOWAIT will succeed. I guess you really
want to have a pre-populated pool of objects for those requests. The
obvious question is how to balance such a pool. It ain't easy to track
memory by allocating more memory...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-30 10:46               ` Michal Hocko
@ 2018-05-30 11:42                 ` Chunyu Hu
  2018-05-30 12:38                   ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Chunyu Hu @ 2018-05-30 11:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita


----- Original Message -----
> From: "Michal Hocko" <mhocko@suse.com>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> "catalin marinas" <catalin.marinas@arm.com>
> Sent: Wednesday, May 30, 2018 6:46:37 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
> [...]
> > I'm trying to reuse the make_it_fail field in task for fault injection. As
> > adding
> > an extra memory alloc flag is not thought so good,  I think adding task
> > flag
> > is either?
> 
> Yeah, task flag will be reduced to KMEMLEAK enabled configurations
> without an additional maint. overhead. Anyway, you should really think
> about how to guarantee trackability for atomic allocation requests. You
> cannot simply assume that GFP_NOWAIT will succeed. I guess you really

Sure. While I'm using task->make_it_fail, I'm still in the direction of 
making kmemleak avoid fault inject with task flag instead of page alloc flag.

> want to have a pre-populated pool of objects for those requests. The
> obvious question is how to balance such a pool. It ain't easy to track
> memory by allocating more memory...

This solution is going to make kmemleak trace really nofail. We can think
later.

while I'm thinking about if fault inject can be disabled via flag in task.

Actually, I'm doing something like below, the disable_fault_inject() is
just setting a flag in task->make_it_fail. But this will depend on if
fault injection accept a change like this. CCing Akinobu 

[1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
[2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com
[3] commit d9570ee3bd1d ("kmemleak: allow to coexist with fault injection")


+#define disable_fault_inject() \
+do {   \
+   unsigned long flag; \
+   local_irq_save(flag);   \
+   if (in_irq())   \
+       current->make_it_fail |= HARDIRQ_NOFAULT_OFFSET;    \
+   else    \
+       current->make_it_fail |= TASK_NOFAULT_OFFSET;   \
+   local_irq_restore(flag);        \
+} while (0)


--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c                                                                                                                                                          
@@ -111,6 +111,7 @@
 #include <linux/kasan.h>
 #include <linux/kmemleak.h>
 #include <linux/memory_hotplug.h>
+#include <linux/fault-inject.h>
 
 /*
  * Kmemleak configuration and common defines.
@@ -126,7 +127,7 @@
 /* GFP bitmask for kmemleak internal allocations */
 #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
                 __GFP_NORETRY | __GFP_NOMEMALLOC | \
-                __GFP_NOWARN | __GFP_NOFAIL)
+                __GFP_NOWARN)
 
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
@@ -551,12 +552,15 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
    struct kmemleak_object *object, *parent;
    struct rb_node **link, *rb_parent;
 
+   disable_fault_inject();
    object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
    if (!object) {
        pr_warn("Cannot allocate a kmemleak_object structure\n");
        kmemleak_disable();
+       enable_fault_inject();
        return NULL;
    }
+   enable_fault_inject();
 
    INIT_LIST_HEAD(&object->object_list);
    INIT_LIST_HEAD(&object->gray_list);


> 
> --
> Michal Hocko
> SUSE Labs
> 

-- 
Regards,
Chunyu Hu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-30 11:42                 ` Chunyu Hu
@ 2018-05-30 12:38                   ` Michal Hocko
  2018-05-31 10:51                     ` Chunyu Hu
  2018-05-31 15:22                     ` Catalin Marinas
  0 siblings, 2 replies; 23+ messages in thread
From: Michal Hocko @ 2018-05-30 12:38 UTC (permalink / raw)
  To: Chunyu Hu
  Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita

On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> 
> ----- Original Message -----
> > From: "Michal Hocko" <mhocko@suse.com>
> > To: "Chunyu Hu" <chuhu@redhat.com>
> > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > "catalin marinas" <catalin.marinas@arm.com>
> > Sent: Wednesday, May 30, 2018 6:46:37 PM
> > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > 
> > On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
> > [...]
> > > I'm trying to reuse the make_it_fail field in task for fault injection. As
> > > adding
> > > an extra memory alloc flag is not thought so good,  I think adding task
> > > flag
> > > is either?
> > 
> > Yeah, task flag will be reduced to KMEMLEAK enabled configurations
> > without an additional maint. overhead. Anyway, you should really think
> > about how to guarantee trackability for atomic allocation requests. You
> > cannot simply assume that GFP_NOWAIT will succeed. I guess you really
> 
> Sure. While I'm using task->make_it_fail, I'm still in the direction of 
> making kmemleak avoid fault inject with task flag instead of page alloc flag.
> 
> > want to have a pre-populated pool of objects for those requests. The
> > obvious question is how to balance such a pool. It ain't easy to track
> > memory by allocating more memory...
> 
> This solution is going to make kmemleak trace really nofail. We can think
> later.
> 
> while I'm thinking about if fault inject can be disabled via flag in task.
> 
> Actually, I'm doing something like below, the disable_fault_inject() is
> just setting a flag in task->make_it_fail. But this will depend on if
> fault injection accept a change like this. CCing Akinobu 

You still seem to be missing my point I am afraid (or I am ;). So say
that you want to track a GFP_NOWAIT allocation request. So create_object
will get called with that gfp mask and no matter what you try here your
tracking object will be allocated in a weak allocation context as well
and disable kmemleak. So it only takes a more heavy memory pressure and
the tracing is gone...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-30 12:38                   ` Michal Hocko
@ 2018-05-31 10:51                     ` Chunyu Hu
  2018-05-31 11:35                       ` Michal Hocko
  2018-05-31 15:22                     ` Catalin Marinas
  1 sibling, 1 reply; 23+ messages in thread
From: Chunyu Hu @ 2018-05-31 10:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita



----- Original Message -----
> From: "Michal Hocko" <mhocko@suse.com>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com>
> Sent: Wednesday, May 30, 2018 8:38:26 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > 
> > ----- Original Message -----
> > > From: "Michal Hocko" <mhocko@suse.com>
> > > To: "Chunyu Hu" <chuhu@redhat.com>
> > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
> > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > > "catalin marinas" <catalin.marinas@arm.com>
> > > Sent: Wednesday, May 30, 2018 6:46:37 PM
> > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > > 
> > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
> > > [...]
> > > > I'm trying to reuse the make_it_fail field in task for fault injection.
> > > > As
> > > > adding
> > > > an extra memory alloc flag is not thought so good,  I think adding task
> > > > flag
> > > > is either?
> > > 
> > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations
> > > without an additional maint. overhead. Anyway, you should really think
> > > about how to guarantee trackability for atomic allocation requests. You
> > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really
> > 
> > Sure. While I'm using task->make_it_fail, I'm still in the direction of
> > making kmemleak avoid fault inject with task flag instead of page alloc
> > flag.
> > 
> > > want to have a pre-populated pool of objects for those requests. The
> > > obvious question is how to balance such a pool. It ain't easy to track
> > > memory by allocating more memory...
> > 
> > This solution is going to make kmemleak trace really nofail. We can think
> > later.
> > 
> > while I'm thinking about if fault inject can be disabled via flag in task.
> > 
> > Actually, I'm doing something like below, the disable_fault_inject() is
> > just setting a flag in task->make_it_fail. But this will depend on if
> > fault injection accept a change like this. CCing Akinobu
> 
> You still seem to be missing my point I am afraid (or I am ;). So say
> that you want to track a GFP_NOWAIT allocation request. So create_object
> will get called with that gfp mask and no matter what you try here your
> tracking object will be allocated in a weak allocation context as well
> and disable kmemleak. So it only takes a more heavy memory pressure and
> the tracing is gone...

Michal,

Thank you for the good suggestion. You mean GFP_NOWAIT still can make create_object
fail and as a result kmemleak disable itself. So it's not so useful, just like
the current __GFP_NOFAIL usage in create_object. 

In the first thread, we discussed this. and that time you suggested we have 
fault injection disabled when kmemleak is working and suggested per task way.
so my head has been stuck in that point. While now you gave a better suggestion
that why not we pre allocate a urgent pool for kmemleak objects. After thinking
for a while, I got  your point, it's a good way for improving kmemleak to make
it can tolerate light allocation failure. And catalin mentioned that we have
one option that use the early_log array as urgent pool, which has the similar
ideology.

Basing on your suggestions, I tried to draft this, what does it look to you? 
another strong alloc mask and an extra thread for fill the pool, which containts
1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then
get a object from the pool. 


diff --git a/mm/kmemleak.c b/mm/kmemleak.c                                                                                                                                   
index 9a085d5..7163489 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -128,6 +128,10 @@
                 __GFP_NORETRY | __GFP_NOMEMALLOC | \
                 __GFP_NOWARN | __GFP_NOFAIL)
 
+#define gfp_kmemleak_mask_strong() (__GFP_NOMEMALLOC | \
+                __GFP_NOWARN | __GFP_RECLAIM | __GFP_NOFAIL)
+
+
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
    struct hlist_node node;
@@ -299,6 +303,83 @@ struct early_log {
    kmemleak_disable();     \
 } while (0)
 
+static DEFINE_SPINLOCK(kmemleak_object_lock);
+static LIST_HEAD(pool_object_list);
+static unsigned int volatile total;
+static unsigned int pool_object_max = 1024 * 1024;
+static struct task_struct *pool_thread;
+
+static struct kmemleak_object* kmemleak_pool_fill(void)
+{
+   struct kmemleak_object *object = NULL;
+   unsigned long flags;
+
+   object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask_strong());
+   spin_lock_irqsave(&kmemleak_object_lock, flags);
+   if (object) {
+       list_add(&object->object_list, &pool_object_list);
+       total++;
+   }
+   spin_unlock_irqrestore(&kmemleak_object_lock, flags);
+   return object;
+}
+
+static struct kmemleak_object* kmemleak_get_pool_object(void)
+{
+   struct kmemleak_object *object = NULL;
+   unsigned long flags;
+
+   spin_lock_irqsave(&kmemleak_object_lock, flags);
+   if (!list_empty(&pool_object_list)) {
+       object = list_first_entry(&pool_object_list,struct kmemleak_object,
+               object_list);
+       list_del(&object->object_list);
+       total--;
+   }
+   spin_unlock_irqrestore(&kmemleak_object_lock, flags);
+   return object;
+}
+
+static int kmemleak_pool_thread(void *nothinng)
+{
+   struct kmemleak_object *object = NULL;
+   while (!kthread_should_stop()) {
+       if (READ_ONCE(total) < pool_object_max) {
+           object = kmemleak_pool_fill();
+           WARN_ON(!object);
+       }
+       schedule_timeout_interruptible(msecs_to_jiffies(100));
+   }
+   return 0;
+}
+
+static void start_pool_thread(void)
+{
+   if (pool_thread)
+       return;
+   pool_thread = kthread_run(kmemleak_pool_thread, NULL, "kmemleak_pool");
+   if (IS_ERR(pool_thread)) {
+       pr_warn("Failed to create the scan thread\n");
+       pool_thread = NULL;
+   }
+}
+static void stop_pool_thread(void)
+{
+   struct kmemleak_object *object;
+   unsigned long flags;
+   if (pool_thread) {
+       kthread_stop(pool_thread);
+       pool_thread = NULL;
+   }
+   spin_lock_irqsave(&kmemleak_object_lock, flags);
+   list_for_each_entry(object, &pool_object_list, object_list) {
+       list_del(&object->object_list);
+       kmem_cache_free(object_cache, object);
+   }
+   spin_unlock_irqrestore(&kmemleak_object_lock, flags);
+}
+
 /*
  * Printing of the objects hex dump to the seq file. The number of lines to be
  * printed is limited to HEX_MAX_LINES to prevent seq file spamming. The
@@ -553,6 +634,10 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 
    object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
    if (!object) {
+       object = kmemleak_get_pool_object();
+       pr_info("total=%u", total);
+   }
+   if (!object) {
        pr_warn("Cannot allocate a kmemleak_object structure\n");
        kmemleak_disable();
        return NULL;
@@ -1872,8 +1957,10 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
        kmemleak_stack_scan = 0;
    else if (strncmp(buf, "scan=on", 7) == 0)
        start_scan_thread();
-   else if (strncmp(buf, "scan=off", 8) == 0)
+   else if (strncmp(buf, "scan=off", 8) == 0) {
        stop_scan_thread();
+       stop_pool_thread();
+   }
    else if (strncmp(buf, "scan=", 5) == 0) {
        unsigned long secs;
 
@@ -1929,6 +2016,7 @@ static void __kmemleak_do_cleanup(void)
 static void kmemleak_do_cleanup(struct work_struct *work)
 {
    stop_scan_thread();
+   stop_pool_thread();
 
    mutex_lock(&scan_mutex);
    /*
@@ -2114,6 +2202,7 @@ static int __init kmemleak_late_init(void)
        pr_warn("Failed to create the debugfs kmemleak file\n");
    mutex_lock(&scan_mutex);
    start_scan_thread();
+   start_pool_thread();
    mutex_unlock(&scan_mutex);
 
    pr_info("Kernel memory leak detector initialized\n");                           



> --
> Michal Hocko
> SUSE Labs
> 

-- 
Regards,
Chunyu Hu

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-31 10:51                     ` Chunyu Hu
@ 2018-05-31 11:35                       ` Michal Hocko
  2018-05-31 12:28                         ` Chunyu Hu
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-05-31 11:35 UTC (permalink / raw)
  To: Chunyu Hu
  Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita

On Thu 31-05-18 06:51:22, Chunyu Hu wrote:
> 
> 
> ----- Original Message -----
> > From: "Michal Hocko" <mhocko@suse.com>
> > To: "Chunyu Hu" <chuhu@redhat.com>
> > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com>
> > Sent: Wednesday, May 30, 2018 8:38:26 PM
> > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > 
> > On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > > 
> > > ----- Original Message -----
> > > > From: "Michal Hocko" <mhocko@suse.com>
> > > > To: "Chunyu Hu" <chuhu@redhat.com>
> > > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
> > > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > > > "catalin marinas" <catalin.marinas@arm.com>
> > > > Sent: Wednesday, May 30, 2018 6:46:37 PM
> > > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > > > 
> > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
> > > > [...]
> > > > > I'm trying to reuse the make_it_fail field in task for fault injection.
> > > > > As
> > > > > adding
> > > > > an extra memory alloc flag is not thought so good,  I think adding task
> > > > > flag
> > > > > is either?
> > > > 
> > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations
> > > > without an additional maint. overhead. Anyway, you should really think
> > > > about how to guarantee trackability for atomic allocation requests. You
> > > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really
> > > 
> > > Sure. While I'm using task->make_it_fail, I'm still in the direction of
> > > making kmemleak avoid fault inject with task flag instead of page alloc
> > > flag.
> > > 
> > > > want to have a pre-populated pool of objects for those requests. The
> > > > obvious question is how to balance such a pool. It ain't easy to track
> > > > memory by allocating more memory...
> > > 
> > > This solution is going to make kmemleak trace really nofail. We can think
> > > later.
> > > 
> > > while I'm thinking about if fault inject can be disabled via flag in task.
> > > 
> > > Actually, I'm doing something like below, the disable_fault_inject() is
> > > just setting a flag in task->make_it_fail. But this will depend on if
> > > fault injection accept a change like this. CCing Akinobu
> > 
> > You still seem to be missing my point I am afraid (or I am ;). So say
> > that you want to track a GFP_NOWAIT allocation request. So create_object
> > will get called with that gfp mask and no matter what you try here your
> > tracking object will be allocated in a weak allocation context as well
> > and disable kmemleak. So it only takes a more heavy memory pressure and
> > the tracing is gone...
> 
> Michal,
> 
> Thank you for the good suggestion. You mean GFP_NOWAIT still can make create_object
> fail and as a result kmemleak disable itself. So it's not so useful, just like
> the current __GFP_NOFAIL usage in create_object. 
> 
> In the first thread, we discussed this. and that time you suggested we have 
> fault injection disabled when kmemleak is working and suggested per task way.
> so my head has been stuck in that point. While now you gave a better suggestion
> that why not we pre allocate a urgent pool for kmemleak objects. After thinking
> for a while, I got  your point, it's a good way for improving kmemleak to make
> it can tolerate light allocation failure. And catalin mentioned that we have
> one option that use the early_log array as urgent pool, which has the similar
> ideology.
> 
> Basing on your suggestions, I tried to draft this, what does it look to you? 
> another strong alloc mask and an extra thread for fill the pool, which containts
> 1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then
> get a object from the pool. 

I am not really familiar with kmemleak code base to judge the
implementation. Could you be more specific about the highlevel design
please? Who is the producer and how does it sync with consumers?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-31 11:35                       ` Michal Hocko
@ 2018-05-31 12:28                         ` Chunyu Hu
  0 siblings, 0 replies; 23+ messages in thread
From: Chunyu Hu @ 2018-05-31 12:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita



----- Original Message -----
> From: "Michal Hocko" <mhocko@suse.com>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com>
> Sent: Thursday, May 31, 2018 7:35:08 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On Thu 31-05-18 06:51:22, Chunyu Hu wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Michal Hocko" <mhocko@suse.com>
> > > To: "Chunyu Hu" <chuhu@redhat.com>
> > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
> > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > > "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita"
> > > <akinobu.mita@gmail.com>
> > > Sent: Wednesday, May 30, 2018 8:38:26 PM
> > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > > 
> > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Michal Hocko" <mhocko@suse.com>
> > > > > To: "Chunyu Hu" <chuhu@redhat.com>
> > > > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
> > > > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > > > > "catalin marinas" <catalin.marinas@arm.com>
> > > > > Sent: Wednesday, May 30, 2018 6:46:37 PM
> > > > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > > > > 
> > > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
> > > > > [...]
> > > > > > I'm trying to reuse the make_it_fail field in task for fault
> > > > > > injection.
> > > > > > As
> > > > > > adding
> > > > > > an extra memory alloc flag is not thought so good,  I think adding
> > > > > > task
> > > > > > flag
> > > > > > is either?
> > > > > 
> > > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations
> > > > > without an additional maint. overhead. Anyway, you should really
> > > > > think
> > > > > about how to guarantee trackability for atomic allocation requests.
> > > > > You
> > > > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really
> > > > 
> > > > Sure. While I'm using task->make_it_fail, I'm still in the direction of
> > > > making kmemleak avoid fault inject with task flag instead of page alloc
> > > > flag.
> > > > 
> > > > > want to have a pre-populated pool of objects for those requests. The
> > > > > obvious question is how to balance such a pool. It ain't easy to
> > > > > track
> > > > > memory by allocating more memory...
> > > > 
> > > > This solution is going to make kmemleak trace really nofail. We can
> > > > think
> > > > later.
> > > > 
> > > > while I'm thinking about if fault inject can be disabled via flag in
> > > > task.
> > > > 
> > > > Actually, I'm doing something like below, the disable_fault_inject() is
> > > > just setting a flag in task->make_it_fail. But this will depend on if
> > > > fault injection accept a change like this. CCing Akinobu
> > > 
> > > You still seem to be missing my point I am afraid (or I am ;). So say
> > > that you want to track a GFP_NOWAIT allocation request. So create_object
> > > will get called with that gfp mask and no matter what you try here your
> > > tracking object will be allocated in a weak allocation context as well
> > > and disable kmemleak. So it only takes a more heavy memory pressure and
> > > the tracing is gone...
> > 
> > Michal,
> > 
> > Thank you for the good suggestion. You mean GFP_NOWAIT still can make
> > create_object
> > fail and as a result kmemleak disable itself. So it's not so useful, just
> > like
> > the current __GFP_NOFAIL usage in create_object.
> > 
> > In the first thread, we discussed this. and that time you suggested we have
> > fault injection disabled when kmemleak is working and suggested per task
> > way.
> > so my head has been stuck in that point. While now you gave a better
> > suggestion
> > that why not we pre allocate a urgent pool for kmemleak objects. After
> > thinking
> > for a while, I got  your point, it's a good way for improving kmemleak to
> > make
> > it can tolerate light allocation failure. And catalin mentioned that we
> > have
> > one option that use the early_log array as urgent pool, which has the
> > similar
> > ideology.
> > 
> > Basing on your suggestions, I tried to draft this, what does it look to
> > you?
> > another strong alloc mask and an extra thread for fill the pool, which
> > containts
> > 1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then
> > get a object from the pool.
> 
> I am not really familiar with kmemleak code base to judge the
> implementation. Could you be more specific about the highlevel design
> please? Who is the producer and how does it sync with consumers?

OK. 

To better describe. We know that, kmemleak_object is meta object for kmemleak
trace, and each time kmem_cache_alloc(or other) success, the another following
kmem_cache_alloc would be called (in create_object() to get a kmemleak_object 
and this must succeed, otherwise kmemleak would generate too many false positives
as a result of losing track to a memory block which could contain pointer to 
other objects. so kmemleak trace choose to disable itself when getting such
a allocation failure. 

When facing fault injection, this would become an issue that kmemleak would
easily disable itself when fault injected. And  memory allocation can
happen in irq context, so the followed kmemleak_alloc can't choose a
very strong way for allocation (such as blackable). So we can prepare
a dynamic kmemleak_object pool. And the design is in fact rather straight,
by maintaining a list of kmemleak_object. 

So the reproducer is a new kernel thread. which do a kmemleak_object(contains
list member itself, so easy to link) allocation every 100ms, in a strong
allocation way (can sleep and reclaim), to the pool_object_list, and the max
length of the list is 1024*1024 (1M).
 
  [pool_thread (reproducer)]                   
    pool_object_list<-->kmemleak_object<-->kmemleak_object...<-->...

And the consumer is create_object(). it can pick one from the list when
got failure in first weak allocation. 

  [task doing memory alloc (consumer)]
    kmem_cache_alloc()
        create_object() 
           kmem_cache_alloc()
             (fail ?)--Yes ---> (get kmemleak_object from the pool_object_lsit)
                     |_ No ---> got kmemleak_object
                  [insert kmemleak_object to rb tree]

And consumer and producer are synced with spinlock kmemleak_object_lock(maybe
call pool_object_lock)

  [spin lock]
  kmemleak_object_lock

Hope I described it clear...

-- 
Regards,
Chunyu Hu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-30 12:38                   ` Michal Hocko
  2018-05-31 10:51                     ` Chunyu Hu
@ 2018-05-31 15:22                     ` Catalin Marinas
  2018-05-31 18:41                       ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Catalin Marinas @ 2018-05-31 15:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Chunyu Hu, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita

Hi Michal,

I'm catching up with this thread.

On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote:
> On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > From: "Michal Hocko" <mhocko@suse.com>
> > > want to have a pre-populated pool of objects for those requests. The
> > > obvious question is how to balance such a pool. It ain't easy to track
> > > memory by allocating more memory...
> > 
> > This solution is going to make kmemleak trace really nofail. We can think
> > later.
> > 
> > while I'm thinking about if fault inject can be disabled via flag in task.
> > 
> > Actually, I'm doing something like below, the disable_fault_inject() is
> > just setting a flag in task->make_it_fail. But this will depend on if
> > fault injection accept a change like this. CCing Akinobu 
> 
> You still seem to be missing my point I am afraid (or I am ;). So say
> that you want to track a GFP_NOWAIT allocation request. So create_object
> will get called with that gfp mask and no matter what you try here your
> tracking object will be allocated in a weak allocation context as well
> and disable kmemleak. So it only takes a more heavy memory pressure and
> the tracing is gone...

create_object() indeed gets the originating gfp but it only cares
whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out
all the other flags when allocating a struct kmemleak_object (and adds
some of its own).

This has worked ok so far. There is a higher risk of GFP_ATOMIC
allocations failing but I haven't seen issues with kmemleak unless you
run it under heavy memory pressure (and kmemleak just disables itself).
With fault injection, such pressure is simulated with the side effect of
rendering kmemleak unusable.

Kmemleak could implement its own allocation mechanism (maybe even
skipping the slab altogether) but I feel it's overkill just to cope with
the specific case of fault injection. Also, it's not easy to figure out
how to balance such pool and it may end up calling alloc_pages() which
in turn can inject a fault.

Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a
compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault
injection is enabled?

Otherwise, I'd prefer some per-thread temporary fault injection
disabling.

-- 
Catalin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-31 15:22                     ` Catalin Marinas
@ 2018-05-31 18:41                       ` Michal Hocko
  2018-06-01  1:50                         ` Chunyu Hu
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-05-31 18:41 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Chunyu Hu, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita

On Thu 31-05-18 16:22:26, Catalin Marinas wrote:
> Hi Michal,
> 
> I'm catching up with this thread.
> 
> On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote:
> > On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > > From: "Michal Hocko" <mhocko@suse.com>
> > > > want to have a pre-populated pool of objects for those requests. The
> > > > obvious question is how to balance such a pool. It ain't easy to track
> > > > memory by allocating more memory...
> > > 
> > > This solution is going to make kmemleak trace really nofail. We can think
> > > later.
> > > 
> > > while I'm thinking about if fault inject can be disabled via flag in task.
> > > 
> > > Actually, I'm doing something like below, the disable_fault_inject() is
> > > just setting a flag in task->make_it_fail. But this will depend on if
> > > fault injection accept a change like this. CCing Akinobu 
> > 
> > You still seem to be missing my point I am afraid (or I am ;). So say
> > that you want to track a GFP_NOWAIT allocation request. So create_object
> > will get called with that gfp mask and no matter what you try here your
> > tracking object will be allocated in a weak allocation context as well
> > and disable kmemleak. So it only takes a more heavy memory pressure and
> > the tracing is gone...
> 
> create_object() indeed gets the originating gfp but it only cares
> whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out
> all the other flags when allocating a struct kmemleak_object (and adds
> some of its own).
> 
> This has worked ok so far. There is a higher risk of GFP_ATOMIC
> allocations failing but I haven't seen issues with kmemleak unless you
> run it under heavy memory pressure (and kmemleak just disables itself).
> With fault injection, such pressure is simulated with the side effect of
> rendering kmemleak unusable.
> 
> Kmemleak could implement its own allocation mechanism (maybe even
> skipping the slab altogether) but I feel it's overkill just to cope with
> the specific case of fault injection. Also, it's not easy to figure out
> how to balance such pool and it may end up calling alloc_pages() which
> in turn can inject a fault.
> 
> Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a
> compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault
> injection is enabled?
> 
> Otherwise, I'd prefer some per-thread temporary fault injection
> disabling.

Well, there are two issues (which boil down to the one in the end) here.
Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL
context is something to care about. A weaker allocation context can and
will lead to kmemleak meta data allocation failures regardless of the
fault injection. The way how those objects are allocated directly in the
allacator context makes this really hard and inherently subtle. I can
see only two ways around. Either you have a placeholder for "this object
is not tracked so do not throw false positives" or have a preallocated
pool to use if the direct context allocation failes for whatever reason.
Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind
of problems.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-05-31 18:41                       ` Michal Hocko
@ 2018-06-01  1:50                         ` Chunyu Hu
  2018-06-01  4:53                           ` Chunyu Hu
  0 siblings, 1 reply; 23+ messages in thread
From: Chunyu Hu @ 2018-06-01  1:50 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Michal Hocko, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita



----- Original Message -----
> From: "Michal Hocko" <mhocko@suse.com>
> To: "Catalin Marinas" <catalin.marinas@arm.com>
> Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org,
> dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com>
> Sent: Friday, June 1, 2018 2:41:04 AM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On Thu 31-05-18 16:22:26, Catalin Marinas wrote:
> > Hi Michal,
> > 
> > I'm catching up with this thread.
> > 
> > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote:
> > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > > > From: "Michal Hocko" <mhocko@suse.com>
> > > > > want to have a pre-populated pool of objects for those requests. The
> > > > > obvious question is how to balance such a pool. It ain't easy to
> > > > > track
> > > > > memory by allocating more memory...
> > > > 
> > > > This solution is going to make kmemleak trace really nofail. We can
> > > > think
> > > > later.
> > > > 
> > > > while I'm thinking about if fault inject can be disabled via flag in
> > > > task.
> > > > 
> > > > Actually, I'm doing something like below, the disable_fault_inject() is
> > > > just setting a flag in task->make_it_fail. But this will depend on if
> > > > fault injection accept a change like this. CCing Akinobu
> > > 
> > > You still seem to be missing my point I am afraid (or I am ;). So say
> > > that you want to track a GFP_NOWAIT allocation request. So create_object
> > > will get called with that gfp mask and no matter what you try here your
> > > tracking object will be allocated in a weak allocation context as well
> > > and disable kmemleak. So it only takes a more heavy memory pressure and
> > > the tracing is gone...
> > 
> > create_object() indeed gets the originating gfp but it only cares
> > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out
> > all the other flags when allocating a struct kmemleak_object (and adds
> > some of its own).
> > 
> > This has worked ok so far. There is a higher risk of GFP_ATOMIC
> > allocations failing but I haven't seen issues with kmemleak unless you
> > run it under heavy memory pressure (and kmemleak just disables itself).
> > With fault injection, such pressure is simulated with the side effect of
> > rendering kmemleak unusable.
> > 
> > Kmemleak could implement its own allocation mechanism (maybe even
> > skipping the slab altogether) but I feel it's overkill just to cope with
> > the specific case of fault injection. Also, it's not easy to figure out
> > how to balance such pool and it may end up calling alloc_pages() which
> > in turn can inject a fault.


it would benefit kmemleak trace, I see in my test that kmemleak even can work in
user pressure cases, such as in my test, stress-ng to consume
nearly all the swap space. kmemleak is still working. but 1M objects pool
is consuming around 400M + memory. So this is just a experiment try, as you
said, how to balance it's size is the issue or ther issues has to be resolved,
such as when to add pool, the speed, how big, and so on ...

And I fault injected 20000 times fail_page_alloc, and 2148 times happened
in create_object, and in such a case, kmemleak is till working after the
2000+ calloc failures. 

[root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc -l
2148

[60498.299412] FAULT_INJECTION: forcing a failure.
name fail_page_alloc, interval 0, probability 80, space 0, times 2

So this way is not just for fault injection, it's about making kmemleak
a bit stronger under memory failure case. It would be an exciting experience we
see if kmemleak still work even after mem pressure, as a user, I experienced
the good usage.


> > 
> > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a
> > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault
> > injection is enabled?

Maybe I can have a try on this..

> > 
> > Otherwise, I'd prefer some per-thread temporary fault injection
> > disabling.

I tried in make_it_fail flag, kmemleak can avoid fault injection, but I
can see kmemleak diabled itself...

> 
> Well, there are two issues (which boil down to the one in the end) here.
> Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL
> context is something to care about. A weaker allocation context can and
> will lead to kmemleak meta data allocation failures regardless of the
> fault injection. The way how those objects are allocated directly in the
> allacator context makes this really hard and inherently subtle. I can
> see only two ways around. Either you have a placeholder for "this object
> is not tracked so do not throw false positives" or have a preallocated
> pool to use if the direct context allocation failes for whatever reason.
> Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind
> of problems.
> --
> Michal Hocko
> SUSE Labs
> 

-- 
Regards,
Chunyu Hu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-06-01  1:50                         ` Chunyu Hu
@ 2018-06-01  4:53                           ` Chunyu Hu
  2018-06-04  8:41                             ` Dmitry Vyukov
  0 siblings, 1 reply; 23+ messages in thread
From: Chunyu Hu @ 2018-06-01  4:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Michal Hocko, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita



----- Original Message -----
> From: "Chunyu Hu" <chuhu@redhat.com>
> To: "Catalin Marinas" <catalin.marinas@arm.com>
> Cc: "Michal Hocko" <mhocko@suse.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org,
> dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com>
> Sent: Friday, June 1, 2018 9:50:20 AM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> 
> 
> ----- Original Message -----
> > From: "Michal Hocko" <mhocko@suse.com>
> > To: "Catalin Marinas" <catalin.marinas@arm.com>
> > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa"
> > <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org,
> > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita"
> > <akinobu.mita@gmail.com>
> > Sent: Friday, June 1, 2018 2:41:04 AM
> > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > 
> > On Thu 31-05-18 16:22:26, Catalin Marinas wrote:
> > > Hi Michal,
> > > 
> > > I'm catching up with this thread.
> > > 
> > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote:
> > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > > > > From: "Michal Hocko" <mhocko@suse.com>
> > > > > > want to have a pre-populated pool of objects for those requests.
> > > > > > The
> > > > > > obvious question is how to balance such a pool. It ain't easy to
> > > > > > track
> > > > > > memory by allocating more memory...
> > > > > 
> > > > > This solution is going to make kmemleak trace really nofail. We can
> > > > > think
> > > > > later.
> > > > > 
> > > > > while I'm thinking about if fault inject can be disabled via flag in
> > > > > task.
> > > > > 
> > > > > Actually, I'm doing something like below, the disable_fault_inject()
> > > > > is
> > > > > just setting a flag in task->make_it_fail. But this will depend on if
> > > > > fault injection accept a change like this. CCing Akinobu
> > > > 
> > > > You still seem to be missing my point I am afraid (or I am ;). So say
> > > > that you want to track a GFP_NOWAIT allocation request. So
> > > > create_object
> > > > will get called with that gfp mask and no matter what you try here your
> > > > tracking object will be allocated in a weak allocation context as well
> > > > and disable kmemleak. So it only takes a more heavy memory pressure and
> > > > the tracing is gone...
> > > 
> > > create_object() indeed gets the originating gfp but it only cares
> > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out
> > > all the other flags when allocating a struct kmemleak_object (and adds
> > > some of its own).
> > > 
> > > This has worked ok so far. There is a higher risk of GFP_ATOMIC
> > > allocations failing but I haven't seen issues with kmemleak unless you
> > > run it under heavy memory pressure (and kmemleak just disables itself).
> > > With fault injection, such pressure is simulated with the side effect of
> > > rendering kmemleak unusable.
> > > 
> > > Kmemleak could implement its own allocation mechanism (maybe even
> > > skipping the slab altogether) but I feel it's overkill just to cope with
> > > the specific case of fault injection. Also, it's not easy to figure out
> > > how to balance such pool and it may end up calling alloc_pages() which
> > > in turn can inject a fault.
> 
> 
> it would benefit kmemleak trace, I see in my test that kmemleak even can work
> in
> user pressure cases, such as in my test, stress-ng to consume
> nearly all the swap space. kmemleak is still working. but 1M objects pool
> is consuming around 400M + memory. So this is just a experiment try, as you
> said, how to balance it's size is the issue or ther issues has to be
> resolved,
> such as when to add pool, the speed, how big, and so on ...
> 
> And I fault injected 20000 times fail_page_alloc, and 2148 times happened
> in create_object, and in such a case, kmemleak is till working after the
> 2000+ calloc failures.
> 
> [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc
> -l
> 2148
> 
> [60498.299412] FAULT_INJECTION: forcing a failure.
> name fail_page_alloc, interval 0, probability 80, space 0, times 2
> 
> So this way is not just for fault injection, it's about making kmemleak
> a bit stronger under memory failure case. It would be an exciting experience
> we
> see if kmemleak still work even after mem pressure, as a user, I experienced
> the good usage.
> 
> 
> > > 
> > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a
> > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault
> > > injection is enabled?
> 
> Maybe I can have a try on this..

looks like I tried this in my first report thread, and it failed. As it
can sleep in irq() ..

https://marc.info/?l=linux-mm&m=152482400222806&w=2

> 
> > > 
> > > Otherwise, I'd prefer some per-thread temporary fault injection
> > > disabling.
> 
> I tried in make_it_fail flag, kmemleak can avoid fault injection, but I
> can see kmemleak diabled itself...
> 
> > 
> > Well, there are two issues (which boil down to the one in the end) here.
> > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL
> > context is something to care about. A weaker allocation context can and
> > will lead to kmemleak meta data allocation failures regardless of the
> > fault injection. The way how those objects are allocated directly in the
> > allacator context makes this really hard and inherently subtle. I can
> > see only two ways around. Either you have a placeholder for "this object
> > is not tracked so do not throw false positives" or have a preallocated
> > pool to use if the direct context allocation failes for whatever reason.
> > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind
> > of problems.
> > --
> > Michal Hocko
> > SUSE Labs
> > 
> 
> --
> Regards,
> Chunyu Hu
> 
> 

-- 
Regards,
Chunyu Hu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-06-01  4:53                           ` Chunyu Hu
@ 2018-06-04  8:41                             ` Dmitry Vyukov
  2018-06-04 12:42                               ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Vyukov @ 2018-06-04  8:41 UTC (permalink / raw)
  To: Chunyu Hu
  Cc: Catalin Marinas, Michal Hocko, Tetsuo Handa, malat, Linux-MM,
	Akinobu Mita

On Fri, Jun 1, 2018 at 6:53 AM, Chunyu Hu <chuhu@redhat.com> wrote:
> ----- Original Message -----
>> From: "Chunyu Hu" <chuhu@redhat.com>
>> To: "Catalin Marinas" <catalin.marinas@arm.com>
>> Cc: "Michal Hocko" <mhocko@suse.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org,
>> dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com>
>> Sent: Friday, June 1, 2018 9:50:20 AM
>> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
>>
>>
>>
>> ----- Original Message -----
>> > From: "Michal Hocko" <mhocko@suse.com>
>> > To: "Catalin Marinas" <catalin.marinas@arm.com>
>> > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa"
>> > <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org,
>> > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita"
>> > <akinobu.mita@gmail.com>
>> > Sent: Friday, June 1, 2018 2:41:04 AM
>> > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
>> >
>> > On Thu 31-05-18 16:22:26, Catalin Marinas wrote:
>> > > Hi Michal,
>> > >
>> > > I'm catching up with this thread.
>> > >
>> > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote:
>> > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
>> > > > > From: "Michal Hocko" <mhocko@suse.com>
>> > > > > > want to have a pre-populated pool of objects for those requests.
>> > > > > > The
>> > > > > > obvious question is how to balance such a pool. It ain't easy to
>> > > > > > track
>> > > > > > memory by allocating more memory...
>> > > > >
>> > > > > This solution is going to make kmemleak trace really nofail. We can
>> > > > > think
>> > > > > later.
>> > > > >
>> > > > > while I'm thinking about if fault inject can be disabled via flag in
>> > > > > task.
>> > > > >
>> > > > > Actually, I'm doing something like below, the disable_fault_inject()
>> > > > > is
>> > > > > just setting a flag in task->make_it_fail. But this will depend on if
>> > > > > fault injection accept a change like this. CCing Akinobu
>> > > >
>> > > > You still seem to be missing my point I am afraid (or I am ;). So say
>> > > > that you want to track a GFP_NOWAIT allocation request. So
>> > > > create_object
>> > > > will get called with that gfp mask and no matter what you try here your
>> > > > tracking object will be allocated in a weak allocation context as well
>> > > > and disable kmemleak. So it only takes a more heavy memory pressure and
>> > > > the tracing is gone...
>> > >
>> > > create_object() indeed gets the originating gfp but it only cares
>> > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out
>> > > all the other flags when allocating a struct kmemleak_object (and adds
>> > > some of its own).
>> > >
>> > > This has worked ok so far. There is a higher risk of GFP_ATOMIC
>> > > allocations failing but I haven't seen issues with kmemleak unless you
>> > > run it under heavy memory pressure (and kmemleak just disables itself).
>> > > With fault injection, such pressure is simulated with the side effect of
>> > > rendering kmemleak unusable.
>> > >
>> > > Kmemleak could implement its own allocation mechanism (maybe even
>> > > skipping the slab altogether) but I feel it's overkill just to cope with
>> > > the specific case of fault injection. Also, it's not easy to figure out
>> > > how to balance such pool and it may end up calling alloc_pages() which
>> > > in turn can inject a fault.
>>
>>
>> it would benefit kmemleak trace, I see in my test that kmemleak even can work
>> in
>> user pressure cases, such as in my test, stress-ng to consume
>> nearly all the swap space. kmemleak is still working. but 1M objects pool
>> is consuming around 400M + memory. So this is just a experiment try, as you
>> said, how to balance it's size is the issue or ther issues has to be
>> resolved,
>> such as when to add pool, the speed, how big, and so on ...
>>
>> And I fault injected 20000 times fail_page_alloc, and 2148 times happened
>> in create_object, and in such a case, kmemleak is till working after the
>> 2000+ calloc failures.
>>
>> [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc
>> -l
>> 2148
>>
>> [60498.299412] FAULT_INJECTION: forcing a failure.
>> name fail_page_alloc, interval 0, probability 80, space 0, times 2
>>
>> So this way is not just for fault injection, it's about making kmemleak
>> a bit stronger under memory failure case. It would be an exciting experience
>> we
>> see if kmemleak still work even after mem pressure, as a user, I experienced
>> the good usage.
>>
>>
>> > >
>> > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a
>> > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault
>> > > injection is enabled?
>>
>> Maybe I can have a try on this..
>
> looks like I tried this in my first report thread, and it failed. As it
> can sleep in irq() ..
>
> https://marc.info/?l=linux-mm&m=152482400222806&w=2
>
>>
>> > >
>> > > Otherwise, I'd prefer some per-thread temporary fault injection
>> > > disabling.
>>
>> I tried in make_it_fail flag, kmemleak can avoid fault injection, but I
>> can see kmemleak diabled itself...
>>
>> >
>> > Well, there are two issues (which boil down to the one in the end) here.
>> > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL
>> > context is something to care about. A weaker allocation context can and
>> > will lead to kmemleak meta data allocation failures regardless of the
>> > fault injection. The way how those objects are allocated directly in the
>> > allacator context makes this really hard and inherently subtle. I can
>> > see only two ways around. Either you have a placeholder for "this object
>> > is not tracked so do not throw false positives" or have a preallocated
>> > pool to use if the direct context allocation failes for whatever reason.
>> > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind
>> > of problems.



FWIW this problem is traditionally solved in dynamic analysis tools by
embedding meta info right in headers of heap blocks. All of KASAN,
KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then
an object is either allocated or not. If caller has something to
prevent allocations from failing in any context, then the same will be
true for KMEMLEAK meta data.

On a related note, we could also consider switching to
lib/stackdepot.c for alloc stack memorization to reduce header size.
stackdepot is not 100% proof against allocation failures in atomic
contexts, but it does eager memory preallocation and seems to work
well in practice.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-06-04  8:41                             ` Dmitry Vyukov
@ 2018-06-04 12:42                               ` Michal Hocko
  2018-06-04 15:08                                 ` Catalin Marinas
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-06-04 12:42 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Chunyu Hu, Catalin Marinas, Tetsuo Handa, malat, Linux-MM, Akinobu Mita

On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote:
[...]
> FWIW this problem is traditionally solved in dynamic analysis tools by
> embedding meta info right in headers of heap blocks. All of KASAN,
> KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then
> an object is either allocated or not. If caller has something to
> prevent allocations from failing in any context, then the same will be
> true for KMEMLEAK meta data.
> 

This makes much more sense, of course. I thought there were some
fundamental reasons why kmemleak needs to have an off-object tracking
which makes the whole thing much more complicated of course.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-06-04 12:42                               ` Michal Hocko
@ 2018-06-04 15:08                                 ` Catalin Marinas
  2018-06-04 15:36                                   ` Dmitry Vyukov
  0 siblings, 1 reply; 23+ messages in thread
From: Catalin Marinas @ 2018-06-04 15:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Chunyu Hu, Tetsuo Handa, malat, Linux-MM, Akinobu Mita

On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote:
> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote:
> [...]
> > FWIW this problem is traditionally solved in dynamic analysis tools by
> > embedding meta info right in headers of heap blocks. All of KASAN,
> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then
> > an object is either allocated or not. If caller has something to
> > prevent allocations from failing in any context, then the same will be
> > true for KMEMLEAK meta data.
> 
> This makes much more sense, of course. I thought there were some
> fundamental reasons why kmemleak needs to have an off-object tracking
> which makes the whole thing much more complicated of course.

Kmemleak needs to track all memory blocks that may contain pointers
(otherwise the dependency graph cannot be correctly tracked leading to
lots of false positives). Not all these objects come from the slab
allocator, for example it tracks certain alloc_pages() blocks, all of
memblock_alloc().

An option would be to use separate metadata only for non-slab objects,
though I'd have to see how intrusive this is for mm/sl*b.c. Also there
is RCU freeing for the kmemleak metadata to avoid locking when
traversing the internal lists. If the metadata is in the slab object
itself, we'd have to either defer its freeing or add some bigger lock to
kmemleak.

-- 
Catalin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-06-04 15:08                                 ` Catalin Marinas
@ 2018-06-04 15:36                                   ` Dmitry Vyukov
  2018-06-04 16:41                                     ` Catalin Marinas
  0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Vyukov @ 2018-06-04 15:36 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Michal Hocko, Chunyu Hu, Tetsuo Handa, malat, Linux-MM, Akinobu Mita

On Mon, Jun 4, 2018 at 5:08 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote:
>> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote:
>> [...]
>> > FWIW this problem is traditionally solved in dynamic analysis tools by
>> > embedding meta info right in headers of heap blocks. All of KASAN,
>> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then
>> > an object is either allocated or not. If caller has something to
>> > prevent allocations from failing in any context, then the same will be
>> > true for KMEMLEAK meta data.
>>
>> This makes much more sense, of course. I thought there were some
>> fundamental reasons why kmemleak needs to have an off-object tracking
>> which makes the whole thing much more complicated of course.
>
> Kmemleak needs to track all memory blocks that may contain pointers
> (otherwise the dependency graph cannot be correctly tracked leading to
> lots of false positives). Not all these objects come from the slab
> allocator, for example it tracks certain alloc_pages() blocks, all of
> memblock_alloc().

I understand that this will make KMEMLEAK tracking non-uniform, but
heap objects are the most important class of allocations.
page struct already contains stackdepot id if CONFIG_PAGE_OWNER is
enabled. Do we need anything else other than stack trace for pages?
I don't know about memblock's.

> An option would be to use separate metadata only for non-slab objects,
> though I'd have to see how intrusive this is for mm/sl*b.c. Also there
> is RCU freeing for the kmemleak metadata to avoid locking when
> traversing the internal lists. If the metadata is in the slab object
> itself, we'd have to either defer its freeing or add some bigger lock to
> kmemleak.

This relates to scanning without slopped world, right? In our
experience with large-scale systematic testing any tool with false
positives can't be used in practice in systematic way. KMEMLEAK false
positives do not allow to enable it on syzbot. We know there are tons
of leaks, we have the tool, but we are not detecting leaks. I don't
know who/how uses KMEMLEAK in non-stop-the-world mode, but
stop-the-world is pretty much a requirement for deployment for us. And
it would also solve the problem with disappearing under our feet heap
blocks, right?
FWIW In LeakSanitizer we don't specifically keep track of heap blocks.
Instead we stop the world and then ask memory allocator for metainfo.
I would expect that sl*b also have all required info, maybe in not
O(1) accessible form, so it may require some preprocessing (e.g.
collecting all free objects in a slab and then subtracting it from set
of all objects in the slab to get set of allocated objects).
But I understand that all of this turns this from "add a flag" to
almost a complete rewrite of the tool...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
  2018-06-04 15:36                                   ` Dmitry Vyukov
@ 2018-06-04 16:41                                     ` Catalin Marinas
  0 siblings, 0 replies; 23+ messages in thread
From: Catalin Marinas @ 2018-06-04 16:41 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Michal Hocko, Chunyu Hu, Tetsuo Handa, malat, Linux-MM, Akinobu Mita

On Mon, Jun 04, 2018 at 05:36:31PM +0200, Dmitry Vyukov wrote:
> On Mon, Jun 4, 2018 at 5:08 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote:
> >> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote:
> >> [...]
> >> > FWIW this problem is traditionally solved in dynamic analysis tools by
> >> > embedding meta info right in headers of heap blocks. All of KASAN,
> >> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then
> >> > an object is either allocated or not. If caller has something to
> >> > prevent allocations from failing in any context, then the same will be
> >> > true for KMEMLEAK meta data.
> >>
> >> This makes much more sense, of course. I thought there were some
> >> fundamental reasons why kmemleak needs to have an off-object tracking
> >> which makes the whole thing much more complicated of course.
> >
> > Kmemleak needs to track all memory blocks that may contain pointers
> > (otherwise the dependency graph cannot be correctly tracked leading to
> > lots of false positives). Not all these objects come from the slab
> > allocator, for example it tracks certain alloc_pages() blocks, all of
> > memblock_alloc().
> 
> I understand that this will make KMEMLEAK tracking non-uniform, but
> heap objects are the most important class of allocations.
> page struct already contains stackdepot id if CONFIG_PAGE_OWNER is
> enabled. Do we need anything else other than stack trace for pages?
> I don't know about memblock's.

Well, it needs most of the other stuff that's in struct kmemleak_object
(list_head, rb_node, some counters, spinlock_t).

> > An option would be to use separate metadata only for non-slab objects,
> > though I'd have to see how intrusive this is for mm/sl*b.c. Also there
> > is RCU freeing for the kmemleak metadata to avoid locking when
> > traversing the internal lists. If the metadata is in the slab object
> > itself, we'd have to either defer its freeing or add some bigger lock to
> > kmemleak.
> 
> This relates to scanning without slopped world, right?

Initially the RCU mechanism was added to defer kmemleak freeing its
metadata with another recursive call into the slab freeing routine
(since it does this when the tracked object is freed). This came in
handy for other lists traversal in kmemleak. For the actual memory
scanning, there is some fine-grained locking per metadata object as we
want to block the freeing until the scanning of the specific object
completes (e.g. vfree() must not unmap the object during scanning).

> In our
> experience with large-scale systematic testing any tool with false
> positives can't be used in practice in systematic way. KMEMLEAK false
> positives do not allow to enable it on syzbot. We know there are tons
> of leaks, we have the tool, but we are not detecting leaks. I don't
> know who/how uses KMEMLEAK in non-stop-the-world mode, but
> stop-the-world is pretty much a requirement for deployment for us. And
> it would also solve the problem with disappearing under our feet heap
> blocks, right?

A hard requirement during the early kmemleak development was not to
actually stop the world (as it can even take minutes to complete the
scanning). It employs various heuristics to deal with false positives
like checksumming, delaying the actual reporting, waiting for an object
to be detected as a leak in two successive scans while its checksum is
the same. While not ideal, it works most of the time.

Now, there was indeed a recent requirement to implement stop-the-world
scanning via a "stopscan" command to /sys/kernel/debug/kmemleak (using
stop_machine()) but I never got around to implementing it. This would be
very useful for non-interactive sessions like automated testing.

> FWIW In LeakSanitizer we don't specifically keep track of heap blocks.
> Instead we stop the world and then ask memory allocator for metainfo.
> I would expect that sl*b also have all required info, maybe in not
> O(1) accessible form, so it may require some preprocessing (e.g.
> collecting all free objects in a slab and then subtracting it from set
> of all objects in the slab to get set of allocated objects).
> But I understand that all of this turns this from "add a flag" to
> almost a complete rewrite of the tool...

As I said above, background scanning is still a requirement but we could
add a stopscan command on top, should be too hard.

-- 
Catalin

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-06-04 16:41 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-26  7:14 WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec Mathieu Malaterre
2018-05-28  8:34 ` Michal Hocko
2018-05-28 13:05   ` [PATCH] kmemleak: don't use __GFP_NOFAIL Tetsuo Handa
2018-05-28 13:24     ` Michal Hocko
2018-05-28 21:05       ` Tetsuo Handa
2018-05-29 13:27         ` Chunyu Hu
2018-05-29 13:46           ` Tetsuo Handa
2018-05-30  9:35             ` Chunyu Hu
2018-05-30 10:46               ` Michal Hocko
2018-05-30 11:42                 ` Chunyu Hu
2018-05-30 12:38                   ` Michal Hocko
2018-05-31 10:51                     ` Chunyu Hu
2018-05-31 11:35                       ` Michal Hocko
2018-05-31 12:28                         ` Chunyu Hu
2018-05-31 15:22                     ` Catalin Marinas
2018-05-31 18:41                       ` Michal Hocko
2018-06-01  1:50                         ` Chunyu Hu
2018-06-01  4:53                           ` Chunyu Hu
2018-06-04  8:41                             ` Dmitry Vyukov
2018-06-04 12:42                               ` Michal Hocko
2018-06-04 15:08                                 ` Catalin Marinas
2018-06-04 15:36                                   ` Dmitry Vyukov
2018-06-04 16:41                                     ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.