* WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec @ 2018-05-26 7:14 Mathieu Malaterre 2018-05-28 8:34 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Mathieu Malaterre @ 2018-05-26 7:14 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm Hi Michal, For the last couple of days, I am seeing the following appearing in dmesg (*). I am a happy kmemleak user on an oldish Mac Mini G4 (ppc32), it has been working great. What does this new warning checks: /* * All existing users of the __GFP_NOFAIL are blockable, so warn * of any new users that actually require GFP_NOWAIT */ if (WARN_ON_ONCE(!can_direct_reclaim)) goto fail; Thanks, (*) [ 269.038911] WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec [ 269.038931] Modules linked in: ctr ccm uinput arc4 b43 bcma mac80211 sha256_generic snd_aoa_codec_toonie cfg80211 snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm evdev snd_timer snd sg ssb soundcore usb_storage autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto usbhid ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore firewire_ohci sd_mod sr_mod cdrom firewire_core sungem sungem_phy crc_itu_t nls_base usb_common [ 269.039026] CPU: 0 PID: 21 Comm: kswapd0 Tainted: G W 4.17.0-rc6+ #18 [ 269.039032] NIP: c020e8f8 LR: c020e2e0 CTR: c020b514 [ 269.039038] REGS: dde3b6a0 TRAP: 0700 Tainted: G W (4.17.0-rc6+) [ 269.039042] MSR: 00021032 <ME,IR,DR,RI> CR: 22224484 XER: 00000000 [ 269.039056] GPR00: c020e2e0 dde3b750 df6ab480 00000000 00000001 00000000 00000004 00000040 GPR08: 00000800 22224484 00000040 01ffffff 42224482 00000000 00000040 01011a00 GPR16: c0a864bc 00000040 c0c5a730 01011a00 c0c2f5cc 00000000 00000000 c0c318c4 GPR24: c0c6ba20 00000001 c0a84158 01011a00 c0ce065c c0a84158 00000000 00000000 [ 269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec [ 269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec [ 269.039128] Call Trace: [ 269.039136] [dde3b750] [c020e2e0] __alloc_pages_nodemask+0x470/0xfec (unreliable) [ 269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970 [ 269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468 [ 269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc [ 269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0 [ 269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc [ 269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164 [ 269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298 [ 269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8 [ 269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644 [ 269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c [ 269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768 [ 269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34 [ 269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34 [ 269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c [ 269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c [ 269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0 [ 269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64 [ 269.039304] Instruction dump: [ 269.039311] 7f44d378 7fa3eb78 4802bd95 4bfff9f4 485d7309 4bfff998 7f03c378 7fc5f378 [ 269.039326] 7f44d378 4802bd79 7c781b78 4bfffd48 <0fe00000> 8081002c 3ca0c08b 7fe6fb78 [ 269.039343] ---[ end trace c255e24f03e28d77 ]--- [ 269.039351] kmemleak: Cannot allocate a kmemleak_object structure [ 269.039373] kmemleak: Kernel memory leak detector disabled [ 269.039412] kmemleak: Automatic memory scanning thread ended ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec 2018-05-26 7:14 WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec Mathieu Malaterre @ 2018-05-28 8:34 ` Michal Hocko 2018-05-28 13:05 ` [PATCH] kmemleak: don't use __GFP_NOFAIL Tetsuo Handa 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2018-05-28 8:34 UTC (permalink / raw) To: Mathieu Malaterre; +Cc: linux-mm On Sat 26-05-18 09:14:35, Mathieu Malaterre wrote: > Hi Michal, > > For the last couple of days, I am seeing the following appearing in > dmesg (*). I am a happy kmemleak user on an oldish Mac Mini G4 > (ppc32), it has been working great. What does this new warning checks: > > /* > * All existing users of the __GFP_NOFAIL are blockable, so warn > * of any new users that actually require GFP_NOWAIT > */ > if (WARN_ON_ONCE(!can_direct_reclaim)) > goto fail; Interesting. Where does this path get GFP_NOFAIL from? I am looking at the current upstream code and get_swap_bio(GFP_NOIO) bio_alloc(gfp_mask) bio_alloc_bioset(gfp_mask) mempool_alloc(gfp_msk & ~__GFP_DIRECT_RECLAIM) mempool_alloc does play some tricks with the gfp_mask but it doesn't add GFP_NOFAIL AFAICS. > Thanks, > > (*) > [ 269.038911] WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 > __alloc_pages_nodemask+0xa88/0xfec [...] > [ 269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec > [ 269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec > [ 269.039128] Call Trace: > [ 269.039136] [dde3b750] [c020e2e0] __alloc_pages_nodemask+0x470/0xfec (unreliable) > [ 269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970 > [ 269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468 > [ 269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc > [ 269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0 > [ 269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc > [ 269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164 > [ 269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298 > [ 269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8 > [ 269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644 > [ 269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c > [ 269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768 > [ 269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34 > [ 269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34 > [ 269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c > [ 269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c > [ 269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0 > [ 269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64 > [ 269.039304] Instruction dump: > [ 269.039311] 7f44d378 7fa3eb78 4802bd95 4bfff9f4 485d7309 4bfff998 7f03c378 7fc5f378 > [ 269.039326] 7f44d378 4802bd79 7c781b78 4bfffd48 <0fe00000> 8081002c 3ca0c08b 7fe6fb78 > [ 269.039343] ---[ end trace c255e24f03e28d77 ]--- > [ 269.039351] kmemleak: Cannot allocate a kmemleak_object structure > [ 269.039373] kmemleak: Kernel memory leak detector disabled > [ 269.039412] kmemleak: Automatic memory scanning thread ended -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-28 8:34 ` Michal Hocko @ 2018-05-28 13:05 ` Tetsuo Handa 2018-05-28 13:24 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Tetsuo Handa @ 2018-05-28 13:05 UTC (permalink / raw) To: Michal Hocko, Mathieu Malaterre, Dmitry Vyukov; +Cc: linux-mm, Catalin Marinas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-28 13:05 ` [PATCH] kmemleak: don't use __GFP_NOFAIL Tetsuo Handa @ 2018-05-28 13:24 ` Michal Hocko 2018-05-28 21:05 ` Tetsuo Handa 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2018-05-28 13:24 UTC (permalink / raw) To: Tetsuo Handa Cc: Mathieu Malaterre, Dmitry Vyukov, linux-mm, Catalin Marinas, Chunyu Hu I've found the previous report [1] finally. Adding Chunyu Hu to the CC list. The report which triggered this one is [2] [1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com [2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com I am not really familiar with the kmemleak code but the expectation that you can make a forward progress in an unknown allocation context seems broken to me. Why kmemleak cannot pre-allocate a pool of object_cache and refill it from a reasonably strong contexts (e.g. in a sleepable context)? On Mon 28-05-18 22:05:21, Tetsuo Handa wrote: > >From f0b7f6c2146f693fec6706bf9e3c34687c73f21a Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Mon, 28 May 2018 21:49:51 +0900 > Subject: [PATCH] kmemleak: don't use __GFP_NOFAIL > > Commit d9570ee3bd1d4f20 ("kmemleak: allow to coexist with fault injection") > added __GFP_NOFAIL to gfp_kmemleak_mask() macro because memory allocation > fault injection trivially disables kmemleak. > > But since !__GFP_DIRECT_RECLAIM && __GFP_NOFAIL memory allocation is not > supported, Mathieu Malaterre is observing warning messages upon > mempool_alloc(gfp_msk & ~__GFP_DIRECT_RECLAIM) allocation request. > > [ 269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec > [ 269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec > [ 269.039128] Call Trace: > [ 269.039136] [dde3b750] [c020e2e0] __alloc_pages_nodemask+0x470/0xfec (unreliable) > [ 269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970 > [ 269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468 > [ 269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc > [ 269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0 > [ 269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc > [ 269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164 > [ 269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298 > [ 269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8 > [ 269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644 > [ 269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c > [ 269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768 > [ 269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34 > [ 269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34 > [ 269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c > [ 269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c > [ 269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0 > [ 269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64 > > Since the intent of adding __GFP_NOFAIL is not to disable kmemleak by > failing the N'th allocation request, it should be possible to workaround > it by simply retrying N'th allocation request. Thus, this patch changes > callers of gfp_kmemleak_mask() macro to retry for several times. > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Reported-by: Mathieu Malaterre <malat@debian.org> > Cc: Dmitry Vyukov <dvyukov@google.com> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Michal Hocko <mhocko@suse.com> > --- > mm/kmemleak.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/mm/kmemleak.c b/mm/kmemleak.c > index 9a085d5..973998b 100644 > --- a/mm/kmemleak.c > +++ b/mm/kmemleak.c > @@ -126,7 +126,7 @@ > /* GFP bitmask for kmemleak internal allocations */ > #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ > __GFP_NORETRY | __GFP_NOMEMALLOC | \ > - __GFP_NOWARN | __GFP_NOFAIL) > + __GFP_NOWARN) > > /* scanning area inside a memory block */ > struct kmemleak_scan_area { > @@ -548,10 +548,12 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, > int min_count, gfp_t gfp) > { > unsigned long flags; > - struct kmemleak_object *object, *parent; > + struct kmemleak_object *object = NULL, *parent; > struct rb_node **link, *rb_parent; > + unsigned int i; > > - object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); > + for (i = 0; i < 10 && !object; i++) > + object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); > if (!object) { > pr_warn("Cannot allocate a kmemleak_object structure\n"); > kmemleak_disable(); > @@ -763,7 +765,8 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) > { > unsigned long flags; > struct kmemleak_object *object; > - struct kmemleak_scan_area *area; > + struct kmemleak_scan_area *area = NULL; > + unsigned int i; > > object = find_and_get_object(ptr, 1); > if (!object) { > @@ -772,7 +775,9 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) > return; > } > > - area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp)); > + for (i = 0; i < 10 && !area; i++) > + area = kmem_cache_alloc(scan_area_cache, > + gfp_kmemleak_mask(gfp)); > if (!area) { > pr_warn("Cannot allocate a scan area\n"); > goto out; > -- > 1.8.3.1 > > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-28 13:24 ` Michal Hocko @ 2018-05-28 21:05 ` Tetsuo Handa 2018-05-29 13:27 ` Chunyu Hu 0 siblings, 1 reply; 23+ messages in thread From: Tetsuo Handa @ 2018-05-28 21:05 UTC (permalink / raw) To: mhocko; +Cc: malat, dvyukov, linux-mm, catalin.marinas, chuhu Michal Hocko wrote: > I've found the previous report [1] finally. Adding Chunyu Hu to the CC > list. The report which triggered this one is [2] > > [1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com > [2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com > > I am not really familiar with the kmemleak code but the expectation that > you can make a forward progress in an unknown allocation context seems > broken to me. Why kmemleak cannot pre-allocate a pool of object_cache > and refill it from a reasonably strong contexts (e.g. in a sleepable > context)? Or, we can undo the original allocation if the kmemleak allocation failed? kmalloc(size, gfp) { ptr = do_kmalloc(size, gfp); if (ptr) { object = do_kmalloc(size, gfp_kmemleak_mask(gfp)); if (!object) { kfree(ptr); return NULL; } // Store information of ptr into object. } return ptr; } ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-28 21:05 ` Tetsuo Handa @ 2018-05-29 13:27 ` Chunyu Hu 2018-05-29 13:46 ` Tetsuo Handa 0 siblings, 1 reply; 23+ messages in thread From: Chunyu Hu @ 2018-05-29 13:27 UTC (permalink / raw) To: Tetsuo Handa; +Cc: mhocko, malat, dvyukov, linux-mm, catalin marinas ----- Original Message ----- > From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp> > To: mhocko@suse.com > Cc: malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, "catalin marinas" <catalin.marinas@arm.com>, > chuhu@redhat.com > Sent: Tuesday, May 29, 2018 5:05:45 AM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > Michal Hocko wrote: > > I've found the previous report [1] finally. Adding Chunyu Hu to the CC > > list. The report which triggered this one is [2] > > > > [1] > > http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com > > [2] > > http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com > > > > I am not really familiar with the kmemleak code but the expectation that > > you can make a forward progress in an unknown allocation context seems > > broken to me. Why kmemleak cannot pre-allocate a pool of object_cache > > and refill it from a reasonably strong contexts (e.g. in a sleepable > > context)? > > Or, we can undo the original allocation if the kmemleak allocation failed? If so, you are making kmemleak a fault injection trigger. But the original purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault injection. (discussion in [1]) I'm trying with per task way for fault injection, and did some tries. In my try, I removed this from NOFAIL kmemleak and kmemleak survived with the per task fault injection helper (disable/enable of task). Maybe I can send another RFC for the api. > > kmalloc(size, gfp) { > ptr = do_kmalloc(size, gfp); > if (ptr) { > object = do_kmalloc(size, gfp_kmemleak_mask(gfp)); > if (!object) { > kfree(ptr); > return NULL; > } > // Store information of ptr into object. > } > return ptr; > } > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-29 13:27 ` Chunyu Hu @ 2018-05-29 13:46 ` Tetsuo Handa 2018-05-30 9:35 ` Chunyu Hu 0 siblings, 1 reply; 23+ messages in thread From: Tetsuo Handa @ 2018-05-29 13:46 UTC (permalink / raw) To: Chunyu Hu; +Cc: mhocko, malat, dvyukov, linux-mm, catalin marinas On 2018/05/29 22:27, Chunyu Hu wrote: >>> I am not really familiar with the kmemleak code but the expectation that >>> you can make a forward progress in an unknown allocation context seems >>> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache >>> and refill it from a reasonably strong contexts (e.g. in a sleepable >>> context)? >> >> Or, we can undo the original allocation if the kmemleak allocation failed? > > If so, you are making kmemleak a fault injection trigger. But the original > purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault injection. > (discussion in [1]) I don't think that applying fault injection to kmemleak allocations is bad (except that fault injection messages might be noisy). > > I'm trying with per task way for fault injection, and did some tries. In my > try, I removed this from NOFAIL kmemleak and kmemleak survived with the per > task fault injection helper (disable/enable of task). Maybe I can send another > RFC for the api. You could carry __GFP_NO_FAULT_INJECTION using per "struct task_struct" flag. But I think that undoing the original allocation if the kmemleak allocation failed has an advantage that it does not disable kmemleak when the system is under memory pressure (i.e. about to invoke the OOM killer); allowing us to test memory pressure conditions. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-29 13:46 ` Tetsuo Handa @ 2018-05-30 9:35 ` Chunyu Hu 2018-05-30 10:46 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Chunyu Hu @ 2018-05-30 9:35 UTC (permalink / raw) To: Tetsuo Handa; +Cc: mhocko, malat, dvyukov, linux-mm, catalin marinas ----- Original Message ----- > From: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp> > To: "Chunyu Hu" <chuhu@redhat.com> > Cc: mhocko@suse.com, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, "catalin marinas" > <catalin.marinas@arm.com> > Sent: Tuesday, May 29, 2018 9:46:59 PM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > On 2018/05/29 22:27, Chunyu Hu wrote: > >>> I am not really familiar with the kmemleak code but the expectation that > >>> you can make a forward progress in an unknown allocation context seems > >>> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache > >>> and refill it from a reasonably strong contexts (e.g. in a sleepable > >>> context)? > >> > >> Or, we can undo the original allocation if the kmemleak allocation failed? > > > > If so, you are making kmemleak a fault injection trigger. But the original > > purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault > > injection. > > (discussion in [1]) > > I don't think that applying fault injection to kmemleak allocations is bad > (except that fault injection messages might be noisy). Maybe we provide a way for user decide to apply fault inject to kmemleak or not is better, by adding another sys file in /sys/kernel/debug/failslab and the fail_page_alloc. > > > > > I'm trying with per task way for fault injection, and did some tries. In my > > try, I removed this from NOFAIL kmemleak and kmemleak survived with the per > > task fault injection helper (disable/enable of task). Maybe I can send > > another > > RFC for the api. > > You could carry __GFP_NO_FAULT_INJECTION using per "struct task_struct" flag. Thanks for this suggestion. I'm trying to reuse the make_it_fail field in task for fault injection. As adding an extra memory alloc flag is not thought so good, I think adding task flag is either? > > But I think that undoing the original allocation if the kmemleak allocation > failed > has an advantage that it does not disable kmemleak when the system is under > memory > pressure (i.e. about to invoke the OOM killer); allowing us to test memory > pressure > conditions. There should be benefit, this is redefining kmemleak's principle, currently it won't affect other allocation directly, but if we free the other user's mem alloc without thinking about the context seems also risk. > > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-30 9:35 ` Chunyu Hu @ 2018-05-30 10:46 ` Michal Hocko 2018-05-30 11:42 ` Chunyu Hu 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2018-05-30 10:46 UTC (permalink / raw) To: Chunyu Hu; +Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas On Wed 30-05-18 05:35:37, Chunyu Hu wrote: [...] > I'm trying to reuse the make_it_fail field in task for fault injection. As adding > an extra memory alloc flag is not thought so good, I think adding task flag > is either? Yeah, task flag will be reduced to KMEMLEAK enabled configurations without an additional maint. overhead. Anyway, you should really think about how to guarantee trackability for atomic allocation requests. You cannot simply assume that GFP_NOWAIT will succeed. I guess you really want to have a pre-populated pool of objects for those requests. The obvious question is how to balance such a pool. It ain't easy to track memory by allocating more memory... -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-30 10:46 ` Michal Hocko @ 2018-05-30 11:42 ` Chunyu Hu 2018-05-30 12:38 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Chunyu Hu @ 2018-05-30 11:42 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita ----- Original Message ----- > From: "Michal Hocko" <mhocko@suse.com> > To: "Chunyu Hu" <chuhu@redhat.com> > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > "catalin marinas" <catalin.marinas@arm.com> > Sent: Wednesday, May 30, 2018 6:46:37 PM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote: > [...] > > I'm trying to reuse the make_it_fail field in task for fault injection. As > > adding > > an extra memory alloc flag is not thought so good, I think adding task > > flag > > is either? > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations > without an additional maint. overhead. Anyway, you should really think > about how to guarantee trackability for atomic allocation requests. You > cannot simply assume that GFP_NOWAIT will succeed. I guess you really Sure. While I'm using task->make_it_fail, I'm still in the direction of making kmemleak avoid fault inject with task flag instead of page alloc flag. > want to have a pre-populated pool of objects for those requests. The > obvious question is how to balance such a pool. It ain't easy to track > memory by allocating more memory... This solution is going to make kmemleak trace really nofail. We can think later. while I'm thinking about if fault inject can be disabled via flag in task. Actually, I'm doing something like below, the disable_fault_inject() is just setting a flag in task->make_it_fail. But this will depend on if fault injection accept a change like this. CCing Akinobu [1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com [2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com [3] commit d9570ee3bd1d ("kmemleak: allow to coexist with fault injection") +#define disable_fault_inject() \ +do { \ + unsigned long flag; \ + local_irq_save(flag); \ + if (in_irq()) \ + current->make_it_fail |= HARDIRQ_NOFAULT_OFFSET; \ + else \ + current->make_it_fail |= TASK_NOFAULT_OFFSET; \ + local_irq_restore(flag); \ +} while (0) --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -111,6 +111,7 @@ #include <linux/kasan.h> #include <linux/kmemleak.h> #include <linux/memory_hotplug.h> +#include <linux/fault-inject.h> /* * Kmemleak configuration and common defines. @@ -126,7 +127,7 @@ /* GFP bitmask for kmemleak internal allocations */ #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ __GFP_NORETRY | __GFP_NOMEMALLOC | \ - __GFP_NOWARN | __GFP_NOFAIL) + __GFP_NOWARN) /* scanning area inside a memory block */ struct kmemleak_scan_area { @@ -551,12 +552,15 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, struct kmemleak_object *object, *parent; struct rb_node **link, *rb_parent; + disable_fault_inject(); object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); if (!object) { pr_warn("Cannot allocate a kmemleak_object structure\n"); kmemleak_disable(); + enable_fault_inject(); return NULL; } + enable_fault_inject(); INIT_LIST_HEAD(&object->object_list); INIT_LIST_HEAD(&object->gray_list); > > -- > Michal Hocko > SUSE Labs > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-30 11:42 ` Chunyu Hu @ 2018-05-30 12:38 ` Michal Hocko 2018-05-31 10:51 ` Chunyu Hu 2018-05-31 15:22 ` Catalin Marinas 0 siblings, 2 replies; 23+ messages in thread From: Michal Hocko @ 2018-05-30 12:38 UTC (permalink / raw) To: Chunyu Hu Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > ----- Original Message ----- > > From: "Michal Hocko" <mhocko@suse.com> > > To: "Chunyu Hu" <chuhu@redhat.com> > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > "catalin marinas" <catalin.marinas@arm.com> > > Sent: Wednesday, May 30, 2018 6:46:37 PM > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote: > > [...] > > > I'm trying to reuse the make_it_fail field in task for fault injection. As > > > adding > > > an extra memory alloc flag is not thought so good, I think adding task > > > flag > > > is either? > > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations > > without an additional maint. overhead. Anyway, you should really think > > about how to guarantee trackability for atomic allocation requests. You > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really > > Sure. While I'm using task->make_it_fail, I'm still in the direction of > making kmemleak avoid fault inject with task flag instead of page alloc flag. > > > want to have a pre-populated pool of objects for those requests. The > > obvious question is how to balance such a pool. It ain't easy to track > > memory by allocating more memory... > > This solution is going to make kmemleak trace really nofail. We can think > later. > > while I'm thinking about if fault inject can be disabled via flag in task. > > Actually, I'm doing something like below, the disable_fault_inject() is > just setting a flag in task->make_it_fail. But this will depend on if > fault injection accept a change like this. CCing Akinobu You still seem to be missing my point I am afraid (or I am ;). So say that you want to track a GFP_NOWAIT allocation request. So create_object will get called with that gfp mask and no matter what you try here your tracking object will be allocated in a weak allocation context as well and disable kmemleak. So it only takes a more heavy memory pressure and the tracing is gone... -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-30 12:38 ` Michal Hocko @ 2018-05-31 10:51 ` Chunyu Hu 2018-05-31 11:35 ` Michal Hocko 2018-05-31 15:22 ` Catalin Marinas 1 sibling, 1 reply; 23+ messages in thread From: Chunyu Hu @ 2018-05-31 10:51 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita ----- Original Message ----- > From: "Michal Hocko" <mhocko@suse.com> > To: "Chunyu Hu" <chuhu@redhat.com> > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com> > Sent: Wednesday, May 30, 2018 8:38:26 PM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > ----- Original Message ----- > > > From: "Michal Hocko" <mhocko@suse.com> > > > To: "Chunyu Hu" <chuhu@redhat.com> > > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, > > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > > "catalin marinas" <catalin.marinas@arm.com> > > > Sent: Wednesday, May 30, 2018 6:46:37 PM > > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote: > > > [...] > > > > I'm trying to reuse the make_it_fail field in task for fault injection. > > > > As > > > > adding > > > > an extra memory alloc flag is not thought so good, I think adding task > > > > flag > > > > is either? > > > > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations > > > without an additional maint. overhead. Anyway, you should really think > > > about how to guarantee trackability for atomic allocation requests. You > > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really > > > > Sure. While I'm using task->make_it_fail, I'm still in the direction of > > making kmemleak avoid fault inject with task flag instead of page alloc > > flag. > > > > > want to have a pre-populated pool of objects for those requests. The > > > obvious question is how to balance such a pool. It ain't easy to track > > > memory by allocating more memory... > > > > This solution is going to make kmemleak trace really nofail. We can think > > later. > > > > while I'm thinking about if fault inject can be disabled via flag in task. > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > just setting a flag in task->make_it_fail. But this will depend on if > > fault injection accept a change like this. CCing Akinobu > > You still seem to be missing my point I am afraid (or I am ;). So say > that you want to track a GFP_NOWAIT allocation request. So create_object > will get called with that gfp mask and no matter what you try here your > tracking object will be allocated in a weak allocation context as well > and disable kmemleak. So it only takes a more heavy memory pressure and > the tracing is gone... Michal, Thank you for the good suggestion. You mean GFP_NOWAIT still can make create_object fail and as a result kmemleak disable itself. So it's not so useful, just like the current __GFP_NOFAIL usage in create_object. In the first thread, we discussed this. and that time you suggested we have fault injection disabled when kmemleak is working and suggested per task way. so my head has been stuck in that point. While now you gave a better suggestion that why not we pre allocate a urgent pool for kmemleak objects. After thinking for a while, I got your point, it's a good way for improving kmemleak to make it can tolerate light allocation failure. And catalin mentioned that we have one option that use the early_log array as urgent pool, which has the similar ideology. Basing on your suggestions, I tried to draft this, what does it look to you? another strong alloc mask and an extra thread for fill the pool, which containts 1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then get a object from the pool. diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 9a085d5..7163489 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -128,6 +128,10 @@ __GFP_NORETRY | __GFP_NOMEMALLOC | \ __GFP_NOWARN | __GFP_NOFAIL) +#define gfp_kmemleak_mask_strong() (__GFP_NOMEMALLOC | \ + __GFP_NOWARN | __GFP_RECLAIM | __GFP_NOFAIL) + + /* scanning area inside a memory block */ struct kmemleak_scan_area { struct hlist_node node; @@ -299,6 +303,83 @@ struct early_log { kmemleak_disable(); \ } while (0) +static DEFINE_SPINLOCK(kmemleak_object_lock); +static LIST_HEAD(pool_object_list); +static unsigned int volatile total; +static unsigned int pool_object_max = 1024 * 1024; +static struct task_struct *pool_thread; + +static struct kmemleak_object* kmemleak_pool_fill(void) +{ + struct kmemleak_object *object = NULL; + unsigned long flags; + + object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask_strong()); + spin_lock_irqsave(&kmemleak_object_lock, flags); + if (object) { + list_add(&object->object_list, &pool_object_list); + total++; + } + spin_unlock_irqrestore(&kmemleak_object_lock, flags); + return object; +} + +static struct kmemleak_object* kmemleak_get_pool_object(void) +{ + struct kmemleak_object *object = NULL; + unsigned long flags; + + spin_lock_irqsave(&kmemleak_object_lock, flags); + if (!list_empty(&pool_object_list)) { + object = list_first_entry(&pool_object_list,struct kmemleak_object, + object_list); + list_del(&object->object_list); + total--; + } + spin_unlock_irqrestore(&kmemleak_object_lock, flags); + return object; +} + +static int kmemleak_pool_thread(void *nothinng) +{ + struct kmemleak_object *object = NULL; + while (!kthread_should_stop()) { + if (READ_ONCE(total) < pool_object_max) { + object = kmemleak_pool_fill(); + WARN_ON(!object); + } + schedule_timeout_interruptible(msecs_to_jiffies(100)); + } + return 0; +} + +static void start_pool_thread(void) +{ + if (pool_thread) + return; + pool_thread = kthread_run(kmemleak_pool_thread, NULL, "kmemleak_pool"); + if (IS_ERR(pool_thread)) { + pr_warn("Failed to create the scan thread\n"); + pool_thread = NULL; + } +} +static void stop_pool_thread(void) +{ + struct kmemleak_object *object; + unsigned long flags; + if (pool_thread) { + kthread_stop(pool_thread); + pool_thread = NULL; + } + spin_lock_irqsave(&kmemleak_object_lock, flags); + list_for_each_entry(object, &pool_object_list, object_list) { + list_del(&object->object_list); + kmem_cache_free(object_cache, object); + } + spin_unlock_irqrestore(&kmemleak_object_lock, flags); +} + /* * Printing of the objects hex dump to the seq file. The number of lines to be * printed is limited to HEX_MAX_LINES to prevent seq file spamming. The @@ -553,6 +634,10 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); if (!object) { + object = kmemleak_get_pool_object(); + pr_info("total=%u", total); + } + if (!object) { pr_warn("Cannot allocate a kmemleak_object structure\n"); kmemleak_disable(); return NULL; @@ -1872,8 +1957,10 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf, kmemleak_stack_scan = 0; else if (strncmp(buf, "scan=on", 7) == 0) start_scan_thread(); - else if (strncmp(buf, "scan=off", 8) == 0) + else if (strncmp(buf, "scan=off", 8) == 0) { stop_scan_thread(); + stop_pool_thread(); + } else if (strncmp(buf, "scan=", 5) == 0) { unsigned long secs; @@ -1929,6 +2016,7 @@ static void __kmemleak_do_cleanup(void) static void kmemleak_do_cleanup(struct work_struct *work) { stop_scan_thread(); + stop_pool_thread(); mutex_lock(&scan_mutex); /* @@ -2114,6 +2202,7 @@ static int __init kmemleak_late_init(void) pr_warn("Failed to create the debugfs kmemleak file\n"); mutex_lock(&scan_mutex); start_scan_thread(); + start_pool_thread(); mutex_unlock(&scan_mutex); pr_info("Kernel memory leak detector initialized\n"); > -- > Michal Hocko > SUSE Labs > -- Regards, Chunyu Hu ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-31 10:51 ` Chunyu Hu @ 2018-05-31 11:35 ` Michal Hocko 2018-05-31 12:28 ` Chunyu Hu 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2018-05-31 11:35 UTC (permalink / raw) To: Chunyu Hu Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita On Thu 31-05-18 06:51:22, Chunyu Hu wrote: > > > ----- Original Message ----- > > From: "Michal Hocko" <mhocko@suse.com> > > To: "Chunyu Hu" <chuhu@redhat.com> > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com> > > Sent: Wednesday, May 30, 2018 8:38:26 PM > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > > > ----- Original Message ----- > > > > From: "Michal Hocko" <mhocko@suse.com> > > > > To: "Chunyu Hu" <chuhu@redhat.com> > > > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, > > > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > > > "catalin marinas" <catalin.marinas@arm.com> > > > > Sent: Wednesday, May 30, 2018 6:46:37 PM > > > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > > > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote: > > > > [...] > > > > > I'm trying to reuse the make_it_fail field in task for fault injection. > > > > > As > > > > > adding > > > > > an extra memory alloc flag is not thought so good, I think adding task > > > > > flag > > > > > is either? > > > > > > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations > > > > without an additional maint. overhead. Anyway, you should really think > > > > about how to guarantee trackability for atomic allocation requests. You > > > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really > > > > > > Sure. While I'm using task->make_it_fail, I'm still in the direction of > > > making kmemleak avoid fault inject with task flag instead of page alloc > > > flag. > > > > > > > want to have a pre-populated pool of objects for those requests. The > > > > obvious question is how to balance such a pool. It ain't easy to track > > > > memory by allocating more memory... > > > > > > This solution is going to make kmemleak trace really nofail. We can think > > > later. > > > > > > while I'm thinking about if fault inject can be disabled via flag in task. > > > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > > just setting a flag in task->make_it_fail. But this will depend on if > > > fault injection accept a change like this. CCing Akinobu > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > that you want to track a GFP_NOWAIT allocation request. So create_object > > will get called with that gfp mask and no matter what you try here your > > tracking object will be allocated in a weak allocation context as well > > and disable kmemleak. So it only takes a more heavy memory pressure and > > the tracing is gone... > > Michal, > > Thank you for the good suggestion. You mean GFP_NOWAIT still can make create_object > fail and as a result kmemleak disable itself. So it's not so useful, just like > the current __GFP_NOFAIL usage in create_object. > > In the first thread, we discussed this. and that time you suggested we have > fault injection disabled when kmemleak is working and suggested per task way. > so my head has been stuck in that point. While now you gave a better suggestion > that why not we pre allocate a urgent pool for kmemleak objects. After thinking > for a while, I got your point, it's a good way for improving kmemleak to make > it can tolerate light allocation failure. And catalin mentioned that we have > one option that use the early_log array as urgent pool, which has the similar > ideology. > > Basing on your suggestions, I tried to draft this, what does it look to you? > another strong alloc mask and an extra thread for fill the pool, which containts > 1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then > get a object from the pool. I am not really familiar with kmemleak code base to judge the implementation. Could you be more specific about the highlevel design please? Who is the producer and how does it sync with consumers? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-31 11:35 ` Michal Hocko @ 2018-05-31 12:28 ` Chunyu Hu 0 siblings, 0 replies; 23+ messages in thread From: Chunyu Hu @ 2018-05-31 12:28 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, malat, dvyukov, linux-mm, catalin marinas, Akinobu Mita ----- Original Message ----- > From: "Michal Hocko" <mhocko@suse.com> > To: "Chunyu Hu" <chuhu@redhat.com> > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com> > Sent: Thursday, May 31, 2018 7:35:08 PM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > On Thu 31-05-18 06:51:22, Chunyu Hu wrote: > > > > > > ----- Original Message ----- > > > From: "Michal Hocko" <mhocko@suse.com> > > > To: "Chunyu Hu" <chuhu@redhat.com> > > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, > > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > > "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" > > > <akinobu.mita@gmail.com> > > > Sent: Wednesday, May 30, 2018 8:38:26 PM > > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > > > > > ----- Original Message ----- > > > > > From: "Michal Hocko" <mhocko@suse.com> > > > > > To: "Chunyu Hu" <chuhu@redhat.com> > > > > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, > > > > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > > > > "catalin marinas" <catalin.marinas@arm.com> > > > > > Sent: Wednesday, May 30, 2018 6:46:37 PM > > > > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > > > > > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote: > > > > > [...] > > > > > > I'm trying to reuse the make_it_fail field in task for fault > > > > > > injection. > > > > > > As > > > > > > adding > > > > > > an extra memory alloc flag is not thought so good, I think adding > > > > > > task > > > > > > flag > > > > > > is either? > > > > > > > > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations > > > > > without an additional maint. overhead. Anyway, you should really > > > > > think > > > > > about how to guarantee trackability for atomic allocation requests. > > > > > You > > > > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really > > > > > > > > Sure. While I'm using task->make_it_fail, I'm still in the direction of > > > > making kmemleak avoid fault inject with task flag instead of page alloc > > > > flag. > > > > > > > > > want to have a pre-populated pool of objects for those requests. The > > > > > obvious question is how to balance such a pool. It ain't easy to > > > > > track > > > > > memory by allocating more memory... > > > > > > > > This solution is going to make kmemleak trace really nofail. We can > > > > think > > > > later. > > > > > > > > while I'm thinking about if fault inject can be disabled via flag in > > > > task. > > > > > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > > > just setting a flag in task->make_it_fail. But this will depend on if > > > > fault injection accept a change like this. CCing Akinobu > > > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > > that you want to track a GFP_NOWAIT allocation request. So create_object > > > will get called with that gfp mask and no matter what you try here your > > > tracking object will be allocated in a weak allocation context as well > > > and disable kmemleak. So it only takes a more heavy memory pressure and > > > the tracing is gone... > > > > Michal, > > > > Thank you for the good suggestion. You mean GFP_NOWAIT still can make > > create_object > > fail and as a result kmemleak disable itself. So it's not so useful, just > > like > > the current __GFP_NOFAIL usage in create_object. > > > > In the first thread, we discussed this. and that time you suggested we have > > fault injection disabled when kmemleak is working and suggested per task > > way. > > so my head has been stuck in that point. While now you gave a better > > suggestion > > that why not we pre allocate a urgent pool for kmemleak objects. After > > thinking > > for a while, I got your point, it's a good way for improving kmemleak to > > make > > it can tolerate light allocation failure. And catalin mentioned that we > > have > > one option that use the early_log array as urgent pool, which has the > > similar > > ideology. > > > > Basing on your suggestions, I tried to draft this, what does it look to > > you? > > another strong alloc mask and an extra thread for fill the pool, which > > containts > > 1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then > > get a object from the pool. > > I am not really familiar with kmemleak code base to judge the > implementation. Could you be more specific about the highlevel design > please? Who is the producer and how does it sync with consumers? OK. To better describe. We know that, kmemleak_object is meta object for kmemleak trace, and each time kmem_cache_alloc(or other) success, the another following kmem_cache_alloc would be called (in create_object() to get a kmemleak_object and this must succeed, otherwise kmemleak would generate too many false positives as a result of losing track to a memory block which could contain pointer to other objects. so kmemleak trace choose to disable itself when getting such a allocation failure. When facing fault injection, this would become an issue that kmemleak would easily disable itself when fault injected. And memory allocation can happen in irq context, so the followed kmemleak_alloc can't choose a very strong way for allocation (such as blackable). So we can prepare a dynamic kmemleak_object pool. And the design is in fact rather straight, by maintaining a list of kmemleak_object. So the reproducer is a new kernel thread. which do a kmemleak_object(contains list member itself, so easy to link) allocation every 100ms, in a strong allocation way (can sleep and reclaim), to the pool_object_list, and the max length of the list is 1024*1024 (1M). [pool_thread (reproducer)] pool_object_list<-->kmemleak_object<-->kmemleak_object...<-->... And the consumer is create_object(). it can pick one from the list when got failure in first weak allocation. [task doing memory alloc (consumer)] kmem_cache_alloc() create_object() kmem_cache_alloc() (fail ?)--Yes ---> (get kmemleak_object from the pool_object_lsit) |_ No ---> got kmemleak_object [insert kmemleak_object to rb tree] And consumer and producer are synced with spinlock kmemleak_object_lock(maybe call pool_object_lock) [spin lock] kmemleak_object_lock Hope I described it clear... -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-30 12:38 ` Michal Hocko 2018-05-31 10:51 ` Chunyu Hu @ 2018-05-31 15:22 ` Catalin Marinas 2018-05-31 18:41 ` Michal Hocko 1 sibling, 1 reply; 23+ messages in thread From: Catalin Marinas @ 2018-05-31 15:22 UTC (permalink / raw) To: Michal Hocko Cc: Chunyu Hu, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita Hi Michal, I'm catching up with this thread. On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > From: "Michal Hocko" <mhocko@suse.com> > > > want to have a pre-populated pool of objects for those requests. The > > > obvious question is how to balance such a pool. It ain't easy to track > > > memory by allocating more memory... > > > > This solution is going to make kmemleak trace really nofail. We can think > > later. > > > > while I'm thinking about if fault inject can be disabled via flag in task. > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > just setting a flag in task->make_it_fail. But this will depend on if > > fault injection accept a change like this. CCing Akinobu > > You still seem to be missing my point I am afraid (or I am ;). So say > that you want to track a GFP_NOWAIT allocation request. So create_object > will get called with that gfp mask and no matter what you try here your > tracking object will be allocated in a weak allocation context as well > and disable kmemleak. So it only takes a more heavy memory pressure and > the tracing is gone... create_object() indeed gets the originating gfp but it only cares whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out all the other flags when allocating a struct kmemleak_object (and adds some of its own). This has worked ok so far. There is a higher risk of GFP_ATOMIC allocations failing but I haven't seen issues with kmemleak unless you run it under heavy memory pressure (and kmemleak just disables itself). With fault injection, such pressure is simulated with the side effect of rendering kmemleak unusable. Kmemleak could implement its own allocation mechanism (maybe even skipping the slab altogether) but I feel it's overkill just to cope with the specific case of fault injection. Also, it's not easy to figure out how to balance such pool and it may end up calling alloc_pages() which in turn can inject a fault. Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault injection is enabled? Otherwise, I'd prefer some per-thread temporary fault injection disabling. -- Catalin ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-31 15:22 ` Catalin Marinas @ 2018-05-31 18:41 ` Michal Hocko 2018-06-01 1:50 ` Chunyu Hu 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2018-05-31 18:41 UTC (permalink / raw) To: Catalin Marinas Cc: Chunyu Hu, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita On Thu 31-05-18 16:22:26, Catalin Marinas wrote: > Hi Michal, > > I'm catching up with this thread. > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > From: "Michal Hocko" <mhocko@suse.com> > > > > want to have a pre-populated pool of objects for those requests. The > > > > obvious question is how to balance such a pool. It ain't easy to track > > > > memory by allocating more memory... > > > > > > This solution is going to make kmemleak trace really nofail. We can think > > > later. > > > > > > while I'm thinking about if fault inject can be disabled via flag in task. > > > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > > just setting a flag in task->make_it_fail. But this will depend on if > > > fault injection accept a change like this. CCing Akinobu > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > that you want to track a GFP_NOWAIT allocation request. So create_object > > will get called with that gfp mask and no matter what you try here your > > tracking object will be allocated in a weak allocation context as well > > and disable kmemleak. So it only takes a more heavy memory pressure and > > the tracing is gone... > > create_object() indeed gets the originating gfp but it only cares > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out > all the other flags when allocating a struct kmemleak_object (and adds > some of its own). > > This has worked ok so far. There is a higher risk of GFP_ATOMIC > allocations failing but I haven't seen issues with kmemleak unless you > run it under heavy memory pressure (and kmemleak just disables itself). > With fault injection, such pressure is simulated with the side effect of > rendering kmemleak unusable. > > Kmemleak could implement its own allocation mechanism (maybe even > skipping the slab altogether) but I feel it's overkill just to cope with > the specific case of fault injection. Also, it's not easy to figure out > how to balance such pool and it may end up calling alloc_pages() which > in turn can inject a fault. > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault > injection is enabled? > > Otherwise, I'd prefer some per-thread temporary fault injection > disabling. Well, there are two issues (which boil down to the one in the end) here. Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL context is something to care about. A weaker allocation context can and will lead to kmemleak meta data allocation failures regardless of the fault injection. The way how those objects are allocated directly in the allacator context makes this really hard and inherently subtle. I can see only two ways around. Either you have a placeholder for "this object is not tracked so do not throw false positives" or have a preallocated pool to use if the direct context allocation failes for whatever reason. Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind of problems. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-05-31 18:41 ` Michal Hocko @ 2018-06-01 1:50 ` Chunyu Hu 2018-06-01 4:53 ` Chunyu Hu 0 siblings, 1 reply; 23+ messages in thread From: Chunyu Hu @ 2018-06-01 1:50 UTC (permalink / raw) To: Catalin Marinas Cc: Michal Hocko, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita ----- Original Message ----- > From: "Michal Hocko" <mhocko@suse.com> > To: "Catalin Marinas" <catalin.marinas@arm.com> > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com> > Sent: Friday, June 1, 2018 2:41:04 AM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > On Thu 31-05-18 16:22:26, Catalin Marinas wrote: > > Hi Michal, > > > > I'm catching up with this thread. > > > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > From: "Michal Hocko" <mhocko@suse.com> > > > > > want to have a pre-populated pool of objects for those requests. The > > > > > obvious question is how to balance such a pool. It ain't easy to > > > > > track > > > > > memory by allocating more memory... > > > > > > > > This solution is going to make kmemleak trace really nofail. We can > > > > think > > > > later. > > > > > > > > while I'm thinking about if fault inject can be disabled via flag in > > > > task. > > > > > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > > > just setting a flag in task->make_it_fail. But this will depend on if > > > > fault injection accept a change like this. CCing Akinobu > > > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > > that you want to track a GFP_NOWAIT allocation request. So create_object > > > will get called with that gfp mask and no matter what you try here your > > > tracking object will be allocated in a weak allocation context as well > > > and disable kmemleak. So it only takes a more heavy memory pressure and > > > the tracing is gone... > > > > create_object() indeed gets the originating gfp but it only cares > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out > > all the other flags when allocating a struct kmemleak_object (and adds > > some of its own). > > > > This has worked ok so far. There is a higher risk of GFP_ATOMIC > > allocations failing but I haven't seen issues with kmemleak unless you > > run it under heavy memory pressure (and kmemleak just disables itself). > > With fault injection, such pressure is simulated with the side effect of > > rendering kmemleak unusable. > > > > Kmemleak could implement its own allocation mechanism (maybe even > > skipping the slab altogether) but I feel it's overkill just to cope with > > the specific case of fault injection. Also, it's not easy to figure out > > how to balance such pool and it may end up calling alloc_pages() which > > in turn can inject a fault. it would benefit kmemleak trace, I see in my test that kmemleak even can work in user pressure cases, such as in my test, stress-ng to consume nearly all the swap space. kmemleak is still working. but 1M objects pool is consuming around 400M + memory. So this is just a experiment try, as you said, how to balance it's size is the issue or ther issues has to be resolved, such as when to add pool, the speed, how big, and so on ... And I fault injected 20000 times fail_page_alloc, and 2148 times happened in create_object, and in such a case, kmemleak is till working after the 2000+ calloc failures. [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc -l 2148 [60498.299412] FAULT_INJECTION: forcing a failure. name fail_page_alloc, interval 0, probability 80, space 0, times 2 So this way is not just for fault injection, it's about making kmemleak a bit stronger under memory failure case. It would be an exciting experience we see if kmemleak still work even after mem pressure, as a user, I experienced the good usage. > > > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault > > injection is enabled? Maybe I can have a try on this.. > > > > Otherwise, I'd prefer some per-thread temporary fault injection > > disabling. I tried in make_it_fail flag, kmemleak can avoid fault injection, but I can see kmemleak diabled itself... > > Well, there are two issues (which boil down to the one in the end) here. > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL > context is something to care about. A weaker allocation context can and > will lead to kmemleak meta data allocation failures regardless of the > fault injection. The way how those objects are allocated directly in the > allacator context makes this really hard and inherently subtle. I can > see only two ways around. Either you have a placeholder for "this object > is not tracked so do not throw false positives" or have a preallocated > pool to use if the direct context allocation failes for whatever reason. > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind > of problems. > -- > Michal Hocko > SUSE Labs > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-06-01 1:50 ` Chunyu Hu @ 2018-06-01 4:53 ` Chunyu Hu 2018-06-04 8:41 ` Dmitry Vyukov 0 siblings, 1 reply; 23+ messages in thread From: Chunyu Hu @ 2018-06-01 4:53 UTC (permalink / raw) To: Catalin Marinas Cc: Michal Hocko, Tetsuo Handa, malat, dvyukov, linux-mm, Akinobu Mita ----- Original Message ----- > From: "Chunyu Hu" <chuhu@redhat.com> > To: "Catalin Marinas" <catalin.marinas@arm.com> > Cc: "Michal Hocko" <mhocko@suse.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com> > Sent: Friday, June 1, 2018 9:50:20 AM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > ----- Original Message ----- > > From: "Michal Hocko" <mhocko@suse.com> > > To: "Catalin Marinas" <catalin.marinas@arm.com> > > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" > > <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, > > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" > > <akinobu.mita@gmail.com> > > Sent: Friday, June 1, 2018 2:41:04 AM > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > On Thu 31-05-18 16:22:26, Catalin Marinas wrote: > > > Hi Michal, > > > > > > I'm catching up with this thread. > > > > > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > > From: "Michal Hocko" <mhocko@suse.com> > > > > > > want to have a pre-populated pool of objects for those requests. > > > > > > The > > > > > > obvious question is how to balance such a pool. It ain't easy to > > > > > > track > > > > > > memory by allocating more memory... > > > > > > > > > > This solution is going to make kmemleak trace really nofail. We can > > > > > think > > > > > later. > > > > > > > > > > while I'm thinking about if fault inject can be disabled via flag in > > > > > task. > > > > > > > > > > Actually, I'm doing something like below, the disable_fault_inject() > > > > > is > > > > > just setting a flag in task->make_it_fail. But this will depend on if > > > > > fault injection accept a change like this. CCing Akinobu > > > > > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > > > that you want to track a GFP_NOWAIT allocation request. So > > > > create_object > > > > will get called with that gfp mask and no matter what you try here your > > > > tracking object will be allocated in a weak allocation context as well > > > > and disable kmemleak. So it only takes a more heavy memory pressure and > > > > the tracing is gone... > > > > > > create_object() indeed gets the originating gfp but it only cares > > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out > > > all the other flags when allocating a struct kmemleak_object (and adds > > > some of its own). > > > > > > This has worked ok so far. There is a higher risk of GFP_ATOMIC > > > allocations failing but I haven't seen issues with kmemleak unless you > > > run it under heavy memory pressure (and kmemleak just disables itself). > > > With fault injection, such pressure is simulated with the side effect of > > > rendering kmemleak unusable. > > > > > > Kmemleak could implement its own allocation mechanism (maybe even > > > skipping the slab altogether) but I feel it's overkill just to cope with > > > the specific case of fault injection. Also, it's not easy to figure out > > > how to balance such pool and it may end up calling alloc_pages() which > > > in turn can inject a fault. > > > it would benefit kmemleak trace, I see in my test that kmemleak even can work > in > user pressure cases, such as in my test, stress-ng to consume > nearly all the swap space. kmemleak is still working. but 1M objects pool > is consuming around 400M + memory. So this is just a experiment try, as you > said, how to balance it's size is the issue or ther issues has to be > resolved, > such as when to add pool, the speed, how big, and so on ... > > And I fault injected 20000 times fail_page_alloc, and 2148 times happened > in create_object, and in such a case, kmemleak is till working after the > 2000+ calloc failures. > > [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc > -l > 2148 > > [60498.299412] FAULT_INJECTION: forcing a failure. > name fail_page_alloc, interval 0, probability 80, space 0, times 2 > > So this way is not just for fault injection, it's about making kmemleak > a bit stronger under memory failure case. It would be an exciting experience > we > see if kmemleak still work even after mem pressure, as a user, I experienced > the good usage. > > > > > > > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a > > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault > > > injection is enabled? > > Maybe I can have a try on this.. looks like I tried this in my first report thread, and it failed. As it can sleep in irq() .. https://marc.info/?l=linux-mm&m=152482400222806&w=2 > > > > > > > Otherwise, I'd prefer some per-thread temporary fault injection > > > disabling. > > I tried in make_it_fail flag, kmemleak can avoid fault injection, but I > can see kmemleak diabled itself... > > > > > Well, there are two issues (which boil down to the one in the end) here. > > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL > > context is something to care about. A weaker allocation context can and > > will lead to kmemleak meta data allocation failures regardless of the > > fault injection. The way how those objects are allocated directly in the > > allacator context makes this really hard and inherently subtle. I can > > see only two ways around. Either you have a placeholder for "this object > > is not tracked so do not throw false positives" or have a preallocated > > pool to use if the direct context allocation failes for whatever reason. > > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind > > of problems. > > -- > > Michal Hocko > > SUSE Labs > > > > -- > Regards, > Chunyu Hu > > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-06-01 4:53 ` Chunyu Hu @ 2018-06-04 8:41 ` Dmitry Vyukov 2018-06-04 12:42 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Dmitry Vyukov @ 2018-06-04 8:41 UTC (permalink / raw) To: Chunyu Hu Cc: Catalin Marinas, Michal Hocko, Tetsuo Handa, malat, Linux-MM, Akinobu Mita On Fri, Jun 1, 2018 at 6:53 AM, Chunyu Hu <chuhu@redhat.com> wrote: > ----- Original Message ----- >> From: "Chunyu Hu" <chuhu@redhat.com> >> To: "Catalin Marinas" <catalin.marinas@arm.com> >> Cc: "Michal Hocko" <mhocko@suse.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, >> dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com> >> Sent: Friday, June 1, 2018 9:50:20 AM >> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL >> >> >> >> ----- Original Message ----- >> > From: "Michal Hocko" <mhocko@suse.com> >> > To: "Catalin Marinas" <catalin.marinas@arm.com> >> > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" >> > <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, >> > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" >> > <akinobu.mita@gmail.com> >> > Sent: Friday, June 1, 2018 2:41:04 AM >> > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL >> > >> > On Thu 31-05-18 16:22:26, Catalin Marinas wrote: >> > > Hi Michal, >> > > >> > > I'm catching up with this thread. >> > > >> > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: >> > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: >> > > > > From: "Michal Hocko" <mhocko@suse.com> >> > > > > > want to have a pre-populated pool of objects for those requests. >> > > > > > The >> > > > > > obvious question is how to balance such a pool. It ain't easy to >> > > > > > track >> > > > > > memory by allocating more memory... >> > > > > >> > > > > This solution is going to make kmemleak trace really nofail. We can >> > > > > think >> > > > > later. >> > > > > >> > > > > while I'm thinking about if fault inject can be disabled via flag in >> > > > > task. >> > > > > >> > > > > Actually, I'm doing something like below, the disable_fault_inject() >> > > > > is >> > > > > just setting a flag in task->make_it_fail. But this will depend on if >> > > > > fault injection accept a change like this. CCing Akinobu >> > > > >> > > > You still seem to be missing my point I am afraid (or I am ;). So say >> > > > that you want to track a GFP_NOWAIT allocation request. So >> > > > create_object >> > > > will get called with that gfp mask and no matter what you try here your >> > > > tracking object will be allocated in a weak allocation context as well >> > > > and disable kmemleak. So it only takes a more heavy memory pressure and >> > > > the tracing is gone... >> > > >> > > create_object() indeed gets the originating gfp but it only cares >> > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out >> > > all the other flags when allocating a struct kmemleak_object (and adds >> > > some of its own). >> > > >> > > This has worked ok so far. There is a higher risk of GFP_ATOMIC >> > > allocations failing but I haven't seen issues with kmemleak unless you >> > > run it under heavy memory pressure (and kmemleak just disables itself). >> > > With fault injection, such pressure is simulated with the side effect of >> > > rendering kmemleak unusable. >> > > >> > > Kmemleak could implement its own allocation mechanism (maybe even >> > > skipping the slab altogether) but I feel it's overkill just to cope with >> > > the specific case of fault injection. Also, it's not easy to figure out >> > > how to balance such pool and it may end up calling alloc_pages() which >> > > in turn can inject a fault. >> >> >> it would benefit kmemleak trace, I see in my test that kmemleak even can work >> in >> user pressure cases, such as in my test, stress-ng to consume >> nearly all the swap space. kmemleak is still working. but 1M objects pool >> is consuming around 400M + memory. So this is just a experiment try, as you >> said, how to balance it's size is the issue or ther issues has to be >> resolved, >> such as when to add pool, the speed, how big, and so on ... >> >> And I fault injected 20000 times fail_page_alloc, and 2148 times happened >> in create_object, and in such a case, kmemleak is till working after the >> 2000+ calloc failures. >> >> [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc >> -l >> 2148 >> >> [60498.299412] FAULT_INJECTION: forcing a failure. >> name fail_page_alloc, interval 0, probability 80, space 0, times 2 >> >> So this way is not just for fault injection, it's about making kmemleak >> a bit stronger under memory failure case. It would be an exciting experience >> we >> see if kmemleak still work even after mem pressure, as a user, I experienced >> the good usage. >> >> >> > > >> > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a >> > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault >> > > injection is enabled? >> >> Maybe I can have a try on this.. > > looks like I tried this in my first report thread, and it failed. As it > can sleep in irq() .. > > https://marc.info/?l=linux-mm&m=152482400222806&w=2 > >> >> > > >> > > Otherwise, I'd prefer some per-thread temporary fault injection >> > > disabling. >> >> I tried in make_it_fail flag, kmemleak can avoid fault injection, but I >> can see kmemleak diabled itself... >> >> > >> > Well, there are two issues (which boil down to the one in the end) here. >> > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL >> > context is something to care about. A weaker allocation context can and >> > will lead to kmemleak meta data allocation failures regardless of the >> > fault injection. The way how those objects are allocated directly in the >> > allacator context makes this really hard and inherently subtle. I can >> > see only two ways around. Either you have a placeholder for "this object >> > is not tracked so do not throw false positives" or have a preallocated >> > pool to use if the direct context allocation failes for whatever reason. >> > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind >> > of problems. FWIW this problem is traditionally solved in dynamic analysis tools by embedding meta info right in headers of heap blocks. All of KASAN, KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then an object is either allocated or not. If caller has something to prevent allocations from failing in any context, then the same will be true for KMEMLEAK meta data. On a related note, we could also consider switching to lib/stackdepot.c for alloc stack memorization to reduce header size. stackdepot is not 100% proof against allocation failures in atomic contexts, but it does eager memory preallocation and seems to work well in practice. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-06-04 8:41 ` Dmitry Vyukov @ 2018-06-04 12:42 ` Michal Hocko 2018-06-04 15:08 ` Catalin Marinas 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2018-06-04 12:42 UTC (permalink / raw) To: Dmitry Vyukov Cc: Chunyu Hu, Catalin Marinas, Tetsuo Handa, malat, Linux-MM, Akinobu Mita On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: [...] > FWIW this problem is traditionally solved in dynamic analysis tools by > embedding meta info right in headers of heap blocks. All of KASAN, > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then > an object is either allocated or not. If caller has something to > prevent allocations from failing in any context, then the same will be > true for KMEMLEAK meta data. > This makes much more sense, of course. I thought there were some fundamental reasons why kmemleak needs to have an off-object tracking which makes the whole thing much more complicated of course. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-06-04 12:42 ` Michal Hocko @ 2018-06-04 15:08 ` Catalin Marinas 2018-06-04 15:36 ` Dmitry Vyukov 0 siblings, 1 reply; 23+ messages in thread From: Catalin Marinas @ 2018-06-04 15:08 UTC (permalink / raw) To: Michal Hocko Cc: Dmitry Vyukov, Chunyu Hu, Tetsuo Handa, malat, Linux-MM, Akinobu Mita On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote: > On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: > [...] > > FWIW this problem is traditionally solved in dynamic analysis tools by > > embedding meta info right in headers of heap blocks. All of KASAN, > > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then > > an object is either allocated or not. If caller has something to > > prevent allocations from failing in any context, then the same will be > > true for KMEMLEAK meta data. > > This makes much more sense, of course. I thought there were some > fundamental reasons why kmemleak needs to have an off-object tracking > which makes the whole thing much more complicated of course. Kmemleak needs to track all memory blocks that may contain pointers (otherwise the dependency graph cannot be correctly tracked leading to lots of false positives). Not all these objects come from the slab allocator, for example it tracks certain alloc_pages() blocks, all of memblock_alloc(). An option would be to use separate metadata only for non-slab objects, though I'd have to see how intrusive this is for mm/sl*b.c. Also there is RCU freeing for the kmemleak metadata to avoid locking when traversing the internal lists. If the metadata is in the slab object itself, we'd have to either defer its freeing or add some bigger lock to kmemleak. -- Catalin ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-06-04 15:08 ` Catalin Marinas @ 2018-06-04 15:36 ` Dmitry Vyukov 2018-06-04 16:41 ` Catalin Marinas 0 siblings, 1 reply; 23+ messages in thread From: Dmitry Vyukov @ 2018-06-04 15:36 UTC (permalink / raw) To: Catalin Marinas Cc: Michal Hocko, Chunyu Hu, Tetsuo Handa, malat, Linux-MM, Akinobu Mita On Mon, Jun 4, 2018 at 5:08 PM, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote: >> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: >> [...] >> > FWIW this problem is traditionally solved in dynamic analysis tools by >> > embedding meta info right in headers of heap blocks. All of KASAN, >> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then >> > an object is either allocated or not. If caller has something to >> > prevent allocations from failing in any context, then the same will be >> > true for KMEMLEAK meta data. >> >> This makes much more sense, of course. I thought there were some >> fundamental reasons why kmemleak needs to have an off-object tracking >> which makes the whole thing much more complicated of course. > > Kmemleak needs to track all memory blocks that may contain pointers > (otherwise the dependency graph cannot be correctly tracked leading to > lots of false positives). Not all these objects come from the slab > allocator, for example it tracks certain alloc_pages() blocks, all of > memblock_alloc(). I understand that this will make KMEMLEAK tracking non-uniform, but heap objects are the most important class of allocations. page struct already contains stackdepot id if CONFIG_PAGE_OWNER is enabled. Do we need anything else other than stack trace for pages? I don't know about memblock's. > An option would be to use separate metadata only for non-slab objects, > though I'd have to see how intrusive this is for mm/sl*b.c. Also there > is RCU freeing for the kmemleak metadata to avoid locking when > traversing the internal lists. If the metadata is in the slab object > itself, we'd have to either defer its freeing or add some bigger lock to > kmemleak. This relates to scanning without slopped world, right? In our experience with large-scale systematic testing any tool with false positives can't be used in practice in systematic way. KMEMLEAK false positives do not allow to enable it on syzbot. We know there are tons of leaks, we have the tool, but we are not detecting leaks. I don't know who/how uses KMEMLEAK in non-stop-the-world mode, but stop-the-world is pretty much a requirement for deployment for us. And it would also solve the problem with disappearing under our feet heap blocks, right? FWIW In LeakSanitizer we don't specifically keep track of heap blocks. Instead we stop the world and then ask memory allocator for metainfo. I would expect that sl*b also have all required info, maybe in not O(1) accessible form, so it may require some preprocessing (e.g. collecting all free objects in a slab and then subtracting it from set of all objects in the slab to get set of allocated objects). But I understand that all of this turns this from "add a flag" to almost a complete rewrite of the tool... ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] kmemleak: don't use __GFP_NOFAIL 2018-06-04 15:36 ` Dmitry Vyukov @ 2018-06-04 16:41 ` Catalin Marinas 0 siblings, 0 replies; 23+ messages in thread From: Catalin Marinas @ 2018-06-04 16:41 UTC (permalink / raw) To: Dmitry Vyukov Cc: Michal Hocko, Chunyu Hu, Tetsuo Handa, malat, Linux-MM, Akinobu Mita On Mon, Jun 04, 2018 at 05:36:31PM +0200, Dmitry Vyukov wrote: > On Mon, Jun 4, 2018 at 5:08 PM, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote: > >> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: > >> [...] > >> > FWIW this problem is traditionally solved in dynamic analysis tools by > >> > embedding meta info right in headers of heap blocks. All of KASAN, > >> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then > >> > an object is either allocated or not. If caller has something to > >> > prevent allocations from failing in any context, then the same will be > >> > true for KMEMLEAK meta data. > >> > >> This makes much more sense, of course. I thought there were some > >> fundamental reasons why kmemleak needs to have an off-object tracking > >> which makes the whole thing much more complicated of course. > > > > Kmemleak needs to track all memory blocks that may contain pointers > > (otherwise the dependency graph cannot be correctly tracked leading to > > lots of false positives). Not all these objects come from the slab > > allocator, for example it tracks certain alloc_pages() blocks, all of > > memblock_alloc(). > > I understand that this will make KMEMLEAK tracking non-uniform, but > heap objects are the most important class of allocations. > page struct already contains stackdepot id if CONFIG_PAGE_OWNER is > enabled. Do we need anything else other than stack trace for pages? > I don't know about memblock's. Well, it needs most of the other stuff that's in struct kmemleak_object (list_head, rb_node, some counters, spinlock_t). > > An option would be to use separate metadata only for non-slab objects, > > though I'd have to see how intrusive this is for mm/sl*b.c. Also there > > is RCU freeing for the kmemleak metadata to avoid locking when > > traversing the internal lists. If the metadata is in the slab object > > itself, we'd have to either defer its freeing or add some bigger lock to > > kmemleak. > > This relates to scanning without slopped world, right? Initially the RCU mechanism was added to defer kmemleak freeing its metadata with another recursive call into the slab freeing routine (since it does this when the tracked object is freed). This came in handy for other lists traversal in kmemleak. For the actual memory scanning, there is some fine-grained locking per metadata object as we want to block the freeing until the scanning of the specific object completes (e.g. vfree() must not unmap the object during scanning). > In our > experience with large-scale systematic testing any tool with false > positives can't be used in practice in systematic way. KMEMLEAK false > positives do not allow to enable it on syzbot. We know there are tons > of leaks, we have the tool, but we are not detecting leaks. I don't > know who/how uses KMEMLEAK in non-stop-the-world mode, but > stop-the-world is pretty much a requirement for deployment for us. And > it would also solve the problem with disappearing under our feet heap > blocks, right? A hard requirement during the early kmemleak development was not to actually stop the world (as it can even take minutes to complete the scanning). It employs various heuristics to deal with false positives like checksumming, delaying the actual reporting, waiting for an object to be detected as a leak in two successive scans while its checksum is the same. While not ideal, it works most of the time. Now, there was indeed a recent requirement to implement stop-the-world scanning via a "stopscan" command to /sys/kernel/debug/kmemleak (using stop_machine()) but I never got around to implementing it. This would be very useful for non-interactive sessions like automated testing. > FWIW In LeakSanitizer we don't specifically keep track of heap blocks. > Instead we stop the world and then ask memory allocator for metainfo. > I would expect that sl*b also have all required info, maybe in not > O(1) accessible form, so it may require some preprocessing (e.g. > collecting all free objects in a slab and then subtracting it from set > of all objects in the slab to get set of allocated objects). > But I understand that all of this turns this from "add a flag" to > almost a complete rewrite of the tool... As I said above, background scanning is still a requirement but we could add a stopscan command on top, should be too hard. -- Catalin ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2018-06-04 16:41 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-26 7:14 WARNING: CPU: 0 PID: 21 at ../mm/page_alloc.c:4258 __alloc_pages_nodemask+0xa88/0xfec Mathieu Malaterre 2018-05-28 8:34 ` Michal Hocko 2018-05-28 13:05 ` [PATCH] kmemleak: don't use __GFP_NOFAIL Tetsuo Handa 2018-05-28 13:24 ` Michal Hocko 2018-05-28 21:05 ` Tetsuo Handa 2018-05-29 13:27 ` Chunyu Hu 2018-05-29 13:46 ` Tetsuo Handa 2018-05-30 9:35 ` Chunyu Hu 2018-05-30 10:46 ` Michal Hocko 2018-05-30 11:42 ` Chunyu Hu 2018-05-30 12:38 ` Michal Hocko 2018-05-31 10:51 ` Chunyu Hu 2018-05-31 11:35 ` Michal Hocko 2018-05-31 12:28 ` Chunyu Hu 2018-05-31 15:22 ` Catalin Marinas 2018-05-31 18:41 ` Michal Hocko 2018-06-01 1:50 ` Chunyu Hu 2018-06-01 4:53 ` Chunyu Hu 2018-06-04 8:41 ` Dmitry Vyukov 2018-06-04 12:42 ` Michal Hocko 2018-06-04 15:08 ` Catalin Marinas 2018-06-04 15:36 ` Dmitry Vyukov 2018-06-04 16:41 ` Catalin Marinas
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.