Date: Mon, 4 Sep 2017 11:09:05 +0100
From: Catalin Marinas
To: Steven Rostedt
Cc: LKML, Andrey Ryabinin, kasan-dev@googlegroups.com
Subject: Re: kmemleak not always catching stuff
Message-ID: <20170904100904.v5soe2afqebogefv@armageddon.cambridge.arm.com>
References: <20170901183311.3bf3348a@gandalf.local.home>
In-Reply-To: <20170901183311.3bf3348a@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steve,

On Fri, Sep 01, 2017 at 06:33:11PM -0400, Steven Rostedt wrote:
> Recently kmemleak discovered a bug in my code where an allocated
> trampoline for a ftrace function tracer wasn't freed due to an exit
> path. The thing is, kmemleak was able to catch this 100% when it was
> triggered by one of my ftrace selftests that happen at bootup. But when
> I trigger the issue from user space after bootup finished, it would not
> catch it.

Is this the create_filter() fix that went in recently?

> Now I was thinking that it may be due to the fact that the trampoline
> is allocated with module_alloc(), and that has some magic kasan goo in
> it. But when forcing the issue with adding the following code:
>
> 	void **pblah;
> 	void *blah;
>
> 	pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);
> 	blah = module_alloc(PAGE_SIZE);
> 	*pblah = blah;
> 	printk("allocated blah %p\n", blah);
> 	kfree(pblah);
>
> in a path that I could control, it would catch it only after doing it
> several times. I was never able to have kmemleak catch the actual bug
> from user space no matter how many times I triggered it.

module_alloc() uses vmalloc_exec(), so it is tracked by kmemleak, but you
probably hit a false negative with the blah pointer lingering somewhere
on some stack.

> # dmesg |grep kmemleak
> [ 16.746832] kmemleak: Kernel memory leak detector initialized
> [ 16.746888] kmemleak: Automatic memory scanning thread started
>
> And then I would do:
>
> # echo scan=on > /sys/kernel/debug/kmemleak

scan=on is not necessary since this just enables the scanning thread
(already started as per dmesg).

> [do the test]
>
> # echo scan > /sys/kernel/debug/kmemleak

Some heuristics in kmemleak cause the first leak of an object not to be
reported (too many false positives). You'd need to do "echo scan" at
least twice after an allocation.

I tried the same test code you have above, triggered with an
echo ... > /sys from user space. After the second scan it shows the
leak, both with and without KASan.

> Most of the times it found nothing. Even when I switched the above from
> module_alloc() to kmalloc().
>
> Is this normal?

In general, a leak would eventually appear after a few scans or in time
when some memory location is overwritten.

Another heuristic in kmemleak is to treat pointers at some offset
inside an object as valid references (because of the container_of
tricks). However, the downside is that the bigger the object, the
greater the chance of finding some random data that looks like a
pointer. We could change this logic to require precise pointers above a
certain size (e.g. PAGE_SIZE) where the use of container_of() is less
likely.

Kmemleak doesn't have a way to inspect false negatives but if you are
interested in digging further, I could add a "find=0x..." command to
print all references to an object during scanning. I also need to find
some time to implement a "stopscan" command which uses stop_machine()
and skips the heuristics for reducing false positives.

--
Catalin
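
The scan-at-least-twice procedure described in the reply could be
scripted roughly as follows. This is only a sketch: the function name
kmemleak_rescan and its two arguments are made up for illustration, and
it assumes debugfs is mounted at /sys/kernel/debug and that the caller
has permission to write to the kmemleak file.

```shell
# Hypothetical helper (not part of kmemleak): request several manual
# scans, then print whatever kmemleak currently reports.
#   $1: path to the kmemleak control file (the default is an assumption)
#   $2: number of scans to request (default 2, since the reply suggests
#       at least two scans after an allocation)
kmemleak_rescan() {
    kmemleak_file="${1:-/sys/kernel/debug/kmemleak}"
    scans="${2:-2}"
    i=0
    while [ "$i" -lt "$scans" ]; do
        # Each write of "scan" triggers one manual memory scan.
        echo scan > "$kmemleak_file" || return 1
        i=$((i + 1))
    done
    # Reading the file lists the objects currently reported as leaked.
    cat "$kmemleak_file"
}
```

After running the test case, calling kmemleak_rescan with no arguments
would request two scans and print the report; passing a path and a
count, for example kmemleak_rescan /sys/kernel/debug/kmemleak 3, would
request three.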