From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753501AbdIDKJK (ORCPT <rfc822;w@1wt.eu>);
        Mon, 4 Sep 2017 06:09:10 -0400
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:56406 "EHLO
        foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753371AbdIDKJJ (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 4 Sep 2017 06:09:09 -0400
Date: Mon, 4 Sep 2017 11:09:05 +0100
From: Catalin Marinas <catalin.marinas@arm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
        Andrey Ryabinin <aryabinin@virtuozzo.com>, kasan-dev@googlegroups.com
Subject: Re: kmemleak not always catching stuff
Message-ID: <20170904100904.v5soe2afqebogefv@armageddon.cambridge.arm.com>
References: <20170901183311.3bf3348a@gandalf.local.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170901183311.3bf3348a@gandalf.local.home>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steve,

On Fri, Sep 01, 2017 at 06:33:11PM -0400, Steven Rostedt wrote:
> Recently kmemleak discovered a bug in my code where an allocated
> trampoline for a ftrace function tracer wasn't freed due to an exit
> path. The thing is, kmemleak was able to catch this 100% when it was
> triggered by one of my ftrace selftests that happen at bootup. But when
> I trigger the issue from user space after bootup finished, it would not
> catch it.

Is this the create_filter() fix that went in recently?

> Now I was thinking that it may be due to the fact that the trampoline
> is allocated with module_alloc(), and that has some magic kasan goo in
> it. But when forcing the issue with adding the following code:
> 
> 	void **pblah;
> 	void *blah;
> 
> 	pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);	
> 	blah = module_alloc(PAGE_SIZE);
> 	*pblah = blah;
> 	printk("allocated blah %p\n", blah);
> 	kfree(pblah);
> 
> in a path that I could control, it would catch it only after doing it
> several times. I was never able to have kmemleak catch the actual bug
> from user space no matter how many times I triggered it.

module_alloc() uses vmalloc_exec(), so the allocation is tracked by
kmemleak. You probably hit a false negative, with a copy of the blah
pointer still lingering somewhere on a stack.
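
To illustrate why a stale stack copy hides the leak, here is a minimal
userspace model of conservative scanning. It is not kmemleak's code; the
names struct tracked_obj and area_references() are hypothetical. Any word
in a scanned area whose value falls inside the object counts as a
reference, whether or not anything still uses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a tracked allocation: start address and size. */
struct tracked_obj {
	uintptr_t start;
	size_t size;
};

/*
 * Conservatively scan an area (e.g. a stack) word by word. A word whose
 * value points into the object keeps it "referenced", even if it is just
 * a dead copy left behind by code that has already returned.
 */
static bool area_references(const uintptr_t *area, size_t nwords,
			    const struct tracked_obj *obj)
{
	assert(obj->size > 0);
	for (size_t i = 0; i < nwords; i++) {
		uintptr_t v = area[i];

		if (v >= obj->start && v < obj->start + obj->size)
			return true;
	}
	return false;
}
```

Once the slot holding the stale copy is overwritten, the scan no longer
finds a reference and the object becomes reportable, which is why such a
leak tends to show up only later.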

>  # dmesg |grep kmemleak 
> [   16.746832] kmemleak: Kernel memory leak detector initialized
> [   16.746888] kmemleak: Automatic memory scanning thread started
> 
> And then I would do:
> 
>  # echo scan=on > /sys/kernel/debug/kmemleak

scan=on is not necessary; it only enables the automatic scanning thread,
which is already running as per dmesg.

>  [do the test]
> 
>  # echo scan > /sys/kernel/debug/kmemleak

Some heuristics in kmemleak, there to reduce false positives, cause a
leaked object not to be reported the first time a scan finds it
unreferenced. You'd need to do "echo scan" at least twice after an
allocation.
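
One heuristic with this effect can be sketched as follows: an
unreferenced object is only reported if its contents are unchanged since
the previous scan, so the first scan merely records a checksum. This is a
simplified userspace model under that assumption, not the kernel
implementation; struct scan_state and report_unreferenced() are
hypothetical, and FNV-1a stands in for whatever checksum the real tool
uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object state kept between scans in this model. */
struct scan_state {
	uint32_t checksum;
	bool have_checksum;
};

/* 32-bit FNV-1a hash over the object's bytes. */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Called once per scan for an object that nothing references. Returns
 * true (report it) only if the contents match the previous scan; a
 * first-seen or recently modified object gets the benefit of the doubt.
 */
static bool report_unreferenced(struct scan_state *st,
				const void *obj, size_t size)
{
	uint32_t sum = fnv1a(obj, size);
	bool unchanged = st->have_checksum && st->checksum == sum;

	st->checksum = sum;
	st->have_checksum = true;
	return unchanged;
}
```

In this model a leak surfaces on the second scan at the earliest, which
matches the "echo scan" twice advice.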

I tried the same test code you have above, triggered with an echo ... >
/sys from user space. After the second scan kmemleak shows the leak,
both with and without KASan.

> Most of the times it found nothing. Even when I switched the above from
> module_alloc() to kmalloc().
> 
> Is this normal?

In general, a leak should eventually appear after a few scans, or over
time once the memory location holding a stale copy of the pointer is
overwritten.

Yet another heuristic in kmemleak is to treat pointers at some offset
inside an object as valid references (because of container_of()
tricks). The downside is that the bigger the object, the greater the
chance of finding some random data that looks like a pointer into it.
We could change this logic to require precise pointers above a certain
size (e.g. PAGE_SIZE), where the use of container_of() is less likely.
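
The proposed change could look roughly like this in a userspace model.
The helper counts_as_reference() and the PRECISE_PTR_MIN_SIZE threshold
are hypothetical; 4096 is assumed here only to mirror the PAGE_SIZE
example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold for this sketch, mirroring a 4 KiB PAGE_SIZE. */
#define PRECISE_PTR_MIN_SIZE 4096u

/*
 * Decide whether a scanned word counts as a reference to an object.
 * Small objects accept any pointer into their range, to cope with
 * container_of()-style interior pointers. Objects of at least
 * PRECISE_PTR_MIN_SIZE bytes only accept a pointer to their first byte,
 * so random data is less likely to keep them alive.
 */
static bool counts_as_reference(uintptr_t value, uintptr_t start, size_t size)
{
	if (size >= PRECISE_PTR_MIN_SIZE)
		return value == start;
	return value >= start && value < start + size;
}
```

The trade-off is that a genuine interior pointer to a large object would
no longer keep it alive, so the threshold has to sit where container_of()
use on such objects is rare.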

Kmemleak doesn't have a way to inspect false negatives, but if you are
interested in digging further, I could add a "find=0x..." command to
print all references to an object during scanning. I also need to find
some time to implement a "stopscan" command, which would use
stop_machine() and skip the heuristics for reducing false positives.

-- 
Catalin