From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756698AbdIHRp0 (ORCPT <rfc822;w@1wt.eu>);
        Fri, 8 Sep 2017 13:45:26 -0400
Received: from foss.arm.com ([217.140.101.70]:43552 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753338AbdIHRpY (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 8 Sep 2017 13:45:24 -0400
Date: Fri, 8 Sep 2017 18:45:20 +0100
From: Catalin Marinas <catalin.marinas@arm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
        Andrey Ryabinin <aryabinin@virtuozzo.com>, kasan-dev@googlegroups.com
Subject: Re: kmemleak not always catching stuff
Message-ID: <20170908174518.3i74qvdbruf3lynl@armageddon.cambridge.arm.com>
References: <20170901183311.3bf3348a@gandalf.local.home>
 <20170904100904.v5soe2afqebogefv@armageddon.cambridge.arm.com>
 <20170905101545.1c6459a5@gandalf.local.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170905101545.1c6459a5@gandalf.local.home>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 05, 2017 at 10:15:45AM -0400, Steven Rostedt wrote:
> On Mon, 4 Sep 2017 11:09:05 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Sep 01, 2017 at 06:33:11PM -0400, Steven Rostedt wrote:
> > > Now I was thinking that it may be due to the fact that the trampoline
> > > is allocated with module_alloc(), and that has some magic kasan goo in
> > > it. But when forcing the issue with adding the following code:
> > > 
> > > 	void **pblah;
> > > 	void *blah;
> > > 
> > > 	pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);	
> > > 	blah = module_alloc(PAGE_SIZE);
> > > 	*pblah = blah;
> > > 	printk("allocated blah %p\n", blah);
> > > 	kfree(pblah);
> > > 
> > > in a path that I could control, it would catch it only after doing it
> > > several times. I was never able to have kmemleak catch the actual bug
> > > from user space no matter how many times I triggered it.  
> > 
> > module_alloc() uses vmalloc_exec(), so it is tracked by kmemleak but you
> > probably hit a false negative with the blah pointer lingering somewhere
> > on some stack.
> 
> Hmm, could this also be what causes the miss of catching the lingering
> ftrace trampoline?

Not sure (not without some additional support in kmemleak to help track
down the source of false negatives; I'm on a long flight to LA next
week, maybe I manage to hack something up ;)).

BTW, I had a quick look at the trace_selftest_ops() function (without
pretending I understand the ftrace code) and there is one case where
this function can exit without freeing dyn_ops. Is this intentional?

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index cb917cebae29..b17ec642793b 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -273,7 +273,7 @@ static int trace_selftest_ops(struct trace_array *tr, int cnt)
 		goto out_free;
 	if (cnt > 1) {
 		if (trace_selftest_test_global_cnt == 0)
-			goto out;
+			goto out_free;
 	}
 	if (trace_selftest_test_dyn_cnt == 0)
 		goto out_free;

> > >  [do the test]
> > > 
> > >  # echo scan > /sys/kernel/debug/kmemleak  
> > 
> > Some heuristics in kmemleak cause the first leak of an object not to be
> > reported (too many false positives). You'd need to do "echo scan" at
> > least twice after an allocation.
> 
> OK, is this documented somewhere?

Apparently not. I'll add some notes to the kmemleak documentation.

> > > Most of the times it found nothing. Even when I switched the above from
> > > module_alloc() to kmalloc().
> > > 
> > > Is this normal?  
> > 
> > In general, a leak would eventually appear after a few scans or in time
> > when some memory location is overridden.
> > 
> > Yet another heuristics in kmemleak is to treat pointers at some offset
> > inside an object as valid references (because of the container_of
> > tricks). However, the downside is that the bigger the object, the
> > greater chances of finding some random data that looks like a pointer.
> > We could change this logic to require precise pointers above a certain
> > size (e.g. PAGE_SIZE) where the use of container_of() is less likely.
> > 
> > Kmemleak doesn't have a way to inspect false negatives but if you are
> > interested in digging further, I could add a "find=0x..." command to
> > print all references to an object during scanning. I also need to find
> > some time to implement a "stopscan" command which uses stop_machine()
> > and skips the heuristics for reducing false positives.
> 
> Without the patch in the link above, there's a memory leak with the
> ftrace trampoline with the following commands:
> 
> Requires: CONFIG_FUNCTION_TRACER and CONFIG_SCHED_TRACER
> 
>  # cd /sys/kernel/debug/tracing
>  # mkdir instances/foo
>  # echo wakeup > instances/foo/current_tracer
>  # echo 0 > /proc/sys/kernel/ftrace_enabled
>  # echo nop > instances/foo/current_tracer
>  # rmdir instances/foo
> 
> What the above does, is creates a new (allocated/non-static) buffer in
> the instance directory. Then we enable the wakeup tracer which
> enables function tracing and also creates a dynamic ftrace trampoline
> for it. We disable function tracing for all tracers with the proc
> sysctl ftrace_enabled set to zero. The nop removes the wakeup tracer
> and unregisters its function tracing handler. This is where the leak
> happens. The unregister path sees that function tracing is disabled and
> exits out early, without releasing the trampoline.

Are the ftrace_ops allocated dynamically in this case (and freed when
unregistered)? Otherwise, you may have an ops->trampoline still around
that kmemleak finds.

-- 
Catalin