Re: kmemleak not always catching stuff

From: Steven Rostedt <rostedt@goodmis.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	kasan-dev@googlegroups.com
Subject: Re: kmemleak not always catching stuff
Date: Tue, 5 Sep 2017 10:15:45 -0400	[thread overview]
Message-ID: <20170905101545.1c6459a5@gandalf.local.home> (raw)
In-Reply-To: <20170904100904.v5soe2afqebogefv@armageddon.cambridge.arm.com>

[-- Attachment #1: Type: text/plain, Size: 5135 bytes --]

On Mon, 4 Sep 2017 11:09:05 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> Hi Steve,
> 
> On Fri, Sep 01, 2017 at 06:33:11PM -0400, Steven Rostedt wrote:
> > Recently kmemleak discovered a bug in my code where an allocated
> > trampoline for a ftrace function tracer wasn't freed due to an exit
> > path. The thing is, kmemleak was able to catch this 100% when it was
> > triggered by one of my ftrace selftests that happen at bootup. But when
> > I trigger the issue from user space after bootup finished, it would not
> > catch it.  
> 
> Is this the create_filter() fix that went in recently?

No. I haven't pushed it. I just finished testing it and will include it
in my pull request to Linus. I just pushed it to my linux-next branch
(here's the link: http://lkml.kernel.org/r/20170905133217.883468521@goodmis.org)

> 
> > Now I was thinking that it may be due to the fact that the trampoline
> > is allocated with module_alloc(), and that has some magic kasan goo in
> > it. But when forcing the issue with adding the following code:
> > 
> > 	void **pblah;
> > 	void *blah;
> > 
> > 	pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);	
> > 	blah = module_alloc(PAGE_SIZE);
> > 	*pblah = blah;
> > 	printk("allocated blah %p\n", blah);
> > 	kfree(pblah);
> > 
> > in a path that I could control, it would catch it only after doing it
> > several times. I was never able to have kmemleak catch the actual bug
> > from user space no matter how many times I triggered it.  
> 
> module_alloc() uses vmalloc_exec(), so it is tracked by kmemleak but you
> probably hit a false negative with the blah pointer lingering somewhere
> on some stack.

Hmm, could this also be what causes the miss of catching the lingering
ftrace trampoline?

> 
> >  # dmesg |grep kmemleak 
> > [   16.746832] kmemleak: Kernel memory leak detector initialized
> > [   16.746888] kmemleak: Automatic memory scanning thread started
> > 
> > And then I would do:
> > 
> >  # echo scan=on > /sys/kernel/debug/kmemleak  
> 
> scan=on is not necessary since this just enables the scanning thread
> (already started as per dmesg).

Yeah, but I found doing that appeared to catch things quicker when I
repeated the test.

> 
> >  [do the test]
> > 
> >  # echo scan > /sys/kernel/debug/kmemleak  
> 
> Some heuristics in kmemleak cause the first leak of an object not to be
> reported (too many false positives). You'd need to do "echo scan" at
> least twice after an allocation.

OK, is this documented somewhere?

> 
> I tried the same test code you have above triggered with an echo ... >
> /sys from user space. After the second scan it shows the leak, both with
> and without KASan.
> 
> > Most of the times it found nothing. Even when I switched the above from
> > module_alloc() to kmalloc().
> > 
> > Is this normal?  
> 
> In general, a leak would eventually appear after a few scans or in time
> when some memory location is overridden.
> 
> Yet another heuristics in kmemleak is to treat pointers at some offset
> inside an object as valid references (because of the container_of
> tricks). However, the downside is that the bigger the object, the
> greater chances of finding some random data that looks like a pointer.
> We could change this logic to require precise pointers above a certain
> size (e.g. PAGE_SIZE) where the use of container_of() is less likely.
> 
> Kmemleak doesn't have a way to inspect false negatives but if you are
> interested in digging further, I could add a "find=0x..." command to
> print all references to an object during scanning. I also need to find
> some time to implement a "stopscan" command which uses stop_machine()
> and skips the heuristics for reducing false positives.
> 

Without the patch in the link above, there's a memory leak with the
ftrace trampoline with the following commands:

Requires: CONFIG_FUNCTION_TRACER and CONFIG_SCHED_TRACER

 # cd /sys/kernel/debug/tracing
 # mkdir instances/foo
 # echo wakeup > instances/foo/current_tracer
 # echo 0 > /proc/sys/kernel/ftrace_enabled
 # echo nop > instances/foo/current_tracer
 # rmdir instances/foo

What the above does, is creates a new (allocated/non-static) buffer in
the instance directory. Then we enable the wakeup tracer which
enables function tracing and also creates a dynamic ftrace trampoline
for it. We disable function tracing for all tracers with the proc
sysctl ftrace_enabled set to zero. The nop removes the wakeup tracer
and unregisters its function tracing handler. This is where the leak
happens. The unregister path sees that function tracing is disabled and
exits out early, without releasing the trampoline.

To make sure nothing else may be accessing it, I even remove the trace
instance with the rmdir.

I tried the above over and over and kmemleak never catches it. The
attached patch is what I added to debug this to see that it was not
being freed. This patch can be added on top of the above commit, as it
reverts the fix.

Note, this is basically the same printks I used to debug what was
happening with the original triggering of kmemleak (which only caught
this on boot up, via the ftrace self tests).

-- Steve

[-- Attachment #2: kmemleak-debug.patch --]
[-- Type: text/x-patch, Size: 4082 bytes --]

Index: linux-trace.git/arch/x86/kernel/ftrace.c
===================================================================

--- linux-trace.git.orig/arch/x86/kernel/ftrace.c
+++ linux-trace.git/arch/x86/kernel/ftrace.c
@@ -687,12 +687,18 @@ static unsigned char *ftrace_jmp_replace
 /* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
-	return module_alloc(size);
+	void *ret;
+
+	ret = module_alloc(size);
+	printk("alloc tramp %p %ld\n", ret, size);
+	return ret;
+
 }
 static inline void tramp_free(void *tramp, int size)
 {
 	int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
 
+	printk("free tramp %p %d\n", tramp, size);
 	set_memory_nx((unsigned long)tramp, npages);
 	set_memory_rw((unsigned long)tramp, npages);
 	module_memfree(tramp);
@@ -947,6 +953,7 @@ void arch_ftrace_trampoline_free(struct
 	if (!ops || !(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
 		return;
 
+	printk("free %pS at %p size=%d\n", ops, (void *)ops->trampoline, ops->trampoline_size);
 	tramp_free((void *)ops->trampoline, ops->trampoline_size);
 	ops->trampoline = 0;
 }
Index: linux-trace.git/kernel/trace/ftrace.c
===================================================================
--- linux-trace.git.orig/kernel/trace/ftrace.c
+++ linux-trace.git/kernel/trace/ftrace.c
@@ -2801,10 +2801,12 @@ static int ftrace_shutdown(struct ftrace
 	if (unlikely(ftrace_disabled))
 		return -ENODEV;
 
+	printk("unreg %pS %d\n", ops, __LINE__);
 	ret = __unregister_ftrace_function(ops);
 	if (ret)
 		return ret;
 
+	printk("unreg %pS %d\n", ops, __LINE__);
 	ftrace_start_up--;
 	/*
 	 * Just warn in case of unbalance, no need to kill ftrace, it's not
@@ -2828,17 +2830,20 @@ static int ftrace_shutdown(struct ftrace
 
 	if (!command || !ftrace_enabled) {
 		/*
-		 * If these are dynamic or per_cpu ops, they still
-		 * need their data freed. Since, function tracing is
+		 * If these are per_cpu ops, they still need their
+		 * per_cpu field freed. Since, function tracing is
 		 * not currently active, we can just free them
 		 * without synchronizing all CPUs.
 		 */
-		if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU))
-			goto free_ops;
-
+	printk("%pS ops->flags=%x tramp=%pS (leaving)\n", ops, ops->flags,
+		(void *)ops->trampoline);
+		if (ops->flags & FTRACE_OPS_FL_PER_CPU)
+			per_cpu_ops_free(ops);
+	printk("unreg %pS %d leaving\n", ops, __LINE__);
 		return 0;
 	}
 
+	printk("unreg %pS %d\n", ops, __LINE__);
 	/*
 	 * If the ops uses a trampoline, then it needs to be
 	 * tested first on update.
@@ -2880,6 +2885,7 @@ static int ftrace_shutdown(struct ftrace
 	 * The same goes for freeing the per_cpu data of the per_cpu
 	 * ops.
 	 */
+	printk("%pS ops->flags=%x\n", ops, ops->flags);
 	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
 		/*
 		 * We need to do a hard force of sched synchronization.
@@ -2901,7 +2907,6 @@ static int ftrace_shutdown(struct ftrace
 		if (IS_ENABLED(CONFIG_PREEMPT))
 			synchronize_rcu_tasks();
 
- free_ops:
 		arch_ftrace_trampoline_free(ops);
 
 		if (ops->flags & FTRACE_OPS_FL_PER_CPU)
@@ -5535,6 +5540,7 @@ void ftrace_create_filter_files(struct f
 void ftrace_destroy_filter_files(struct ftrace_ops *ops)
 {
 	mutex_lock(&ftrace_lock);
+	printk("destroy %pS %x (%x)\n", ops, ops->flags, ops->flags & FTRACE_OPS_FL_ENABLED);
 	if (ops->flags & FTRACE_OPS_FL_ENABLED)
 		ftrace_shutdown(ops, 0);
 	ops->flags |= FTRACE_OPS_FL_DELETED;
Index: linux-trace.git/kernel/trace/trace_selftest.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_selftest.c
+++ linux-trace.git/kernel/trace/trace_selftest.c
@@ -257,6 +257,7 @@ static int trace_selftest_ops(struct tra
 
 	dyn_ops->func = trace_selftest_test_dyn_func;
 
+	printk("reg dynamic ops\n");
 	register_ftrace_function(dyn_ops);
 
 	trace_selftest_test_global_cnt = 0;
@@ -291,6 +292,7 @@ static int trace_selftest_ops(struct tra
 
 	ret = 0;
  out_free:
+	printk("free dyn_ops\n");
 	unregister_ftrace_function(dyn_ops);
 	kfree(dyn_ops);