All of lore.kernel.org
 help / color / mirror / Atom feed
* kmemleak splat on copy_process()
@ 2017-02-03 21:06 Luis R. Rodriguez
  2017-02-06  9:47 ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2017-02-03 21:06 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: Ingo Molnar, Andy Lutomirski, Kees Cook, Eric W. Biederman,
	Mateusz Guzik, linux-kernel

On next-20170125 running some kselftest not yet upstream I eventually
get a kmemleak splat:

unreferenced object 0xffffa7b1034b4000 (size 16384):
  comm "driver_data.sh", pid 6506, jiffies 4295068366 (age 1697.272s)
  hex dump (first 32 bytes):
    9d 6e ac 57 00 00 00 00 74 2d 64 72 69 76 65 72  .n.W....t-driver
    5f 64 61 74 61 2e 62 69 6e 0a 00 00 00 00 00 00  _data.bin.......
  backtrace:
    [<ffffffff9005f7fa>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff8fbe7006>] __vmalloc_node_range+0x206/0x2a0
    [<ffffffff8fa7f3e9>] copy_process.part.36+0x609/0x1cc0
    [<ffffffff8fa80c77>] _do_fork+0xd7/0x390
    [<ffffffff8fa80fd9>] SyS_clone+0x19/0x20
    [<ffffffff8fa03b4b>] do_syscall_64+0x5b/0xc0
    [<ffffffff9006b3af>] return_from_SYSCALL_64+0x0/0x6a
    [<ffffffffffffffff>] 0xffffffffffffffff

As per gdb:

(gdb) l *(copy_process+0x609)
0xffffffff8107f3e9 is in copy_process (kernel/fork.c:204).
warning: Source file is more recent than executable.
199             /*
200              * We can't call find_vm_area() in interrupt context, and
201              * free_thread_stack() can be called in interrupt context,
202              * so cache the vm_struct.
203              */
204             if (stack) {
205                     tsk->stack_vm_area = find_vm_area(stack);
206             }
207             return stack;
208     #else

So it would seem a complaint about alloc_thread_stack_node() -- I
checked but I cannot find the leak so am thinking this is a false
positive. Can you confirm?

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-03 21:06 kmemleak splat on copy_process() Luis R. Rodriguez
@ 2017-02-06  9:47 ` Michal Hocko
  2017-02-07  1:37   ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2017-02-06  9:47 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andrew Morton, Ingo Molnar, Andy Lutomirski, Kees Cook,
	Eric W. Biederman, Mateusz Guzik, linux-kernel

On Fri 03-02-17 13:06:04, Luis R. Rodriguez wrote:
> On next-20170125 running some kselftest not yet upstream I eventually
> get a kmemleak splat:
> 
> unreferenced object 0xffffa7b1034b4000 (size 16384):
>   comm "driver_data.sh", pid 6506, jiffies 4295068366 (age 1697.272s)
>   hex dump (first 32 bytes):
>     9d 6e ac 57 00 00 00 00 74 2d 64 72 69 76 65 72  .n.W....t-driver
>     5f 64 61 74 61 2e 62 69 6e 0a 00 00 00 00 00 00  _data.bin.......
>   backtrace:
>     [<ffffffff9005f7fa>] kmemleak_alloc+0x4a/0xa0
>     [<ffffffff8fbe7006>] __vmalloc_node_range+0x206/0x2a0
>     [<ffffffff8fa7f3e9>] copy_process.part.36+0x609/0x1cc0
>     [<ffffffff8fa80c77>] _do_fork+0xd7/0x390
>     [<ffffffff8fa80fd9>] SyS_clone+0x19/0x20
>     [<ffffffff8fa03b4b>] do_syscall_64+0x5b/0xc0
>     [<ffffffff9006b3af>] return_from_SYSCALL_64+0x0/0x6a
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> As per gdb:
> 
> (gdb) l *(copy_process+0x609)
> 0xffffffff8107f3e9 is in copy_process (kernel/fork.c:204).
> warning: Source file is more recent than executable.
> 199             /*
> 200              * We can't call find_vm_area() in interrupt context, and
> 201              * free_thread_stack() can be called in interrupt context,
> 202              * so cache the vm_struct.
> 203              */
> 204             if (stack) {
> 205                     tsk->stack_vm_area = find_vm_area(stack);
> 206             }
> 207             return stack;
> 208     #else

Could you check the state of the above process (pid 6506)? Does it still
own its stack? From a quick check I do not see any leak there either.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-06  9:47 ` Michal Hocko
@ 2017-02-07  1:37   ` Luis R. Rodriguez
  2017-02-07  8:03     ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2017-02-07  1:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Luis R. Rodriguez, Andrew Morton, Ingo Molnar, Andy Lutomirski,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Mon, Feb 06, 2017 at 10:47:41AM +0100, Michal Hocko wrote:
> On Fri 03-02-17 13:06:04, Luis R. Rodriguez wrote:
> > On next-20170125 running some kselftest not yet upstream I eventually
> > get a kmemleak splat:
> > 
> > unreferenced object 0xffffa7b1034b4000 (size 16384):
> >   comm "driver_data.sh", pid 6506, jiffies 4295068366 (age 1697.272s)
> >   hex dump (first 32 bytes):
> >     9d 6e ac 57 00 00 00 00 74 2d 64 72 69 76 65 72  .n.W....t-driver
> >     5f 64 61 74 61 2e 62 69 6e 0a 00 00 00 00 00 00  _data.bin.......
> >   backtrace:
> >     [<ffffffff9005f7fa>] kmemleak_alloc+0x4a/0xa0
> >     [<ffffffff8fbe7006>] __vmalloc_node_range+0x206/0x2a0
> >     [<ffffffff8fa7f3e9>] copy_process.part.36+0x609/0x1cc0
> >     [<ffffffff8fa80c77>] _do_fork+0xd7/0x390
> >     [<ffffffff8fa80fd9>] SyS_clone+0x19/0x20
> >     [<ffffffff8fa03b4b>] do_syscall_64+0x5b/0xc0
> >     [<ffffffff9006b3af>] return_from_SYSCALL_64+0x0/0x6a
> >     [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > As per gdb:
> > 
> > (gdb) l *(copy_process+0x609)
> > 0xffffffff8107f3e9 is in copy_process (kernel/fork.c:204).
> > warning: Source file is more recent than executable.
> > 199             /*
> > 200              * We can't call find_vm_area() in interrupt context, and
> > 201              * free_thread_stack() can be called in interrupt context,
> > 202              * so cache the vm_struct.
> > 203              */
> > 204             if (stack) {
> > 205                     tsk->stack_vm_area = find_vm_area(stack);
> > 206             }
> > 207             return stack;
> > 208     #else
> 
> Could you check the state of the above process (pid 6506)? Does it still
> own its stack? 

Although I can reproduce the splat on kmemleak, getting it to trigger
at a point I can stop the kernel and inspect the process seems rather hard.

> From a quick check I do not see any leak there either.

Then in that case what about:

diff --git a/kernel/fork.c b/kernel/fork.c
index 937ba59709c9..3c96aafa1f82 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -196,6 +196,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 				     PAGE_KERNEL,
 				     0, node, __builtin_return_address(0));
 
+	kmemleak_ignore(stack);
 	/*
 	 * We can't call find_vm_area() in interrupt context, and
 	 * free_thread_stack() can be called in interrupt context,

I no longer get the spurious splats from kmemleak after this.

  Luis

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-07  1:37   ` Luis R. Rodriguez
@ 2017-02-07  8:03     ` Michal Hocko
  2017-02-09  1:37       ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2017-02-07  8:03 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andrew Morton, Ingo Molnar, Andy Lutomirski, Kees Cook,
	Eric W. Biederman, Mateusz Guzik, linux-kernel

On Tue 07-02-17 02:37:02, Luis R. Rodriguez wrote:
> On Mon, Feb 06, 2017 at 10:47:41AM +0100, Michal Hocko wrote:
> > On Fri 03-02-17 13:06:04, Luis R. Rodriguez wrote:
> > > On next-20170125 running some kselftest not yet upstream I eventually
> > > get a kmemleak splat:
> > > 
> > > unreferenced object 0xffffa7b1034b4000 (size 16384):
> > >   comm "driver_data.sh", pid 6506, jiffies 4295068366 (age 1697.272s)
> > >   hex dump (first 32 bytes):
> > >     9d 6e ac 57 00 00 00 00 74 2d 64 72 69 76 65 72  .n.W....t-driver
> > >     5f 64 61 74 61 2e 62 69 6e 0a 00 00 00 00 00 00  _data.bin.......
> > >   backtrace:
> > >     [<ffffffff9005f7fa>] kmemleak_alloc+0x4a/0xa0
> > >     [<ffffffff8fbe7006>] __vmalloc_node_range+0x206/0x2a0
> > >     [<ffffffff8fa7f3e9>] copy_process.part.36+0x609/0x1cc0
> > >     [<ffffffff8fa80c77>] _do_fork+0xd7/0x390
> > >     [<ffffffff8fa80fd9>] SyS_clone+0x19/0x20
> > >     [<ffffffff8fa03b4b>] do_syscall_64+0x5b/0xc0
> > >     [<ffffffff9006b3af>] return_from_SYSCALL_64+0x0/0x6a
> > >     [<ffffffffffffffff>] 0xffffffffffffffff
> > > 
> > > As per gdb:
> > > 
> > > (gdb) l *(copy_process+0x609)
> > > 0xffffffff8107f3e9 is in copy_process (kernel/fork.c:204).
> > > warning: Source file is more recent than executable.
> > > 199             /*
> > > 200              * We can't call find_vm_area() in interrupt context, and
> > > 201              * free_thread_stack() can be called in interrupt context,
> > > 202              * so cache the vm_struct.
> > > 203              */
> > > 204             if (stack) {
> > > 205                     tsk->stack_vm_area = find_vm_area(stack);
> > > 206             }
> > > 207             return stack;
> > > 208     #else
> > 
> > Could you check the state of the above process (pid 6506)? Does it still
> > own its stack? 
> 
> Although I can reproduce the splat on kmemleak, getting it to trigger
> at a point I can stop the kernel and inspect the process seems rather hard.

Can you make the kernel BUG_ON in this case and check the vmcore?

> > From a quick check I do not see any leak there either.
> 
> Then in that case what about:

This just disables the kmemleak altogether which doesn't sound like a
good idea to me.
 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 937ba59709c9..3c96aafa1f82 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -196,6 +196,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>  				     PAGE_KERNEL,
>  				     0, node, __builtin_return_address(0));
>  
> +	kmemleak_ignore(stack);
>  	/*
>  	 * We can't call find_vm_area() in interrupt context, and
>  	 * free_thread_stack() can be called in interrupt context,
> 
> I no longer get the spurious splats from kmemleak after this.
> 
>   Luis

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-07  8:03     ` Michal Hocko
@ 2017-02-09  1:37       ` Luis R. Rodriguez
  2017-02-17 17:07         ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2017-02-09  1:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Luis R. Rodriguez, Andrew Morton, Ingo Molnar, Andy Lutomirski,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Tue, Feb 07, 2017 at 09:03:43AM +0100, Michal Hocko wrote:
> On Tue 07-02-17 02:37:02, Luis R. Rodriguez wrote:
> > > From a quick check I do not see any leak there either.
> > 
> > Then in that case what about:
> 
> This just disables the kmemleak altogether which doesn't sound like a
> good idea to me.

Only for this case, but if that is also not desirable let us debug further.
That or I think we could perhaps massage code to make it clearer to kmemleak
things are good.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-09  1:37       ` Luis R. Rodriguez
@ 2017-02-17 17:07         ` Andy Lutomirski
  2017-02-17 17:23           ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2017-02-17 17:07 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Michal Hocko, Andrew Morton, Ingo Molnar, Andy Lutomirski,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Wed, Feb 8, 2017 at 5:37 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Feb 07, 2017 at 09:03:43AM +0100, Michal Hocko wrote:
>> On Tue 07-02-17 02:37:02, Luis R. Rodriguez wrote:
>> > > From a quick check I do not see any leak there either.
>> >
>> > Then in that case what about:
>>
>> This just disables the kmemleak altogether which doesn't sound like a
>> good idea to me.
>
> Only for this case, but if that is also not desirable let us debug further.
> That or I think we could perhaps massage code to make it clearer to kmemleak
> things are good.
>

I'm not seeing the issue.  There should be a live pointer to stack at
all times, either in a local variable or in task->stack.  There's a
weird window in dup_task_struct in which we're stashing away
stack_vm_area, but stack itself should be okay, I think.  But maybe
there really is a race in which a kmemleak check right in the middle
of duplicating the task struct really can't see the stack pointer.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-17 17:07         ` Andy Lutomirski
@ 2017-02-17 17:23           ` Luis R. Rodriguez
  2017-02-17 19:32             ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2017-02-17 17:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Michal Hocko, Andrew Morton, Ingo Molnar, Andy Lutomirski,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Fri, Feb 17, 2017 at 9:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> But maybe
> there really is a race in which a kmemleak check right in the middle
> of duplicating the task struct really can't see the stack pointer.

Funny, but it was actually using kmemleak how I can easily reproduce:

To reproduce the kmemleak splat:

echo clear >  /sys/kernel/debug/kmemleak
echo scan >  /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

Try that.

 Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-17 17:23           ` Luis R. Rodriguez
@ 2017-02-17 19:32             ` Andy Lutomirski
  2017-05-15 21:53               ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2017-02-17 19:32 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Michal Hocko, Andrew Morton, Ingo Molnar, Andy Lutomirski,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Fri, Feb 17, 2017 at 9:23 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Fri, Feb 17, 2017 at 9:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> But maybe
>> there really is a race in which a kmemleak check right in the middle
>> of duplicating the task struct really can't see the stack pointer.
>
> Funny, but it was actually using kmemleak how I can easily reproduce:
>
> To reproduce the kmemleak splat:
>
> echo clear >  /sys/kernel/debug/kmemleak
> echo scan >  /sys/kernel/debug/kmemleak
> cat /sys/kernel/debug/kmemleak

Worked fine for me.  Maybe your config is special?

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-02-17 19:32             ` Andy Lutomirski
@ 2017-05-15 21:53               ` Luis R. Rodriguez
  2017-05-16  6:28                 ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 21:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Michal Hocko, Andrew Morton, Ingo Molnar,
	Andy Lutomirski, Kees Cook, Eric W. Biederman, Mateusz Guzik,
	linux-kernel

On Fri, Feb 17, 2017 at 11:32:34AM -0800, Andy Lutomirski wrote:
> On Fri, Feb 17, 2017 at 9:23 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Fri, Feb 17, 2017 at 9:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> But maybe
> >> there really is a race in which a kmemleak check right in the middle
> >> of duplicating the task struct really can't see the stack pointer.
> >
> > Funny, but it was actually using kmemleak how I can easily reproduce:
> >
> > To reproduce the kmemleak splat:
> >
> > echo clear >  /sys/kernel/debug/kmemleak
> > echo scan >  /sys/kernel/debug/kmemleak
> > cat /sys/kernel/debug/kmemleak
> 
> Worked fine for me.  Maybe your config is special?

I don't think my config is special at all, here it is its just
what I use for my qemu kvm guest image:

http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/configs/piggy-x86_64_qemu_fork_kmemleak.config

Another new kernel (next-20170515 based now on v4.12-rc1) and yet the same
kmemleeak can be triggered easily, although this time I need to hit "scan"
quite a bit more -- try using scan 6 times in a row or so.

echo clear >  /sys/kernel/debug/kmemleak
echo scan >  /sys/kernel/debug/kmemleak
echo scan >  /sys/kernel/debug/kmemleak
echo scan >  /sys/kernel/debug/kmemleak
echo scan >  /sys/kernel/debug/kmemleak
echo scan >  /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

root@piggy:~# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffa07500d4c000 (size 16384):
  comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
  hex dump (first 32 bytes):
    9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
    [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
    [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
    [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
    [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
    [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffffa07500c30000 (size 16384):
  comm "bash", pid 1394, jiffies 4294940106 (age 86.780s)
  hex dump (first 32 bytes):
    9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
    [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
    [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
    [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
    [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
    [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffffa07500c98000 (size 16384):
  comm "bash", pid 1368, jiffies 4294956480 (age 21.284s)
  hex dump (first 32 bytes):
    9d 6e ac 57 00 00 00 00 00 00 00 00 fe 01 00 00  .n.W............
    c0 f4 9d 44 2f e8 ff ff 00 de ac 44 2f e8 ff ff  ...D/......D/...
  backtrace:  
    [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
    [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
    [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
    [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
    [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
    [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
    [<ffffffffffffffff>] 0xffffffffffffffff

I confirm that stack is allocated and that a respective tsk->stack_vm_area gets
assigned. So neither of these BUG() triggers, for instance but yet the kmemleak
does:

diff --git a/kernel/fork.c b/kernel/fork.c
index 657373b2ddd2..9bd7ccd55b89 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -230,8 +230,12 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 	 * free_thread_stack() can be called in interrupt context,
 	 * so cache the vm_struct.
 	 */
-	if (stack)
+	if (stack) {
 		tsk->stack_vm_area = find_vm_area(stack);
+		if (!tsk->stack_vm_area)
+			BUG();
+	} else
+		BUG();
 	return stack;
 #else
 	struct page *page = alloc_pages_node(node, THREADINFO_GFP,



  Luis

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-05-15 21:53               ` Luis R. Rodriguez
@ 2017-05-16  6:28                 ` Michal Hocko
  2017-05-16 13:39                   ` Catalin Marinas
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2017-05-16  6:28 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Andrew Morton, Ingo Molnar, Andy Lutomirski,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel,
	Catalin Marinas

Let's CC Catalin

this smells like a kmemleak bug to me.

On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote:
> On Fri, Feb 17, 2017 at 11:32:34AM -0800, Andy Lutomirski wrote:
> > On Fri, Feb 17, 2017 at 9:23 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > On Fri, Feb 17, 2017 at 9:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > >> But maybe
> > >> there really is a race in which a kmemleak check right in the middle
> > >> of duplicating the task struct really can't see the stack pointer.
> > >
> > > Funny, but it was actually using kmemleak how I can easily reproduce:
> > >
> > > To reproduce the kmemleak splat:
> > >
> > > echo clear >  /sys/kernel/debug/kmemleak
> > > echo scan >  /sys/kernel/debug/kmemleak
> > > cat /sys/kernel/debug/kmemleak
> > 
> > Worked fine for me.  Maybe your config is special?
> 
> I don't think my config is special at all, here it is its just
> what I use for my qemu kvm guest image:
> 
> http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/configs/piggy-x86_64_qemu_fork_kmemleak.config
> 
> Another new kernel (next-20170515 based now on v4.12-rc1) and yet the same
> kmemleeak can be triggered easily, although this time I need to hit "scan"
> quite a bit more -- try using scan 6 times in a row or so.
> 
> echo clear >  /sys/kernel/debug/kmemleak
> echo scan >  /sys/kernel/debug/kmemleak
> echo scan >  /sys/kernel/debug/kmemleak
> echo scan >  /sys/kernel/debug/kmemleak
> echo scan >  /sys/kernel/debug/kmemleak
> echo scan >  /sys/kernel/debug/kmemleak
> cat /sys/kernel/debug/kmemleak
> 
> root@piggy:~# cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffffa07500d4c000 (size 16384):
>   comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
>   hex dump (first 32 bytes):
>     9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
>     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
>     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
>     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
>     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
>     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
>     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
>     [<ffffffffffffffff>] 0xffffffffffffffff
> unreferenced object 0xffffa07500c30000 (size 16384):
>   comm "bash", pid 1394, jiffies 4294940106 (age 86.780s)
>   hex dump (first 32 bytes):
>     9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
>     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
>     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
>     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
>     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
>     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
>     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
>     [<ffffffffffffffff>] 0xffffffffffffffff
> unreferenced object 0xffffa07500c98000 (size 16384):
>   comm "bash", pid 1368, jiffies 4294956480 (age 21.284s)
>   hex dump (first 32 bytes):
>     9d 6e ac 57 00 00 00 00 00 00 00 00 fe 01 00 00  .n.W............
>     c0 f4 9d 44 2f e8 ff ff 00 de ac 44 2f e8 ff ff  ...D/......D/...
>   backtrace:  
>     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
>     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
>     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
>     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
>     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
>     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
>     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> I confirm that stack is allocated and that a respective tsk->stack_vm_area gets
> assigned. So neither of these BUG() triggers, for instance but yet the kmemleak
> does:
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 657373b2ddd2..9bd7ccd55b89 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -230,8 +230,12 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>  	 * free_thread_stack() can be called in interrupt context,
>  	 * so cache the vm_struct.
>  	 */
> -	if (stack)
> +	if (stack) {
>  		tsk->stack_vm_area = find_vm_area(stack);
> +		if (!tsk->stack_vm_area)
> +			BUG();
> +	} else
> +		BUG();
>  	return stack;
>  #else
>  	struct page *page = alloc_pages_node(node, THREADINFO_GFP,
> 
> 
> 
>   Luis

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-05-16  6:28                 ` Michal Hocko
@ 2017-05-16 13:39                   ` Catalin Marinas
  2017-05-16 23:55                     ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2017-05-16 13:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Luis R. Rodriguez, Andy Lutomirski, Andrew Morton, Ingo Molnar,
	Andy Lutomirski, Kees Cook, Eric W. Biederman, Mateusz Guzik,
	linux-kernel

Thanks for cc'ing me. The vmalloc allocations have always been tricky
for kmemleak since there are 2-3 other memory locations with the same
value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start;
occasionally we have vmap_area.va_end pointing to the next
vmap_area.va_start.

To have a more reliable object reference counting, kmemleak avoids
scanning most of the vmap_area structure (apart from vmap_area.vm) and
requires a minimum count of 2 references for a vmalloc'ed object to not
be considered a leak (that is vm_struct.addr and the caller's location,
in this case task->stack).

> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote:
> > root@piggy:~# cat /sys/kernel/debug/kmemleak
> > unreferenced object 0xffffa07500d4c000 (size 16384):
> >   comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
> >   hex dump (first 32 bytes):
> >     9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
> >     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
> >     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
> >     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
> >     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
> >     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
> >     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
> >     [<ffffffffffffffff>] 0xffffffffffffffff

This stack here was allocated by bash for a child process via
alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the
child died, free_stack_thread() did not call vfree_atomic() but rather
stored the corresponding vm_struct pointer in cached_stacks[i]. However,
the original task->stack pointer was lost (child task_struct freed), so
when scanning the memory kmemleak can only find a single reference to
the original stack (via vm_struct cached_stacks[i]) rather than 2 as
required for vmalloc'ed objects.

BTW, you can get more info about a specific object, including the number
of references found, with:

  echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak

So basically kernel/fork.c has its own thread stack allocator on top of
vmalloc and kmemleak gets confused. A quick workaround:

------------8<------------------------------
diff --git a/kernel/fork.c b/kernel/fork.c
index 06d759ab4c62..785907fccf67 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 		this_cpu_write(cached_stacks[i], NULL);
 
 		tsk->stack_vm_area = s;
+		kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC);
 		local_irq_enable();
 		return s->addr;
 	}
@@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
 				continue;
 
 			this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
+			kmemleak_free(tsk->stack);
 			local_irq_restore(flags);
 			return;
 		}
------------8<------------------------------

A better solution I think would be something like kmemleak_ignore() on
the free path paired with a new kmemleak_unignore() in order to avoid
the freeing/re-allocation of the kmemleak metadata.

A more complex solution (which I'm not yet convinced it works) may be to
pass the number of references of an objects down to the objects it
refers. In the vmalloc case we normally have a single reference to
vm_struct and two to the vmalloc'ed object. If a reference to the
vmalloc'ed object disappeared we could balance it with an increased
number of references to the vm_struct object. But this option requires
more research.

Unless there are better ideas, I'll post a patch adding
kmemleak_ignore/unignore annotations to kerne/fork.c (and the
corresponding mm/kmemleak.c changes).

Thanks.

-- 
Catalin

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-05-16 13:39                   ` Catalin Marinas
@ 2017-05-16 23:55                     ` Andy Lutomirski
  2017-05-17 11:09                       ` Catalin Marinas
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2017-05-16 23:55 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Michal Hocko, Luis R. Rodriguez, Andrew Morton, Ingo Molnar,
	Andy Lutomirski, Kees Cook, Eric W. Biederman, Mateusz Guzik,
	linux-kernel

On Tue, May 16, 2017 at 6:39 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> Thanks for cc'ing me. The vmalloc allocations have always been tricky
> for kmemleak since there are 2-3 other memory locations with the same
> value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start;
> occasionally we have vmap_area.va_end pointing to the next
> vmap_area.va_start.
>
> To have a more reliable object reference counting, kmemleak avoids
> scanning most of the vmap_area structure (apart from vmap_area.vm) and
> requires a minimum count of 2 references for a vmalloc'ed object to not
> be considered a leak (that is vm_struct.addr and the caller's location,
> in this case task->stack).
>
>> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote:
>> > root@piggy:~# cat /sys/kernel/debug/kmemleak
>> > unreferenced object 0xffffa07500d4c000 (size 16384):
>> >   comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
>> >   hex dump (first 32 bytes):
>> >     9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
>> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> >   backtrace:
>> >     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
>> >     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
>> >     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
>> >     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
>> >     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
>> >     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
>> >     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
>> >     [<ffffffffffffffff>] 0xffffffffffffffff
>
> This stack here was allocated by bash for a child process via
> alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the
> child died, free_stack_thread() did not call vfree_atomic() but rather
> stored the corresponding vm_struct pointer in cached_stacks[i]. However,
> the original task->stack pointer was lost (child task_struct freed), so
> when scanning the memory kmemleak can only find a single reference to
> the original stack (via vm_struct cached_stacks[i]) rather than 2 as
> required for vmalloc'ed objects.
>
> BTW, you can get more info about a specific object, including the number
> of references found, with:
>
>   echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak
>
> So basically kernel/fork.c has its own thread stack allocator on top of
> vmalloc and kmemleak gets confused. A quick workaround:
>
> ------------8<------------------------------
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 06d759ab4c62..785907fccf67 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>                 this_cpu_write(cached_stacks[i], NULL);
>
>                 tsk->stack_vm_area = s;
> +               kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC);
>                 local_irq_enable();
>                 return s->addr;
>         }
> @@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
>                                 continue;
>
>                         this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
> +                       kmemleak_free(tsk->stack);
>                         local_irq_restore(flags);
>                         return;
>                 }
> ------------8<------------------------------

I would do this differently.  The problem only affects cached stacks,
right?  How about kmemleal_ignoring when caching a stack and
unignoring when uncaching a stack?

Also, this needs a serious comment.  How about "kmemleak does not
presently understand that a reference to a vm_area_struct implies a
reference to the vmalloced memory"?

> A more complex solution (which I'm not yet convinced it works) may be to
> pass the number of references of an objects down to the objects it
> refers. In the vmalloc case we normally have a single reference to
> vm_struct and two to the vmalloc'ed object. If a reference to the
> vmalloc'ed object disappeared we could balance it with an increased
> number of references to the vm_struct object. But this option requires
> more research.

The idea being that, with this change, kmemleak would start to
understand that surplus references to vm_area_struct cause it to think
the memory itself is reachable?

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kmemleak splat on copy_process()
  2017-05-16 23:55                     ` Andy Lutomirski
@ 2017-05-17 11:09                       ` Catalin Marinas
  0 siblings, 0 replies; 13+ messages in thread
From: Catalin Marinas @ 2017-05-17 11:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Michal Hocko, Luis R. Rodriguez, Andrew Morton, Ingo Molnar,
	Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Tue, May 16, 2017 at 04:55:28PM -0700, Andy Lutomirski wrote:
> On Tue, May 16, 2017 at 6:39 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > Thanks for cc'ing me. The vmalloc allocations have always been tricky
> > for kmemleak since there are 2-3 other memory locations with the same
> > value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start;
> > occasionally we have vmap_area.va_end pointing to the next
> > vmap_area.va_start.
> >
> > To have a more reliable object reference counting, kmemleak avoids
> > scanning most of the vmap_area structure (apart from vmap_area.vm) and
> > requires a minimum count of 2 references for a vmalloc'ed object to not
> > be considered a leak (that is vm_struct.addr and the caller's location,
> > in this case task->stack).
> >
> >> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote:
> >> > root@piggy:~# cat /sys/kernel/debug/kmemleak
> >> > unreferenced object 0xffffa07500d4c000 (size 16384):
> >> >   comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
> >> >   hex dump (first 32 bytes):
> >> >     9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
> >> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >> >   backtrace:
> >> >     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
> >> >     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
> >> >     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
> >> >     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
> >> >     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
> >> >     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
> >> >     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
> >> >     [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > This stack here was allocated by bash for a child process via
> > alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the
> > child died, free_stack_thread() did not call vfree_atomic() but rather
> > stored the corresponding vm_struct pointer in cached_stacks[i]. However,
> > the original task->stack pointer was lost (child task_struct freed), so
> > when scanning the memory kmemleak can only find a single reference to
> > the original stack (via vm_struct cached_stacks[i]) rather than 2 as
> > required for vmalloc'ed objects.
> >
> > BTW, you can get more info about a specific object, including the number
> > of references found, with:
> >
> >   echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak
> >
> > So basically kernel/fork.c has its own thread stack allocator on top of
> > vmalloc and kmemleak gets confused. A quick workaround:
> >
> > ------------8<------------------------------
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 06d759ab4c62..785907fccf67 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> >                 this_cpu_write(cached_stacks[i], NULL);
> >
> >                 tsk->stack_vm_area = s;
> > +               kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC);
> >                 local_irq_enable();
> >                 return s->addr;
> >         }
> > @@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
> >                                 continue;
> >
> >                         this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
> > +                       kmemleak_free(tsk->stack);
> >                         local_irq_restore(flags);
> >                         return;
> >                 }
> > ------------8<------------------------------
> 
> I would do this differently.  The problem only affects cached stacks,
> right? How about kmemleal_ignoring when caching a stack and
> unignoring when uncaching a stack?

Indeed ;), that was my paragraph just below the patch:

> > A better solution I think would be something like kmemleak_ignore() on
> > the free path paired with a new kmemleak_unignore() in order to avoid
> > the freeing/re-allocation of the kmemleak metadata.

> Also, this needs a serious comment.  How about "kmemleak does not
> presently understand that a reference to a vm_area_struct implies a
> reference to the vmalloced memory"?

Yes, as most of the other kmemleak annotations.

> > A more complex solution (which I'm not yet convinced it works) may be to
> > pass the number of references of an objects down to the objects it
> > refers. In the vmalloc case we normally have a single reference to
> > vm_struct and two to the vmalloc'ed object. If a reference to the
> > vmalloc'ed object disappeared we could balance it with an increased
> > number of references to the vm_struct object. But this option requires
> > more research.
> 
> The idea being that, with this change, kmemleak would start to
> understand that surplus references to vm_area_struct cause it to think
> the memory itself is reachable?

That's the idea, possibly with a new kmemleak API specific to vmalloc
that would link the vm_struct object with the vmalloc'ed one. The
advantage is that we don't need to annotate the stack caching (that's
the only place that I'm aware of), though the actual implementation is a
bit more complex than kmemleak_ignore/unignore (and assuming the idea is
sane).

-- 
Catalin

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-05-17 11:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03 21:06 kmemleak splat on copy_process() Luis R. Rodriguez
2017-02-06  9:47 ` Michal Hocko
2017-02-07  1:37   ` Luis R. Rodriguez
2017-02-07  8:03     ` Michal Hocko
2017-02-09  1:37       ` Luis R. Rodriguez
2017-02-17 17:07         ` Andy Lutomirski
2017-02-17 17:23           ` Luis R. Rodriguez
2017-02-17 19:32             ` Andy Lutomirski
2017-05-15 21:53               ` Luis R. Rodriguez
2017-05-16  6:28                 ` Michal Hocko
2017-05-16 13:39                   ` Catalin Marinas
2017-05-16 23:55                     ` Andy Lutomirski
2017-05-17 11:09                       ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.