From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753115AbdEPXzw (ORCPT ); Tue, 16 May 2017 19:55:52 -0400 Received: from mail.kernel.org ([198.145.29.99]:58530 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752431AbdEPXzv (ORCPT ); Tue, 16 May 2017 19:55:51 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 20CC8239D9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org MIME-Version: 1.0 In-Reply-To: <20170516133925.GA9453@e104818-lin.cambridge.arm.com> References: <20170206094741.GB3097@dhcp22.suse.cz> <20170207013702.GF24047@wotan.suse.de> <20170207080343.GA5065@dhcp22.suse.cz> <20170209013732.GL24047@wotan.suse.de> <20170515215317.GC17314@wotan.suse.de> <20170516062806.GC30277@dhcp22.suse.cz> <20170516133925.GA9453@e104818-lin.cambridge.arm.com> From: Andy Lutomirski Date: Tue, 16 May 2017 16:55:28 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: kmemleak splat on copy_process() To: Catalin Marinas Cc: Michal Hocko , "Luis R. Rodriguez" , Andrew Morton , Ingo Molnar , Andy Lutomirski , Kees Cook , "Eric W. Biederman" , Mateusz Guzik , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 16, 2017 at 6:39 AM, Catalin Marinas wrote: > Thanks for cc'ing me. The vmalloc allocations have always been tricky > for kmemleak since there are 2-3 other memory locations with the same > value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start; > occasionally we have vmap_area.va_end pointing to the next > vmap_area.va_start. > > To have a more reliable object reference counting, kmemleak avoids > scanning most of the vmap_area structure (apart from vmap_area.vm) and > requires a minimum count of 2 references for a vmalloc'ed object to not > be considered a leak (that is vm_struct.addr and the caller's location, > in this case task->stack). > >> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote: >> > root@piggy:~# cat /sys/kernel/debug/kmemleak >> > unreferenced object 0xffffa07500d4c000 (size 16384): >> > comm "bash", pid 1349, jiffies 4294895999 (age 263.204s) >> > hex dump (first 32 bytes): >> > 9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00 .n.W............ >> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> > backtrace: >> > [] kmemleak_alloc+0x4a/0xa0 >> > [] __vmalloc_node_range+0x20c/0x2b0 >> > [] copy_process.part.37+0x5c2/0x1af0 >> > [] _do_fork+0xcf/0x390 >> > [] SyS_clone+0x19/0x20 >> > [] do_syscall_64+0x5b/0xc0 >> > [] return_from_SYSCALL_64+0x0/0x6a >> > [] 0xffffffffffffffff > > This stack here was allocated by bash for a child process via > alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the > child died, free_stack_thread() did not call vfree_atomic() but rather > stored the corresponding vm_struct pointer in cached_stacks[i]. However, > the original task->stack pointer was lost (child task_struct freed), so > when scanning the memory kmemleak can only find a single reference to > the original stack (via vm_struct cached_stacks[i]) rather than 2 as > required for vmalloc'ed objects. > > BTW, you can get more info about a specific object, including the number > of references found, with: > > echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak > > So basically kernel/fork.c has its own thread stack allocator on top of > vmalloc and kmemleak gets confused. A quick workaround: > > ------------8<------------------------------ > diff --git a/kernel/fork.c b/kernel/fork.c > index 06d759ab4c62..785907fccf67 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) > this_cpu_write(cached_stacks[i], NULL); > > tsk->stack_vm_area = s; > + kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC); > local_irq_enable(); > return s->addr; > } > @@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct *tsk) > continue; > > this_cpu_write(cached_stacks[i], tsk->stack_vm_area); > + kmemleak_free(tsk->stack); > local_irq_restore(flags); > return; > } > ------------8<------------------------------ I would do this differently. The problem only affects cached stacks, right? How about kmemleal_ignoring when caching a stack and unignoring when uncaching a stack? Also, this needs a serious comment. How about "kmemleak does not presently understand that a reference to a vm_area_struct implies a reference to the vmalloced memory"? > A more complex solution (which I'm not yet convinced it works) may be to > pass the number of references of an objects down to the objects it > refers. In the vmalloc case we normally have a single reference to > vm_struct and two to the vmalloc'ed object. If a reference to the > vmalloc'ed object disappeared we could balance it with an increased > number of references to the vm_struct object. But this option requires > more research. The idea being that, with this change, kmemleak would start to understand that surplus references to vm_area_struct cause it to think the memory itself is reachable? --Andy