From: Michal Hocko <mhocko@kernel.org> To: Vegard Nossum <vegard.nossum@oracle.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>, Matthew Wilcox <mawilcox@microsoft.com>, Peter Zijlstra <peterz@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Al Viro <viro@zeniv.linux.org.uk>, Ingo Molnar <mingo@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org> Subject: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging Date: Fri, 16 Dec 2016 10:01:57 +0100 [thread overview] Message-ID: <20161216090157.GA13940@dhcp22.suse.cz> (raw) In-Reply-To: <20161216082202.21044-4-vegard.nossum@oracle.com> On Fri 16-12-16 09:22:02, Vegard Nossum wrote: > Reference counting bugs are hard to debug by their nature since the actual > manifestation of one can occur very far from where the error is introduced > (e.g. a missing get() only manifest as a use-after-free when the reference > count prematurely drops to 0, which could be arbitrarily long after where > the get() should have happened if there are other users). I wrote this patch > to try to track down a suspected 'mm_struct' reference counting bug. I definitely agree that hunting these bugs is a royal PITA, no question about that. I am just wondering whether this has been motivated by any particular bug recently. I do not seem to remember any such an issue for quite some time. > The basic idea is to keep track of all references, not just with a reference > counter, but with an actual reference _list_. Whenever you get() or put() a > reference, you also add or remove yourself, respectively, from the reference > list. This really helps debugging because (for example) you always put a > specific reference, meaning that if that reference was not yours to put, you > will notice it immediately (rather than when the reference counter goes to 0 > and you still have an active reference). But who is the owner of the reference? A function/task? It is not all that uncommon to take an mm reference from one context and release it from a different one. But I might be missing your point here. > The main interface is in <linux/mm_ref_types.h> and <linux/mm_ref.h>, while > the implementation lives in mm/mm_ref.c. Since 'struct mm_struct' has both > ->mm_users and ->mm_count, we introduce helpers for both of them, but use > the same data structure for each (struct mm_ref). The low-level rules (i.e. > the ones we have to follow, but which nobody else should really have to > care about since they use the higher-level interface) are: > > - after incrementing ->mm_count you also have to call get_mm_ref() > > - before decrementing ->mm_count you also have to call put_mm_ref() > > - after incrementing ->mm_users you also have to call get_mm_users_ref() > > - before decrementing ->mm_users you also have to call put_mm_users_ref() > > The rules that most of the rest of the kernel will care about are: > > - functions that acquire and return a mm_struct should take a > 'struct mm_ref *' which it can pass on to mmget()/mmgrab()/etc. > > - functions that release an mm_struct passed as a parameter should also > take a 'struct mm_ref *' which it can pass on to mmput()/mmdrop()/etc. > > - any function that temporarily acquires a mm_struct reference should > use MM_REF() to define an on-stack reference and pass it on to > mmget()/mmput()/mmgrab()/mmdrop()/etc. > > - any structure that holds an mm_struct pointer must also include a > 'struct mm_ref' member; when the mm_struct pointer is modified you > would typically also call mmget()/mmgrab()/mmput()/mmdrop() and they > should be called with this mm_ref > > - you can convert (for example) an on-stack reference to an in-struct > reference using move_mm_ref(). This is semantically equivalent to > (atomically) taking the new reference and dropping the old one, but > doesn't actually need to modify the reference count This all sounds way too intrusive to me so I am not really sure this is something we really want. A nice thing for debugging for sure but I am somehow skeptical whether it is really worth it considering how many those ref. count bugs we've had. [...] -- Michal Hocko SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org> To: Vegard Nossum <vegard.nossum@oracle.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>, Matthew Wilcox <mawilcox@microsoft.com>, Peter Zijlstra <peterz@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Al Viro <viro@zeniv.linux.org.uk>, Ingo Molnar <mingo@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org> Subject: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging Date: Fri, 16 Dec 2016 10:01:57 +0100 [thread overview] Message-ID: <20161216090157.GA13940@dhcp22.suse.cz> (raw) In-Reply-To: <20161216082202.21044-4-vegard.nossum@oracle.com> On Fri 16-12-16 09:22:02, Vegard Nossum wrote: > Reference counting bugs are hard to debug by their nature since the actual > manifestation of one can occur very far from where the error is introduced > (e.g. a missing get() only manifest as a use-after-free when the reference > count prematurely drops to 0, which could be arbitrarily long after where > the get() should have happened if there are other users). I wrote this patch > to try to track down a suspected 'mm_struct' reference counting bug. I definitely agree that hunting these bugs is a royal PITA, no question about that. I am just wondering whether this has been motivated by any particular bug recently. I do not seem to remember any such an issue for quite some time. > The basic idea is to keep track of all references, not just with a reference > counter, but with an actual reference _list_. Whenever you get() or put() a > reference, you also add or remove yourself, respectively, from the reference > list. This really helps debugging because (for example) you always put a > specific reference, meaning that if that reference was not yours to put, you > will notice it immediately (rather than when the reference counter goes to 0 > and you still have an active reference). But who is the owner of the reference? A function/task? It is not all that uncommon to take an mm reference from one context and release it from a different one. But I might be missing your point here. > The main interface is in <linux/mm_ref_types.h> and <linux/mm_ref.h>, while > the implementation lives in mm/mm_ref.c. Since 'struct mm_struct' has both > ->mm_users and ->mm_count, we introduce helpers for both of them, but use > the same data structure for each (struct mm_ref). The low-level rules (i.e. > the ones we have to follow, but which nobody else should really have to > care about since they use the higher-level interface) are: > > - after incrementing ->mm_count you also have to call get_mm_ref() > > - before decrementing ->mm_count you also have to call put_mm_ref() > > - after incrementing ->mm_users you also have to call get_mm_users_ref() > > - before decrementing ->mm_users you also have to call put_mm_users_ref() > > The rules that most of the rest of the kernel will care about are: > > - functions that acquire and return a mm_struct should take a > 'struct mm_ref *' which it can pass on to mmget()/mmgrab()/etc. > > - functions that release an mm_struct passed as a parameter should also > take a 'struct mm_ref *' which it can pass on to mmput()/mmdrop()/etc. > > - any function that temporarily acquires a mm_struct reference should > use MM_REF() to define an on-stack reference and pass it on to > mmget()/mmput()/mmgrab()/mmdrop()/etc. > > - any structure that holds an mm_struct pointer must also include a > 'struct mm_ref' member; when the mm_struct pointer is modified you > would typically also call mmget()/mmgrab()/mmput()/mmdrop() and they > should be called with this mm_ref > > - you can convert (for example) an on-stack reference to an in-struct > reference using move_mm_ref(). This is semantically equivalent to > (atomically) taking the new reference and dropping the old one, but > doesn't actually need to modify the reference count This all sounds way too intrusive to me so I am not really sure this is something we really want. A nice thing for debugging for sure but I am somehow skeptical whether it is really worth it considering how many those ref. count bugs we've had. [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-12-16 9:02 UTC|newest] Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top 2016-12-16 8:21 [PATCH 1/4] mm: add new mmgrab() helper Vegard Nossum 2016-12-16 8:21 ` Vegard Nossum 2016-12-16 8:22 ` [PATCH 2/4] mm: add new mmget() helper Vegard Nossum 2016-12-16 8:22 ` Vegard Nossum 2016-12-16 9:26 ` Michal Hocko 2016-12-16 9:26 ` Michal Hocko 2016-12-16 8:22 ` [PATCH 3/4] mm: use mmget_not_zero() helper Vegard Nossum 2016-12-16 8:22 ` Vegard Nossum 2016-12-16 9:27 ` Michal Hocko 2016-12-16 9:27 ` Michal Hocko 2016-12-16 8:22 ` [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging Vegard Nossum 2016-12-16 8:22 ` Vegard Nossum 2016-12-16 9:01 ` Michal Hocko [this message] 2016-12-16 9:01 ` Michal Hocko 2016-12-16 9:43 ` Vegard Nossum 2016-12-16 9:43 ` Vegard Nossum 2016-12-16 10:11 ` crash during oom reaper (was: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging) Michal Hocko 2016-12-16 10:11 ` Michal Hocko 2016-12-16 10:44 ` Kirill A. Shutemov 2016-12-16 10:44 ` Kirill A. Shutemov 2016-12-16 11:42 ` crash during oom reaper Michal Hocko 2016-12-16 11:42 ` Michal Hocko 2016-12-16 12:12 ` Michal Hocko 2016-12-16 12:12 ` Michal Hocko 2016-12-16 12:35 ` Kirill A. Shutemov 2016-12-16 12:35 ` Kirill A. Shutemov 2016-12-16 12:56 ` Michal Hocko 2016-12-16 12:56 ` Michal Hocko 2016-12-16 13:07 ` Kirill A. Shutemov 2016-12-16 13:07 ` Kirill A. Shutemov 2016-12-16 13:14 ` Michal Hocko 2016-12-16 13:14 ` Michal Hocko 2016-12-18 13:47 ` Tetsuo Handa 2016-12-18 13:47 ` Tetsuo Handa 2016-12-18 16:06 ` Michal Hocko 2016-12-18 16:06 ` Michal Hocko 2016-12-16 13:14 ` Vegard Nossum 2016-12-16 13:14 ` Vegard Nossum 2016-12-16 14:00 ` Michal Hocko 2016-12-16 14:00 ` Michal Hocko 2016-12-16 14:25 ` Vegard Nossum 2016-12-16 14:25 ` Vegard Nossum 2016-12-16 14:32 ` Michal Hocko 2016-12-16 14:32 ` Michal Hocko 2016-12-16 14:53 ` Vegard Nossum 2016-12-16 14:53 ` Vegard Nossum 2016-12-16 14:04 ` Vegard Nossum 2016-12-16 14:04 ` Vegard Nossum 2016-12-16 9:14 ` [PATCH 1/4] mm: add new mmgrab() helper Michal Hocko 2016-12-16 9:14 ` Michal Hocko 2016-12-16 9:56 ` Peter Zijlstra 2016-12-16 9:56 ` Peter Zijlstra 2016-12-16 10:19 ` Kirill A. Shutemov 2016-12-16 10:19 ` Kirill A. Shutemov 2016-12-16 10:20 ` Vegard Nossum 2016-12-16 10:20 ` Vegard Nossum 2016-12-16 10:36 ` Michal Hocko 2016-12-16 10:36 ` Michal Hocko 2016-12-16 11:14 ` Vegard Nossum 2016-12-16 11:14 ` Vegard Nossum
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20161216090157.GA13940@dhcp22.suse.cz \ --to=mhocko@kernel.org \ --cc=akpm@linux-foundation.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mawilcox@microsoft.com \ --cc=mingo@kernel.org \ --cc=peterz@infradead.org \ --cc=riel@redhat.com \ --cc=torvalds@linux-foundation.org \ --cc=vegard.nossum@oracle.com \ --cc=viro@zeniv.linux.org.uk \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.