All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging
Date: Fri, 16 Dec 2016 10:01:57 +0100	[thread overview]
Message-ID: <20161216090157.GA13940@dhcp22.suse.cz> (raw)
In-Reply-To: <20161216082202.21044-4-vegard.nossum@oracle.com>

On Fri 16-12-16 09:22:02, Vegard Nossum wrote:
> Reference counting bugs are hard to debug by their nature since the actual
> manifestation of one can occur very far from where the error is introduced
> (e.g. a missing get() only manifest as a use-after-free when the reference
> count prematurely drops to 0, which could be arbitrarily long after where
> the get() should have happened if there are other users). I wrote this patch
> to try to track down a suspected 'mm_struct' reference counting bug.

I definitely agree that hunting these bugs is a royal PITA, no question
about that. I am just wondering whether this has been motivated by any
particular bug recently. I do not seem to remember any such an issue for
quite some time.

> The basic idea is to keep track of all references, not just with a reference
> counter, but with an actual reference _list_. Whenever you get() or put() a
> reference, you also add or remove yourself, respectively, from the reference
> list. This really helps debugging because (for example) you always put a
> specific reference, meaning that if that reference was not yours to put, you
> will notice it immediately (rather than when the reference counter goes to 0
> and you still have an active reference).

But who is the owner of the reference? A function/task? It is not all
that uncommon to take an mm reference from one context and release it
from a different one. But I might be missing your point here.

> The main interface is in <linux/mm_ref_types.h> and <linux/mm_ref.h>, while
> the implementation lives in mm/mm_ref.c. Since 'struct mm_struct' has both
> ->mm_users and ->mm_count, we introduce helpers for both of them, but use
> the same data structure for each (struct mm_ref). The low-level rules (i.e.
> the ones we have to follow, but which nobody else should really have to
> care about since they use the higher-level interface) are:
> 
>  - after incrementing ->mm_count you also have to call get_mm_ref()
> 
>  - before decrementing ->mm_count you also have to call put_mm_ref()
> 
>  - after incrementing ->mm_users you also have to call get_mm_users_ref()
> 
>  - before decrementing ->mm_users you also have to call put_mm_users_ref()
> 
> The rules that most of the rest of the kernel will care about are:
> 
>  - functions that acquire and return a mm_struct should take a
>    'struct mm_ref *' which it can pass on to mmget()/mmgrab()/etc.
> 
>  - functions that release an mm_struct passed as a parameter should also
>    take a 'struct mm_ref *' which it can pass on to mmput()/mmdrop()/etc.
> 
>  - any function that temporarily acquires a mm_struct reference should
>    use MM_REF() to define an on-stack reference and pass it on to
>    mmget()/mmput()/mmgrab()/mmdrop()/etc.
> 
>  - any structure that holds an mm_struct pointer must also include a
>    'struct mm_ref' member; when the mm_struct pointer is modified you
>    would typically also call mmget()/mmgrab()/mmput()/mmdrop() and they
>    should be called with this mm_ref
> 
>  - you can convert (for example) an on-stack reference to an in-struct
>    reference using move_mm_ref(). This is semantically equivalent to
>    (atomically) taking the new reference and dropping the old one, but
>    doesn't actually need to modify the reference count

This all sounds way too intrusive to me so I am not really sure this is
something we really want. A nice thing for debugging for sure but I am
somehow skeptical whether it is really worth it considering how many
those ref. count bugs we've had.

[...]
-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging
Date: Fri, 16 Dec 2016 10:01:57 +0100	[thread overview]
Message-ID: <20161216090157.GA13940@dhcp22.suse.cz> (raw)
In-Reply-To: <20161216082202.21044-4-vegard.nossum@oracle.com>

On Fri 16-12-16 09:22:02, Vegard Nossum wrote:
> Reference counting bugs are hard to debug by their nature since the actual
> manifestation of one can occur very far from where the error is introduced
> (e.g. a missing get() only manifest as a use-after-free when the reference
> count prematurely drops to 0, which could be arbitrarily long after where
> the get() should have happened if there are other users). I wrote this patch
> to try to track down a suspected 'mm_struct' reference counting bug.

I definitely agree that hunting these bugs is a royal PITA, no question
about that. I am just wondering whether this has been motivated by any
particular bug recently. I do not seem to remember any such an issue for
quite some time.

> The basic idea is to keep track of all references, not just with a reference
> counter, but with an actual reference _list_. Whenever you get() or put() a
> reference, you also add or remove yourself, respectively, from the reference
> list. This really helps debugging because (for example) you always put a
> specific reference, meaning that if that reference was not yours to put, you
> will notice it immediately (rather than when the reference counter goes to 0
> and you still have an active reference).

But who is the owner of the reference? A function/task? It is not all
that uncommon to take an mm reference from one context and release it
from a different one. But I might be missing your point here.

> The main interface is in <linux/mm_ref_types.h> and <linux/mm_ref.h>, while
> the implementation lives in mm/mm_ref.c. Since 'struct mm_struct' has both
> ->mm_users and ->mm_count, we introduce helpers for both of them, but use
> the same data structure for each (struct mm_ref). The low-level rules (i.e.
> the ones we have to follow, but which nobody else should really have to
> care about since they use the higher-level interface) are:
> 
>  - after incrementing ->mm_count you also have to call get_mm_ref()
> 
>  - before decrementing ->mm_count you also have to call put_mm_ref()
> 
>  - after incrementing ->mm_users you also have to call get_mm_users_ref()
> 
>  - before decrementing ->mm_users you also have to call put_mm_users_ref()
> 
> The rules that most of the rest of the kernel will care about are:
> 
>  - functions that acquire and return a mm_struct should take a
>    'struct mm_ref *' which it can pass on to mmget()/mmgrab()/etc.
> 
>  - functions that release an mm_struct passed as a parameter should also
>    take a 'struct mm_ref *' which it can pass on to mmput()/mmdrop()/etc.
> 
>  - any function that temporarily acquires a mm_struct reference should
>    use MM_REF() to define an on-stack reference and pass it on to
>    mmget()/mmput()/mmgrab()/mmdrop()/etc.
> 
>  - any structure that holds an mm_struct pointer must also include a
>    'struct mm_ref' member; when the mm_struct pointer is modified you
>    would typically also call mmget()/mmgrab()/mmput()/mmdrop() and they
>    should be called with this mm_ref
> 
>  - you can convert (for example) an on-stack reference to an in-struct
>    reference using move_mm_ref(). This is semantically equivalent to
>    (atomically) taking the new reference and dropping the old one, but
>    doesn't actually need to modify the reference count

This all sounds way too intrusive to me so I am not really sure this is
something we really want. A nice thing for debugging for sure but I am
somehow skeptical whether it is really worth it considering how many
those ref. count bugs we've had.

[...]
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-12-16  9:02 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16  8:21 [PATCH 1/4] mm: add new mmgrab() helper Vegard Nossum
2016-12-16  8:21 ` Vegard Nossum
2016-12-16  8:22 ` [PATCH 2/4] mm: add new mmget() helper Vegard Nossum
2016-12-16  8:22   ` Vegard Nossum
2016-12-16  9:26   ` Michal Hocko
2016-12-16  9:26     ` Michal Hocko
2016-12-16  8:22 ` [PATCH 3/4] mm: use mmget_not_zero() helper Vegard Nossum
2016-12-16  8:22   ` Vegard Nossum
2016-12-16  9:27   ` Michal Hocko
2016-12-16  9:27     ` Michal Hocko
2016-12-16  8:22 ` [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging Vegard Nossum
2016-12-16  8:22   ` Vegard Nossum
2016-12-16  9:01   ` Michal Hocko [this message]
2016-12-16  9:01     ` Michal Hocko
2016-12-16  9:43     ` Vegard Nossum
2016-12-16  9:43       ` Vegard Nossum
2016-12-16 10:11       ` crash during oom reaper (was: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging) Michal Hocko
2016-12-16 10:11         ` Michal Hocko
2016-12-16 10:44         ` Kirill A. Shutemov
2016-12-16 10:44           ` Kirill A. Shutemov
2016-12-16 11:42           ` crash during oom reaper Michal Hocko
2016-12-16 11:42             ` Michal Hocko
2016-12-16 12:12             ` Michal Hocko
2016-12-16 12:12               ` Michal Hocko
2016-12-16 12:35             ` Kirill A. Shutemov
2016-12-16 12:35               ` Kirill A. Shutemov
2016-12-16 12:56               ` Michal Hocko
2016-12-16 12:56                 ` Michal Hocko
2016-12-16 13:07                 ` Kirill A. Shutemov
2016-12-16 13:07                   ` Kirill A. Shutemov
2016-12-16 13:14                   ` Michal Hocko
2016-12-16 13:14                     ` Michal Hocko
2016-12-18 13:47                     ` Tetsuo Handa
2016-12-18 13:47                       ` Tetsuo Handa
2016-12-18 16:06                       ` Michal Hocko
2016-12-18 16:06                         ` Michal Hocko
2016-12-16 13:14         ` Vegard Nossum
2016-12-16 13:14           ` Vegard Nossum
2016-12-16 14:00           ` Michal Hocko
2016-12-16 14:00             ` Michal Hocko
2016-12-16 14:25             ` Vegard Nossum
2016-12-16 14:25               ` Vegard Nossum
2016-12-16 14:32               ` Michal Hocko
2016-12-16 14:32                 ` Michal Hocko
2016-12-16 14:53                 ` Vegard Nossum
2016-12-16 14:53                   ` Vegard Nossum
2016-12-16 14:04           ` Vegard Nossum
2016-12-16 14:04             ` Vegard Nossum
2016-12-16  9:14 ` [PATCH 1/4] mm: add new mmgrab() helper Michal Hocko
2016-12-16  9:14   ` Michal Hocko
2016-12-16  9:56 ` Peter Zijlstra
2016-12-16  9:56   ` Peter Zijlstra
2016-12-16 10:19   ` Kirill A. Shutemov
2016-12-16 10:19     ` Kirill A. Shutemov
2016-12-16 10:20     ` Vegard Nossum
2016-12-16 10:20       ` Vegard Nossum
2016-12-16 10:36       ` Michal Hocko
2016-12-16 10:36         ` Michal Hocko
2016-12-16 11:14   ` Vegard Nossum
2016-12-16 11:14     ` Vegard Nossum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161216090157.GA13940@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mawilcox@microsoft.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vegard.nossum@oracle.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.