linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: prevent mmap_cache race in find_vma()
@ 2013-04-04 18:35 Hugh Dickins
  2013-04-04 18:48 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2013-04-04 18:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jan Stancek, Jakub Jelinek, David Rientjes,
	Johannes Weiner, Paul E. McKenney, Ian Lance Taylor, linux-mm,
	linux-kernel

From: Jan Stancek <jstancek@redhat.com>

find_vma() can be called by multiple threads with read lock
held on mm->mmap_sem and any of them can update mm->mmap_cache.
Prevent compiler from re-fetching mm->mmap_cache, because other
readers could update it in the meantime:

               thread 1                             thread 2
                                        |
  find_vma()                            |  find_vma()
    struct vm_area_struct *vma = NULL;  |
    vma = mm->mmap_cache;               |
    if (!(vma && vma->vm_end > addr     |
        && vma->vm_start <= addr)) {    |
                                        |    mm->mmap_cache = vma;
    return vma;                         |
     ^^ compiler may optimize this      |
        local variable out and re-read  |
        mm->mmap_cache                  |

This issue can be reproduced with gcc-4.8.0-1 on s390x by running
mallocstress testcase from LTP, which triggers:
  kernel BUG at mm/rmap.c:1088!
    Call Trace:
     ([<000003d100c57000>] 0x3d100c57000)
      [<000000000023a1c0>] do_wp_page+0x2fc/0xa88
      [<000000000023baae>] handle_pte_fault+0x41a/0xac8
      [<000000000023d832>] handle_mm_fault+0x17a/0x268
      [<000000000060507a>] do_protection_exception+0x1e2/0x394
      [<0000000000603a04>] pgm_check_handler+0x138/0x13c
      [<000003fffcf1f07a>] 0x3fffcf1f07a
    Last Breaking-Event-Address:
      [<000000000024755e>] page_add_new_anon_rmap+0xc2/0x168

Thanks to Jakub Jelinek for his insight on gcc and helping to
track this down.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---
This is Jan's valuable patch, posted a couple of days ago, which then
triggered "some discussion" of ACCESS_ONCE() on linux-mm.  I've marked
it for stable: should go back years, but 3.5 changed the indentation.

 mm/mmap.c  |    2 +-
 mm/nommu.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 6466699..0db0de1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1940,7 +1940,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 
 	/* Check the cache first. */
 	/* (Cache hit rate is typically around 35%.) */
-	vma = mm->mmap_cache;
+	vma = ACCESS_ONCE(mm->mmap_cache);
 	if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
 		struct rb_node *rb_node;
 
diff --git a/mm/nommu.c b/mm/nommu.c
index e193280..2f3ea74 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -821,7 +821,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 	struct vm_area_struct *vma;
 
 	/* check the cache first */
-	vma = mm->mmap_cache;
+	vma = ACCESS_ONCE(mm->mmap_cache);
 	if (vma && vma->vm_start <= addr && vma->vm_end > addr)
 		return vma;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: prevent mmap_cache race in find_vma()
  2013-04-04 18:35 [PATCH] mm: prevent mmap_cache race in find_vma() Hugh Dickins
@ 2013-04-04 18:48 ` Linus Torvalds
  2013-04-04 19:01   ` Hugh Dickins
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2013-04-04 18:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Jan Stancek, Jakub Jelinek, David Rientjes,
	Johannes Weiner, Paul E. McKenney, Ian Lance Taylor, linux-mm,
	Linux Kernel Mailing List

On Thu, Apr 4, 2013 at 11:35 AM, Hugh Dickins <hughd@google.com> wrote:
>
> find_vma() can be called by multiple threads with read lock
> held on mm->mmap_sem and any of them can update mm->mmap_cache.
> Prevent compiler from re-fetching mm->mmap_cache, because other
> readers could update it in the meantime:

Ack. I do wonder if we should mark the unlocked update too some way
(also in find_vma()), although it's probably not a problem in practice
since there's no way the compiler can reasonably really do anything
odd with it. We *could* make that an ACCESS_ONCE() write too just to
highlight the fact that it's an unlocked write to this optimistic data
structure.

Anyway, applied.

                  Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: prevent mmap_cache race in find_vma()
  2013-04-04 18:48 ` Linus Torvalds
@ 2013-04-04 19:01   ` Hugh Dickins
  2013-04-04 19:10     ` Linus Torvalds
  2013-04-04 22:30     ` Paul E. McKenney
  0 siblings, 2 replies; 5+ messages in thread
From: Hugh Dickins @ 2013-04-04 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jan Stancek, Jakub Jelinek, David Rientjes,
	Johannes Weiner, Paul E. McKenney, Ian Lance Taylor, linux-mm,
	Linux Kernel Mailing List

On Thu, 4 Apr 2013, Linus Torvalds wrote:
> On Thu, Apr 4, 2013 at 11:35 AM, Hugh Dickins <hughd@google.com> wrote:
> >
> > find_vma() can be called by multiple threads with read lock
> > held on mm->mmap_sem and any of them can update mm->mmap_cache.
> > Prevent compiler from re-fetching mm->mmap_cache, because other
> > readers could update it in the meantime:
> 
> Ack. I do wonder if we should mark the unlocked update too some way
> (also in find_vma()), although it's probably not a problem in practice
> since there's no way the compiler can reasonably really do anything
> odd with it. We *could* make that an ACCESS_ONCE() write too just to
> highlight the fact that it's an unlocked write to this optimistic data
> structure.

Hah, you beat me to it.

I wanted to get Jan's patch in first, seeing as it actually fixes his
observed issue; and it is very nice to have such a good description of
one of those, when ACCESS_ONCE() is usually just an insurance policy.

But then I was researching the much rarer "ACCESS_ONCE(x) = y" usage
(popular in drivers/net/wireless/ath/ath9k and kernel/rcutree* and
sound/firewire, but few places else).

When Paul reminded us of it yesterday, I came to wonder if actually
every use of ACCESS_ONCE in the read form should strictly be matched
by ACCESS_ONCE whenever modifying the location.

My uneducated guess is that strictly it ought to, in the sense of
insurance policy; but that (apart from that strange split writing
issue which came up a couple of months ago) in practice our compilers
have not "advanced" to the point of making this an issue yet.

> 
> Anyway, applied.

Thanks,
Hugh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: prevent mmap_cache race in find_vma()
  2013-04-04 19:01   ` Hugh Dickins
@ 2013-04-04 19:10     ` Linus Torvalds
  2013-04-04 22:30     ` Paul E. McKenney
  1 sibling, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2013-04-04 19:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Jan Stancek, Jakub Jelinek, David Rientjes,
	Johannes Weiner, Paul E. McKenney, Ian Lance Taylor, linux-mm,
	Linux Kernel Mailing List

On Thu, Apr 4, 2013 at 12:01 PM, Hugh Dickins <hughd@google.com> wrote:
>
> When Paul reminded us of it yesterday, I came to wonder if actually
> every use of ACCESS_ONCE in the read form should strictly be matched
> by ACCESS_ONCE whenever modifying the location.
>
> My uneducated guess is that strictly it ought to, in the sense of
> insurance policy; but that (apart from that strange split writing
> issue which came up a couple of months ago) in practice our compilers
> have not "advanced" to the point of making this an issue yet.

I don't see how a compiler could reasonably really ever do anything
different, but I do think the ACCESS_ONCE() modification version might
be a good thing just as a "documentation".

This is a good example of this issue, exactly because we have a mix of
both speculative cases (the find_vma() lookup and modification)
together with strictly exclusive locked accesses to the same field
(the ones that invalidate the cache under the write lock). So
documenting that the write in find_vma() is this kind of "optimistic
unlocked access" is actually a potentially interesting piece of
information for programmers, completely independently of whether the
compiler will then treat it really differently or not.

Of course, a plain comment would do the same, but would be less greppable.

And despite the verbiage here, I don't really have a very strong
opinion on this. I'm going to let it go, and if somebody sends me a
patch with a good explanation in the next merge window, I'll probably
apply it.

                    Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: prevent mmap_cache race in find_vma()
  2013-04-04 19:01   ` Hugh Dickins
  2013-04-04 19:10     ` Linus Torvalds
@ 2013-04-04 22:30     ` Paul E. McKenney
  1 sibling, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2013-04-04 22:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Jan Stancek, Jakub Jelinek,
	David Rientjes, Johannes Weiner, Ian Lance Taylor, linux-mm,
	Linux Kernel Mailing List

On Thu, Apr 04, 2013 at 12:01:52PM -0700, Hugh Dickins wrote:
> On Thu, 4 Apr 2013, Linus Torvalds wrote:
> > On Thu, Apr 4, 2013 at 11:35 AM, Hugh Dickins <hughd@google.com> wrote:
> > >
> > > find_vma() can be called by multiple threads with read lock
> > > held on mm->mmap_sem and any of them can update mm->mmap_cache.
> > > Prevent compiler from re-fetching mm->mmap_cache, because other
> > > readers could update it in the meantime:
> > 
> > Ack. I do wonder if we should mark the unlocked update too some way
> > (also in find_vma()), although it's probably not a problem in practice
> > since there's no way the compiler can reasonably really do anything
> > odd with it. We *could* make that an ACCESS_ONCE() write too just to
> > highlight the fact that it's an unlocked write to this optimistic data
> > structure.
> 
> Hah, you beat me to it.
> 
> I wanted to get Jan's patch in first, seeing as it actually fixes his
> observed issue; and it is very nice to have such a good description of
> one of those, when ACCESS_ONCE() is usually just an insurance policy.
> 
> But then I was researching the much rarer "ACCESS_ONCE(x) = y" usage
> (popular in drivers/net/wireless/ath/ath9k and kernel/rcutree* and
> sound/firewire, but few places else).
> 
> When Paul reminded us of it yesterday, I came to wonder if actually
> every use of ACCESS_ONCE in the read form should strictly be matched
> by ACCESS_ONCE whenever modifying the location.

>From a hygiene/insurance/documentation point of view, I agree.  Of course,
it is OK to use things like cmpxchg() in place of ACCESS_ONCE().

The possible exceptions that come to mind are (1) if the access in
question is done holding a lock that excludes all other accesses to that
location, (2) if the access in question happens during initialization
before any other CPU has access to that location, and (3) if the access
in question happens during cleanup after all other CPUs have lost access
to that location.  Any others?

/me goes to look to see if the RCU code follows this good advice...

							Thanx, Paul

> My uneducated guess is that strictly it ought to, in the sense of
> insurance policy; but that (apart from that strange split writing
> issue which came up a couple of months ago) in practice our compilers
> have not "advanced" to the point of making this an issue yet.
> 
> > 
> > Anyway, applied.
> 
> Thanks,
> Hugh
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-04-04 22:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-04 18:35 [PATCH] mm: prevent mmap_cache race in find_vma() Hugh Dickins
2013-04-04 18:48 ` Linus Torvalds
2013-04-04 19:01   ` Hugh Dickins
2013-04-04 19:10     ` Linus Torvalds
2013-04-04 22:30     ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).