From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Borislav Petkov <bp@alien8.de>,
Johannes Weiner <hannes@cmpxchg.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan.kim@gmail.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Nick Piggin <npiggin@suse.de>,
Andrea Arcangeli <aarcange@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
sgunderson@bigfoot.com
Subject: Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA
Date: Mon, 12 Apr 2010 20:40:38 +0200 [thread overview]
Message-ID: <1271097638.4807.129.camel@twins> (raw)
In-Reply-To: <alpine.LFD.2.00.1004120941310.26679@i5.linux-foundation.org>
On Mon, 2010-04-12 at 09:46 -0700, Linus Torvalds wrote:
>
> On Mon, 12 Apr 2010, Rik van Riel wrote:
>
> > On 04/12/2010 12:01 PM, Peter Zijlstra wrote:
> >
> > > @@ -864,15 +889,8 @@ void page_remove_rmap(struct page *page)
> > > __dec_zone_page_state(page, NR_FILE_MAPPED);
> > > mem_cgroup_update_file_mapped(page, -1);
> > > }
> > > - /*
> > > - * It would be tidy to reset the PageAnon mapping here,
> > > - * but that might overwrite a racing page_add_anon_rmap
> > > - * which increments mapcount after us but sets mapping
> > > - * before us: so leave the reset to free_hot_cold_page,
> > > - * and remember that it's only reliable while mapped.
> > > - * Leaving it set also helps swapoff to reinstate ptes
> > > - * faster for those pages still in swapcache.
> > > - */
> > > +
> > > + page->mapping = NULL;
> > > }
> >
> > That would be a bug for file pages :)
> >
> > I could see how it could work for anonymous memory, though.
>
> I think it's scary for anonymous pages too. The _common_ case of
> page_remove_rmap() is from unmap/exit, which holds no locks on the page
> what-so-ever. So assuming the page could be reachable some other way (swap
> cache etc), I think the above is pretty scary.
Fully agreed.
> Also do note that the bug we've been chasing has _always_ had that test
> for "page_mapped(page)". See my other email about why the unmapped case
> isn't even interesting, because it's so easy to see how page->mapping can
> be stale for unmapped pages.
>
> It's the _mapped_ case that is interesting, not the unmapped one. So
> setting page->mapping to NULL when unmapping is perhaps a nice consistency
> issue ("never have stale pointers"), but it's missing the fact that it's
> not really the case we care about.
Yes, I don't think this is the problem that has been plaguing us for
over a week now.
But while staring at that code it did get me worried that the current
code (page_lock_anon_vma):
- is missing the smp_read_barrier_depends() after the ACCESS_ONCE
- isn't properly ordered wrt page->mapping and page->_mapcount.
- doesn't appear to guarantee much at all when returning an anon_vma
since it locks after checking page->_mapcount so:
* it can return !NULL for an unmapped page (your patch cures that)
* it can return !NULL but for a different anon_vma
(my earlier patch checking page_rmapping() after the spin_lock
cures that, but doesn't cure the above):
[ highly unlikely but not impossible race ]
page_referenced(page_A)
try_to_unmap(page_A)
unrelated fault
fault page_A
CPU0 CPU1 CPU2 CPU3
rcu_read_lock()
anon_vma = page->mapping;
if (!anon_vma & ANON_BIT)
goto out
if (!page_mapped(page))
goto out
page_remove_rmap()
...
anon_vma_free()-----\
v
anon_vma_alloc()
anon_vma_alloc()
page_add_anon_rmap()
^
spin_lock(anon_vma->lock)----------/
Now I don't think the above can happen due to how our slab
allocators work, they won't share a slab page between cpus like
that, but once we make the whole thing preemptible this race
becomes a lot more likely.
So a page_lock_anon_vma(), that looks a little like the below should
(I think) cure all our problems with it.
struct anon_vma *page_lock_anon_vma(struct page *page)
{
struct anon_vma *anon_vma;
unsigned long anon_mapping;
rcu_read_lock();
again:
anon_mapping = (unsigned long)rcu_dereference(page->mapping);
if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
goto out;
anon_vma = (struct anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
/*
* The RCU read lock ensures we can safely dereference anon_vma
* since it ensures the backing slab won't go away. It will however
* not guarantee it's the right object.
*
* First take the anon_vma->lock, this will, per anon_vma_unlink()
* avoid this anon_vma from being freed if it is a valid object.
*/
spin_lock(&anon_vma->lock);
/*
* Secondly, we have to re-read page->mapping, so ensure it
* has not changed, rely on spin_lock() being at least a
* compiler barrier to force the re-read.
*/
if (unlikely(page_rmapping(page) != anon_vma)) {
spin_unlock(&anon_vma->lock);
goto again;
}
/*
* Ensure we read page->mapping before page->_mapcount,
* orders against atomic_add_negative() in page_remove_rmap().
*/
smp_rmb();
/*
* Finally check that the page is still mapped,
* if not, this can't possibly be the right anon_vma.
*/
if (!page_mapped(page))
goto unlock;
return anon_vma;
unlock:
spin_unlock(&anon_vma->lock);
out:
rcu_read_unlock();
return NULL;
}
With this, I think we can actually drop the RCU read lock when returning
since if this is indeed a valid anon_vma for this page, then the page is
still mapped, and hence the anon_vma was not deleted, and a possible
future delete will be held back by us holding the anon_vma->lock.
Now I could be totally wrong and have confused myself throroughly, but
how does this look?
next prev parent reply other threads:[~2010-04-12 18:42 UTC|newest]
Thread overview: 231+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-30 17:50 Linux 2.6.34-rc3 Linus Torvalds
2010-03-30 21:16 ` [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on RS780 (was: Re: Linux 2.6.34-rc3) Rafael J. Wysocki
2010-03-31 20:34 ` [stable] " Greg KH
2010-04-01 1:13 ` Rafael J. Wysocki
2010-04-01 2:19 ` Alex Deucher
2010-04-01 6:36 ` Clemens Ladisch
2010-04-01 15:01 ` Alex Deucher
2010-04-01 20:28 ` Rafael J. Wysocki
2010-04-01 20:39 ` Alex Deucher
2010-04-01 20:48 ` Rafael J. Wysocki
2010-04-01 21:00 ` Alex Deucher
2010-04-01 21:01 ` Alex Deucher
2010-04-01 21:08 ` Rafael J. Wysocki
2010-04-01 21:13 ` Alex Deucher
2010-04-01 21:46 ` Rafael J. Wysocki
2010-04-01 22:07 ` Alex Deucher
2010-04-01 23:20 ` Rafael J. Wysocki
2010-04-02 0:23 ` Linus Torvalds
2010-04-02 16:46 ` Rafael J. Wysocki
2010-04-03 18:08 ` Clemens Ladisch
2010-04-03 19:33 ` Rafael J. Wysocki
2010-04-01 16:29 ` Linus Torvalds
2010-04-01 17:07 ` Alex Deucher
2010-04-01 17:24 ` Linus Torvalds
2010-04-01 17:50 ` [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on RS780 Clemens Ladisch
2010-04-01 17:53 ` [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on RS780 (was: Re: Linux 2.6.34-rc3) Alex Deucher
2010-04-01 20:17 ` Linus Torvalds
2010-04-01 20:23 ` Alex Deucher
2010-04-01 19:46 ` Rafael J. Wysocki
2010-04-01 22:48 ` Jesse Barnes
2010-04-01 23:23 ` Rafael J. Wysocki
2010-04-02 17:59 ` Ugly rmap NULL ptr deref oopsie on hibernate (was " Borislav Petkov
2010-04-02 18:09 ` Linus Torvalds
2010-04-02 15:24 ` Andrew Morton
2010-04-02 18:37 ` Linus Torvalds
2010-04-02 22:01 ` Rik van Riel
2010-04-03 0:19 ` Linus Torvalds
2010-04-04 16:12 ` Minchan Kim
2010-04-04 17:24 ` Rik van Riel
2010-04-04 23:09 ` [PATCH] rmap: fix anon_vma_fork() memory leak Rik van Riel
2010-04-04 23:56 ` Minchan Kim
2010-04-05 15:37 ` Linus Torvalds
2010-04-05 15:48 ` Minchan Kim
2010-04-05 16:04 ` Rik van Riel
2010-04-05 16:13 ` [PATCH -v2] " Rik van Riel
2010-04-06 8:53 ` Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3) KOSAKI Motohiro
2010-04-06 10:09 ` KOSAKI Motohiro
2010-04-06 14:34 ` Rik van Riel
2010-04-06 14:38 ` Rik van Riel
2010-04-06 15:34 ` Minchan Kim
2010-04-06 15:40 ` Rik van Riel
2010-04-06 15:58 ` Minchan Kim
2010-04-06 15:55 ` Linus Torvalds
2010-04-06 16:23 ` Minchan Kim
2010-04-06 16:28 ` Linus Torvalds
2010-04-06 16:45 ` Minchan Kim
2010-04-06 16:53 ` Linus Torvalds
2010-04-06 17:04 ` Rik van Riel
2010-04-06 18:28 ` Linus Torvalds
2010-04-06 19:03 ` Andrew Morton
2010-04-06 19:10 ` Steinar H. Gunderson
2010-04-06 19:10 ` Linus Torvalds
2010-04-06 19:35 ` Linus Torvalds
2010-04-06 19:42 ` Borislav Petkov
2010-04-06 20:02 ` Linus Torvalds
2010-04-06 20:46 ` Steinar H. Gunderson
2010-04-06 20:56 ` Linus Torvalds
2010-04-06 21:05 ` Steinar H. Gunderson
2010-04-06 20:51 ` Borislav Petkov
2010-04-06 21:27 ` Linus Torvalds
2010-04-06 22:59 ` Borislav Petkov
2010-04-06 23:27 ` Linus Torvalds
2010-04-06 23:54 ` [PATCH] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Rik van Riel
2010-04-07 7:00 ` KOSAKI Motohiro
2010-04-07 14:48 ` Rik van Riel
2010-04-07 14:54 ` [PATCH -v2] " Rik van Riel
2010-04-07 15:30 ` Linus Torvalds
2010-04-07 15:52 ` Rik van Riel
2010-04-07 16:56 ` Linus Torvalds
2010-04-07 21:19 ` Linus Torvalds
2010-04-07 21:52 ` Rik van Riel
2010-04-07 22:09 ` Linus Torvalds
2010-04-07 22:15 ` Linus Torvalds
2010-04-08 0:38 ` Rik van Riel
2010-04-07 23:37 ` Linus Torvalds
2010-04-08 2:03 ` KOSAKI Motohiro
2010-04-08 2:33 ` Linus Torvalds
2010-04-08 5:47 ` Borislav Petkov
2010-04-08 14:11 ` Linus Torvalds
2010-04-08 18:25 ` Rik van Riel
2010-04-08 18:32 ` Linus Torvalds
2010-04-08 20:31 ` Borislav Petkov
2010-04-08 21:00 ` Borislav Petkov
2010-04-08 23:16 ` Linus Torvalds
2010-04-08 23:47 ` Borislav Petkov
2010-04-09 0:50 ` Linus Torvalds
2010-04-09 1:30 ` Borislav Petkov
2010-04-09 9:21 ` Borislav Petkov
2010-04-09 16:35 ` Linus Torvalds
2010-04-09 17:40 ` Borislav Petkov
2010-04-09 17:50 ` Linus Torvalds
2010-04-09 19:14 ` Borislav Petkov
2010-04-09 19:32 ` Linus Torvalds
2010-04-09 20:03 ` Rik van Riel
2010-04-09 20:43 ` Johannes Weiner
2010-04-09 20:57 ` Rik van Riel
2010-04-09 21:33 ` Borislav Petkov
2010-04-09 23:22 ` Linus Torvalds
2010-04-09 23:45 ` Rik van Riel
2010-04-10 0:03 ` Linus Torvalds
2010-04-10 0:11 ` Rik van Riel
2010-04-09 23:54 ` Johannes Weiner
2010-04-09 23:56 ` Linus Torvalds
2010-04-10 0:19 ` Rik van Riel
2010-04-10 0:31 ` Johannes Weiner
2010-04-10 0:32 ` Linus Torvalds
2010-04-10 7:27 ` Borislav Petkov
2010-04-10 11:26 ` Borislav Petkov
2010-04-10 14:45 ` Rik van Riel
2010-04-10 15:24 ` Linus Torvalds
2010-04-10 16:38 ` Borislav Petkov
2010-04-10 17:05 ` Linus Torvalds
2010-04-10 18:21 ` Linus Torvalds
2010-04-10 18:26 ` Linus Torvalds
2010-04-10 18:51 ` Borislav Petkov
2010-04-10 18:58 ` Borislav Petkov
2010-04-10 20:05 ` Linus Torvalds
2010-04-10 20:12 ` Linus Torvalds
2010-04-10 20:36 ` Borislav Petkov
2010-04-10 20:40 ` Linus Torvalds
2010-04-10 21:25 ` Borislav Petkov
2010-04-10 21:30 ` Linus Torvalds
2010-04-10 21:51 ` Borislav Petkov
2010-04-11 13:08 ` Borislav Petkov
2010-04-11 13:19 ` [PATCH 1/3] mm: make page freeing path RCU-safe Borislav Petkov
2010-04-11 13:19 ` [PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity Borislav Petkov
2010-04-11 13:19 ` [PATCH 3/3] mm: fixup vma_adjust Borislav Petkov
2010-04-11 13:25 ` [PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity Borislav Petkov
2010-04-11 17:07 ` [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Linus Torvalds
2010-04-11 17:16 ` Linus Torvalds
2010-04-11 18:55 ` Borislav Petkov
2010-04-12 0:13 ` Linus Torvalds
2010-04-12 1:04 ` Linus Torvalds
2010-04-12 7:20 ` Borislav Petkov
2010-04-12 16:02 ` Linus Torvalds
2010-04-12 16:26 ` Linus Torvalds
2010-04-12 18:40 ` Rik van Riel
2010-04-12 19:00 ` Borislav Petkov
2010-04-12 19:17 ` Linus Torvalds
2010-04-12 20:22 ` [PATCH 1/4] Simplify and comment on anon_vma re-use for anon_vma_prepare() Linus Torvalds
2010-04-12 20:23 ` [PATCH 2/4] vma_adjust: fix the copying of anon_vma chains Linus Torvalds
2010-04-12 20:23 ` [PATCH 3/4] anon_vma: clone the anon_vma chain in the right order Linus Torvalds
2010-04-12 20:23 ` [PATCH 4/4] anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma Linus Torvalds
2010-04-12 21:03 ` Rik van Riel
2010-04-13 0:41 ` Johannes Weiner
2010-04-13 1:08 ` Linus Torvalds
2010-04-13 4:23 ` Minchan Kim
2010-04-13 4:26 ` Minchan Kim
2010-04-12 20:57 ` [PATCH 3/4] anon_vma: clone the anon_vma chain in the right order Rik van Riel
2010-04-13 0:18 ` Johannes Weiner
2010-04-13 4:16 ` Minchan Kim
2010-04-12 20:54 ` [PATCH 2/4] vma_adjust: fix the copying of anon_vma chains Rik van Riel
2010-04-12 23:59 ` Johannes Weiner
2010-04-13 4:15 ` Minchan Kim
2010-04-12 20:54 ` [PATCH 1/4] Simplify and comment on anon_vma re-use for anon_vma_prepare() Rik van Riel
2010-04-12 23:54 ` Johannes Weiner
2010-04-13 4:04 ` Minchan Kim
2010-04-13 9:51 ` Peter Zijlstra
2010-04-12 21:50 ` [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Borislav Petkov
2010-04-12 22:11 ` Linus Torvalds
2010-04-12 22:18 ` Linus Torvalds
2010-04-12 22:29 ` Borislav Petkov
2010-04-13 9:38 ` Borislav Petkov
2010-04-14 21:59 ` [PATCH] rmap: add exclusively owned pages to the newest anon_vma Rik van Riel
2010-04-14 23:20 ` Johannes Weiner
2010-04-15 8:34 ` Borislav Petkov
2010-04-15 16:02 ` Minchan Kim
2010-04-15 20:01 ` Linus Torvalds
2010-04-16 6:09 ` Felipe Balbi
2010-04-16 14:48 ` Linus Torvalds
2010-04-11 19:49 ` [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Rik van Riel
2010-04-12 15:44 ` Linus Torvalds
2010-04-12 15:51 ` Rik van Riel
2010-04-11 21:45 ` Rik van Riel
2010-04-12 15:51 ` Linus Torvalds
2010-04-13 10:36 ` KOSAKI Motohiro
2010-04-10 20:24 ` Rik van Riel
2010-04-10 20:34 ` Linus Torvalds
2010-04-10 20:43 ` Rik van Riel
2010-04-10 20:32 ` Rik van Riel
2010-04-10 19:36 ` Rik van Riel
2010-04-12 14:40 ` Peter Zijlstra
2010-04-12 15:17 ` Minchan Kim
2010-04-12 15:33 ` Peter Zijlstra
2010-04-12 15:19 ` Rik van Riel
2010-04-12 16:01 ` Peter Zijlstra
2010-04-12 16:06 ` Rik van Riel
2010-04-12 16:46 ` Linus Torvalds
2010-04-12 18:40 ` Peter Zijlstra [this message]
2010-04-12 19:30 ` Peter Zijlstra
2010-04-12 19:44 ` Peter Zijlstra
2010-04-13 10:53 ` KOSAKI Motohiro
2010-04-13 11:30 ` Peter Zijlstra
2010-04-13 12:00 ` KOSAKI Motohiro
2010-04-14 14:27 ` Peter Zijlstra
2010-04-10 17:07 ` Borislav Petkov
2010-04-10 16:41 ` Linus Torvalds
2010-04-10 22:49 ` Johannes Weiner
2010-04-10 23:31 ` Linus Torvalds
2010-04-09 1:45 ` KOSAKI Motohiro
2010-04-07 15:55 ` Minchan Kim
2010-04-07 7:29 ` Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3) Borislav Petkov
2010-04-07 14:05 ` Paulo Marques
2010-04-07 14:13 ` Borislav Petkov
2010-04-06 23:37 ` Linus Torvalds
2010-04-06 23:22 ` Rik van Riel
2010-04-07 0:10 ` Linus Torvalds
2010-04-07 1:18 ` Rik van Riel
2010-04-07 7:22 ` Borislav Petkov
2010-04-07 10:09 ` Pekka Enberg
2010-04-07 10:12 ` KOSAKI Motohiro
2010-04-07 8:41 ` Peter Zijlstra
2010-04-07 8:36 ` Peter Zijlstra
2010-04-07 9:16 ` Johannes Weiner
2010-04-07 9:37 ` Peter Zijlstra
2010-04-07 14:12 ` Rik van Riel
2010-04-07 15:46 ` Linus Torvalds
2010-04-06 16:32 ` Linus Torvalds
2010-04-06 16:54 ` Minchan Kim
2010-04-07 8:37 ` Peter Zijlstra
2010-04-06 17:05 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1271097638.4807.129.camel@twins \
--to=peterz@infradead.org \
--cc=Lee.Schermerhorn@hp.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=hannes@cmpxchg.org \
--cc=hugh.dickins@tiscali.co.uk \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=minchan.kim@gmail.com \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
--cc=sgunderson@bigfoot.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).