From: Linus Torvalds <torvalds@linux-foundation.org>
To: Borislav Petkov <bp@alien8.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan.kim@gmail.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Nick Piggin <npiggin@suse.de>,
Andrea Arcangeli <aarcange@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
sgunderson@bigfoot.com
Subject: Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA
Date: Sat, 10 Apr 2010 09:41:52 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.2.00.1004100930460.3558@i5.linux-foundation.org> (raw)
In-Reply-To: <alpine.LFD.2.00.1004100821340.3558@i5.linux-foundation.org>
On Sat, 10 Apr 2010, Linus Torvalds wrote:
>
> But I think the fact that you are apparently not able to get the list
> corruption is a good sign. Of course, it might just be harder to trigger,
> and these things could all be a sign of a different bug, but my gut feel
> is that we did fix something, and you are just damn good at stressing the
> new code. Kudos.
Btw, I do hate the current 'find_mergeable_anon_vma()' with its duplicated
checks for prev/next compatibility that I just made even more complex.
So I'm actually inclined to want to write my simple two-liner fix as a
rather more complex cleanup patch, below.
It adds way more lines than it deletes, but a lot of it is comments (and
some of it is just because one routine got split up into three), and I
think it makes the result a lot more readable.
It also splits off the decision of whether we can reuse an non_vma from
the decision of whether we can merge the vma's - the two are kind of
related, but they are not really the same, and they have different issues.
I think it's good to try to keep separate issues separate.
This is UNTESTED! It's meant to be an "obvious cleanup" with no real
semantic difference, but if I did something wrong it won't work. Also note
the comment about the lack of locking between two adjacent anon_vma's
taking a page fault at the same time: the ACCESS_ONCE() is unlikely to
ever matter (anon_vma's are stable once they are set, so it's really just
that you could first load a NULL, and then if you re-load the value you
might get a non-NULL thing).
Also note that when checking whether the anon_vma is a singleton, we don't
hold any lock that protects the list we are checking. But
"list_is_singular()" is safe and won't oops even if the pointers in the
list are crap, because it only _compares_ the prev/next pointers, it
doesn't dereference them.
In short, what I'm saying is that there is a pretty subtle race in the
very very unlikely case that two anon_vma's get prepared concurrently, but
from a correctness standpoint it doesn't matter. We might sometimes - once
in a blue moon - reject an anon_vma that could in theory have been merged,
but that won't hurt.
Comments? Rik, Johannes?
Linus
---
mm/mmap.c | 86 ++++++++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 62 insertions(+), 24 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 75557c6..acb023e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -825,6 +825,61 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
}
/*
+ * Rough compatbility check to quickly see if it's even worth looking
+ * at sharing an anon_vma.
+ *
+ * They need to have the same vm_file, and the flags can only differ
+ * in things that mprotect may change.
+ *
+ * NOTE! The fact that we share an anon_vma doesn't _have_ to mean that
+ * we can merge the two vma's. For example, we refuse to merge a vma if
+ * there is a vm_ops->close() function, because that indicates that the
+ * driver is doing some kind of reference counting. But that doesn't
+ * really matter for the anon_vma sharing case.
+ */
+static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b)
+{
+ return a->vm_end == b->vm_start &&
+ mpol_equal(vma_policy(a), vma_policy(b)) &&
+ a->vm_file == b->vm_file &&
+ !((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC)) &&
+ b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
+}
+
+/*
+ * Do some basic sanity checking to see if we can re-use the anon_vma
+ * from 'old'. The 'a'/'b' vma's are in VM order - one of them will be
+ * the same as 'old', the other will be the new one that is trying
+ * to share the anon_vma.
+ *
+ * NOTE! This runs with mm_sem held for reading, so it is possible that
+ * the anon_vma of 'old' is concurrently in the process of being set up
+ * by another page fault trying to merge _that_. But that's ok: if it
+ * is being set up, that automatically means that it will be a singleton
+ * acceptable for merging, so we can do all of this optimistically. But
+ * we do that ACCESS_ONCE() to make sure that we never re-load the pointer.
+ *
+ * IOW: that the "list_is_singular()" test on the anon_vma_chain only
+ * matters for the 'stable anon_vma' case (ie the thing we want to avoid
+ * is to return an anon_vma that is "complex" due to having gone through
+ * a fork).
+ *
+ * We also make sure that the two vma's are compatible (adjacent,
+ * and with the same memory policies). That's all stable, even with just
+ * a read lock on the mm_sem.
+ */
+static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, struct vm_area_struct *a, struct vm_area_struct *b)
+{
+ if (anon_vma_compatible(a, b)) {
+ struct anon_vma *anon_vma = ACCESS_ONCE(old->anon_vma);
+
+ if (anon_vma && list_is_singular(&old->anon_vma_chain))
+ return anon_vma;
+ }
+ return NULL;
+}
+
+/*
* find_mergeable_anon_vma is used by anon_vma_prepare, to check
* neighbouring vmas for a suitable anon_vma, before it goes off
* to allocate a new anon_vma. It checks because a repetitive
@@ -834,28 +889,16 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
*/
struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
{
+ struct anon_vma *anon_vma;
struct vm_area_struct *near;
- unsigned long vm_flags;
near = vma->vm_next;
if (!near)
goto try_prev;
- /*
- * Since only mprotect tries to remerge vmas, match flags
- * which might be mprotected into each other later on.
- * Neither mlock nor madvise tries to remerge at present,
- * so leave their flags as obstructing a merge.
- */
- vm_flags = vma->vm_flags & ~(VM_READ|VM_WRITE|VM_EXEC);
- vm_flags |= near->vm_flags & (VM_READ|VM_WRITE|VM_EXEC);
-
- if (near->anon_vma && vma->vm_end == near->vm_start &&
- mpol_equal(vma_policy(vma), vma_policy(near)) &&
- can_vma_merge_before(near, vm_flags,
- NULL, vma->vm_file, vma->vm_pgoff +
- ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT)))
- return near->anon_vma;
+ anon_vma = reusable_anon_vma(near, vma, near);
+ if (anon_vma)
+ return anon_vma;
try_prev:
/*
* It is potentially slow to have to call find_vma_prev here.
@@ -868,14 +911,9 @@ try_prev:
if (!near)
goto none;
- vm_flags = vma->vm_flags & ~(VM_READ|VM_WRITE|VM_EXEC);
- vm_flags |= near->vm_flags & (VM_READ|VM_WRITE|VM_EXEC);
-
- if (near->anon_vma && near->vm_end == vma->vm_start &&
- mpol_equal(vma_policy(near), vma_policy(vma)) &&
- can_vma_merge_after(near, vm_flags,
- NULL, vma->vm_file, vma->vm_pgoff))
- return near->anon_vma;
+ anon_vma = reusable_anon_vma(near, near, vma);
+ if (anon_vma)
+ return anon_vma;
none:
/*
* There's no absolute need to look only at touching neighbours:
next prev parent reply other threads:[~2010-04-10 16:46 UTC|newest]
Thread overview: 231+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-30 17:50 Linux 2.6.34-rc3 Linus Torvalds
2010-03-30 21:16 ` [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on RS780 (was: Re: Linux 2.6.34-rc3) Rafael J. Wysocki
2010-03-31 20:34 ` [stable] " Greg KH
2010-04-01 1:13 ` Rafael J. Wysocki
2010-04-01 2:19 ` Alex Deucher
2010-04-01 6:36 ` Clemens Ladisch
2010-04-01 15:01 ` Alex Deucher
2010-04-01 20:28 ` Rafael J. Wysocki
2010-04-01 20:39 ` Alex Deucher
2010-04-01 20:48 ` Rafael J. Wysocki
2010-04-01 21:00 ` Alex Deucher
2010-04-01 21:01 ` Alex Deucher
2010-04-01 21:08 ` Rafael J. Wysocki
2010-04-01 21:13 ` Alex Deucher
2010-04-01 21:46 ` Rafael J. Wysocki
2010-04-01 22:07 ` Alex Deucher
2010-04-01 23:20 ` Rafael J. Wysocki
2010-04-02 0:23 ` Linus Torvalds
2010-04-02 16:46 ` Rafael J. Wysocki
2010-04-03 18:08 ` Clemens Ladisch
2010-04-03 19:33 ` Rafael J. Wysocki
2010-04-01 16:29 ` Linus Torvalds
2010-04-01 17:07 ` Alex Deucher
2010-04-01 17:24 ` Linus Torvalds
2010-04-01 17:50 ` [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on RS780 Clemens Ladisch
2010-04-01 17:53 ` [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on RS780 (was: Re: Linux 2.6.34-rc3) Alex Deucher
2010-04-01 20:17 ` Linus Torvalds
2010-04-01 20:23 ` Alex Deucher
2010-04-01 19:46 ` Rafael J. Wysocki
2010-04-01 22:48 ` Jesse Barnes
2010-04-01 23:23 ` Rafael J. Wysocki
2010-04-02 17:59 ` Ugly rmap NULL ptr deref oopsie on hibernate (was " Borislav Petkov
2010-04-02 18:09 ` Linus Torvalds
2010-04-02 15:24 ` Andrew Morton
2010-04-02 18:37 ` Linus Torvalds
2010-04-02 22:01 ` Rik van Riel
2010-04-03 0:19 ` Linus Torvalds
2010-04-04 16:12 ` Minchan Kim
2010-04-04 17:24 ` Rik van Riel
2010-04-04 23:09 ` [PATCH] rmap: fix anon_vma_fork() memory leak Rik van Riel
2010-04-04 23:56 ` Minchan Kim
2010-04-05 15:37 ` Linus Torvalds
2010-04-05 15:48 ` Minchan Kim
2010-04-05 16:04 ` Rik van Riel
2010-04-05 16:13 ` [PATCH -v2] " Rik van Riel
2010-04-06 8:53 ` Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3) KOSAKI Motohiro
2010-04-06 10:09 ` KOSAKI Motohiro
2010-04-06 14:34 ` Rik van Riel
2010-04-06 14:38 ` Rik van Riel
2010-04-06 15:34 ` Minchan Kim
2010-04-06 15:40 ` Rik van Riel
2010-04-06 15:58 ` Minchan Kim
2010-04-06 15:55 ` Linus Torvalds
2010-04-06 16:23 ` Minchan Kim
2010-04-06 16:28 ` Linus Torvalds
2010-04-06 16:45 ` Minchan Kim
2010-04-06 16:53 ` Linus Torvalds
2010-04-06 17:04 ` Rik van Riel
2010-04-06 18:28 ` Linus Torvalds
2010-04-06 19:03 ` Andrew Morton
2010-04-06 19:10 ` Steinar H. Gunderson
2010-04-06 19:10 ` Linus Torvalds
2010-04-06 19:35 ` Linus Torvalds
2010-04-06 19:42 ` Borislav Petkov
2010-04-06 20:02 ` Linus Torvalds
2010-04-06 20:46 ` Steinar H. Gunderson
2010-04-06 20:56 ` Linus Torvalds
2010-04-06 21:05 ` Steinar H. Gunderson
2010-04-06 20:51 ` Borislav Petkov
2010-04-06 21:27 ` Linus Torvalds
2010-04-06 22:59 ` Borislav Petkov
2010-04-06 23:27 ` Linus Torvalds
2010-04-06 23:54 ` [PATCH] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Rik van Riel
2010-04-07 7:00 ` KOSAKI Motohiro
2010-04-07 14:48 ` Rik van Riel
2010-04-07 14:54 ` [PATCH -v2] " Rik van Riel
2010-04-07 15:30 ` Linus Torvalds
2010-04-07 15:52 ` Rik van Riel
2010-04-07 16:56 ` Linus Torvalds
2010-04-07 21:19 ` Linus Torvalds
2010-04-07 21:52 ` Rik van Riel
2010-04-07 22:09 ` Linus Torvalds
2010-04-07 22:15 ` Linus Torvalds
2010-04-08 0:38 ` Rik van Riel
2010-04-07 23:37 ` Linus Torvalds
2010-04-08 2:03 ` KOSAKI Motohiro
2010-04-08 2:33 ` Linus Torvalds
2010-04-08 5:47 ` Borislav Petkov
2010-04-08 14:11 ` Linus Torvalds
2010-04-08 18:25 ` Rik van Riel
2010-04-08 18:32 ` Linus Torvalds
2010-04-08 20:31 ` Borislav Petkov
2010-04-08 21:00 ` Borislav Petkov
2010-04-08 23:16 ` Linus Torvalds
2010-04-08 23:47 ` Borislav Petkov
2010-04-09 0:50 ` Linus Torvalds
2010-04-09 1:30 ` Borislav Petkov
2010-04-09 9:21 ` Borislav Petkov
2010-04-09 16:35 ` Linus Torvalds
2010-04-09 17:40 ` Borislav Petkov
2010-04-09 17:50 ` Linus Torvalds
2010-04-09 19:14 ` Borislav Petkov
2010-04-09 19:32 ` Linus Torvalds
2010-04-09 20:03 ` Rik van Riel
2010-04-09 20:43 ` Johannes Weiner
2010-04-09 20:57 ` Rik van Riel
2010-04-09 21:33 ` Borislav Petkov
2010-04-09 23:22 ` Linus Torvalds
2010-04-09 23:45 ` Rik van Riel
2010-04-10 0:03 ` Linus Torvalds
2010-04-10 0:11 ` Rik van Riel
2010-04-09 23:54 ` Johannes Weiner
2010-04-09 23:56 ` Linus Torvalds
2010-04-10 0:19 ` Rik van Riel
2010-04-10 0:31 ` Johannes Weiner
2010-04-10 0:32 ` Linus Torvalds
2010-04-10 7:27 ` Borislav Petkov
2010-04-10 11:26 ` Borislav Petkov
2010-04-10 14:45 ` Rik van Riel
2010-04-10 15:24 ` Linus Torvalds
2010-04-10 16:38 ` Borislav Petkov
2010-04-10 17:05 ` Linus Torvalds
2010-04-10 18:21 ` Linus Torvalds
2010-04-10 18:26 ` Linus Torvalds
2010-04-10 18:51 ` Borislav Petkov
2010-04-10 18:58 ` Borislav Petkov
2010-04-10 20:05 ` Linus Torvalds
2010-04-10 20:12 ` Linus Torvalds
2010-04-10 20:36 ` Borislav Petkov
2010-04-10 20:40 ` Linus Torvalds
2010-04-10 21:25 ` Borislav Petkov
2010-04-10 21:30 ` Linus Torvalds
2010-04-10 21:51 ` Borislav Petkov
2010-04-11 13:08 ` Borislav Petkov
2010-04-11 13:19 ` [PATCH 1/3] mm: make page freeing path RCU-safe Borislav Petkov
2010-04-11 13:19 ` [PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity Borislav Petkov
2010-04-11 13:19 ` [PATCH 3/3] mm: fixup vma_adjust Borislav Petkov
2010-04-11 13:25 ` [PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity Borislav Petkov
2010-04-11 17:07 ` [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Linus Torvalds
2010-04-11 17:16 ` Linus Torvalds
2010-04-11 18:55 ` Borislav Petkov
2010-04-12 0:13 ` Linus Torvalds
2010-04-12 1:04 ` Linus Torvalds
2010-04-12 7:20 ` Borislav Petkov
2010-04-12 16:02 ` Linus Torvalds
2010-04-12 16:26 ` Linus Torvalds
2010-04-12 18:40 ` Rik van Riel
2010-04-12 19:00 ` Borislav Petkov
2010-04-12 19:17 ` Linus Torvalds
2010-04-12 20:22 ` [PATCH 1/4] Simplify and comment on anon_vma re-use for anon_vma_prepare() Linus Torvalds
2010-04-12 20:23 ` [PATCH 2/4] vma_adjust: fix the copying of anon_vma chains Linus Torvalds
2010-04-12 20:23 ` [PATCH 3/4] anon_vma: clone the anon_vma chain in the right order Linus Torvalds
2010-04-12 20:23 ` [PATCH 4/4] anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma Linus Torvalds
2010-04-12 21:03 ` Rik van Riel
2010-04-13 0:41 ` Johannes Weiner
2010-04-13 1:08 ` Linus Torvalds
2010-04-13 4:23 ` Minchan Kim
2010-04-13 4:26 ` Minchan Kim
2010-04-12 20:57 ` [PATCH 3/4] anon_vma: clone the anon_vma chain in the right order Rik van Riel
2010-04-13 0:18 ` Johannes Weiner
2010-04-13 4:16 ` Minchan Kim
2010-04-12 20:54 ` [PATCH 2/4] vma_adjust: fix the copying of anon_vma chains Rik van Riel
2010-04-12 23:59 ` Johannes Weiner
2010-04-13 4:15 ` Minchan Kim
2010-04-12 20:54 ` [PATCH 1/4] Simplify and comment on anon_vma re-use for anon_vma_prepare() Rik van Riel
2010-04-12 23:54 ` Johannes Weiner
2010-04-13 4:04 ` Minchan Kim
2010-04-13 9:51 ` Peter Zijlstra
2010-04-12 21:50 ` [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Borislav Petkov
2010-04-12 22:11 ` Linus Torvalds
2010-04-12 22:18 ` Linus Torvalds
2010-04-12 22:29 ` Borislav Petkov
2010-04-13 9:38 ` Borislav Petkov
2010-04-14 21:59 ` [PATCH] rmap: add exclusively owned pages to the newest anon_vma Rik van Riel
2010-04-14 23:20 ` Johannes Weiner
2010-04-15 8:34 ` Borislav Petkov
2010-04-15 16:02 ` Minchan Kim
2010-04-15 20:01 ` Linus Torvalds
2010-04-16 6:09 ` Felipe Balbi
2010-04-16 14:48 ` Linus Torvalds
2010-04-11 19:49 ` [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA Rik van Riel
2010-04-12 15:44 ` Linus Torvalds
2010-04-12 15:51 ` Rik van Riel
2010-04-11 21:45 ` Rik van Riel
2010-04-12 15:51 ` Linus Torvalds
2010-04-13 10:36 ` KOSAKI Motohiro
2010-04-10 20:24 ` Rik van Riel
2010-04-10 20:34 ` Linus Torvalds
2010-04-10 20:43 ` Rik van Riel
2010-04-10 20:32 ` Rik van Riel
2010-04-10 19:36 ` Rik van Riel
2010-04-12 14:40 ` Peter Zijlstra
2010-04-12 15:17 ` Minchan Kim
2010-04-12 15:33 ` Peter Zijlstra
2010-04-12 15:19 ` Rik van Riel
2010-04-12 16:01 ` Peter Zijlstra
2010-04-12 16:06 ` Rik van Riel
2010-04-12 16:46 ` Linus Torvalds
2010-04-12 18:40 ` Peter Zijlstra
2010-04-12 19:30 ` Peter Zijlstra
2010-04-12 19:44 ` Peter Zijlstra
2010-04-13 10:53 ` KOSAKI Motohiro
2010-04-13 11:30 ` Peter Zijlstra
2010-04-13 12:00 ` KOSAKI Motohiro
2010-04-14 14:27 ` Peter Zijlstra
2010-04-10 17:07 ` Borislav Petkov
2010-04-10 16:41 ` Linus Torvalds [this message]
2010-04-10 22:49 ` Johannes Weiner
2010-04-10 23:31 ` Linus Torvalds
2010-04-09 1:45 ` KOSAKI Motohiro
2010-04-07 15:55 ` Minchan Kim
2010-04-07 7:29 ` Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3) Borislav Petkov
2010-04-07 14:05 ` Paulo Marques
2010-04-07 14:13 ` Borislav Petkov
2010-04-06 23:37 ` Linus Torvalds
2010-04-06 23:22 ` Rik van Riel
2010-04-07 0:10 ` Linus Torvalds
2010-04-07 1:18 ` Rik van Riel
2010-04-07 7:22 ` Borislav Petkov
2010-04-07 10:09 ` Pekka Enberg
2010-04-07 10:12 ` KOSAKI Motohiro
2010-04-07 8:41 ` Peter Zijlstra
2010-04-07 8:36 ` Peter Zijlstra
2010-04-07 9:16 ` Johannes Weiner
2010-04-07 9:37 ` Peter Zijlstra
2010-04-07 14:12 ` Rik van Riel
2010-04-07 15:46 ` Linus Torvalds
2010-04-06 16:32 ` Linus Torvalds
2010-04-06 16:54 ` Minchan Kim
2010-04-07 8:37 ` Peter Zijlstra
2010-04-06 17:05 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.00.1004100930460.3558@i5.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=Lee.Schermerhorn@hp.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=hannes@cmpxchg.org \
--cc=hugh.dickins@tiscali.co.uk \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=minchan.kim@gmail.com \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
--cc=sgunderson@bigfoot.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).