From: Andrea Arcangeli <aarcange@redhat.com>
To: qemu-devel@nongnu.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-api@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andres Lagar-Cavilla <andreslc@google.com>,
Dave Hansen <dave@sr71.net>, Paolo Bonzini <pbonzini@redhat.com>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
Andy Lutomirski <luto@amacapital.net>,
Andrew Morton <akpm@linux-foundation.org>,
Sasha Levin <sasha.levin@oracle.com>,
Hugh Dickins <hughd@google.com>,
Peter Feiner <pfeiner@google.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Christopher Covington <cov@codeaurora.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Neil Brown <neilb@suse.de>, Mike Hommey <mh@glandium.org>,
Taras Glek <tglek@mozilla.com>, Jan Kara <jack@suse.cz>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
Michel Lespinasse <walken@google.com>,
Minchan Kim <minchan@kernel.org>,
Keith Packard <keithp@keithp.com>,
"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
Isaku Yamahata <yamahata@valinux.co.jp>,
Anthony Liguori <anthony@codemonkey.ws>,
Stefan Hajnoczi <stefanha@gmail.com>,
Wenchao Xia <wenchaoqemu@gmail.com>,
Andrew Jones <drjones@redhat.com>,
Juan Quintela <quintela@redhat.com>
Subject: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
Date: Fri, 3 Oct 2014 19:08:00 +0200
Message-ID: <1412356087-16115-11-git-send-email-aarcange@redhat.com>
In-Reply-To: <1412356087-16115-1-git-send-email-aarcange@redhat.com>

remap_anon_pages (unlike remap_file_pages) tries to be non-intrusive
in the rmap code.

As far as the rmap code is concerned, remap_anon_pages only alters
page->mapping and page->index, and it does so while holding the page
lock. However, a few places are allowed to do rmap walks on anon pages
without the page lock (split_huge_page and page_referenced_anon).
Those places must be updated to re-check that page->mapping didn't
change after they obtained the anon_vma lock. remap_anon_pages takes
the anon_vma lock for writing before altering page->mapping, so if
page->mapping is still the same after obtaining the anon_vma lock
(without the page lock), the rmap walk can go ahead safely (and
remap_anon_pages will wait for it to complete before proceeding).

remap_anon_pages serializes against itself with the page lock.

All other places that take the anon_vma lock while holding the mmap_sem
for writing don't need to check whether page->mapping has changed
after taking the anon_vma lock, regardless of the page lock, because
remap_anon_pages holds the mmap_sem for reading.

Overall this looks like a fairly small change to the rmap code, notably
less intrusive than the nonlinear vmas created by remap_file_pages.

One constraint is enforced to allow this simplification: the source
pages passed to remap_anon_pages must be mapped in only one vma. This
is not a limitation when remap_anon_pages is used to handle userland
page faults with MADV_USERFAULT. The source addresses passed to
remap_anon_pages should be set as VM_DONTCOPY with MADV_DONTFORK, to
avoid any risk of the mapcount of the pages increasing if fork runs
in parallel in another thread before or while remap_anon_pages runs.
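
For the MADV_DONTFORK recommendation above, a userspace caller could
prepare the source area along these lines. This is a minimal sketch
under assumptions: alloc_dontfork_region is a hypothetical helper name,
and only the standard mmap(2) and madvise(2) calls are used; nothing
here invokes remap_anon_pages itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of private anonymous memory and mark the range
 * MADV_DONTFORK (VM_DONTCOPY in the kernel), so that a child created
 * by fork() does not inherit it. Returns the mapping, or NULL on
 * failure.
 */
static void *alloc_dontfork_region(size_t len)
{
	void *p;

	assert(len > 0);	/* mmap() rejects zero-length mappings */

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

Setting the advice right after mmap(), before any other thread can
fork, keeps the window in which the range could still be copied as
small as possible.
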
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/huge_memory.c | 24 ++++++++++++++++++++----
 mm/rmap.c        |  9 +++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b402d60..4277ed7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1921,6 +1921,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct anon_vma *anon_vma;
 	int ret = 1;
+	struct address_space *mapping;
 
 	BUG_ON(is_huge_zero_page(page));
 	BUG_ON(!PageAnon(page));
@@ -1932,10 +1933,24 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	 * page_lock_anon_vma_read except the write lock is taken to serialise
 	 * against parallel split or collapse operations.
 	 */
-	anon_vma = page_get_anon_vma(page);
-	if (!anon_vma)
-		goto out;
-	anon_vma_lock_write(anon_vma);
+	for (;;) {
+		mapping = ACCESS_ONCE(page->mapping);
+		anon_vma = page_get_anon_vma(page);
+		if (!anon_vma)
+			goto out;
+		anon_vma_lock_write(anon_vma);
+		/*
+		 * We don't hold the page lock here so
+		 * remap_anon_pages_huge_pmd can change the anon_vma
+		 * from under us until we obtain the anon_vma
+		 * lock. Verify that we obtained the anon_vma lock
+		 * before remap_anon_pages did.
+		 */
+		if (likely(mapping == ACCESS_ONCE(page->mapping)))
+			break;
+		anon_vma_unlock_write(anon_vma);
+		put_anon_vma(anon_vma);
+	}
 
 	ret = 0;
 	if (!PageCompound(page))
@@ -2460,6 +2475,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * Prevent all access to pagetables with the exception of
 	 * gup_fast later hanlded by the ptep_clear_flush and the VM
 	 * handled by the anon_vma lock + PG_lock.
+	 * remap_anon_pages is also prevented from racing by the mmap_sem.
 	 */
 	down_write(&mm->mmap_sem);
 	if (unlikely(khugepaged_test_exit(mm)))
diff --git a/mm/rmap.c b/mm/rmap.c
index 3e8491c..6d875eb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -450,6 +450,7 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	struct anon_vma *root_anon_vma;
 	unsigned long anon_mapping;
 
+repeat:
 	rcu_read_lock();
 	anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
 	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
@@ -488,6 +489,14 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	rcu_read_unlock();
 	anon_vma_lock_read(anon_vma);
 
+	/* check if remap_anon_pages changed the anon_vma */
+	if (unlikely((unsigned long) ACCESS_ONCE(page->mapping) != anon_mapping)) {
+		anon_vma_unlock_read(anon_vma);
+		put_anon_vma(anon_vma);
+		anon_vma = NULL;
+		goto repeat;
+	}
+
 	if (atomic_dec_and_test(&anon_vma->refcount)) {
 		/*
 		 * Oops, we held the last refcount, release the lock