From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Hugh Dickins" <hughd@google.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Lance Roy" <ldr709@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Dave Airlie" <airlied@redhat.com>,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Mel Gorman" <mgorman@suse.de>, "Jan Kara" <jack@suse.cz>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Aaron Lu" <ziqian.lzq@antfin.com>,
	"Omar Sandoval" <osandov@fb.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Yu Zhao" <yuzhao@google.com>
Subject: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely
Date: Tue, 24 Sep 2019 17:24:58 -0600	[thread overview]
Message-ID: <20190924232459.214097-3-yuzhao@google.com> (raw)
In-Reply-To: <20190924232459.214097-1-yuzhao@google.com>

We don't want to expose a non-hugetlb page to fast gup running on a
remote CPU before all local non-atomic ops on its page flags are
visible.

For an anon page that isn't in swap cache, we need to make sure all
prior non-atomic ops, especially __SetPageSwapBacked() in
page_add_new_anon_rmap(), are ordered before set_pte_at() to prevent
the following race:

	CPU 1				CPU 2
	set_pte_at()			get_user_pages_fast()
	  page_add_new_anon_rmap()	  gup_pte_range()
	  __SetPageSwapBacked()		    SetPageReferenced()

This demonstrates a non-fatal scenario. Though they haven't been
directly observed, fatal ones can exist, e.g., PG_locked set by the
fast gup caller and then overwritten by __SetPageSwapBacked().
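
As an illustration only (the actual call sites are in the diff below,
and this sketch simply mirrors them), the ordering enforced on the
fault path looks roughly like this:

	page_add_new_anon_rmap(page, vma, addr, false);	/* __SetPageSwapBacked() inside is non-atomic */
	mem_cgroup_commit_charge(page, memcg, false, false);
	lru_cache_add_active_or_unevictable(page, vma);
	/* commit non-atomic ops before exposing to fast gup */
	smp_wmb();
	set_pte_at(vma->vm_mm, addr, ptep, entry);	/* page becomes reachable by gup_pte_range() */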

For an anon page that is already in the swap cache, or for a file
page, we don't need smp_wmb() before set_pte_at() because adding the
page to the swap or file cache already serves as a valid write
barrier. Using non-atomic ops on the page flags thereafter is a bug
anyway.

smp_wmb() is added after 11 of the 12 page_add_new_anon_rmap() call
sites; the only exception is do_huge_pmd_wp_page_fallback(), which
already has an smp_wmb().
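
For context, no matching read barrier is needed on the fast gup side.
A simplified sketch of the reader (loosely based on gup_pte_range();
gup_get_pte(), compound page and pte_special() handling are omitted)
shows that its struct page accesses are ordered after the pte load by
the pte -> struct page address dependency:

	pte_t pte = READ_ONCE(*ptep);
	struct page *page;

	if (!pte_present(pte))
		return 0;
	page = pte_page(pte);		/* address dependency on the pte load */
	if (!try_get_page(page))
		return 0;
	SetPageReferenced(page);	/* atomic RMW; safe only if the fault path's
					   non-atomic flag updates are ordered before
					   set_pte_at(), which the smp_wmb() ensures */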

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 kernel/events/uprobes.c |  2 ++
 mm/huge_memory.c        |  6 ++++++
 mm/khugepaged.c         |  2 ++
 mm/memory.c             | 10 +++++++++-
 mm/migrate.c            |  2 ++
 mm/swapfile.c           |  6 ++++--
 mm/userfaultfd.c        |  2 ++
 7 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 84fa00497c49..7069785e2e52 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -194,6 +194,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
 	ptep_clear_flush_notify(vma, addr, pvmw.pte);
+	/* commit non-atomic ops before exposing to fast gup */
+	smp_wmb();
 	set_pte_at_notify(mm, addr, pvmw.pte,
 			mk_pte(new_page, vma->vm_page_prot));
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de1f15969e27..21d271a29d96 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -616,6 +616,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		mem_cgroup_commit_charge(page, memcg, false, true);
 		lru_cache_add_active_or_unevictable(page, vma);
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
+		/* commit non-atomic ops before exposing to fast gup */
+		smp_wmb();
 		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
 		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(vma->vm_mm);
@@ -1276,7 +1278,9 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 	}
 	kfree(pages);
 
+	/* commit non-atomic ops before exposing to fast gup */
 	smp_wmb(); /* make pte visible before pmd */
+
 	pmd_populate(vma->vm_mm, vmf->pmd, pgtable);
 	page_remove_rmap(page, true);
 	spin_unlock(vmf->ptl);
@@ -1423,6 +1427,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 		page_add_new_anon_rmap(new_page, vma, haddr, true);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
 		lru_cache_add_active_or_unevictable(new_page, vma);
+		/* commit non-atomic ops before exposing to fast gup */
+		smp_wmb();
 		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
 		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		if (!page) {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 70ff98e1414d..f2901edce6de 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1074,6 +1074,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 	count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
 	lru_cache_add_active_or_unevictable(new_page, vma);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	/* commit non-atomic ops before exposing to fast gup */
+	smp_wmb();
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);
diff --git a/mm/memory.c b/mm/memory.c
index aa86852d9ec2..6dabbc3cd3b7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2367,6 +2367,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		 * mmu page tables (such as kvm shadow page tables), we want the
 		 * new page to be mapped directly into the secondary page table.
 		 */
+		/* commit non-atomic ops before exposing to fast gup */
+		smp_wmb();
 		set_pte_at_notify(mm, vmf->address, vmf->pte, entry);
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 		if (old_page) {
@@ -2877,7 +2879,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	flush_icache_page(vma, page);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
 	vmf->orig_pte = pte;
 
@@ -2886,12 +2887,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
+		/* commit non-atomic ops before exposing to fast gup */
+		smp_wmb();
 	} else {
 		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true, false);
 		activate_page(page);
 	}
 
+	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	swap_free(entry);
 	if (mem_cgroup_swap_full(page) ||
 	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
@@ -3034,6 +3038,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	mem_cgroup_commit_charge(page, memcg, false, false);
 	lru_cache_add_active_or_unevictable(page, vma);
+	/* commit non-atomic ops before exposing to fast gup */
+	smp_wmb();
 setpte:
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
@@ -3297,6 +3303,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
+		/* commit non-atomic ops before exposing to fast gup */
+		smp_wmb();
 	} else {
 		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page, false);
diff --git a/mm/migrate.c b/mm/migrate.c
index 9f4ed4e985c1..943d147ecc3e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2783,6 +2783,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 		lru_cache_add_active_or_unevictable(page, vma);
 	get_page(page);
 
+	/* commit non-atomic ops before exposing to fast gup */
+	smp_wmb();
 	if (flush) {
 		flush_cache_page(vma, addr, pte_pfn(*ptep));
 		ptep_clear_flush_notify(vma, addr, ptep);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index dab43523afdd..5c5547053ee0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1880,8 +1880,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
 	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
 	get_page(page);
-	set_pte_at(vma->vm_mm, addr, pte,
-		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	if (page == swapcache) {
 		page_add_anon_rmap(page, vma, addr, false);
 		mem_cgroup_commit_charge(page, memcg, true, false);
@@ -1889,7 +1887,11 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		page_add_new_anon_rmap(page, vma, addr, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
+		/* commit non-atomic ops before exposing to fast gup */
+		smp_wmb();
 	}
+	set_pte_at(vma->vm_mm, addr, pte,
+		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	swap_free(entry);
 	/*
 	 * Move the page to the active list so it is not
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index c7ae74ce5ff3..4f92913242a1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -92,6 +92,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 	mem_cgroup_commit_charge(page, memcg, false, false);
 	lru_cache_add_active_or_unevictable(page, dst_vma);
 
+	/* commit non-atomic ops before exposing to fast gup */
+	smp_wmb();
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
 	/* No need to invalidate - it was non-present before */
-- 
2.23.0.351.gc4317032e6-goog

