[PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"

* [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
@ 2021-04-01 18:17 ` Suren Baghdasaryan
  0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 18:17 UTC (permalink / raw)
  To: stable
  Cc: gregkh, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Suren Baghdasaryan

We received a report that the copy-on-write issue reported by Jann Horn in
https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
coded in vmsplice.c). I confirmed this and also that the issue was not
reproducible with the 5.10 kernel. I tracked the fix to the following patch
introduced in 5.9, which changes the do_wp_page() logic:

09854ba94c6a 'mm: do_wp_page() simplification'

I backported this patch (#2 in the series) along with 2 prerequisite patches
(#1 and #4) that keep the backports clean and two followup fixes to the main
patch (#3 and #5). I had to skip the following fix:

feb889fb40fa 'mm: don't put pinned pages into the swap cache'

because it uses page_maybe_dma_pinned(), which does not exist in earlier
kernels. Because pin_user_pages() does not exist there either, I *think*
we can safely skip this fix on older kernels, but I would appreciate it if
someone could confirm that claim.

The patchset cleanly applies over: stable linux-4.14.y, tag: v4.14.228

Note: 4.14 and 4.19 backports are very similar, so while I backported
only to these two versions I think backports for other versions can be
done easily.

Kirill Tkhai (1):
  mm: reuse only-pte-mapped KSM page in do_wp_page()

Linus Torvalds (2):
  mm: do_wp_page() simplification
  mm: fix misplaced unlock_page in do_wp_page()

Nadav Amit (1):
  mm/userfaultfd: fix memory corruption due to writeprotect

Shaohua Li (1):
  userfaultfd: wp: add helper for writeprotect check

 include/linux/ksm.h           |  7 ++++
 include/linux/userfaultfd_k.h | 10 ++++++
 mm/ksm.c                      | 30 ++++++++++++++++--
 mm/memory.c                   | 60 ++++++++++++++++-------------------
 4 files changed, 73 insertions(+), 34 deletions(-)

-- 
2.31.0.291.g576ba9dcdaf-goog

* [PATCH 1/5] mm: reuse only-pte-mapped KSM page in do_wp_page()
  2021-04-01 18:17 ` Suren Baghdasaryan
@ 2021-04-01 18:17   ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 18:17 UTC (permalink / raw)
  To: stable
  Cc: gregkh, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Yang Shi, Kirill A. Shutemov,
	Hugh Dickins, Andrea Arcangeli, Christian Koenig,
	Claudio Imbrenda, Rik van Riel, Huang Ying, Minchan Kim,
	Andrew Morton

From: Kirill Tkhai <ktkhai@virtuozzo.com>

Add an optimization for KSM pages, similar to the one we already have
for ordinary anonymous pages.  If a write fault hits a page that is
mapped by only one pte and is not in the swap cache, the page may be
reused without copying its content.

[ Note that we do not consider PageSwapCache() pages at least for now,
  since we don't want to complicate __get_ksm_page(), which has nice
  optimization based on this (for the migration case). Currently it is
  spinning on PageSwapCache() pages, waiting for when they have
  unfreezed counters (i.e., for the migration finish). But we don't want
  to make it also spinning on swap cache pages, which we try to reuse,
  since there is not a very high probability to reuse them. So, for now
  we do not consider PageSwapCache() pages at all. ]

So in reuse_ksm_page() we check for 1) PageSwapCache() and 2)
page_stable_node(), to skip a page that KSM is currently trying to
link to the stable tree.  Then we do page_ref_freeze() to prevent KSM
from merging one more page into the page we are reusing.  After that,
nobody can refer to the page being reused: KSM skips !PageSwapCache()
pages with zero refcount; and the protection against all other
participants is the same as for reused ordinary anon pages: pte lock,
page lock and mmap_sem.
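
As a rough userspace illustration of the freeze step described above (not
the kernel implementation), the sketch below models a page reference count
with C11 atomics.  The names toy_page, toy_ref_freeze(), toy_ref_unfreeze()
and toy_get_unless_zero() are hypothetical stand-ins for struct page,
page_ref_freeze(), page_ref_unfreeze() and get_page_unless_zero().  The
assumption is that freezing is a compare-and-swap from the expected count
to zero, so a concurrent "get unless zero" fails while the page is frozen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	atomic_int refcount;
};

/* Succeeds only if the count is exactly `expected`; leaves it at 0. */
static bool toy_ref_freeze(struct toy_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Restores the count once the exclusive section is over. */
static void toy_ref_unfreeze(struct toy_page *p, int count)
{
	assert(atomic_load(&p->refcount) == 0);
	atomic_store(&p->refcount, count);
}

/* Takes a reference unless the count is 0 (frozen or already freed). */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure `old` is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

With a count of 1, toy_ref_freeze(&p, 1) succeeds and toy_get_unless_zero(&p)
fails until toy_ref_unfreeze(&p, 1) is called; with a count of 2 the freeze
fails and the count is left untouched.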

[akpm@linux-foundation.org: replace BUG_ON()s with WARN_ON()s]
Link: http://lkml.kernel.org/r/154471491016.31352.1168978849911555609.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/ksm.h |  7 +++++++
 mm/ksm.c            | 30 ++++++++++++++++++++++++++++--
 mm/memory.c         | 16 ++++++++++++++--
 3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 44368b19b27e..def48a2d87aa 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -64,6 +64,8 @@ struct page *ksm_might_need_to_copy(struct page *page,
 
 void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc);
 void ksm_migrate_page(struct page *newpage, struct page *oldpage);
+bool reuse_ksm_page(struct page *page,
+			struct vm_area_struct *vma, unsigned long address);
 
 #else  /* !CONFIG_KSM */
 
@@ -103,6 +105,11 @@ static inline void rmap_walk_ksm(struct page *page,
 static inline void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 {
 }
+static inline bool reuse_ksm_page(struct page *page,
+			struct vm_area_struct *vma, unsigned long address)
+{
+	return false;
+}
 #endif /* CONFIG_MMU */
 #endif /* !CONFIG_KSM */
 
diff --git a/mm/ksm.c b/mm/ksm.c
index 65d4bf52f543..62419735ee9c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -695,8 +695,9 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	 * case this node is no longer referenced, and should be freed;
 	 * however, it might mean that the page is under page_freeze_refs().
 	 * The __remove_mapping() case is easy, again the node is now stale;
-	 * but if page is swapcache in migrate_page_move_mapping(), it might
-	 * still be our page, in which case it's essential to keep the node.
+	 * the same is in reuse_ksm_page() case; but if page is swapcache
+	 * in migrate_page_move_mapping(), it might still be our page,
+	 * in which case it's essential to keep the node.
 	 */
 	while (!get_page_unless_zero(page)) {
 		/*
@@ -2609,6 +2610,31 @@ void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc)
 		goto again;
 }
 
+bool reuse_ksm_page(struct page *page,
+		    struct vm_area_struct *vma,
+		    unsigned long address)
+{
+#ifdef CONFIG_DEBUG_VM
+	if (WARN_ON(is_zero_pfn(page_to_pfn(page))) ||
+			WARN_ON(!page_mapped(page)) ||
+			WARN_ON(!PageLocked(page))) {
+		dump_page(page, "reuse_ksm_page");
+		return false;
+	}
+#endif
+
+	if (PageSwapCache(page) || !page_stable_node(page))
+		return false;
+	/* Prohibit parallel get_ksm_page() */
+	if (!page_ref_freeze(page, 1))
+		return false;
+
+	page_move_anon_rmap(page, vma);
+	page->index = linear_page_index(vma, address);
+	page_ref_unfreeze(page, 1);
+
+	return true;
+}
 #ifdef CONFIG_MIGRATION
 void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 {
diff --git a/mm/memory.c b/mm/memory.c
index 21a0bbb9c21f..6920bfb3f89c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2831,8 +2831,11 @@ static int do_wp_page(struct vm_fault *vmf)
 	 * Take out anonymous pages first, anonymous shared vmas are
 	 * not dirty accountable.
 	 */
-	if (PageAnon(vmf->page) && !PageKsm(vmf->page)) {
+	if (PageAnon(vmf->page)) {
 		int total_map_swapcount;
+		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
+					   page_count(vmf->page) != 1))
+			goto copy;
 		if (!trylock_page(vmf->page)) {
 			get_page(vmf->page);
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2847,6 +2850,15 @@ static int do_wp_page(struct vm_fault *vmf)
 			}
 			put_page(vmf->page);
 		}
+		if (PageKsm(vmf->page)) {
+			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
+						     vmf->address);
+			unlock_page(vmf->page);
+			if (!reused)
+				goto copy;
+			wp_page_reuse(vmf);
+			return VM_FAULT_WRITE;
+		}
 		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
 			if (total_map_swapcount == 1) {
 				/*
@@ -2867,7 +2879,7 @@ static int do_wp_page(struct vm_fault *vmf)
 					(VM_WRITE|VM_SHARED))) {
 		return wp_page_shared(vmf);
 	}
-
+copy:
 	/*
 	 * Ok, we need to copy. Oh, well..
 	 */
-- 
2.31.0.291.g576ba9dcdaf-goog


* [PATCH 2/5] mm: do_wp_page() simplification
  2021-04-01 18:17 ` Suren Baghdasaryan
@ 2021-04-01 18:17   ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 18:17 UTC (permalink / raw)
  To: stable
  Cc: gregkh, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Peter Xu

From: Linus Torvalds <torvalds@linux-foundation.org>

How about we just make sure we're the only possible valid user of the
page before we bother to reuse it?

Simplify, simplify, simplify.

And get rid of the nasty serialization on the page lock at the same time.
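
The resulting check order can be sketched in plain C as a single-threaded
toy model.  This is an illustration only: struct wp_page_model and
wp_can_reuse() are hypothetical names, the fields stand in for PageKsm(),
page_count(), page_mapcount() and the page lock, and no real locking or
atomicity is modelled.  It shows the idea of the patch: reuse the page only
when we are provably its sole user, and fall back to copying rather than
sleeping on the page lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the page state that do_wp_page() inspects. */
struct wp_page_model {
	bool ksm;      /* models PageKsm() */
	bool locked;   /* page lock currently held by someone */
	int refcount;  /* models page_count() */
	int mapcount;  /* models page_mapcount() */
};

/* Models trylock_page(): never sleeps, fails if the lock is held. */
static bool wp_trylock(struct wp_page_model *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

/* Returns true to reuse the page in place, false to fall back to a copy. */
static bool wp_can_reuse(struct wp_page_model *p)
{
	bool reuse;

	assert(p != NULL);
	/* Cheap unlocked pre-check: any extra reference means "copy". */
	if (p->ksm || p->refcount != 1)
		return false;
	/* Do not wait for the lock; contention also means "copy". */
	if (!wp_trylock(p))
		return false;
	/* Re-check under the lock, now including the map count. */
	reuse = !p->ksm && p->mapcount == 1 && p->refcount == 1;
	p->locked = false;
	return reuse;
}
```

A page with refcount 1 and mapcount 1 that nobody has locked is reused; an
extra reference, a KSM page, or a held page lock each lead to the copy path.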

[peterx: add subject prefix]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 58 ++++++++++++++++-------------------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6920bfb3f89c..e84648d81d6d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2832,49 +2832,25 @@ static int do_wp_page(struct vm_fault *vmf)
 	 * not dirty accountable.
 	 */
 	if (PageAnon(vmf->page)) {
-		int total_map_swapcount;
-		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
-					   page_count(vmf->page) != 1))
+		struct page *page = vmf->page;
+
+		/* PageKsm() doesn't necessarily raise the page refcount */
+		if (PageKsm(page) || page_count(page) != 1)
+			goto copy;
+		if (!trylock_page(page))
+			goto copy;
+		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+			unlock_page(page);
 			goto copy;
-		if (!trylock_page(vmf->page)) {
-			get_page(vmf->page);
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			lock_page(vmf->page);
-			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					vmf->address, &vmf->ptl);
-			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-				unlock_page(vmf->page);
-				pte_unmap_unlock(vmf->pte, vmf->ptl);
-				put_page(vmf->page);
-				return 0;
-			}
-			put_page(vmf->page);
-		}
-		if (PageKsm(vmf->page)) {
-			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
-						     vmf->address);
-			unlock_page(vmf->page);
-			if (!reused)
-				goto copy;
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
-		}
-		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
-			if (total_map_swapcount == 1) {
-				/*
-				 * The page is all ours. Move it to
-				 * our anon_vma so the rmap code will
-				 * not search our parent or siblings.
-				 * Protected against the rmap code by
-				 * the page lock.
-				 */
-				page_move_anon_rmap(vmf->page, vma);
-			}
-			unlock_page(vmf->page);
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
 		}
-		unlock_page(vmf->page);
+		/*
+		 * Ok, we've got the only map reference, and the only
+		 * page count reference, and the page is locked,
+		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 */
+		wp_page_reuse(vmf);
+		unlock_page(page);
+		return VM_FAULT_WRITE;
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
 		return wp_page_shared(vmf);
-- 
2.31.0.291.g576ba9dcdaf-goog


* [PATCH 3/5] mm: fix misplaced unlock_page in do_wp_page()
  2021-04-01 18:17 ` Suren Baghdasaryan
@ 2021-04-01 18:17   ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 18:17 UTC (permalink / raw)
  To: stable
  Cc: gregkh, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Qian Cai, Alex Shi, Gerald Schaefer

From: Linus Torvalds <torvalds@linux-foundation.org>

Commit 09854ba94c6a ("mm: do_wp_page() simplification") reorganized all
the code around the page re-use vs copy, but in the process also moved
the final unlock_page() around to after the wp_page_reuse() call.

That normally doesn't matter - but it means that the unlock_page() is
now done after releasing the page table lock.  Again, not a big deal,
you'd think.

But it turns out that it's very wrong indeed, because once we've
released the page table lock, we've basically lost our only reference to
the page - the page tables - and it could now be free'd at any time.  We
do hold the mmap_sem, so no actual unmap() can happen, but madvise can
come in and a MADV_DONTNEED will zap the page range - and free the page.

So now the page may be free'd just as we're unlocking it, which in turn
will usually trigger a "Bad page state" error in the freeing path.  To
make matters more confusing, by the time the debug code prints out the
page state, the unlock has typically completed and everything looks fine
again.

This all doesn't happen in any normal situations, but it does trigger
with the dirtyc0w_child LTP test.  And it seems to trigger much more
easily (but not exclusively) on s390 than elsewhere, probably because
s390 doesn't do the "batch pages up for freeing after the TLB flush"
that gives the unlock_page() more time to complete and makes the race
harder to hit.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://lore.kernel.org/lkml/a46e9bbef2ed4e17778f5615e818526ef848d791.camel@redhat.com/
Link: https://lore.kernel.org/linux-mm/c41149a8-211e-390b-af1d-d5eee690fecb@linux.alibaba.com/
Reported-by: Qian Cai <cai@redhat.com>
Reported-by: Alex Shi <alex.shi@linux.alibaba.com>
Bisected-and-analyzed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e84648d81d6d..14470ceaf3f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,8 +2848,8 @@ static int do_wp_page(struct vm_fault *vmf)
 		 * page count reference, and the page is locked,
 		 * it's dark out, and we're wearing sunglasses. Hit it.
 		 */
-		wp_page_reuse(vmf);
 		unlock_page(page);
+		wp_page_reuse(vmf);
 		return VM_FAULT_WRITE;
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
-- 
2.31.0.291.g576ba9dcdaf-goog

* [PATCH 4/5] userfaultfd: wp: add helper for writeprotect check
  2021-04-01 18:17 ` Suren Baghdasaryan
@ 2021-04-01 18:17   ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 18:17 UTC (permalink / raw)
  To: stable
  Cc: gregkh, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Andrea Arcangeli, Peter Xu,
	Andrew Morton, Jerome Glisse, Mike Rapoport, Rik van Riel,
	Kirill A . Shutemov, Mel Gorman, Hugh Dickins, Johannes Weiner,
	Bobby Powers, Brian Geffon, David Hildenbrand, Denis Plotnikov,
	Dr . David Alan Gilbert, Martin Cracauer, Marty McFadden,
	Maya Gokhale, Mike Kravetz, Pavel Emelyanov

From: Shaohua Li <shli@fb.com>

Patch series "userfaultfd: write protection support", v6.

Overview
========

The uffd-wp work was initiated by Shaohua Li [1], and later continued by
Andrea [2].  This series is based upon Andrea's latest userfaultfd tree,
and it is a continuation of the work of both Shaohua and Andrea.  Many of
the follow-up ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp
support provides an alternative register mode called
UFFDIO_REGISTER_MODE_WP, so that userspace can listen not only to missing
page faults but also to write protection page faults; the two modes can
even be registered together.  At the same time, the new feature provides
a new userfaultfd ioctl called UFFDIO_WRITEPROTECT, which allows
userspace to write protect a range of memory or fix up the write
permission of faulted pages.

Please refer to the document patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on the
new interface and what it can do.

The major workflow of an uffd-wp program should be:

  1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP

  2. Write protect part of the whole registered region using
     UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to
     show that we want to write protect the range.

  3. Start a working thread that modifies the protected pages,
     meanwhile listening to UFFD messages.

  4. When a write to the protected range is detected, a page fault
     occurs and a UFFD message is generated and reported to the
     page fault handling thread.

  5. The page fault handler thread resolves the page fault using the
     new UFFDIO_WRITEPROTECT ioctl, but this time passing in
     !UFFDIO_WRITEPROTECT_MODE_WP instead, showing that we want to
     recover the write permission.  Before this operation, the fault
     handler thread can do anything it wants, e.g., dump the page to
     persistent storage.

  6. The worker thread will continue running with the correctly
     applied write permission from step 5.
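
The ioctl sequence behind steps 1, 2 and 5 can be sketched in userspace C
roughly as follows.  This is a minimal sketch and not code from the series:
the function name uffd_wp_roundtrip() is made up for illustration, the
fault-handling thread from steps 3 and 4 is omitted (so the page is
unprotected again before it is written), and it assumes kernel headers that
already define UFFDIO_WRITEPROTECT.  It returns -1 whenever userfaultfd
write protection is not available to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: 0 on success, -1 if uffd-wp cannot be used here. */
int uffd_wp_roundtrip(void)
{
#if defined(UFFDIO_WRITEPROTECT) && defined(__NR_userfaultfd)
	struct uffdio_api api;
	struct uffdio_register reg;
	struct uffdio_writeprotect wp;
	long page = sysconf(_SC_PAGESIZE);
	char *area;
	int ret = -1;
	int uffd;

	assert(page > 0);
	uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0)
		return -1;

	area = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		goto out_close;
	area[0] = 1;	/* populate the page so there is a PTE to protect */

	/* Handshake, asking for write-protect fault reporting. */
	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
	if (ioctl(uffd, UFFDIO_API, &api) < 0)
		goto out_unmap;

	/* Step 1: register the region in WP mode. */
	memset(&reg, 0, sizeof(reg));
	reg.range.start = (uintptr_t)area;
	reg.range.len = (unsigned long)page;
	reg.mode = UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto out_unmap;

	/* Step 2: write protect the range. */
	memset(&wp, 0, sizeof(wp));
	wp.range = reg.range;
	wp.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0)
		goto out_unmap;

	/* Step 5: same ioctl without MODE_WP restores write permission. */
	wp.mode = 0;
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0)
		goto out_unmap;

	area[0] = 2;	/* no longer protected, so this does not block */
	ret = 0;
out_unmap:
	munmap(area, (size_t)page);
out_close:
	close(uffd);
	return ret;
#else
	return -1;
#endif
}
```

In a real monitor, the write in the last step would come from the worker
thread while a separate thread reads fault messages from the userfaultfd
and issues the step 5 ioctl.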

Currently there are already two projects that are based on this new
userfaultfd feature.

QEMU Live Snapshot: The project provides a way to allow the QEMU
                    hypervisor to take snapshots of VMs without
                    stopping the VM [3].

LLNL umap library:  The project provides a mmap-like interface and
                    "allow to have an application specific buffer of
                    pages cached from a large file, i.e. out-of-core
                    execution using memory map" [4][5].

Before posting the patchset, this series was smoke tested against QEMU
live snapshot and the LLNL umap library (by doing parallel quicksort using
128 sorting threads + 80 uffd servicing threads).  My sincere thanks to
Marty McFadden and Denis Plotnikov for the help along the way.

TODO
====

- hugetlbfs/shmem support
- performance
- more architectures
- cooperate with mprotect()-allowed processes (???)
- ...

References
==========

[1] https://lwn.net/Articles/666187/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=userfault
[3] https://github.com/denis-plotnikov/qemu/commits/background-snapshot-kvm
[4] https://github.com/LLNL/umap
[5] https://llnl-umap.readthedocs.io/en/develop/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=b245ecf6cf59156966f3da6e6b674f6695a5ffa5
[7] https://lkml.org/lkml/2018/11/21/370
[8] https://lkml.org/lkml/2018/12/30/64

This patch (of 19):

Add helper for writeprotect check. Will use it later.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220163112.11409-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/userfaultfd_k.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index f2f3b68ba910..07878cd475f2 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -48,6 +48,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_MISSING;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -91,6 +96,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
-- 
2.31.0.291.g576ba9dcdaf-goog
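
For readers without a kernel tree at hand, the new helper can be modelled
in userspace as below.  This is a sketch under assumptions: struct
vma_model and the model_* / MODEL_* names are hypothetical stand-ins for
vm_area_struct and the real helpers, and the flag values are illustrative
and should be checked against include/linux/mm.h in the target tree.  It
shows that returning the masked flags as bool gives a clean true/false
answer even though the WP flag sits above the low byte.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; verify against include/linux/mm.h. */
#define MODEL_VM_UFFD_MISSING	0x00000200UL
#define MODEL_VM_UFFD_WP	0x00001000UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
};

/* Same shape as the helper added by the patch. */
static inline bool model_userfaultfd_wp(const struct vma_model *vma)
{
	/* Conversion to bool turns any non-zero mask result into true. */
	return vma->vm_flags & MODEL_VM_UFFD_WP;
}

/* Same shape as the existing "armed" check shown in the diff context. */
static inline bool model_userfaultfd_armed(const struct vma_model *vma)
{
	return vma->vm_flags & (MODEL_VM_UFFD_MISSING | MODEL_VM_UFFD_WP);
}
```

A VMA registered only in MISSING mode is "armed" but does not satisfy the
WP check, which is the distinction later patches in the series depend on.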


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect
  2021-04-01 18:17 ` Suren Baghdasaryan
@ 2021-04-01 18:17   ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 18:17 UTC (permalink / raw)
  To: stable
  Cc: gregkh, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Yu Zhao, Peter Xu, Andrea Arcangeli,
	Andy Lutomirski, Pavel Emelyanov, Mike Kravetz, Mike Rapoport,
	Minchan Kim, Will Deacon, Peter Zijlstra, Andrew Morton

From: Nadav Amit <namit@vmware.com>

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0				cpu1			cpu2
----				----			----
							[ Writable PTE
							  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
				[ page-fault ]
				...
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				...
				 set_pte_at_notify()

A similar scenario can happen:

cpu0		cpu1		cpu2		cpu3
----		----		----		----
						[ Writable PTE
				  		  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
		userfaultfd_writeprotect()
		[ write-unprotect ]
		[ deferred TLB flush]
				[ page-fault ]
				wp_page_copy()
				 cow_user_page()
				 [ copy page ]
				 ...		[ write to page ]
				set_pte_at_notify()

This race has existed since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed out, these races became
apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification")
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.
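
The first scenario and the effect of the early flush can be replayed with a
small single-threaded model.  This is a sketch under stated assumptions,
not kernel code: struct cow_sim and cow_race_result() are hypothetical
names, the three CPUs are reduced to a fixed sequence of steps, and a store
that faults is simply retried once the new mapping is in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one PTE and one writer CPU holding a cached TLB entry. */
struct cow_sim {
	int old_page;		/* contents of the original page */
	int new_page;		/* contents of the COW copy */
	bool pte_points_new;	/* which page the PTE maps */
	bool tlb_valid;		/* writer still caches the old writable PTE */
};

/*
 * Replays the first scenario.  Returns the value visible through the page
 * table after the writer CPU has stored 42.
 */
int cow_race_result(bool flush_before_copy)
{
	struct cow_sim s = { .old_page = 1, .tlb_valid = true };
	bool write_faulted = false;

	/* cpu0: change_pte_range() clears the write bit, flush is deferred,
	 * so the writer's cached entry stays valid. */

	/* cpu1: write fault; the fix flushes the TLB before copying. */
	if (flush_before_copy)
		s.tlb_valid = false;
	s.new_page = s.old_page;	/* cow_user_page() */

	/* cpu2: store through whatever translation it currently has. */
	if (s.tlb_valid)
		s.old_page = 42;	/* stale writable entry: hits old page */
	else
		write_faulted = true;	/* sees the read-only PTE and faults */

	/* cpu1: set_pte_at_notify() installs the copy; TLB is flushed. */
	s.pte_points_new = true;
	s.tlb_valid = false;

	/* cpu2: a faulted store is retried against the new mapping. */
	if (write_faulted)
		s.new_page = 42;

	return s.pte_points_new ? s.new_page : s.old_page;
}
```

In this model cow_race_result(false) returns 1, i.e. the store of 42 landed
in the old page and is lost, while cow_race_result(true) returns 42.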

Further optimizations will follow to avoid unnecessary PTE
write-protection and TLB flushes during uffd-write-unprotect.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 14470ceaf3f2..3f33651a2a39 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2810,6 +2810,14 @@ static int do_wp_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;

+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
-- 
2.31.0.291.g576ba9dcdaf-goog
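
The condition added above depends on a per-mm indication that a deferred
flush is still in flight.  A userspace model of that gating might look like
the following.  It is only a sketch: struct mm_model, struct wp_vma and the
model_* functions are hypothetical names, the counter is an assumption
about how the pending state is tracked, and the memory-ordering rules the
kernel applies around the page-table lock are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pending-flush state kept in mm_struct. */
struct mm_model {
	atomic_int tlb_flush_pending;
};

/* Hypothetical stand-in for a VMA and its uffd-wp registration. */
struct wp_vma {
	struct mm_model *mm;
	bool uffd_wp;
};

/* Bracket a page-table update whose TLB flush is deferred. */
static inline void model_inc_tlb_flush_pending(struct mm_model *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

static inline void model_dec_tlb_flush_pending(struct mm_model *mm)
{
	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}

static inline bool model_mm_tlb_flush_pending(struct mm_model *mm)
{
	return atomic_load(&mm->tlb_flush_pending) > 0;
}

/* Mirrors the shape of the check added to do_wp_page(). */
static inline bool model_needs_flush_before_cow(const struct wp_vma *vma)
{
	return vma->uffd_wp && model_mm_tlb_flush_pending(vma->mm);
}
```

The extra flush is therefore only paid on uffd-wp VMAs, and only while some
other thread is between changing the PTEs and flushing the TLB.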

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-01 18:17 ` Suren Baghdasaryan
@ 2021-04-01 18:59   ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2021-04-01 18:59 UTC (permalink / raw)
  To: Suren Baghdasaryan, Peter Xu
  Cc: stable, Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team

On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> We received a report that the copy-on-write issue reported by Jann Horn in
> https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> coded in vmsplice.c).

Gaah.

> I confirmed this and also that the issue was not
> reproducible with 5.10 kernel. I tracked the fix to the following patch
> introduced in 5.9 which changes the do_wp_page() logic:
>
> 09854ba94c6a 'mm: do_wp_page() simplification'

The problem here is that there's a _lot_ more patches than the few you
found that fixed various other cases (THP etc).

> I backported this patch (#2 in the series) along with 2 prerequisite patches
> (#1 and #4) that keep the backports clean and two followup fixes to the main
> patch (#3 and #5). I had to skip the following fix:
>
> feb889fb40fa 'mm: don't put pinned pages into the swap cache'
>
> because it uses page_maybe_dma_pinned() which does not exist in earlier
> kernels. Because pin_user_pages() does not exist there as well, I *think*
> we can safely skip this fix on older kernels, but I would appreciate if
> someone could confirm that claim.

Hmm. I think this means that swap activity can now break the
connection to a GUP page (the whole pre-pinning model), but it
probably isn't a new problem for 4.9/4.19.

I suspect the test there should be something like

        /* Single mapper, more references than us and the map? */
        if (page_mapcount(page) == 1 && page_count(page) > 2)
                goto keep_locked;

in the pre-pinning days.
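
To spell out the counting behind that suggested test: with a single mapper,
one reference is expected from the mapping and one from the code holding
the page at that point, so anything beyond two hints at another holder such
as a GUP user.  The arithmetic can be modelled in userspace as below.  This
is a sketch only: struct page_model and model_maybe_gup_pinned() are
hypothetical stand-ins for the kernel's page_mapcount()/page_count()
accessors, and it is not a tested kernel change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters read from struct page. */
struct page_model {
	int mapcount;	/* number of page-table mappings */
	int refcount;	/* total references, mappings included */
};

/* True when the page would take the "goto keep_locked" path. */
static inline bool model_maybe_gup_pinned(const struct page_model *page)
{
	/* Single mapper, more references than us and the map? */
	return page->mapcount == 1 && page->refcount > 2;
}
```

A page with one mapping and two references passes through, a third
reference keeps it out of the swap cache, and pages with several mappers
are not covered by this check at all.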

But I really think that there are a number of other commits you're
missing too, because we had a whole series for THP fixes for the same
exact issue.

Added Peter Xu to the cc, because he probably tracked those issues
better than I did.

So NAK on this for now, I think this limited patch-set likely
introduces more problems than it fixes.

        Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] mm: reuse only-pte-mapped KSM page in do_wp_page()
  2021-04-01 18:17   ` Suren Baghdasaryan
  (?)
@ 2021-04-01 19:38   ` Greg KH
  2021-04-01 19:47       ` Suren Baghdasaryan
  -1 siblings, 1 reply; 45+ messages in thread
From: Greg KH @ 2021-04-01 19:38 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: stable, jannh, ktkhai, torvalds, shli, namit, linux-mm,
	linux-kernel, kernel-team, Yang Shi, Kirill A. Shutemov,
	Hugh Dickins, Andrea Arcangeli, Christian Koenig,
	Claudio Imbrenda, Rik van Riel, Huang Ying, Minchan Kim,
	Andrew Morton

On Thu, Apr 01, 2021 at 11:17:37AM -0700, Suren Baghdasaryan wrote:
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> Add an optimization for KSM pages almost in the same way that we have
> for ordinary anonymous pages.  If there is a write fault in a page,
> which is mapped to an only pte, and it is not related to swap cache; the
> page may be reused without copying its content.
> 
> [ Note that we do not consider PageSwapCache() pages at least for now,
>   since we don't want to complicate __get_ksm_page(), which has nice
>   optimization based on this (for the migration case). Currently it is
>   spinning on PageSwapCache() pages, waiting for when they have
>   unfreezed counters (i.e., for the migration finish). But we don't want
>   to make it also spinning on swap cache pages, which we try to reuse,
>   since there is not a very high probability to reuse them. So, for now
>   we do not consider PageSwapCache() pages at all. ]
> 
> So in reuse_ksm_page() we check for 1) PageSwapCache() and 2)
> page_stable_node(), to skip a page, which KSM is currently trying to
> link to stable tree.  Then we do page_ref_freeze() to prohibit KSM to
> merge one more page into the page, we are reusing.  After that, nobody
> can refer to the reusing page: KSM skips !PageSwapCache() pages with
> zero refcount; and the protection against all other participants is
> the same as for reused ordinary anon pages: pte lock, page lock and
> mmap_sem.
> 
> [akpm@linux-foundation.org: replace BUG_ON()s with WARN_ON()s]
> Link: http://lkml.kernel.org/r/154471491016.31352.1168978849911555609.stgit@localhost.localdomain
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  include/linux/ksm.h |  7 +++++++
>  mm/ksm.c            | 30 ++++++++++++++++++++++++++++--
>  mm/memory.c         | 16 ++++++++++++++--
>  3 files changed, 49 insertions(+), 4 deletions(-)

You forgot to put the git commit id of the upstream commit in here
somewhere so we can properly reference it and track it.

When/if you resend this, please add it to all of the commits.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-01 18:59   ` Linus Torvalds
@ 2021-04-01 19:43     ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 19:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, stable, Greg Kroah-Hartman, Jann Horn, Kirill Tkhai,
	Shaohua Li, Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team

On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > We received a report that the copy-on-write issue reported by Jann Horn in
> > https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> > reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> > coded in vmsplice.c).
>
> Gaah.
>
> > I confirmed this and also that the issue was not
> > reproducible with 5.10 kernel. I tracked the fix to the following patch
> > introduced in 5.9 which changes the do_wp_page() logic:
> >
> > 09854ba94c6a 'mm: do_wp_page() simplification'
>
> The problem here is that there's a _lot_ more patches than the few you
> found that fixed various other cases (THP etc).
>
> > I backported this patch (#2 in the series) along with 2 prerequisite patches
> > (#1 and #4) that keep the backports clean and two followup fixes to the main
> > patch (#3 and #5). I had to skip the following fix:
> >
> > feb889fb40fa 'mm: don't put pinned pages into the swap cache'
> >
> > because it uses page_maybe_dma_pinned() which does not exist in earlier
> > kernels. Because pin_user_pages() does not exist there as well, I *think*
> > we can safely skip this fix on older kernels, but I would appreciate if
> > someone could confirm that claim.
>
> Hmm. I think this means that swap activity can now break the
> connection to a GUP page (the whole pre-pinning model), but it
> probably isn't a new problem for 4.9/4.19.
>
> I suspect the test there should be something like
>
>         /* Single mapper, more references than us and the map? */
>         if (page_mapcount(page) == 1 && page_count(page) > 2)
>                 goto keep_locked;
>
> in the pre-pinning days.
>
> But I really think that there are a number of other commits you're
> missing too, because we had a whole series for THP fixes for the same
> exact issue.
>
> Added Peter Xu to the cc, because he probably tracked those issues
> better than I did.
>
> So NAK on this for now, I think this limited patch-set likely
> introduces more problems than it fixes.

Thanks for confirming my worries. I'll be happy to add additional
backports if Peter can point me to them.
Thanks,
Suren.

>
>         Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] mm: reuse only-pte-mapped KSM page in do_wp_page()
  2021-04-01 19:38   ` Greg KH
@ 2021-04-01 19:47       ` Suren Baghdasaryan
  0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-01 19:47 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, Jann Horn, Kirill Tkhai, Linus Torvalds, Shaohua Li,
	Nadav Amit, linux-mm, LKML, kernel-team, Yang Shi,
	Kirill A. Shutemov, Hugh Dickins, Andrea Arcangeli,
	Christian Koenig, Claudio Imbrenda, Rik van Riel, Huang Ying,
	Minchan Kim, Andrew Morton

On Thu, Apr 1, 2021 at 12:38 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Apr 01, 2021 at 11:17:37AM -0700, Suren Baghdasaryan wrote:
> > From: Kirill Tkhai <ktkhai@virtuozzo.com>
> >
> > Add an optimization for KSM pages almost in the same way that we have
> > for ordinary anonymous pages.  If there is a write fault in a page,
> > which is mapped to an only pte, and it is not related to swap cache; the
> > page may be reused without copying its content.
> >
> > [ Note that we do not consider PageSwapCache() pages at least for now,
> >   since we don't want to complicate __get_ksm_page(), which has nice
> >   optimization based on this (for the migration case). Currently it is
> >   spinning on PageSwapCache() pages, waiting for when they have
> >   unfrozen counters (i.e., for the migration finish). But we don't want
> >   to make it also spinning on swap cache pages, which we try to reuse,
> >   since there is not a very high probability to reuse them. So, for now
> >   we do not consider PageSwapCache() pages at all. ]
> >
> > So in reuse_ksm_page() we check for 1) PageSwapCache() and 2)
> > page_stable_node(), to skip a page which KSM is currently trying to
> > link to the stable tree.  Then we do page_ref_freeze() to prevent KSM
> > from merging one more page into the page we are reusing.  After that,
> > nobody can refer to the page being reused: KSM skips !PageSwapCache()
> > pages with zero refcount; and the protection against all other
> > participants is the same as for reused ordinary anon pages: pte lock,
> > page lock and mmap_sem.
> >
> > [akpm@linux-foundation.org: replace BUG_ON()s with WARN_ON()s]
> > Link: http://lkml.kernel.org/r/154471491016.31352.1168978849911555609.stgit@localhost.localdomain
> > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> > Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
> > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Huang Ying <ying.huang@intel.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > ---
> >  include/linux/ksm.h |  7 +++++++
> >  mm/ksm.c            | 30 ++++++++++++++++++++++++++++--
> >  mm/memory.c         | 16 ++++++++++++++--
> >  3 files changed, 49 insertions(+), 4 deletions(-)
>
> You forgot to put the git commit id of the upstream commit in here
> somewhere so we can properly reference it and track it.
>
> When/if you resend this, please add it to all of the commits.

Will do. Thanks!

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-01 19:43     ` Suren Baghdasaryan
  (?)
@ 2021-04-01 23:47     ` Peter Xu
  2021-04-02  0:12         ` Suren Baghdasaryan
  -1 siblings, 1 reply; 45+ messages in thread
From: Peter Xu @ 2021-04-01 23:47 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Linus Torvalds, stable, Greg Kroah-Hartman, Jann Horn,
	Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team

Hi, Suren,

On Thu, Apr 01, 2021 at 12:43:51PM -0700, Suren Baghdasaryan wrote:
> On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > We received a report that the copy-on-write issue reported by Jann Horn in
> > > https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> > > reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> > > coded in vmsplice.c).
> >
> > Gaah.
> >
> > > I confirmed this and also that the issue was not
> > > reproducible with 5.10 kernel. I tracked the fix to the following patch
> > > introduced in 5.9 which changes the do_wp_page() logic:
> > >
> > > 09854ba94c6a 'mm: do_wp_page() simplification'
> >
> > The problem here is that there's a _lot_ more patches than the few you
> > found that fixed various other cases (THP etc).
> >
> > > I backported this patch (#2 in the series) along with 2 prerequisite patches
> > > (#1 and #4) that keep the backports clean and two followup fixes to the main
> > > patch (#3 and #5). I had to skip the following fix:
> > >
> > > feb889fb40fa 'mm: don't put pinned pages into the swap cache'
> > >
> > > > because it uses page_maybe_dma_pinned() which does not exist in earlier
> > > > kernels. Because pin_user_pages() does not exist there either, I *think*
> > > > we can safely skip this fix on older kernels, but I would appreciate it if
> > > > someone could confirm that claim.
> >
> > Hmm. I think this means that swap activity can now break the
> > connection to a GUP page (the whole pre-pinning model), but it
> > probably isn't a new problem for 4.9/4.19.
> >
> > I suspect the test there should be something like
> >
> >         /* Single mapper, more references than us and the map? */
> >         if (page_mapcount(page) == 1 && page_count(page) > 2)
> >                 goto keep_locked;
> >
> > in the pre-pinning days.
> >
> > But I really think that there are a number of other commits you're
> > missing too, because we had a whole series for THP fixes for the same
> > exact issue.
> >
> > Added Peter Xu to the cc, because he probably tracked those issues
> > better than I did.
> >
> > So NAK on this for now, I think this limited patch-set likely
> > introduces more problems than it fixes.
> 
> Thanks for confirming my worries. I'll be happy to add additional
> backports if Peter can point me to them.

For full alignment with current upstream, I can at least think of the series
below:

Early cow for general pages:
https://lore.kernel.org/lkml/20200925222600.6832-1-peterx@redhat.com/

A race fix for copy_page and gup-fast:
https://lore.kernel.org/linux-mm/0-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com/

Early cow for hugetlbfs (which is very recent):
https://lore.kernel.org/lkml/20210217233547.93892-1-peterx@redhat.com/

But I believe they'll bring in a number of dependencies too, like the page
pinning work, so it doesn't seem easy.

Btw, AFAICT you don't need patch 4/5 in this series for 4.14/4.19, since
those are only for uffd-wp, which doesn't exist until 5.7.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-01 23:47     ` Peter Xu
@ 2021-04-02  0:12         ` Suren Baghdasaryan
  0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-02  0:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, stable, Greg Kroah-Hartman, Jann Horn,
	Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team

On Thu, Apr 1, 2021 at 4:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Thu, Apr 01, 2021 at 12:43:51PM -0700, Suren Baghdasaryan wrote:
> > On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > > We received a report that the copy-on-write issue reported by Jann Horn in
> > > > https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> > > > reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> > > > coded in vmsplice.c).
> > >
> > > Gaah.
> > >
> > > > I confirmed this and also that the issue was not
> > > > reproducible with 5.10 kernel. I tracked the fix to the following patch
> > > > introduced in 5.9 which changes the do_wp_page() logic:
> > > >
> > > > 09854ba94c6a 'mm: do_wp_page() simplification'
> > >
> > > The problem here is that there's a _lot_ more patches than the few you
> > > found that fixed various other cases (THP etc).
> > >
> > > > I backported this patch (#2 in the series) along with 2 prerequisite patches
> > > > (#1 and #4) that keep the backports clean and two followup fixes to the main
> > > > patch (#3 and #5). I had to skip the following fix:
> > > >
> > > > feb889fb40fa 'mm: don't put pinned pages into the swap cache'
> > > >
> > > > > because it uses page_maybe_dma_pinned() which does not exist in earlier
> > > > > kernels. Because pin_user_pages() does not exist there either, I *think*
> > > > > we can safely skip this fix on older kernels, but I would appreciate it if
> > > > > someone could confirm that claim.
> > >
> > > Hmm. I think this means that swap activity can now break the
> > > connection to a GUP page (the whole pre-pinning model), but it
> > > probably isn't a new problem for 4.9/4.19.
> > >
> > > I suspect the test there should be something like
> > >
> > >         /* Single mapper, more references than us and the map? */
> > >         if (page_mapcount(page) == 1 && page_count(page) > 2)
> > >                 goto keep_locked;
> > >
> > > in the pre-pinning days.
> > >
> > > But I really think that there are a number of other commits you're
> > > missing too, because we had a whole series for THP fixes for the same
> > > exact issue.
> > >
> > > Added Peter Xu to the cc, because he probably tracked those issues
> > > better than I did.
> > >
> > > So NAK on this for now, I think this limited patch-set likely
> > > introduces more problems than it fixes.
> >
> > Thanks for confirming my worries. I'll be happy to add additional
> > backports if Peter can point me to them.
>
> For full alignment with current upstream, I can at least think of the series
> below:
>
> Early cow for general pages:
> https://lore.kernel.org/lkml/20200925222600.6832-1-peterx@redhat.com/
>
> A race fix for copy_page and gup-fast:
> https://lore.kernel.org/linux-mm/0-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com/
>
> Early cow for hugetlbfs (which is very recent):
> https://lore.kernel.org/lkml/20210217233547.93892-1-peterx@redhat.com/
>
> But I believe they'll bring in a number of dependencies too, like the page
> pinning work, so it doesn't seem easy.

Thanks Peter. Let me try backporting these and I'll see if it's doable.

>
> Btw, AFAICT you don't need patch 4/5 in this series for 4.14/4.19, since
> those are only for uffd-wp, which doesn't exist until 5.7.

Got it. Will drop it from the next series.
Thanks,
Suren.

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-01 18:59   ` Linus Torvalds
  (?)
  (?)
@ 2021-04-07 13:21   ` Vlastimil Babka
  2021-04-07 14:30     ` Peter Xu
  2021-04-07 16:07       ` Linus Torvalds
  -1 siblings, 2 replies; 45+ messages in thread
From: Vlastimil Babka @ 2021-04-07 13:21 UTC (permalink / raw)
  To: Linus Torvalds, Suren Baghdasaryan, Peter Xu
  Cc: stable, Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On 4/1/21 8:59 PM, Linus Torvalds wrote:
> On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:

Thanks Suren for bringing this up!

>> We received a report that the copy-on-write issue reported by Jann Horn in
>> https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
>> reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
>> coded in vmsplice.c).

Note that even upstream AFAIK still has the issue unfixed when Jann's reproducer
is converted to use THPs, as Andrea has shown in
https://lore.kernel.org/linux-mm/X%2FjgLGPgPb+Xms1t@redhat.com/

> Gaah.
> 
>> I confirmed this and also that the issue was not
>> reproducible with 5.10 kernel. I tracked the fix to the following patch
>> introduced in 5.9 which changes the do_wp_page() logic:
>>
>> 09854ba94c6a 'mm: do_wp_page() simplification'
> 
> The problem here is that there's a _lot_ more patches than the few you
> found that fixed various other cases (THP etc).
> 
>> I backported this patch (#2 in the series) along with 2 prerequisite patches
>> (#1 and #4) that keep the backports clean and two followup fixes to the main
>> patch (#3 and #5). I had to skip the following fix:
>>
>> feb889fb40fa 'mm: don't put pinned pages into the swap cache'
>>
>> because it uses page_maybe_dma_pinned() which does not exist in earlier
>> kernels. Because pin_user_pages() does not exist there either, I *think*
>> we can safely skip this fix on older kernels, but I would appreciate it if
>> someone could confirm that claim.
> 
> Hmm. I think this means that swap activity can now break the
> connection to a GUP page (the whole pre-pinning model), but it
> probably isn't a new problem for 4.9/4.19.
> 
> I suspect the test there should be something like
> 
>         /* Single mapper, more references than us and the map? */
>         if (page_mapcount(page) == 1 && page_count(page) > 2)
>                 goto keep_locked;
> 
> in the pre-pinning days.
> 
> But I really think that there are a number of other commits you're
> missing too, because we had a whole series for THP fixes for the same
> exact issue.
> 
> Added Peter Xu to the cc, because he probably tracked those issues
> better than I did.

Let me shamelessly plug these links to illustrate what kind of minefield we
would be entering by backporting this. They also serve as a reference for what
not to miss, and what may still become broken afterwards:

https://lwn.net/Articles/849638/
https://lwn.net/Articles/849876/

> So NAK on this for now, I think this limited patch-set likely
> introduces more problems than it fixes.

I personally think there are only two options safe enough for stable backports
(so that no more harm is caused than is actually prevented).

1) Ignore the issue (outside of Android at least). The security model of zygote
is unusual. Where else does a parent of fork() not trust the child, which is the
same binary?

BTW, I think the CVE description is very misleading:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-29374

"does not properly consider the semantics of read operations and therefore can
grant unintended write access" - the bug was never about an unintended write
access, but about an info leak from parent to child, no?

2) For backports, go with the original approach of 17839856fd58 ("gup: document
and work around "COW can break either way" issue"), and thus break COW during
GUP, but only for vmsplice() so that nothing else gets broken. I think 5.4 stable
(another LTS) actually backported only 17839856fd58 out of everything else, so
it should have even the THP case covered, but its userfaultfd() is now probably
broken...
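
To make the difference between the approaches concrete, the leak scenario can
be replayed in a userspace toy model. This is a sketch under heavy
assumptions: struct toy_page, enum toy_policy and toy_pipe_byte() are
hypothetical names, the page is reduced to two counters and one byte, and
each policy is a simplification of the commit named in its comment rather
than the actual kernel logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: just the counters and one byte of content. Not struct page. */
struct toy_page {
	int mapcount;	/* page-table mappings */
	int refcount;	/* mappings plus other holders, e.g. a pipe buffer */
	char data;
};

enum toy_policy {
	TOY_REUSE_IF_SOLE_MAPPER,	/* simplified pre-5.9 do_wp_page() */
	TOY_REUSE_IF_SOLE_REF,		/* simplified 09854ba94c6a */
	TOY_BREAK_COW_ON_GUP,		/* simplified 17839856fd58 */
};

/*
 * Replays: fork(); child vmsplice()s the page into a pipe; child munmap()s;
 * parent writes new_byte. Returns the byte a reader of the pipe then sees.
 */
static char toy_pipe_byte(enum toy_policy policy, char old_byte, char new_byte)
{
	struct toy_page shared = { .mapcount = 2, .refcount = 2, .data = old_byte };
	struct toy_page child_copy = { .mapcount = 0, .refcount = 0, .data = 0 };
	struct toy_page *child_page = &shared;
	bool reuse;

	if (policy == TOY_BREAK_COW_ON_GUP) {
		/* GUP behaves like a write fault: the child gets a private copy. */
		child_copy = (struct toy_page){ .mapcount = 1, .refcount = 1,
						.data = shared.data };
		shared.mapcount--;
		shared.refcount--;
		child_page = &child_copy;
	}
	child_page->refcount++;		/* vmsplice(): the pipe holds a reference */

	child_page->mapcount--;		/* munmap(): the mapping goes away ... */
	child_page->refcount--;		/* ... the pipe's reference stays */

	/* Parent write fault on its still write-protected mapping of 'shared'. */
	if (policy == TOY_REUSE_IF_SOLE_REF)
		reuse = shared.refcount == 1;
	else
		reuse = shared.mapcount == 1;
	if (reuse)
		shared.data = new_byte;	/* written in place */
	/* Otherwise the parent writes to a fresh copy and 'shared' is untouched. */

	assert(child_page->refcount >= 1);	/* the pipe still pins its page */
	return child_page->data;
}
```

In this model only TOY_REUSE_IF_SOLE_MAPPER lets the pipe reader observe the
parent's new byte. The other two policies avoid that in different places: one
at the parent's write fault, the other at the child's GUP.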

>         Linus
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 13:21   ` Vlastimil Babka
@ 2021-04-07 14:30     ` Peter Xu
  2021-04-07 16:07       ` Linus Torvalds
  1 sibling, 0 replies; 45+ messages in thread
From: Peter Xu @ 2021-04-07 14:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, Suren Baghdasaryan, stable, Greg Kroah-Hartman,
	Jann Horn, Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team, Andrea Arcangeli,
	David Hildenbrand, Jason Gunthorpe

On Wed, Apr 07, 2021 at 03:21:55PM +0200, Vlastimil Babka wrote:
> 2) For backports, go with the original approach of 17839856fd58 ("gup: document
> and work around "COW can break either way" issue"), and thus break COW during
> GUP, but only for vmsplice() so that nothing else gets broken. I think 5.4 stable
> (another LTS) actually backported only 17839856fd58 out of everything else, so
> it should have even the THP case covered, but its userfaultfd() is now probably
> broken...

Since you mentioned this approach: AFAIU userfaultfd was only broken because,
with that approach, the kernel treats some read accesses as writes, while
userfaultfd needs accurate read/write resolution.  Adding something like
FOLL_BREAK_COW [1] on top of 17839856fd58 should keep the vmsplice issue fixed
while also keeping uffd working, since that keeps read and write operations
separate.

Meanwhile, I know Andrea was actively working on a complete solution [2] that
goes a few steps further.  E.g., FOLL_BREAK_COW is done with FOLL_UNSHARE [3],
there is a speed-up in the COW path [4] with an idea similar to what latest
upstream does in 09854ba94c6aad7, write-protecting pinned pages is allowed
(which is right now forbidden), and more.  However, that's definitely a huge
branch, and it is still under discussion upstream (or maybe the discussion
stopped quite some days ago?).

Neither of the above is upstream, so I don't really know whether this
information is useful; I'm just raising it.  If Android could drop
userfaultfd, then I think solution 2) above is indeed the most efficient.  Note
that I think only uffd-wp was affected by 17839856fd58, not the "missing
mode", so if Android is only using missing mode it still looks fine to have
only 17839856fd58.  It's just that I remember there was another report besides
uffd-wp on 17839856fd58, but I can't remember the details of that report.

Thanks,

[1] https://lkml.org/lkml/2020/8/10/439
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=mapcount_deshare
[3] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=7c3a31caa34ac6ac4a4ec0559b1307b5edfc0821
[4] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=599aa62474f51a470408b28fd4365320a5357aca

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 13:21   ` Vlastimil Babka
@ 2021-04-07 16:07       ` Linus Torvalds
  2021-04-07 16:07       ` Linus Torvalds
  1 sibling, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2021-04-07 16:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Peter Xu, stable, Greg Kroah-Hartman,
	Jann Horn, Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team, Andrea Arcangeli,
	David Hildenbrand, Jason Gunthorpe

On Wed, Apr 7, 2021 at 6:22 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> 1) Ignore the issue (outside of Android at least). The security model of zygote
> is unusual. Where else does a parent of fork() not trust the child, which is the
> same binary?

Agreed. I think this is basically an android-only issue (with
_possibly_ some impact on crazy "pin-and-fork" loads), and doesn't
necessarily merit a backport at all.

If Android people insist on using very old kernels, knowing that they
do things that are questionable with those old kernels, at some point
it's just _their_ problem.

                 Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 16:07       ` Linus Torvalds
@ 2021-04-07 16:33         ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-07 16:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vlastimil Babka, Peter Xu, stable, Greg Kroah-Hartman, Jann Horn,
	Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team, Andrea Arcangeli,
	David Hildenbrand, Jason Gunthorpe

On Wed, Apr 7, 2021 at 9:07 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Apr 7, 2021 at 6:22 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > 1) Ignore the issue (outside of Android at least). The security model of zygote
> > is unusual. Where else does a parent of fork() not trust the child, which is the
> > same binary?
>
> Agreed. I think this is basically an android-only issue (with
> _possibly_ some impact on crazy "pin-and-fork" loads), and doesn't
> necessarily merit a backport at all.
>
> If Android people insist on using very old kernels, knowing that they
> do things that are questionable with those old kernels, at some point
> it's just _their_ problem.

We don't really insist on using old kernels; rather, we are stuck
with them for some time.
Trying my hand at backporting the patchsets Peter mentioned proved
this to be far from easy, with many dependencies. Let me look into
Vlastimil's suggestion to backport only 17839856fd58; it sounds
like 5.4 already followed that path. Thanks for all the information!
Suren.

>
>                  Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 16:33         ` Suren Baghdasaryan
@ 2021-04-07 17:04           ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2021-04-07 17:04 UTC (permalink / raw)
  To: Suren Baghdasaryan, Mikulas Patocka
  Cc: Vlastimil Babka, Peter Xu, stable, Greg Kroah-Hartman, Jann Horn,
	Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team, Andrea Arcangeli,
	David Hildenbrand, Jason Gunthorpe

On Wed, Apr 7, 2021 at 9:33 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Trying my hand at backporting the patchsets Peter mentioned proved
> this to be far from easy with many dependencies. Let me look into
> Vlastimil's suggestion to backport only 17839856fd58 and it sounds
> like 5.4 already followed that path.

Well, in many ways 17839856fd58 was the "simple and obvious" fix, and
I do think it's easily backportable.

But it *did* cause problems too. Those problems may not be issues on
those old kernels, though.

In particular, commit 17839856fd58 caused uffd-wp to stop working
right, and it caused some issues with debugging (I forget the exact
details, but I think it was strace accessing PROT_NONE or write-only
pages or something like that, and COW failed).

But yes, in many ways that commit is a much simpler and more
straightforward one (which is why I tried it once - we ended up with
the much more subtle and far-reaching fixes after the UFFD issues
crept up).

The issues that 17839856fd58 caused may be entire non-events in old
kernels. In fact, the uffd writeprotect API was added fairly recently
(see commit 63b2d4174c4a that made it into v5.7), so the uffd-wp issue
that was triggered probably cannot happen in the old kernels.
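
Whether a given kernel exposes uffd-wp at all can also be checked at run
time instead of inferred from the version number. The following is a
minimal sketch, assuming a Linux host with the userfaultfd uapi header
installed. The helper name uffd_wp_supported() is hypothetical; the
userfaultfd syscall, the UFFDIO_API ioctl and the
UFFD_FEATURE_PAGEFAULT_FLAG_WP bit come from <linux/userfaultfd.h>. On
4.14 or 4.19 one would expect the bit to be absent, but that has not been
verified here.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the running kernel which userfaultfd
 * features it reports.
 * Returns  1 if write-protect tracking (uffd-wp) is reported,
 *          0 if userfaultfd works but uffd-wp is not reported,
 *         -1 if userfaultfd cannot be opened or the handshake fails
 *            (for example when unprivileged use is disabled).
 */
int uffd_wp_supported(void)
{
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int fd, rc;

	fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (fd < 0)
		return -1;

	/* With features == 0 the kernel fills in everything it supports. */
	rc = ioctl(fd, UFFDIO_API, &api);
	close(fd);
	if (rc < 0)
		return -1;

	return (api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP) ? 1 : 0;
}
```

A return value of -1 says nothing about uffd-wp itself, only that the
probe could not run in this environment.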

The strace issue might not be relevant either, but I forget what the
details were. Mikulas should know.

See

  https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/

for Mikulas's report. I never looked into it in detail, because by then
the uffd-wp issue had already come up, so it was just another nail in
the coffin for that simpler approach.

Mikulas, do you remember?

            Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 17:04           ` Linus Torvalds
  (?)
@ 2021-04-07 18:47           ` Mikulas Patocka
  2021-04-07 19:22               ` Linus Torvalds
  -1 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2021-04-07 18:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Suren Baghdasaryan, Vlastimil Babka, Peter Xu, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe



On Wed, 7 Apr 2021, Linus Torvalds wrote:

> On Wed, Apr 7, 2021 at 9:33 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > Trying my hand at backporting the patchsets Peter mentioned proved
> > this to be far from easy with many dependencies. Let me look into
> > Vlastimil's suggestion to backport only 17839856fd58 and it sounds
> > like 5.4 already followed that path.
> 
> Well, in many ways 17839856fd58 was the "simple and obvious" fix, and
> I do think it's easily backportable.
> 
> But it *did* cause problems too. Those problems may not be issues on
> those old kernels, though.
> 
> In particular, commit 17839856fd58 caused uffd-wp to stop working
> right, and it caused some issues with debugging (I forget the exact
> details, but I think it was strace accessing PROT_NONE or write-only
> pages or something like that, and COW failed).
> 
> But yes, in many ways that commit is a much simpler and more
> straightforward one (which is why I tried it once - we ended up with
> the much more subtle and far-reaching fixes after the UFFD issues
> crept up).
> 
> The issues that 17839856fd58 caused may be entire non-events in old
> kernels. In fact, the uffd writeprotect API was added fairly recently
> (see commit 63b2d4174c4a that made it into v5.7), so the uffd-wp issue
> that was triggered probably cannot happen in the old kernels.
> 
> The strace issue might not be relevant either, but I forget what the
> details were. Mikulas should know.
> 
> See
> 
>   https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/
> 
> for Mikulas's report. I never looked into it in detail, because by then
> the uffd-wp issue had already come up, so it was just another nail in
> the coffin for that simpler approach.
> 
> Mikulas, do you remember?
> 
>             Linus

Hi

I think that we never found a root cause for this bug. I was testing if 
the whole system can run from persistent memory and found out that strace 
didn't work. I bisected it, reported it and when I received Peter Xu's 
patches (which fixed it), I stopped bothering about it.

So, we fixed it, but we don't know why.

Peter Xu's patchset that fixed it is here: 
https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 18:47           ` Mikulas Patocka
@ 2021-04-07 19:22               ` Linus Torvalds
  0 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2021-04-07 19:22 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Suren Baghdasaryan, Vlastimil Babka, Peter Xu, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> So, we fixed it, but we don't know why.
>
> Peter Xu's patchset that fixed it is here:
> https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/

Yeah, that's the part that ends up being really painful to backport
(with all the subsequent fixes too), so the 4.14 people would prefer
to avoid it.

But I think that if it's a "requires dax pmem and ptrace on top", it
may simply be a non-issue for those users. Although who knows - maybe
that ends up being a real issue on Android..

            Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 19:22               ` Linus Torvalds
@ 2021-04-07 21:53                 ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-07 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mikulas Patocka, Vlastimil Babka, Peter Xu, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > So, we fixed it, but we don't know why.
> >
> > Peter Xu's patchset that fixed it is here:
> > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
>
> Yeah, that's the part that ends up being really painful to backport
> (with all the subsequent fixes too), so the 4.14 people would prefer
> to avoid it.
>
> But I think that if it's a "requires dax pmem and ptrace on top", it
> may simply be a non-issue for those users. Although who knows - maybe
> that ends up being a real issue on Android..

A lot to digest, so I need to do some reading now. Thanks everyone!

>
>             Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-07 21:53                 ` Suren Baghdasaryan
@ 2021-04-21 20:01                   ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-21 20:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mikulas Patocka, Vlastimil Babka, Peter Xu, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On Wed, Apr 7, 2021 at 2:53 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > So, we fixed it, but we don't know why.
> > >
> > > Peter Xu's patchset that fixed it is here:
> > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
> >
> > Yeah, that's the part that ends up being really painful to backport
> > (with all the subsequent fixes too), so the 4.14 people would prefer
> > to avoid it.
> >
> > But I think that if it's a "requires dax pmem and ptrace on top", it
> > may simply be a non-issue for those users. Although who knows - maybe
> > that ends up being a real issue on Android..
>
> A lot to digest, so I need to do some reading now. Thanks everyone!

After a delay due to vacation I prepared backports of 17839856fd58
("gup: document and work around "COW can break either way" issue") for
4.14 and 4.19 kernels. As Linus pointed out, uffd-wp was introduced
later in 5.7, so is not an issue for 4.x kernels. The issue with THPs
is still unresolved, so with or without this patch it's still there
(Android is not affected by this since we do not use THPs with older
kernels).
Andrea pointed out that there are other issues and to properly fix
them his COR approach is needed. However it has not been accepted yet,
so I can't really backport it. I'll be happy to do that though if it
is accepted in the future.

Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
distinguish real writes vs enforced COW read requests, however I also
see that you had a later version of this patch here:
https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
backport? Or is it not needed in the absence of uffd-wp support in the
earlier kernels?
Thanks,
Suren.

>
> >
> >             Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-21 20:01                   ` Suren Baghdasaryan
  (?)
@ 2021-04-21 21:05                   ` Peter Xu
  2021-04-21 21:17                       ` Suren Baghdasaryan
  -1 siblings, 1 reply; 45+ messages in thread
From: Peter Xu @ 2021-04-21 21:05 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Linus Torvalds, Mikulas Patocka, Vlastimil Babka, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

Hi, Suren,

On Wed, Apr 21, 2021 at 01:01:34PM -0700, Suren Baghdasaryan wrote:
> Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> distinguish real writes vs enforced COW read requests, however I also
> see that you had a later version of this patch here:
> https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> backport? Or is it not needed in the absence of uffd-wp support in the
> earlier kernels?

Sorry, I am not able to evaluate the rest... but according to Linus's
previous reply, my understanding is that it is not needed, not to mention that
it is not upstream either.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-21 21:05                   ` Peter Xu
@ 2021-04-21 21:17                       ` Suren Baghdasaryan
  0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-21 21:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, Mikulas Patocka, Vlastimil Babka, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On Wed, Apr 21, 2021 at 2:05 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Wed, Apr 21, 2021 at 01:01:34PM -0700, Suren Baghdasaryan wrote:
> > Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> > distinguish real writes vs enforced COW read requests, however I also
> > see that you had a later version of this patch here:
> > https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> > backport? Or is it not needed in the absence of uffd-wp support in the
> > earlier kernels?
>
> Sorry, I am not able to evaluate the rest... but according to Linus's
> previous reply, my understanding is that it is not needed, not to mention that
> it is not upstream either.

Thanks! Then I'll send the backports for 17839856fd58 alone and if
more backports are needed I'll post followup patches.
Cheers,
Suren.

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-21 20:01                   ` Suren Baghdasaryan
  (?)
  (?)
@ 2021-04-21 22:59                   ` Vlastimil Babka
  2021-04-21 23:05                       ` Suren Baghdasaryan
  -1 siblings, 1 reply; 45+ messages in thread
From: Vlastimil Babka @ 2021-04-21 22:59 UTC (permalink / raw)
  To: Suren Baghdasaryan, Linus Torvalds
  Cc: Mikulas Patocka, Peter Xu, stable, Greg Kroah-Hartman, Jann Horn,
	Kirill Tkhai, Shaohua Li, Nadav Amit, Linux-MM,
	Linux Kernel Mailing List, Android Kernel Team, Andrea Arcangeli,
	David Hildenbrand, Jason Gunthorpe

On 4/21/21 10:01 PM, Suren Baghdasaryan wrote:
> On Wed, Apr 7, 2021 at 2:53 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>> > >
>> > > So, we fixed it, but we don't know why.
>> > >
>> > > Peter Xu's patchset that fixed it is here:
>> > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
>> >
>> > Yeah, that's the part that ends up being really painful to backport
>> > (with all the subsequent fixes too), so the 4.14 people would prefer
>> > to avoid it.
>> >
>> > But I think that if it's a "requires dax pmem and ptrace on top", it
>> > may simply be a non-issue for those users. Although who knows - maybe
>> > that ends up being a real issue on Android..
>>
>> A lot to digest, so I need to do some reading now. Thanks everyone!
> 
> After a delay due to vacation I prepared backports of 17839856fd58
> ("gup: document and work around "COW can break either way" issue") for
> 4.14 and 4.19 kernels. As Linus pointed out, uffd-wp was introduced
> later in 5.7, so is not an issue for 4.x kernels. The issue with THPs
> is still unresolved, so with or without this patch it's still there
> (Android is not affected by this since we do not use THPs with older
> kernels).

Which THP issue do you mean here? The race that was part of the same Project
Zero report and was solved by a different patch adding some locking? Or the
vmsplice info leak, but applied to THPs? Because if it's the latter, then I
believe 17839856fd58 did solve that too. It was the later switch of approach to
rely just on page_count() that left the THP side unfixed.
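
For readers who have not seen the vmsplice scenario, here is a minimal
sketch of it, assuming 4 KiB pages and an ordinary (non-THP) anonymous
page. It is an illustration written for this thread, not the vmsplice.c
reproducer from the Project Zero report, and the function name
cow_after_fork_probe() is hypothetical. The child hands the still-shared
page to a pipe and unmaps it; the parent then writes to its own mapping.
If the child later reads the parent's new data out of the pipe, the
parent's write reused the page instead of copying it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAGE_SZ 4096	/* assumption: 4 KiB pages */

/*
 * Hypothetical probe.
 * Returns 1 if the child saw the parent's post-fork write (leak),
 *         0 if the child still saw the pre-fork contents,
 *        -1 if the experiment could not be set up.
 */
int cow_after_fork_probe(void)
{
	int ready[2], go[2], status, ok = 0;
	char c = 0;
	char *page = mmap(NULL, PAGE_SZ, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (page == MAP_FAILED || pipe(ready) || pipe(go))
		return -1;
	strcpy(page, "OLD");	/* fault the page in before fork() */

	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		int data[2];
		char buf[PAGE_SZ];
		struct iovec iov = { .iov_base = page, .iov_len = PAGE_SZ };

		close(ready[0]);
		close(go[1]);
		/* Give the page to a pipe, then drop the child's mapping. */
		if (pipe(data) || vmsplice(data[1], &iov, 1, 0) != PAGE_SZ)
			_exit(2);
		munmap(page, PAGE_SZ);
		/* Tell the parent we are ready, wait until it has written. */
		if (write(ready[1], "r", 1) != 1 || read(go[0], &c, 1) != 1)
			_exit(2);
		if (read(data[0], buf, PAGE_SZ) != PAGE_SZ)
			_exit(2);
		_exit(memcmp(buf, "OLD", 4) == 0 ? 0 : 1);
	}

	close(ready[1]);
	close(go[0]);
	if (read(ready[0], &c, 1) == 1) {
		strcpy(page, "NEW");	/* write fault in the parent */
		ok = (write(go[1], "g", 1) == 1);
	}
	close(go[1]);
	close(ready[0]);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || !ok)
		return -1;
	munmap(page, PAGE_SZ);
	return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

The result depends entirely on the kernel it runs on, so neither 0 nor 1
should be read as a statement about any particular stable tree without
testing there.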

> Andrea pointed out that there are other issues and to properly fix
> them his COR approach is needed. However it has not been accepted yet,
> so I can't really backport it. I'll be happy to do that though if it
> is accepted in the future.
> 
> Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> distinguish real writes vs enforced COW read requests, however I also
> see that you had a later version of this patch here:
> https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> backport? Or is it not needed in the absence of uffd-wp support in the
> earlier kernels?
> Thanks,
> Suren.
> 
>>
>> >
>> >             Linus
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-21 21:17                       ` Suren Baghdasaryan
@ 2021-04-21 23:01                         ` Suren Baghdasaryan
  -1 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-21 23:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, Mikulas Patocka, Vlastimil Babka, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On Wed, Apr 21, 2021 at 2:17 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Apr 21, 2021 at 2:05 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
> >
> > On Wed, Apr 21, 2021 at 01:01:34PM -0700, Suren Baghdasaryan wrote:
> > > Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> > > distinguish real writes vs enforced COW read requests, however I also
> > > see that you had a later version of this patch here:
> > > https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> > > backport? Or is it not needed in the absence of uffd-wp support in the
> > > earlier kernels?
> >
> > Sorry, I am not able to evaluate the rest... but according to Linus's
> > previous reply, my understanding is that it is not needed, not to mention that
> > it is not upstream either.
>
> Thanks! Then I'll send the backports for 17839856fd58 alone and if
> more backports are needed I'll post followup patches.

Posted 4.19 backport: https://lore.kernel.org/patchwork/patch/1416747
and 4.14 backport: https://lore.kernel.org/patchwork/patch/1416748.
They are identical but Greg asked me to submit separate patches due to
an additional minor merge conflict in 4.14.

> Cheers,
> Suren.
>
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue"
  2021-04-21 22:59                   ` Vlastimil Babka
@ 2021-04-21 23:05                       ` Suren Baghdasaryan
  0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2021-04-21 23:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, Mikulas Patocka, Peter Xu, stable,
	Greg Kroah-Hartman, Jann Horn, Kirill Tkhai, Shaohua Li,
	Nadav Amit, Linux-MM, Linux Kernel Mailing List,
	Android Kernel Team, Andrea Arcangeli, David Hildenbrand,
	Jason Gunthorpe

On Wed, Apr 21, 2021 at 3:59 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/21/21 10:01 PM, Suren Baghdasaryan wrote:
> > On Wed, Apr 7, 2021 at 2:53 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> >
> >> > On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >> > >
> >> > > So, we fixed it, but we don't know why.
> >> > >
> >> > > Peter Xu's patchset that fixed it is here:
> >> > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
> >> >
> >> > Yeah, that's the part that ends up being really painful to backport
> >> > (with all the subsequent fixes too), so the 4.14 people would prefer
> >> > to avoid it.
> >> >
> >> > But I think that if it's a "requires dax pmem and ptrace on top", it
> >> > may simply be a non-issue for those users. Although who knows - maybe
> >> > that ends up being a real issue on Android..
> >>
> >> A lot to digest, so I need to do some reading now. Thanks everyone!
> >
> > After a delay due to vacation I prepared backports of 17839856fd58
> > ("gup: document and work around "COW can break either way" issue") for
> > 4.14 and 4.19 kernels. As Linus pointed out, uffd-wp was introduced
> > later in 5.7, so is not an issue for 4.x kernels. The issue with THPs
> > is still unresolved, so with or without this patch it's still there
> > (Android is not affected by this since we do not use THPs with older
> > kernels).
>
> Which THP issue do you mean here? The race that was part of the same Project
> Zero report and was solved by a different patch adding some locking? Or the
> vmsplice info leak but applied to THPs? Because if it's the latter then I
> believe 17839856fd58 did solve that too. It was the later switch of approach to
> rely just on page_count() that left the THP side unfixed.

I meant the "vmsplice info leak applied to THPs", but now I realize
that 17839856fd58 does not rely on an elevated reference count, so indeed
that should not be a problem. Thanks for the note!

>
> > Andrea pointed out that there are other issues and to properly fix
> > them his COR approach is needed. However it has not been accepted yet,
> > so I can't really backport it. I'll be happy to do that though if it
> > is accepted in the future.
> >
> > Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> > distinguish real writes vs enforced COW read requests, however I also
> > see that you had a later version of this patch here:
> > https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> > backport? Or is it not needed in the absence of uffd-wp support in the
> > earlier kernels?
> > Thanks,
> > Suren.
> >
> >>
> >> >
> >> >             Linus
> >
>


Thread overview: 45+ messages
2021-04-01 18:17 [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue" Suren Baghdasaryan
2021-04-01 18:17 ` Suren Baghdasaryan
2021-04-01 18:17 ` [PATCH 1/5] mm: reuse only-pte-mapped KSM page in do_wp_page() Suren Baghdasaryan
2021-04-01 18:17   ` Suren Baghdasaryan
2021-04-01 19:38   ` Greg KH
2021-04-01 19:47     ` Suren Baghdasaryan
2021-04-01 19:47       ` Suren Baghdasaryan
2021-04-01 18:17 ` [PATCH 2/5] mm: do_wp_page() simplification Suren Baghdasaryan
2021-04-01 18:17   ` Suren Baghdasaryan
2021-04-01 18:17 ` [PATCH 3/5] mm: fix misplaced unlock_page in do_wp_page() Suren Baghdasaryan
2021-04-01 18:17   ` Suren Baghdasaryan
2021-04-01 18:17 ` [PATCH 4/5] userfaultfd: wp: add helper for writeprotect check Suren Baghdasaryan
2021-04-01 18:17   ` Suren Baghdasaryan
2021-04-01 18:17 ` [PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect Suren Baghdasaryan
2021-04-01 18:17   ` Suren Baghdasaryan
2021-04-01 18:59 ` [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue" Linus Torvalds
2021-04-01 18:59   ` Linus Torvalds
2021-04-01 19:43   ` Suren Baghdasaryan
2021-04-01 19:43     ` Suren Baghdasaryan
2021-04-01 23:47     ` Peter Xu
2021-04-02  0:12       ` Suren Baghdasaryan
2021-04-02  0:12         ` Suren Baghdasaryan
2021-04-07 13:21   ` Vlastimil Babka
2021-04-07 14:30     ` Peter Xu
2021-04-07 16:07     ` Linus Torvalds
2021-04-07 16:07       ` Linus Torvalds
2021-04-07 16:33       ` Suren Baghdasaryan
2021-04-07 16:33         ` Suren Baghdasaryan
2021-04-07 17:04         ` Linus Torvalds
2021-04-07 17:04           ` Linus Torvalds
2021-04-07 18:47           ` Mikulas Patocka
2021-04-07 19:22             ` Linus Torvalds
2021-04-07 19:22               ` Linus Torvalds
2021-04-07 21:53               ` Suren Baghdasaryan
2021-04-07 21:53                 ` Suren Baghdasaryan
2021-04-21 20:01                 ` Suren Baghdasaryan
2021-04-21 20:01                   ` Suren Baghdasaryan
2021-04-21 21:05                   ` Peter Xu
2021-04-21 21:17                     ` Suren Baghdasaryan
2021-04-21 21:17                       ` Suren Baghdasaryan
2021-04-21 23:01                       ` Suren Baghdasaryan
2021-04-21 23:01                         ` Suren Baghdasaryan
2021-04-21 22:59                   ` Vlastimil Babka
2021-04-21 23:05                     ` Suren Baghdasaryan
2021-04-21 23:05                       ` Suren Baghdasaryan
