* [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap
@ 2022-01-31 16:29 David Hildenbrand
  2022-01-31 16:29 ` [PATCH v3 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache David Hildenbrand
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand, Nadav Amit

mm: COW fixes part 1: fix the COW security issue for THP and swap

This series is the result of the discussion on the previous approach [1].
More information on the general COW issues can be found there.


This series attempts to optimize and streamline the COW logic for ordinary
anon pages and THP anon pages, fixing two remaining instances of
CVE-2020-29374 in do_swap_page() and do_huge_pmd_wp_page(): information can
leak from a parent process to a child process via anonymous pages shared
during fork().

This issue, including other related COW issues, has been summarized in [2]:
"
  1. Observing Memory Modifications of Private Pages From A Child Process

  Long story short: process-private memory might not be as private as you
  think once you fork(): successive modifications of private memory
  regions in the parent process can still be observed by the child
  process, for example, by smart use of vmsplice()+munmap().

  The core problem is that pinning pages readable in a child process, such
  as done via the vmsplice system call, can result in a child process
  observing memory modifications done in the parent process the child is
  not supposed to observe. [1] contains an excellent summary and [2]
  contains further details. This issue was assigned CVE-2020-29374 [9].

  For this to trigger, it's required to use a fork() without subsequent
  exec(), for example, as used under Android zygote. Without further
  details about an application that forks less-privileged child processes,
  one cannot really say what's actually affected and what's not -- see the
  details section at the end of this mail for a short sshd/openssh analysis.

  While commit 17839856fd58 ("gup: document and work around "COW can break
  either way" issue") fixed this issue, it resulted in other problems
  (e.g., ptrace on pmem), and commit 09854ba94c6a ("mm: do_wp_page()
  simplification") unfortunately re-introduced part of the problem.

  The original reproducer can be modified quite easily to use THP [3] and
  make the issue appear again on upstream kernels. I modified it to use
  hugetlb [4] and it triggers as well. The problem is certainly less
  severe with hugetlb than with THP; it merely highlights that we still
  have plenty of open holes we should be closing/fixing.

  Regarding vmsplice(), the only known workaround is to disallow the
  vmsplice() system call ... or disable THP and hugetlb. But who knows
  what else is affected (RDMA? O_DIRECT?) to achieve the same goal -- in
  the end, it's a more generic issue.
"

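The mechanics are easy to sketch in userspace. The following is a minimal,
hedged sketch of such a vmsplice()+munmap() reproducer -- sizes, timing and
all error handling are simplified, and the real reproducers referenced in
the summary above (including the THP/hugetlb variants) differ in detail.
On an affected kernel, the child can observe "new" even though it unmapped
the page before the parent modified it:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int fds[2];

	memcpy(page, "old", 4);
	pipe(fds);

	if (fork() == 0) {
		struct iovec iov = { .iov_base = page, .iov_len = 4096 };
		char buf[4096];

		/* Pin the COW-shared page R/O via the pipe, then unmap it. */
		vmsplice(fds[1], &iov, 1, 0);
		munmap(page, 4096);
		sleep(1);
		/* With the bug, this observes the parent's later write. */
		read(fds[0], buf, sizeof(buf));
		printf("child sees: %s\n", buf);
		_exit(0);
	}

	usleep(100 * 1000);
	memcpy(page, "new", 4);	/* parent write that should trigger COW */
	wait(NULL);
	return 0;
}
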
This security issue was first reported by Jann Horn on 27 May 2020 and it
currently affects anonymous pages during swapin, anonymous THP and hugetlb.
This series tackles anonymous pages during swapin and anonymous THP:
* do_swap_page() for handling COW on PTEs during swapin directly
* do_huge_pmd_wp_page() for handling COW on PMD-mapped THP during write
  faults

With this series, we'll apply the same COW logic we have in do_wp_page()
to all swappable anon pages: don't reuse (map writable) the page in
case there are additional references (page_count() != 1). All users of
reuse_swap_page() are removed, and consequently reuse_swap_page() itself is
removed.
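
In sketch form, the reuse rule this series converges on looks as follows.
This is illustrative only: the helper name is made up, and the real checks
live in do_wp_page(), do_swap_page() and do_huge_pmd_wp_page(), with
additional swapcache/LRU handling around them:

/* Hypothetical helper, for illustration only. */
static bool can_reuse_anon_page(struct page *page)
{
	/* PageKsm() doesn't necessarily raise the page refcount. */
	return !PageKsm(page) && page_count(page) == 1;
}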

In general, we're struggling with the following COW-related issues:
(1) "missed COW": we miss to copy on write and reuse the page (map it
    writable) although we must copy because there are pending references
    from another process to this page. The result is a security issue.
(2) "wrong COW": we copy on write although we wouldn't have to and
    shouldn't: if there are valid GUP references, they will become out of
    sync with the pages mapped into the page table. We fail to detect that
    such a page can be reused safely, especially if never more than a
    single process mapped the page. The result is intra-process
    memory corruption.
(3) "unnecessary COW": we copy on write although we wouldn't have to:
    the result can be performance degradation and temporarily increased
    swap+memory consumption.

While this series fixes (1) for swappable anon pages, it first tries to
reduce reported cases of (3) as far as easily possible, to limit the
impact of the streamlining. The individual patches try to describe in
which cases we will run into (3).

This series certainly makes (2) worse for THP, because a THP will now get
PTE-mapped on write faults if there are additional references, even if
there was only ever a single process involved: once PTE-mapped, we'll copy
each and every subpage and won't reuse any subpage as long as the
underlying compound page wasn't split.

I'm working on an approach to fix (2) and improve (3): PageAnonExclusive to
mark anon pages that are exclusive to a single process, allow GUP pins only
on such exclusive pages, and allow turning exclusive pages shared
(clearing PageAnonExclusive) only if there are no GUP pins. Anon pages with
PageAnonExclusive set never have to be copied during write faults, but
eventually during fork() if they cannot be turned shared. The improved
reuse logic in this series will essentially also be the logic to reset
PageAnonExclusive. This work will certainly take a while, but I'm planning
on sharing details before having code fully ready.


#1-#5 can be applied independently of the rest. #6-#9 are mostly only
cleanups related to reuse_swap_page().


Notes:
* For now, I'll leave hugetlb code untouched: "unnecessary COW" might
  easily break existing setups because hugetlb pages are a scarce resource
  and we could just end up having to crash the application when we run out
  of hugetlb pages. We have to be very careful and the security aspect with
  hugetlb is most certainly less relevant than for unprivileged anon pages.
* Instead of lru_add_drain() we might actually just drain the lru_add list
  or even just remove the single page of interest from the lru_add list.
  This would require a new helper function, and could be added if the
  conditional lru_add_drain() turns out to be a problem.
* I extended the test case already included in [1] to also test for the
  newly found do_swap_page() case. I'll send that out separately once/if
  this part has been merged.

[1] https://lkml.kernel.org/r/20211217113049.23850-1-david@redhat.com
[2] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com


RFC v2 -> v3:
* "mm: optimize do_wp_page() for exclusive pages in the swapcache"
 * Extend patch description
 * Add RB/Ack
* "mm: optimize do_wp_page() for fresh pages in local LRU pagevecs"
 * Simplify first early check, but keep second early check as is
 * Extend patch description to state why
* "mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()"
 * Remove conditional LRU pagevec draining and simplify
 * Extend patch description + comments
* "mm/khugepaged: remove reuse_swap_page() usage"
 * Remove the special swapcache handling instead
 * Update patch description

David Hildenbrand (9):
  mm: optimize do_wp_page() for exclusive pages in the swapcache
  mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
  mm: slightly clarify KSM logic in do_swap_page()
  mm: streamline COW logic in do_swap_page()
  mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
  mm/khugepaged: remove reuse_swap_page() usage
  mm/swapfile: remove stale reuse_swap_page()
  mm/huge_memory: remove stale page_trans_huge_mapcount()
  mm/huge_memory: remove stale locking logic from __split_huge_pmd()

 include/linux/mm.h                 |   5 --
 include/linux/swap.h               |   4 -
 include/trace/events/huge_memory.h |   1 -
 mm/huge_memory.c                   |  93 +++-------------------
 mm/khugepaged.c                    |  11 ---
 mm/memory.c                        | 121 +++++++++++++++++++++--------
 mm/swapfile.c                      | 104 -------------------------
 7 files changed, 98 insertions(+), 241 deletions(-)


base-commit: 26291c54e111ff6ba87a164d85d4a4e134b7315c
-- 
2.34.1



* [PATCH v3 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-01-31 16:29 ` [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs David Hildenbrand
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand, Nadav Amit

Liang Zhang reported [1] that the current COW logic in do_wp_page() is
sub-optimal when it comes to swap+read fault+write fault of anonymous
pages that have a single user, visible via a performance degradation in
the redis benchmark. Something similar was previously reported [2] by
Nadav with a simple reproducer.

After we put an anon page into the swapcache and unmapped it from a single
process, that process might read that page again and refault it read-only.
If that process then writes to that page, the process is actually the
exclusive user of the page; however, the COW logic in do_wp_page() won't be
able to reuse it due to the additional reference from the swapcache.

Let's optimize for pages that have been added to the swapcache but only
have an exclusive user. Try removing the swapcache reference if there is
hope that we're the exclusive user.

We will fail removing the swapcache reference in two scenarios:
(1) There are additional swap entries referencing the page: copying
    instead of reusing is the right thing to do.
(2) The page is under writeback: theoretically we might be able to reuse
    in some cases, however, we cannot remove the additional reference
    and will have to copy.
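
For reference, this is roughly what try_to_free_swap() checks and why it
fails in exactly these two scenarios (condensed from mm/swapfile.c at the
time of this series; the hibernation corner case is omitted):

int try_to_free_swap(struct page *page)
{
	VM_BUG_ON_PAGE(!PageLocked(page), page);

	if (!PageSwapCache(page))
		return 0;
	if (PageWriteback(page))	/* scenario (2): cannot remove */
		return 0;
	if (page_swapped(page))		/* scenario (1): swap entries remain */
		return 0;

	page = compound_head(page);
	delete_from_swap_cache(page);
	SetPageDirty(page);
	return 1;
}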

Note that we'll only try removing the page from the swapcache when it's
highly likely that we'll be the exclusive owner afterwards. As we're about
to map that page writable and redirty it, that should not affect reclaim
but is rather the right thing to do.

Further, we might have additional references from the LRU pagevecs,
which will force us to copy instead of being able to reuse. We'll try
handling such references for some scenarios next. Concurrent writeback
cannot be handled easily and we'll always have to copy.

While at it, remove the superfluous page_mapcount() check: it's
implicitly covered by the page_count() for ordinary anon pages.

[1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com
[2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

Reported-by: Liang Zhang <zhangliang5@huawei.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..bcd3b7c50891 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3291,19 +3291,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	if (PageAnon(vmf->page)) {
 		struct page *page = vmf->page;
 
-		/* PageKsm() doesn't necessarily raise the page refcount */
-		if (PageKsm(page) || page_count(page) != 1)
+		/*
+		 * We have to verify under page lock: these early checks are
+		 * just an optimization to avoid locking the page and freeing
+		 * the swapcache if there is little hope that we can reuse.
+		 *
+		 * PageKsm() doesn't necessarily raise the page refcount.
+		 */
+		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;
-		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+		if (PageSwapCache(page))
+			try_to_free_swap(page);
+		if (PageKsm(page) || page_count(page) != 1) {
 			unlock_page(page);
 			goto copy;
 		}
 		/*
-		 * Ok, we've got the only map reference, and the only
-		 * page count reference, and the page is locked,
-		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 * Ok, we've got the only page reference from our mapping
+		 * and the page is locked, it's dark out, and we're wearing
+		 * sunglasses. Hit it.
 		 */
 		unlock_page(page);
 		wp_page_reuse(vmf);
-- 
2.34.1



* [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
  2022-01-31 16:29 ` [PATCH v3 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-03-09 17:53   ` Vlastimil Babka
  2022-01-31 16:29 ` [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page() David Hildenbrand
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

For example, if a page just got swapped in via a read fault, the LRU
pagevecs might still hold a reference to the page. If we trigger a
write fault on such a page, the additional reference from the LRU
pagevecs will prohibit reusing the page.

Let's conditionally drain the local LRU pagevecs when we stumble over a
!PageLRU() page. We cannot easily drain remote LRU pagevecs and it might
not be desirable performance-wise. Consequently, this will only avoid
copying in some cases.

Add a simple "page_count(page) > 3" check first: one reference from our
mapping, at most one from the swapcache, and at most one from a local
LRU pagevec. Keep the "page_count(page) > 1 + PageSwapCache(page)" check
in place after draining, as we want to minimize cases where we remove a
page from the swapcache but won't be able to reuse it, for example,
because another process has it mapped R/O, so as not to affect reclaim.

We cannot easily handle the following cases and we will always have to
copy:

(1) The page is referenced in the LRU pagevecs of other CPUs. We really
    would have to drain the LRU pagevecs of all CPUs -- most probably
    copying is much cheaper.

(2) The page is already PageLRU() but is getting moved between LRU
    lists, for example, for activation (e.g., mark_page_accessed()),
    deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
    drain mostly unconditionally, which might be bad performance-wise.
    Most probably this won't happen too often in practice.

Note that there are other reasons why an anon page might temporarily not
be PageLRU(): for example, compaction and migration have to isolate LRU
pages from the LRU lists first (isolate_lru_page()), moving them to
temporary local lists and clearing PageLRU() and holding an additional
reference on the page. In that case, we'll always copy.

This change seems to be fairly effective with the reproducer [1] shared
by Nadav, as long as writeback is done synchronously, for example, using
zram. However, with asynchronous writeback, we'll usually fail to free the
swapcache because the page is still under writeback: something we cannot
easily optimize for, and maybe it's not really relevant in practice.

[1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index bcd3b7c50891..923165b4c27e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3298,7 +3298,15 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		 *
 		 * PageKsm() doesn't necessarily raise the page refcount.
 		 */
-		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
+		if (PageKsm(page) || page_count(page) > 3)
+			goto copy;
+		if (!PageLRU(page))
+			/*
+			 * Note: We cannot easily detect+handle references from
+			 * remote LRU pagevecs or references to PageLRU() pages.
+			 */
+			lru_add_drain();
+		if (page_count(page) > 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;
-- 
2.34.1



* [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page()
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
  2022-01-31 16:29 ` [PATCH v3 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache David Hildenbrand
  2022-01-31 16:29 ` [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-03-09 18:03   ` Vlastimil Babka
  2022-03-09 18:48   ` Yang Shi
  2022-01-31 16:29 ` [PATCH v3 4/9] mm: streamline COW " David Hildenbrand
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

Let's make it clearer that KSM might only have to copy a page
in case we have a page in the swapcache, not if we allocated a fresh
page and bypassed the swapcache. While at it, add a comment why this is
usually necessary and merge the two swapcache conditions.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 923165b4c27e..3c91294cca98 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3615,21 +3615,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	/*
-	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
-	 * release the swapcache from under us.  The page pin, and pte_same
-	 * test below, are not enough to exclude that.  Even if it is still
-	 * swapcache, we need to check that the page's swap has not changed.
-	 */
-	if (unlikely((!PageSwapCache(page) ||
-			page_private(page) != entry.val)) && swapcache)
-		goto out_page;
-
-	page = ksm_might_need_to_copy(page, vma, vmf->address);
-	if (unlikely(!page)) {
-		ret = VM_FAULT_OOM;
-		page = swapcache;
-		goto out_page;
+	if (swapcache) {
+		/*
+		 * Make sure try_to_free_swap or reuse_swap_page or swapoff did
+		 * not release the swapcache from under us.  The page pin, and
+		 * pte_same test below, are not enough to exclude that.  Even if
+		 * it is still swapcache, we need to check that the page's swap
+		 * has not changed.
+		 */
+		if (unlikely(!PageSwapCache(page) ||
+			     page_private(page) != entry.val))
+			goto out_page;
+
+		/*
+		 * KSM sometimes has to copy on read faults, for example, if
+		 * page->index of !PageKSM() pages would be nonlinear inside the
+		 * anon VMA -- PageKSM() is lost on actual swapout.
+		 */
+		page = ksm_might_need_to_copy(page, vma, vmf->address);
+		if (unlikely(!page)) {
+			ret = VM_FAULT_OOM;
+			page = swapcache;
+			goto out_page;
+		}
 	}
 
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
-- 
2.34.1



* [PATCH v3 4/9] mm: streamline COW logic in do_swap_page()
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (2 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page() David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-03-10  9:41   ` Vlastimil Babka
  2022-01-31 16:29 ` [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() David Hildenbrand
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

Currently we have a different COW logic when:
* triggering a read-fault to swapin first and then trigger a write-fault
  -> do_swap_page() + do_wp_page()
* triggering a write-fault to swapin
  -> do_swap_page() + do_wp_page() only if we fail reuse in do_swap_page()

The COW logic in do_swap_page() is different from our reuse logic in
do_wp_page(). The COW logic in do_wp_page() -- page_count() == 1 --
currently makes sure that we certainly don't have a remaining reference,
e.g., via GUP, on the target page we want to reuse: if there is any
unexpected reference, we have to copy to avoid information leaks.

As do_swap_page() behaves differently, in environments with swap enabled
we can currently have an unintended information leak from the parent to
the child, similar to the one known from CVE-2020-29374:

	1. Parent writes to anonymous page
	-> Page is mapped writable and modified
	2. Page is swapped out
	-> Page is unmapped and replaced by swap entry
	3. fork()
	-> Swap entries are copied to child
	4. Child pins page R/O
	-> Page is mapped R/O into child
	5. Child unmaps page
	-> Child still holds GUP reference
	6. Parent writes to page
	-> Page is reused in do_swap_page()
	-> Child can observe changes

Exchanging 2. and 3. should have the same effect.

Let's apply the same COW logic as in do_wp_page(), conditionally trying to
remove the page from the swapcache after freeing the swap entry, however,
before actually mapping our page. We can change the order now that
we use try_to_free_swap(), which doesn't care about the mapcount,
instead of reuse_swap_page().

To handle references from the LRU pagevecs, conditionally drain the local
LRU pagevecs when required, however, don't consider the page_count() when
deciding whether to drain to keep it simple for now.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 55 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3c91294cca98..c6177d897964 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3497,6 +3497,25 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+static inline bool should_try_to_free_swap(struct page *page,
+					   struct vm_area_struct *vma,
+					   unsigned int fault_flags)
+{
+	if (!PageSwapCache(page))
+		return false;
+	if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) ||
+	    PageMlocked(page))
+		return true;
+	/*
+	 * If we want to map a page that's in the swapcache writable, we
+	 * have to detect via the refcount if we're really the exclusive
+	 * user. Try freeing the swapcache to get rid of the swapcache
+	 * reference only in case it's likely that we'll be the exclusive user.
+	 */
+	return (fault_flags & FAULT_FLAG_WRITE) && !PageKsm(page) &&
+		page_count(page) == 2;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3638,6 +3657,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			page = swapcache;
 			goto out_page;
 		}
+
+		/*
+		 * If we want to map a page that's in the swapcache writable, we
+		 * have to detect via the refcount if we're really the exclusive
+		 * owner. Try removing the extra reference from the local LRU
+		 * pagevecs if required.
+		 */
+		if ((vmf->flags & FAULT_FLAG_WRITE) && page == swapcache &&
+		    !PageKsm(page) && !PageLRU(page))
+			lru_add_drain();
 	}
 
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
@@ -3656,19 +3685,25 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/*
-	 * The page isn't present yet, go ahead with the fault.
-	 *
-	 * Be careful about the sequence of operations here.
-	 * To get its accounting right, reuse_swap_page() must be called
-	 * while the page is counted on swap but not yet in mapcount i.e.
-	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
-	 * must be called after the swap_free(), or it will never succeed.
+	 * Remove the swap entry and conditionally try to free up the swapcache.
+	 * We're already holding a reference on the page but haven't mapped it
+	 * yet.
 	 */
+	swap_free(entry);
+	if (should_try_to_free_swap(page, vma, vmf->flags))
+		try_to_free_swap(page);
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+
+	/*
+	 * Same logic as in do_wp_page(); however, optimize for fresh pages
+	 * that are certainly not shared because we just allocated them without
+	 * exposing them to the swapcache.
+	 */
+	if ((vmf->flags & FAULT_FLAG_WRITE) && !PageKsm(page) &&
+	    (page != swapcache || page_count(page) == 1)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		vmf->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
@@ -3694,10 +3729,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
 
-	swap_free(entry);
-	if (mem_cgroup_swap_full(page) ||
-	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
-		try_to_free_swap(page);
 	unlock_page(page);
 	if (page != swapcache && swapcache) {
 		/*
-- 
2.34.1



* [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (3 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 4/9] mm: streamline COW " David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-03-10  9:52   ` Vlastimil Babka
  2022-01-31 16:29 ` [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage David Hildenbrand
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

We currently have a different COW logic for anon THP than we have for
ordinary anon pages in do_wp_page(): the effect is that the issue reported
in CVE-2020-29374 is currently still possible for anon THP: an unintended
information leak from the parent to the child.

Let's apply the same logic (page_count() == 1), with similar
optimizations to remove additional references first, as we really want
to avoid PTE-mapping the THP and copying individual pages as best we can.

If we end up with a page that has page_count() != 1, we'll have to PTE-map
the THP and fallback to do_wp_page(), which will always copy the page.

Note that KSM does not apply to THP.

I. Interaction with the swapcache and writeback

While a THP is in the swapcache, the swapcache holds one reference on each
subpage of the THP. So with PageSwapCache() set, we expect as many
additional references as we have subpages. If we manage to remove the
THP from the swapcache, all these references will be gone.
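
As a worked example (assuming HPAGE_PMD_NR == 512, i.e., a 2 MiB THP on
x86-64):

/*
 * An exclusively PMD-mapped THP that is also in the swapcache has
 *   page_count(head) == 1 (our mapping) + 512 (swapcache references)
 * so the reuse check in this patch tolerates exactly
 *   1 + PageSwapCache(page) * thp_nr_pages(page) == 1 + 1 * 512 == 513
 * references; anything beyond that indicates additional, unexpected
 * references and we have to fall back to PTE-mapping the THP.
 */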

Usually, a THP is not split when entered into the swapcache and stays a
compound page. However, try_to_unmap() will PTE-map the THP and use PTE
swap entries. There are no PMD swap entries for that purpose;
consequently, we only ever swap subpages back in via PTEs.

Removing a page from the swapcache can fail either when there are remaining
swap entries (in which case COW is the right thing to do) or if the page is
currently under writeback.

Having a locked, R/O PMD-mapped THP that is in the swapcache seems to be
possible only in corner cases, for example, if try_to_unmap() failed
after adding the page to the swapcache. However, it's comparatively easy to
handle.

As we have to fully unmap a THP before starting writeback, and swapin is
always done on the PTE level, we shouldn't find a R/O PMD-mapped THP in the
swapcache that is under writeback. This should at least leave writeback
out of the picture.

II. Interaction with GUP references

Having a R/O PMD-mapped THP with GUP references (i.e., R/O references)
will result in PTE-mapping the THP on a write fault. Similar to ordinary
anon pages, do_wp_page() will have to copy sub-pages and result in a
disconnect between the GUP references and the pages actually mapped into
the page tables. To improve the situation in the future, we'll need
additional handling to mark anonymous pages as definitely exclusive to a
single process, only allow GUP pins on exclusive anon pages, and
disallow sharing of exclusive anon pages with GUP pins e.g., during
fork().

III. Interaction with references from LRU pagevecs

There is no need to try draining the (local) LRU pagevecs in case we would
stumble over a !PageLRU() page: folio_add_lru() and friends will always
flush the affected pagevec after adding a compound page to it
immediately -- pagevec_add_and_need_flush() always returns "true" for them.
Note that the LRU pagevecs will hold a reference on the compound page for
a very short time, between adding the page to the pagevec and draining it
immediately afterwards.
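
This is roughly the relevant check (condensed from mm/swap.c at the time
of this series): compound pages always force an immediate drain of the
local pagevec.

static bool pagevec_add_and_need_flush(struct pagevec *pvec,
				       struct page *page)
{
	/* Drain if the pagevec is full, on compound pages, or when the
	 * LRU caches are disabled. */
	if (!pagevec_add(pvec, page) || PageCompound(page) ||
	    lru_cache_disabled())
		return true;
	return false;
}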

IV. Interaction with speculative/temporary references

Similar to ordinary anon pages, other speculative/temporary references on
the THP, for example, from the pagecache or page migration code, will
disallow exclusive reuse of the page. We'll have to PTE-map the THP.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..f34ebc5cb827 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1303,7 +1303,6 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	page = pmd_page(orig_pmd);
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	/* Lock page for reuse_swap_page() */
 	if (!trylock_page(page)) {
 		get_page(page);
 		spin_unlock(vmf->ptl);
@@ -1319,10 +1318,15 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	}
 
 	/*
-	 * We can only reuse the page if nobody else maps the huge page or it's
-	 * part.
+	 * See do_wp_page(): we can only map the page writable if there are
+	 * no additional references. Note that we always drain the LRU
+	 * pagevecs immediately after adding a THP.
 	 */
-	if (reuse_swap_page(page)) {
+	if (page_count(page) > 1 + PageSwapCache(page) * thp_nr_pages(page))
+		goto unlock_fallback;
+	if (PageSwapCache(page))
+		try_to_free_swap(page);
+	if (page_count(page) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -1333,6 +1337,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 		return VM_FAULT_WRITE;
 	}
 
+unlock_fallback:
 	unlock_page(page);
 	spin_unlock(vmf->ptl);
 fallback:
-- 
2.34.1



* [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (4 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-02-01 21:31   ` Yang Shi
  2022-03-10 10:37   ` Vlastimil Babka
  2022-01-31 16:29 ` [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page() David Hildenbrand
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

reuse_swap_page() currently indicates if we can write to an anon page
without COW. A COW is required if the page is shared by multiple
processes (either already mapped or via swap entries) or if there is
concurrent writeback that cannot tolerate concurrent page modifications.

However, in the context of khugepaged we're not actually going to write
to a read-only mapped page; we'll copy the page content to our newly
allocated THP and map that THP writable. All we have to make sure of
is that the read-only mapped page we're about to copy won't get reused
by another process sharing the page; otherwise, the page content would
get modified. But that is already guaranteed via multiple mechanisms
(e.g., holding a reference, holding the page lock, removing the rmap after
 copying the page).

The swapcache handling was introduced in commit 10359213d05a ("mm:
incorporate read-only pages into transparent huge pages") and it sounds
like it merely wanted to mimic what do_swap_page() would do when trying
to map a page obtained via the swapcache writable.

As that logic is unnecessary, let's just remove it, removing the last
user of reuse_swap_page().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/trace/events/huge_memory.h |  1 -
 mm/khugepaged.c                    | 11 -----------
 2 files changed, 12 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 4fdb14a81108..d651f3437367 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -29,7 +29,6 @@
 	EM( SCAN_VMA_NULL,		"vma_null")			\
 	EM( SCAN_VMA_CHECK,		"vma_check_failed")		\
 	EM( SCAN_ADDRESS_RANGE,		"not_suitable_address_range")	\
-	EM( SCAN_SWAP_CACHE_PAGE,	"page_swap_cache")		\
 	EM( SCAN_DEL_PAGE_LRU,		"could_not_delete_page_from_lru")\
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 35f14d0a00a6..9da9325ab4d4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -45,7 +45,6 @@ enum scan_result {
 	SCAN_VMA_NULL,
 	SCAN_VMA_CHECK,
 	SCAN_ADDRESS_RANGE,
-	SCAN_SWAP_CACHE_PAGE,
 	SCAN_DEL_PAGE_LRU,
 	SCAN_ALLOC_HUGE_PAGE_FAIL,
 	SCAN_CGROUP_CHARGE_FAIL,
@@ -682,16 +681,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_PAGE_COUNT;
 			goto out;
 		}
-		if (!pte_write(pteval) && PageSwapCache(page) &&
-				!reuse_swap_page(page)) {
-			/*
-			 * Page is in the swap cache and cannot be re-used.
-			 * It cannot be collapsed into a THP.
-			 */
-			unlock_page(page);
-			result = SCAN_SWAP_CACHE_PAGE;
-			goto out;
-		}
 
 		/*
 		 * Isolate the page to avoid collapsing an hugepage
-- 
2.34.1



* [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page()
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (5 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-02-02 14:35   ` Christoph Hellwig
  2022-03-10 10:44   ` Vlastimil Babka
  2022-01-31 16:29 ` [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount() David Hildenbrand
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick
around for now, as it might come in handy in the near future.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/swap.h |   4 --
 mm/swapfile.c        | 104 -------------------------------------------
 2 files changed, 108 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1d38d9475c4d..b546e4bd5c5a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -514,7 +514,6 @@ extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
-extern bool reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
@@ -680,9 +679,6 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-#define reuse_swap_page(page) \
-	(page_trans_huge_mapcount(page) == 1)
-
 static inline int try_to_free_swap(struct page *page)
 {
 	return 0;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf0df7aa7158..a5183315dc58 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1167,16 +1167,6 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
 	return NULL;
 }
 
-static struct swap_info_struct *swap_info_get(swp_entry_t entry)
-{
-	struct swap_info_struct *p;
-
-	p = _swap_info_get(entry);
-	if (p)
-		spin_lock(&p->lock);
-	return p;
-}
-
 static struct swap_info_struct *swap_info_get_cont(swp_entry_t entry,
 					struct swap_info_struct *q)
 {
@@ -1601,100 +1591,6 @@ static bool page_swapped(struct page *page)
 	return false;
 }
 
-static int page_trans_huge_map_swapcount(struct page *page,
-					 int *total_swapcount)
-{
-	int i, map_swapcount, _total_swapcount;
-	unsigned long offset = 0;
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci = NULL;
-	unsigned char *map = NULL;
-	int swapcount = 0;
-
-	/* hugetlbfs shouldn't call it */
-	VM_BUG_ON_PAGE(PageHuge(page), page);
-
-	if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) {
-		if (PageSwapCache(page))
-			swapcount = page_swapcount(page);
-		if (total_swapcount)
-			*total_swapcount = swapcount;
-		return swapcount + page_trans_huge_mapcount(page);
-	}
-
-	page = compound_head(page);
-
-	_total_swapcount = map_swapcount = 0;
-	if (PageSwapCache(page)) {
-		swp_entry_t entry;
-
-		entry.val = page_private(page);
-		si = _swap_info_get(entry);
-		if (si) {
-			map = si->swap_map;
-			offset = swp_offset(entry);
-		}
-	}
-	if (map)
-		ci = lock_cluster(si, offset);
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
-		int mapcount = atomic_read(&page[i]._mapcount) + 1;
-		if (map) {
-			swapcount = swap_count(map[offset + i]);
-			_total_swapcount += swapcount;
-		}
-		map_swapcount = max(map_swapcount, mapcount + swapcount);
-	}
-	unlock_cluster(ci);
-
-	if (PageDoubleMap(page))
-		map_swapcount -= 1;
-
-	if (total_swapcount)
-		*total_swapcount = _total_swapcount;
-
-	return map_swapcount + compound_mapcount(page);
-}
-
-/*
- * We can write to an anon page without COW if there are no other references
- * to it.  And as a side-effect, free up its swap: because the old content
- * on disk will never be read, and seeking back there to write new content
- * later would only waste time away from clustering.
- */
-bool reuse_swap_page(struct page *page)
-{
-	int count, total_swapcount;
-
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	if (unlikely(PageKsm(page)))
-		return false;
-	count = page_trans_huge_map_swapcount(page, &total_swapcount);
-	if (count == 1 && PageSwapCache(page) &&
-	    (likely(!PageTransCompound(page)) ||
-	     /* The remaining swap count will be freed soon */
-	     total_swapcount == page_swapcount(page))) {
-		if (!PageWriteback(page)) {
-			page = compound_head(page);
-			delete_from_swap_cache(page);
-			SetPageDirty(page);
-		} else {
-			swp_entry_t entry;
-			struct swap_info_struct *p;
-
-			entry.val = page_private(page);
-			p = swap_info_get(entry);
-			if (p->flags & SWP_STABLE_WRITES) {
-				spin_unlock(&p->lock);
-				return false;
-			}
-			spin_unlock(&p->lock);
-		}
-	}
-
-	return count <= 1;
-}
-
 /*
  * If swap is getting full, or if there are no more mappings of this page,
  * then try_to_free_swap is called to free its swap space.
-- 
2.34.1



* [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount()
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (6 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page() David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-03-10 10:50   ` Vlastimil Babka
  2022-01-31 16:29 ` [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd() David Hildenbrand
  2022-02-01 18:59 ` [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap Linus Torvalds
  9 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

All users are gone, let's remove it.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h |  5 -----
 mm/huge_memory.c   | 48 ----------------------------------------------
 2 files changed, 53 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..a12291cfe5dd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -820,16 +820,11 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
-int page_trans_huge_mapcount(struct page *page);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
-static inline int page_trans_huge_mapcount(struct page *page)
-{
-	return page_mapcount(page);
-}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f34ebc5cb827..a6dc5af1a763 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2523,54 +2523,6 @@ int total_mapcount(struct page *page)
 	return ret;
 }
 
-/*
- * This calculates accurately how many mappings a transparent hugepage
- * has (unlike page_mapcount() which isn't fully accurate). This full
- * accuracy is primarily needed to know if copy-on-write faults can
- * reuse the page and change the mapping to read-write instead of
- * copying them. At the same time this returns the total_mapcount too.
- *
- * The function returns the highest mapcount any one of the subpages
- * has. If the return value is one, even if different processes are
- * mapping different subpages of the transparent hugepage, they can
- * all reuse it, because each process is reusing a different subpage.
- *
- * The total_mapcount is instead counting all virtual mappings of the
- * subpages. If the total_mapcount is equal to "one", it tells the
- * caller all mappings belong to the same "mm" and in turn the
- * anon_vma of the transparent hugepage can become the vma->anon_vma
- * local one as no other process may be mapping any of the subpages.
- *
- * It would be more accurate to replace page_mapcount() with
- * page_trans_huge_mapcount(), however we only use
- * page_trans_huge_mapcount() in the copy-on-write faults where we
- * need full accuracy to avoid breaking page pinning, because
- * page_trans_huge_mapcount() is slower than page_mapcount().
- */
-int page_trans_huge_mapcount(struct page *page)
-{
-	int i, ret;
-
-	/* hugetlbfs shouldn't call it */
-	VM_BUG_ON_PAGE(PageHuge(page), page);
-
-	if (likely(!PageTransCompound(page)))
-		return atomic_read(&page->_mapcount) + 1;
-
-	page = compound_head(page);
-
-	ret = 0;
-	for (i = 0; i < thp_nr_pages(page); i++) {
-		int mapcount = atomic_read(&page[i]._mapcount) + 1;
-		ret = max(ret, mapcount);
-	}
-
-	if (PageDoubleMap(page))
-		ret -= 1;
-
-	return ret + compound_mapcount(page);
-}
-
 /* Racy check whether the huge page can be split */
 bool can_split_huge_page(struct page *page, int *pextra_pins)
 {
-- 
2.34.1



* [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd()
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (7 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount() David Hildenbrand
@ 2022-01-31 16:29 ` David Hildenbrand
  2022-03-10 11:02   ` Vlastimil Babka
  2022-02-01 18:59 ` [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap Linus Torvalds
  9 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2022-01-31 16:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
	Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
	Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
	Liang Zhang, linux-mm, David Hildenbrand

Let's remove the stale logic that was required for reuse_swap_page().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c | 32 +-------------------------------
 1 file changed, 1 insertion(+), 31 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a6dc5af1a763..cda88d8ac1bd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2152,8 +2152,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
-	bool do_unlock_page = false;
-	pmd_t _pmd;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
 				address & HPAGE_PMD_MASK,
@@ -2172,35 +2170,9 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			goto out;
 	}
 
-repeat:
 	if (pmd_trans_huge(*pmd)) {
-		if (!page) {
+		if (!page)
 			page = pmd_page(*pmd);
-			/*
-			 * An anonymous page must be locked, to ensure that a
-			 * concurrent reuse_swap_page() sees stable mapcount;
-			 * but reuse_swap_page() is not used on shmem or file,
-			 * and page lock must not be taken when zap_pmd_range()
-			 * calls __split_huge_pmd() while i_mmap_lock is held.
-			 */
-			if (PageAnon(page)) {
-				if (unlikely(!trylock_page(page))) {
-					get_page(page);
-					_pmd = *pmd;
-					spin_unlock(ptl);
-					lock_page(page);
-					spin_lock(ptl);
-					if (unlikely(!pmd_same(*pmd, _pmd))) {
-						unlock_page(page);
-						put_page(page);
-						page = NULL;
-						goto repeat;
-					}
-					put_page(page);
-				}
-				do_unlock_page = true;
-			}
-		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
 	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
@@ -2208,8 +2180,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
 	spin_unlock(ptl);
-	if (do_unlock_page)
-		unlock_page(page);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.
 	 * They are 3 cases to consider inside __split_huge_pmd_locked():
-- 
2.34.1



* Re: [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap
  2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
                   ` (8 preceding siblings ...)
  2022-01-31 16:29 ` [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd() David Hildenbrand
@ 2022-02-01 18:59 ` Linus Torvalds
  9 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2022-02-01 18:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
	David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe,
	Mike Kravetz, Mike Rapoport, Yang Shi, Kirill A . Shutemov,
	Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko,
	Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
	Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov,
	Jan Kara, Liang Zhang, Linux-MM, Nadav Amit

On Mon, Jan 31, 2022 at 8:31 AM David Hildenbrand <david@redhat.com> wrote:
>
>  7 files changed, 98 insertions(+), 241 deletions(-)

The series looks sane to me, and I love that diffstat..

               Linus


* Re: [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage
  2022-01-31 16:29 ` [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage David Hildenbrand
@ 2022-02-01 21:31   ` Yang Shi
  2022-03-10 10:37   ` Vlastimil Babka
  1 sibling, 0 replies; 25+ messages in thread
From: Yang Shi @ 2022-02-01 21:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
	Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard,
	Jason Gunthorpe, Mike Kravetz, Mike Rapoport,
	Kirill A . Shutemov, Matthew Wilcox, Vlastimil Babka, Jann Horn,
	Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin,
	Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
	Oleg Nesterov, Jan Kara, Liang Zhang, Linux MM

On Mon, Jan 31, 2022 at 8:33 AM David Hildenbrand <david@redhat.com> wrote:
>
> reuse_swap_page() currently indicates if we can write to an anon page
> without COW. A COW is required if the page is shared by multiple
> processes (either already mapped or via swap entries) or if there is
> concurrent writeback that cannot tolerate concurrent page modifications.
>
> However, in the context of khugepaged we're not actually going to write
> to a read-only mapped page; we'll copy the page content to our newly
> allocated THP and map that THP writable. All we have to make sure of
> is that the read-only mapped page we're about to copy won't get reused
> by another process sharing the page; otherwise, the page content would
> get modified. But that is already guaranteed via multiple mechanisms
> (e.g., holding a reference, holding the page lock, removing the rmap after
>  copying the page).
>
> The swapcache handling was introduced in commit 10359213d05a ("mm:
> incorporate read-only pages into transparent huge pages") and it sounds
> like it merely wanted to mimic what do_swap_page() would do when trying
> to map a page obtained via the swapcache writable.
>
> As that logic is unnecessary, let's just remove it, removing the last
> user of reuse_swap_page().

Thanks for cleaning this up. I didn't spot anything wrong. You could
add Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/trace/events/huge_memory.h |  1 -
>  mm/khugepaged.c                    | 11 -----------
>  2 files changed, 12 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 4fdb14a81108..d651f3437367 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -29,7 +29,6 @@
>         EM( SCAN_VMA_NULL,              "vma_null")                     \
>         EM( SCAN_VMA_CHECK,             "vma_check_failed")             \
>         EM( SCAN_ADDRESS_RANGE,         "not_suitable_address_range")   \
> -       EM( SCAN_SWAP_CACHE_PAGE,       "page_swap_cache")              \
>         EM( SCAN_DEL_PAGE_LRU,          "could_not_delete_page_from_lru")\
>         EM( SCAN_ALLOC_HUGE_PAGE_FAIL,  "alloc_huge_page_failed")       \
>         EM( SCAN_CGROUP_CHARGE_FAIL,    "ccgroup_charge_failed")        \
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 35f14d0a00a6..9da9325ab4d4 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -45,7 +45,6 @@ enum scan_result {
>         SCAN_VMA_NULL,
>         SCAN_VMA_CHECK,
>         SCAN_ADDRESS_RANGE,
> -       SCAN_SWAP_CACHE_PAGE,
>         SCAN_DEL_PAGE_LRU,
>         SCAN_ALLOC_HUGE_PAGE_FAIL,
>         SCAN_CGROUP_CHARGE_FAIL,
> @@ -682,16 +681,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                         result = SCAN_PAGE_COUNT;
>                         goto out;
>                 }
> -               if (!pte_write(pteval) && PageSwapCache(page) &&
> -                               !reuse_swap_page(page)) {
> -                       /*
> -                        * Page is in the swap cache and cannot be re-used.
> -                        * It cannot be collapsed into a THP.
> -                        */
> -                       unlock_page(page);
> -                       result = SCAN_SWAP_CACHE_PAGE;
> -                       goto out;
> -               }
>
>                 /*
>                  * Isolate the page to avoid collapsing an hugepage
> --
> 2.34.1
>


* Re: [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page()
  2022-01-31 16:29 ` [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page() David Hildenbrand
@ 2022-02-02 14:35   ` Christoph Hellwig
  2022-02-02 17:01     ` David Hildenbrand
  2022-03-10 10:44   ` Vlastimil Babka
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-02 14:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, Linus Torvalds,
	David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe,
	Mike Kravetz, Mike Rapoport, Yang Shi, Kirill A . Shutemov,
	Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko,
	Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
	Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov,
	Jan Kara, Liang Zhang, linux-mm

On Mon, Jan 31, 2022 at 05:29:37PM +0100, David Hildenbrand wrote:
> All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick
> around for now, as it might come in handy in the near future.

I don't think leaving a flag that has no user and a completely trivial
place to set it around is a good idea.  This is a classic case of
bitrot.



* Re: [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page()
  2022-02-02 14:35   ` Christoph Hellwig
@ 2022-02-02 17:01     ` David Hildenbrand
  0 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2022-02-02 17:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, Linus Torvalds,
	David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe,
	Mike Kravetz, Mike Rapoport, Yang Shi, Kirill A . Shutemov,
	Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko,
	Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
	Peter Xu, Donald Dutile, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 02.02.22 15:35, Christoph Hellwig wrote:
> On Mon, Jan 31, 2022 at 05:29:37PM +0100, David Hildenbrand wrote:
>> All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick
>> around for now, as it might come in handy in the near future.
> 
> I don't think leaving a flag that has no user and a completely trivial
> place to set it around is a good idea.  This is a classic case of
> bitrot.

Right now I'm planning on using it in part 2, which I'm currently
working on. If I don't end up reusing it, I'll remember to remove it.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
  2022-01-31 16:29 ` [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs David Hildenbrand
@ 2022-03-09 17:53   ` Vlastimil Babka
  0 siblings, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-09 17:53 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> For example, if a page just got swapped in via a read fault, the LRU
> pagevecs might still hold a reference to the page. If we trigger a
> write fault on such a page, the additional reference from the LRU
> pagevecs will prohibit reusing the page.
> 
> Let's conditionally drain the local LRU pagevecs when we stumble over a
> !PageLRU() page. We cannot easily drain remote LRU pagevecs and it might
> not be desirable performance-wise. Consequently, this will only avoid
> copying in some cases.
> 
> Add a simple "page_count(page) > 3" check first, but keep the
> "page_count(page) > 1 + PageSwapCache(page)" check in place: we want to
> minimize the cases where we remove a page from the swapcache but won't
> be able to reuse it (for example, because another process has it mapped
> R/O), so as not to affect reclaim.
> 
> We cannot easily handle the following cases and we will always have to
> copy:
> 
> (1) The page is referenced in the LRU pagevecs of other CPUs. We really
>     would have to drain the LRU pagevecs of all CPUs -- most probably
>     copying is much cheaper.
> 
> (2) The page is already PageLRU() but is getting moved between LRU
>     lists, for example, for activation (e.g., mark_page_accessed()),
>     deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
>     drain mostly unconditionally, which might be bad performance-wise.
>     Most probably this won't happen too often in practice.
> 
> Note that there are other reasons why an anon page might temporarily not
> be PageLRU(): for example, compaction and migration have to isolate LRU
> pages from the LRU lists first (isolate_lru_page()), moving them to
> temporary local lists and clearing PageLRU() and holding an additional
> reference on the page. In that case, we'll always copy.
> 
> This change seems to be fairly effective with the reproducer [1] shared
> by Nadav, as long as writeback is done synchronously, for example, using
> zram. However, with asynchronous writeback, we'll usually fail to free the
> swapcache because the page is still under writeback: something we cannot
> easily optimize for, and maybe it's not really relevant in practice.
> 
> [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
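
A minimal sketch of the reuse path described in the quoted message,
assuming it sits in the anonymous-page branch of do_wp_page() (the
labels, the exact placement and the surrounding error handling are
assumptions, not the verbatim patch):

	struct page *page = vmf->page;

	/*
	 * More than 3 references (our page table mapping, the swapcache
	 * and one local LRU pagevec entry) means draining cannot possibly
	 * get the refcount down far enough for reuse.
	 */
	if (page_count(page) > 3)
		goto copy;
	if (!PageLRU(page))
		/*
		 * The page may still sit in a local LRU pagevec holding a
		 * reference; drain it. Remote pagevecs cannot be drained
		 * cheaply, so we may still end up copying.
		 */
		lru_add_drain();
	if (page_count(page) > 1 + PageSwapCache(page))
		goto copy;
	/* ... free the swapcache entry and reuse the page exclusively ... */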


* Re: [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page()
  2022-01-31 16:29 ` [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page() David Hildenbrand
@ 2022-03-09 18:03   ` Vlastimil Babka
  2022-03-09 18:48   ` Yang Shi
  1 sibling, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-09 18:03 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> Let's make it clearer that KSM might have to copy a page only in case
> we have a page in the swapcache, not if we allocated a fresh page and
> bypassed the swapcache. While at it, add a comment explaining why this
> is usually necessary, and merge the two swapcache conditions.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


> ---
>  mm/memory.c | 38 +++++++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 923165b4c27e..3c91294cca98 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3615,21 +3615,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		goto out_release;
>  	}
>  
> -	/*
> -	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
> -	 * release the swapcache from under us.  The page pin, and pte_same
> -	 * test below, are not enough to exclude that.  Even if it is still
> -	 * swapcache, we need to check that the page's swap has not changed.
> -	 */
> -	if (unlikely((!PageSwapCache(page) ||
> -			page_private(page) != entry.val)) && swapcache)
> -		goto out_page;
> -
> -	page = ksm_might_need_to_copy(page, vma, vmf->address);
> -	if (unlikely(!page)) {
> -		ret = VM_FAULT_OOM;
> -		page = swapcache;
> -		goto out_page;
> +	if (swapcache) {
> +		/*
> +		 * Make sure try_to_free_swap or reuse_swap_page or swapoff did
> +		 * not release the swapcache from under us.  The page pin, and
> +		 * pte_same test below, are not enough to exclude that.  Even if
> +		 * it is still swapcache, we need to check that the page's swap
> +		 * has not changed.
> +		 */
> +		if (unlikely(!PageSwapCache(page) ||
> +			     page_private(page) != entry.val))
> +			goto out_page;
> +
> +		/*
> +		 * KSM sometimes has to copy on read faults, for example, if
> +		 * page->index of !PageKSM() pages would be nonlinear inside the
> +		 * anon VMA -- PageKSM() is lost on actual swapout.
> +		 */
> +		page = ksm_might_need_to_copy(page, vma, vmf->address);
> +		if (unlikely(!page)) {
> +			ret = VM_FAULT_OOM;
> +			page = swapcache;
> +			goto out_page;
> +		}
>  	}
>  
>  	cgroup_throttle_swaprate(page, GFP_KERNEL);
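
For context on the quoted hunk: ksm_might_need_to_copy() returns the page
itself when no copy is needed, a freshly allocated copy when the page
cannot safely be mapped into this anon VMA as-is, or NULL when allocating
that copy fails -- which is why the NULL case is translated to
VM_FAULT_OOM above.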



* Re: [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page()
  2022-01-31 16:29 ` [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page() David Hildenbrand
  2022-03-09 18:03   ` Vlastimil Babka
@ 2022-03-09 18:48   ` Yang Shi
  2022-03-09 19:15     ` David Hildenbrand
  1 sibling, 1 reply; 25+ messages in thread
From: Yang Shi @ 2022-03-09 18:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
	Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard,
	Jason Gunthorpe, Mike Kravetz, Mike Rapoport,
	Kirill A . Shutemov, Matthew Wilcox, Vlastimil Babka, Jann Horn,
	Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin,
	Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
	Oleg Nesterov, Jan Kara, Liang Zhang, Linux MM

On Mon, Jan 31, 2022 at 8:33 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's make it clearer that KSM might have to copy a page only in case
> we have a page in the swapcache, not if we allocated a fresh page and
> bypassed the swapcache. While at it, add a comment explaining why this
> is usually necessary, and merge the two swapcache conditions.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory.c | 38 +++++++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 923165b4c27e..3c91294cca98 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3615,21 +3615,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>                 goto out_release;
>         }
>
> -       /*
> -        * Make sure try_to_free_swap or reuse_swap_page or swapoff did not

We could remove the reference to "reuse_swap_page", right?

> -        * release the swapcache from under us.  The page pin, and pte_same
> -        * test below, are not enough to exclude that.  Even if it is still
> -        * swapcache, we need to check that the page's swap has not changed.
> -        */
> -       if (unlikely((!PageSwapCache(page) ||
> -                       page_private(page) != entry.val)) && swapcache)
> -               goto out_page;
> -
> -       page = ksm_might_need_to_copy(page, vma, vmf->address);
> -       if (unlikely(!page)) {
> -               ret = VM_FAULT_OOM;
> -               page = swapcache;
> -               goto out_page;
> +       if (swapcache) {
> +               /*
> +                * Make sure try_to_free_swap or reuse_swap_page or swapoff did
> +                * not release the swapcache from under us.  The page pin, and
> +                * pte_same test below, are not enough to exclude that.  Even if
> +                * it is still swapcache, we need to check that the page's swap
> +                * has not changed.
> +                */
> +               if (unlikely(!PageSwapCache(page) ||
> +                            page_private(page) != entry.val))
> +                       goto out_page;
> +
> +               /*
> +                * KSM sometimes has to copy on read faults, for example, if
> +                * page->index of !PageKSM() pages would be nonlinear inside the
> +                * anon VMA -- PageKSM() is lost on actual swapout.
> +                */
> +               page = ksm_might_need_to_copy(page, vma, vmf->address);
> +               if (unlikely(!page)) {
> +                       ret = VM_FAULT_OOM;
> +                       page = swapcache;
> +                       goto out_page;
> +               }
>         }
>
>         cgroup_throttle_swaprate(page, GFP_KERNEL);
> --
> 2.34.1
>


* Re: [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page()
  2022-03-09 18:48   ` Yang Shi
@ 2022-03-09 19:15     ` David Hildenbrand
  2022-03-09 20:50       ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2022-03-09 19:15 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
	Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard,
	Jason Gunthorpe, Mike Kravetz, Mike Rapoport,
	Kirill A . Shutemov, Matthew Wilcox, Vlastimil Babka, Jann Horn,
	Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin,
	Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
	Oleg Nesterov, Jan Kara, Liang Zhang, Linux MM

On 09.03.22 19:48, Yang Shi wrote:
> On Mon, Jan 31, 2022 at 8:33 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Let's make it clearer that KSM might have to copy a page only in case
>> we have a page in the swapcache, not if we allocated a fresh page and
>> bypassed the swapcache. While at it, add a comment explaining why this
>> is usually necessary, and merge the two swapcache conditions.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/memory.c | 38 +++++++++++++++++++++++---------------
>>  1 file changed, 23 insertions(+), 15 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 923165b4c27e..3c91294cca98 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3615,21 +3615,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>                 goto out_release;
>>         }
>>
>> -       /*
>> -        * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
> 
> We could remove the reference to "reuse_swap_page", right?
>
Yes, I noticed this a couple of days ago as well and already have a
patch prepared for that ("mm: adjust stale comment in do_swap_page()
mentioning reuse_swap_page()" at
https://github.com/davidhildenbrand/linux/commits/cow_fixes_part_3)

If Andrew wants, we can fix that up directly before sending upstream or
I'll simply include that patch when sending out part2 v2.

(I want to avoid sending another series just for this)

Thanks!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page()
  2022-03-09 19:15     ` David Hildenbrand
@ 2022-03-09 20:50       ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2022-03-09 20:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Yang Shi, Linux Kernel Mailing List, Hugh Dickins,
	Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard,
	Jason Gunthorpe, Mike Kravetz, Mike Rapoport,
	Kirill A . Shutemov, Matthew Wilcox, Vlastimil Babka, Jann Horn,
	Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin,
	Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
	Oleg Nesterov, Jan Kara, Liang Zhang, Linux MM

On Wed, 9 Mar 2022 20:15:54 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 09.03.22 19:48, Yang Shi wrote:
> > On Mon, Jan 31, 2022 at 8:33 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Let's make it clearer that KSM might have to copy a page only in case
> >> we have a page in the swapcache, not if we allocated a fresh page and
> >> bypassed the swapcache. While at it, add a comment explaining why this
> >> is usually necessary, and merge the two swapcache conditions.
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  mm/memory.c | 38 +++++++++++++++++++++++---------------
> >>  1 file changed, 23 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 923165b4c27e..3c91294cca98 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -3615,21 +3615,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>                 goto out_release;
> >>         }
> >>
> >> -       /*
> >> -        * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
> > 
> > We could remove the reference to "reuse_swap_page", right?
> >
> Yes, I noticed this a couple of days ago as well and already have a
> patch prepared for that ("mm: adjust stale comment in do_swap_page()
> mentioning reuse_swap_page()" at
> https://github.com/davidhildenbrand/linux/commits/cow_fixes_part_3)
> 
> If Andrew wants, we can fix that up directly before sending upstream or
> I'll simply include that patch when sending out part2 v2.
> 
> (I want to avoid sending another series just for this)

Thanks, I did this.  The same change plus gratuitous comment reflowing.

--- a/mm/memory.c~mm-slightly-clarify-ksm-logic-in-do_swap_page-fix
+++ a/mm/memory.c
@@ -3609,11 +3609,11 @@ vm_fault_t do_swap_page(struct vm_fault
 
 	if (swapcache) {
 		/*
-		 * Make sure try_to_free_swap or reuse_swap_page or swapoff did
-		 * not release the swapcache from under us.  The page pin, and
-		 * pte_same test below, are not enough to exclude that.  Even if
-		 * it is still swapcache, we need to check that the page's swap
-		 * has not changed.
+		 * Make sure try_to_free_swap or swapoff did not release the
+		 * swapcache from under us.  The page pin, and pte_same test
+		 * below, are not enough to exclude that.  Even if it is still
+		 * swapcache, we need to check that the page's swap has not
+		 * changed.
 		 */
 		if (unlikely(!PageSwapCache(page) ||
 			     page_private(page) != entry.val))
_



* Re: [PATCH v3 4/9] mm: streamline COW logic in do_swap_page()
  2022-01-31 16:29 ` [PATCH v3 4/9] mm: streamline COW " David Hildenbrand
@ 2022-03-10  9:41   ` Vlastimil Babka
  0 siblings, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-10  9:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> Currently we have a different COW logic when:
> * triggering a read-fault to swapin first and then trigger a write-fault
>   -> do_swap_page() + do_wp_page()
> * triggering a write-fault to swapin
>   -> do_swap_page() + do_wp_page() only if we fail reuse in do_swap_page()
> 
> The COW logic in do_swap_page() is different from our reuse logic in
> do_wp_page(). The COW logic in do_wp_page() -- page_count() == 1 --
> currently makes sure that we certainly don't have a remaining reference,
> e.g., via GUP, on the target page we want to reuse: if there is any
> unexpected reference, we have to copy to avoid information leaks.
> 
> As do_swap_page() behaves differently, in environments with swap enabled we
> can currently have an unintended information leak from the parent to the
> child, similar as known from CVE-2020-29374:
> 
> 	1. Parent writes to anonymous page
> 	-> Page is mapped writable and modified
> 	2. Page is swapped out
> 	-> Page is unmapped and replaced by swap entry
> 	3. fork()
> 	-> Swap entries are copied to child
> 	4. Child pins page R/O
> 	-> Page is mapped R/O into child
> 	5. Child unmaps page
> 	-> Child still holds GUP reference
> 	6. Parent writes to page
> 	-> Page is reused in do_swap_page()
> 	-> Child can observe changes
> 
> Exchanging 2. and 3. should have the same effect.
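
A hypothetical minimal reproducer sketching the six steps above (the use
of MADV_PAGEOUT to force the swapout in step 2, the sleeps, and the lack
of error handling are all assumptions/simplifications; the leak is only
observable on kernels without this fix):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <sys/wait.h>

int main(void)
{
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int pipefd[2];
	char buf[4096];

	pipe(pipefd);
	memset(p, 0x11, 4096);                  /* 1. parent writes         */
	madvise(p, 4096, MADV_PAGEOUT);         /* 2. try to force swapout  */

	if (fork() == 0) {                      /* 3. swap entry copied     */
		struct iovec iov = { .iov_base = p, .iov_len = 4096 };

		vmsplice(pipefd[1], &iov, 1, 0);/* 4. pin the page R/O      */
		munmap(p, 4096);                /* 5. unmap, keep pipe ref  */
		sleep(1);                       /* let the parent write     */
		read(pipefd[0], buf, 4096);     /* 0x22 here means a leak   */
		printf("child sees 0x%02x\n", (unsigned char)buf[0]);
		_exit(0);
	}
	usleep(100 * 1000);
	p[0] = 0x22;                            /* 6. parent writes again   */
	wait(NULL);
	return 0;
}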
> 
> Let's apply the same COW logic as in do_wp_page(), conditionally trying
> to remove the page from the swapcache after freeing the swap entry but
> before actually mapping our page. We can change the order now that we
> use try_to_free_swap(), which doesn't care about the mapcount, instead
> of reuse_swap_page().
> 
> To handle references from the LRU pagevecs, conditionally drain the
> local LRU pagevecs when required; to keep it simple for now, don't
> consider the page_count() when deciding whether to drain.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
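
A minimal sketch of the order of operations described above, as it could
look in do_swap_page() (the helper names are real, but their exact
placement and the surrounding conditions are assumptions, not the
verbatim patch):

	swap_free(entry);                       /* free the swap entry first */
	if (PageSwapCache(page)) {
		if (!PageLRU(page))
			lru_add_drain();        /* regardless of page_count() */
		try_to_free_swap(page);         /* may fail, e.g. while the
						 * page is under writeback */
	}
	/* Same reuse rule as do_wp_page(): are we the only reference left? */
	if ((vmf->flags & FAULT_FLAG_WRITE) && page_count(page) == 1) {
		/* ... map the page writable without copying ... */
	}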


* Re: [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
  2022-01-31 16:29 ` [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() David Hildenbrand
@ 2022-03-10  9:52   ` Vlastimil Babka
  0 siblings, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-10  9:52 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> We currently have a different COW logic for anon THP than we have for
> ordinary anon pages in do_wp_page(): the effect is that the issue reported
> in CVE-2020-29374 is currently still possible for anon THP: an unintended
> information leak from the parent to the child.
> 
> Let's apply the same logic (page_count() == 1), with similar
> optimizations to remove additional references first, as we really want
> to avoid PTE-mapping the THP and copying individual pages as best we can.
> 
> If we end up with a page that has page_count() != 1, we'll have to PTE-map
> the THP and fallback to do_wp_page(), which will always copy the page.
> 
> Note that KSM does not apply to THP.
> 
> I. Interaction with the swapcache and writeback
> 
> While a THP is in the swapcache, the swapcache holds one reference on each
> subpage of the THP. So with PageSwapCache() set, we expect as many
> additional references as we have subpages. If we manage to remove the
> THP from the swapcache, all these references will be gone.
> 
> Usually, a THP is not split when entered into the swapcache and stays a
> compound page. However, try_to_unmap() will PTE-map the THP and use PTE
> swap entries. There are no PMD swap entries for that purpose, consequently,
> we always only swapin subpages into PTEs.
> 
> Removing a page from the swapcache can fail either when there are
> remaining swap entries (in which case COW is the right thing to do) or
> when the page is currently under writeback.
> 
> Having a locked, R/O PMD-mapped THP that is in the swapcache seems to be
> possible only in corner cases, for example, if try_to_unmap() failed
> after adding the page to the swapcache. However, it's comparatively easy to
> handle.
> 
> As we have to fully unmap a THP before starting writeback, and swapin is
> always done on the PTE level, we shouldn't find a R/O PMD-mapped THP in the
> swapcache that is under writeback. This should at least leave writeback
> out of the picture.
> 
> II. Interaction with GUP references
> 
> Having a R/O PMD-mapped THP with GUP references (i.e., R/O references)
> will result in PTE-mapping the THP on a write fault. Similar to ordinary
> anon pages, do_wp_page() will have to copy sub-pages and result in a
> disconnect between the GUP references and the pages actually mapped into
> the page tables. To improve the situation in the future, we'll need
> additional handling to mark anonymous pages as definitely exclusive to a
> single process, only allow GUP pins on exclusive anon pages, and
> disallow sharing of exclusive anon pages with GUP pins e.g., during
> fork().
> 
> III. Interaction with references from LRU pagevecs
> 
> There is no need to try draining the (local) LRU pagevecs in case we would
> stumble over a !PageLRU() page: folio_add_lru() and friends will always
> flush the affected pagevec immediately after adding a compound page to
> it -- pagevec_add_and_need_flush() always returns "true" for them.
> Note that the LRU pagevecs will hold a reference on the compound page for
> a very short time, between adding the page to the pagevec and draining it
> immediately afterwards.
> 
> IV. Interaction with speculative/temporary references
> 
> Similar to ordinary anon pages, other speculative/temporary references on
> the THP, for example, from the pagecache or page migration code, will
> disallow exclusive reuse of the page. We'll have to PTE-map the THP.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/huge_memory.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 406a3c28c026..f34ebc5cb827 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1303,7 +1303,6 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
>  	page = pmd_page(orig_pmd);
>  	VM_BUG_ON_PAGE(!PageHead(page), page);
>  
> -	/* Lock page for reuse_swap_page() */
>  	if (!trylock_page(page)) {
>  		get_page(page);
>  		spin_unlock(vmf->ptl);
> @@ -1319,10 +1318,15 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
>  	}
>  
>  	/*
> -	 * We can only reuse the page if nobody else maps the huge page or it's
> -	 * part.
> +	 * See do_wp_page(): we can only map the page writable if there are
> +	 * no additional references. Note that we always drain the LRU
> +	 * pagevecs immediately after adding a THP.
>  	 */
> -	if (reuse_swap_page(page)) {
> +	if (page_count(page) > 1 + PageSwapCache(page) * thp_nr_pages(page))
> +		goto unlock_fallback;
> +	if (PageSwapCache(page))
> +		try_to_free_swap(page);
> +	if (page_count(page) == 1) {
>  		pmd_t entry;
>  		entry = pmd_mkyoung(orig_pmd);
>  		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> @@ -1333,6 +1337,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
>  		return VM_FAULT_WRITE;
>  	}
>  
> +unlock_fallback:
>  	unlock_page(page);
>  	spin_unlock(vmf->ptl);
>  fallback:
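
To make the quoted reuse check concrete: for a PMD-mapped THP with, e.g.,
512 subpages (x86-64) that still sits in the swapcache, the single PMD
mapping holds one reference and the swapcache holds one per subpage, so
the expected baseline is page_count() == 1 + 1 * 512 = 513; anything
above that signals an unexpected reference and we take the PTE-mapping
fallback.

The pagevec flushing behaviour relied upon in section III comes from a
helper that, at the time of this series, looked roughly like the
following (paraphrased from mm/swap.c, not quoted verbatim):

	static bool pagevec_add_and_need_flush(struct pagevec *pvec,
					       struct page *page)
	{
		/* Compound pages never linger in a pagevec: flush at once. */
		if (!pagevec_add(pvec, page) || PageCompound(page) ||
				lru_cache_disabled())
			return true;
		return false;
	}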



* Re: [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage
  2022-01-31 16:29 ` [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage David Hildenbrand
  2022-02-01 21:31   ` Yang Shi
@ 2022-03-10 10:37   ` Vlastimil Babka
  1 sibling, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-10 10:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> reuse_swap_page() currently indicates if we can write to an anon page
> without COW. A COW is required if the page is shared by multiple
> processes (either already mapped or via swap entries) or if there is
> concurrent writeback that cannot tolerate concurrent page modifications.
> 
> However, in the context of khugepaged we're not actually going to write
> to a read-only mapped page, we'll copy the page content to our newly
> allocated THP and map that THP writable. All we have to make sure
> is that the read-only mapped page we're about to copy won't get reused
> by another process sharing the page, otherwise, page content would
> get modified. But that is already guaranteed via multiple mechanisms
> (e.g., holding a reference, holding the page lock, removing the rmap after
>  copying the page).
> 
> The swapcache handling was introduced in commit 10359213d05a ("mm:
> incorporate read-only pages into transparent huge pages") and it sounds
> like it merely wanted to mimic what do_swap_page() would do when trying
> to map a page obtained via the swapcache writable.
> 
> As that logic is unnecessary, let's just remove it, removing the last
> user of reuse_swap_page().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/trace/events/huge_memory.h |  1 -
>  mm/khugepaged.c                    | 11 -----------
>  2 files changed, 12 deletions(-)
> 
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 4fdb14a81108..d651f3437367 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -29,7 +29,6 @@
>  	EM( SCAN_VMA_NULL,		"vma_null")			\
>  	EM( SCAN_VMA_CHECK,		"vma_check_failed")		\
>  	EM( SCAN_ADDRESS_RANGE,		"not_suitable_address_range")	\
> -	EM( SCAN_SWAP_CACHE_PAGE,	"page_swap_cache")		\
>  	EM( SCAN_DEL_PAGE_LRU,		"could_not_delete_page_from_lru")\
>  	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
>  	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 35f14d0a00a6..9da9325ab4d4 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -45,7 +45,6 @@ enum scan_result {
>  	SCAN_VMA_NULL,
>  	SCAN_VMA_CHECK,
>  	SCAN_ADDRESS_RANGE,
> -	SCAN_SWAP_CACHE_PAGE,
>  	SCAN_DEL_PAGE_LRU,
>  	SCAN_ALLOC_HUGE_PAGE_FAIL,
>  	SCAN_CGROUP_CHARGE_FAIL,
> @@ -682,16 +681,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  			result = SCAN_PAGE_COUNT;
>  			goto out;
>  		}
> -		if (!pte_write(pteval) && PageSwapCache(page) &&
> -				!reuse_swap_page(page)) {
> -			/*
> -			 * Page is in the swap cache and cannot be re-used.
> -			 * It cannot be collapsed into a THP.
> -			 */
> -			unlock_page(page);
> -			result = SCAN_SWAP_CACHE_PAGE;
> -			goto out;
> -		}
>  
>  		/*
>  		 * Isolate the page to avoid collapsing an hugepage
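
To spell out the guarantee mentioned in the quoted message:
__collapse_huge_page_isolate() runs with the old page locked and an extra
reference held, and khugepaged only removes the rmap of the old page
after copying its contents into the freshly allocated THP, so a
concurrent reuse of the read-only page by another process is excluded
throughout the collapse.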



* Re: [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page()
  2022-01-31 16:29 ` [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page() David Hildenbrand
  2022-02-02 14:35   ` Christoph Hellwig
@ 2022-03-10 10:44   ` Vlastimil Babka
  1 sibling, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-10 10:44 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick
> around for now, as it might come in handy in the near future.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Nice cleanup.

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount()
  2022-01-31 16:29 ` [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount() David Hildenbrand
@ 2022-03-10 10:50   ` Vlastimil Babka
  0 siblings, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-10 10:50 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> All users are gone, let's remove it.

Good riddance.

> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



* Re: [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd()
  2022-01-31 16:29 ` [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd() David Hildenbrand
@ 2022-03-10 11:02   ` Vlastimil Babka
  0 siblings, 0 replies; 25+ messages in thread
From: Vlastimil Babka @ 2022-03-10 11:02 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
	Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
	Mike Rapoport, Yang Shi, Kirill A . Shutemov, Matthew Wilcox,
	Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
	Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
	Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
	linux-mm

On 1/31/22 17:29, David Hildenbrand wrote:
> Let's remove the stale logic that was required for reuse_swap_page().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


end of thread

Thread overview: 25+ messages
2022-01-31 16:29 [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
2022-01-31 16:29 ` [PATCH v3 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache David Hildenbrand
2022-01-31 16:29 ` [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs David Hildenbrand
2022-03-09 17:53   ` Vlastimil Babka
2022-01-31 16:29 ` [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page() David Hildenbrand
2022-03-09 18:03   ` Vlastimil Babka
2022-03-09 18:48   ` Yang Shi
2022-03-09 19:15     ` David Hildenbrand
2022-03-09 20:50       ` Andrew Morton
2022-01-31 16:29 ` [PATCH v3 4/9] mm: streamline COW " David Hildenbrand
2022-03-10  9:41   ` Vlastimil Babka
2022-01-31 16:29 ` [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() David Hildenbrand
2022-03-10  9:52   ` Vlastimil Babka
2022-01-31 16:29 ` [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage David Hildenbrand
2022-02-01 21:31   ` Yang Shi
2022-03-10 10:37   ` Vlastimil Babka
2022-01-31 16:29 ` [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page() David Hildenbrand
2022-02-02 14:35   ` Christoph Hellwig
2022-02-02 17:01     ` David Hildenbrand
2022-03-10 10:44   ` Vlastimil Babka
2022-01-31 16:29 ` [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount() David Hildenbrand
2022-03-10 10:50   ` Vlastimil Babka
2022-01-31 16:29 ` [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd() David Hildenbrand
2022-03-10 11:02   ` Vlastimil Babka
2022-02-01 18:59 ` [PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap Linus Torvalds
