All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
	Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
	Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage
Date: Thu, 27 Jan 2022 13:23:41 -0800	[thread overview]
Message-ID: <CAHbLzkpRgeYkPHUc3KAUc_Fr-YexQxK1cH92Suueac5GrwZHsw@mail.gmail.com> (raw)
In-Reply-To: <20220126095557.32392-7-david@redhat.com>

On Wed, Jan 26, 2022 at 2:00 AM David Hildenbrand <david@redhat.com> wrote:
>
> reuse_swap_page() currently indicates if we can write to an anon page
> without COW. A COW is required if the page is shared by multiple
> processes (either already mapped or via swap entries) or if there is
> concurrent writeback that cannot tolerate concurrent page modifications.
>
> reuse_swap_page() doesn't check for pending references from other
> processes that already unmapped the page, however,
> is_refcount_suitable() essentially does the same thing in the context of
> khugepaged. khugepaged is the last remaining user of reuse_swap_page() and
> we want to remove that function.
>
> In the context of khugepaged, we are not actually going to write to the
> page and we don't really care about other processes mapping the page:
> for example, without swap, we don't care about shared pages at all.
>
> The current logic seems to be:
> * Writable: -> Not shared, but might be in the swapcache. Nobody can
>   fault it in from the swapcache as there are no other swap entries.
> * Readable and not in the swapcache: Might be shared (but nobody can
>   fault it in from the swapcache).
> * Readable and in the swapcache: Might be shared and someone might be
>   able to fault it in from the swapcache. Make sure we're the exclusive
>   owner via reuse_swap_page().
>
> Having to guess due to lack of comments and documentation, the current
> logic really only wants to make sure that a page that might be shared
> cannot be faulted in from the swapcache while khugepaged is active.
> It's hard to guess why that is that case and if it's really still required,
> but let's try keeping that logic unmodified.

I don't think it could be faulted in while khugepaged is active since
khugepaged does hold mmap_lock in write mode IIUC. So page fault is
serialized against khugepaged.

My wild guess is that collapsing shared pages was not supported before
v5.8, so we need reuse_swap_page() to tell us if the page in swap
cache is shared or not. But it is not true anymore. And khugepaged
just allocates a THP then copy the data from base pages to huge page
then replace PTEs to PMD, it doesn't change the content of the page,
so I failed to see a problem by collapsing a shared page in swap
cache. But I'm really not entirely sure, I may miss something...



>
> Instead of relying on reuse_swap_page(), let's unconditionally
> try_to_free_swap(), special casing PageKsm(). try_to_free_swap() will fail
> if there are still swap entries targeting the page or if the page is under
> writeback.
>
> After a successful try_to_free_swap() that page cannot be readded to the
> swapcache because we're keeping the page locked and removed from the LRU
> until we actually perform the copy. So once we succeeded removing a page
> from the swapcache, it cannot be re-added until we're done copying. Add a
> comment stating that.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/khugepaged.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 35f14d0a00a6..bc0ff598e98f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -683,10 +683,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                         goto out;
>                 }
>                 if (!pte_write(pteval) && PageSwapCache(page) &&
> -                               !reuse_swap_page(page)) {
> +                   (PageKsm(page) || !try_to_free_swap(page))) {
>                         /*
> -                        * Page is in the swap cache and cannot be re-used.
> -                        * It cannot be collapsed into a THP.
> +                        * Possibly shared page cannot be removed from the
> +                        * swapache. It cannot be collapsed into a THP.
>                          */
>                         unlock_page(page);
>                         result = SCAN_SWAP_CACHE_PAGE;
> @@ -702,6 +702,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                         result = SCAN_DEL_PAGE_LRU;
>                         goto out;
>                 }
> +
> +               /*
> +                * We're holding the page lock and removed the page from the
> +                * LRU. Once done copying, we'll unlock and readd to the
> +                * LRU via release_pte_page(). If the page is still in the
> +                * swapcache, we're the exclusive owner. Due to the page lock
> +                * the page cannot be added to the swapcache until we're done
> +                * and consequently it cannot be faulted in from the swapcache
> +                * into another process.
> +                */
>                 mod_node_page_state(page_pgdat(page),
>                                 NR_ISOLATED_ANON + page_is_file_lru(page),
>                                 compound_nr(page));
> --
> 2.34.1
>

  reply	other threads:[~2022-01-27 21:23 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26  9:55 [PATCH RFC v2 0/9] mm: COW fixes part 1: fix the COW security issue for THP and swap David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache David Hildenbrand
2022-01-26 14:25   ` Matthew Wilcox
2022-01-28 12:53   ` Vlastimil Babka
2022-01-28 13:44     ` David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs David Hildenbrand
2022-01-26 14:31   ` Matthew Wilcox
2022-01-26 14:36     ` David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 3/9] mm: slightly clarify KSM logic in do_swap_page() David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 4/9] mm: streamline COW " David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() David Hildenbrand
2022-01-26 20:36   ` Yang Shi
2022-01-27  8:14     ` David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage David Hildenbrand
2022-01-27 21:23   ` Yang Shi [this message]
2022-01-28  8:41     ` David Hildenbrand
2022-01-28 17:10       ` Yang Shi
2022-01-26  9:55 ` [PATCH RFC v2 7/9] mm/swapfile: remove reuse_swap_page() David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount() David Hildenbrand
2022-01-26  9:55 ` [PATCH RFC v2 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd() David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHbLzkpRgeYkPHUc3KAUc_Fr-YexQxK1cH92Suueac5GrwZHsw@mail.gmail.com \
    --to=shy828301@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=guro@fb.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=namit@vmware.com \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zhangliang5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.