linux-kernel.vger.kernel.org archive mirror
From: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
To: David Hildenbrand <david@redhat.com>,
	Peter Xu <peterx@redhat.com>,
	Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Jerome Glisse <jglisse@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Alistair Popple <apopple@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	"ovzxemul@gmail.com" <ovzxemul@gmail.com>
Subject: RE: [PATCH v5 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
Date: Wed, 21 Jul 2021 19:54:44 +0000
Message-ID: <CY4PR0201MB3460AAED19F46CD184B2AB30E9E39@CY4PR0201MB3460.namprd02.prod.outlook.com>
In-Reply-To: <5c3c84ee-02f6-a2af-13b8-5dcf70676641@redhat.com>

On Wed, Jul 21, 2021 4:20 PM +0000, David Hildenbrand wrote:
> On 21.07.21 16:38, Ivan Teterevkov wrote:
> > On Mon, Jul 19, 2021 5:56 PM +0000, Peter Xu wrote:
> >> I'm also curious what would be the real use to have an accurate
> >> PM_SWAP accounting.  To me current implementation may not provide
> >> accurate value but should be good enough for most cases.  However not
> >> sure whether it's also true for your use case.
> >
> > We want the PM_SWAP bit implemented (for shared memory in the pagemap
> > interface) to enhance the live migration for some fraction of the
> > guest VMs that have their pages swapped out to the host swap. Once
> > those pages are paged in and transferred over network, we then want to
> > release them with madvise(MADV_PAGEOUT) and preserve the working set
> > of the guest VMs to reduce the thrashing of the host swap.
> 
> There are 3 possibilities I think (swap is just another variant of the page cache):
> 
> 1) The page is not in the page cache, e.g., it resides on disk or in a swap file.
> pte_none().
> 2) The page is in the page cache and is not mapped into the page table.
> pte_none().
> 3) The page is in the page cache and mapped into the page table.
> !pte_none().
> 
> Do I understand correctly that you want to identify 1) and indicate it via
> PM_SWAP?

Yes, and I also want to outline the context so we're on the same page.

This series introduces support for userfaultfd-wp on shared memory. The
difficulty is that once a shared page is swapped out, its PTE is cleared,
so upon retrieval from the swap file there's no way to "recover" the
_PAGE_SWP_UFFD_WP flag because, unlike with private memory, it isn't kept
in the PTE or anywhere else.

We came across the same issue with PM_SWAP in the pagemap interface but,
fortunately, there is a place we can query: the i_pages field (an XArray)
of struct address_space. In https://lkml.org/lkml/2021/7/14/595 we do this
similarly to how shmem_fault() looks up the page when it handles a #PF.
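On the page-cache side, David's three cases come down to a single i_pages lookup. The userspace C below is a toy model under stated assumptions, not kernel code: toy_mk_value(), toy_is_value() and toy_to_value() copy the low-bit encoding that xa_mk_value(), xa_is_value() and xa_to_value() use for XArray value entries (which is how shmem records swap entries in its page cache), while the plain array standing in for i_pages and all toy_* names are hypothetical. Locking and the PTE side (telling case 2 from case 3) are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of a shmem mapping's page cache (address_space->i_pages).
 * A slot holds NULL (nothing cached), a pointer to a page (low bit clear,
 * since pages are aligned), or a "value entry" with the low bit set, which
 * is how shmem stores a swap entry. All toy_* names are hypothetical.
 */
void *toy_mk_value(unsigned long v)
{
	return (void *)((v << 1) | 1UL);	/* same encoding as xa_mk_value() */
}

bool toy_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;	/* same test as xa_is_value() */
}

unsigned long toy_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);	/* like xa_to_value() */
}

enum toy_state {
	TOY_NOT_CACHED,	/* empty slot: never populated (a hole) */
	TOY_SWAPPED,	/* case 1: the page lives in swap, report PM_SWAP */
	TOY_CACHED,	/* cases 2 and 3: the page is in the page cache */
};

/* Classify the page at @index, roughly as a PM_SWAP lookup would. */
enum toy_state toy_lookup(void *const *i_pages, size_t nr, size_t index)
{
	const void *entry;

	if (index >= nr || i_pages[index] == NULL)
		return TOY_NOT_CACHED;
	entry = i_pages[index];
	return toy_is_value(entry) ? TOY_SWAPPED : TOY_CACHED;
}
```

For example, a slot holding toy_mk_value(42) classifies as TOY_SWAPPED and toy_to_value() gives 42 back, whereas a slot holding a page pointer classifies as TOY_CACHED.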

Now, in the context of this series, we were exploring whether it makes
any practical sense to add more new flags to the special PTE so that the
pagemap flags could be populated "on the spot" from the given PTE.

However, I can't see how (or why) to do that for PM_SWAP specifically,
even with an extra bit: the XArray is precisely what we need for the live
migration use case. Another flag, PM_SOFT_DIRTY, suffers from the same
problem that uffd-wp had before this series introduced
UFFD_WP_SWP_PTE_SPECIAL, but we don't need it at the moment.
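For context on the consumer side, a live-migration agent reads these flags from /proc/<pid>/pagemap, which holds one 64-bit entry per virtual page. The sketch below decodes an entry using the bit positions documented in Documentation/admin-guide/mm/pagemap.rst (soft-dirty 55, uffd-wp 57, file-page/shared-anon 61, swapped 62, present 63; swap type in bits 0-4 and swap offset in bits 5-54 when swapped). The struct pm_flags type and the pm_decode()/pm_read() helpers are hypothetical names chosen for illustration.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit positions as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_SOFT_DIRTY	(1ULL << 55)
#define PM_UFFD_WP	(1ULL << 57)
#define PM_FILE		(1ULL << 61)
#define PM_SWAP		(1ULL << 62)
#define PM_PRESENT	(1ULL << 63)
#define PM_FRAME_MASK	((1ULL << 55) - 1)	/* bits 0-54 */

struct pm_flags {
	bool present, swapped, file_or_shared, uffd_wp, soft_dirty;
	unsigned int swap_type;		/* bits 0-4, valid only if swapped */
	uint64_t swap_offset;		/* bits 5-54, valid only if swapped */
};

/* Split one 64-bit pagemap entry into its flags. */
struct pm_flags pm_decode(uint64_t e)
{
	struct pm_flags f = {
		.present = (e & PM_PRESENT) != 0,
		.swapped = (e & PM_SWAP) != 0,
		.file_or_shared = (e & PM_FILE) != 0,
		.uffd_wp = (e & PM_UFFD_WP) != 0,
		.soft_dirty = (e & PM_SOFT_DIRTY) != 0,
	};

	if (f.swapped) {
		f.swap_type = (unsigned int)(e & 0x1f);
		f.swap_offset = (e & PM_FRAME_MASK) >> 5;
	}
	return f;
}

/*
 * Read the entry for @vaddr from an open /proc/<pid>/pagemap descriptor
 * (one 8-byte entry per virtual page). Returns 0 on success, -1 on error.
 */
int pm_read(int fd, uintptr_t vaddr, uint64_t *entry)
{
	long psize = sysconf(_SC_PAGESIZE);
	off_t off;

	if (psize <= 0)
		return -1;
	off = (off_t)(vaddr / (uintptr_t)psize) * (off_t)sizeof(*entry);
	if (pread(fd, entry, sizeof(*entry), off) != (ssize_t)sizeof(*entry))
		return -1;
	return 0;
}
```

A caller would open("/proc/<pid>/pagemap", O_RDONLY), call pm_read() for each guest page and check pm_decode(entry).swapped; once such a page has been faulted in and sent, madvise(addr, len, MADV_PAGEOUT) (Linux 5.4 and later) can push it back out. For shmem, whether that swapped bit is reported reliably is what the i_pages lookup is meant to address.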

I hope that clarification makes sense.

The only outstanding point I have is how our patches fit together in
pte_to_pagemap_entry(). I think the resulting code should look like
this:

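	static pagemap_entry_t pte_to_pagemap_entry(...)
	{
		u64 frame = 0, flags = 0;

		if (pte_present(pte)) {
			...
		} else if (is_swap_pte(pte) || shmem_file(vma->vm_file)) {
			...
			/* uffd-wp marker left behind when a file-backed PTE was zapped */
			if (pte_swp_uffd_wp_special(pte))
				flags |= PM_UFFD_WP;
		}
		...
		return make_pme(frame, flags);
	}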

The is_swap_pte() branch will also be taken for swapped-out shared pages,
thanks to the shmem_file() check, so pte_swp_uffd_wp_special() can be
tested inside it.

Alternatively, we could move the check out of the "else" branch:

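	static pagemap_entry_t pte_to_pagemap_entry(...)
	{
		u64 frame = 0, flags = 0;

		if (pte_present(pte)) {
			...
		} else if (is_swap_pte(pte) || shmem_file(vma->vm_file)) {
			...
		}

		/* checked regardless of which branch was taken above */
		if (pte_swp_uffd_wp_special(pte))
			flags |= PM_UFFD_WP;
		...
		return make_pme(frame, flags);
	}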
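The two placements can be compared in a toy userspace model. It assumes, as in this series, that the uffd-wp special marker is a non-present, non-none PTE, so the toy is_swap_pte() (neither none nor present) is true for it; struct toy_pte and all toy_* helpers are hypothetical stand-ins, and the swap/i_pages handling is elided just as in the snippets above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the real pagemap flags. */
#define TOY_PM_UFFD_WP	(1ULL << 57)
#define TOY_PM_PRESENT	(1ULL << 63)

/* Hypothetical PTE model: the special marker is non-present and non-none. */
enum toy_pte_kind {
	TOY_PTE_NONE,
	TOY_PTE_PRESENT,
	TOY_PTE_SWAP,
	TOY_PTE_UFFD_WP_SPECIAL,
};

struct toy_pte {
	enum toy_pte_kind kind;
	bool uffd_wp;		/* wr-protect bit of a present PTE */
};

static bool toy_pte_present(struct toy_pte p) { return p.kind == TOY_PTE_PRESENT; }
static bool toy_pte_none(struct toy_pte p) { return p.kind == TOY_PTE_NONE; }

/* Mirrors is_swap_pte(): neither none nor present. */
static bool toy_is_swap_pte(struct toy_pte p)
{
	return !toy_pte_none(p) && !toy_pte_present(p);
}

static bool toy_uffd_wp_special(struct toy_pte p)
{
	return p.kind == TOY_PTE_UFFD_WP_SPECIAL;
}

/* Variant 1: the special-PTE check lives inside the non-present branch. */
uint64_t toy_pme_nested(struct toy_pte pte, bool shmem)
{
	uint64_t flags = 0;

	if (toy_pte_present(pte)) {
		flags |= TOY_PM_PRESENT;
		if (pte.uffd_wp)
			flags |= TOY_PM_UFFD_WP;
	} else if (toy_is_swap_pte(pte) || shmem) {
		/* swap and i_pages handling elided */
		if (toy_uffd_wp_special(pte))
			flags |= TOY_PM_UFFD_WP;
	}
	return flags;
}

/* Variant 2: the special-PTE check is hoisted after the if/else chain. */
uint64_t toy_pme_hoisted(struct toy_pte pte, bool shmem)
{
	uint64_t flags = 0;

	if (toy_pte_present(pte)) {
		flags |= TOY_PM_PRESENT;
		if (pte.uffd_wp)
			flags |= TOY_PM_UFFD_WP;
	} else if (toy_is_swap_pte(pte) || shmem) {
		/* swap and i_pages handling elided */
	}
	if (toy_uffd_wp_special(pte))
		flags |= TOY_PM_UFFD_WP;
	return flags;
}
```

Under that assumption both variants return the same flags for every input, since the marker already enters the is_swap_pte() branch whether or not shmem_file() matches; hoisting the check mainly makes it independent of that branch condition.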

What do you reckon?

Thanks,
Ivan


Thread overview: 51+ messages
2021-07-15 20:13 [PATCH v5 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-07-15 20:13 ` [PATCH v5 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
2021-07-15 20:13 ` [PATCH v5 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-15 20:13 ` [PATCH v5 03/26] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-07-15 20:14 ` [PATCH v5 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-07-15 20:14 ` [PATCH v5 05/26] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-07-16  5:50   ` Alistair Popple
2021-07-16 19:11     ` Peter Xu
2021-07-21 11:28       ` Alistair Popple
2021-07-21 21:35         ` Peter Xu
2021-07-22  1:08           ` Alistair Popple
2021-07-22 15:21             ` Peter Xu
2021-07-15 20:14 ` [PATCH v5 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-07-15 20:14 ` [PATCH v5 07/26] mm: Drop first_index/last_index in zap_details Peter Xu
2021-07-15 20:14 ` [PATCH v5 08/26] mm: Introduce zap_details.zap_flags Peter Xu
2021-07-15 20:14 ` [PATCH v5 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-07-15 20:14 ` [PATCH v5 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-07-15 20:15 ` [PATCH v5 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-07-15 20:16 ` [PATCH v5 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-07-15 20:16 ` [PATCH v5 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-07-15 20:16 ` [PATCH v5 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-07-15 20:16 ` [PATCH v5 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
2021-07-15 20:16 ` [PATCH v5 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Peter Xu
2021-07-15 20:16 ` [PATCH v5 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-07-20 15:37   ` kernel test robot
2021-07-21 21:50     ` Peter Xu
2021-07-15 20:16 ` [PATCH v5 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-20 23:59   ` kernel test robot
2021-07-15 20:16 ` [PATCH v5 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-07-21  8:24   ` kernel test robot
2021-07-15 20:16 ` [PATCH v5 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-07-15 20:16 ` [PATCH v5 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-07-15 20:16 ` [PATCH v5 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-07-15 20:16 ` [PATCH v5 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-07-15 20:16 ` [PATCH v5 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Peter Xu
2021-07-19  9:53   ` Tiberiu Georgescu
2021-07-19 16:03     ` Peter Xu
2021-07-19 17:23       ` Tiberiu Georgescu
2021-07-19 17:56         ` Peter Xu
2021-07-21 14:38           ` Ivan Teterevkov
2021-07-21 16:19             ` David Hildenbrand
2021-07-21 19:54               ` Ivan Teterevkov [this message]
2021-07-21 22:28                 ` Peter Xu
2021-07-21 22:57                   ` Peter Xu
2021-07-22  6:27                     ` David Hildenbrand
2021-07-22 16:08                       ` Peter Xu
2021-07-15 20:16 ` [PATCH v5 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-07-15 20:16 ` [PATCH v5 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
2021-07-19 19:21 ` [PATCH v5 00/26] userfaultfd-wp: Support shmem and hugetlbfs David Hildenbrand
2021-07-19 20:12   ` Peter Xu
2021-07-22 18:30 ` Peter Xu
