From: Andrei Vagin <avagin@gmail.com>
To: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: "Peter Xu" <peterx@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michał Mirosław" <emmir@google.com>,
	"Danylo Mocherniuk" <mdanylo@google.com>,
	"Paul Gofman" <pgofman@codeweavers.com>,
	"Cyrill Gorcunov" <gorcunov@gmail.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Nadav Amit" <namit@vmware.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Shuah Khan" <shuah@kernel.org>,
	"Christian Brauner" <brauner@kernel.org>,
	"Yang Shi" <shy828301@gmail.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Yun Zhou" <yun.zhou@windriver.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Alex Sierra" <alex.sierra@amd.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Gustavo A . R . Silva" <gustavoars@kernel.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	"Greg KH" <gregkh@linuxfoundation.org>,
	kernel@collabora.com
Subject: Re: [PATCH v19 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
Date: Tue, 20 Jun 2023 23:42:12 -0700
Message-ID: <ZJKbxKrJRy/L2JuA@gmail.com>
In-Reply-To: <212e331f-35b0-5ae7-6371-26caa577d637@collabora.com>

On Mon, Jun 19, 2023 at 11:06:36AM +0500, Muhammad Usama Anjum wrote:
> On 6/17/23 11:39 AM, Andrei Vagin wrote:
> > On Thu, Jun 15, 2023 at 07:11:41PM +0500, Muhammad Usama Anjum wrote:
> >> +static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
> >> +				  unsigned long end, struct mm_walk *walk)
> >> +{
> >> +	bool is_written, flush = false, is_interesting = true;
> >> +	struct pagemap_scan_private *p = walk->private;
> >> +	struct vm_area_struct *vma = walk->vma;
> >> +	unsigned long bitmap, addr = end;
> >> +	pte_t *pte, *orig_pte, ptent;
> >> +	spinlock_t *ptl;
> >> +	int ret = 0;
> >> +
> >> +	arch_enter_lazy_mmu_mode();
> >> +
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> +	ptl = pmd_trans_huge_lock(pmd, vma);
> >> +	if (ptl) {
> >> +		unsigned long n_pages = (end - start)/PAGE_SIZE;
> >> +
> >> +		if (p->max_pages && n_pages > p->max_pages - p->found_pages)
> >> +			n_pages = p->max_pages - p->found_pages;
> >> +
> >> +		is_written = !is_pmd_uffd_wp(*pmd);
> >> +
> >> +		/*
> >> +		 * Break the huge page into small pages if the WP operation needs
> >> +		 * to be performed on only a portion of the huge page.
> >> +		 */
> >> +		if (is_written && IS_PM_SCAN_WP(p->flags) &&
> >> +		    n_pages < HPAGE_SIZE/PAGE_SIZE) {
> >> +			spin_unlock(ptl);
> >> +
> >> +			split_huge_pmd(vma, pmd, start);
> >> +			goto process_smaller_pages;
> >> +		}
> >> +
> >> +		bitmap = PM_SCAN_FLAGS(is_written, (bool)vma->vm_file,
> >> +				       pmd_present(*pmd), is_swap_pmd(*pmd));
> >> +
> >> +		if (IS_PM_SCAN_GET(p->flags)) {
> >> +			is_interesting = pagemap_scan_is_interesting_page(bitmap, p);
> >> +			if (is_interesting)
> >> +				ret = pagemap_scan_output(bitmap, p, start, n_pages);
> >> +		}
> >> +
> >> +		if (IS_PM_SCAN_WP(p->flags) && is_written && is_interesting &&
> >> +		    ret >= 0) {
> >> +			make_uffd_wp_pmd(vma, start, pmd);
> >> +			flush_tlb_range(vma, start, end);
> >> +		}
> >> +
> >> +		spin_unlock(ptl);
> >> +
> >> +		arch_leave_lazy_mmu_mode();
> >> +		return ret;
> >> +	}
> >> +
> >> +process_smaller_pages:
> >> +#endif
> >> +
> >> +	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
> >> +	if (!pte) {
> > 
> > Do we need to unlock ptl here?
> > 
> > 		spin_unlock(ptl);
> No, please look at these recently merged patches:
> https://lore.kernel.org/all/c1c9a74a-bc5b-15ea-e5d2-8ec34bc921d@google.com
> 
> > 
> >> +		walk->action = ACTION_AGAIN;
> >> +		return 0;
> >> +	}
> >> +
> >> +	for (addr = start; addr < end && !ret; pte++, addr += PAGE_SIZE) {
> >> +		ptent = ptep_get(pte);
> >> +		is_written = !is_pte_uffd_wp(ptent);
> >> +
> >> +		bitmap = PM_SCAN_FLAGS(is_written, (bool)vma->vm_file,
> >> +				       pte_present(ptent), is_swap_pte(ptent));
> > 
> > > The vma->vm_file check isn't correct in this case. You can look at when
> > > pte_to_pagemap_entry sets PM_FILE. This flag is used to detect which
> > > pages have a file backing store and which pages are anonymous.
> I'll update.
> 
> > 
> > > I was trying to integrate this new interface into CRIU and I found
> > > one more thing that is required. We need to detect zero pages.
> Should we name it ZERO_PFN_PRESENT_PAGE to be exact or what?

IMHO, ZERO_PFN_PRESENT_PAGE looks a bit monstrous.
It looks like "zero page" is an established term in the kernel, so
PAGE_IS_ZERO might be a good choice here, but it is up to you.

> 
> > 
> > It should look something like this:
> > 
> > #define PM_SCAN_FLAGS(wt, file, present, swap, zero)   \
> >        ((wt) | ((file) << 1) | ((present) << 2) | ((swap) << 3) | ((zero) << 4))
> > 
> > 
> > bitmap = PM_SCAN_FLAGS(is_written, page && !PageAnon(page),
> > 		      pte_present(ptent), is_swap_pte(ptent),
> > 		      pte_present(ptent) && is_zero_pfn(pte_pfn(ptent)));
> Okay. Can you please confirm my assumptions:
> - A THP cannot be file backed. (PM_FILE isn't being set for THP case)

```
Currently THP only works for anonymous memory mappings and tmpfs/shmem.
But in the future it can expand to other filesystems. 
```
https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html

so THP can be "file backed".
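
For comparison, the existing /proc/<pid>/pagemap format already reports
this distinction per page: bit 61 is set for file-mapped or shared-anonymous
pages, independently of the present (bit 63) and swapped (bit 62) bits.
A minimal user-space decoder, assuming only that documented bit layout
(the struct and function names below are hypothetical, chosen for
illustration):

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_FILE    (1ULL << 61)  /* file-mapped page or shared anonymous page */
#define PM_SWAP    (1ULL << 62)  /* page is swapped out */
#define PM_PRESENT (1ULL << 63)  /* page is present in RAM */

/* Each pagemap entry is one 64-bit word per virtual page. */
static_assert(sizeof(uint64_t) == 8, "pagemap entries are 64-bit");

/* Hypothetical helper type: the three properties discussed in this thread. */
struct pagemap_info {
	bool present;
	bool swapped;
	bool file;
};

/* Decode one raw pagemap entry into its present/swapped/file bits. */
static inline struct pagemap_info decode_pagemap_entry(uint64_t entry)
{
	struct pagemap_info info = {
		.present = (entry & PM_PRESENT) != 0,
		.swapped = (entry & PM_SWAP) != 0,
		.file    = (entry & PM_FILE) != 0,
	};
	return info;
}
```

The point of the sketch is that "file" is a property of the page behind the
entry, not of the VMA, which is why checking vma->vm_file alone gives a
different answer for private anonymous pages inside a file mapping.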

> - A hole is also not file backed.
> 
> A hole isn't present in memory. So its pfn would be zero. But as it isn't
> present, it shouldn't be reported as a zero page. Right? For a hole:
> 
> PM_SCAN_FLAGS(false, false, false, false, false)

This looks correct to me.
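
As a rough user-space sketch of the five-argument macro suggested above,
the bit layout and the hole case can be checked at compile time. The
PAGE_IS_* names for bits 0-3 and the page_is_zero() helper are assumptions
made for illustration, not the final UAPI of the patch:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Same packing as the macro proposed in this thread: one bit per property. */
#define PM_SCAN_FLAGS(wt, file, present, swap, zero)   \
	((wt) | ((file) << 1) | ((present) << 2) | ((swap) << 3) | ((zero) << 4))

/* Illustrative names for the individual bits (assumed for this sketch). */
#define PAGE_IS_WRITTEN  (1 << 0)
#define PAGE_IS_FILE     (1 << 1)
#define PAGE_IS_PRESENT  (1 << 2)
#define PAGE_IS_SWAPPED  (1 << 3)
#define PAGE_IS_ZERO     (1 << 4)

/* A hole has no bit set, so it is never reported as a zero page. */
static_assert(PM_SCAN_FLAGS(false, false, false, false, false) == 0,
	      "a hole carries no flags");

/* A present, still write-protected PTE that maps the shared zero page. */
static_assert(PM_SCAN_FLAGS(false, false, true, false, true) ==
	      (PAGE_IS_PRESENT | PAGE_IS_ZERO),
	      "zero page implies present plus zero");

/* Hypothetical consumer-side check, e.g. to skip dumping zero pages. */
static inline bool page_is_zero(unsigned long bitmap)
{
	return (bitmap & PAGE_IS_ZERO) != 0;
}
```

Because the zero bit is computed as pte_present(ptent) && is_zero_pfn(...)
on the kernel side, a consumer such as CRIU would only ever see the zero
bit together with the present bit.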

