All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: "Michał Mirosław" <emmir@google.com>,
	"Andrei Vagin" <avagin@gmail.com>,
	"Danylo Mocherniuk" <mdanylo@google.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Christian Brauner" <brauner@kernel.org>,
	"Yang Shi" <shy828301@gmail.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Zach O'Keefe" <zokeefe@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	kernel@collabora.com,
	"Gabriel Krisman Bertazi" <krisman@collabora.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Peter Enderborg" <peter.enderborg@sony.com>,
	"open list : KERNEL SELFTEST FRAMEWORK"
	<linux-kselftest@vger.kernel.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"open list" <linux-kernel@vger.kernel.org>,
	"open list : PROC FILESYSTEM" <linux-fsdevel@vger.kernel.org>,
	"open list : MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	"Paul Gofman" <pgofman@codeweavers.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>
Subject: Re: [PATCH v6 0/3] Implement IOCTL to get and/or the clear info about PTEs
Date: Wed, 23 Nov 2022 09:11:30 -0500	[thread overview]
Message-ID: <Y34qEo6cB3oDwoCe@x1n> (raw)
In-Reply-To: <20221109102303.851281-1-usama.anjum@collabora.com>

On Wed, Nov 09, 2022 at 03:23:00PM +0500, Muhammad Usama Anjum wrote:
> Soft-dirty PTE bit of the memory pages can be read by using the pagemap
> procfs file. The soft-dirty PTE bit for the whole memory range of the
> process can be cleared by writing to the clear_refs file. There are other
> methods to mimic this information entirely in userspace with poor
> performance:
> - The mprotect syscall and SIGSEGV handler for bookkeeping
> - The userfaultfd syscall with the handler for bookkeeping

Userfaultfd is definitely slow in this case because it needs the messaging
roundtrip that happens in two different threads synchronously, so at least
more schedule effort even than mprotect.

I saw the other patch on vma merging with SOFTDIRTY, didn't look deeper
there but IIUC it won't really help much if the other commit (34228d47)
can't be reverted then it seems to help nothing.  And, it does looks risky
to revert that because in the same commit it mentioned the case where one
can clear ref right before a vma merge, so definitely worth more thoughts
and testings, which I agree with you.

I'm thinking whether the vma issue can be totally avoided.  For example by
providing an async version of uffd-wp.

Currently uffd-wp must be synchronous and it'll be slow but it services
specific purposes.  And this is definitely not the 1st time any of us
thinking about uffd-wp being async, it's just that we need to solve the
problem of storage on the dirty information.

Actually we can also use other storage form but so far I didn't think of
anything that's easy and clean.  Current soft-dirty bit also has its
defects (e.g. the need to take mmap lock and walk the pgtables), but that
part will be the same as soft-dirty for now.

Now I'm wildly thinking whether we can just reuse the soft-dirty bit in the
ptes already defined.  The GET interface could be similar as proposed here,
or at least a separate issue.

So _maybe_ we can have a feature (bound to the uffd context) for uffd that
enables async uffd-wp, in which case the wr-protect fault is not sending
any message anymore (nor enqueuing) but instead setting the soft-dirty then
quickly resolving the write bit immediately and continue the fault.

Clearing of the soft-dirty bit needs to be done in UFFDIO_WRITEPROTECT
alongside of clearing uffd-wp bit.  So on that part the current GET+CLEAR
interface for pagemap may need to be replaced.  And frankly, it feels weird
to me to allow change mm layout in pagemap ioctls..  With this we can keep
the pagemap interface to only fetch information, like before.

A major benefit of using uffd is that uffd is by nature pte-based, so no
fiddling with vma needed at all.  Firstly, no need to worry about merging
vmas with tons of false positives.  Meanwhile, one can wr-protect in
page-size granule easily.  All the wr-protect is not governed by vma flag
anymore but based on uffd-wp flag, so no extra overhead too on any page
that the monitor is not interested.  There's already infrastructure code
for persisting uffd-wp bit, so it'll naturally work similarly for an async
mode if to come to the world.

It's just that we'll also need to consider exclusive use of the bit, so
we'll need to fail clear_refs on vmas where we have VM_UFFD_WP and also the
async feature enabled.  I would hope that's very rare, but worth thinking
about its side effect.  The same will need to apply to UFFDIO_REGISTER on
async wp mode when soft-dirty enabled, we'll need to bailout too.

Said that, this is not a suggestion of a new design, but just something I
thought about when reading this, and quickly writting this down.

Thanks,

-- 
Peter Xu


      parent reply	other threads:[~2022-11-23 14:12 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-09 10:23 [PATCH v6 0/3] Implement IOCTL to get and/or the clear info about PTEs Muhammad Usama Anjum
2022-11-09 10:23 ` [PATCH v6 1/3] fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit Muhammad Usama Anjum
2022-11-09 10:23 ` [PATCH v6 2/3] fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs Muhammad Usama Anjum
2022-11-09 23:54   ` Andrei Vagin
2022-11-11 10:10     ` Muhammad Usama Anjum
2022-11-10 17:58   ` kernel test robot
2022-11-11 17:13   ` kernel test robot
2022-11-11 17:53     ` Muhammad Usama Anjum
2022-11-18  1:32   ` kernel test robot
2022-12-12 20:42   ` Cyrill Gorcunov
2022-12-13 13:04     ` Muhammad Usama Anjum
2022-12-13 22:22       ` Cyrill Gorcunov
2022-11-09 10:23 ` [PATCH v6 3/3] selftests: vm: add pagemap ioctl tests Muhammad Usama Anjum
2022-11-09 10:34 ` [PATCH v6 0/3] Implement IOCTL to get and/or the clear info about PTEs David Hildenbrand
2022-11-11  7:08   ` Muhammad Usama Anjum
2022-11-14 15:46     ` David Hildenbrand
2022-11-21 15:00       ` Muhammad Usama Anjum
2022-11-21 15:55         ` David Hildenbrand
2022-11-30 11:42           ` Muhammad Usama Anjum
2022-11-30 12:10             ` David Hildenbrand
2022-12-05 15:29               ` Muhammad Usama Anjum
2022-12-05 15:39                 ` David Hildenbrand
2022-11-23 14:11 ` Peter Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y34qEo6cB3oDwoCe@x1n \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=brauner@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=emmir@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=gustavoars@kernel.org \
    --cc=kernel@collabora.com \
    --cc=krisman@collabora.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mdanylo@google.com \
    --cc=peter.enderborg@sony.com \
    --cc=pgofman@codeweavers.com \
    --cc=shuah@kernel.org \
    --cc=shy828301@gmail.com \
    --cc=surenb@google.com \
    --cc=usama.anjum@collabora.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.