All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrei Vagin <avagin@gmail.com>
To: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:PROC FILESYSTEM" <linux-fsdevel@vger.kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	kernel@collabora.com,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	David Hildenbrand <david@redhat.com>,
	Peter Enderborg <peter.enderborg@sony.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [PATCH v3 0/4] Implement IOCTL to get and clear soft dirty PTE
Date: Mon, 19 Sep 2022 07:58:11 -0700	[thread overview]
Message-ID: <YyiDg79flhWoMDZB@gmail.com> (raw)
In-Reply-To: <20220826064535.1941190-1-usama.anjum@collabora.com>

On Fri, Aug 26, 2022 at 11:45:31AM +0500, Muhammad Usama Anjum wrote:
> 
> Hello,
> 
> This patch series implements a new ioctl on the pagemap proc fs file to
> get, clear and perform both get and clear at the same time atomically on
> the specified range of the memory.
> 
> Soft-dirty PTE bit of the memory pages can be viewed by using pagemap
> procfs file. The soft-dirty PTE bit for the whole memory range of the
> process can be cleared by writing to the clear_refs file. This series
> adds features that weren't present earlier.
> - There is no atomic get soft-dirty PTE bit status and clear operation
>   present.
> - The soft-dirty PTE bit of only a part of memory cannot be cleared.
> 
> Historically, soft-dirty PTE bit tracking has been used in the CRIU
> project. The proc fs interface is enough for that as I think the process
> is frozen. We have the use case where we need to track the soft-dirty
> PTE bit for the running processes. We need this tracking and clear
> mechanism of a region of memory while the process is running to emulate
> the getWriteWatch() syscall of Windows. This syscall is used by games to
> keep track of dirty pages and keep processing only the dirty pages. This
> new ioctl can be used by the CRIU project and other applications which
> require soft-dirty PTE bit information.
> 
> As in the current kernel there is no way to clear a part of memory (instead
> of clearing the Soft-Dirty bits for the entire process) and get+clear
> operation cannot be performed atomically, there are other methods to mimic
> this information entirely in userspace with poor performance:
> - The mprotect syscall and SIGSEGV handler for bookkeeping
> - The userfaultfd syscall with the handler for bookkeeping
> Some benchmarks can be seen [1].
> 
> This ioctl can be used by the CRIU project and other applications which
> require soft-dirty PTE bit information. The following operations are
> supported in this ioctl:
> - Get the pages that are soft-dirty.

I think this interface doesn't have to be limited by the soft-dirty
bits only. For example, CRIU needs to know whether file, present and swap bits
are set or not.

I mean we should be able to specify for what pages we need to get info
for. An ioctl argument can have these four fields:
* required bits (rmask & mask == mask) - all bits from this mask have to be set.
* any of these bits (amask & mask != 0) - any of these bits is set.
* exclude masks (emask & mask == 0) = none of these bits are set.
* return mask - bits that have to be reported to user.

> - Clear the pages which are soft-dirty.
> - The optional flag to ignore the VM_SOFTDIRTY and only track per page
> soft-dirty PTE bit
> 
> There are two decisions which have been taken about how to get the output
> from the syscall.
> - Return offsets of the pages from the start in the vec

We can conside to return regions that contains pages with the same set
of bits.

struct page_region {
	void *start;
	long size;
	u64 bitmap;
}

And ioctl returns arrays of page_region-s. I believe it will be more
compact form for many cases.

> - Stop execution when vec is filled with dirty pages
> These two arguments doesn't follow the mincore() philosophy where the
> output array corresponds to the address range in one to one fashion, hence
> the output buffer length isn't passed and only a flag is set if the page
> is present. This makes mincore() easy to use with less control. We are
> passing the size of the output array and putting return data consecutively
> which is offset of dirty pages from the start. The user can convert these
> offsets back into the dirty page addresses easily. Suppose, the user want
> to get first 10 dirty pages from a total memory of 100 pages. He'll
> allocate output buffer of size 10 and the ioctl will abort after finding the
> 10 pages. This behaviour is needed to support Windows' getWriteWatch(). The
> behaviour like mincore() can be achieved by passing output buffer of 100
> size. This interface can be used for any desired behaviour.
> 
> [1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora.com/
> 
> Regards,
> Muhammad Usama Anjum
> 
> Muhammad Usama Anjum (4):
>   fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit
>   fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit
>   selftests: vm: add pagemap ioctl tests
>   mm: add documentation of the new ioctl on pagemap
> 
>  Documentation/admin-guide/mm/soft-dirty.rst |  42 +-
>  fs/proc/task_mmu.c                          | 342 ++++++++++-
>  include/uapi/linux/fs.h                     |  23 +
>  tools/include/uapi/linux/fs.h               |  23 +
>  tools/testing/selftests/vm/.gitignore       |   1 +
>  tools/testing/selftests/vm/Makefile         |   2 +
>  tools/testing/selftests/vm/pagemap_ioctl.c  | 649 ++++++++++++++++++++
>  7 files changed, 1050 insertions(+), 32 deletions(-)
>  create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
> 
> -- 
> 2.30.2
> 

  parent reply	other threads:[~2022-09-19 14:58 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26  6:45 [PATCH v3 0/4] Implement IOCTL to get and clear soft dirty PTE Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 1/4] fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 2/4] fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty " Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 3/4] selftests: vm: add pagemap ioctl tests Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 4/4] mm: add documentation of the new ioctl on pagemap Muhammad Usama Anjum
2022-08-26  8:22   ` Bagas Sanjaya
2022-09-07  9:40 ` [PATCH v3 0/4] Implement IOCTL to get and clear soft dirty PTE Muhammad Usama Anjum
2022-09-19 14:58 ` Andrei Vagin [this message]
2022-09-21 18:26   ` Muhammad Usama Anjum
2022-09-28 17:24     ` Andrei Vagin
2022-10-03 11:21       ` Muhammad Usama Anjum
2022-10-11  4:52         ` Andrei Vagin
2022-10-14 13:48           ` Danylo Mocherniuk
2022-10-18 10:36             ` Muhammad Usama Anjum
2022-10-18 10:48               ` Greg KH
2022-10-18 12:30                 ` Muhammad Usama Anjum
2022-10-18 11:11               ` Michał Mirosław
2022-10-18 13:22                 ` Muhammad Usama Anjum
2022-10-18 17:17                   ` Michał Mirosław
2022-10-19  7:18                     ` Muhammad Usama Anjum
2022-10-18 13:32               ` Muhammad Usama Anjum
2022-10-18 18:20                 ` Greg KH
2022-09-21 18:30 ` Muhammad Usama Anjum
2022-09-28  6:03 ` Muhammad Usama Anjum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YyiDg79flhWoMDZB@gmail.com \
    --to=avagin@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kernel@collabora.com \
    --cc=krisman@collabora.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peter.enderborg@sony.com \
    --cc=shuah@kernel.org \
    --cc=usama.anjum@collabora.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.