All of lore.kernel.org
 help / color / mirror / Atom feed
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: Danylo Mocherniuk <mdanylo@google.com>,
	avagin@gmail.com, linux-mm@kvack.org, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org
Cc: corbet@lwn.net, david@redhat.com, kernel@collabora.com,
	krisman@collabora.com, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, peter.enderborg@sony.com,
	shuah@kernel.org, viro@zeniv.linux.org.uk, willy@infradead.org,
	emmir@google.com, figiel@google.com, kyurtsever@google.com,
	Paul Gofman <pgofman@codeweavers.com>,
	surenb@google.com
Subject: Re: [PATCH v3 0/4] Implement IOCTL to get and clear soft dirty PTE
Date: Tue, 18 Oct 2022 18:32:46 +0500	[thread overview]
Message-ID: <8e6ae988-ae89-9e94-ca05-38a4c2548356@collabora.com> (raw)
In-Reply-To: <474513c0-4ff9-7978-9d77-839fe775d04c@collabora.com>

On 10/18/22 3:36 PM, Muhammad Usama Anjum wrote:
>>>>>>> I mean we should be able to specify for what pages we need to get 
>>>>>>> info
>>>>>>> for. An ioctl argument can have these four fields:
>>>>>>> * required bits (rmask & mask == mask) - all bits from this mask 
>>>>>>> have to be set.
>>>>>>> * any of these bits (amask & mask != 0) - any of these bits is set.
>>>>>>> * exclude masks (emask & mask == 0) = none of these bits are set.
>>>>>>> * return mask - bits that have to be reported to user.
>>>> The required mask (rmask) makes sense to me. At the moment, I only know
>>>> about the practical use case for the required mask. Can you share how
>>>> can any and exclude masks help for the CRIU?
>>>>
>>>
>>> I looked at should_dump_page in the CRIU code:
>>> https://github.com/checkpoint-restore/criu/blob/45641ab26d7bb78706a6215fdef8f9133abf8d10/criu/mem.c#L102
>>>
>>> When CRIU dumps file private mappings, it needs to get pages that have
>>> PME_PRESENT or PME_SWAP but don't have PME_FILE.
>>
>> I would really like to see the mask discussed will be adopted. With it 
>> CRIU will
>> be able to migrate huge sparse VMAs assuming that a single hole is 
>> processed in
>> O(1) time.
>>
>> Use cases for migrating sparse VMAs are binaries sanitized with ASAN, 
>> MSAN or
>> TSAN [1]. All of these sanitizers produce sparse mappings of shadow 
>> memory [2].
>> Being able to migrate such binaries allows to highly reduce the amount 
>> of work
>> needed to identify and fix post-migration crashes, which happen 
>> constantly.
>>
> 
> Hello all,
> 
> I've included the masks which the CRIU developers have specified. 
> max_out_page is another new optional variable which is needed to 
> terminate the operation without visiting all the pages after finding the 
> max_out_page number of desired pages. There is no way to terminate the 
> operation without this variable.
> 
> How does the interface looks now? Please comment.
> 
Updated interface with only one IOCTL. If vec is defined, get operation 
will be performed. If PAGEMAP_SD_CLEAR flag is specified, soft dirty bit 
will be cleared as well. CLEAR flag can only be specified for clearing 
soft dirty bit.

/* PAGEMAP IOCTL */
#define PAGEMAP_SCAN	_IOWR('f', 16, struct pagemap_sd_args)

/* Bits are set in the bitmap of the page_region and masks in 
pagemap_sd_args */
#define PAGE_IS_SD	1 << 0
#define PAGE_IS_FILE	1 << 1
#define PAGE_IS_PRESENT	1 << 2
#define PAGE_IS_SWAPED	1 << 3

/**
  * struct page_region - Page region with bitmap flags
  * @start:	Start of the region
  * @len:	Length of the region
  * bitmap:	Bits sets for the region
  */
struct page_region {
	__u64 start;
	__u64 len;
	__u64 bitmap;
};

/**
  * struct pagemap_sd_args - Soft-dirty IOCTL argument
  * @start:		Starting address of the page
  * @len:		Length of the region (All the pages in this length are included)
  * @vec:		Output page_region struct array
  * @vec_len:		Length of the page_region struct array
  * @max_out_page:	Optional max output pages (It must be less than 
vec_len if specified)
  * @flags:		Special flags for the IOCTL
  * @rmask:		Required mask - All of these bits have to be set
  * @amask:		Any mask - Any of these bits are set
  * @emask:		Exclude mask - None of these bits are set
  * @rmask:		Bits that have to be reported to the user in page_region
  */
struct pagemap_scan_args {
	__u64 __user start;
	__u64 len;
	__u64 __user vec;
	__u64 vec_len;
	__u32 max_out_page;
	__u32 flags;
	__u32 rmask;
	__u32 amask;
	__u32 emask;
	__u32 rmask;
};

/* Special flags */
#define PAGEMAP_SD_CLEAR		1 << 0
#define PAGEMAP_NO_REUSED_REGIONS	1 << 1


> /* PAGEMAP IOCTL */
> #define PAGEMAP_GET        _IOWR('f', 16, struct pagemap_sd_args)
> #define PAGEMAP_CLEAR        _IOWR('f', 17, struct pagemap_sd_args)
> #define PAGEMAP_GET_AND_CLEAR    _IOWR('f', 18, struct pagemap_sd_args)
> 
> /* Bits are set in the bitmap of the page_region and masks in 
> pagemap_sd_args */
> #define PAGE_IS_SD    1 << 0
> #define PAGE_IS_FILE    1 << 1
> #define PAGE_IS_PRESENT    1 << 2
> #define PAGE_IS_SWAPED    1 << 3
> 
> /**
>   * struct page_region - Page region with bitmap flags
>   * @start:    Start of the region
>   * @len:    Length of the region
>   * bitmap:    Bits sets for the region
>   */
> struct page_region {
>      __u64 start;
>      __u64 len;
>      __u64 bitmap;
> };
> 
> /**
>   * struct pagemap_sd_args - Soft-dirty IOCTL argument
>   * @start:        Starting address
>   * @len:        Length of the region
>   * @vec:        Output page_region struct array
>   * @vec_len:        Length of the page_region struct array
>   * @max_out_page:    Optional max output pages (It must be less than 
> vec_len if specified)
>   * @flags:        Special flags for the IOCTL
>   * @rmask:        Special flags for the IOCTL
>   * @amask:        Special flags for the IOCTL
>   * @emask:        Special flags for the IOCTL
>   * @__reserved:        Reserved member to preserve data alignment. Must 
> be 0.
>   */
> struct pagemap_sd_args {
>      __u64 __user start;
>      __u64 len;
>      __u64 __user vec; // page_region
>      __u64 vec_len;    // sizeof(page_region)
>      __u32 flags;      // special flags
>      __u32 rmask;
>      __u32 amask;
>      __u32 emask;
>      __u32 max_out_page;
>      __u32 __reserved;
> };
> 
> /* Special flags */
> #define PAGEMAP_NO_REUSED_REGIONS    0x1
> 
> 
>>>
>>>>>>>> - Clear the pages which are soft-dirty.
>>>>>>>> - The optional flag to ignore the VM_SOFTDIRTY and only track 
>>>>>>>> per page
>>>>>>>> soft-dirty PTE bit
>>>>>>>>
>>>>>>>> There are two decisions which have been taken about how to get 
>>>>>>>> the output
>>>>>>>> from the syscall.
>>>>>>>> - Return offsets of the pages from the start in the vec
>>>>>>>
>>>>>>> We can conside to return regions that contains pages with the 
>>>>>>> same set
>>>>>>> of bits.
>>>>>>>
>>>>>>> struct page_region {
>>>>>>>        void *start;
>>>>>>>        long size;
>>>>>>>        u64 bitmap;
>>>>>>> }
>>>>>>>
>>>>>>> And ioctl returns arrays of page_region-s. I believe it will be more
>>>>>>> compact form for many cases.
>>>>>> Thank you for mentioning this. I'd considered this while development.
>>>>>> But I gave up and used the simple array to return the offsets of the
>>>>>> pages as in the problem I'm trying to solve, the dirty pages may be
>>>>>> present amid non-dirty pages. The range may not be useful in that 
>>>>>> case.
>>>>>
>>>>> This is a good example. If we expect more than two consequent pages
>>>>> on average, the "region" interface looks more prefered. I don't 
>>>>> know your
>>>>> use-case, but in the case of CRIU, this assumption looks reasonable.
>>
>> Plus one for page_region data structure. It will make ASAN shadow memory
>> representation much more compact as well as any other classical use-case.
>>
>> [1] https://github.com/google/sanitizers
>> [2] 
>> https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
>>
>> Best,
>> Danylo
>>

  parent reply	other threads:[~2022-10-18 13:33 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26  6:45 [PATCH v3 0/4] Implement IOCTL to get and clear soft dirty PTE Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 1/4] fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 2/4] fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty " Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 3/4] selftests: vm: add pagemap ioctl tests Muhammad Usama Anjum
2022-08-26  6:45 ` [PATCH v3 4/4] mm: add documentation of the new ioctl on pagemap Muhammad Usama Anjum
2022-08-26  8:22   ` Bagas Sanjaya
2022-09-07  9:40 ` [PATCH v3 0/4] Implement IOCTL to get and clear soft dirty PTE Muhammad Usama Anjum
2022-09-19 14:58 ` Andrei Vagin
2022-09-21 18:26   ` Muhammad Usama Anjum
2022-09-28 17:24     ` Andrei Vagin
2022-10-03 11:21       ` Muhammad Usama Anjum
2022-10-11  4:52         ` Andrei Vagin
2022-10-14 13:48           ` Danylo Mocherniuk
2022-10-18 10:36             ` Muhammad Usama Anjum
2022-10-18 10:48               ` Greg KH
2022-10-18 12:30                 ` Muhammad Usama Anjum
2022-10-18 11:11               ` Michał Mirosław
2022-10-18 13:22                 ` Muhammad Usama Anjum
2022-10-18 17:17                   ` Michał Mirosław
2022-10-19  7:18                     ` Muhammad Usama Anjum
2022-10-18 13:32               ` Muhammad Usama Anjum [this message]
2022-10-18 18:20                 ` Greg KH
2022-09-21 18:30 ` Muhammad Usama Anjum
2022-09-28  6:03 ` Muhammad Usama Anjum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8e6ae988-ae89-9e94-ca05-38a4c2548356@collabora.com \
    --to=usama.anjum@collabora.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=emmir@google.com \
    --cc=figiel@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kernel@collabora.com \
    --cc=krisman@collabora.com \
    --cc=kyurtsever@google.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mdanylo@google.com \
    --cc=peter.enderborg@sony.com \
    --cc=pgofman@codeweavers.com \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.