linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>,
	Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"peter.xu@redhat.com" <peter.xu@redhat.com>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Jonathan Davies <jond@nutanix.com>
Subject: Re: [PATCH] Documentation: update pagemap with SOFT_DIRTY & UFFD_WP shmem issue
Date: Mon, 23 Aug 2021 10:40:39 +0200
Message-ID: <3196e878-7ab3-a385-74e3-4c4bfbc66e36@redhat.com>
In-Reply-To: <YSAP0d8nxBShQiF+@t490s>

On 20.08.21 22:25, Peter Xu wrote:
> Hi, Tiberiu,
> 
> On Fri, Aug 20, 2021 at 05:10:20PM +0000, Tiberiu Georgescu wrote:
>> Currently, the missing information for shmem is this:
>> 1. Difference between is_swap(pte) and is_none(pte).
>>      * is_swap(pte) is always false;
>>      * is_none(pte) is true when is_swap() should have been;
>>      * is_present(pte) is fine.
>> 2. swp_entry(pte)
>>      Particularly, swp_type() and swp_offset().
>> 3. SOFT_DIRTY_BIT
>>      This is not always missing for shmem.
>>      Once 4 is written to clear_refs, if the page is dirtied, the bit is fine as long as it
>>      is still in memory. If the page is swapped out, the bit is lost. Then, if the page is
>>      brought back into memory, the bit is still lost.
>>
>> For 1, you mentioned how lseek() and madvise() can be used to get this
>> information [2], and I proposed a different method with a little help from
>> the current pagemap[3]. They have slightly different output and applications, so
>> the difference should be taken into consideration.
>> For 2, if anyone knows of any way to retrieve the missing information cleanly,
>> please let us know.
>> As for 3, AFAIK, we will need to leverage Peter's special PTE marker mechanism
>> and implement it in another patch.
>>
>> [2]: https://lore.kernel.org/lkml/5766d353-6ff8-fdfa-f8f9-764e8de9b5aa@redhat.com/
>> [3]: https://lore.kernel.org/lkml/B130B700-B3DB-4D07-A632-73030BCBC715@nutanix.com/
>>
>> ============================
>> For completeness, I would like to mention Peter's RFC[4] and my own patch[5],
>> which deal with adding missing functionality to the pagemap when pages are
>> shmem/tmpfs.
>>
>> Peter's patch[4] adds the missing information at 1 to the pagemap, with very
>> little performance overhead. AFAIK, it is still WIP.
>>
>> My patch[5] fixes both 1 and 2, at the expense of a significant loss in performance
>> when dealing with swapped out shared pages. This performance loss can be
>> reduced with batching, for use cases when high performance matters. Also, this
>> patch on top of Peter's RFC yields better performance[6]. Still, it is 2x as
>> slow on average as pre-patch.
>>
>> Peter's patch has a config flag, and I intend to add one to mine in the next
>> version. So I wanted to propose, if alternatives are not implemented yet (mincore,
>> lseek, map_files or otherwise are insufficient), we upstream our patches (once
>> they are ready), so that users can toggle them on or off, depending on whether
>> they need the extra functionality or not. And, of course, document their usage.
>>
>> If neither sounds like a particularly useful/convenient option, we might need to
>> look into designs of retrieving the missing information via another mechanism
>> (sys/fs, ioctl, netlink etc).
>>
>> That is, unless we find that we can/should place this info in the pagemap still, for
>> the sake of correctness and completeness. For that though, we should agree
>> on what we expect the pagemap to do in the end. Is shmem/tmpfs out of
>> bounds for it or not?
>>
>> [4]: https://lore.kernel.org/lkml/20210807032521.7591-1-peterx@redhat.com/
>> [5]: https://lore.kernel.org/lkml/20210730160826.63785-1-tiberiu.georgescu@nutanix.com/
>> [6]: https://lore.kernel.org/lkml/C0DB3FED-F779-4838-9697-D05BE96C3514@nutanix.com/
> 
> Thanks for summarizing the issues.
> 
> Before going further, I really would like to understand a few questions that I
> already raised in the other thread here:
> 
> https://lore.kernel.org/lkml/YR%2F+gfL8RCP8XoB1@t490s/
> 
> They're:
> 
>    (1) Does mincore() already suit your needs?
> 
>    (2) What would you like to do with swap entries in pagemap?
> 
> I'm more interested in question (2) because I never figured it out before, and
> I really don't see how it would work even if the kernel can share the swap
> format with userspace.  E.g., right after you decide to "zero copy" that page,
> the page can be faulted in right before live migration finishes, and it can be
> dirtied again.  Then the page on the shared network storage will be stale, and
> so will the swap entry you just scanned.

I wonder whether one should rather try using shared file-backed memory
located on network storage instead of hacking into swap here.
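
For readers following the pagemap fields discussed in the quoted summary
(is_present, is_swap, swp_type/swp_offset, the soft-dirty bit), here is a
minimal Python sketch of how a single 64-bit pagemap entry can be decoded.
The bit positions follow Documentation/admin-guide/mm/pagemap.rst; the
function and constant names are made up for illustration and are not part of
any patch in this thread. Note that, per the summary above, a swapped-out
shmem page currently reads back as a none entry, so the swap fields would
simply be absent for it.

```python
import os
import struct

# Bit layout as documented in Documentation/admin-guide/mm/pagemap.rst.
# All names below are illustrative, not kernel or library identifiers.
PM_PRESENT = 1 << 63      # page present in RAM
PM_SWAP = 1 << 62         # page swapped out
PM_FILE = 1 << 61         # file-mapped page or shared-anon
PM_UFFD_WP = 1 << 57      # pte is uffd-wp write-protected
PM_SOFT_DIRTY = 1 << 55   # pte is soft-dirty
PM_LOW_MASK = (1 << 55) - 1   # bits 0-54: PFN, or swap type + offset
SWP_TYPE_BITS = 5             # bits 0-4: swap type; bits 5-54: swap offset


def decode_pagemap_entry(entry):
    """Split one raw 64-bit pagemap entry into its documented fields."""
    present = bool(entry & PM_PRESENT)
    swapped = bool(entry & PM_SWAP)
    info = {
        "present": present,
        "swap": swapped,
        "none": not present and not swapped,
        "file_or_shared": bool(entry & PM_FILE),
        "uffd_wp": bool(entry & PM_UFFD_WP),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
    }
    low = entry & PM_LOW_MASK
    if swapped:
        info["swap_type"] = low & ((1 << SWP_TYPE_BITS) - 1)
        info["swap_offset"] = low >> SWP_TYPE_BITS
    elif present:
        # The PFN reads as zero without CAP_SYS_ADMIN.
        info["pfn"] = low
    return info


def read_pagemap_entry(pid, vaddr):
    """Read the raw entry covering virtual address vaddr of process pid."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)   # one 8-byte entry per page
        return struct.unpack("=Q", f.read(8))[0]
```

A caller would combine the two, e.g.
decode_pagemap_entry(read_pagemap_entry("self", addr)), for an address addr
inside a mapping of interest.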

-- 
Thanks,

David / dhildenb




Thread overview: 7+ messages
2021-08-12 15:58 [PATCH] Documentation: update pagemap with SOFT_DIRTY & UFFD_WP shmem issue Tiberiu A Georgescu
2021-08-18 19:14 ` David Hildenbrand
2021-08-20 17:10   ` Tiberiu Georgescu
2021-08-20 20:25     ` Peter Xu
2021-08-23  8:40       ` David Hildenbrand [this message]
2021-08-23  8:52     ` David Hildenbrand
2021-08-25 15:48       ` Tiberiu Georgescu
