linux-mm.kvack.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
Cc: David Hildenbrand <david@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"peter.xu@redhat.com" <peter.xu@redhat.com>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Jonathan Davies <jond@nutanix.com>
Subject: Re: [PATCH] Documentation: update pagemap with SOFT_DIRTY & UFFD_WP shmem issue
Date: Fri, 20 Aug 2021 16:25:53 -0400
Message-ID: <YSAP0d8nxBShQiF+@t490s>
In-Reply-To: <F04C4283-0D25-4D0E-B3A8-05B36ACFF30D@nutanix.com>

Hi, Tiberiu,

On Fri, Aug 20, 2021 at 05:10:20PM +0000, Tiberiu Georgescu wrote:
> Currently, the missing information for shmem is this:
> 1. Difference between is_swap(pte) and is_none(pte).
>     * is_swap(pte) is always false;
>     * is_none(pte) is true when is_swap() should have been;
>     * is_present(pte) is fine.
> 2. swp_entry(pte)
>     Particularly, swp_type() and swp_offset().
> 3. SOFT_DIRTY_BIT
>     This is not always missing for shmem. 
>     Once 4 is written to clear_refs, if the page is dirtied, the bit is fine as long as it
>     is still in memory. If the page is swapped out, the bit is lost. Then, if the page is
>     brought back into memory, the bit is still lost.
> 
> For 1, you mentioned how lseek() and madvise() can be used to get this
> information [2], and I proposed a different method with a little help from
> the current pagemap[3]. They have slightly different output and applications, so
> the difference should be taken into consideration.
> For 2, if anyone knows of any way to retrieve the missing information cleanly,
> please let us know.
> As for 3, AFAIK, we will need to leverage Peter's special PTE marker mechanism
> and implement it in another patch.
> 
> [2]: https://lore.kernel.org/lkml/5766d353-6ff8-fdfa-f8f9-764e8de9b5aa@redhat.com/
> [3]: https://lore.kernel.org/lkml/B130B700-B3DB-4D07-A632-73030BCBC715@nutanix.com/
> 
> ============================
> For completeness, I would like to mention Peter's RFC[4] and my own patch[5],
> which deal with adding missing functionality to the pagemap when pages are
> shmem/tmpfs.
> 
> Peter's patch[4] adds the missing information at 1 to the pagemap, with very
> little performance overhead. AFAIK, it is still WIP.
> 
> My patch[5] fixes both 1 and 2, at the expense of a significant loss in performance
> when dealing with swapped out shared pages. This performance loss can be
> reduced with batching, for use cases when high performance matters. Also, this
> patch on top of Peter's RFC yields better performance[6]. It is still 2x as
> slow on average compared to pre-patch.
> 
> Peter's patch has a config flag, and I intend to add one to mine in the next
> version. So I wanted to propose, if alternatives are not implemented yet (mincore,
> lseek, map_files or otherwise are insufficient), we upstream our patches (once
> they are ready), so that users can toggle them on or off, depending on whether
> they need the extra functionality or not. And, of course, document their usage.
> 
> If neither sounds like a particularly useful/convenient option, we might need to
> look into designs of retrieving the missing information via another mechanism
> (sys/fs, ioctl, netlink etc).
> 
> That is, unless we find that we can/should place this info in the pagemap still, for
> the sake of correctness and completeness. For that though, we should agree
> on what we expect the pagemap to do in the end. Is shmem/tmpfs out of
> bounds for it or not?
> 
> [4]: https://lore.kernel.org/lkml/20210807032521.7591-1-peterx@redhat.com/
> [5]: https://lore.kernel.org/lkml/20210730160826.63785-1-tiberiu.georgescu@nutanix.com/
> [6]: https://lore.kernel.org/lkml/C0DB3FED-F779-4838-9697-D05BE96C3514@nutanix.com/

Thanks for summarizing the issues.

Before going further, I would really like to get answers to a couple of
questions that I already raised in the other thread here:

https://lore.kernel.org/lkml/YR%2F+gfL8RCP8XoB1@t490s/

They're:

  (1) Does mincore() already suit your needs?

  (2) What would you like to do with swap entries in pagemap?
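
For reference on (1), here is a minimal sketch of how userspace could query
per-page residency of a shared mapping with mincore(). It is only an
illustration, not something taken from this thread: the helper name
resident_pages() is hypothetical, it goes through ctypes to reach the libc
mincore() wrapper, and it assumes Linux, where Python's mmap.mmap(-1, ...)
creates a shared anonymous (shmem-backed) mapping by default.

```python
import ctypes
import mmap
import os
from typing import List


def resident_pages(mapping: mmap.mmap) -> List[bool]:
    """Return one bool per page of `mapping`: True if mincore() reports it resident.

    Hypothetical helper for illustration; assumes Linux and a writable mapping.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mincore.argtypes = [
        ctypes.c_void_p,
        ctypes.c_size_t,
        ctypes.POINTER(ctypes.c_ubyte),
    ]
    libc.mincore.restype = ctypes.c_int

    length = len(mapping)
    n_pages = (length + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * n_pages)()

    # Borrow the mapping's buffer only long enough to learn its start address.
    buf = ctypes.c_char.from_buffer(mapping)
    try:
        if libc.mincore(ctypes.addressof(buf), length, vec) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        del buf  # release the buffer export so the mapping can be closed

    # Only the least significant bit of each vector byte is defined.
    return [bool(b & 1) for b in vec]


if __name__ == "__main__":
    m = mmap.mmap(-1, 2 * mmap.PAGESIZE)  # shared anonymous mapping on Linux
    m[0:1] = b"x"                         # touch the first page only
    print(resident_pages(m))
    m.close()
```

Note that this only answers "is the page in memory right now"; it says
nothing about where a non-resident page lives.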

I'm more interested in question (2) because I never figured it out before, and
I really don't see how it would work even if the kernel could share the swap
format with userspace.  E.g., right after you decide to "zero copy" that page,
the page can be faulted in just before live migration finishes, and it can be
dirtied again.  Then the page on the shared network storage will be stale, and
so will the swap entry you just scanned.
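
To make the discussion of swap entries concrete, below is a minimal sketch of
decoding one 64-bit pagemap entry in userspace. The bit positions follow
Documentation/admin-guide/mm/pagemap.rst (present = bit 63, swapped = bit 62,
uffd-wp = bit 57, soft-dirty = bit 55; swap type in bits 0-4 and swap offset in
bits 5-54 when swapped). The names PagemapEntry and decode_pagemap_entry() are
hypothetical, chosen only for this illustration.

```python
from typing import NamedTuple, Optional

# Flag bits of a /proc/<pid>/pagemap entry, per pagemap.rst.
PM_PRESENT = 1 << 63
PM_SWAP = 1 << 62
PM_UFFD_WP = 1 << 57      # since Linux 5.13
PM_SOFT_DIRTY = 1 << 55
PM_LOW_MASK = (1 << 55) - 1   # bits 0-54: PFN if present, swap info if swapped
SWAP_TYPE_BITS = 5


class PagemapEntry(NamedTuple):
    present: bool
    swapped: bool
    soft_dirty: bool
    uffd_wp: bool
    swap_type: Optional[int]     # None unless the swapped bit is set
    swap_offset: Optional[int]   # None unless the swapped bit is set


def decode_pagemap_entry(entry: int) -> PagemapEntry:
    """Split a raw 64-bit pagemap value into the fields discussed here."""
    swapped = bool(entry & PM_SWAP)
    low = entry & PM_LOW_MASK
    return PagemapEntry(
        present=bool(entry & PM_PRESENT),
        swapped=swapped,
        soft_dirty=bool(entry & PM_SOFT_DIRTY),
        uffd_wp=bool(entry & PM_UFFD_WP),
        swap_type=(low & ((1 << SWAP_TYPE_BITS) - 1)) if swapped else None,
        swap_offset=(low >> SWAP_TYPE_BITS) if swapped else None,
    )


if __name__ == "__main__":
    # Synthetic entry: swapped, soft-dirty, swap type 3, swap offset 1234.
    raw = PM_SWAP | PM_SOFT_DIRTY | (1234 << SWAP_TYPE_BITS) | 3
    print(decode_pagemap_entry(raw))
```

The entry for a virtual address sits at byte offset (vaddr / page size) * 8 in
the pagemap file.  Whatever such a decoder returns is only a snapshot: the race
described above can invalidate it as soon as the read completes.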

Thanks,

-- 
Peter Xu



