From: Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"david@redhat.com" <david@redhat.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Jonathan Davies <jond@nutanix.com>
Subject: Re: [PATCH v2 1/1] Documentation: update pagemap with shmem exceptions
Date: Tue, 21 Sep 2021 08:52:32 +0000
Message-ID: <F6A49621-C7A4-4643-95C2-F47B02F132D2@nutanix.com>
In-Reply-To: <YUjb91tWhd/YAgQW@t490s>


> On 20 Sep 2021, at 20:07, Peter Xu <peterx@redhat.com> wrote:
> 
> Hi, Tiberiu,
> 
> Thanks for the patch!  Yes it would still be nice to comment on this behavior,
> some trivial nitpicks below.
> 
> On Mon, Sep 20, 2021 at 04:49:31PM +0000, Tiberiu A Georgescu wrote:
>> +In user space, whether the page is swapped or none can be deduced with the
>> +lseek system call. For a single page, the algorithm is:
>> +
>> +0. If the pagemap entry of the page has bit 63 (page present) set, the page
>> +   is present.
>> +1. Otherwise, get an fd to the file where the page is backed. For anonymous
>> +   shared pages, the file can be found in ``/proc/pid/map_files/``.
>> +2. Call lseek with LSEEK_DATA flag and seek to the virtual address of the page
> 
> s/LSEEK_DATA/SEEK_DATA/

Oops, my bad. Will change that.
> 
>> +   you wish to inspect. If it overshoots the PAGE_SIZE, the page is NONE.
>> +3. Otherwise, the page is in swap.
> 
> It could also not be in swap, right?
> 
> Example 1: this process mmap()ed an existing shmem file with data filled in,
> but without accessing it yet.  Then the page cache exists and is not in swap,
> but the pgtables will be empty.
> 
> Example 2: this process has mapped this shmem with 2M THP, all data filled in,
> then for some reason the THP splits; the pgtable can then also be none, but
> lseek will succeed, I think.
> 
OK, those are quite a few exceptions. So the pagemap entry can be empty while
the page itself is actually resident. When that happens, the current algorithm
mistakenly classifies the page as swapped.

Thanks a lot for pointing that out!
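For reference, a minimal C sketch of quoted steps 0-3, using SEEK_DATA (the corrected flag name). Assumptions, not taken from the patch: the caller has already opened the backing file (e.g. via /proc/<pid>/map_files/) and computed the page's page-aligned offset inside that file; read_pagemap_entry(), classify_page() and enum page_state are hypothetical names. Given Peter's examples, the last branch is labelled "swap or unmapped" rather than "swap".

```c
#define _GNU_SOURCE /* for SEEK_DATA */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
enum page_state {
    PAGE_PRESENT,          /* pagemap bit 63 set */
    PAGE_NONE,             /* no data in the backing file for this page */
    PAGE_SWAP_OR_UNMAPPED, /* data exists but is not mapped: swapped, or
                              resident page cache not mapped yet */
    PAGE_ERROR
};

#define PM_PRESENT (1ULL << 63)

/* Read the 64-bit pagemap entry for vaddr in the calling process. */
static int read_pagemap_entry(uintptr_t vaddr, uint64_t *entry)
{
    long psize = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page. */
    off_t off = (off_t)(vaddr / (uintptr_t)psize) * (off_t)sizeof(*entry);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* file_off: page-aligned offset of the page inside the backing file
 * (assumed to be computed by the caller from the mapping's layout). */
static enum page_state classify_page(uint64_t entry, int backing_fd,
                                     off_t file_off)
{
    long psize = sysconf(_SC_PAGESIZE);

    if (entry & PM_PRESENT)                    /* step 0 */
        return PAGE_PRESENT;

    off_t data = lseek(backing_fd, file_off, SEEK_DATA); /* step 2 */
    if (data < 0)                              /* ENXIO: no data at or after file_off */
        return errno == ENXIO ? PAGE_NONE : PAGE_ERROR;
    if (data >= file_off + psize)              /* next data lies past this page */
        return PAGE_NONE;
    return PAGE_SWAP_OR_UNMAPPED;              /* step 3, ambiguous per Peter */
}
```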

> So to further identify whether that's in swap, we need a step 5 with mincore()
> system call, perhaps?

I tested this some more, and it still looks like the mincore() syscall reports
pages in the swap cache as "in memory". This is how I tested:

1. Create a cgroup with memory.limit_in_bytes set to 1M, and allow swapping
2. mmap 1024 pages (shared and private mappings show the same behaviour)
3. write to all pages in order
4. compare mincore output with pagemap output
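Steps 2-4 of that test could look roughly like the C sketch below. It is an illustration under assumptions: the cgroup from step 1 is created separately and the process is started inside it, and count_resident_not_present() is a hypothetical helper that counts pages which mincore() reports as resident while pagemap bit 63 (present) is clear.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the number of "resident but not present" pages, or -1 on error
 * (e.g. /proc/self/pagemap unavailable). */
static long count_resident_not_present(size_t npages, int shared)
{
    long psize = sysconf(_SC_PAGESIZE);
    size_t len = npages * (size_t)psize;
    int flags = MAP_ANONYMOUS | (shared ? MAP_SHARED : MAP_PRIVATE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < npages; i++)        /* step 3: write in order */
        buf[i * (size_t)psize] = 1;

    unsigned char *vec = malloc(npages);
    int pm = open("/proc/self/pagemap", O_RDONLY);
    long mismatches = -1;
    if (vec && pm >= 0 && mincore(buf, len, vec) == 0) {
        mismatches = 0;
        for (size_t i = 0; i < npages; i++) {  /* step 4: compare */
            uint64_t entry;
            uintptr_t va = (uintptr_t)(buf + i * (size_t)psize);
            off_t off = (off_t)(va / (uintptr_t)psize) * (off_t)sizeof(entry);
            if (pread(pm, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
                mismatches = -1;
                break;
            }
            int resident = vec[i] & 1;              /* only bit 0 is defined */
            int present = (int)((entry >> 63) & 1); /* pagemap "page present" */
            if (resident && !present)
                mismatches++;
        }
    }
    if (pm >= 0)
        close(pm);
    free(vec);
    munmap(buf, len);
    return mismatches;
}
```

Outside memory pressure every touched page should be both present and resident, so the count would be expected to be 0; inside the small cgroup with npages = 1024, a non-zero count would presumably correspond to the swap-cache pages described below.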

This is a typical mincore output in this scenario, shortened for
readability (4x8 instead of 16x64):
00000000
00000000
00001110   <- this bugs me
01111111
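A grid like the one above can be produced from the mincore() vector with a small formatter. This is a hypothetical helper for illustration (format_mincore_rows() is not part of any API); it relies only on the documented fact that bit 0 of each vector byte indicates residency, while the other bits are reserved.

```c
#include <stddef.h>

/* Render a mincore() vector as rows of '0'/'1', `width` pages per row,
 * each row ending in '\n'. `out` must hold at least n + n / width + 2
 * bytes. Returns out. */
static char *format_mincore_rows(const unsigned char *vec, size_t n,
                                 size_t width, char *out)
{
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        *p++ = (vec[i] & 1) ? '1' : '0';   /* mask off reserved bits */
        if ((i + 1) % width == 0 || i + 1 == n)
            *p++ = '\n';
    }
    *p = '\0';
    return out;
}
```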

The last 7 bits definitely mark pages resident in memory, but a few other bits
are set a little earlier. Comparing this output with the pagemap confirms that
7 consecutive pages are present and the rest are swapped, including the 3 that
mincore marks as resident. At this point, I can only assume that the pages in
between are in the swap cache.

If you have another explanation, please share it with me. In the meantime,
I will rework the doc patch and see if there is any other way to clearly
differentiate between the three types of pages (present, swapped and none).
If not, I guess we'll stick to mincore() as a best-effort 5th step.
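A possible shape for that best-effort step, as a C sketch: it would only run for a page whose pagemap entry has bit 63 clear but for which SEEK_DATA found data. refine_with_mincore() and enum refined_state are hypothetical names, and per the test above a "resident" answer can still mean the page sits in the swap cache.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

enum refined_state {
    REFINED_RESIDENT, /* page cache or swap cache: not distinguishable */
    REFINED_SWAPPED,  /* not resident: presumably only on the swap device */
    REFINED_ERROR
};

/* page_addr must be page-aligned and inside the shmem mapping. */
static enum refined_state refine_with_mincore(void *page_addr)
{
    unsigned char vec = 0;
    if (mincore(page_addr, (size_t)sysconf(_SC_PAGESIZE), &vec) != 0)
        return REFINED_ERROR;  /* e.g. EINVAL for an unaligned address */
    return (vec & 1) ? REFINED_RESIDENT : REFINED_SWAPPED;
}
```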

--
Kind regards,
Tibi



Thread overview: 9+ messages
2021-09-20 16:49 [PATCH v2 0/1] Documenting shmem as an exception case for the pagemap Tiberiu A Georgescu
2021-09-20 16:49 ` [PATCH v2 1/1] Documentation: update pagemap with shmem exceptions Tiberiu A Georgescu
2021-09-20 17:36   ` David Hildenbrand
2021-09-20 19:07   ` Peter Xu
2021-09-21  8:52     ` Tiberiu Georgescu [this message]
2021-09-21 15:04       ` Peter Xu
2021-09-21 16:08         ` Tiberiu Georgescu
2021-09-21 16:30           ` Peter Xu
2021-09-21 17:09             ` Tiberiu Georgescu
