linux-kernel.vger.kernel.org archive mirror
From: Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"david@redhat.com" <david@redhat.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Jonathan Davies <jond@nutanix.com>
Subject: Re: [PATCH v2 1/1] Documentation: update pagemap with shmem exceptions
Date: Tue, 21 Sep 2021 16:08:29 +0000
Message-ID: <C983908F-7AF4-410B-90FF-DB4B9A06E917@nutanix.com>
In-Reply-To: <YUn0ikP4Gip3Yc6L@t490s>


> On 21 Sep 2021, at 16:04, Peter Xu <peterx@redhat.com> wrote:
> 
> Hi, Tiberiu,
> 
> On Tue, Sep 21, 2021 at 08:52:32AM +0000, Tiberiu Georgescu wrote:
>> I tested it some more, and it still looks like the mincore() syscall considers pages
>> in the swap cache as "in memory". This is how I tested:
>> 
>> 1. Create a cgroup with 1M limit_in_bytes, and allow swapping
>> 2. mmap 1024 pages (both shared and private present the same behaviour)
>> 3. write to all pages in order
>> 4. compare mincore output with pagemap output
>> 
>> This is an example of typical mincore output in this scenario, shortened for
>> brevity (4x8 instead of 16x64):
>> 00000000
>> 00000000
>> 00001110   <- this bugs me
>> 01111111
>> 
>> The last 7 bits are definitely marking pages present in memory, but there are
>> some other bits set a little earlier. When comparing this output with the pagemap,
>> indeed, there are 7 consecutive pages present, and the rest of them are
>> swapped, including those 3 which are marked as present by mincore.
>> At this point, I can only assume the bits in between are on the swap cache.
>> 
>> If you have another explanation, please share it with me. In the meanwhile,
>> I will rework the doc patch, and see if there is any other way to differentiate
>> clearly between the three types of pages. If not, I guess we'll stick to
>> mincore() and a best-effort 5th step.
> 
> IIUC it could be because the pages are still in the swap cache, so
> mincore() will return 1 for them too.

That is my assumption as well.
> 
> What swap device are you using?  I'm wildly guessing you're not using frontswap
> like zram.  If that's the case, would you try zram?  That should flush the page
> synchronously iiuc, then all the "suspicious 1s" above will go away.

Correct, I was not using frontswap.
> 
> To do that, you may need to first turn off your current swap:
> 
>        # swapoff -a
> 
> Then to configure zram you need:
> 
>        # modprobe zram
>        # echo 4G > /sys/block/zram0/disksize
>        # mkswap --label zram0 /dev/zram0
>        # swapon --priority 100 /dev/zram0
> 
> Quoting from here:
> 
>        https://urldefense.proofpoint.com/v2/url?u=https-3A__wiki.archlinux.org_title_Improving-5Fperformance-23zram-5For-5Fzswap&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=rRM5dtWOv0DNo5dDxZ2U16jl4WAw6ql5szOKa9cu_RA&m=XWzLVqSSl8CSEcw2x6sUmspJhiUJei2gq6GTiaky8hk&s=k3BDgO9LN63Nn3vxorlc41MlUYzOUN0efajz4lol-k8&e= 
> 
> Then you can try running the same test program again.

Thanks, it worked!

Hmmm, so if we put emphasis on the accuracy of swap info, or accuracy in
general, we need to use frontswap. Otherwise, mincore() could suffer from
race conditions, and mark pages in the swap cache as being present.
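
For reference, the mincore-vs-pagemap comparison from the quoted steps can be
sketched roughly as below. This is a minimal illustration, not the actual test
program used in this thread: the names classify() and compare_region() are
made up for the example, it uses a private anonymous mapping only, and it
assumes the pagemap bit layout from the kernel docs (bit 63 present, bit 62
swapped). The cgroup memory limit still has to be set up outside the script,
otherwise every page will simply show up as present.

```python
import ctypes
import mmap
import struct

PAGE = mmap.PAGESIZE
PM_PRESENT = 1 << 63  # pagemap bit 63: page present in RAM
PM_SWAPPED = 1 << 62  # pagemap bit 62: page swapped


def classify(mincore_resident, pagemap_entry):
    """Combine one mincore() bit and one pagemap entry into a label."""
    if pagemap_entry & PM_PRESENT:
        return "present"
    if pagemap_entry & PM_SWAPPED:
        # pagemap reports a swap entry, yet mincore() reports the page as
        # resident: the page is probably still sitting in the swap cache.
        return "swap-cache?" if mincore_resident else "swapped"
    return "none"


def compare_region(npages=1024):
    """Map and touch npages, then label every page (Linux only)."""
    length = npages * PAGE
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    for i in range(npages):
        buf[i * PAGE] = 1  # write to all pages in order
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

    # mincore(2) fills one byte per page; only the lowest bit is meaningful.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]
    vec = (ctypes.c_ubyte * npages)()
    if libc.mincore(addr, length, vec) != 0:
        raise OSError(ctypes.get_errno(), "mincore failed")

    # /proc/self/pagemap holds one 64-bit entry per virtual page.
    with open("/proc/self/pagemap", "rb", buffering=0) as f:
        f.seek((addr // PAGE) * 8)
        entries = struct.unpack(f"={npages}Q", f.read(npages * 8))

    return [classify(v & 1, e) for v, e in zip(vec, entries)]
```

Running compare_region() inside the memory-limited cgroup and counting the
"swap-cache?" labels should show whether the extra 1s from mincore() line up
with pages that pagemap already reports as swapped.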

Do you reckon this info (frontswap for mincore) should be present in
the pagemap docs? I wouldn't want to bloat the section either.

Kind regards,
Tibi

