All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
To: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Alistair Popple <apopple@nvidia.com>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	Jonathan Davies <jond@nutanix.com>
Subject: Re: [PATCH RFC 0/4] mm: Enable PM_SWAP for shmem with PTE_MARKER
Date: Thu, 19 Aug 2021 14:54:18 +0000	[thread overview]
Message-ID: <0A4C4E37-88C9-4490-9D8B-6990D805F447@nutanix.com> (raw)
In-Reply-To: <6ab58270-c487-2a56-b522-ea5100edb13c@redhat.com>


> On 18 Aug 2021, at 19:13, David Hildenbrand <david@redhat.com> wrote:
> 
>>> 
>>>> I'm now wondering whether for Tiberiu's case mincore() can also be used.  It
>>>> should just still be a bit slow because it'll look up the cache too, but it
>>>> should work similarly like the original proposal.
>> I am afraid that the information returned by mincore is a little too vague to be of better help, compared to what the pagemap should provide in theory. I will have a look to see whether lseek on
>> proc/map_files works as a "PM_SWAP" equivalent. However, the swap offset would still be missing.
> 
> Well, with mincore() you could at least decide "page is present" vs. "page is swapped or not existent". At least for making pageout decisions it shouldn't really matter, no? madvise(MADV_PAGEOUT) on a hole is a nop.

I think you are right. In the optimisation we first presented, we should be able to
send the madvise(MADV_PAGEOUT) call even if the page is none quite safely
and get the wanted behaviour. Also, the "is_present" or "is_swap_or_none"
question can be answered by the current pagemap too. Nice catch.

However, not all use cases are the same. AFAIK, there is still no way to figure
out whether a shared page is swapped out or none unless it is directly
read/accessed after a pagemap check. Bringing a page into memory to check
if it previously was in swap does not seem ideal.

Also, we still have no mechanism to retrieve the swap offsets of shmem pages
AFAIK. There is one more QEMU optimisation we are working on that requires
these mappings available outside of kernel space.
> 
> But I'm not 100% sure what exactly your use case is here and what you would really need, so you know best :)

Maybe, but I am always open to (m)advise :)
> 
>>> 
>>> Very right, maybe we can just avoid tampering with pagemap on shmem completely (which sounds like an excellent idea to me) and document it as "On shared memory, we will never indicate SWAPPED if the pages have been swapped out. Further, PRESENT might be under-indicated: if a shared page is currently not mapped into the page table of a process.". I saw there was a related, proposed doc update, maybe we can finetune that.
>>> 
>> We could take into consideration an alternative approach to retrieving the shared page info in user
>> space, like storing it in sys/fs instead of per process. However, just leaving the pagemap functionality
>> incomplete, and not providing an alternative to retrieve the missing information, does not seem right. Updating the docs with a "can't do" should be temporary, until an alternative or fix.
> 
> As I stated before, making pagemap less broken is not a good idea IMHO. Either make it really correct or just leave it all broken -- and document that e.g., other interfaces (lseek) shall be used. It sounds like they exist and are good enough for CRUI.
> 
> And TBH, if other interfaces already exist and get the job done, I'm more than happy that we can avoid mixing more shmem stuff into pagemap and trying to compensate performance problems by introducing inconsistency.
> 
> If it has an fd and we can punch that into syscalls, we should much rather use that fd to lookup stuff then going via process page tables -- if possible of course (to be evaluated, because I haven't looked into the CRIU details and how they use lseek with anonymous shared memory).

I found out that it is possible to retrieve the fds of shmem/tmpfs file allocations
using proc/pid/map_files, which is neat. Still, CRIU does not seem to care
whether a page is swapped out or just empty, only if it is present on page cache.
The holes that lseek finds would not be able to infer this difference, AFAIK. Will
test the behaviour to make sure.
> 
>> Also, I think you are talking about my own doc update patch[3]. If not, please share a link with your
>> next reply.
>> [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fm-3D162878395426774&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=rRM5dtWOv0DNo5dDxZ2U16jl4WAw6ql5szOKa9cu_RA&m=T9yjhM3vhL_Ip2wxg2x-BZclbbY3WAO5Oc-y7IqNs7Y&s=HujDmGVIi1iXQ22oWF_GE-sPxvQ2ORTcCWEfnpqq35o&e= 
> 
> No, that's it.
> 
Great, I'll head right there.

--
Kind regards,

Tibi

  reply	other threads:[~2021-08-19 14:54 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-07  3:25 [PATCH RFC 0/4] mm: Enable PM_SWAP for shmem with PTE_MARKER Peter Xu
2021-08-07  3:25 ` [PATCH RFC 1/4] mm: Introduce PTE_MARKER swap entry Peter Xu
2021-08-07  3:25 ` [PATCH RFC 2/4] mm: Check against orig_pte for finish_fault() Peter Xu
2021-08-07  3:25 ` [PATCH RFC 3/4] mm: Handle PTE_MARKER page faults Peter Xu
2021-08-07  3:25 ` [PATCH RFC 4/4] mm: Install marker pte when page out for shmem pages Peter Xu
2021-08-13 15:18   ` Tiberiu Georgescu
2021-08-13 16:01     ` Peter Xu
2021-08-18 18:02       ` Tiberiu Georgescu
2021-08-17  9:04 ` [PATCH RFC 0/4] mm: Enable PM_SWAP for shmem with PTE_MARKER David Hildenbrand
2021-08-17 17:09   ` Peter Xu
2021-08-17 18:46     ` David Hildenbrand
2021-08-17 20:24       ` Peter Xu
2021-08-18  8:24         ` David Hildenbrand
2021-08-18 17:52           ` Tiberiu Georgescu
2021-08-18 18:13             ` David Hildenbrand
2021-08-19 14:54               ` Tiberiu Georgescu [this message]
2021-08-19 17:26                 ` David Hildenbrand
2021-08-20 16:49                   ` Tiberiu Georgescu
2021-08-20 19:12                     ` Peter Xu
2021-08-25 13:40                       ` Tiberiu Georgescu
2021-08-25 14:59                         ` Peter Xu
2021-08-26 15:01                           ` Tiberiu Georgescu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0A4C4E37-88C9-4490-9D8B-6990D805F447@nutanix.com \
    --to=tiberiu.georgescu@nutanix.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=carl.waldspurger@nutanix.com \
    --cc=david@redhat.com \
    --cc=flosch@nutanix.com \
    --cc=hughd@google.com \
    --cc=ivan.teterevkov@nutanix.com \
    --cc=jond@nutanix.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.