linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>,
	akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
	christian.brauner@ubuntu.com, ebiederm@xmission.com,
	adobriyan@gmail.com, songmuchun@bytedance.com, axboe@kernel.dk,
	vincenzo.frascino@arm.com, catalin.marinas@arm.com,
	peterz@infradead.org, chinwen.chang@mediatek.com,
	linmiaohe@huawei.com, jannh@google.com, apopple@nvidia.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, ivan.teterevkov@nutanix.com,
	florian.schmidt@nutanix.com, carl.waldspurger@nutanix.com,
	jonathan.davies@nutanix.com
Subject: Re: [PATCH 0/1] pagemap: swap location for shared pages
Date: Wed, 11 Aug 2021 18:17:09 +0200	[thread overview]
Message-ID: <32fd63ef-3a8f-a037-28dc-a63dc11087a3@redhat.com> (raw)
In-Reply-To: <0beb1386-d670-aab1-6291-5c3cb0d661e0@redhat.com>

On 11.08.21 18:15, David Hildenbrand wrote:
> On 04.08.21 21:17, Peter Xu wrote:
>> On Wed, Aug 04, 2021 at 08:49:14PM +0200, David Hildenbrand wrote:
>>> TBH, I tend to really dislike the PTE marker idea. IMHO, we shouldn't store
>>> any state information regarding shared memory in per-process page tables: it
>>> just doesn't make too much sense.
>>>
>>> And this is similar to SOFTDIRTY or UFFD_WP bits: this information actually
>>> belongs to the shared file ("did *someone* write to this page", "is
>>> *someone* interested into changes to that page", "is there something"). I
>>> know, that screams for a completely different design in respect to these
>>> features.
>>>
>>> I guess we start learning the hard way that shared memory is just different
>>> and requires different interfaces than per-process page table interfaces we
>>> have (pagemap, userfaultfd).
>>>
>>> I didn't have time to explore any alternatives yet, but I wonder if tracking
>>> such stuff per an actual fd/memfd and not via process page tables is
>>> actually the right and clean approach. There are certainly many issues to
>>> solve, but conceptually to me it feels more natural to have these shared
>>> memory features not mangled into process page tables.
>>
>> Yes, we can explore all the possibilities, I'm totally fine with it.
>>
>> I just want to say I still don't think when there's page cache then we must put
>> all the page-relevant things into the page cache.
> 
> [sorry for the late reply]
> 
> Right, but for the case of shared, swapped out pages, the information is
> already there, in the page cache :)
> 
>>
>> They're shared by processes, but process can still have its own way to describe
>> the relationship to that page in the cache, to me it's as simple as "we allow
>> process A to write to page cache P", while "we don't allow process B to write
>> to the same page" like the write bit.
> 
> The issue I'm having uffd-wp as it was proposed for shared memory is
> that there is hardly a sane use case where we would *want* it to work
> that way.
> 
> A UFFD-WP flag in a page table for shared memory means "please notify
> once this process modifies the shared memory (via page tables, not via
> any other fd modification)". Do we have an example application where
> these semantics makes sense and don't over-complicate the whole
> approach? I don't know any, thus I'm asking dumb questions :)
> 
> 
> For background snapshots in QEMU the flow would currently be like this,
> assuming all processes have the shared guest memory mapped.
> 
> 1. Background snapshot preparation: QEMU requests all processes
>      to uffd-wp the range
> a) All processes register a uffd handler on guest RAM
> b) All processes fault in all guest memory (essentially populating all
>      memory): with a uffd-WP extensions we might be able to get rid of
>      that, I remember you were working on that.
> c) All processes uffd-WP the range to set the bit in their page table
> 
> 2. Background snapshot runs:
> a) A process either receives a UFFD-WP event and forwards it to QEMU or
>      QEMU polls all other processes for UFFD events.
> b) QEMU writes the to-be-changed page to the migration stream.
> c) QEMU triggers all processes to un-protect the page and wake up any
>      waiters. All processes clear the uffd-WP bit in their page tables.

Oh, and I forgot, whenever we save any page to the migration stream, we 
have to trigger all processes to un-protect.


-- 
Thanks,

David / dhildenb


  reply	other threads:[~2021-08-11 16:17 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 16:08 [PATCH 0/1] pagemap: swap location for shared pages Tiberiu A Georgescu
2021-07-30 16:08 ` [PATCH 1/1] pagemap: report " Tiberiu A Georgescu
2021-07-30 17:28 ` [PATCH 0/1] pagemap: " Eric W. Biederman
2021-08-02 12:20   ` Tiberiu Georgescu
2021-08-04 18:33 ` Peter Xu
2021-08-04 18:49   ` David Hildenbrand
2021-08-04 19:17     ` Peter Xu
2021-08-11 16:15       ` David Hildenbrand
2021-08-11 16:17         ` David Hildenbrand [this message]
2021-08-11 18:25         ` Peter Xu
2021-08-11 18:41           ` David Hildenbrand
2021-08-11 19:54             ` Peter Xu
2021-08-11 20:13               ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=32fd63ef-3a8f-a037-28dc-a63dc11087a3@redhat.com \
    --to=david@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=axboe@kernel.dk \
    --cc=carl.waldspurger@nutanix.com \
    --cc=catalin.marinas@arm.com \
    --cc=chinwen.chang@mediatek.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=ebiederm@xmission.com \
    --cc=florian.schmidt@nutanix.com \
    --cc=ivan.teterevkov@nutanix.com \
    --cc=jannh@google.com \
    --cc=jonathan.davies@nutanix.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=songmuchun@bytedance.com \
    --cc=tiberiu.georgescu@nutanix.com \
    --cc=vincenzo.frascino@arm.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).