From: David Hildenbrand
Organization: Red Hat
To: Peter Xu
Cc: Tiberiu A Georgescu, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
 christian.brauner@ubuntu.com, ebiederm@xmission.com, adobriyan@gmail.com,
 songmuchun@bytedance.com, axboe@kernel.dk, vincenzo.frascino@arm.com,
 catalin.marinas@arm.com, peterz@infradead.org, chinwen.chang@mediatek.com,
 linmiaohe@huawei.com, jannh@google.com, apopple@nvidia.com,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, ivan.teterevkov@nutanix.com, florian.schmidt@nutanix.com,
 carl.waldspurger@nutanix.com, jonathan.davies@nutanix.com
Subject: Re: [PATCH 0/1] pagemap: swap location for shared pages
Date: Wed, 11 Aug 2021 18:15:37 +0200
Message-ID: <0beb1386-d670-aab1-6291-5c3cb0d661e0@redhat.com>
References: <20210730160826.63785-1-tiberiu.georgescu@nutanix.com>
 <839e82f7-2c54-d1ef-8371-0a332a4cb447@redhat.com>

On 04.08.21 21:17, Peter Xu wrote:
> On Wed, Aug 04, 2021 at 08:49:14PM +0200, David Hildenbrand wrote:
>> TBH, I tend to really dislike the PTE marker idea. IMHO, we shouldn't store
>> any state information regarding shared memory in per-process page tables: it
>> just doesn't make too much sense.
>>
>> And this is similar to SOFTDIRTY or UFFD_WP bits: this information actually
>> belongs to the shared file ("did *someone* write to this page", "is
>> *someone* interested into changes to that page", "is there something"). I
>> know, that screams for a completely different design in respect to these
>> features.
>>
>> I guess we start learning the hard way that shared memory is just different
>> and requires different interfaces than per-process page table interfaces we
>> have (pagemap, userfaultfd).
>>
>> I didn't have time to explore any alternatives yet, but I wonder if tracking
>> such stuff per an actual fd/memfd and not via process page tables is
>> actually the right and clean approach. There are certainly many issues to
>> solve, but conceptually to me it feels more natural to have these shared
>> memory features not mangled into process page tables.
>
> Yes, we can explore all the possibilities, I'm totally fine with it.
>
> I just want to say I still don't think when there's page cache then we must put
> all the page-relevant things into the page cache.

[sorry for the late reply]

Right, but for the case of shared, swapped out pages, the information is
already there, in the page cache :)

>
> They're shared by processes, but process can still have its own way to describe
> the relationship to that page in the cache, to me it's as simple as "we allow
> process A to write to page cache P", while "we don't allow process B to write
> to the same page" like the write bit.
The issue I'm having with uffd-wp as it was proposed for shared memory is
that there is hardly a sane use case where we would *want* it to work
that way.

A UFFD-WP flag in a page table for shared memory means "please notify
once this process modifies the shared memory (via page tables, not via
any other fd modification)". Do we have an example application where
these semantics make sense and don't over-complicate the whole
approach? I don't know any, thus I'm asking dumb questions :)


For background snapshots in QEMU the flow would currently be like this,
assuming all processes have the shared guest memory mapped.

1. Background snapshot preparation: QEMU requests all processes
   to uffd-wp the range
  a) All processes register a uffd handler on guest RAM
  b) All processes fault in all guest memory (essentially populating all
     memory): with a uffd-WP extension we might be able to get rid of
     that, I remember you were working on that.
  c) All processes uffd-WP the range to set the bit in their page tables

2. Background snapshot runs:
  a) A process either receives a UFFD-WP event and forwards it to QEMU or
     QEMU polls all other processes for UFFD events.
  b) QEMU writes the to-be-changed page to the migration stream.
  c) QEMU triggers all processes to un-protect the page and wake up any
     waiters. All processes clear the uffd-WP bit in their page tables.

3. Background snapshot completes:
  a) All processes unregister the uffd handler


Now imagine something like this:

1. Background snapshot preparation:
  a) QEMU registers a UFFD-WP handler on a *memfd file* that corresponds
     to guest memory.
  b) QEMU uffd-wp's the whole file

2. Background snapshot runs:
  a) QEMU receives a UFFD-WP event.
  b) QEMU writes the to-be-changed page to the migration stream.
  c) QEMU un-protects the page and wakes up any waiters.

3. Background snapshot completes:
  a) QEMU unregisters the uffd handler


Wouldn't that be much nicer and much easier to handle?
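To make the difference between the two flows concrete, here is a toy model in C. It is not kernel code and not a proposed interface: every name in it (`pt_design`, `file_design`, `pt_write()`, `file_write()`, `run_scenario()`) and the idea of counting "requests" are hypothetical, made up only for illustration. NPROCS processes map one shared file of NPAGES pages. Design A keeps a write-protect bit in each process's page table, as in the first flow; design B keeps one bit per file page, as in the second. A write through the file descriptor (think pwrite() on the memfd) never touches a page table, so design A cannot report it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: arbitrary sizes, not kernel data structures. */
#define NPROCS 3 /* processes that map the shared guest memory */
#define NPAGES 8 /* pages in the shared memfd */

enum write_kind { WRITE_VIA_MAPPING, WRITE_VIA_FD };

struct snap_stats {
    int requests; /* protect/unprotect/unregister requests QEMU must issue */
    int events;   /* write-protect events QEMU receives */
    int missed;   /* writes that changed a not-yet-saved page with no event */
};

/* Design A: a write-protect bit in every process's page table. */
struct pt_design {
    bool wp[NPROCS][NPAGES];
    bool saved[NPAGES];
    struct snap_stats st;
};

/* Design B (hypothetical): one write-protect bit per page of the file. */
struct file_design {
    bool wp[NPAGES];
    bool saved[NPAGES];
    struct snap_stats st;
};

static void pt_protect_all(struct pt_design *d)
{
    for (int p = 0; p < NPROCS; p++) { /* step 1: one request per process */
        for (int i = 0; i < NPAGES; i++)
            d->wp[p][i] = true;
        d->st.requests++;
    }
}

/* proc is ignored (and never used as an index) for WRITE_VIA_FD. */
static void pt_write(struct pt_design *d, int proc, int page,
                     enum write_kind kind)
{
    if (kind == WRITE_VIA_MAPPING && d->wp[proc][page]) {
        d->st.events++;                    /* 2a: event reaches QEMU */
        d->saved[page] = true;             /* 2b: old content is saved */
        for (int p = 0; p < NPROCS; p++) { /* 2c: every process unprotects */
            d->wp[p][page] = false;
            d->st.requests++;
        }
    } else if (kind == WRITE_VIA_FD && !d->saved[page]) {
        d->st.missed++; /* no page table involved, so nobody notices */
    }
}

static void pt_finish(struct pt_design *d)
{
    d->st.requests += NPROCS; /* step 3: every process unregisters */
}

static void file_protect_all(struct file_design *d)
{
    for (int i = 0; i < NPAGES; i++)
        d->wp[i] = true;
    d->st.requests++; /* step 1: a single request on the file */
}

/* The file sees every modification, through a mapping or through the fd. */
static void file_write(struct file_design *d, int page)
{
    if (d->wp[page]) {
        d->st.events++;        /* 2a */
        d->saved[page] = true; /* 2b */
        d->wp[page] = false;   /* 2c: a single unprotect request */
        d->st.requests++;
    }
}

static void file_finish(struct file_design *d)
{
    d->st.requests++; /* step 3: a single unregister */
}

/* Replays the same writes against both designs. */
static void run_scenario(struct snap_stats *a, struct snap_stats *b)
{
    struct pt_design pt = {0};
    struct file_design f = {0};

    pt_protect_all(&pt);
    file_protect_all(&f);

    pt_write(&pt, 0, 1, WRITE_VIA_MAPPING); /* process 0 writes page 1 */
    file_write(&f, 1);
    pt_write(&pt, 2, 1, WRITE_VIA_MAPPING); /* process 2 writes page 1 again */
    file_write(&f, 1);
    pt_write(&pt, -1, 3, WRITE_VIA_FD);     /* e.g. pwrite() to page 3 */
    file_write(&f, 3);

    pt_finish(&pt);
    file_finish(&f);
    *a = pt.st;
    *b = f.st;
}
```

As written, run_scenario() leaves design A with 9 requests, 1 event and 1 missed write (the fd write to page 3), and design B with 4 requests, 2 events and no missed write. The absolute numbers mean nothing; the model only illustrates that A's coordination cost grows with NPROCS and that A is blind to modifications that bypass the page tables.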
Yes, it is much harder to implement because such an infrastructure does
not exist yet, and it most probably wouldn't be called uffd anymore,
because we are dealing with file access. But this way, it would actually
be super easy to use the feature across multiple processes and
eventually to even catch other file modifications.


Again, I am not sure if uffd-wp or softdirty make too much sense in
general when applied to shmem. But I'm happy to learn more.

-- 
Thanks,

David / dhildenb