Subject: Re: [PATCH 0/1] pagemap: swap location for shared pages
From: David Hildenbrand
To: Peter Xu
Cc: Tiberiu A Georgescu, akpm@linux-foundation.org,
 viro@zeniv.linux.org.uk, christian.brauner@ubuntu.com,
 ebiederm@xmission.com, adobriyan@gmail.com, songmuchun@bytedance.com,
 axboe@kernel.dk, vincenzo.frascino@arm.com, catalin.marinas@arm.com,
 peterz@infradead.org, chinwen.chang@mediatek.com, linmiaohe@huawei.com,
 jannh@google.com, apopple@nvidia.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 ivan.teterevkov@nutanix.com, florian.schmidt@nutanix.com,
 carl.waldspurger@nutanix.com, jonathan.davies@nutanix.com
References: <20210730160826.63785-1-tiberiu.georgescu@nutanix.com>
 <839e82f7-2c54-d1ef-8371-0a332a4cb447@redhat.com>
 <0beb1386-d670-aab1-6291-5c3cb0d661e0@redhat.com>
Organization: Red Hat
Message-ID: <32fd63ef-3a8f-a037-28dc-a63dc11087a3@redhat.com>
Date: Wed, 11 Aug 2021 18:17:09 +0200
In-Reply-To: <0beb1386-d670-aab1-6291-5c3cb0d661e0@redhat.com>

On 11.08.21 18:15, David Hildenbrand wrote:
> On 04.08.21 21:17, Peter Xu wrote:
>> On Wed, Aug 04, 2021 at 08:49:14PM +0200, David Hildenbrand wrote:
>>> TBH, I tend to really dislike the PTE marker idea. IMHO, we shouldn't store
>>> any state information regarding shared memory in per-process page tables: it
>>> just doesn't make too much sense.
>>>
>>> And this is similar to SOFTDIRTY or UFFD_WP bits: this information actually
>>> belongs to the shared file ("did *someone* write to this page", "is
>>> *someone* interested in changes to that page", "is there something"). I
>>> know, that screams for a completely different design in respect to these
>>> features.
>>>
>>> I guess we start learning the hard way that shared memory is just different
>>> and requires different interfaces than per-process page table interfaces we
>>> have (pagemap, userfaultfd).
>>>
>>> I didn't have time to explore any alternatives yet, but I wonder if tracking
>>> such stuff per an actual fd/memfd and not via process page tables is
>>> actually the right and clean approach. There are certainly many issues to
>>> solve, but conceptually to me it feels more natural to have these shared
>>> memory features not mangled into process page tables.
>>
>> Yes, we can explore all the possibilities, I'm totally fine with it.
>>
>> I just want to say I still don't think when there's page cache then we must put
>> all the page-relevant things into the page cache.
>
> [sorry for the late reply]
>
> Right, but for the case of shared, swapped out pages, the information is
> already there, in the page cache :)
>
>>
>> They're shared by processes, but process can still have its own way to describe
>> the relationship to that page in the cache, to me it's as simple as "we allow
>> process A to write to page cache P", while "we don't allow process B to write
>> to the same page" like the write bit.
>
> The issue I'm having with uffd-wp as it was proposed for shared memory is
> that there is hardly a sane use case where we would *want* it to work
> that way.
>
> A UFFD-WP flag in a page table for shared memory means "please notify
> once this process modifies the shared memory (via page tables, not via
> any other fd modification)". Do we have an example application where
> these semantics make sense and don't over-complicate the whole
> approach? I don't know any, thus I'm asking dumb questions :)
>
>
> For background snapshots in QEMU the flow would currently be like this,
> assuming all processes have the shared guest memory mapped.
>
> 1. Background snapshot preparation: QEMU requests all processes
>     to uffd-wp the range
> a) All processes register a uffd handler on guest RAM
> b) All processes fault in all guest memory (essentially populating all
>     memory): with a uffd-WP extension we might be able to get rid of
>     that, I remember you were working on that.
> c) All processes uffd-WP the range to set the bit in their page table
>
> 2. Background snapshot runs:
> a) A process either receives a UFFD-WP event and forwards it to QEMU or
>     QEMU polls all other processes for UFFD events.
> b) QEMU writes the to-be-changed page to the migration stream.
> c) QEMU triggers all processes to un-protect the page and wake up any
>     waiters. All processes clear the uffd-WP bit in their page tables.

Oh, and I forgot: whenever we save any page to the migration stream, we
have to trigger all processes to un-protect it.

-- 
Thanks,

David / dhildenb
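
For illustration only, the bookkeeping in the flow above can be sketched
as a toy model. This is not QEMU or kernel code: the names Process,
Snapshot, save_page and unprotect_requests are hypothetical, and the
model only counts how many un-protect requests are needed when the
uffd-wp bit lives in each process's own page table.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for one process mapping the shared guest RAM."""
    name: str
    # Pages whose uffd-wp bit is set in *this* process's page table.
    wp_pages: set = field(default_factory=set)

    def write_protect(self, pages):
        self.wp_pages |= set(pages)

    def unprotect(self, page):
        self.wp_pages.discard(page)

    def would_fault(self, page):
        return page in self.wp_pages


class Snapshot:
    """Toy model of the coordinator role that QEMU plays in the flow."""

    def __init__(self, processes, num_pages):
        self.processes = processes
        self.num_pages = num_pages
        self.saved = []               # pages written to the "migration stream"
        self.unprotect_requests = 0   # one per (process, saved page)

    def prepare(self):
        # Step 1c: every process sets the bit in its own page table.
        for proc in self.processes:
            proc.write_protect(range(self.num_pages))

    def save_page(self, page):
        # Steps 2b and 2c. Every saved page, faulted or not, needs a
        # broadcast so that all processes clear their bit for it.
        if page in self.saved:
            return
        self.saved.append(page)
        for proc in self.processes:
            proc.unprotect(page)
            self.unprotect_requests += 1

    def handle_write(self, writer, page):
        # Step 2a: a write to a protected page is reported to the coordinator.
        if writer.would_fault(page):
            self.save_page(page)

    def run_background_copy(self):
        # Pages nobody touched are saved in order by the background thread.
        for page in range(self.num_pages):
            self.save_page(page)


procs = [Process("qemu"), Process("vhost-user-a"), Process("vhost-user-b")]
snap = Snapshot(procs, num_pages=4)
snap.prepare()
snap.handle_write(procs[1], 2)   # one process writes page 2 first
snap.run_background_copy()
print(snap.saved, snap.unprotect_requests)
```

In this model the number of un-protect requests is always the number of
pages times the number of processes, regardless of how many pages were
actually written during the snapshot. Tracking the state once per
fd/memfd instead of per page table would remove the per-process factor.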