From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8959EC43461 for ; Tue, 13 Apr 2021 01:22:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6D34061249 for ; Tue, 13 Apr 2021 01:22:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242673AbhDMBWS (ORCPT ); Mon, 12 Apr 2021 21:22:18 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:51629 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241031AbhDMBWS (ORCPT ); Mon, 12 Apr 2021 21:22:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1618276919; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=NEBjah6KfnPorZE8w92CEcFAPh+QAGov51+i+ByIFoM=; b=Cz1P3TT46JJdZqreKS9C1BcSFp1FdVgkDteWZSR2GXifkP624MkGCegkavEev0jqv3bSOO TtWajD2S2G4QWLyIx7Qs/Ir/b0C8DjK57/uMuLHGKGuhNgV163Oy/dF5BwXp6gKrc/q7NY DwMwSVGxMNrueM0edf9G1FZ8GweRrTI= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-528-geSEuBrRMgKmhTPgaBKKUA-1; Mon, 12 Apr 2021 21:21:57 -0400 X-MC-Unique: geSEuBrRMgKmhTPgaBKKUA-1 Received: by mail-qt1-f197.google.com with SMTP id b14so104572qtp.0 for ; Mon, 12 Apr 2021 18:21:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=NEBjah6KfnPorZE8w92CEcFAPh+QAGov51+i+ByIFoM=; b=dzg2bxEPzBcESi1fd9ZEDoRyTBrRihS2s16d2ummzZFaFUpKXGJWDmAwLnZLmiS2np TrfIw2rz4QsWXx0GPSGGn+/O7xKQYlFD4C9t6CRk4MdHFnCQy9911nZ18XihEUQs4l4M 7mpRzvOZqrJYVu7RS8x4SSAAUn/u+t6Sp67qvVo65PR7zljC0j7GXKevPK8SdVDz3Iv3 l24I/tcd/3oDFr6vYWMCcCglfs4Dpea2sbozdrvLv/7E3nmJgnX14iU2oZ7ojQrknRC9 7R6wEnMFskYwcDnid43glMCVLkBMAfaZc0mnMnIl3V8R5OHE9sKteB5hCL+Hzb9Acp+Y fr6w== X-Gm-Message-State: AOAM530Tq+r+c8jQCzZqWB7ftmloyskxy4PEykFKd25P01X+PnLNjln3 eaIXOnIXLtDzELtw9gmXLS3BqsK6AhYlRCX25tcZkF2ViLfjIkuqLDM52P4jwm4HCPC86EunqNv 7Y3GgZPNc6S2DqweoEHqwkmU0DnV9 X-Received: by 2002:ac8:5996:: with SMTP id e22mr29052764qte.64.1618276916971; Mon, 12 Apr 2021 18:21:56 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy+67rQx0v+TacpKguZsqrrTReCrJzBFDPqLkJ0oEZn8JA/kzQvGaBMhEOyxp/AVVcMzR5Q6Q== X-Received: by 2002:ac8:5996:: with SMTP id e22mr29052737qte.64.1618276916674; Mon, 12 Apr 2021 18:21:56 -0700 (PDT) Received: from xz-x1 (bras-base-toroon474qw-grc-88-174-93-75-154.dsl.bell.ca. [174.93.75.154]) by smtp.gmail.com with ESMTPSA id t4sm2659038qkg.75.2021.04.12.18.21.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Apr 2021 18:21:55 -0700 (PDT) Date: Mon, 12 Apr 2021 21:21:52 -0400 From: Peter Xu To: Hugh Dickins Cc: Axel Rasmussen , Alexander Viro , Andrea Arcangeli , Andrew Morton , Jerome Glisse , Joe Perches , Lokesh Gidra , Mike Rapoport , Shaohua Li , Shuah Khan , Stephen Rothwell , Wang Qing , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Brian Geffon , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Michel Lespinasse , Mina Almasry , Oliver Upton Subject: Re: [PATCH v4] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior Message-ID: <20210413012152.GE1002612@xz-x1> References: <20210401183701.1774159-1-axelrasmussen@google.com> <20210412215437.GA1001332@xz-x1> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org On Mon, Apr 12, 2021 at 05:51:14PM -0700, Hugh Dickins wrote: > On Mon, 12 Apr 2021, Peter Xu wrote: > > On Tue, Apr 06, 2021 at 11:14:30PM -0700, Hugh Dickins wrote: > > > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd, > > > > + struct vm_area_struct *dst_vma, > > > > + unsigned long dst_addr, struct page *page, > > > > + enum mcopy_atomic_mode mode, bool wp_copy) > > > > +{ > > > > [...] > > > > > > + if (writable) { > > > > + _dst_pte = pte_mkdirty(_dst_pte); > > > > + if (wp_copy) > > > > + _dst_pte = pte_mkuffd_wp(_dst_pte); > > > > + else > > > > + _dst_pte = pte_mkwrite(_dst_pte); > > > > + } else if (vm_shared) { > > > > + /* > > > > + * Since we didn't pte_mkdirty(), mark the page dirty or it > > > > + * could be freed from under us. We could do this > > > > + * unconditionally, but doing it only if !writable is faster. > > > > + */ > > > > + set_page_dirty(page); > > > > > > I do not remember why Andrea or I preferred set_page_dirty() here to > > > pte_mkdirty(); but I suppose there might somewhere be a BUG_ON(pte_dirty) > > > which this would avoid. Risky to change it, though it does look odd. > > > > Is any of the possible BUG_ON(pte_dirty) going to trigger because the pte has > > write bit cleared? That's one question I was not very sure, e.g., whether one > > pte is allowed to be "dirty" if it's not writable. > > > > To me it's okay, it's actually very suitable for UFFDIO_COPY case, where it is > > definitely dirty data (so we must never drop it) even if it's installed as RO, > > however to achieve that we can still set the dirty on the page rather than the > > pte as what we do here. It's just a bit awkward as you said. > > > > Meanwhile today I just noticed this in arm64 code: > > > > static inline pte_t pte_wrprotect(pte_t pte) > > { > > /* > > * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY > > * clear), set the PTE_DIRTY bit. > > */ > > if (pte_hw_dirty(pte)) > > pte = pte_mkdirty(pte); > > > > pte = clear_pte_bit(pte, __pgprot(PTE_WRITE)); > > pte = set_pte_bit(pte, __pgprot(PTE_RDONLY)); > > return pte; > > } > > > > So arm64 will explicitly set the dirty bit (from the HW dirty bit) when > > wr-protect. It seems to prove that at least for arm64 it's very valid to have > > !write && dirty pte. > > I did not mean to imply that it's wrong to have pte_dirty without > pte_write: no, I agree with you, I believe that there are accepted > and generic ways in which we can have pte_dirty without pte_write > (and we could each probably add a warning somewhere which would > very quickly prove that - but those would not prove that there > are not BUG_ONs on some other path, which had been my fear). > > I wanted now to demonstrate that by pointing to change_pte_range() in > mm/mprotect.c, showing that it does not clear pte_dirty when it clears > pte_write. But alarmingly found rather the reverse: that it appears to > set pte_write when it finds pte_dirty - if dirty_accountable. > > That looks very wrong, but if I spent long enough following up > dirty_accountable in detail, I think I would be reassured to find that > it is only adding the pte_write there when it had removed it from the > prot passed down, for dirty accounting reasons (which apply !VM_SHARED > protections in the VM_SHARED case, so that page_mkwrite() is called > and dirty accounting done when necessary). > > What I did mean to imply is that changing set_page_dirty to pte_mkdirty, > to make that userfaultfd code block look nicer, is not a change to be > done lightly: by all means try it out, test it, and send a patch after > Axel's series is in, but please do not ask Axel to make that change as > a part of his series - it would be taking a risk, just for a cleanup. Fair enough. Sorry if I wasn't clear when asking, the reason to ask was that I wanted to be clear on these differences. For example, in my uffd-wp shmem series I'll have to make sure dirty bit always there, I used pte_mkdirty() unconditionally but I wanted to make sure I'm not overlooking anything.. Though this case is slightly special here, since if without the cleaning up the logic will definitely be harder to review (with the cleanup, it'll be as simple as one unconditional pte_mkdirty() and that's all). However you're definitely right, it's not a reason to overload Axel with this patchset, especially if such a cleanup is seen to be risky. > > Now, I have also looked up the mail exchange with Andrea which led to > his dcf7fe9d8976 ("userfaultfd: shmem: UFFDIO_COPY: set the page dirty > if VM_WRITE is not set") - it had to be off-list at the time. And he > was rather led to that set_page_dirty by following old patterns left > over in shmem_getpage_gfp(); but when I said "or it could be done with > pte_mkdirty without pte_mkwrite", he answered "I explicitly avoided > that because pte_dirty then has side effects on mprotect to decide > pte_write. It looks safer to do set_page_dirty and not set dirty bits > in not writable ptes unnecessarily". > > Haha: I think Andrea is referring to exactly the dirty_accountable code > in change_pte_protection() which worried me above. Now, I think that > will turn out okay (shmem does not have a page_mkwrite(), and does not > participate in dirty accounting), but you will have to do some work to > assure us all of that, before sending in a cleanup patch. OK. Maybe I can clean this after Axel's work. Thanks a lot for all these details, Hugh! -- Peter Xu