From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C7B0C433EF for ; Wed, 20 Jul 2022 17:25:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240195AbiGTRZy (ORCPT ); Wed, 20 Jul 2022 13:25:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51936 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232362AbiGTRZu (ORCPT ); Wed, 20 Jul 2022 13:25:50 -0400 Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C781B25C75 for ; Wed, 20 Jul 2022 10:25:49 -0700 (PDT) Received: by mail-pl1-x635.google.com with SMTP id z3so3222656plb.1 for ; Wed, 20 Jul 2022 10:25:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=KVy1jSlav0FOqpeAQ11Wb/ZstcAqe1KYzN+PWSIXtU8=; b=CarY0uRd9wdPwbNAzGtbvccyFjlu6IcKFsrNhmsAsz4IKmj1HvPTZGXBuM449JvuxH f95ulwixlmjvYZgka2dXgVU2vH/1Dkszh2AYpltxXqeRLSU5B4ynoJE0H/FZzpj0iRPG kjrDt8Bk1UfKUtb7s5LXPvjJBC5CBCktCdr8Lpud12PhX5S5FAm2/+TQ4SIQbzH11Wi4 RGMfBSz4MDDsmEk+FVRP9wXZR66ygynfm0VIEgCsRG+A+pPK3wgWplGE50TFgi92UESE Lya8B38DMjF8nkwZbEgwoYhdRyz15X5pMaLK/S+sUnwKevrwfwwMrdW1Aw/rPZkQ/uvh Sgkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=KVy1jSlav0FOqpeAQ11Wb/ZstcAqe1KYzN+PWSIXtU8=; b=LMZoq/+CuQuY0QJdekLAXCr9wxeU4YkiRI1kh1xIVopcScUpGh0Hi0pJubqIlR1reD kmZirMw+vp8XBwjR2WALE62imre/NQW90s11AqN1FoTHb/5PRDElS/f2iKTwB2z1H5nb 36tS1l2Qbf/wlAzqi0sBwnWKnzGQf1SqKoBSdLq96ED3mFqzD3vO/kYrOzDFAhf4lAoz Ekgx0sQVeWdKwlYUy/K+5yY9WwP4NvYUK4oZJqKSCX0wfrE1qCXr9EA7MtXm0lcr4Hdd G5wN0cyvU77H7b/oFhbPLdOGaqp+pFEDkZeGmutPAAx6Udlo5MZjfB1OFReutVTKOSaZ cZzw== X-Gm-Message-State: AJIora8RCf++8Q0SmRY0iAMvwtXwk523+BO4vfOphXJOIV8bQS098aiB r4S+i+NKseNT2zW4VinpMww= X-Google-Smtp-Source: AGRyM1vuNFYUnTL4pLfVPY+7Q3tvAgLv/p7Egq2AmXWYJ3xqpA/0xROIZUHeJyVuqaYPK3GDdptMwQ== X-Received: by 2002:a17:903:1ca:b0:16c:4e2f:9294 with SMTP id e10-20020a17090301ca00b0016c4e2f9294mr40602508plh.30.1658337948964; Wed, 20 Jul 2022 10:25:48 -0700 (PDT) Received: from smtpclient.apple ([66.170.99.113]) by smtp.gmail.com with ESMTPSA id z26-20020aa79e5a000000b00517c84fd24asm14211873pfq.172.2022.07.20.10.25.47 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 20 Jul 2022 10:25:48 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.100.31\)) Subject: Re: [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be writable From: Nadav Amit In-Reply-To: <23a9d678-487e-5940-4cde-dc53d920fb48@redhat.com> Date: Wed, 20 Jul 2022 10:25:46 -0700 Cc: Linux MM , LKML , Andrew Morton , Mike Rapoport , Axel Rasmussen , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin Content-Transfer-Encoding: quoted-printable Message-Id: References: <20220718120212.3180-1-namit@vmware.com> <20220718120212.3180-4-namit@vmware.com> <23a9d678-487e-5940-4cde-dc53d920fb48@redhat.com> To: David Hildenbrand X-Mailer: Apple Mail (2.3696.100.31) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Jul 20, 2022, at 8:19 AM, David Hildenbrand wrote: > On 18.07.22 14:02, Nadav Amit wrote: >> From: Nadav Amit >>=20 >> Anonymous pages might have the dirty bit clear, but this should not >> prevent mprotect from making them writable if they are exclusive. >> Therefore, skip the test whether the page is dirty in this case. >>=20 >> Cc: Andrea Arcangeli >> Cc: Andrew Cooper >> Cc: Andrew Morton >> Cc: Andy Lutomirski >> Cc: Dave Hansen >> Cc: David Hildenbrand >> Cc: Peter Xu >> Cc: Peter Zijlstra >> Cc: Thomas Gleixner >> Cc: Will Deacon >> Cc: Yu Zhao >> Cc: Nick Piggin >> Signed-off-by: Nadav Amit >> --- >> mm/mprotect.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >>=20 >> diff --git a/mm/mprotect.c b/mm/mprotect.c >> index 34c2dfb68c42..da5b9bf8204f 100644 >> --- a/mm/mprotect.c >> +++ b/mm/mprotect.c >> @@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct = vm_area_struct *vma, >>=20 >> VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte)); >>=20 >> - if (pte_protnone(pte) || !pte_dirty(pte)) >> + if (pte_protnone(pte)) >> return false; >>=20 >> /* Do we need write faults for softdirty tracking? */ >> @@ -66,7 +66,8 @@ static inline bool can_change_pte_writable(struct = vm_area_struct *vma, >> page =3D vm_normal_page(vma, addr, pte); >> if (!page || !PageAnon(page) || = !PageAnonExclusive(page)) >> return false; >> - } >> + } else if (!pte_dirty(pte)) >> + return false; >>=20 >> return true; >> } >=20 > When I wrote that code, I was wondering how often that would actually > happen in practice -- and if we care about optimizing that. Do you = have > a gut feeling in which scenarios this would happen and if we care? >=20 > If the page is in the swapcache and was swapped out, you'd be = requiring > a writeback even though nobody modified the page and possibly isn't > going to do so in the near future. So here is my due diligence: I did not really encounter a scenario in = which it showed up. When I looked at your code, I assumed this was an = oversight and not a thoughtful decision. For me the issue is more of the = discrepancy between how a certain page is handled before and after it was pages out. The way that I see it, there is a tradeoff in the way dirty-bit should be handled: (1) Writable-clean PTEs introduce some non-negligible overhead. (2) Marking a PTE dirty speculatively would require a write back. =E2=80=A6 But this tradeoff should not affect whether a PTE is writable, = i.e., mapping the PTE as writable-clean should not cause a writeback. In other words, if you are concerned about unnecessary writebacks, which I think = is a fair concern, then do not set the dirty-bit. In a later patch I try to = avoid TLB flushes on clean-writable entries that are write-protected. So I do not think that the writeback you mentioned should be a real = issue. Yet if you think that using the fact that the page is not-dirty is a = good hueristics to avoid future TLB flushes (for P->NP; as I said there is a solution for RW->RO), or if you are concerned about the cost of vm_normal_page(), perhaps those are valid concerned (although I do not = think so). -- [ Regarding (1): After some discussions with Peter and reading more = code, I thought at some point that perhaps avoiding having writable-clean PTE as much as possible makes sense [*], since setting the dirty-bit costs ~550 cycles and a page fault is not a lot more than 1000. But with all the mitigations (and after adding IBRS for retbless) page-fault entry is = kind of expensive.=20 [*] At least on x86 ]=