From: Linus Torvalds
Date: Fri, 24 Jul 2020 13:22:50 -0700
Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault
To: Andrew Morton
Cc: Catalin Marinas, Johannes Weiner, Hillf Danton, Hugh Dickins,
 Josef Bacik, "Kirill A . Shutemov", Linux-MM, mm-commits@vger.kernel.org,
 Will Deacon, Matthew Wilcox, xuyu@linux.alibaba.com, yang.shi@linux.alibaba.com

On Fri, Jul 24, 2020 at 12:27 PM Linus Torvalds wrote:
>
> It *may* make sense to say "ok, don't bother flushing the TLB if this
> is a retry, because we already did that originally". MAYBE.

That sounds wrong to me too. Maybe a *BIG* and understandable comment
about why the FAULT_FLAG_TRIED case shouldn't actually do anything at
all.

But it does smell like the real issue is that the "else" case for
ptep_set_access_flags() is simply wrong.

Or maybe something else.
But this thing from the changelog really raises my hackles:

  "But it seems not necessary to modify those bits again for #3 since
   they should be already set by the first page fault try"

since we just verified that we know _exactly_ what the pte is:

        if (unlikely(!pte_same(*vmf->pte, entry))) {
                update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
                goto unlock;
        }

so there is no "should be already set" case. We have 100% information
about what the current state is.

And if we don't make any changes, then that's exactly when
ptep_set_access_flags() returns zero.

So the real question is "why do we need the
flush_tlb_fix_spurious_fault() thing".

We could say that we never need it at all for FAULT_FLAG_RETRY. That
makes a lot of sense to me.

So a patch that does something like the appended (intentionally
whitespace-damaged) seems sensible. But note the XYZ in that commit.
When do we actually have stale TLB entries? Do we actually do the
lazy "avoid TLB flushing when loosening the rules" anywhere?

I think that "when it's a write fault" is actually bogus. I could
imagine that code pages could get the same issue. So the
"FAULT_FLAG_RETRY" part of the check makes perfect sense to me, but
the legacy "FAULT_FLAG_WRITE" case I'd actually want to document more.

On x86, we never care about lazy faults. Page faulting will always
update the TLB.

On other architectures, I can see spurious faults happening either
due to lazy reasons, or simply because another core is modifying the
page table right now (ie the concurrent fault thing), but hasn't
actually flushed yet.

Can somebody flesh out the comment about the
"spurious_protection_fault()" thing? Because something like this I
wouldn't mind, but I'd like that comment to explain the
FAULT_FLAG_WRITE part too.
                Linus

---

diff --git a/mm/memory.c b/mm/memory.c
index 3ecad55103ad..9994c98d88c3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4163,6 +4163,26 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
 	return VM_FAULT_FALLBACK;
 }
 
+/*
+ * If ptep_set_access_flags() returns zero, that means that
+ * it made no changes. Why did we get a fault?
+ *
+ * It might be a spurious protection fault because we at
+ * some point lazily didn't flush a TLB when we only loosened
+ * the protection rules. But it might also be because a
+ * concurrent fault on another CPU had already marked things
+ * young, and our young/dirty changes didn't change anything.
+ *
+ * The lazy TLB optimization only happens when we make things
+ * writable. See XYZ.
+ */
+static inline bool spurious_protection_fault(unsigned int flags)
+{
+	if (flags & FAULT_FLAG_RETRY)
+		return false;
+	return flags & FAULT_FLAG_WRITE;
+}
+
 /*
  * These routines also need to handle stuff like marking pages dirty
  * and/or accessed for architectures that don't do it in hardware (most
@@ -4247,15 +4267,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte,
 				entry, vmf->flags & FAULT_FLAG_WRITE)) {
 		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
-	} else {
-		/*
-		 * This is needed only for protection faults but the arch code
-		 * is not yet telling us if this is a protection fault or not.
-		 * This still avoids useless tlb flushes for .text page faults
-		 * with threads.
-		 */
-		if (vmf->flags & FAULT_FLAG_WRITE)
-			flush_tlb_fix_spurious_fault(vmf->vma, vmf->address);
+	} else if (spurious_protection_fault(vmf->flags)) {
+		flush_tlb_fix_spurious_fault(vmf->vma, vmf->address);
 	}
 unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);