From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8710C433DF for ; Tue, 28 Jul 2020 09:22:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4CA762074F for ; Tue, 28 Jul 2020 09:22:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="tUXubLwZ" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4CA762074F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9D9838D0002; Tue, 28 Jul 2020 05:22:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 98A2E6B000D; Tue, 28 Jul 2020 05:22:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 879B08D0002; Tue, 28 Jul 2020 05:22:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0219.hostedemail.com [216.40.44.219]) by kanga.kvack.org (Postfix) with ESMTP id 6FA566B000A for ; Tue, 28 Jul 2020 05:22:28 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 172EEDDEC for ; Tue, 28 Jul 2020 09:22:28 +0000 (UTC) X-FDA: 77086943976.18.hot87_390d23726f68 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id D774C100EDBE9 for ; Tue, 28 Jul 2020 09:22:27 +0000 (UTC) X-HE-Tag: hot87_390d23726f68 X-Filterd-Recvd-Size: 6813 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf12.hostedemail.com (Postfix) with ESMTP for ; Tue, 28 Jul 2020 09:22:27 +0000 (UTC) Received: from willie-the-truck (236.31.169.217.in-addr.arpa [217.169.31.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0EBCE206F5; Tue, 28 Jul 2020 09:22:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1595928146; bh=hLkKhQ61iu5eRuGCXAGBcyQsfXbCp5iJfSKBtfwsSeE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=tUXubLwZUNuTPCD0669pxuhFMFRphDtNnN79ENQA1fIKPq49lZ7/FE2NgYuIuu+L7 sE9T41O8B+3uV2dwVCjWLRTuQK8POn7bquzp+35U0cpAaVv+1yqtMoNhA8LY88devZ IlmWoQWPxO5ewB/UF5+6V5fQE01/XuBd/pXoiUe8= Date: Tue, 28 Jul 2020 10:22:20 +0100 From: Will Deacon To: Catalin Marinas Cc: Linus Torvalds , Yang Shi , Andrew Morton , Johannes Weiner , Hillf Danton , Hugh Dickins , Josef Bacik , "Kirill A . Shutemov" , Linux-MM , mm-commits@vger.kernel.org, Will Deacon , Matthew Wilcox , xuyu@linux.alibaba.com Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault Message-ID: <20200728092220.GA21800@willie-the-truck> References: <20200723211432.b31831a0df3bc2cbdae31b40@linux-foundation.org> <20200724041508.QlTbrHnfh%akpm@linux-foundation.org> <7de20d4a-f86c-8e1f-b238-65f02b560325@linux.alibaba.com> <20200725155841.GA14490@gaia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200725155841.GA14490@gaia> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: D774C100EDBE9 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Jul 25, 2020 at 04:58:43PM +0100, Catalin Marinas wrote: > On Fri, Jul 24, 2020 at 06:29:43PM -0700, Linus Torvalds wrote: > > On Fri, Jul 24, 2020 at 5:37 PM Yang Shi wrote: > > > A follow-up question about your comment in the previous email "The > > > notion of "this is a retry, so let's do nothing" is fundamentally > > > wrong.", do you mean it is not safe? > > > > I mean it fails my "smell test". > > > > The patch didn't just avoid the TLB flush, it avoided all the other > > "mark it dirty and young" things too. And that made me go "why would > > RETRY be different in this regard"? > > I had a similar concern, couldn't convince myself it's entirely safe. > Even if it is safe now, some mm change in the future may break the > current assumptions. > > The arm64 do_page_fault() sets FAULT_FLAG_TRIED if a previous > handle_mm_fault() returns VM_FAULT_RETRY. A quick grep for > VM_FAULT_RETRY shows a few more instances than what Yang listed. Maybe > they are all safe, I just couldn't get my head around it. > > > For any architecture that guarantees that a page fault will always > > flush the old TLB entry for this kind of situation, that > > flush_tlb_fix_spurious_fault() thing can be a no-op. > > > > So that's why on x86, we just do > > > > #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0) > > > > and have no issues. > > > > Note that it does *not* need to do any cross-CPU flushing or anything > > like that. So it's actually wrong (I think) to have that default > > fallback for > > > > #define flush_tlb_fix_spurious_fault(vma, address) > > flush_tlb_page(vma, address) > > > > because flush_tlb_page() is the serious "do cross CPU etc". > > > > Does the arm64 flush_tlb_page() perhaps do the whole expensive > > cross-CPU thing rather than the much cheaper "just local invalidate" > > version? > > I think it makes sense to have a local-only > flush_tlb_fix_spurious_fault(), but with ptep_set_access_flags() updated > to still issue the full broadcast TLBI. In addition, I added a minor > optimisation to avoid the TLB flush if the old pte was not accessible. > In a read-access fault case (followed by mkyoung), the TLB wouldn't have > cached a non-accessible pte (not sure it makes much difference to Yang's > case). Anyway, from ARMv8.1 onwards, the hardware handles the access > flag automatically. > > I'm not sure the first dsb(nshst) below is of much use in this case. If > we got a spurious fault, the write to the pte happened on a different > CPU (IIUC, we shouldn't return to user with updated ptes without a TLB > flush on the same CPU). Anyway, we can refine this if it solves Yang's > performance regression. > > -------------8<----------------------- > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h > index d493174415db..d1401cbad7d4 100644 > --- a/arch/arm64/include/asm/tlbflush.h > +++ b/arch/arm64/include/asm/tlbflush.h > @@ -268,6 +268,20 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, > dsb(ish); > } > > +static inline void local_flush_tlb_page(struct vm_area_struct *vma, > + unsigned long uaddr) > +{ > + unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm)); > + > + dsb(nshst); > + __tlbi(vale1, addr); > + __tlbi_user(vale1, addr); > + dsb(nsh); > +} > + > +#define flush_tlb_fix_spurious_fault(vma, address) \ > + local_flush_tlb_page(vma, address) Why can't we just have flush_tlb_fix_spurious_fault() be a NOP on arm64? We only perform local TLB invalidation in a few special places (e.g. ASID rollover, hibernate) so everywhere else that TLB invalidation is required to prevent a CPU from re-taking a fault, it will have been broadcast. Given that the architecture prohibits the TLB from caching invalid entries, then software access/dirty is fine without additional flushing. The only problematic case I can think of is on the invalid->valid (i.e. map) path, where we elide the expensive DSB instruction because (a) most CPUs have a walker that can snoop the store buffer and (b) even if they don't, the store buffer tends to drain by the time we get back to userspace. Even if that was a problem, flush_tlb_fix_spurious_fault() wouldn't be the right hook, since the DSB must occur on the CPU that did the pte update. So I'd be inclined to drop flush_tlb_fix_spurious_fault() altogether, as I don't see why it's needed. Have I missed something? Will