Subject: Re: [PATCH v6 01/11] asm-generic/pgtable: Adds generic functions to track lockless pgtable walks
From: Christophe Leroy
Date: Thu, 6 Feb 2020 06:54:39 +0100
To: Leonardo Bras, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Arnd Bergmann, Andrew Morton, "Aneesh Kumar K.V", Nicholas Piggin, Steven Price, Robin Murphy, Mahesh Salgaonkar, Balbir Singh, Reza Arbab, Thomas Gleixner, Allison Randal, Greg Kroah-Hartman, Mike Rapoport, Michal Suchanek
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org
References: <20200206030900.147032-1-leonardo@linux.ibm.com> <20200206030900.147032-2-leonardo@linux.ibm.com>
In-Reply-To: <20200206030900.147032-2-leonardo@linux.ibm.com>

On 06/02/2020 at 04:08, Leonardo Bras wrote:
> It's necessary to track lockless pagetable walks, in order to avoid doing
> THP splitting/collapsing during them.
>
> The default solution is to disable irq before lockless pagetable walks and
> enable it after it's finished.
>
> On code, this means you can find local_irq_disable() and local_irq_enable()
> around some pieces of code, usually without comments on why it is needed.
>
> This patch proposes a set of generic functions to be called before starting
> and after finishing a lockless pagetable walk. It is supposed to make clear
> that a lockless pagetable walk happens there, and also carries information
> on why the irq disable/enable is needed.
>
> begin_lockless_pgtbl_walk()
>         Insert before starting any lockless pgtable walk
> end_lockless_pgtbl_walk()
>         Insert after the end of any lockless pgtable walk
>         (Mostly after the ptep is last used)
>
> A memory barrier was also added just to make sure there is no speculative
> read outside the interrupt disabled area. Other than that, it is not
> supposed to have any change of behavior from current code.

Is that speculative barrier necessary for all architectures? Does it
impact performance? Shouldn't this be another patch?

>
> It is planned to allow arch-specific versions, so that additional steps can
> be added while keeping the code clean.
>
> Signed-off-by: Leonardo Bras
> ---
>  include/asm-generic/pgtable.h | 51 +++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
>
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index e2e2bef07dd2..8d368d3c0974 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -1222,6 +1222,57 @@ static inline bool arch_has_pfn_modify_check(void)
>  #endif
>  #endif
>
> +#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
> +/*
> + * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
> + *   lockless pagetable walks, such as __find_linux_pte()
> + */
> +static inline
> +unsigned long begin_lockless_pgtbl_walk(void)

What about keeping the same syntax as local_irq_save(), something like:

#define begin_lockless_pgtbl_walk(flags) \
do { local_irq_save(flags); smp_mb(); } while (0)

> +{
> +	unsigned long irq_mask;
> +
> +	/*
> +	 * Interrupts must be disabled during the lockless page table walk.
> +	 * That's because the deleting or splitting involves flushing TLBs,
> +	 * which in turn issues interrupts, that will block when disabled.
> +	 */
> +	local_irq_save(irq_mask);
> +
> +	/*
> +	 * This memory barrier pairs with any code that is either trying to
> +	 * delete page tables, or split huge pages. Without this barrier,
> +	 * the page tables could be read speculatively outside of interrupt
> +	 * disabling.
> +	 */
> +	smp_mb();
> +
> +	return irq_mask;
> +}
> +
> +/*
> + * end_lockless_pgtbl_walk: Must be inserted after the last use of a pointer
> + *   returned by a lockless pagetable walk, such as __find_linux_pte()
> + */
> +static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)

Same here:

#define end_lockless_pgtbl_walk(flags) \
do { smp_mb(); local_irq_restore(flags); } while (0)

> +{
> +	/*
> +	 * This memory barrier pairs with any code that is either trying to
> +	 * delete page tables, or split huge pages. Without this barrier,
> +	 * the page tables could be read speculatively outside of interrupt
> +	 * disabling.
> +	 */
> +	smp_mb();
> +
> +	/*
> +	 * Interrupts must be disabled during the lockless page table walk.
> +	 * That's because the deleting or splitting involves flushing TLBs,
> +	 * which in turn issues interrupts, that will block when disabled.
> +	 */
> +	local_irq_restore(irq_mask);
> +}
> +#endif
> +
>  /*
>   * On some architectures it depends on the mm if the p4d/pud or pmd
>   * layer of the page table hierarchy is folded or not.
>

Christophe
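
To illustrate the macro form suggested above, here is a minimal user-space sketch of what a call site might look like. It is not kernel code: local_irq_save(), local_irq_restore() and smp_mb() are replaced by mock stand-ins, and irqs_enabled, barrier_count and walk_with_macros() are hypothetical names introduced only for this example.

```c
#include <stdbool.h>

/* Mock state: stands in for the CPU interrupt flag (assumption, not kernel API). */
static bool irqs_enabled = true;
/* Counts how many times the mock barrier was issued. */
static int barrier_count;

/* User-space stand-ins for the kernel primitives of the same name. */
#define local_irq_save(flags) \
	do { (flags) = irqs_enabled; irqs_enabled = false; } while (0)
#define local_irq_restore(flags) \
	do { irqs_enabled = ((flags) != 0); } while (0)
#define smp_mb() \
	do { barrier_count++; } while (0)

/* Macro form from the review: same calling convention as local_irq_save(). */
#define begin_lockless_pgtbl_walk(flags) \
	do { local_irq_save(flags); smp_mb(); } while (0)
#define end_lockless_pgtbl_walk(flags) \
	do { smp_mb(); local_irq_restore(flags); } while (0)

/*
 * Hypothetical call site. Returns true if the mock interrupts were
 * disabled while the (omitted) lockless walk would have run.
 */
static bool walk_with_macros(void)
{
	unsigned long flags;
	bool irqs_off_during_walk;

	begin_lockless_pgtbl_walk(flags);
	/* The lockless page table walk and last use of ptep would go here. */
	irqs_off_during_walk = !irqs_enabled;
	end_lockless_pgtbl_walk(flags);

	return irqs_off_during_walk;
}
```

With this form the caller passes flags by name, exactly as with local_irq_save(flags), instead of writing irq_mask = begin_lockless_pgtbl_walk() as in the inline-function version of the patch. Leaving the trailing semicolon off the macro definitions keeps the do { } while (0) idiom safe inside unbraced if/else bodies.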