Subject: Re: [PATCH v1 1/5] mm: pagewalk: Fix walk for hugepage tables
From: Christophe Leroy
To: Daniel Axtens, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Steven Price, akpm@linux-foundation.org
Cc: linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org
Date: Fri, 16 Apr 2021 07:48:41 +0200
Message-ID: <56d4c630-ac1e-6b75-39a5-7b5bfbd5b1aa@csgroup.eu>
In-Reply-To: <877dl3184l.fsf@dja-thinkpad.axtens.net>
References: <733408f48b1ed191f53518123ee6fc6d42287cc6.1618506910.git.christophe.leroy@csgroup.eu> <877dl3184l.fsf@dja-thinkpad.axtens.net>
List-Id: Linux on PowerPC Developers Mail List

On 16/04/2021 at 00:43, Daniel Axtens wrote:
> Hi Christophe,
>
>> Pagewalk ignores hugepd entries and walk down the tables
>> as if it was traditionnal entries, leading to crazy result.
>>
>> Add walk_hugepd_range() and use it to walk hugepage tables.
>>
>> Signed-off-by: Christophe Leroy
>> ---
>>  mm/pagewalk.c | 54 +++++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 48 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>> index e81640d9f177..410a9d8f7572 100644
>> --- a/mm/pagewalk.c
>> +++ b/mm/pagewalk.c
>> @@ -58,6 +58,32 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  	return err;
>>  }
>>
>> +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
>> +			     unsigned long end, struct mm_walk *walk, int pdshift)
>> +{
>> +	int err = 0;
>> +#ifdef CONFIG_ARCH_HAS_HUGEPD
>> +	const struct mm_walk_ops *ops = walk->ops;
>> +	int shift = hugepd_shift(*phpd);
>> +	int page_size = 1 << shift;
>> +
>> +	if (addr & (page_size - 1))
>> +		return 0;
>> +
>> +	for (;;) {
>> +		pte_t *pte = hugepte_offset(*phpd, addr, pdshift);
>> +
>> +		err = ops->pte_entry(pte, addr, addr + page_size, walk);
>> +		if (err)
>> +			break;
>> +		if (addr >= end - page_size)
>> +			break;
>> +		addr += page_size;
>> +	}
>
> Initially I thought this was a somewhat unintuitive way to structure
> this loop, but I see it parallels the structure of walk_pte_range_inner,
> so I think the consistency is worth it.
>
> I notice the pte walking code potentially takes some locks: does this
> code need to do that?
>
> arch/powerpc/mm/hugetlbpage.c says that hugepds are protected by the
> mm->page_table_lock, but I don't think we're taking it in this code.

I'll add it, thanks.
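
To make the locking question concrete, here is a minimal user-space sketch, not kernel code and not the eventual patch. It assumes the lock would be held across the whole hugepd loop, which this thread does not settle. struct model_lock, model_walk_locked() and count_if_locked() are hypothetical names standing in for mm->page_table_lock, walk_hugepd_range() and an ops->pte_entry callback; in the kernel the lock would presumably be taken with spin_lock()/spin_unlock().

```c
/* Hypothetical stand-in for mm->page_table_lock; it only tracks whether it is held. */
struct model_lock {
	int held;
};

static void model_lock_acquire(struct model_lock *l) { l->held = 1; }
static void model_lock_release(struct model_lock *l) { l->held = 0; }

/* Callback type modelled on ops->pte_entry, reduced to what the sketch needs. */
typedef int (*entry_fn)(unsigned long addr, unsigned long next,
			const struct model_lock *lock, void *priv);

/*
 * Walk [addr, end) in steps of 1 << shift with the lock held for the
 * whole loop. Both break statements fall through to the single release,
 * so the error path cannot leave the lock held.
 */
static int model_walk_locked(unsigned long addr, unsigned long end, int shift,
			     struct model_lock *lock, entry_fn fn, void *priv)
{
	unsigned long page_size = 1UL << shift;
	int err = 0;

	/* Same alignment guard as the patch; returns before the lock is taken. */
	if (addr & (page_size - 1))
		return 0;

	model_lock_acquire(lock);
	for (;;) {
		err = fn(addr, addr + page_size, lock, priv);
		if (err)
			break;
		if (addr >= end - page_size)
			break;
		addr += page_size;
	}
	model_lock_release(lock);
	return err;
}

/* Example callback: counts visits and fails if the lock is not held. */
static int count_if_locked(unsigned long addr, unsigned long next,
			   const struct model_lock *lock, void *priv)
{
	(void)addr;
	(void)next;
	if (!lock->held)
		return -1;
	++*(int *)priv;
	return 0;
}
```

Keeping the for (;;) shape of the patch means there is one acquire before the loop and one release after it, rather than a release on each exit path.
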
>
>> +#endif
>> +	return err;
>> +}
>> +
>>  static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>>  			  struct mm_walk *walk)
>>  {
>> @@ -108,7 +134,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>>  			goto again;
>>  		}
>>
>> -		err = walk_pte_range(pmd, addr, next, walk);
>> +		if (is_hugepd(__hugepd(pmd_val(*pmd))))
>> +			err = walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT);
>> +		else
>> +			err = walk_pte_range(pmd, addr, next, walk);
>>  		if (err)
>>  			break;
>>  	} while (pmd++, addr = next, addr != end);
>> @@ -157,7 +186,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>>  		if (pud_none(*pud))
>>  			goto again;
>>
>> -		err = walk_pmd_range(pud, addr, next, walk);
>> +		if (is_hugepd(__hugepd(pud_val(*pud))))
>> +			err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT);
>> +		else
>> +			err = walk_pmd_range(pud, addr, next, walk);
>
> I'm a bit worried you might end up calling into walk_hugepd_range with
> ops->pte_entry == NULL, and then jumping to 0.

You are right, I missed it.
I'll bail out of walk_hugepd_range() when ops->pte_entry is NULL.

>
> static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> 			  struct mm_walk *walk)
> {
> ...
> 	pud = pud_offset(p4d, addr);
> 	do {
> 		...
> 		if ((!walk->vma && (pud_leaf(*pud) || !pud_present(*pud))) ||
> 		    walk->action == ACTION_CONTINUE ||
> 		    !(ops->pmd_entry || ops->pte_entry)) <<< THIS CHECK
> 			continue;
> 		...
> 		if (is_hugepd(__hugepd(pud_val(*pud))))
> 			err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT);
> 		else
> 			err = walk_pmd_range(pud, addr, next, walk);
> 		if (err)
> 			break;
> 	} while (pud++, addr = next, addr != end);
>
> walk_pud_range will proceed if there is _either_ an ops->pmd_entry _or_
> an ops->pte_entry, but walk_hugepd_range will call ops->pte_entry
> unconditionally.
>
> The same issue applies to walk_{p4d,pgd}_range...
>
> Kind regards,
> Daniel
>

Thanks
Christophe
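
The bail-out on a NULL ops->pte_entry can be modelled in user space as below. This is only a sketch under stated assumptions: struct walk_ops, struct walk, model_walk_hugepd_range() and count_entry() are hypothetical stand-ins for struct mm_walk_ops, struct mm_walk, walk_hugepd_range() and a caller's callback, and the hugepte_offset() lookup is left out entirely.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's mm_walk types. */
struct walk_ops {
	int (*pte_entry)(unsigned long addr, unsigned long next, void *priv);
};

struct walk {
	const struct walk_ops *ops;
	void *private;
};

/*
 * Model of walk_hugepd_range(): visit [addr, end) in steps of
 * 1 << shift, returning early when no pte_entry callback is set so
 * that a NULL function pointer is never called.
 */
static int model_walk_hugepd_range(unsigned long addr, unsigned long end,
				   struct walk *walk, int shift)
{
	const struct walk_ops *ops = walk->ops;
	unsigned long page_size = 1UL << shift;
	int err = 0;

	/* The check discussed in the thread: nothing to do without a callback. */
	if (!ops->pte_entry)
		return 0;

	if (addr & (page_size - 1))
		return 0;

	for (;;) {
		err = ops->pte_entry(addr, addr + page_size, walk->private);
		if (err)
			break;
		if (addr >= end - page_size)
			break;
		addr += page_size;
	}
	return err;
}

/* Example callback: counts how many entries were visited. */
static int count_entry(unsigned long addr, unsigned long next, void *priv)
{
	(void)addr;
	(void)next;
	++*(int *)priv;
	return 0;
}
```

Checking inside the function covers every caller at once, including the walk_{p4d,pgd}_range cases Daniel mentions, at the cost of one extra test per hugepd entry.
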