From: Yang Shi
Date: Tue, 20 Jul 2021 15:19:50 -0700
Subject: Re: [PATCH] mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code
To: Zi Yan
Cc: Christian Borntraeger, Huang Ying, Andrew Morton, Linux MM,
 Linux Kernel Mailing List, Dan Carpenter, Mel Gorman, Gerald Schaefer,
 Heiko Carstens, Hugh Dickins, Andrea Arcangeli, "Kirill A. Shutemov",
 Michal Hocko, Vasily Gorbik, Paolo Bonzini, kvm list
In-Reply-To: <0D75A92F-E2AA-480C-9E9A-0B6EE7897757@nvidia.com>
References: <20210720065529.716031-1-ying.huang@intel.com> <0D75A92F-E2AA-480C-9E9A-0B6EE7897757@nvidia.com>
List-ID: linux-kernel@vger.kernel.org

On Tue, Jul 20, 2021 at 2:04 PM Zi Yan wrote:
>
> On 20 Jul 2021, at 16:53, Yang Shi wrote:
>
> > On Tue, Jul 20, 2021 at 7:25 AM Christian Borntraeger wrote:
> >>
> >> On 20.07.21 08:55, Huang Ying wrote:
> >>> Before the commit c5b5a3dd2c1f ("mm: thp: refactor NUMA fault
> >>> handling"), the TLB flushing is done in do_huge_pmd_numa_page() itself
> >>> via flush_tlb_range().
> >>>
> >>> But after commit c5b5a3dd2c1f ("mm: thp: refactor NUMA fault
> >>> handling"), the TLB flushing is done in migrate_pages() as in the
> >>> following code path anyway.
> >>>
> >>>   do_huge_pmd_numa_page
> >>>     migrate_misplaced_page
> >>>       migrate_pages
> >>>
> >>> So now, the TLB flushing code in do_huge_pmd_numa_page() becomes
> >>> unnecessary. So the code is deleted in this patch to simplify the
> >>> code. This is only code cleanup; there's no visible performance
> >>> difference.
> >>>
> >>> Signed-off-by: "Huang, Ying"
> >>> Cc: Yang Shi
> >>> Cc: Dan Carpenter
> >>> Cc: Mel Gorman
> >>> Cc: Christian Borntraeger
> >>> Cc: Gerald Schaefer
> >>> Cc: Heiko Carstens
> >>> Cc: Hugh Dickins
> >>> Cc: Andrea Arcangeli
> >>> Cc: Kirill A. Shutemov
> >>> Cc: Michal Hocko
> >>> Cc: Vasily Gorbik
> >>> Cc: Zi Yan
> >>> ---
> >>>  mm/huge_memory.c | 26 --------------------------
> >>>  1 file changed, 26 deletions(-)
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index afff3ac87067..9f21e44c9030 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -1440,32 +1440,6 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> >>>              goto out;
> >>>      }
> >>>
> >>> -     /*
> >>> -      * Since we took the NUMA fault, we must have observed the !accessible
> >>> -      * bit.
Make sure all other CPUs agree with that, to avoid them
> >>> -      * modifying the page we're about to migrate.
> >>> -      *
> >>> -      * Must be done under PTL such that we'll observe the relevant
> >>> -      * inc_tlb_flush_pending().
> >>> -      *
> >>> -      * We are not sure a pending tlb flush here is for a huge page
> >>> -      * mapping or not. Hence use the tlb range variant
> >>> -      */
> >>> -     if (mm_tlb_flush_pending(vma->vm_mm)) {
> >>> -             flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
> >>> -             /*
> >>> -              * change_huge_pmd() released the pmd lock before
> >>> -              * invalidating the secondary MMUs sharing the primary
> >>> -              * MMU pagetables (with ->invalidate_range()). The
> >>> -              * mmu_notifier_invalidate_range_end() (which
> >>> -              * internally calls ->invalidate_range()) in
> >>> -              * change_pmd_range() will run after us, so we can't
> >>> -              * rely on it here and we need an explicit invalidate.
> >>> -              */
> >>> -             mmu_notifier_invalidate_range(vma->vm_mm, haddr,
> >>> -                                           haddr + HPAGE_PMD_SIZE);
> >>> -     }
> >>
> >> CC Paolo/KVM list, since we also remove the mmu notifier here. Do we need those
> >> now in migrate_pages()? I am not an expert in that code, but I can't find
> >> an equivalent mmu_notifier in migrate_misplaced_pages().
> >> I might be totally wrong, just something that I noticed.
> >
> > Do you mean the missed mmu notifier invalidate for the THP migration
> > case? Yes, I noticed that too. But I'm not sure whether it is intended
> > or just missed.
>
> From my understanding of the mmu_notifier documentation, mmu_notifier_invalidate_range()
> is needed only if the PTE is updated to point to a new page or the page pointed
> to by the PTE is freed. Page migration does not fall into either case.
> In addition, in migrate_pages(), more specifically try_to_migrate_one(),
> there is a pair of mmu_notifier_invalidate_range_start() and
> mmu_notifier_invalidate_range_end() around the PTE manipulation code, which should
> be sufficient to notify secondary TLBs (including KVM) about the PTE change
> for page migration. Correct me if I am wrong.

Thanks, I think you are correct. Looking into commit 7066f0f933a1 ("mm: thp:
fix mmu_notifier in migrate_misplaced_transhuge_page()"), the TLB flush and
mmu notifier invalidate were needed because the old NUMA fault implementation
didn't change the PTE to a migration entry, so writes from a GPU secondary
MMU could cause data corruption. The refactor does use the generic migration
code, which converts the PTE to a migration entry before copying data to the
new page.

>
> —
> Best Regards,
> Yan, Zi