All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Huang Ying <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Mel Gorman <mgorman@suse.de>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>, Vasily Gorbik <gor@linux.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm list <kvm@vger.kernel.org>
Subject: Re: [PATCH] mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code
Date: Wed, 21 Jul 2021 15:41:05 +0000	[thread overview]
Message-ID: <YPhAEcHOCZ5yII/T@google.com> (raw)
In-Reply-To: <CAHbLzkqZZEic7+H0ky9u+aKO5o_cF0N5xQ=JO2tMpc8jg8RcnQ@mail.gmail.com>

On Tue, Jul 20, 2021, Yang Shi wrote:
> On Tue, Jul 20, 2021 at 2:04 PM Zi Yan <ziy@nvidia.com> wrote:
> >
> > On 20 Jul 2021, at 16:53, Yang Shi wrote:
> >
> > > On Tue, Jul 20, 2021 at 7:25 AM Christian Borntraeger
> > > <borntraeger@de.ibm.com> wrote:
> > >>> -     if (mm_tlb_flush_pending(vma->vm_mm)) {
> > >>> -             flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
> > >>> -             /*
> > >>> -              * change_huge_pmd() released the pmd lock before
> > >>> -              * invalidating the secondary MMUs sharing the primary
> > >>> -              * MMU pagetables (with ->invalidate_range()). The
> > >>> -              * mmu_notifier_invalidate_range_end() (which
> > >>> -              * internally calls ->invalidate_range()) in
> > >>> -              * change_pmd_range() will run after us, so we can't
> > >>> -              * rely on it here and we need an explicit invalidate.
> > >>> -              */
> > >>> -             mmu_notifier_invalidate_range(vma->vm_mm, haddr,
> > >>> -                                           haddr + HPAGE_PMD_SIZE);
> > >>> -     }
> > >>> CC Paolo/KVM list so we also remove the mmu notifier here. Do we need those
> > >> now in migrate_pages? I am not an expert in that code, but I cant find
> > >> an equivalent mmu_notifier in migrate_misplaced_pages.
> > >> I might be totally wrong, just something that I noticed.
> > >
> > > Do you mean the missed mmu notifier invalidate for the THP migration
> > > case? Yes, I noticed that too. But I'm not sure whether it is intended
> > > or just missed.
> >
> > From my understand of mmu_notifier document, mmu_notifier_invalidate_range()
> > is needed only if the PTE is updated to point to a new page or the page pointed
> > by the PTE is freed. Page migration does not fall into either case.

The "new page" part of

  a page table entry is updated to point to a new page

is referring to a different physical page, i.e. a different pfn, not a different
struct page.  do_huge_pmd_numa_page() is moving a THP between nodes, thus it's
changing the backing pfn and needs to invalidate secondary MMUs at some point.

> > In addition, in migrate_pages(), more specifically try_to_migrate_one(),
> > there is a pair of mmu_notifier_invalidate_range_start() and
> > mmu_notifier_invalidate_range_end() around the PTE manipulation code, which should
> > be sufficient to notify secondary TLBs (including KVM) about the PTE change
> > for page migration. Correct me if I am wrong.
> 
> Thanks, I think you are correct. By looking into commit 7066f0f933a1
> ("mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()"),
> the tlb flush and mmu notifier invalidate were needed since the old
> numa fault implementation didn't change PTE to migration entry so it
> may cause data corruption due to the writes from GPU secondary MMU.
> 
> The refactor does use the generic migration code which converts PTE to
> migration entry before copying data to the new page.

That's my understanding as well, based on this blurb from commit 7066f0f933a1.

    The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and
    uses the generic migrate_pages which transitions the pte from
    numa/protnone to a migration entry in try_to_unmap_one() and flushes TLBs
    and all mmu notifiers there before copying the page.

That analysis/justification for removing the invalidate_range() call should be
captured in the changelog.  Confirmation from Andrea would be a nice bonus.

  reply	other threads:[~2021-07-21 15:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-20  6:55 [PATCH] mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code Huang Ying
2021-07-20 13:36 ` Zi Yan
2021-07-20 14:25 ` Christian Borntraeger
2021-07-20 20:53   ` Yang Shi
2021-07-20 20:53     ` Yang Shi
2021-07-20 21:04     ` Zi Yan
2021-07-20 22:19       ` Yang Shi
2021-07-20 22:19         ` Yang Shi
2021-07-21 15:41         ` Sean Christopherson [this message]
2021-07-22  0:26           ` Huang, Ying
2021-07-22  0:26             ` Huang, Ying
2021-07-22  7:36             ` Christian Borntraeger
2021-07-22 23:10               ` Huang, Ying
2021-07-22 23:10                 ` Huang, Ying
2021-07-23  0:03               ` Huang, Ying
2021-07-23  0:03                 ` Huang, Ying
2021-07-23 20:19                 ` Andrew Morton
2021-07-20 20:48 ` Yang Shi
2021-07-20 20:48   ` Yang Shi
2021-07-20 22:21   ` Yang Shi
2021-07-20 22:21     ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YPhAEcHOCZ5yII/T@google.com \
    --to=seanjc@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=dan.carpenter@oracle.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=shy828301@gmail.com \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.