From: Yan Zhao <yan.y.zhao@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<kvm@vger.kernel.org>, <pbonzini@redhat.com>, <seanjc@google.com>,
	<mike.kravetz@oracle.com>, <apopple@nvidia.com>, <jgg@nvidia.com>,
	<rppt@kernel.org>, <akpm@linux-foundation.org>,
	<kevin.tian@intel.com>
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM
Date: Thu, 10 Aug 2023 17:50:06 +0800
Message-ID: <ZNSyzgyTxubo0g/D@yzhao56-desk.sh.intel.com>
In-Reply-To: <41a893e1-f2e7-23f4-cad2-d5c353a336a3@redhat.com>

On Thu, Aug 10, 2023 at 11:34:07AM +0200, David Hildenbrand wrote:
> > This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1
> > to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that
> > the subscriber (e.g.KVM) of the mmu notifier can know that an invalidation
> > event is sent for NUMA migration purpose in specific.
> > 
> > Patch 2 skips setting PROT_NONE to long-term pinned pages in the primary
> > MMU to avoid NUMA protection introduced page faults and restoration of old
> > huge PMDs/PTEs in primary MMU.
> > 
> > Patch 3 introduces a new mmu notifier callback .numa_protect(), which
> > will be called in patch 4 when a page is ensured to be PROT_NONE protected.
> > 
> > Then in patch 5, KVM can recognize a .invalidate_range_start() notification
> > is for NUMA balancing specific and do not do the page unmap in secondary
> > MMU until .numa_protect() comes.
> > 
> 
> Why do we need all that, when we should simply not be applying PROT_NONE to
> pinned pages?
> 
> In change_pte_range() we already have:
> 
> if (is_cow_mapping(vma->vm_flags) &&
>     page_count(page) != 1)
> 
> Which includes both, shared and pinned pages.
Ah, right. Currently, on my side, I don't see any pinned pages that fall
outside of this condition.
But I have a question regarding is_cow_mapping(vma->vm_flags): do we
need to allow for pinned pages in mappings where
!is_cow_mapping(vma->vm_flags)?
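
To make the question concrete, here is a minimal user-space sketch (not
kernel code) of the quoted check. The flag values are mock constants assumed
to mirror VM_SHARED and VM_MAYWRITE, is_cow_mapping() is re-implemented
locally with the same predicate shape, and numa_skip_by_refcount() is a
hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock flag values, assumed to mirror the kernel's VM_SHARED / VM_MAYWRITE. */
#define VM_SHARED   0x08UL
#define VM_MAYWRITE 0x20UL

/* Local stand-in: a mapping is COW when it is private and may be written. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Hypothetical helper: returns true when the quoted condition would make
 * NUMA balancing skip applying PROT_NONE to the page.
 */
static bool numa_skip_by_refcount(unsigned long vm_flags, int page_count)
{
	assert(page_count >= 1);
	return is_cow_mapping(vm_flags) && page_count != 1;
}
```

With this model, a page with an elevated count is skipped in a private
writable mapping, but the same page in a shared mapping is not, which is the
case the question is about.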

> Staring at page #2, are we still missing something similar for THPs?
Yes.

> Why is that MMU notifier thingy and touching KVM code required?
Because the NUMA balancing code first sends .invalidate_range_start() with
event type MMU_NOTIFY_PROTECTION_VMA to KVM in change_pmd_range(),
unconditionally, before it goes down into change_pte_range() and
change_huge_pmd() to check each page's count and apply PROT_NONE.

Current KVM then unmaps all notified pages from the secondary MMU
in .invalidate_range_start(), which could include pages that are ultimately
not set to PROT_NONE in the primary MMU.

For VMs with pass-through devices, even though all guest pages are pinned,
KVM still periodically unmaps pages in response to the
.invalidate_range_start() notification from auto NUMA balancing, which
is a waste.

So, if a new callback is sent only when pages are set to PROT_NONE for NUMA
migration, KVM can unmap only those pages.
KVM still needs to unmap pages for other event types in its handler of
.invalidate_range_start() (i.e. kvm_mmu_notifier_invalidate_range_start()),
and MMU_NOTIFY_PROTECTION_VMA also covers other reasons, so patch 1
adds a range flag that lets KVM skip the blind unmap in
.invalidate_range_start() and do the unmap in the new .numa_protect()
handler instead.
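
As a rough illustration of the difference, below is a small user-space
simulation, not the actual KVM or mm code. struct sim_page and both function
names are hypothetical; the first function models the current blind unmap on
the range notification, the second models unmapping only from a per-page
callback that fires when a page is really set to PROT_NONE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one guest page. */
struct sim_page {
	bool pinned;           /* long-term pinned, e.g. for a pass-through device */
	bool mapped_secondary; /* still mapped in the secondary MMU */
};

/* Current behaviour: the range notification unmaps every page in the range. */
static size_t unmap_on_range_start(struct sim_page *pages, size_t n)
{
	size_t unmapped = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].mapped_secondary) {
			pages[i].mapped_secondary = false;
			unmapped++;
		}
	}
	return unmapped;
}

/*
 * Proposed behaviour: the NUMA-flagged range notification does no unmap;
 * only pages that the primary MMU actually protects trigger the per-page
 * callback, and only those are unmapped from the secondary MMU.
 */
static size_t unmap_on_numa_protect(struct sim_page *pages, size_t n)
{
	size_t unmapped = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].pinned)
			continue; /* primary MMU skips PROT_NONE, so no callback */
		if (pages[i].mapped_secondary) {
			pages[i].mapped_secondary = false;
			unmapped++;
		}
	}
	return unmapped;
}
```

For a range where every page is pinned, the first function unmaps all of
them on each scan while the second unmaps none, which is the saving this
series is after.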

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
> 
