From: bibo mao <maobibo@loongson.cn>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com,
	mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com,
	rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com,
	david@redhat.com
Subject: Re: [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration
Date: Fri, 11 Aug 2023 15:40:44 +0800
Message-ID: <e7032573-9717-b1b9-7335-cbb0da12cd2a@loongson.cn>
In-Reply-To: <ZNWu2YCxy2FQBl4z@yzhao56-desk.sh.intel.com>



On 2023/8/11 11:45, Yan Zhao wrote:
>>> +static void kvm_mmu_notifier_numa_protect(struct mmu_notifier *mn,
>>> +					  struct mm_struct *mm,
>>> +					  unsigned long start,
>>> +					  unsigned long end)
>>> +{
>>> +	struct kvm *kvm = mmu_notifier_to_kvm(mn);
>>> +
>>> +	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
>>> +	if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
>>> +		return;
>>> +
>>> +	kvm_handle_hva_range(mn, start, end, __pte(0), kvm_unmap_gfn_range);
>>> +}
>> numa balance will scan wide memory range, and there will be one time
> Though scanning memory range is wide, .invalidate_range_start() is sent
> for each 2M range.
Yes, each range is one huge page in size when NUMA protection is changed
during NUMA scanning.

> 
>> ipi notification with kvm_flush_remote_tlbs. With page level notification,
>> it may bring out lots of flush remote tlb ipi notification.
> 
> Hmm, for VMs with assigned devices, apparently, the flush remote tlb IPIs
> will be reduced to 0 with this series.
> 
> For VMs without assigned devices or mdev devices, I was previously also
> worried about that there might be more IPIs.
> But with current test data, there's no more remote tlb IPIs on average.
> 
> The reason is below:
> 
> Before this series, kvm_unmap_gfn_range() is called for once for a 2M
> range.
> After this series, kvm_unmap_gfn_range() is called for once if the 2M is
> mapped to a huge page in primary MMU, and called for at most 512 times
> if mapped to 4K pages in primary MMU.
> 
> 
> Though kvm_unmap_gfn_range() is only called once before this series,
> as the range is blockable, when there're contentions, remote tlb IPIs
> can be sent page by page in 4K granularity (in tdp_mmu_iter_cond_resched())
I do not know much about x86. Does this always happen, or only when the
code needs to reschedule? If it does, a single call to kvm_unmap_gfn_range()
can send many TLB flush IPIs.

> if the pages are mapped in 4K in secondary MMU.
> 
> With this series, on the other hand, .numa_protect() sets range to be
> unblockable. So there could be less remote tlb IPIs when a 2M range is
> mapped into small PTEs in secondary MMU.
> Besides, .numa_protect() is not sent for all pages in a given 2M range.
Right, .numa_protect() is not sent for all pages. It depends on the workload,
that is, on whether a page is accessed by CPU threads on different nodes.

> 
> Below is my testing data on a VM without assigned devices:
> The data is an average of 10 times guest boot-up.
>                    
>     data           | numa balancing caused  | numa balancing caused    
>   on average       | #kvm_unmap_gfn_range() | #kvm_flush_remote_tlbs() 
> -------------------|------------------------|--------------------------
> before this series |         35             |     8625                 
> after  this series |      10037             |     4610   
Just to be cautious: before the series there are 8625/35 = 246 TLB flush IPIs
per kvm_unmap_gfn_range() call on average. Is that x86 specific or generic?

By the way, do the primary MMU and the secondary MMU both use 4K small pages
in the "on average" data?

Regards
Bibo Mao
              
> 
> For a single guest bootup,
>                    | numa balancing caused  | numa balancing caused    
>     best  data     | #kvm_unmap_gfn_range() | #kvm_flush_remote_tlbs() 
> -------------------|------------------------|--------------------------
> before this series |         28             |       13                  
> after  this series |        406             |      195                  
> 
>                    | numa balancing caused  | numa balancing caused    
>    worst  data     | #kvm_unmap_gfn_range() | #kvm_flush_remote_tlbs() 
> -------------------|------------------------|--------------------------
> before this series |         44             |    43920               
> after  this series |      17352             |     8668                 

> 
> 
>>
>> however numa balance notification, pmd table of vm maybe needs not be freed
>> in kvm_unmap_gfn_range.
>>
>  

