All of lore.kernel.org
 help / color / mirror / Atom feed
From: bibo mao <maobibo@loongson.cn>
To: Sean Christopherson <seanjc@google.com>, Yan Zhao <yan.y.zhao@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, pbonzini@redhat.com,
	mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com,
	rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com,
	david@redhat.com
Subject: Re: [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration
Date: Wed, 16 Aug 2023 10:43:44 +0800	[thread overview]
Message-ID: <107cdaaf-237f-16b9-ebe2-7eefd2b21f8f@loongson.cn> (raw)
In-Reply-To: <ZNuQ0grC44Dbh5hS@google.com>



在 2023/8/15 22:50, Sean Christopherson 写道:
> On Tue, Aug 15, 2023, Yan Zhao wrote:
>> On Mon, Aug 14, 2023 at 09:40:44AM -0700, Sean Christopherson wrote:
>>>>> Note, I'm assuming secondary MMUs aren't allowed to map swap entries...
>>>>>
>>>>> Compile tested only.
>>>>
>>>> I don't find a matching end to each
>>>> mmu_notifier_invalidate_range_start_nonblock().
>>>
>>> It pairs with existing call to mmu_notifier_invalidate_range_end() in change_pmd_range():
>>>
>>> 	if (range.start)
>>> 		mmu_notifier_invalidate_range_end(&range);
>> No, It doesn't work for mmu_notifier_invalidate_range_start() sent in change_pte_range(),
>> if we only want the range to include pages successfully set to PROT_NONE.
> 
> Precise invalidation was a non-goal for my hack-a-patch.  The intent was purely
> to defer invalidation until it was actually needed, but still perform only a
> single notification so as to batch the TLB flushes, e.g. the start() call still
> used the original @end.
> 
> The idea was to play nice with the scenario where nothing in a VMA could be migrated.
> It was complete untested though, so it may not have actually done anything to reduce
> the number of pointless invalidations.
For numa-balance scenery, can original page still be used by application even if pte
is changed with PROT_NONE?  If it can be used, maybe we can zap shadow mmu and flush tlb
in notification mmu_notifier_invalidate_range_end with precised range, the range can
be cross-range between range mmu_gather and mmu_notifier_range.

Regards
Bibo Mao
> 
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index 9e4cd8b4a202..f29718a16211 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -4345,6 +4345,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>>>  	if (unlikely(!fault->slot))
>>>  		return kvm_handle_noslot_fault(vcpu, fault, access);
>>>  
>>> +	if (mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva))
>>> +		return RET_PF_RETRY;
>>> +
>> This can effectively reduce the remote flush IPIs a lot!
>> One Nit is that, maybe rmb() or READ_ONCE() is required for kvm->mmu_invalidate_range_start
>> and kvm->mmu_invalidate_range_end.
>> Otherwise, I'm somewhat worried about constant false positive and retry.
> 
> If anything, this needs a READ_ONCE() on mmu_invalidate_in_progress.  The ranges
> aren't touched when when mmu_invalidate_in_progress goes to zero, so ensuring they
> are reloaded wouldn't do anything.  The key to making forward progress is seeing
> that there is no in-progress invalidation.
> 
> I did consider adding said READ_ONCE(), but practically speaking, constant false
> positives are impossible.  KVM will re-enter the guest when retrying, and there
> is zero chance of the compiler avoiding reloads across VM-Enter+VM-Exit.
> 
> I suppose in theory we might someday differentiate between "retry because a different
> vCPU may have fixed the fault" and "retry because there's an in-progress invalidation",
> and not bother re-entering the guest for the latter, e.g. have it try to yield
> instead.  
> 
> All that said, READ_ONCE() on mmu_invalidate_in_progress should effectively be a
> nop, so it wouldn't hurt to be paranoid in this case.
> 
> Hmm, at that point, it probably makes sense to add a READ_ONCE() for mmu_invalidate_seq
> too, e.g. so that a sufficiently clever compiler doesn't completely optimize away
> the check.  Losing the check wouldn't be problematic (false negatives are fine,
> especially on that particular check), but the generated code would *look* buggy.


  reply	other threads:[~2023-08-16  2:44 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10  8:56 [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM Yan Zhao
2023-08-10  8:57 ` [RFC PATCH v2 1/5] mm/mmu_notifier: introduce a new mmu notifier flag MMU_NOTIFIER_RANGE_NUMA Yan Zhao
2023-08-10  8:58 ` [RFC PATCH v2 2/5] mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate purpose Yan Zhao
2023-08-10  9:00 ` [RFC PATCH v2 3/5] mm/mmu_notifier: introduce a new callback .numa_protect Yan Zhao
2023-08-10  9:00 ` [RFC PATCH v2 4/5] mm/autonuma: call .numa_protect() when page is protected for NUMA migrate Yan Zhao
2023-08-10 13:45   ` kernel test robot
2023-08-10 13:55   ` kernel test robot
2023-08-11 18:52   ` Nadav Amit
2023-08-14  7:52     ` Yan Zhao
2023-08-10  9:02 ` [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration Yan Zhao
2023-08-10 13:16   ` bibo mao
2023-08-11  3:45     ` Yan Zhao
2023-08-11  7:40       ` bibo mao
2023-08-11  8:01         ` Yan Zhao
2023-08-11 17:14           ` Sean Christopherson
2023-08-11 17:18             ` Jason Gunthorpe
2023-08-14  6:52             ` Yan Zhao
2023-08-14  7:44               ` Yan Zhao
2023-08-14 16:40               ` Sean Christopherson
2023-08-15  1:54                 ` Yan Zhao
2023-08-15 14:50                   ` Sean Christopherson
2023-08-16  2:43                     ` bibo mao [this message]
2023-08-16  3:44                       ` bibo mao
2023-08-16  5:14                         ` Yan Zhao
2023-08-16  7:29                           ` bibo mao
2023-08-16  7:18                             ` Yan Zhao
2023-08-16  7:53                               ` bibo mao
2023-08-16 13:39                                 ` Sean Christopherson
2023-08-10 15:19   ` kernel test robot
2023-08-10  9:34 ` [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM David Hildenbrand
2023-08-10  9:50   ` Yan Zhao
2023-08-11 17:25     ` David Hildenbrand
2023-08-11 18:20       ` John Hubbard
2023-08-11 18:39         ` David Hildenbrand
2023-08-11 19:35           ` John Hubbard
2023-08-14  9:09             ` Yan Zhao
2023-08-15  2:34               ` John Hubbard
2023-08-16  7:43                 ` David Hildenbrand
2023-08-16  9:06                   ` Yan Zhao
2023-08-16  9:49                     ` David Hildenbrand
2023-08-16 18:00                       ` John Hubbard
2023-08-17  5:05                         ` Yan Zhao
2023-08-17  7:38                           ` David Hildenbrand
2023-08-18  0:13                             ` Yan Zhao
2023-08-18  2:29                               ` John Hubbard
2023-09-04  9:18                                 ` Yan Zhao
2023-08-15  2:36               ` Yuan Yao
2023-08-15  2:37                 ` Yan Zhao
2023-08-10 13:58 ` Chao Gao
2023-08-11  5:22   ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=107cdaaf-237f-16b9-ebe2-7eefd2b21f8f@loongson.cn \
    --to=maobibo@loongson.cn \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=david@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=pbonzini@redhat.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.