Linux-MIPS Archive on lore.kernel.org
From: Gary Fu <qfu@wavecomp.com>
To: Paul Burton <pburton@wavecomp.com>
Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
	Archer Yan <ayan@wavecomp.com>, James Hogan <jhogan@kernel.org>
Subject: RE: [PATCH] KVM: Fix an issue in non-preemptible kernel.
Date: Thu, 5 Sep 2019 13:54:19 +0000
Message-ID: <DM5PR22MB08601676A09479ECA9B43D6EC3BB0@DM5PR22MB0860.namprd22.prod.outlook.com>
In-Reply-To: <20190904135343.gbqfs4nlpnjvyfhc@pburton-laptop>

Hi Paul,

Thanks for your review.
Please see my comments below.

-----Original Message-----
From: Paul Burton <pburton@wavecomp.com> 
Sent: Wednesday, September 4, 2019 10:03 PM
To: Gary Fu <qfu@wavecomp.com>
Cc: linux-mips@vger.kernel.org; Paul Burton <pburton@wavecomp.com>; Archer Yan <ayan@wavecomp.com>; James Hogan <jhogan@kernel.org>
Subject: Re: [PATCH] KVM: Fix an issue in non-preemptible kernel.

Hi Gary,

On Mon, Sep 02, 2019 at 09:02:32AM +0000, Gary Fu wrote:
> Add a cond_resched() to give the scheduler a chance to run madvise 
> task to avoid endless loop here in non-preemptible kernel.

Thanks for the patch!

> Otherwise, the kvm_mmu_notifier would have no chance to be descreased

s/descreased/decreased/
(and in the comment too)

Thank you for your comments. I'll fix these typos.

> to 0 by madvise task -> syscall -> zap_page_range -> 
> mmu_notifier_invalidate_range_end -> 
> __mmu_notifier_invalidate_range_end -> invalidate_range_end -> 
> kvm_mmu_notifier_invalidate_range_end, as the madvise task would be 
> scheduled when running unmap_single_vma -> unmap_page_range -> 
> zap_p4d_range -> zap_pud_range -> zap_pmd_range -> cond_resched which 
> is called before mmu_notifier_invalidate_range_end in zap_page_range.

I'm not entirely sure I follow - could you clarify whether the task invoking the madvise syscall is related to the task using KVM?

Yes, the QEMU application invokes the madvise syscall with the behavior parameter MADV_DONTNEED.
When kvm_mips_map_page handles a GPA fault by creating a new GPA mapping, it retries until a page is available. In the low-memory case it waits for memory to be freed by the madvise syscall with MADV_DONTNEED (QEMU application -> madvise with MADV_DONTNEED -> syscall -> madvise_vma -> madvise_dontneed_free -> madvise_dontneed_single_vma -> zap_page_range). In zap_page_range, after unmap_single_vma has cleared the TLB entries for the given address range, __mmu_notifier_invalidate_range_end is called, which finally calls kvm_mmu_notifier_invalidate_range_end to decrease mmu_notifier_count to 0. The retry loop in kvm_mips_map_page checks mmu_notifier_count; once the value is 0, which indicates that a new page is available for mapping, it leaves the retry loop and sets up the PTE for the new GPA mapping.
During the TLB clearing mentioned above (in unmap_single_vma, within the madvise syscall), cond_resched() is called once per PMD to avoid occupying the CPU for a long time (for example when zapping a huge page range). When this happens on a non-preemptible kernel, the retry loop in kvm_mips_map_page runs endlessly: there is no opportunity to reschedule back to the madvise syscall so that it can run __mmu_notifier_invalidate_range_end and decrease mmu_notifier_count, so the value of mmu_notifier_count stays at 1.
Adding a scheduling point before every retry in kvm_mips_map_page gives the madvise syscall (invoked by QEMU) a chance to be scheduled back in, zap all the given pages, and clear mmu_notifier_count, which lets the kvm_mips_map_page task leave the loop.
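
To make the sequence above easier to follow, here is a toy user-space model of it. It is only a sketch under simplifying assumptions, not kernel code: one CPU, two tasks, and a task switch happens only where the running task yields. The function name simulate_retry_loop, the enum, and the value of pmds_left are all made up for illustration, and the model assumes the scheduler always has the other task ready to run whenever a yield point is reached.

```c
#include <stdbool.h>

/* The two tasks competing for the single (modelled) CPU. */
enum model_task { TASK_VCPU, TASK_MADVISE };

/*
 * Toy model of the retry loop, NOT kernel code.
 * Returns the number of retries the vCPU task needed before it saw
 * mmu_notifier_count == 0, or -1 if it was still spinning after max_steps.
 */
int simulate_retry_loop(bool resched_in_retry_loop, int max_steps)
{
    int mmu_notifier_count = 1;         /* invalidate_range_start already ran */
    int pmds_left = 3;                  /* hypothetical amount of zap work left */
    enum model_task current = TASK_VCPU; /* madvise task has just yielded */
    int retries = 0;

    for (int step = 0; step < max_steps; step++) {
        if (current == TASK_VCPU) {
            if (mmu_notifier_count == 0)
                return retries;         /* retry check passes, mapping proceeds */
            retries++;
            if (resched_in_retry_loop)
                current = TASK_MADVISE; /* models the added cond_resched() */
            /* otherwise: "goto retry" with no scheduling point */
        } else {
            if (pmds_left > 0) {
                pmds_left--;            /* zap one PMD ... */
            } else {
                mmu_notifier_count--;   /* models invalidate_range_end */
            }
            current = TASK_VCPU;        /* yield (or finish) after each step */
        }
    }
    return -1;                          /* vCPU task never left the loop */
}
```

In this model, simulate_retry_loop(false, 100) returns -1 because the madvise task never gets the CPU back, while simulate_retry_loop(true, 100) terminates once the remaining PMDs are zapped and the count drops to 0.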

> Signed-off-by: Gary Fu <qfu@wavecomp.com>
> ---
>  arch/mips/kvm/mmu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
> index 97e538a8c1be..e52e63d225f4 100644
> --- a/arch/mips/kvm/mmu.c
> +++ b/arch/mips/kvm/mmu.c
> @@ -746,6 +746,22 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
>  		 */
>  		spin_unlock(&kvm->mmu_lock);
>  		kvm_release_pfn_clean(pfn);
> +		/*
> +		 * Add a cond_resched() to give the scheduler a chance to run
> +		 * madvise task to avoid endless loop here in non-preemptible
> +		 * kernel.
> +		 * Otherwise, the kvm_mmu_notifier would have no chance to be
> +		 * descreased to 0 by madvise task -> syscall -> zap_page_range
> +		 * -> mmu_notifier_invalidate_range_end ->
> +		 * __mmu_notifier_invalidate_range_end -> invalidate_range_end
> +		 * -> kvm_mmu_notifier_invalidate_range_end, as the madvise task
> +		 * would be scheduled when running unmap_single_vma ->
> +		 * unmap_page_range -> zap_p4d_range -> zap_pud_range ->
> +		 * zap_pmd_range -> cond_resched which is called before
> +		 * mmu_notifier_invalidate_range_end in zap_page_range.
> +		 */
> +		if (need_resched())
> +			cond_resched();

Can we remove the need_resched() check here? cond_resched() already checks should_resched(0) which tests the same thread-info flag as need_resched(). So we should be fine to just call cond_resched() unconditionally.

Yes, you're right. Just calling cond_resched() is enough. I'll update the patch.
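
As a quick sanity check of that reasoning, the following user-space sketch models why the extra guard is redundant. It is a simplified stand-in, not the kernel implementation: the model_* names are hypothetical, the flag variable only imitates the thread-info flag, and the real should_resched() also takes the preempt count into account, which this model ignores.

```c
#include <stdbool.h>

static bool model_flag;        /* stand-in for the need-resched thread-info flag */
static int model_switches;     /* how many times the model "scheduler" ran */

static bool model_need_resched(void)
{
    return model_flag;
}

/* Simplified: tests the same flag as model_need_resched(). */
static bool model_should_resched(void)
{
    return model_flag;
}

/* Yields only if a reschedule is pending, like cond_resched() in spirit. */
static int model_cond_resched(void)
{
    if (!model_should_resched())
        return 0;
    model_switches++;
    model_flag = false;
    return 1;
}

/* Variant from the original patch: explicit guard around the call. */
int guarded_switches(bool flag)
{
    model_flag = flag;
    model_switches = 0;
    if (model_need_resched())
        model_cond_resched();
    return model_switches;
}

/* Variant suggested in review: unconditional call. */
int unguarded_switches(bool flag)
{
    model_flag = flag;
    model_switches = 0;
    model_cond_resched();
    return model_switches;
}
```

Both variants switch exactly once when the flag is set and not at all when it is clear, so under these assumptions the outer check adds nothing.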

Thanks,
    Paul

>  		goto retry;
>  	}
>  
> --
> 2.17.1
> 

Thread overview: 4+ messages
2019-09-02  9:02 Gary Fu
2019-09-04 14:02 ` Paul Burton
2019-09-05 13:54   ` Gary Fu [this message]
2019-09-09  2:49 Gary Fu
