From: Gary Fu <qfu@wavecomp.com>
To: Paul Burton <pburton@wavecomp.com>
Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
Archer Yan <ayan@wavecomp.com>, James Hogan <jhogan@kernel.org>
Subject: RE: [PATCH] KVM: Fix an issue in non-preemptible kernel.
Date: Thu, 5 Sep 2019 13:54:19 +0000
Message-ID: <DM5PR22MB08601676A09479ECA9B43D6EC3BB0@DM5PR22MB0860.namprd22.prod.outlook.com>
In-Reply-To: <20190904135343.gbqfs4nlpnjvyfhc@pburton-laptop>
Hi Paul,
Thanks for your review.
Please see my comments below.
-----Original Message-----
From: Paul Burton <pburton@wavecomp.com>
Sent: Wednesday, September 4, 2019 10:03 PM
To: Gary Fu <qfu@wavecomp.com>
Cc: linux-mips@vger.kernel.org; Paul Burton <pburton@wavecomp.com>; Archer Yan <ayan@wavecomp.com>; James Hogan <jhogan@kernel.org>
Subject: Re: [PATCH] KVM: Fix an issue in non-preemptible kernel.
Hi Gary,
On Mon, Sep 02, 2019 at 09:02:32AM +0000, Gary Fu wrote:
> Add a cond_resched() to give the scheduler a chance to run madvise
> task to avoid endless loop here in non-preemptible kernel.
Thanks for the patch!
> Otherwise, the kvm_mmu_notifier would have no chance to be descreased
s/descreased/decreased/
(and in the comment too)
Thank you for your comments. I'll fix these typos.
> to 0 by madvise task -> syscall -> zap_page_range ->
> mmu_notifier_invalidate_range_end ->
> __mmu_notifier_invalidate_range_end -> invalidate_range_end ->
> kvm_mmu_notifier_invalidate_range_end, as the madvise task would be
> scheduled when running unmap_single_vma -> unmap_page_range ->
> zap_p4d_range -> zap_pud_range -> zap_pmd_range -> cond_resched which
> is called before mmu_notifier_invalidate_range_end in zap_page_range.
I'm not entirely sure I follow - could you clarify whether the task invoking the madvise syscall is related to the task using KVM?
Yes. The QEMU application invokes the madvise syscall with the behavior parameter MADV_DONTNEED.
When kvm_mips_map_page handles a GPA fault by creating a new GPA mapping, it keeps retrying until a page is available. In the low-memory case, it waits for memory to be freed by the madvise syscall with MADV_DONTNEED (QEMU application -> madvise with MADV_DONTNEED -> syscall -> madvise_vma -> madvise_dontneed_free -> madvise_dontneed_single_vma -> zap_page_range). In zap_page_range, after unmap_single_vma has cleared the TLB for the given address range, __mmu_notifier_invalidate_range_end is called, which finally calls kvm_mmu_notifier_invalidate_range_end to decrease mmu_notifier_count to 0. The retry loop in kvm_mips_map_page checks mmu_notifier_count; a value of 0 indicates that a new page is available for mapping, so the loop exits and sets up the PTE for the new GPA mapping.
During the TLB clearing mentioned above (in unmap_single_vma, within the madvise syscall), cond_resched() is called once per PMD to avoid occupying the CPU for a long time when zapping a huge page range. When the madvise task is scheduled out at that point in a non-preemptible kernel, the retry loop in kvm_mips_map_page runs endlessly: there is no chance to reschedule back to the madvise syscall to run __mmu_notifier_invalidate_range_end and decrease mmu_notifier_count, so its value stays at 1.
Adding a scheduling point before every retry in kvm_mips_map_page gives the madvise syscall (invoked by QEMU) a chance to be scheduled back in, zap all the given pages, and clear mmu_notifier_count, which lets the kvm_mips_map_page task exit the loop.
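To make the scenario easier to follow, here is a minimal userspace sketch of it. It is only a model under stated assumptions, not kernel code: struct madvise_task, madvise_run_slice() and vcpu_retry_loop() are hypothetical names, there is a single CPU, and yielding from the retry loop is assumed to always switch to the madvise task.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only; none of these names are kernel APIs.
 * The madvise task is parked inside its per-PMD cond_resched() with
 * the invalidation still open (notifier_count == 1).
 */
struct madvise_task {
	int pmds_left;       /* PMD ranges still to zap */
	int notifier_count;  /* stands in for mmu_notifier_count */
};

/*
 * Give the madvise task one time slice: zap one PMD and, once nothing
 * is left, end the invalidation by dropping the count.
 */
void madvise_run_slice(struct madvise_task *t)
{
	if (t->pmds_left > 0)
		t->pmds_left--;
	if (t->pmds_left == 0 && t->notifier_count > 0)
		t->notifier_count--;
	assert(t->notifier_count >= 0);
}

/*
 * Model of the retry loop on a single non-preemptible CPU.
 * Returns the number of retries needed, or -1 if the loop is still
 * spinning after max_retries (the endless-loop case).
 */
int vcpu_retry_loop(struct madvise_task *t, bool yield_before_retry,
		    int max_retries)
{
	int retries = 0;

	while (t->notifier_count != 0) {
		if (retries == max_retries)
			return -1;
		retries++;
		/* Assumption: yielding always switches to the madvise task. */
		if (yield_before_retry)
			madvise_run_slice(t);
	}
	return retries;
}
```

Starting from pmds_left = 3 and notifier_count = 1, vcpu_retry_loop(&t, true, 1000) returns 3, while the same call with yield_before_retry = false on a fresh copy returns -1, because nothing else ever gets to run and the count never drops.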
> Signed-off-by: Gary Fu <qfu@wavecomp.com>
> ---
> arch/mips/kvm/mmu.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
> index 97e538a8c1be..e52e63d225f4 100644
> --- a/arch/mips/kvm/mmu.c
> +++ b/arch/mips/kvm/mmu.c
> @@ -746,6 +746,22 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
> */
> spin_unlock(&kvm->mmu_lock);
> kvm_release_pfn_clean(pfn);
> + /*
> + * Add a cond_resched() to give the scheduler a chance to run
> + * madvise task to avoid endless loop here in non-preemptible
> + * kernel.
> + * Otherwise, the kvm_mmu_notifier would have no chance to be
> + * descreased to 0 by madvise task -> syscall -> zap_page_range
> + * -> mmu_notifier_invalidate_range_end ->
> + * __mmu_notifier_invalidate_range_end -> invalidate_range_end
> + * -> kvm_mmu_notifier_invalidate_range_end, as the madvise task
> + * would be scheduled when running unmap_single_vma ->
> + * unmap_page_range -> zap_p4d_range -> zap_pud_range ->
> + * zap_pmd_range -> cond_resched which is called before
> + * mmu_notifier_invalidate_range_end in zap_page_range.
> + */
> + if (need_resched())
> + cond_resched();
Can we remove the need_resched() check here? cond_resched() already checks should_resched(0) which tests the same thread-info flag as need_resched(). So we should be fine to just call cond_resched() unconditionally.
Yes, you're right. Just calling cond_resched() is enough. I'll update the patch.
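For reference, the redundancy can be sketched in a small userspace model. This is an illustration only: struct model_cpu and the model_* and *_resched functions are hypothetical stand-ins, and the model covers just the flag test described above, not everything the real cond_resched() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names are kernel APIs. */
struct model_cpu {
	bool need_resched;   /* stands in for the thread-info flag */
	int schedule_calls;  /* how many times the model scheduler ran */
};

bool model_need_resched(const struct model_cpu *cpu)
{
	return cpu->need_resched;
}

/* Reschedule only if the flag is set; returns 1 if it rescheduled. */
int model_cond_resched(struct model_cpu *cpu)
{
	assert(cpu);
	if (!cpu->need_resched)
		return 0;
	cpu->need_resched = false;
	cpu->schedule_calls++;
	return 1;
}

/* The form in the patch: test the flag, then call cond_resched(). */
int guarded_resched(struct model_cpu *cpu)
{
	if (model_need_resched(cpu))
		return model_cond_resched(cpu);
	return 0;
}

/* The suggested form: call cond_resched() unconditionally. */
int unguarded_resched(struct model_cpu *cpu)
{
	return model_cond_resched(cpu);
}
```

In this model both forms reschedule exactly once when the flag is set and do nothing when it is clear, so the outer check adds nothing.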
Thanks,
Paul
> goto retry;
> }
>
> --
> 2.17.1
>