From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault
Date: Thu, 29 Mar 2012 19:40:51 +0800 [thread overview]
Message-ID: <4F744A43.4060600@linux.vnet.ibm.com> (raw)
In-Reply-To: <4F7436FB.9000004@redhat.com>
On 03/29/2012 06:18 PM, Avi Kivity wrote:
> On 03/29/2012 11:20 AM, Xiao Guangrong wrote:
>> * Idea
>> The present bit of page fault error code (EFEC.P) indicates whether the
>> page table is populated on all levels, if this bit is set, we can know
>> the page fault is caused by the page-protection bits (e.g. W/R bit) or
>> the reserved bits.
>>
>> In KVM, in most cases, all this kind of page fault (EFEC.P = 1) can be
>> simply fixed: the page fault caused by reserved bit
>> (EFFC.P = 1 && EFEC.RSV = 1) has already been filtered out in fast mmio
>> path. What we need do to fix the rest page fault (EFEC.P = 1 && RSV != 1)
>> is just increasing the corresponding access on the spte.
>>
>> This pachset introduces a fast path to fix this kind of page fault: it
>> is out of mmu-lock and need not walk host page table to get the mapping
>> from gfn to pfn.
>
> Wow!
>
> Looks like interesting times are back in mmu-land.
>
:)
> Comments below are before review of actual patches, so maybe they're
> already answered there, or maybe they're just nonsense.
>
Your comments are always appreciated!
>> * Implementation
>> We can freely walk the page between walk_shadow_page_lockless_begin and
>> walk_shadow_page_lockless_end, it can ensure all the shadow page is valid.
>>
>> In the most case, cmpxchg is fair enough to change the access bit of spte,
>> but the write-protect path on softmmu/nested mmu is a especial case: it is
>> a read-check-modify path: read spte, check W bit, then clear W bit.
>
> We also set gpte.D and gpte.A, no? How do you handle that?
>
We still need walk gust page table before fast page fault to check
whether the access is valid.
>> In order
>> to avoid marking spte writable after/during page write-protect, we do the
>> trick like below:
>>
>> fast page fault path:
>> lock RCU
>> set identification in the spte
>
> What if you can't (already taken)? Spin? Slow path?
In this patch, it allows to concurrently access on the same spte:
it freely set its identification on the spte, because i did not
want to introduce other atomic operations.
You remind me that there may be a risk: if many vcpu fault on the
same spte, it will retry the spte forever. Hmm, how about fix it
like this:
if ( spte.identification = 0) {
set spte.identification = vcpu.id
goto cmpxchg-path
}
if (spte.identification == vcpu.id)
goto cmpxchg-path
return to guest and retry the address again;
cmpxchg-path:
do checks and cmpxchg
It can ensure the spte can be updated.
>> The identification should be unique to avoid the below race:
>>
>> VCPU 0 VCPU 1 VCPU 2
>> lock RCU
>> spte + identification
>> check conditions
>> do write-protect, clear
>> identification
>> lock RCU
>> set identification
>> cmpxchg + w - identification
>> OOPS!!!
>
> Is it not sufficient to use just two bits?
>
> pf_lock - taken by page fault path
> wp_lock - taken by write protect path
>
> pf cmpxchg checks both bits.
>
If we just use two byte as identification, it has to use atomic
operations to maintain these bits? or i misunderstood?
>> - For ept:
>> $ x11perfcomp baseline-hard optimaze-hard
>> 1: baseline-hard
>> 2: optimaze-hard
>>
>> 1 2 Operation
>> -------- -------- ---------
>> 7060.0 7150.0 Composite 500x500 from pixmap to window
>>
>> - For shadow mmu:
>> $ x11perfcomp baseline-soft optimaze-soft
>> 1: baseline-soft
>> 2: optimaze-soft
>>
>> 1 2 Operation
>> -------- -------- ---------
>> 6980.0 7490.0 Composite 500x500 from pixmap to window
>>
>> ( It is interesting that after this patch, the performance of x11perf on
>> softmmu is better than it on hardmmu, i have tested it for many times,
>> it is really true. :) )
>
> It could be because you cannot use THP with dirty logging, so you pay
> the overhead of TDP.
>
Yes, i think so.
>> Any comments are welcome. :)
>>
>
> Very impressive. Now to review the patches (will take me some time).
>
Thank you, Avi!
next prev parent reply other threads:[~2012-03-29 11:41 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-29 9:20 [PATCH 00/13] KVM: MMU: fast page fault Xiao Guangrong
2012-03-29 9:20 ` [PATCH 01/13] KVM: MMU: properly assert spte on rmap_next path Xiao Guangrong
2012-03-29 9:21 ` [PATCH 02/13] KVM: MMU: abstract spte write-protect Xiao Guangrong
2012-03-29 11:11 ` Avi Kivity
2012-03-29 11:51 ` Xiao Guangrong
2012-03-29 9:22 ` [PATCH 03/13] KVM: MMU: split FNAME(prefetch_invalid_gpte) Xiao Guangrong
2012-03-29 13:00 ` Avi Kivity
2012-03-30 3:51 ` Xiao Guangrong
2012-03-29 9:22 ` [PATCH 04/13] KVM: MMU: introduce FNAME(get_sp_gpa) Xiao Guangrong
2012-03-29 13:07 ` Avi Kivity
2012-03-30 5:01 ` Xiao Guangrong
2012-04-01 12:42 ` Avi Kivity
2012-03-29 9:23 ` [PATCH 05/13] KVM: MMU: reset shadow_mmio_mask Xiao Guangrong
2012-03-29 13:10 ` Avi Kivity
2012-03-29 15:28 ` Avi Kivity
2012-03-29 16:24 ` Avi Kivity
2012-03-29 9:23 ` [PATCH 06/13] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
2012-03-29 9:24 ` [PATCH 07/13] KVM: MMU: store more bits in rmap Xiao Guangrong
2012-03-29 9:25 ` [PATCH 08/13] KVM: MMU: fask check whether page is writable Xiao Guangrong
2012-03-29 15:49 ` Avi Kivity
2012-03-30 5:10 ` Xiao Guangrong
2012-04-01 15:52 ` Avi Kivity
2012-04-05 17:54 ` Xiao Guangrong
2012-04-12 23:08 ` Marcelo Tosatti
2012-04-13 10:26 ` Xiao Guangrong
2012-03-29 9:25 ` [PATCH 09/13] KVM: MMU: get expected spte out of mmu-lock Xiao Guangrong
2012-04-01 15:53 ` Avi Kivity
2012-04-05 18:25 ` Xiao Guangrong
2012-04-09 12:28 ` Avi Kivity
2012-04-09 13:16 ` Takuya Yoshikawa
2012-04-09 13:21 ` Avi Kivity
2012-03-29 9:26 ` [PATCH 10/13] KVM: MMU: store vcpu id in spte to notify page write-protect path Xiao Guangrong
2012-03-29 9:27 ` [PATCH 11/13] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
2012-03-31 12:24 ` Xiao Guangrong
2012-04-01 16:23 ` Avi Kivity
2012-04-03 13:04 ` Avi Kivity
2012-04-05 19:39 ` Xiao Guangrong
2012-03-29 9:27 ` [PATCH 12/13] KVM: MMU: trace fast " Xiao Guangrong
2012-03-29 9:28 ` [PATCH 13/13] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
2012-03-29 10:18 ` [PATCH 00/13] KVM: MMU: fast page fault Avi Kivity
2012-03-29 11:40 ` Xiao Guangrong [this message]
2012-03-29 12:57 ` Avi Kivity
2012-03-30 9:18 ` Xiao Guangrong
2012-03-31 13:12 ` Xiao Guangrong
2012-04-01 12:58 ` Avi Kivity
2012-04-05 21:57 ` Xiao Guangrong
2012-04-06 5:24 ` Xiao Guangrong
2012-04-09 13:20 ` Avi Kivity
2012-04-09 13:59 ` Xiao Guangrong
2012-04-09 13:12 ` Avi Kivity
2012-04-09 13:55 ` Xiao Guangrong
2012-04-09 14:01 ` Xiao Guangrong
2012-04-09 14:25 ` Avi Kivity
2012-04-09 17:58 ` Marcelo Tosatti
2012-04-09 18:13 ` Xiao Guangrong
2012-04-09 19:31 ` Marcelo Tosatti
2012-04-09 18:26 ` Xiao Guangrong
2012-04-09 19:46 ` Marcelo Tosatti
2012-04-10 3:06 ` Xiao Guangrong
2012-04-10 10:04 ` Avi Kivity
2012-04-11 1:47 ` Marcelo Tosatti
2012-04-11 9:15 ` Avi Kivity
2012-04-10 10:39 ` Avi Kivity
2012-04-10 11:40 ` Takuya Yoshikawa
2012-04-10 11:58 ` Xiao Guangrong
2012-04-11 12:15 ` Takuya Yoshikawa
2012-04-11 12:38 ` Xiao Guangrong
2012-04-11 14:14 ` Takuya Yoshikawa
2012-04-11 14:21 ` Avi Kivity
2012-04-11 22:26 ` Takuya Yoshikawa
2012-04-13 14:25 ` Takuya Yoshikawa
2012-04-15 9:32 ` Avi Kivity
2012-04-16 15:49 ` Takuya Yoshikawa
2012-04-16 16:02 ` Avi Kivity
2012-04-17 6:26 ` Xiao Guangrong
2012-04-17 7:51 ` Avi Kivity
2012-04-17 12:37 ` Takuya Yoshikawa
2012-04-17 12:41 ` Avi Kivity
2012-04-17 14:54 ` Takuya Yoshikawa
2012-04-17 14:56 ` Avi Kivity
2012-04-18 13:42 ` Takuya Yoshikawa
2012-04-17 6:16 ` Xiao Guangrong
2012-04-10 10:10 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F744A43.4060600@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).