All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH v2 21/22] KVM: MMU: mmio page fault support
Date: Thu, 23 Jun 2011 11:19:26 +0800	[thread overview]
Message-ID: <4E02B0BE.7070003@cn.fujitsu.com> (raw)
In-Reply-To: <20110622215940.GA30064@amt.cnet>

Marcelo,

Thanks for your review!

On 06/23/2011 05:59 AM, Marcelo Tosatti wrote:

>>  static int is_large_pte(u64 pte)
>> @@ -2123,6 +2158,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>  	u64 spte, entry = *sptep;
>>  	int ret = 0;
>>  
>> +	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
>> +		return 0;
>> +
> 
> 
> Should zap all mmio sptes when creating new memory slots (current qemu
> never exhibits that pattern, but its not an unsupported scenario).
> Easier to zap all mmu in that case.
> 

OK, will do it in the next version.

> For shadow you need to remove mmio sptes on invlpg and all mmio sptes
> under CR3 (mmu_sync_roots), other than clearing the gva/gpa variables.
> 

Oh, kvm_mmu_pte_write and invlpg do not zap mmio spte, i will
fix these, thanks for your point it out.

For mmu_sync_roots, we properly do it in FNAME(sync_page) in this patch:

static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
			   int *nr_present)
{
	if (unlikely(is_mmio_spte(*sptep))) {
		if (gfn != get_mmio_spte_gfn(*sptep)) {
			mmu_spte_clear_no_track(sptep);
			return true;
		}

		(*nr_present)++;
		mark_mmio_spte(sptep, gfn, access);
		return true;
	}

	return false;
}

> Otherwise you can move the mmio info from an mmio spte back to
> mmio_gva/mmio_gfn after a TLB flush, without rereading the guest
> pagetable.
> 

We do not read the guest page table when mmio page fault occurred,
we just do it as you say:
- Firstly, walk the shadow page table to get the mmio spte
- Then, cache the mmio spte info to mmio_gva/mmio_gfn

I missed your meaning?

>> +{
>> +	struct kvm_shadow_walk_iterator iterator;
>> +	u64 spte, ret = 0ull;
>> +
>> +	walk_shadow_page_lockless_begin(vcpu);
>> +	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
>> +		if (iterator.level == 1) {
>> +			ret = spte;
>> +			break;
>> +		}
> 
> Have you verified it is safe to walk sptes with parallel modifications
> to them? Other than shadow page deletion which is in theory covered by
> RCU. It would be good to have all cases written down.
> 

Um, i think it is safe, when spte is being write, we can get the old spte
or the new spte, both cases are ok for us: it is just like the page structure
cache on the real machine, the cpu can fetch the old spte even if the page
structure is changed by other cpus.

>> +
>> +		if (!is_shadow_present_pte(spte))
>> +			break;
>> +	}
>> +	walk_shadow_page_lockless_end(vcpu);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * If it is a real mmio page fault, return 1 and emulat the instruction
>> + * directly, return 0 if it needs page fault path to fix it, -1 is
>> + * returned if bug is detected.
>> + */
>> +int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>> +{
>> +	u64 spte;
>> +
>> +	if (quickly_check_mmio_pf(vcpu, addr, direct))
>> +		return 1;
>> +
>> +	spte = walk_shadow_page_get_mmio_spte(vcpu, addr);
>> +
>> +	if (is_mmio_spte(spte)) {
>> +		gfn_t gfn = get_mmio_spte_gfn(spte);
>> +		unsigned access = get_mmio_spte_access(spte);
>> +
>> +		if (direct)
>> +			addr = 0;
>> +		vcpu_cache_mmio_info(vcpu, addr, gfn, access);
>> +		return 1;
>> +	}
>> +
>> +	/*
>> +	 * It's ok if the gva is remapped by other cpus on shadow guest,
>> +	 * it's a BUG if the gfn is not a mmio page.
>> +	 */
> 
> If the gva is remapped there should be no mmio page fault in the first
> place.
> 

For example, as follow case:

VCPU 0                                     VCPU 1
gva is mapped to a mmio region
                                        access gva
remap gva to other page
                                       emulate mmio access by kvm
                                       (*the point we are discussing*)
flush all tlb

In theory, it can happen.

>> +	if (direct && is_shadow_present_pte(spte))
>> +		return -1;
> 
> An spte does not have to contain the present bit to generate a valid EPT
> misconfiguration (and an spte dump is still required in that case).
> Use !is_mmio_spte() instead.
> 

We can not use !is_mmio_spte() here, since the shadow page can be zapped anytime,
for example:

sp.spt[i] = mmio-spte

          VCPU 0                                    VCPU 1    
Access sp.spte[i], ept misconfig is occurred
                                                   delete sp
                                   (if the number of shadow page is out of the limit
                                    or page shrink is required, and other events...)

Walk shadow page out of the lock and get the
non-present spte
(*the point we are discussing*)

So, the bug we can detect is: it is the mmio access but the spte is point to the normal
page.

> 
>> +
>> +	/*
>> +	 * If the page table is zapped by other cpus, let the page
>> +	 * fault path to fix it.
>> +	 */
>> +	return 0;
>> +}
> 
> I don't understand when would this happen, can you please explain?
> 

The case is above :-)

  reply	other threads:[~2011-06-23  3:17 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-22 14:27 [PATCH v2 0/22] KVM: optimize for MMIO handled Xiao Guangrong
2011-06-22 14:28 ` [PATCH v2 01/22] KVM: MMU: fix walking shadow page table Xiao Guangrong
2011-06-22 17:13   ` Marcelo Tosatti
2011-06-23  2:05     ` Xiao Guangrong
2011-06-27  6:35     ` Xiao Guangrong
2011-06-22 14:28 ` [PATCH v2 02/22] KVM: MMU: do not update slot bitmap if spte is nonpresent Xiao Guangrong
2011-06-22 14:29 ` [PATCH v2 03/22] KVM: x86: fix broken read emulation spans a page boundary Xiao Guangrong
2011-06-29  8:21   ` Avi Kivity
2011-06-29 10:53     ` Xiao Guangrong
2011-06-29 11:19       ` Avi Kivity
2011-06-22 14:29 ` [PATCH v2 04/22] KVM: x86: introduce vcpu_gva_to_gpa to cleanup the code Xiao Guangrong
2011-06-29  8:24   ` Avi Kivity
2011-06-29 10:56     ` Xiao Guangrong
2011-06-29 11:09       ` Avi Kivity
2011-06-29 11:26         ` Xiao Guangrong
2011-06-29 11:26           ` Avi Kivity
2011-06-29 11:48             ` Gleb Natapov
2011-06-22 14:30 ` [PATCH v2 05/22] KVM: x86: abstract the operation for read/write emulation Xiao Guangrong
2011-06-29  8:37   ` Avi Kivity
2011-06-29 10:59     ` Xiao Guangrong
2011-06-22 14:30 ` [PATCH v2 06/22] KVM: x86: cleanup the code of " Xiao Guangrong
2011-06-22 14:31 ` [PATCH v2 07/22] KVM: MMU: cache mmio info on page fault path Xiao Guangrong
2011-06-29  8:48   ` Avi Kivity
2011-06-29 11:09     ` Xiao Guangrong
2011-06-29 11:10       ` Avi Kivity
2011-06-22 14:31 ` [PATCH v2 08/22] KVM: MMU: optimize to handle dirty bit Xiao Guangrong
2011-06-22 14:31 ` [PATCH v2 09/22] KVM: MMU: cleanup for FNAME(fetch) Xiao Guangrong
2011-06-22 14:32 ` [PATCH v2 10/22] KVM: MMU: rename 'pt_write' to 'emulate' Xiao Guangrong
2011-06-22 14:32 ` [PATCH v2 11/22] KVM: MMU: count used shadow pages on prepareing path Xiao Guangrong
2011-06-22 14:32 ` [PATCH v2 12/22] KVM: MMU: split kvm_mmu_free_page Xiao Guangrong
2011-06-22 14:33 ` [PATCH v2 13/22] KVM: MMU: remove bypass_guest_pf Xiao Guangrong
2011-06-22 14:33 ` [PATCH v2 14/22] KVM: MMU: filter out the mmio pfn from the fault pfn Xiao Guangrong
2011-06-22 14:34 ` [PATCH v2 15/22] KVM: MMU: abstract some functions to handle " Xiao Guangrong
2011-06-22 14:34 ` [PATCH v2 16/22] KVM: MMU: introduce the rules to modify shadow page table Xiao Guangrong
2011-06-22 14:34 ` [PATCH v2 17/22] KVM: MMU: clean up spte updating and clearing Xiao Guangrong
2011-06-22 14:35 ` [PATCH 18/22] KVM: MMU: do not need atomicly to set/clear spte Xiao Guangrong
2011-06-22 14:35 ` [PATCH v2 19/22] KVM: MMU: lockless walking shadow page table Xiao Guangrong
2011-06-29  9:16   ` Avi Kivity
2011-06-29 11:16     ` Xiao Guangrong
2011-06-29 11:18       ` Avi Kivity
2011-06-29 11:50         ` Xiao Guangrong
2011-06-29 12:18           ` Avi Kivity
2011-06-29 12:28             ` Xiao Guangrong
2011-06-29 12:27               ` Avi Kivity
2011-06-29 12:39                 ` Xiao Guangrong
2011-06-29 13:01                   ` Avi Kivity
2011-06-29 13:05                     ` Xiao Guangrong
2011-06-22 14:35 ` [PATCH v2 20/22] KVM: MMU: reorganize struct kvm_shadow_walk_iterator Xiao Guangrong
2011-06-22 14:36 ` [PATCH v2 21/22] KVM: MMU: mmio page fault support Xiao Guangrong
2011-06-22 21:59   ` Marcelo Tosatti
2011-06-23  3:19     ` Xiao Guangrong [this message]
2011-06-23  6:40       ` Xiao Guangrong
2011-06-23 14:21       ` Marcelo Tosatti
2011-06-23 17:55         ` Xiao Guangrong
2011-06-23 20:13           ` Marcelo Tosatti
2011-06-24  2:04             ` Xiao Guangrong
2011-06-26  8:42           ` Avi Kivity
2011-06-27 11:00   ` [PATCH v3 " Xiao Guangrong
2011-06-29  9:22   ` [PATCH v2 " Avi Kivity
2011-06-29 12:28     ` Xiao Guangrong
2011-06-22 14:36 ` [PATCH v2 22/22] KVM: MMU: trace mmio page fault Xiao Guangrong
2011-06-29  9:23 ` [PATCH v2 0/22] KVM: optimize for MMIO handled Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E02B0BE.7070003@cn.fujitsu.com \
    --to=xiaoguangrong@cn.fujitsu.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.