From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH v3 5/9] KVM: MMU: introduce SPTE_WRITE_PROTECT bit
Date: Fri, 20 Apr 2012 21:55:55 -0300
Message-ID: <20120421005555.GA16526@amt.cnet>
In-Reply-To: <20120421004030.GA16191@amt.cnet>

On Fri, Apr 20, 2012 at 09:40:30PM -0300, Marcelo Tosatti wrote:
> On Fri, Apr 20, 2012 at 06:52:11PM -0300, Marcelo Tosatti wrote:
> > On Fri, Apr 20, 2012 at 04:19:17PM +0800, Xiao Guangrong wrote:
> > > If this bit is set, it means the W bit of the spte is cleared due
> > > to shadow page table protection
> > > 
> > > Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> > > ---
> > >  arch/x86/kvm/mmu.c |   56 ++++++++++++++++++++++++++++++++++-----------------
> > >  1 files changed, 37 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > index dd984b6..eb02fc4 100644
> > > --- a/arch/x86/kvm/mmu.c
> > > +++ b/arch/x86/kvm/mmu.c
> > > @@ -147,6 +147,7 @@ module_param(dbg, bool, 0644);
> > > 
> > >  #define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
> > >  #define SPTE_ALLOW_WRITE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
> > > +#define SPTE_WRITE_PROTECT	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 2))
> > > 
> > >  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
> > > 
> > > @@ -1042,36 +1043,51 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
> > >  		rmap_remove(kvm, sptep);
> > >  }
> > > 
> > > +static bool spte_wp_by_dirty_log(u64 spte)
> > > +{
> > > +	WARN_ON(is_writable_pte(spte));
> > > +
> > > +	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
> > > +}
> > 
> > Is the information accurate? Say:
> > 
> > - dirty log write protect, set SPTE_ALLOW_WRITE, clear WRITABLE.
> > - shadow gfn, rmap_write_protect finds page not WRITABLE.
> > - spte points to shadow gfn, but SPTE_WRITE_PROTECT is not set.
> > 
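
The three steps above can be replayed on a plain integer to see why the
helper's answer is in question. This is only a userspace sketch under stated
assumptions: the value of PT_FIRST_AVAIL_BITS_SHIFT, the present bit, and
the two *_model() helpers are hypothetical stand-ins, and the premise that
rmap_write_protect leaves already read-only sptes untouched is the
assumption behind the question, not a confirmed property of the patch. The
WARN_ON from the original helper is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout; the shift value is an assumption. */
#define PT_PRESENT_MASK			(1ULL << 0)
#define PT_WRITABLE_MASK		(1ULL << 1)
#define PT_FIRST_AVAIL_BITS_SHIFT	9
#define SPTE_ALLOW_WRITE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
#define SPTE_WRITE_PROTECT	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 2))

/* Same predicate as in the patch, minus the WARN_ON. */
static bool spte_wp_by_dirty_log(uint64_t spte)
{
	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
}

/* Hypothetical model of step 1: dirty log write protect. */
static uint64_t dirty_log_protect_model(uint64_t spte)
{
	return (spte | SPTE_ALLOW_WRITE) & ~PT_WRITABLE_MASK;
}

/*
 * Hypothetical model of step 2: write protection for a shadowed gfn
 * that skips sptes which are already read-only.
 */
static uint64_t rmap_write_protect_model(uint64_t spte)
{
	if (!(spte & PT_WRITABLE_MASK))
		return spte;	/* nothing to do, SPTE_WRITE_PROTECT not set */
	return (spte & ~PT_WRITABLE_MASK) | SPTE_WRITE_PROTECT;
}

/* Replays the scenario and reports what the predicate says at the end. */
static bool scenario_reports_dirty_log_only(void)
{
	uint64_t spte = PT_PRESENT_MASK | PT_WRITABLE_MASK | SPTE_ALLOW_WRITE;

	spte = dirty_log_protect_model(spte);	/* step 1 */
	spte = rmap_write_protect_model(spte);	/* step 2: gfn is now shadowed */

	/* step 3: the gfn is shadowed, yet this still says "dirty log only". */
	return spte_wp_by_dirty_log(spte);
}
```

Under these assumptions scenario_reports_dirty_log_only() returns true even
though the gfn has been shadowed in the meantime.
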
> > BTW,
> > 
> > "introduce SPTE_ALLOW_WRITE bit
> > 
> > This bit indicates whether the spte is allow to be writable that
> > means the gpte of this spte is writable and the pfn pointed by
> > this spte is writable on host"
> > 
> > Other than the fact that each bit should have one meaning, how
> > can this bit be accurate without write protection of the gpte?
> > 
> > As soon as guest writes to gpte, information in bit is outdated.
> 
> Ok, i found one example where mmu_lock was expecting sptes not 
> to change:
> 
> 
> VCPU0				VCPU1
> 
> - read-only gpte
> - read-only spte
> - write fault
> - spte = *sptep
> 				guest write to gpte, set writable bit
> 				spte writable
> 				parent page unsync
> 				guest write to gpte writable bit clear
> 				guest invlpg updates spte to RO
> 				sync_page
> 				enter set_spte from sync_page
> - cmpxchg(spte) is now writable
> [window where another vcpu can
> cache spte with writable bit
> set]
> 
> 				if (is_writable_pte(entry) && !is_writable_pte(*sptep))
> 					kvm_flush_remote_tlbs(vcpu->kvm);
> 
> The flush is not executed because spte was read-only (which is 
> a correct assumption as long as sptes updates are protected
> by mmu_lock).
> 
> So this is an example of implicit assumptions which break if you update
> spte without mmu_lock. Certainly there are more cases. :(

OK, I now see you mentioned a similar case in the document, for
rmap_write_protect.

More important than the particular TLB flush case, the point is that
every piece of code that reads and writes sptes must now be aware that
mmu_lock alone does not guarantee stability. Everything must be audited.
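
The quoted VCPU0/VCPU1 timeline can be replayed step by step in userspace
to make the implicit assumption visible. This is a sketch, not kernel code:
it runs the interleaving sequentially on one C11 atomic, the spte value is
made up, and flush_skipped_after_race() is a hypothetical name. Only the
final check mirrors the condition quoted from set_spte.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK	(1ULL << 1)

static bool is_writable_pte(uint64_t pte)
{
	return pte & PT_WRITABLE_MASK;
}

/*
 * Returns true when the spte was made writable by the lockless path
 * but the later flush check did not fire.
 */
static bool flush_skipped_after_race(void)
{
	_Atomic uint64_t spte = 0x1001;		/* present, read-only */
	uint64_t snap, entry;
	bool ok, was_writable, flushed;

	/* VCPU0: write fault, spte = *sptep */
	snap = atomic_load(&spte);

	/* VCPU1: gpte made writable, then read-only again (invlpg) */
	atomic_store(&spte, snap | PT_WRITABLE_MASK);
	atomic_store(&spte, snap);

	/* VCPU1: set_spte from sync_page samples the current value */
	entry = atomic_load(&spte);

	/* VCPU0: cmpxchg succeeds because the value matches the stale snapshot */
	ok = atomic_compare_exchange_strong(&spte, &snap,
					    snap | PT_WRITABLE_MASK);
	assert(ok);
	was_writable = is_writable_pte(atomic_load(&spte));

	/* VCPU1: installs the read-only spte, then applies the quoted check */
	atomic_store(&spte, entry & ~PT_WRITABLE_MASK);
	flushed = is_writable_pte(entry) &&
		  !is_writable_pte(atomic_load(&spte));

	return was_writable && !flushed;
}
```

In this replay flush_skipped_after_race() returns true: the spte was
writable during the window, but the check only looks at the value sampled
before the cmpxchg, so no flush is requested.
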

Where does the bulk of the improvement come from, again? If there is
little or no mmu_lock contention (and, to be honest, we have no consistent
data on that for your testcase), is it the bouncing of mmu_lock's
cacheline that hurts?

