All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
Date: Thu, 9 Aug 2012 22:33:00 -0300	[thread overview]
Message-ID: <20120810013300.GA17204@amt.cnet> (raw)
In-Reply-To: <20120810012532.GA15142@amt.cnet>

On Thu, Aug 09, 2012 at 10:25:32PM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 10, 2012 at 10:34:39AM +1000, Paul Mackerras wrote:
> > On Thu, Aug 09, 2012 at 03:16:12PM -0300, Marcelo Tosatti wrote:
> > 
> > > The !memslot->npages case is handled in __kvm_set_memory_region
> > > (please read that part, before kvm_arch_prepare_memory_region() call).
> > > 
> > > kvm_arch_flush_shadow should be implemented.
> > 
> > Book3S HV doesn't have shadow page tables per se, rather the hardware
> > page table is under the control of the hypervisor (i.e. KVM), and
> > entries are added and removed by the guest using hypercalls.  On
> > recent machines (POWER7) the hypervisor can choose whether or not to
> > have the hardware PTE point to a real page of memory; if it doesn't,
> > access by the guest will trap to the hypervisor.  On older machines
> > (PPC970) we don't have that flexibility, and we have to provide a real
> > page of memory (i.e. RAM or I/O) behind every hardware PTE.  (This is
> > because PPC970 provides no way for page faults in the guest to go to
> > the hypervisor.)
> > 
> > I could implement kvm_arch_flush_shadow to remove the backing pages
> > behind every hardware PTE, but that would be very slow and inefficient
> > on POWER7, and would break the guest on PPC970, particularly in the
> > case where userspace is removing a small memory slot containing some
> > I/O device and leaving the memory slot for system RAM untouched.
> > 
> > So the reason for unmapping the hardware PTEs in
> > kvm_arch_prepare_memory_region rather than kvm_arch_flush_shadow is
> > that that way we know which memslot is going away.
> > 
> > What exactly are the semantics of kvm_arch_flush_shadow?  
> 
> It removes all translations mapped via memslots. Its used in cases where
> the translations become stale, or during shutdown.
> 
> > I presume that on x86 with NPT/EPT it basically does nothing - is that right?
> 
> It does, it removes all NPT/EPT ptes (named "sptes" in arch/x86/kvm/).
> The translations are rebuilt on demand (when accesses by the guest fault
> into the HV).
> 
> > > > +	if (old->npages) {
> > > > +		/* modifying guest_phys or flags */
> > > > +		if (old->base_gfn != memslot->base_gfn)
> > > > +			kvmppc_unmap_memslot(kvm, old);
> > > 
> > > This case is also handled generically by the last kvm_arch_flush_shadow
> > > call in __kvm_set_memory_region.
> > 
> > Again, to use this we would need to know which memslot we're
> > flushing.  If we could change __kvm_set_memory_region to pass the
> > memslot for these kvm_arch_flush_shadow calls, then I could do as you
> > suggest.  (Though I would need to think carefully about what would
> > happen with guest invalidations of hardware PTEs in the interval
> > between the rcu_assign_pointer(kvm->memslots, slots) and the
> > kvm_arch_flush_shadow, and whether the invalidation would find the
> > correct location in the rmap array, given that we have updated the
> > base_gfn in the memslot without first getting rid of any references to
> > those pages in the hardware page table.)
> 
> That can be done.
> 
> I'll send a patch to flush per memslot in the next days, you can work
> out the PPC details in the meantime.
> 
> To be clear: this is necessary to have consistent behaviour across
> arches in the kvm_set_memory codepath which is tricky (not nitpicking).
> 
> Alternatively, kvm_arch_flush_shadow can be split into two methods (but
> thats not necessary if memslot information is sufficient for PPC).

What kvm_set_memory requires is that there are no stale translations. On
x86, it is cheaper to nuke all translation entries and let them be rebuilt on
pagefaults.

But, if necessary we can introduce a new callback
"kvm_arch_sync_shadow", which can be used by PPC to verify translations 
are valid, if it makes sense.

WARNING: multiple messages have this Message-ID (diff)
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
Date: Fri, 10 Aug 2012 01:33:00 +0000	[thread overview]
Message-ID: <20120810013300.GA17204@amt.cnet> (raw)
In-Reply-To: <20120810012532.GA15142@amt.cnet>

On Thu, Aug 09, 2012 at 10:25:32PM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 10, 2012 at 10:34:39AM +1000, Paul Mackerras wrote:
> > On Thu, Aug 09, 2012 at 03:16:12PM -0300, Marcelo Tosatti wrote:
> > 
> > > The !memslot->npages case is handled in __kvm_set_memory_region
> > > (please read that part, before kvm_arch_prepare_memory_region() call).
> > > 
> > > kvm_arch_flush_shadow should be implemented.
> > 
> > Book3S HV doesn't have shadow page tables per se, rather the hardware
> > page table is under the control of the hypervisor (i.e. KVM), and
> > entries are added and removed by the guest using hypercalls.  On
> > recent machines (POWER7) the hypervisor can choose whether or not to
> > have the hardware PTE point to a real page of memory; if it doesn't,
> > access by the guest will trap to the hypervisor.  On older machines
> > (PPC970) we don't have that flexibility, and we have to provide a real
> > page of memory (i.e. RAM or I/O) behind every hardware PTE.  (This is
> > because PPC970 provides no way for page faults in the guest to go to
> > the hypervisor.)
> > 
> > I could implement kvm_arch_flush_shadow to remove the backing pages
> > behind every hardware PTE, but that would be very slow and inefficient
> > on POWER7, and would break the guest on PPC970, particularly in the
> > case where userspace is removing a small memory slot containing some
> > I/O device and leaving the memory slot for system RAM untouched.
> > 
> > So the reason for unmapping the hardware PTEs in
> > kvm_arch_prepare_memory_region rather than kvm_arch_flush_shadow is
> > that that way we know which memslot is going away.
> > 
> > What exactly are the semantics of kvm_arch_flush_shadow?  
> 
> It removes all translations mapped via memslots. Its used in cases where
> the translations become stale, or during shutdown.
> 
> > I presume that on x86 with NPT/EPT it basically does nothing - is that right?
> 
> It does, it removes all NPT/EPT ptes (named "sptes" in arch/x86/kvm/).
> The translations are rebuilt on demand (when accesses by the guest fault
> into the HV).
> 
> > > > +	if (old->npages) {
> > > > +		/* modifying guest_phys or flags */
> > > > +		if (old->base_gfn != memslot->base_gfn)
> > > > +			kvmppc_unmap_memslot(kvm, old);
> > > 
> > > This case is also handled generically by the last kvm_arch_flush_shadow
> > > call in __kvm_set_memory_region.
> > 
> > Again, to use this we would need to know which memslot we're
> > flushing.  If we could change __kvm_set_memory_region to pass the
> > memslot for these kvm_arch_flush_shadow calls, then I could do as you
> > suggest.  (Though I would need to think carefully about what would
> > happen with guest invalidations of hardware PTEs in the interval
> > between the rcu_assign_pointer(kvm->memslots, slots) and the
> > kvm_arch_flush_shadow, and whether the invalidation would find the
> > correct location in the rmap array, given that we have updated the
> > base_gfn in the memslot without first getting rid of any references to
> > those pages in the hardware page table.)
> 
> That can be done.
> 
> I'll send a patch to flush per memslot in the next days, you can work
> out the PPC details in the meantime.
> 
> To be clear: this is necessary to have consistent behaviour across
> arches in the kvm_set_memory codepath which is tricky (not nitpicking).
> 
> Alternatively, kvm_arch_flush_shadow can be split into two methods (but
> thats not necessary if memslot information is sufficient for PPC).

What kvm_set_memory requires is that there are no stale translations. On
x86, it is cheaper to nuke all translation entries and let them be rebuilt on
pagefaults.

But, if necessary we can introduce a new callback
"kvm_arch_sync_shadow", which can be used by PPC to verify translations 
are valid, if it makes sense.


  reply	other threads:[~2012-08-10  1:33 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-06 10:02 [PATCH 0/5] Improve memory slot handling and other fixes Paul Mackerras
2012-08-06 10:02 ` Paul Mackerras
2012-08-06 10:03 ` [PATCH 1/5] KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code Paul Mackerras
2012-08-06 10:03   ` Paul Mackerras
2012-08-06 10:04 ` [PATCH 2/5] KVM: PPC: Quieten message about allocating linear regions Paul Mackerras
2012-08-06 10:04   ` Paul Mackerras
2012-08-06 10:06 ` [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly Paul Mackerras
2012-08-06 10:06   ` Paul Mackerras
2012-08-09 18:16   ` Marcelo Tosatti
2012-08-09 18:16     ` Marcelo Tosatti
2012-08-10  0:34     ` Paul Mackerras
2012-08-10  0:34       ` Paul Mackerras
2012-08-10  1:25       ` Marcelo Tosatti
2012-08-10  1:25         ` Marcelo Tosatti
2012-08-10  1:33         ` Marcelo Tosatti [this message]
2012-08-10  1:33           ` Marcelo Tosatti
2012-08-10  2:09         ` Takuya Yoshikawa
2012-08-10  2:09           ` Takuya Yoshikawa
2012-08-10 18:35           ` Marcelo Tosatti
2012-08-10 18:35             ` Marcelo Tosatti
2012-08-11  0:37             ` Paul Mackerras
2012-08-11  0:37               ` Paul Mackerras
2012-08-13 16:34               ` Marcelo Tosatti
2012-08-13 16:34                 ` Marcelo Tosatti
2012-08-13 22:04                 ` Marcelo Tosatti
2012-08-13 22:04                   ` Marcelo Tosatti
2012-08-15  9:26                   ` Avi Kivity
2012-08-15  9:26                     ` Avi Kivity
2012-08-15 17:59                     ` Marcelo Tosatti
2012-08-15 17:59                       ` Marcelo Tosatti
2012-08-17  7:06                       ` Benjamin Herrenschmidt
2012-08-17  7:06                         ` Benjamin Herrenschmidt
2012-08-17 18:39                         ` Marcelo Tosatti
2012-08-17 18:39                           ` Marcelo Tosatti
2012-08-17 20:32                           ` Benjamin Herrenschmidt
2012-08-17 20:32                             ` Benjamin Herrenschmidt
2012-08-23 13:55                             ` Marcelo Tosatti
2012-08-23 13:55                               ` Marcelo Tosatti
2012-08-24  9:29                               ` Paul Mackerras
2012-08-24  9:29                                 ` Paul Mackerras
2012-08-24 18:58                                 ` Marcelo Tosatti
2012-08-24 18:58                                   ` Marcelo Tosatti
2012-08-19  9:39                           ` Avi Kivity
2012-08-19  9:39                             ` Avi Kivity
2012-08-15  6:06                 ` Paul Mackerras
2012-08-15  6:06                   ` Paul Mackerras
2012-08-15  9:23                 ` Avi Kivity
2012-08-15  9:23                   ` Avi Kivity
2012-08-06 10:06 ` [PATCH 4/5] KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots Paul Mackerras
2012-08-06 10:06   ` Paul Mackerras
2012-08-09 18:22   ` Marcelo Tosatti
2012-08-09 18:22     ` Marcelo Tosatti
2012-08-10  0:45     ` Paul Mackerras
2012-08-10  0:45       ` Paul Mackerras
2012-08-06 10:08 ` [RFC PATCH 5/5] KVM: PPC: Take the SRCU lock around memslot use Paul Mackerras
2012-08-06 10:08   ` Paul Mackerras
2012-08-09 18:27   ` Marcelo Tosatti
2012-08-09 18:27     ` Marcelo Tosatti
2012-08-10  0:37     ` Paul Mackerras
2012-08-10  0:37       ` Paul Mackerras
2012-08-10  9:27       ` Alexander Graf
2012-08-10  9:27         ` Alexander Graf
2012-08-15  8:16         ` Benjamin Herrenschmidt
2012-08-15  8:16           ` Benjamin Herrenschmidt
2012-08-10  9:23 ` [PATCH 0/5] Improve memory slot handling and other fixes Alexander Graf
2012-08-10  9:23   ` Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120810013300.GA17204@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=agraf@suse.de \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.