* [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
@ 2014-08-29 10:31 Paolo Bonzini
  2014-08-29 10:31 ` [PATCH 1/3] KVM: do not bias the generation number in kvm_current_mmio_generation Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Paolo Bonzini @ 2014-08-29 10:31 UTC (permalink / raw)
  To: linux-kernel, kvm

David and Xiao, here's my take on the MMIO generation patches.  Now
with documentation, too. :)  Please review!

David Matlack (2):
  kvm: fix potentially corrupt mmio cache
  kvm: x86: fix stale mmio cache bug

Paolo Bonzini (1):
  KVM: do not bias the generation number in kvm_current_mmio_generation

 Documentation/virtual/kvm/mmu.txt | 14 ++++++++++++++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/mmu.c                | 29 ++++++++++++++---------------
 arch/x86/kvm/x86.h                | 20 +++++++++++++++-----
 virt/kvm/kvm_main.c               | 30 +++++++++++++++++++++++-------
 5 files changed, 67 insertions(+), 27 deletions(-)

-- 
1.8.3.1



* [PATCH 1/3] KVM: do not bias the generation number in kvm_current_mmio_generation
  2014-08-29 10:31 [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
@ 2014-08-29 10:31 ` Paolo Bonzini
  2014-08-29 10:31 ` [PATCH 2/3] kvm: fix potentially corrupt mmio cache Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2014-08-29 10:31 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: stable

The next patch will give a meaning (a la seqcount) to the low bit of the
generation number.  Ensure that it matches between kvm->memslots->generation
and kvm_current_mmio_generation().
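
As a rough illustration (mine, not part of the patch) of why the bias has to
move for the low bits to agree, here is a small compilable sketch; the
constants are simplified from the current 19-bit values in mmu.c, and
old_slots_gen/new_slots_gen stand for kvm->memslots->generation:

#include <stdio.h>

/* Sketch only, not kernel code. */
#define MMIO_GEN_MASK	((1u << 19) - 1)
#define MMIO_MAX_GEN	((1u << 19) - 1)

int main(void)
{
        /* old scheme: slots->generation starts at 0 and the bias is added
         * when the mmio generation is derived in kvm_current_mmio_generation() */
        unsigned long long old_slots_gen = 0;
        unsigned int old_gen = (old_slots_gen + MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;

        /* new scheme: the -150 moves into slots->generation itself (see the
         * kvm_main.c hunk below) and the mmio generation is a plain masked copy */
        unsigned long long new_slots_gen = -150ULL;
        unsigned int new_gen = new_slots_gen & MMIO_GEN_MASK;

        printf("old: slots gen %#llx -> mmio gen %#x (low bits %llu vs %u)\n",
               old_slots_gen, old_gen, old_slots_gen & 1, old_gen & 1);
        printf("new: slots gen %#llx -> mmio gen %#x (low bits %llu vs %u)\n",
               new_slots_gen, new_gen, new_slots_gen & 1, new_gen & 1);
        return 0;
}

With the old scheme the low bit of the derived mmio generation does not track
the low bit of kvm->memslots->generation (the bias is odd), which is exactly
what the next patch cannot tolerate; after this patch the two always agree,
and initializing the generation to -150 keeps the old "exercise wraparound
early" behaviour.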

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c  | 7 +------
 virt/kvm/kvm_main.c | 7 +++++++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 931467881da7..323c3f5f5c84 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -236,12 +236,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 
 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
-	/*
-	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
-	 * code of handling generation number wrap-around.
-	 */
-	return (kvm_memslots(kvm)->generation +
-		      MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
+	return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
 }
 
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7176929a4cda..0bfdb673db26 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -477,6 +477,13 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!kvm->memslots)
 		goto out_err_no_srcu;
+
+	/*
+	 * Init kvm generation close to the maximum to easily test the
+	 * code of handling generation number wrap-around.
+	 */
+	kvm->memslots->generation = -150;
+
 	kvm_init_memslots_id(kvm);
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err_no_srcu;
-- 
1.8.3.1




* [PATCH 2/3] kvm: fix potentially corrupt mmio cache
  2014-08-29 10:31 [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
  2014-08-29 10:31 ` [PATCH 1/3] KVM: do not bias the generation number in kvm_current_mmio_generation Paolo Bonzini
@ 2014-08-29 10:31 ` Paolo Bonzini
  2014-09-02 16:44   ` David Matlack
  2014-08-29 10:31 ` [PATCH 3/3] kvm: x86: fix stale mmio cache bug Paolo Bonzini
  2014-09-02 15:42 ` [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
  3 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-08-29 10:31 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: David Matlack, stable, Xiao Guangrong

From: David Matlack <dmatlack@google.com>

vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not acquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
This matters because much of the x86/kvm code does not maintain a single
rcu_dereference of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_lock(&kvm->srcu)               |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).
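
Concretely (illustrative numbers, not from the patch), the generation now
moves through an update like this:

    generation 6    stable (even)
    generation 7    update in progress: set before rcu_assign_pointer and
                    synchronize_srcu_expedited, so a vcpu racing as in step
                    (6) above tags its cache (or an MMIO spte) with 7
    generation 8    second increment after the synchronize returns; anything
                    still tagged 7 can never match again and is simply
                    treated as a cache miss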

The existing increment is not strictly necessary, but we keep it and
simply move it from update_memslots to install_new_memslots, for
consistency.  It invalidates old cached MMIOs immediately rather than
waiting for the end of synchronize_srcu_expedited, which makes the code
more clearly correct in case CPU 1 is preempted right after
synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.
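
To make the low-bit trick concrete, here is a stripped-down sketch (my own
illustration, not the patch; the helper names are made up and the constant
values mirror the mmu.c hunk below) of how a generation is folded into and
out of an MMIO spte:

/* Illustration only, simplified from the post-patch scheme in mmu.c. */
typedef unsigned long long u64;	/* stand-in for the kernel type */

#define MMIO_SPTE_GEN_LOW_SHIFT		2
#define MMIO_SPTE_GEN_HIGH_SHIFT	52
#define MMIO_GEN_LOW_SHIFT		10
/* bit 0 is deliberately left out of the low mask */
#define MMIO_GEN_LOW_MASK		((1 << MMIO_GEN_LOW_SHIFT) - 2)

static u64 gen_to_spte_bits(unsigned int gen)
{
        u64 mask;

        /* gen bits 1-9 -> spte bits 3-11, gen bits 10-19 -> spte bits 52-61 */
        mask  = (gen & MMIO_GEN_LOW_MASK) << MMIO_SPTE_GEN_LOW_SHIFT;
        mask |= ((u64)gen >> MMIO_GEN_LOW_SHIFT) << MMIO_SPTE_GEN_HIGH_SHIFT;
        return mask;
}

static unsigned int spte_bits_to_gen(u64 spte)
{
        unsigned int gen;

        /* bit 0 of the generation is never stored, so it reads back as zero */
        gen  = (spte >> MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_GEN_LOW_MASK;
        gen |= (spte >> MMIO_SPTE_GEN_HIGH_SHIFT) << MMIO_GEN_LOW_SHIFT;
        return gen;
}

spte_bits_to_gen(gen_to_spte_bits(g)) is g & ~1, so an spte created while the
generation is odd reads back as g - 1 and can never equal the (even)
generation in force once the update has finished: the access simply misses
the cache, as described above.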

Cc: stable@vger.kernel.org
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virtual/kvm/mmu.txt | 14 ++++++++++++++
 arch/x86/kvm/mmu.c                | 20 ++++++++++++--------
 virt/kvm/kvm_main.c               | 23 ++++++++++++++++-------
 3 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 290894176142..53838d9c6295 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -425,6 +425,20 @@ fault through the slow path.
 Since only 19 bits are used to store generation-number on mmio spte, all
 pages are zapped when there is an overflow.
 
+Unfortunately, a single memory access might access kvm_memslots(kvm) multiple
+times, the last one happening when the generation number is retrieved and
+stored into the MMIO spte.  Thus, the MMIO spte might be created based on
+out-of-date information, but with an up-to-date generation number.
+
+To avoid this, the generation number is incremented again after synchronize_srcu
+returns; thus, the low bit of kvm_memslots(kvm)->generation is only 1 during a
+memslot update, while some SRCU readers might be using the old copy.  We do not
+want to use MMIO sptes created with an odd generation number, and we can do
+this without losing a bit in the MMIO spte.  The low bit of the generation
+is not stored in MMIO spte, and presumed zero when it is extracted out of the
+spte.  If KVM is unlucky and creates an MMIO spte while the low bit is 1,
+the next access to the spte will always be a cache miss.
+
 
 Further reading
 ===============
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 323c3f5f5c84..96515957ba82 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -199,16 +199,20 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
 /*
- * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number,
- * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation
- * number.
+ * the low bit of the generation number is always presumed to be zero.
+ * This disables mmio caching during memslot updates.  The concept is
+ * similar to a seqcount but instead of retrying the access we just punt
+ * and ignore the cache.
+ *
+ * spte bits 3-11 are used as bits 1-9 of the generation number,
+ * the bits 52-61 are used as bits 10-19 of the generation number.
  */
-#define MMIO_SPTE_GEN_LOW_SHIFT		3
+#define MMIO_SPTE_GEN_LOW_SHIFT		2
 #define MMIO_SPTE_GEN_HIGH_SHIFT	52
 
-#define MMIO_GEN_SHIFT			19
-#define MMIO_GEN_LOW_SHIFT		9
-#define MMIO_GEN_LOW_MASK		((1 << MMIO_GEN_LOW_SHIFT) - 1)
+#define MMIO_GEN_SHIFT			20
+#define MMIO_GEN_LOW_SHIFT		10
+#define MMIO_GEN_LOW_MASK		((1 << MMIO_GEN_LOW_SHIFT) - 2)
 #define MMIO_GEN_MASK			((1 << MMIO_GEN_SHIFT) - 1)
 #define MMIO_MAX_GEN			((1 << MMIO_GEN_SHIFT) - 1)
 
@@ -4428,7 +4432,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
 	 * The very rare case: if the generation-number is round,
 	 * zap all shadow pages.
 	 */
-	if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) {
+	if (unlikely(kvm_current_mmio_generation(kvm) == 0)) {
 		printk_ratelimited(KERN_INFO "kvm: zapping shadow pages for mmio generation wraparound\n");
 		kvm_mmu_invalidate_zap_all_pages(kvm);
 	}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0bfdb673db26..bb8641b5d83b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -95,8 +95,6 @@ static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
-static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new, u64 last_generation);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -695,8 +693,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new,
-			    u64 last_generation)
+			    struct kvm_memory_slot *new)
 {
 	if (new) {
 		int id = new->id;
@@ -707,8 +704,6 @@ static void update_memslots(struct kvm_memslots *slots,
 		if (new->npages != npages)
 			sort_memslots(slots);
 	}
-
-	slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -730,10 +725,24 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;
 
-	update_memslots(slots, new, kvm->memslots->generation);
+	/*
+	 * Set the low bit in the generation, which disables SPTE caching
+	 * until the end of synchronize_srcu_expedited.
+	 */
+	WARN_ON(old_memslots->generation & 1);
+	slots->generation = old_memslots->generation + 1;
+
+	update_memslots(slots, new);
 	rcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
 
+	/*
+	 * Increment the new memslot generation a second time. This prevents
+	 * vm exits that race with memslot updates from caching a memslot
+	 * generation that will (potentially) be valid forever.
+	 */
+	slots->generation++;
+
 	kvm_arch_memslots_updated(kvm);
 
 	return old_memslots;
-- 
1.8.3.1




* [PATCH 3/3] kvm: x86: fix stale mmio cache bug
  2014-08-29 10:31 [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
  2014-08-29 10:31 ` [PATCH 1/3] KVM: do not bias the generation number in kvm_current_mmio_generation Paolo Bonzini
  2014-08-29 10:31 ` [PATCH 2/3] kvm: fix potentially corrupt mmio cache Paolo Bonzini
@ 2014-08-29 10:31 ` Paolo Bonzini
  2014-09-02 15:42 ` [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
  3 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2014-08-29 10:31 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: David Matlack, stable, Xiao Guangrong

From: David Matlack <dmatlack@google.com>

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this, but we fast-path
mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.
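
Roughly, the idea (a simplified sketch; mmio_cache_hit is a made-up wrapper,
the real checks are the vcpu_match_mmio_gva/gpa changes below) is that a
cached mmio gva/gfn is only trusted for the generation it was filled under:

/* Sketch only: mirrors the vcpu_match_mmio_* change in this patch. */
static bool mmio_cache_hit(struct kvm_vcpu *vcpu, gpa_t gpa)
{
        /* Step (2) bumps kvm_memslots(vcpu->kvm)->generation, so the entry
         * cached in step (1) stops matching here; the fault in step (3)
         * then takes the slow path, sees the new memslot, and no spurious
         * KVM_EXIT_MMIO reaches userspace. */
        return vcpu->arch.mmio_gen == kvm_memslots(vcpu->kvm)->generation &&
               vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT;
}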

[ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]

Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c              |  2 +-
 arch/x86/kvm/x86.h              | 20 +++++++++++++++-----
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 031e1dcea5a6..a9dc893dcf1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,7 @@ struct kvm_vcpu_arch {
 	u64 mmio_gva;
 	unsigned access;
 	gfn_t mmio_gfn;
+	u64 mmio_gen;
 
 	struct kvm_pmu pmu;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 96515957ba82..1cd2a5fbde07 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3162,7 +3162,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	vcpu_clear_mmio_info(vcpu, ~0ul);
+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
 	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 306a1b77581f..985fb2c006fa 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -88,15 +88,23 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 	vcpu->arch.mmio_gva = gva & PAGE_MASK;
 	vcpu->arch.access = access;
 	vcpu->arch.mmio_gfn = gfn;
+	vcpu->arch.mmio_gen = kvm_memslots(vcpu->kvm)->generation;
+}
+
+static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.mmio_gen == kvm_memslots(vcpu->kvm)->generation;
 }
 
 /*
- * Clear the mmio cache info for the given gva,
- * specially, if gva is ~0ul, we clear all mmio cache info.
+ * Clear the mmio cache info for the given gva. If gva is MMIO_GVA_ANY, we
+ * clear all mmio cache info.
  */
+#define MMIO_GVA_ANY (~(gva_t)0)
+
 static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 {
-	if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
+	if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
 		return;
 
 	vcpu->arch.mmio_gva = 0;
@@ -104,7 +112,8 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 
 static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
 {
-	if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
+	      vcpu->arch.mmio_gva == (gva & PAGE_MASK))
 		return true;
 
 	return false;
@@ -112,7 +121,8 @@ static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
 
 static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
-	if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
+	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gfn &&
+	      vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
 		return true;
 
 	return false;
-- 
1.8.3.1



* Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
  2014-08-29 10:31 [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
                   ` (2 preceding siblings ...)
  2014-08-29 10:31 ` [PATCH 3/3] kvm: x86: fix stale mmio cache bug Paolo Bonzini
@ 2014-09-02 15:42 ` Paolo Bonzini
  2014-09-02 16:47   ` David Matlack
  2014-09-03  5:30   ` Xiao Guangrong
  3 siblings, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2014-09-02 15:42 UTC (permalink / raw)
  To: linux-kernel, kvm, Xiao Guangrong, David Matlack

On 29/08/2014 12:31, Paolo Bonzini wrote:
> David and Xiao, here's my take on the MMIO generation patches.  Now
> with documentation, too. :)  Please review!
> 
> David Matlack (2):
>   kvm: fix potentially corrupt mmio cache
>   kvm: x86: fix stale mmio cache bug
> 
> Paolo Bonzini (1):
>   KVM: do not bias the generation number in kvm_current_mmio_generation
> 
>  Documentation/virtual/kvm/mmu.txt | 14 ++++++++++++++
>  arch/x86/include/asm/kvm_host.h   |  1 +
>  arch/x86/kvm/mmu.c                | 29 ++++++++++++++---------------
>  arch/x86/kvm/x86.h                | 20 +++++++++++++++-----
>  virt/kvm/kvm_main.c               | 30 +++++++++++++++++++++++-------
>  5 files changed, 67 insertions(+), 27 deletions(-)
> 

Ping?

Paolo


* Re: [PATCH 2/3] kvm: fix potentially corrupt mmio cache
  2014-08-29 10:31 ` [PATCH 2/3] kvm: fix potentially corrupt mmio cache Paolo Bonzini
@ 2014-09-02 16:44   ` David Matlack
  2014-09-02 16:49     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: David Matlack @ 2014-09-02 16:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, stable, Xiao Guangrong

On Fri, Aug 29, 2014 at 3:31 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
[...]
> -#define MMIO_GEN_SHIFT                 19
> -#define MMIO_GEN_LOW_SHIFT             9
> -#define MMIO_GEN_LOW_MASK              ((1 << MMIO_GEN_LOW_SHIFT) - 1)
> +#define MMIO_GEN_SHIFT                 20
> +#define MMIO_GEN_LOW_SHIFT             10
> +#define MMIO_GEN_LOW_MASK              ((1 << MMIO_GEN_LOW_SHIFT) - 2)
>  #define MMIO_GEN_MASK                  ((1 << MMIO_GEN_SHIFT) - 1)
>  #define MMIO_MAX_GEN                   ((1 << MMIO_GEN_SHIFT) - 1)
>
> @@ -4428,7 +4432,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
>          * The very rare case: if the generation-number is round,
>          * zap all shadow pages.
>          */
> -       if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) {
> +       if (unlikely(kvm_current_mmio_generation(kvm) == 0)) {

This should be in patch 1/3.

>                 printk_ratelimited(KERN_INFO "kvm: zapping shadow pages for mmio generation wraparound\n");
>                 kvm_mmu_invalidate_zap_all_pages(kvm);
>         }
[...]


* Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
  2014-09-02 15:42 ` [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
@ 2014-09-02 16:47   ` David Matlack
  2014-09-02 16:50     ` Paolo Bonzini
  2014-09-03  5:30   ` Xiao Guangrong
  1 sibling, 1 reply; 12+ messages in thread
From: David Matlack @ 2014-09-02 16:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Xiao Guangrong

On Tue, Sep 2, 2014 at 8:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 29/08/2014 12:31, Paolo Bonzini wrote:
>> David and Xiao, here's my take on the MMIO generation patches.  Now
>> with documentation, too. :)  Please review!
>>
>> David Matlack (2):
>>   kvm: fix potentially corrupt mmio cache
>>   kvm: x86: fix stale mmio cache bug
>>
>> Paolo Bonzini (1):
>>   KVM: do not bias the generation number in kvm_current_mmio_generation
>>
>>  Documentation/virtual/kvm/mmu.txt | 14 ++++++++++++++
>>  arch/x86/include/asm/kvm_host.h   |  1 +
>>  arch/x86/kvm/mmu.c                | 29 ++++++++++++++---------------
>>  arch/x86/kvm/x86.h                | 20 +++++++++++++++-----
>>  virt/kvm/kvm_main.c               | 30 +++++++++++++++++++++++-------
>>  5 files changed, 67 insertions(+), 27 deletions(-)
>>
>
> Ping?

Sorry for the delay. I think the patches look good. And patch 3/3 still
fixes the bug I was originally seeing, so I'm happy :). I just had one
small comment (see my reply to patch 2/3).

>
> Paolo


* Re: [PATCH 2/3] kvm: fix potentially corrupt mmio cache
  2014-09-02 16:44   ` David Matlack
@ 2014-09-02 16:49     ` Paolo Bonzini
  2014-09-02 17:17       ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-09-02 16:49 UTC (permalink / raw)
  To: David Matlack; +Cc: linux-kernel, kvm, stable, Xiao Guangrong

On 02/09/2014 18:44, David Matlack wrote:
> >
> > -#define MMIO_GEN_SHIFT                 19
> > -#define MMIO_GEN_LOW_SHIFT             9
> > -#define MMIO_GEN_LOW_MASK              ((1 << MMIO_GEN_LOW_SHIFT) - 1)
> > +#define MMIO_GEN_SHIFT                 20
> > +#define MMIO_GEN_LOW_SHIFT             10
> > +#define MMIO_GEN_LOW_MASK              ((1 << MMIO_GEN_LOW_SHIFT) - 2)
> >  #define MMIO_GEN_MASK                  ((1 << MMIO_GEN_SHIFT) - 1)
> >  #define MMIO_MAX_GEN                   ((1 << MMIO_GEN_SHIFT) - 1)
> >
> > @@ -4428,7 +4432,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
> >          * The very rare case: if the generation-number is round,
> >          * zap all shadow pages.
> >          */
> > -       if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) {
> > +       if (unlikely(kvm_current_mmio_generation(kvm) == 0)) {
> 
> This should be in patch 1/3.

I don't think so.  This change is not due to the removal of biasing in
x86.c, but rather due to removing bit 0 from MMIO_GEN_LOW_MASK.

I placed it here, because the previous test works just fine until bit 0
is removed from MMIO_GEN_LOW_MASK.
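
(To spell out the arithmetic: once bit 0 is reserved as the "update in
progress" flag, a completed memslot update always leaves
kvm_current_mmio_generation() on an even value, while MMIO_MAX_GEN =
(1 << 20) - 1 is odd, so the old ">= MMIO_MAX_GEN" test could never fire
again; the wraparound has to be caught when the masked value comes back
around to 0 instead.)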

Paolo


* Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
  2014-09-02 16:47   ` David Matlack
@ 2014-09-02 16:50     ` Paolo Bonzini
  2014-09-02 17:18       ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-09-02 16:50 UTC (permalink / raw)
  To: David Matlack; +Cc: linux-kernel, kvm, Xiao Guangrong

On 02/09/2014 18:47, David Matlack wrote:
>> > Ping?
> Sorry for the delay. I think the patches look good. And patch 3/3 still
> fixes the bug I was originally seeing, so I'm happy :). I just had one
> small comment (see my reply to patch 2/3).
> 

I answered that question now.  Can I add your Reviewed-by and, for patch
3, Tested-by?

Paolo


* Re: [PATCH 2/3] kvm: fix potentially corrupt mmio cache
  2014-09-02 16:49     ` Paolo Bonzini
@ 2014-09-02 17:17       ` David Matlack
  0 siblings, 0 replies; 12+ messages in thread
From: David Matlack @ 2014-09-02 17:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, stable, Xiao Guangrong

On Tue, Sep 2, 2014 at 9:49 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/09/2014 18:44, David Matlack wrote:
>> >
>> > -#define MMIO_GEN_SHIFT                 19
>> > -#define MMIO_GEN_LOW_SHIFT             9
>> > -#define MMIO_GEN_LOW_MASK              ((1 << MMIO_GEN_LOW_SHIFT) - 1)
>> > +#define MMIO_GEN_SHIFT                 20
>> > +#define MMIO_GEN_LOW_SHIFT             10
>> > +#define MMIO_GEN_LOW_MASK              ((1 << MMIO_GEN_LOW_SHIFT) - 2)
>> >  #define MMIO_GEN_MASK                  ((1 << MMIO_GEN_SHIFT) - 1)
>> >  #define MMIO_MAX_GEN                   ((1 << MMIO_GEN_SHIFT) - 1)
>> >
>> > @@ -4428,7 +4432,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
>> >          * The very rare case: if the generation-number is round,
>> >          * zap all shadow pages.
>> >          */
>> > -       if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) {
>> > +       if (unlikely(kvm_current_mmio_generation(kvm) == 0)) {
>>
>> This should be in patch 1/3.
>
> I don't think so.  This change is not due to the removal of biasing in
> x86.c, but rather due to removing bit 0 from MMIO_GEN_LOW_MASK.
>
> I placed it here, because the previous test works just fine until bit 0
> is removed from MMIO_GEN_LOW_MASK.

Ah ok, you're right.

>
> Paolo


* Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
  2014-09-02 16:50     ` Paolo Bonzini
@ 2014-09-02 17:18       ` David Matlack
  0 siblings, 0 replies; 12+ messages in thread
From: David Matlack @ 2014-09-02 17:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Xiao Guangrong

On Tue, Sep 2, 2014 at 9:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/09/2014 18:47, David Matlack wrote:
>>> > Ping?
>> Sorry for the delay. I think the patches look good. And patch 3/3 still
>> fixes the bug I was originally seeing, so I'm happy :). I just had one
>> small comment (see my reply to patch 2/3).
>>
>
> I answered that question now.  Can I add your Reviewed-by and, for patch
> 3, Tested-by?

Please do. And thanks again for working on these!

>
> Paolo


* Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
  2014-09-02 15:42 ` [PATCH 0/3] fix bugs with stale or corrupt MMIO caches Paolo Bonzini
  2014-09-02 16:47   ` David Matlack
@ 2014-09-03  5:30   ` Xiao Guangrong
  1 sibling, 0 replies; 12+ messages in thread
From: Xiao Guangrong @ 2014-09-03  5:30 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm, David Matlack

On 09/02/2014 11:42 PM, Paolo Bonzini wrote:
> On 29/08/2014 12:31, Paolo Bonzini wrote:
>> David and Xiao, here's my take on the MMIO generation patches.  Now
>> with documentation, too. :)  Please review!
>>
>> David Matlack (2):
>>   kvm: fix potentially corrupt mmio cache
>>   kvm: x86: fix stale mmio cache bug
>>
>> Paolo Bonzini (1):
>>   KVM: do not bias the generation number in kvm_current_mmio_generation
>>
>>  Documentation/virtual/kvm/mmu.txt | 14 ++++++++++++++
>>  arch/x86/include/asm/kvm_host.h   |  1 +
>>  arch/x86/kvm/mmu.c                | 29 ++++++++++++++---------------
>>  arch/x86/kvm/x86.h                | 20 +++++++++++++++-----
>>  virt/kvm/kvm_main.c               | 30 +++++++++++++++++++++++-------
>>  5 files changed, 67 insertions(+), 27 deletions(-)
>>
> 
> Ping?

Looks good to me.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>



