linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/6] kvm: Growable memory slot array
@ 2012-12-03 23:39 Alex Williamson
  2012-12-03 23:39 ` [RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS Alex Williamson
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

Memory slots are currently a fixed resource with a relatively small
limit.  When using PCI device assignment in a qemu guest it's fairly
easy to exhaust the number of available slots.  I posted patches
exploring growing the number of memory slots a while ago, but it was
prior to caching memory slot array misses and therefore had potentially
poor performance.  Now that we do that, Avi seemed receptive to
increasing the memory slot array to arbitrary lengths.  I think we
still don't want to impose unnecessary kernel memory consumption on
guests not making use of this, so I present again a growable memory
slot array.

A couple notes/questions; in the previous version we had a
kvm_arch_flush_shadow() call when we increased the number of slots.
I'm not sure if this is still necessary.  I had also made the x86
specific slot_bitmap dynamically grow as well and switch between a
direct bitmap and indirect pointer to a bitmap.  That may have
contributed to needing the flush.  I haven't done that yet here
because it seems like an unnecessary complication if we have a max
on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
overhead.  If we want to go more, maybe we should make it switch.
That leads to the final question, we need an upper bound since this
does allow consumption of extra kernel memory, what should it be?  A
PCI bus filled with assigned devices can theoretically use up to 2048
slots (32 devices * 8 functions * (6 BARs + ROM + possibly split
MSI-X BAR)).  For this RFC, I don't change the max, just make it
grow up to 32 user slots.  Untested on anything but x86 so far.
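
As a rough, purely illustrative sketch of that worst case (the macro
names below are made up for the example and are not part of this
series):

	#define SLOTS_PER_FUNCTION    (6 /* BARs */ + 1 /* ROM */ + 1 /* split MSI-X */)
	#define WORST_CASE_USER_SLOTS (32 /* devices */ * 8 /* functions */ * \
	                               SLOTS_PER_FUNCTION)   /* = 2048 */
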
Thanks,

Alex
---

Alex Williamson (6):
      kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
      kvm: Make KVM_PRIVATE_MEM_SLOTS optional
      kvm: Merge id_to_index into memslots
      kvm: Move private memory slots to start of memslots array
      kvm: Re-introduce memslots->nmemslots
      kvm: Allow memory slots to grow


 arch/ia64/include/asm/kvm_host.h    |    4 --
 arch/ia64/kvm/kvm-ia64.c            |    6 +--
 arch/powerpc/include/asm/kvm_host.h |    6 +--
 arch/powerpc/kvm/book3s_hv.c        |    4 +-
 arch/s390/include/asm/kvm_host.h    |    4 --
 arch/x86/include/asm/kvm_host.h     |    8 ++-
 arch/x86/include/asm/vmx.h          |    6 +--
 arch/x86/kvm/vmx.c                  |    3 +
 arch/x86/kvm/x86.c                  |   10 +++-
 include/linux/kvm_host.h            |   20 +++++---
 virt/kvm/kvm_main.c                 |   83 ++++++++++++++++++++++++-----------
 11 files changed, 93 insertions(+), 61 deletions(-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
@ 2012-12-03 23:39 ` Alex Williamson
  2012-12-03 23:39 ` [RFC PATCH 2/6] kvm: Make KVM_PRIVATE_MEM_SLOTS optional Alex Williamson
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM.  One is
the user accessible slots and the other is user + private.  Make this
more obvious.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 arch/ia64/include/asm/kvm_host.h    |    2 +-
 arch/ia64/kvm/kvm-ia64.c            |    2 +-
 arch/powerpc/include/asm/kvm_host.h |    4 ++--
 arch/powerpc/kvm/book3s_hv.c        |    2 +-
 arch/s390/include/asm/kvm_host.h    |    2 +-
 arch/x86/include/asm/kvm_host.h     |    4 ++--
 arch/x86/include/asm/vmx.h          |    6 +++---
 arch/x86/kvm/x86.c                  |    6 +++---
 include/linux/kvm_host.h            |    2 +-
 virt/kvm/kvm_main.c                 |    8 ++++----
 10 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 6d6a5ac..48d7b0e 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -23,7 +23,7 @@
 #ifndef __ASM_KVM_HOST_H
 #define __ASM_KVM_HOST_H
 
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8b3a9c0..f1a46bd 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1831,7 +1831,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= KVM_USER_MEM_SLOTS)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 28e8f5e..5eb1dd8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -37,10 +37,10 @@
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
-#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 721d460..75ce80e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1262,7 +1262,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= KVM_USER_MEM_SLOTS)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b784154..ac33432 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -20,7 +20,7 @@
 #include <asm/cpu.h>
 
 #define KVM_MAX_VCPUS 64
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..e619519 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -31,10 +31,10 @@
 
 #define KVM_MAX_VCPUS 254
 #define KVM_SOFT_MAX_VCPUS 160
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
-#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #define KVM_MMIO_SIZE 16
 
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 36ec21c..72932d2 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -427,9 +427,9 @@ enum vmcs_field {
 
 #define AR_RESERVD_MASK 0xfffe0f00
 
-#define TSS_PRIVATE_MEMSLOT			(KVM_MEMORY_SLOTS + 0)
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	(KVM_MEMORY_SLOTS + 1)
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	(KVM_MEMORY_SLOTS + 2)
+#define TSS_PRIVATE_MEMSLOT			(KVM_USER_MEM_SLOTS + 0)
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	(KVM_USER_MEM_SLOTS + 1)
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	(KVM_USER_MEM_SLOTS + 2)
 
 #define VMX_NR_VPIDS				(1 << 16)
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT		1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f76417..1aa3fae 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2204,7 +2204,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 		r = KVM_MAX_VCPUS;
 		break;
 	case KVM_CAP_NR_MEMSLOTS:
-		r = KVM_MEMORY_SLOTS;
+		r = KVM_USER_MEM_SLOTS;
 		break;
 	case KVM_CAP_PV_MMU:	/* obsolete */
 		r = 0;
@@ -3135,7 +3135,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= KVM_USER_MEM_SLOTS)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
@@ -6449,7 +6449,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
 
 	/* Prevent internal slot pages from being moved by fork()/COW. */
-	if (memslot->id >= KVM_MEMORY_SLOTS)
+	if (memslot->id >= KVM_USER_MEM_SLOTS)
 		map_flags = MAP_SHARED | MAP_ANONYMOUS;
 
 	/*To keep backward compatibility with older userspace,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ecc5543..fb9354d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -299,7 +299,7 @@ struct kvm_irq_routing_table {};
 #endif
 
 #ifndef KVM_MEM_SLOTS_NUM
-#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 #endif
 
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6e8fa7e..48d3ede 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -761,7 +761,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	/* Check for overlaps */
 	r = -EEXIST;
 	kvm_for_each_memslot(slot, kvm->memslots) {
-		if (slot->id >= KVM_MEMORY_SLOTS || slot == memslot)
+		if (slot->id >= KVM_USER_MEM_SLOTS || slot == memslot)
 			continue;
 		if (!((base_gfn + npages <= slot->base_gfn) ||
 		      (base_gfn >= slot->base_gfn + slot->npages)))
@@ -879,7 +879,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 				   kvm_userspace_memory_region *mem,
 				   int user_alloc)
 {
-	if (mem->slot >= KVM_MEMORY_SLOTS)
+	if (mem->slot >= KVM_USER_MEM_SLOTS)
 		return -EINVAL;
 	return kvm_set_memory_region(kvm, mem, user_alloc);
 }
@@ -893,7 +893,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	unsigned long any = 0;
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= KVM_USER_MEM_SLOTS)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
@@ -939,7 +939,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
 	struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn);
 
-	if (!memslot || memslot->id >= KVM_MEMORY_SLOTS ||
+	if (!memslot || memslot->id >= KVM_USER_MEM_SLOTS ||
 	      memslot->flags & KVM_MEMSLOT_INVALID)
 		return 0;
 


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 2/6] kvm: Make KVM_PRIVATE_MEM_SLOTS optional
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
  2012-12-03 23:39 ` [RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS Alex Williamson
@ 2012-12-03 23:39 ` Alex Williamson
  2012-12-03 23:39 ` [RFC PATCH 3/6] kvm: Merge id_to_index into memslots Alex Williamson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

Seems like everyone copied x86 and defined 4 private memory slots
that never actually get used.  Even x86 only uses 3 of the 4.  These
aren't exposed so there's no need to add padding.
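
With the default added to include/linux/kvm_host.h at the end of this
patch, a hypothetical new arch header only needs to define its user
slot count; purely as an illustration (this file does not exist in the
series):

	/* arch/<newarch>/include/asm/kvm_host.h */
	#define KVM_USER_MEM_SLOTS 32
	/* KVM_PRIVATE_MEM_SLOTS falls back to 0, KVM_MEM_SLOTS_NUM to user+private */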

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 arch/ia64/include/asm/kvm_host.h    |    2 --
 arch/powerpc/include/asm/kvm_host.h |    4 +---
 arch/s390/include/asm/kvm_host.h    |    2 --
 arch/x86/include/asm/kvm_host.h     |    4 ++--
 include/linux/kvm_host.h            |    4 ++++
 5 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 48d7b0e..cfa7498 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -24,8 +24,6 @@
 #define __ASM_KVM_HOST_H
 
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 5eb1dd8..23ca70d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,9 +38,7 @@
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
-#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index ac33432..711c5ab 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -21,8 +21,6 @@
 
 #define KVM_MAX_VCPUS 64
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 struct sca_entry {
 	atomic_t scn;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e619519..ce8b037 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -32,8 +32,8 @@
 #define KVM_MAX_VCPUS 254
 #define KVM_SOFT_MAX_VCPUS 160
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
+/* memory slots that are not exposed to userspace */
+#define KVM_PRIVATE_MEM_SLOTS 3
 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #define KVM_MMIO_SIZE 16
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fb9354d..bf8380f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -298,6 +298,10 @@ struct kvm_irq_routing_table {};
 
 #endif
 
+#ifndef KVM_PRIVATE_MEM_SLOTS
+#define KVM_PRIVATE_MEM_SLOTS 0
+#endif
+
 #ifndef KVM_MEM_SLOTS_NUM
 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 #endif


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 3/6] kvm: Merge id_to_index into memslots
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
  2012-12-03 23:39 ` [RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS Alex Williamson
  2012-12-03 23:39 ` [RFC PATCH 2/6] kvm: Make KVM_PRIVATE_MEM_SLOTS optional Alex Williamson
@ 2012-12-03 23:39 ` Alex Williamson
  2012-12-05 21:22   ` Marcelo Tosatti
  2012-12-03 23:39 ` [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array Alex Williamson
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

This allows us to resize this structure and therefore the number of
memslots as part of the RCU update.
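
For context (not something this patch changes), the update in
__kvm_set_memory_region() already follows a copy, publish and wait
pattern along these lines, quoted roughly from memory:

	slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots), GFP_KERNEL);
	/* ... update the copy ... */
	old_memslots = kvm->memslots;
	rcu_assign_pointer(kvm->memslots, slots);
	synchronize_srcu_expedited(&kvm->srcu);
	kfree(old_memslots);

With id_to_index folded into the slots themselves, that one pointer
swap can also change the size of the array.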

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 include/linux/kvm_host.h |    5 ++---
 virt/kvm/kvm_main.c      |    4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bf8380f..7b3d5c4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -257,6 +257,7 @@ struct kvm_memory_slot {
 	unsigned long userspace_addr;
 	int user_alloc;
 	int id;
+	int id_to_index;
 };
 
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
@@ -314,8 +315,6 @@ struct kvm_irq_routing_table {};
 struct kvm_memslots {
 	u64 generation;
 	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
-	/* The mapping table from slot id to the index in memslots[]. */
-	int id_to_index[KVM_MEM_SLOTS_NUM];
 };
 
 struct kvm {
@@ -425,7 +424,7 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
 static inline struct kvm_memory_slot *
 id_to_memslot(struct kvm_memslots *slots, int id)
 {
-	int index = slots->id_to_index[id];
+	int index = slots->memslots[id].id_to_index;
 	struct kvm_memory_slot *slot;
 
 	slot = &slots->memslots[index];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48d3ede..4f8ae4b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -445,7 +445,7 @@ static void kvm_init_memslots_id(struct kvm *kvm)
 	struct kvm_memslots *slots = kvm->memslots;
 
 	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-		slots->id_to_index[i] = slots->memslots[i].id = i;
+		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
 }
 
 static struct kvm *kvm_create_vm(unsigned long type)
@@ -662,7 +662,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
 
 	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-		slots->id_to_index[slots->memslots[i].id] = i;
+		slots->memslots[slots->memslots[i].id].id_to_index = i;
 }
 
 void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
                   ` (2 preceding siblings ...)
  2012-12-03 23:39 ` [RFC PATCH 3/6] kvm: Merge id_to_index into memslots Alex Williamson
@ 2012-12-03 23:39 ` Alex Williamson
  2012-12-05 21:24   ` Marcelo Tosatti
  2012-12-03 23:39 ` [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots Alex Williamson
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

In order to make the memslots array grow on demand, move the private
slots to the lower indexes of the array.  The private slots are
assumed likely to be in use, so if we didn't do this we'd end up
allocating the full memslots array all the time.
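
The ioctl paths below just offset user-supplied slot numbers.  As a
purely illustrative helper (not one this patch adds), the mapping
amounts to:

	/* illustrative only: user-visible slot id -> internal array id */
	static inline int user_slot_to_internal(int user_slot)
	{
		return user_slot + KVM_PRIVATE_MEM_SLOTS;
	}

so user slot 0 lands at internal id KVM_PRIVATE_MEM_SLOTS and the
private slots occupy internal ids 0..KVM_PRIVATE_MEM_SLOTS-1.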

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c     |    4 ++--
 arch/powerpc/kvm/book3s_hv.c |    2 +-
 arch/x86/include/asm/vmx.h   |    6 +++---
 arch/x86/kvm/vmx.c           |    2 +-
 arch/x86/kvm/x86.c           |    2 +-
 virt/kvm/kvm_main.c          |   15 ++++++++++-----
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index f1a46bd..012e5dd 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -949,7 +949,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem))
 			goto out;
-		kvm_userspace_mem.slot = kvm_mem.slot;
+		kvm_userspace_mem.slot = kvm_mem.slot + KVM_PRIVATE_MEM_SLOTS;
 		kvm_userspace_mem.flags = kvm_mem.flags;
 		kvm_userspace_mem.guest_phys_addr =
 					kvm_mem.guest_phys_addr;
@@ -1831,7 +1831,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
+	if (log->slot >= KVM_MEM_SLOTS_NUM)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 75ce80e..56067db 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1262,7 +1262,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
+	if (log->slot >= KVM_MEM_SLOTS_NUM)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 72932d2..97bcd7d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -427,9 +427,9 @@ enum vmcs_field {
 
 #define AR_RESERVD_MASK 0xfffe0f00
 
-#define TSS_PRIVATE_MEMSLOT			(KVM_USER_MEM_SLOTS + 0)
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	(KVM_USER_MEM_SLOTS + 1)
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	(KVM_USER_MEM_SLOTS + 2)
+#define TSS_PRIVATE_MEMSLOT			0
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	1
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	2
 
 #define VMX_NR_VPIDS				(1 << 16)
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT		1
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f858159..2bb9157 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2750,7 +2750,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
 		gfn_t base_gfn;
 
 		slots = kvm_memslots(kvm);
-		slot = id_to_memslot(slots, 0);
+		slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
 		base_gfn = slot->base_gfn + slot->npages - 3;
 
 		return base_gfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1aa3fae..8765485 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6449,7 +6449,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
 
 	/* Prevent internal slot pages from being moved by fork()/COW. */
-	if (memslot->id >= KVM_USER_MEM_SLOTS)
+	if (memslot->id < KVM_PRIVATE_MEM_SLOTS)
 		map_flags = MAP_SHARED | MAP_ANONYMOUS;
 
 	/*To keep backward compatibility with older userspace,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f8ae4b..3ce2664 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -761,7 +761,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	/* Check for overlaps */
 	r = -EEXIST;
 	kvm_for_each_memslot(slot, kvm->memslots) {
-		if (slot->id >= KVM_USER_MEM_SLOTS || slot == memslot)
+		if (slot->id < KVM_PRIVATE_MEM_SLOTS || slot == memslot)
 			continue;
 		if (!((base_gfn + npages <= slot->base_gfn) ||
 		      (base_gfn >= slot->base_gfn + slot->npages)))
@@ -879,7 +879,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 				   kvm_userspace_memory_region *mem,
 				   int user_alloc)
 {
-	if (mem->slot >= KVM_USER_MEM_SLOTS)
+	if (mem->slot >= KVM_MEM_SLOTS_NUM)
 		return -EINVAL;
 	return kvm_set_memory_region(kvm, mem, user_alloc);
 }
@@ -893,7 +893,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	unsigned long any = 0;
 
 	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
+	if (log->slot >= KVM_MEM_SLOTS_NUM)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
@@ -939,7 +939,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
 	struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn);
 
-	if (!memslot || memslot->id >= KVM_USER_MEM_SLOTS ||
+	if (!memslot || memslot->id < KVM_PRIVATE_MEM_SLOTS ||
 	      memslot->flags & KVM_MEMSLOT_INVALID)
 		return 0;
 
@@ -2137,6 +2137,8 @@ static long kvm_vm_ioctl(struct file *filp,
 						sizeof kvm_userspace_mem))
 			goto out;
 
+		kvm_userspace_mem.slot += KVM_PRIVATE_MEM_SLOTS;
+
 		r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 1);
 		if (r)
 			goto out;
@@ -2148,6 +2150,9 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&log, argp, sizeof log))
 			goto out;
+
+		log.slot += KVM_PRIVATE_MEM_SLOTS;
+
 		r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
 		if (r)
 			goto out;
@@ -2276,7 +2281,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 		if (copy_from_user(&compat_log, (void __user *)arg,
 				   sizeof(compat_log)))
 			goto out;
-		log.slot	 = compat_log.slot;
+		log.slot	 = compat_log.slot + KVM_PRIVATE_MEM_SLOTS;
 		log.padding1	 = compat_log.padding1;
 		log.padding2	 = compat_log.padding2;
 		log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
                   ` (3 preceding siblings ...)
  2012-12-03 23:39 ` [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array Alex Williamson
@ 2012-12-03 23:39 ` Alex Williamson
  2012-12-05 21:26   ` Marcelo Tosatti
  2012-12-03 23:39 ` [RFC PATCH 6/6] kvm: Allow memory slots to grow Alex Williamson
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

struct kvm_memory_slot is currently 52 bytes (LP64), not counting the
arch data.  On x86 this means the memslot array needed to support a
mere 32+3 entries (user+private) is over 2k.  We'd like to support
more slots so that we can support more assigned devices, but it
doesn't make sense to penalize everyone by using a statically
allocated array.  This allows us to start introducing a growable array.
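
Rough arithmetic behind the "over 2k" figure (approximate; the ~72 byte
per-slot size with x86 arch data and padding is the estimate discussed
later in this thread):

	(32 user + 3 private) slots * 52 bytes ~= 1.8k for the common fields
	(32 user + 3 private) slots * 72 bytes ~= 2.5k with x86 arch data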

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c     |    2 +-
 arch/powerpc/kvm/book3s_hv.c |    2 +-
 arch/x86/kvm/vmx.c           |    1 +
 arch/x86/kvm/x86.c           |    4 +++-
 include/linux/kvm_host.h     |    9 ++++++---
 virt/kvm/kvm_main.c          |   10 ++++++----
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 012e5dd..96401b5 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
 	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
+	if (!memslot || !memslot->dirty_bitmap)
 		goto out;
 
 	kvm_ia64_sync_dirty_log(kvm, memslot);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 56067db..0417190 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
 	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
+	if (!memslot || !memslot->dirty_bitmap)
 		goto out;
 
 	n = kvm_dirty_bitmap_bytes(memslot);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2bb9157..07fdd90 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
 
 		slots = kvm_memslots(kvm);
 		slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
+		BUG_ON(!slot);
 		base_gfn = slot->base_gfn + slot->npages - 3;
 
 		return base_gfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8765485..53fe9b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
+	r = -ENOENT;
+	if (!memslot)
+		goto out;
 
 	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
 	if (!dirty_bitmap)
 		goto out;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7b3d5c4..1955a4e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
  * to get the memslot by its id.
  */
 struct kvm_memslots {
+	int nmemslots;
 	u64 generation;
 	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
 };
@@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 
 #define kvm_for_each_memslot(memslot, slots)	\
 	for (memslot = &slots->memslots[0];	\
-	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
+	      memslot < slots->memslots + slots->nmemslots && memslot->npages;\
 		memslot++)
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
@@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
 static inline struct kvm_memory_slot *
 id_to_memslot(struct kvm_memslots *slots, int id)
 {
-	int index = slots->memslots[id].id_to_index;
 	struct kvm_memory_slot *slot;
 
-	slot = &slots->memslots[index];
+	if (id >= slots->nmemslots)
+		return NULL;
+
+	slot = &slots->memslots[slots->memslots[id].id_to_index];
 
 	WARN_ON(slot->id != id);
 	return slot;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3ce2664..ebd3960 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
 	int i;
 	struct kvm_memslots *slots = kvm->memslots;
 
-	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
+	slots->nmemslots = KVM_MEM_SLOTS_NUM;
+
+	for (i = 0; i < kvm->memslots->nmemslots; i++)
 		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
 }
 
@@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
 {
 	int i;
 
-	sort(slots->memslots, KVM_MEM_SLOTS_NUM,
+	sort(slots->memslots, slots->nmemslots,
 	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
 
-	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
+	for (i = 0; i < slots->nmemslots; i++)
 		slots->memslots[slots->memslots[i].id].id_to_index = i;
 }
 
@@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
 	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
+	if (!memslot || !memslot->dirty_bitmap)
 		goto out;
 
 	n = kvm_dirty_bitmap_bytes(memslot);


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 6/6] kvm: Allow memory slots to grow
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
                   ` (4 preceding siblings ...)
  2012-12-03 23:39 ` [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots Alex Williamson
@ 2012-12-03 23:39 ` Alex Williamson
  2012-12-04 11:48 ` [RFC PATCH 0/6] kvm: Growable memory slot array Gleb Natapov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-03 23:39 UTC (permalink / raw)
  To: mtosatti, kvm, gleb; +Cc: linux-kernel

Start with zero and grow up to KVM_MEM_SLOTS_NUM.  A modest guest
without device assignment likely uses around 1/4 of the total
entries.  We don't attempt to shrink the array when slots are
released.  Both x86 and powerpc still have some statically sized
elements elsewhere, but this covers the bulk of the memory used.
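
With the flexible array member below, the allocation size tracks the
highest slot id in use rather than KVM_MEM_SLOTS_NUM; schematically
(mirroring the kmalloc in __kvm_dup_and_grow_memslots(), where
nmemslots is that highest id + 1):

	size = sizeof(struct kvm_memslots) +
	       nmemslots * sizeof(struct kvm_memory_slot);

so a guest that only ever registers a handful of low slots never pays
for the full array.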

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 include/linux/kvm_host.h |    2 +
 virt/kvm/kvm_main.c      |   62 +++++++++++++++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1955a4e..effc800 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -315,7 +315,7 @@ struct kvm_irq_routing_table {};
 struct kvm_memslots {
 	int nmemslots;
 	u64 generation;
-	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
+	struct kvm_memory_slot memslots[];
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ebd3960..fa4df50 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -439,17 +439,6 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
-static void kvm_init_memslots_id(struct kvm *kvm)
-{
-	int i;
-	struct kvm_memslots *slots = kvm->memslots;
-
-	slots->nmemslots = KVM_MEM_SLOTS_NUM;
-
-	for (i = 0; i < kvm->memslots->nmemslots; i++)
-		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
-}
-
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
@@ -475,7 +464,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!kvm->memslots)
 		goto out_err_nosrcu;
-	kvm_init_memslots_id(kvm);
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err_nosrcu;
 	for (i = 0; i < KVM_NR_BUSES; i++) {
@@ -696,6 +684,40 @@ static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
 	return 0;
 }
 
+struct kvm_memslots *__kvm_dup_and_grow_memslots(struct kvm_memslots *oldslots,
+						 int slot)
+{
+	int nmemslots;
+	struct kvm_memslots *newslots;
+
+	nmemslots = (slot >= oldslots->nmemslots) ?
+		    slot + 1 : oldslots->nmemslots;
+
+	newslots = kmalloc(sizeof(struct kvm_memslots) +
+			nmemslots * sizeof(struct kvm_memory_slot), GFP_KERNEL);
+	if (!newslots)
+		return NULL;
+
+	memcpy(newslots, oldslots, sizeof(struct kvm_memslots) +
+	       oldslots->nmemslots * sizeof(struct kvm_memory_slot));
+
+	if (nmemslots != oldslots->nmemslots) {
+		int i;
+		memset(&newslots->memslots[oldslots->nmemslots], 0,
+		       (nmemslots - oldslots->nmemslots) *
+		       sizeof(struct kvm_memory_slot));
+
+		for (i = oldslots->nmemslots; i < nmemslots; i++) {
+			newslots->memslots[i].id_to_index = i;
+			newslots->memslots[i].id = i;
+		}
+
+		newslots->nmemslots = nmemslots;
+	}
+
+	return newslots;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -711,8 +733,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	int r;
 	gfn_t base_gfn;
 	unsigned long npages;
-	struct kvm_memory_slot *memslot, *slot;
-	struct kvm_memory_slot old, new;
+	struct kvm_memory_slot *memslot = NULL, *slot;
+	struct kvm_memory_slot old = {}, new = {};
 	struct kvm_memslots *slots, *old_memslots;
 
 	r = check_memory_region_flags(mem);
@@ -737,7 +759,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		goto out;
 
-	memslot = id_to_memslot(kvm->memslots, mem->slot);
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
@@ -748,7 +769,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (!npages)
 		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
-	new = old = *memslot;
+	if (mem->slot < kvm->memslots->nmemslots) {
+		memslot = id_to_memslot(kvm->memslots, mem->slot);
+		new = old = *memslot;
+	}
 
 	new.id = mem->slot;
 	new.base_gfn = base_gfn;
@@ -796,8 +820,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		struct kvm_memory_slot *slot;
 
 		r = -ENOMEM;
-		slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
-				GFP_KERNEL);
+		slots = __kvm_dup_and_grow_memslots(kvm->memslots, mem->slot);
 		if (!slots)
 			goto out_free;
 		slot = id_to_memslot(slots, mem->slot);
@@ -832,8 +855,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		kvm_iommu_unmap_pages(kvm, &old);
 
 	r = -ENOMEM;
-	slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
-			GFP_KERNEL);
+	slots = __kvm_dup_and_grow_memslots(kvm->memslots, mem->slot);
 	if (!slots)
 		goto out_free;
 


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
                   ` (5 preceding siblings ...)
  2012-12-03 23:39 ` [RFC PATCH 6/6] kvm: Allow memory slots to grow Alex Williamson
@ 2012-12-04 11:48 ` Gleb Natapov
  2012-12-04 15:21   ` Alex Williamson
  2012-12-04 14:48 ` Takuya Yoshikawa
  2012-12-05 21:32 ` Marcelo Tosatti
  8 siblings, 1 reply; 28+ messages in thread
From: Gleb Natapov @ 2012-12-04 11:48 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mtosatti, kvm, linux-kernel

On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> Memory slots are currently a fixed resource with a relatively small
> limit.  When using PCI device assignment in a qemu guest it's fairly
> easy to exhaust the number of available slots.  I posted patches
> exploring growing the number of memory slots a while ago, but it was
> prior to caching memory slot array misses and therefore had potentially
> poor performance.  Now that we do that, Avi seemed receptive to
> increasing the memory slot array to arbitrary lengths.  I think we
> still don't want to impose unnecessary kernel memory consumption on
> guests not making use of this, so I present again a growable memory
> slot array.
> 
> A couple notes/questions; in the previous version we had a
> kvm_arch_flush_shadow() call when we increased the number of slots.
> I'm not sure if this is still necessary.  I had also made the x86
> specific slot_bitmap dynamically grow as well and switch between a
> direct bitmap and indirect pointer to a bitmap.  That may have
> contributed to needing the flush.  I haven't done that yet here
> because it seems like an unnecessary complication if we have a max
> on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> overhead.  If we want to go more, maybe we should make it switch.
> That leads to the final question, we need an upper bound since this
> does allow consumption of extra kernel memory, what should it be?  A
This is the most important question :) If we want to have 1000s of
them or 100 is enough? Also what about changing kvm_memslots->memslots[]
array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
will save us good amount of memory for unused slots.
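
(Spelled out, purely as an illustration of the suggestion rather than a
patch, that would be something like:

	struct kvm_memslots {
		u64 generation;
		struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM];
	};

with the individual slots allocated on demand.)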

> PCI bus filled with assigned devices can theoretically use up to 2048
> slots (32 devices * 8 functions * (6 BARs + ROM + possibly split
> MSI-X BAR)).  For this RFC, I don't change the max, just make it
> grow up to 32 user slots.  Untested on anything but x86 so far.
> Thanks,
> 
> Alex
> ---
> 
> Alex Williamson (6):
>       kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
>       kvm: Make KVM_PRIVATE_MEM_SLOTS optional
>       kvm: Merge id_to_index into memslots
>       kvm: Move private memory slots to start of memslots array
>       kvm: Re-introduce memslots->nmemslots
>       kvm: Allow memory slots to grow
> 
> 
>  arch/ia64/include/asm/kvm_host.h    |    4 --
>  arch/ia64/kvm/kvm-ia64.c            |    6 +--
>  arch/powerpc/include/asm/kvm_host.h |    6 +--
>  arch/powerpc/kvm/book3s_hv.c        |    4 +-
>  arch/s390/include/asm/kvm_host.h    |    4 --
>  arch/x86/include/asm/kvm_host.h     |    8 ++-
>  arch/x86/include/asm/vmx.h          |    6 +--
>  arch/x86/kvm/vmx.c                  |    3 +
>  arch/x86/kvm/x86.c                  |   10 +++-
>  include/linux/kvm_host.h            |   20 +++++---
>  virt/kvm/kvm_main.c                 |   83 ++++++++++++++++++++++++-----------
>  11 files changed, 93 insertions(+), 61 deletions(-)

--
			Gleb.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
                   ` (6 preceding siblings ...)
  2012-12-04 11:48 ` [RFC PATCH 0/6] kvm: Growable memory slot array Gleb Natapov
@ 2012-12-04 14:48 ` Takuya Yoshikawa
  2012-12-04 15:26   ` Alex Williamson
  2012-12-05 21:32 ` Marcelo Tosatti
  8 siblings, 1 reply; 28+ messages in thread
From: Takuya Yoshikawa @ 2012-12-04 14:48 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mtosatti, kvm, gleb, linux-kernel

On Mon, 03 Dec 2012 16:39:05 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> A couple notes/questions; in the previous version we had a
> kvm_arch_flush_shadow() call when we increased the number of slots.
> I'm not sure if this is still necessary.  I had also made the x86
> specific slot_bitmap dynamically grow as well and switch between a
> direct bitmap and indirect pointer to a bitmap.  That may have
> contributed to needing the flush.  I haven't done that yet here
> because it seems like an unnecessary complication if we have a max
> on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> overhead.  If we want to go more, maybe we should make it switch.

I have a patch set which removes the slot_bitmap in kvm mmu page
by using reverse mappings for write protecting a memslot.

A bit of concern I still have is the total write protection time
for large memslots.  But since this approach allows us to control
mmu_lock hold time, I think this is a reasonable trade-off.
... and this should be much better than introducing any complication
for slot_bitmap handling.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-04 11:48 ` [RFC PATCH 0/6] kvm: Growable memory slot array Gleb Natapov
@ 2012-12-04 15:21   ` Alex Williamson
  2012-12-04 15:30     ` Gleb Natapov
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-04 15:21 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm, linux-kernel

On Tue, 2012-12-04 at 13:48 +0200, Gleb Natapov wrote:
> On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > Memory slots are currently a fixed resource with a relatively small
> > limit.  When using PCI device assignment in a qemu guest it's fairly
> > easy to exhaust the number of available slots.  I posted patches
> > exploring growing the number of memory slots a while ago, but it was
> > prior to caching memory slot array misses and therefore had potentially
> > poor performance.  Now that we do that, Avi seemed receptive to
> > increasing the memory slot array to arbitrary lengths.  I think we
> > still don't want to impose unnecessary kernel memory consumption on
> > guests not making use of this, so I present again a growable memory
> > slot array.
> > 
> > A couple notes/questions; in the previous version we had a
> > kvm_arch_flush_shadow() call when we increased the number of slots.
> > I'm not sure if this is still necessary.  I had also made the x86
> > specific slot_bitmap dynamically grow as well and switch between a
> > direct bitmap and indirect pointer to a bitmap.  That may have
> > contributed to needing the flush.  I haven't done that yet here
> > because it seems like an unnecessary complication if we have a max
> > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > overhead.  If we want to go more, maybe we should make it switch.
> > That leads to the final question, we need an upper bound since this
> > does allow consumption of extra kernel memory, what should it be?  A
> This is the most important question :) If we want to have 1000s of
> them or 100 is enough?

We can certainly hit respectable numbers of assigned devices in the
hundreds.  Worst case is 8 slots per assigned device, typical case is 4
or less.  So 512 slots would more or less guarantee 64 devices (we do
need some slots for actual memory), and more typically allow at least
128 devices.  Philosophically, supporting a full PCI bus, 256 functions,
2048 slots, is an attractive target, but it's probably not practical.

I think on x86 a slot is 72 bytes w/ alignment padding, so a maximum of
36k @512 slots.

>  Also what about changing kvm_memslots->memslots[]
> array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
> will save us good amount of memory for unused slots.

I'm not following where that results in memory savings.  Can you
clarify.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-04 14:48 ` Takuya Yoshikawa
@ 2012-12-04 15:26   ` Alex Williamson
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-04 15:26 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, gleb, linux-kernel

On Tue, 2012-12-04 at 23:48 +0900, Takuya Yoshikawa wrote:
> On Mon, 03 Dec 2012 16:39:05 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > A couple notes/questions; in the previous version we had a
> > kvm_arch_flush_shadow() call when we increased the number of slots.
> > I'm not sure if this is still necessary.  I had also made the x86
> > specific slot_bitmap dynamically grow as well and switch between a
> > direct bitmap and indirect pointer to a bitmap.  That may have
> > contributed to needing the flush.  I haven't done that yet here
> > because it seems like an unnecessary complication if we have a max
> > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > overhead.  If we want to go more, maybe we should make it switch.
> 
> I have a patch set which removes the slot_bitmap in kvm mmu page
> by using reverse mappings for write protecting a memslot.
> 
> A bit of concern I still have is the total write protection time
> for large memslots.  But since this approach allows us to control
> mmu_lock hold time, I think this is a reasonable trade-off.
> ... and this should be much better than introducing any complication
> for slot_bitmap handling.

Great!  A bi-modal bitmap would be rather ugly, so I look forward to
your patches to remove it entirely ;)  Thanks,

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-04 15:21   ` Alex Williamson
@ 2012-12-04 15:30     ` Gleb Natapov
  2012-12-04 15:39       ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Gleb Natapov @ 2012-12-04 15:30 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mtosatti, kvm, linux-kernel

On Tue, Dec 04, 2012 at 08:21:55AM -0700, Alex Williamson wrote:
> On Tue, 2012-12-04 at 13:48 +0200, Gleb Natapov wrote:
> > On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > > Memory slots are currently a fixed resource with a relatively small
> > > limit.  When using PCI device assignment in a qemu guest it's fairly
> > > easy to exhaust the number of available slots.  I posted patches
> > > exploring growing the number of memory slots a while ago, but it was
> > > prior to caching memory slot array misses and therefore had potentially
> > > poor performance.  Now that we do that, Avi seemed receptive to
> > > increasing the memory slot array to arbitrary lengths.  I think we
> > > still don't want to impose unnecessary kernel memory consumption on
> > > guests not making use of this, so I present again a growable memory
> > > slot array.
> > > 
> > > A couple notes/questions; in the previous version we had a
> > > kvm_arch_flush_shadow() call when we increased the number of slots.
> > > I'm not sure if this is still necessary.  I had also made the x86
> > > specific slot_bitmap dynamically grow as well and switch between a
> > > direct bitmap and indirect pointer to a bitmap.  That may have
> > > contributed to needing the flush.  I haven't done that yet here
> > > because it seems like an unnecessary complication if we have a max
> > > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > > overhead.  If we want to go more, maybe we should make it switch.
> > > That leads to the final question, we need an upper bound since this
> > > does allow consumption of extra kernel memory, what should it be?  A
> > This is the most important question :) If we want to have 1000s of
> > them or 100 is enough?
> 
> We can certainly hit respectable numbers of assigned devices in the
> hundreds.  Worst case is 8 slots per assigned device, typical case is 4
> or less.  So 512 slots would more or less guarantee 64 devices (we do
> need some slots for actual memory), and more typically allow at least
> 128 devices.  Philosophically, supporting a full PCI bus, 256 functions,
> 2048 slots, is an attractive target, but it's probably not practical.
> 
> I think on x86 a slot is 72 bytes w/ alignment padding, so a maximum of
> 36k @512 slots.
> 
> >  Also what about changing kvm_memslots->memslots[]
> > array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
> > will save us good amount of memory for unused slots.
> 
> I'm not following where that results in memory savings.  Can you
> clarify.  Thanks,
> 
We will waste sizeof(void*) for each unused slot instead of
sizeof(struct kvm_memory_slot).

--
			Gleb.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-04 15:30     ` Gleb Natapov
@ 2012-12-04 15:39       ` Alex Williamson
  2012-12-04 16:42         ` Gleb Natapov
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-04 15:39 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm, linux-kernel

On Tue, 2012-12-04 at 17:30 +0200, Gleb Natapov wrote:
> On Tue, Dec 04, 2012 at 08:21:55AM -0700, Alex Williamson wrote:
> > On Tue, 2012-12-04 at 13:48 +0200, Gleb Natapov wrote:
> > > On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > > > Memory slots are currently a fixed resource with a relatively small
> > > > limit.  When using PCI device assignment in a qemu guest it's fairly
> > > > easy to exhaust the number of available slots.  I posted patches
> > > > exploring growing the number of memory slots a while ago, but it was
> > > > prior to caching memory slot array misses and therefore had potentially
> > > > poor performance.  Now that we do that, Avi seemed receptive to
> > > > increasing the memory slot array to arbitrary lengths.  I think we
> > > > still don't want to impose unnecessary kernel memory consumption on
> > > > guests not making use of this, so I present again a growable memory
> > > > slot array.
> > > > 
> > > > A couple notes/questions; in the previous version we had a
> > > > kvm_arch_flush_shadow() call when we increased the number of slots.
> > > > I'm not sure if this is still necessary.  I had also made the x86
> > > > specific slot_bitmap dynamically grow as well and switch between a
> > > > direct bitmap and indirect pointer to a bitmap.  That may have
> > > > contributed to needing the flush.  I haven't done that yet here
> > > > because it seems like an unnecessary complication if we have a max
> > > > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > > > overhead.  If we want to go more, maybe we should make it switch.
> > > > That leads to the final question, we need an upper bound since this
> > > > does allow consumption of extra kernel memory, what should it be?  A
> > > This is the most important question :) If we want to have 1000s of
> > > them or 100 is enough?
> > 
> > We can certainly hit respectable numbers of assigned devices in the
> > hundreds.  Worst case is 8 slots per assigned device, typical case is 4
> > or less.  So 512 slots would more or less guarantee 64 devices (we do
> > need some slots for actual memory), and more typically allow at least
> > 128 devices.  Philosophically, supporting a full PCI bus, 256 functions,
> > 2048 slots, is an attractive target, but it's probably not practical.
> > 
> > I think on x86 a slot is 72 bytes w/ alignment padding, so a maximum of
> > 36k @512 slots.
> > 
> > >  Also what about changing kvm_memslots->memslots[]
> > > array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
> > > will save us good amount of memory for unused slots.
> > 
> > I'm not following where that results in memory savings.  Can you
> > clarify.  Thanks,
> > 
> We will waste sizeof(void*) for each unused slot instead of
> sizeof(struct kvm_memory_slot).

Ah, of course.  That means for 512 slots we're wasting a full page just
for the pointers, whereas we can fit 56 slots in the same space.  Given
that most users get by just fine w/ 32 slots, I don't think that's a win
in the typical case.  Maybe if we want to support sparse arrays, but a
tree would probably be better at that point.  A drawback of the growable
array is that userspace can subvert any savings by using slot N-1 first,
but that's why we put a limit at a reasonable size.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-04 15:39       ` Alex Williamson
@ 2012-12-04 16:42         ` Gleb Natapov
  2012-12-04 17:56           ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Gleb Natapov @ 2012-12-04 16:42 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mtosatti, kvm, linux-kernel

On Tue, Dec 04, 2012 at 08:39:47AM -0700, Alex Williamson wrote:
> On Tue, 2012-12-04 at 17:30 +0200, Gleb Natapov wrote:
> > On Tue, Dec 04, 2012 at 08:21:55AM -0700, Alex Williamson wrote:
> > > On Tue, 2012-12-04 at 13:48 +0200, Gleb Natapov wrote:
> > > > On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > > > > Memory slots are currently a fixed resource with a relatively small
> > > > > limit.  When using PCI device assignment in a qemu guest it's fairly
> > > > > easy to exhaust the number of available slots.  I posted patches
> > > > > exploring growing the number of memory slots a while ago, but it was
> > > > > prior to caching memory slot array misses and therefore had potentially
> > > > > poor performance.  Now that we do that, Avi seemed receptive to
> > > > > increasing the memory slot array to arbitrary lengths.  I think we
> > > > > still don't want to impose unnecessary kernel memory consumption on
> > > > > guests not making use of this, so I present again a growable memory
> > > > > slot array.
> > > > > 
> > > > > A couple notes/questions; in the previous version we had a
> > > > > kvm_arch_flush_shadow() call when we increased the number of slots.
> > > > > I'm not sure if this is still necessary.  I had also made the x86
> > > > > specific slot_bitmap dynamically grow as well and switch between a
> > > > > direct bitmap and indirect pointer to a bitmap.  That may have
> > > > > contributed to needing the flush.  I haven't done that yet here
> > > > > because it seems like an unnecessary complication if we have a max
> > > > > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > > > > overhead.  If we want to go more, maybe we should make it switch.
> > > > > That leads to the final question, we need an upper bound since this
> > > > > does allow consumption of extra kernel memory, what should it be?  A
> > > > This is the most important question :) If we want to have 1000s of
> > > > them or 100 is enough?
> > > 
> > > We can certainly hit respectable numbers of assigned devices in the
> > > hundreds.  Worst case is 8 slots per assigned device, typical case is 4
> > > or less.  So 512 slots would more or less guarantee 64 devices (we do
> > > need some slots for actual memory), and more typically allow at least
> > > 128 devices.  Philosophically, supporting a full PCI bus, 256 functions,
> > > 2048 slots, is an attractive target, but it's probably not practical.
> > > 
> > > I think on x86 a slot is 72 bytes w/ alignment padding, so a maximum of
> > > 36k @512 slots.
> > > 
> > > >  Also what about changing kvm_memslots->memslots[]
> > > > array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
> > > > will save us good amount of memory for unused slots.
> > > 
> > > I'm not following where that results in memory savings.  Can you
> > > clarify.  Thanks,
> > > 
> > We will waste sizeof(void*) for each unused slot instead of
> > sizeof(struct kvm_memory_slot).
> 
> Ah, of course.  That means for 512 slots we're wasting a full page just
> for the pointers, whereas we can fit 56 slots in the same space.  Given
> that most users get by just fine w/ 32 slots, I don't think that's a win
> in the typical case.  Maybe if we want to support sparse arrays, but a
You can look at it differently :). We can increase number of slots to 288
with the same memory footprint we have now. And 288 looks like a lot of
slots.

Memory is cheap and getting cheaper and complicated code stays
complicated. Of course we should not go crazy with wasting memory, but
not go crazy to save each byte either.

> tree would probably be better at that point.  A drawback of the growable
> array is that userspace can subvert any savings by using slot N-1 first,
> but that's why we put a limit at a reasonable size.  Thanks,
> 
Why use the slot as an index?  Maybe change id_to_index[] into a hash
table and use that to map from slot id to array index?

--
			Gleb.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-04 16:42         ` Gleb Natapov
@ 2012-12-04 17:56           ` Alex Williamson
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-04 17:56 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm, linux-kernel

On Tue, 2012-12-04 at 18:42 +0200, Gleb Natapov wrote:
> On Tue, Dec 04, 2012 at 08:39:47AM -0700, Alex Williamson wrote:
> > On Tue, 2012-12-04 at 17:30 +0200, Gleb Natapov wrote:
> > > On Tue, Dec 04, 2012 at 08:21:55AM -0700, Alex Williamson wrote:
> > > > On Tue, 2012-12-04 at 13:48 +0200, Gleb Natapov wrote:
> > > > > On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > > > > > Memory slots are currently a fixed resource with a relatively small
> > > > > > limit.  When using PCI device assignment in a qemu guest it's fairly
> > > > > > easy to exhaust the number of available slots.  I posted patches
> > > > > > exploring growing the number of memory slots a while ago, but it was
> > > > > > prior to caching memory slot array misses and therefore had potentially
> > > > > > poor performance.  Now that we do that, Avi seemed receptive to
> > > > > > increasing the memory slot array to arbitrary lengths.  I think we
> > > > > > still don't want to impose unnecessary kernel memory consumptions on
> > > > > > guests not making use of this, so I present again a growable memory
> > > > > > slot array.
> > > > > > 
> > > > > > A couple notes/questions; in the previous version we had a
> > > > > > kvm_arch_flush_shadow() call when we increased the number of slots.
> > > > > > I'm not sure if this is still necessary.  I had also made the x86
> > > > > > specific slot_bitmap dynamically grow as well and switch between a
> > > > > > direct bitmap and indirect pointer to a bitmap.  That may have
> > > > > > contributed to needing the flush.  I haven't done that yet here
> > > > > > because it seems like an unnecessary complication if we have a max
> > > > > > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > > > > > overhead.  If we want to go more, maybe we should make it switch.
> > > > > > That leads to the final question, we need an upper bound since this
> > > > > > does allow consumption of extra kernel memory, what should it be?  A
> > > > > This is the most important question :) If we want to have 1000s of
> > > > > them or 100 is enough?
> > > > 
> > > > We can certainly hit respectable numbers of assigned devices in the
> > > > hundreds.  Worst case is 8 slots per assigned device, typical case is 4
> > > > or less.  So 512 slots would more or less guarantee 64 devices (we do
> > > > need some slots for actual memory), and more typically allow at least
> > > > 128 devices.  Philosophically, supporting a full PCI bus, 256 functions,
> > > > 2048 slots, is an attractive target, but it's probably no practical.
> > > > 
> > > > I think on x86 a slot is 72 bytes w/ alignment padding, so a maximum of
> > > > 36k @512 slots.
> > > > 
> > > > >  Also what about changing kvm_memslots->memslots[]
> > > > > array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
> > > > > will save us good amount of memory for unused slots.
> > > > 
> > > > I'm not following where that results in memory savings.  Can you
> > > > clarify.  Thanks,
> > > > 
> > > We will waste sizeof(void*) for each unused slot instead of
> > > sizeof(struct kvm_memory_slot).
> > 
> > Ah, of course.  That means for 512 slots we're wasting a full page just
> > for the pointers, whereas we can fit 56 slots in the same space.  Given
> > that most users get by just fine w/ 32 slots, I don't think that's a win
> > in the typical case.  Maybe if we want to support sparse arrays, but a
> You can look at it differently :). We can increase number of slots to 288
> with the same memory footprint we have now. And 288 looks like a lot of
> slots.

I could settle for 288 slots, but...

288*8 = 2304
2304/72 = 32

So yes, we can fit the pointer array into the same size as 32 slots, but
we haven't allocated the slots themselves yet, so the footprint is not
the same.  Are you thinking we'd re-use the slots between generations
and only allocate new/changed ones?  That still comes out to more memory
per update for the typical case, but we're also adding the indirection
overhead.
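
(Roughly, the two layouts being compared -- the struct names are
invented for the comparison and the per-entry sizes are the approximate
x86 figures quoted in this thread:)

struct kvm_memslots_embedded {			/* what we have today */
	u64 generation;
	struct kvm_memory_slot memslots[512];	/* ~72 bytes each, up front */
};

struct kvm_memslots_indirect {			/* the suggestion above */
	u64 generation;
	struct kvm_memory_slot *memslots[512];	/* 8 bytes each up front,
						 * plus a separate allocation
						 * for every slot in use */
};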

> Memory is cheap and getting cheaper and complicated code stays
> complicated. Of course we should not go crazy with wisting memory, but
> not go crazy to save each byte either.

Yep, we could just bump the number of slots and ignore everything else.
512 slots would be 36k on x86.  As above, we're currently using 2.25k.
Is it worth it?  If not, at what point would it be worth it?  50k?
100k?  I think the first 3, maybe 4 patches in this series are still
worthwhile even if we decide not to do anything clever with saving
memory.

> > tree would probably be better at that point.  A drawback of the growable
> > array is that userspace can subvert any savings by using slot N-1 first,
> > but that's why we put a limit at a reasonable size.  Thanks,
> > 
> Why using slot as an index? May be changing id_to_index[] into hash
> table and use that to map from slot id to array index?

It was just to keep things simple, but yes, we could fix it by allocating
only as necessary.  id_to_index would have more overhead, but maybe we
could restrict ourselves to a short (or byte?) for indexes to save
space.  If we assume 512 slots and use a short for id_to_index[], that's
1k overhead, but we can still allocate 17 slots in the same 2304 bytes
we have now, which likely easily covers typical usage.  I can give it a
try.  Thanks,
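
(Spelling that out with the 72-byte slot size used earlier in the
thread:

	512 ids * sizeof(short)		= 1024 bytes of id_to_index[]
	2304 - 1024 = 1280, 1280 / 72	= ~17 slots in today's footprint)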

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 3/6] kvm: Merge id_to_index into memslots
  2012-12-03 23:39 ` [RFC PATCH 3/6] kvm: Merge id_to_index into memslots Alex Williamson
@ 2012-12-05 21:22   ` Marcelo Tosatti
  2012-12-05 22:58     ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-05 21:22 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Mon, Dec 03, 2012 at 04:39:24PM -0700, Alex Williamson wrote:
> This allows us to resize this structure and therefore the number of
> memslots as part of the RCU update.

Why is this necessary? "struct memslots" is updated, message above
conflicts with that.

If there is a reason, "id_to_index" becomes confusing.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array
  2012-12-03 23:39 ` [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array Alex Williamson
@ 2012-12-05 21:24   ` Marcelo Tosatti
  2012-12-05 22:58     ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-05 21:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Mon, Dec 03, 2012 at 04:39:30PM -0700, Alex Williamson wrote:
> In order to make the memslots array grow on demand, move the private
> slots to the lower indexes of the array.  The private slots are
> assumed likely to be in use, so if we didn't do this we'd end up
> allocating the full memslots array all the time.

You're changing private slot ids. I fail to see why that is relevant
to on-demand growing.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-03 23:39 ` [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots Alex Williamson
@ 2012-12-05 21:26   ` Marcelo Tosatti
  2012-12-05 23:02     ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-05 21:26 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Mon, Dec 03, 2012 at 04:39:36PM -0700, Alex Williamson wrote:
> struct kvm_memory_slot is currently 52 bytes (LP64), not counting the
> arch data.  On x86 this means the memslot array to support a tiny 32+3
> entries (user+private) is over 2k.  We'd like to support more slots
> so that we can support more assigned devices, but it doesn't make
> sense to penalize everyone by using a statically allocated array.
> This allows us to start introducing a grow-able array.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  arch/ia64/kvm/kvm-ia64.c     |    2 +-
>  arch/powerpc/kvm/book3s_hv.c |    2 +-
>  arch/x86/kvm/vmx.c           |    1 +
>  arch/x86/kvm/x86.c           |    4 +++-
>  include/linux/kvm_host.h     |    9 ++++++---
>  virt/kvm/kvm_main.c          |   10 ++++++----
>  6 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 012e5dd..96401b5 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  
>  	memslot = id_to_memslot(kvm->memslots, log->slot);
>  	r = -ENOENT;
> -	if (!memslot->dirty_bitmap)
> +	if (!memslots || !memslot->dirty_bitmap)
>  		goto out;
>  
>  	kvm_ia64_sync_dirty_log(kvm, memslot);
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 56067db..0417190 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  
>  	memslot = id_to_memslot(kvm->memslots, log->slot);
>  	r = -ENOENT;
> -	if (!memslot->dirty_bitmap)
> +	if (!memslot || !memslot->dirty_bitmap)
>  		goto out;
>  
>  	n = kvm_dirty_bitmap_bytes(memslot);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2bb9157..07fdd90 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
>  
>  		slots = kvm_memslots(kvm);
>  		slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
> +		BUG_ON(!slot);
>  		base_gfn = slot->base_gfn + slot->npages - 3;
>  
>  		return base_gfn << PAGE_SHIFT;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8765485..53fe9b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  		goto out;
>  
>  	memslot = id_to_memslot(kvm->memslots, log->slot);
> +	r = -ENOENT;
> +	if (!memslot)
> +		goto out;
>  
>  	dirty_bitmap = memslot->dirty_bitmap;
> -	r = -ENOENT;
>  	if (!dirty_bitmap)
>  		goto out;
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7b3d5c4..1955a4e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
>   * to get the memslot by its id.
>   */
>  struct kvm_memslots {
> +	int nmemslots;
>  	u64 generation;
>  	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
>  };
> @@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>  
>  #define kvm_for_each_memslot(memslot, slots)	\
>  	for (memslot = &slots->memslots[0];	\
> -	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> +	      memslot < slots->memslots + slots->nmemslots && memslot->npages;\
>  		memslot++)
>  
>  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
> @@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
>  static inline struct kvm_memory_slot *
>  id_to_memslot(struct kvm_memslots *slots, int id)
>  {
> -	int index = slots->memslots[id].id_to_index;
>  	struct kvm_memory_slot *slot;
>  
> -	slot = &slots->memslots[index];
> +	if (id >= slots->nmemslots)
> +		return NULL;
> +
> +	slot = &slots->memslots[slots->memslots[id].id_to_index];
>  
>  	WARN_ON(slot->id != id);
>  	return slot;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3ce2664..ebd3960 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
>  	int i;
>  	struct kvm_memslots *slots = kvm->memslots;
>  
> -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> +	slots->nmemslots = KVM_MEM_SLOTS_NUM;
> +
> +	for (i = 0; i < kvm->memslots->nmemslots; i++)
>  		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
>  }
>  
> @@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
>  {
>  	int i;
>  
> -	sort(slots->memslots, KVM_MEM_SLOTS_NUM,
> +	sort(slots->memslots, slots->nmemslots,
>  	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
>  
> -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> +	for (i = 0; i < slots->nmemslots; i++)
>  		slots->memslots[slots->memslots[i].id].id_to_index = i;
>  }
>  
> @@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
>  
>  	memslot = id_to_memslot(kvm->memslots, log->slot);
>  	r = -ENOENT;
> -	if (!memslot->dirty_bitmap)
> +	if (!memslot || !memslot->dirty_bitmap)
>  		goto out;
>  
>  	n = kvm_dirty_bitmap_bytes(memslot);

I suppose this should be checked earlier, not at id_to_memslot time.
eg for kvm_get_dirty_log at 

        r = -EINVAL;
        if (log->slot >= KVM_MEMORY_SLOTS)
                goto out;

time


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
                   ` (7 preceding siblings ...)
  2012-12-04 14:48 ` Takuya Yoshikawa
@ 2012-12-05 21:32 ` Marcelo Tosatti
  2012-12-05 22:57   ` Alex Williamson
  8 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-05 21:32 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> Memory slots are currently a fixed resource with a relatively small
> limit.  When using PCI device assignment in a qemu guest it's fairly
> easy to exhaust the number of available slots.  I posted patches
> exploring growing the number of memory slots a while ago, but it was
> prior to caching memory slot array misses and thefore had potentially
> poor performance.  Now that we do that, Avi seemed receptive to
> increasing the memory slot array to arbitrary lengths.  I think we
> still don't want to impose unnecessary kernel memory consumptions on
> guests not making use of this, so I present again a growable memory
> slot array.
> 
> A couple notes/questions; in the previous version we had a
> kvm_arch_flush_shadow() call when we increased the number of slots.
> I'm not sure if this is still necessary.  I had also made the x86
> specific slot_bitmap dynamically grow as well and switch between a
> direct bitmap and indirect pointer to a bitmap.  That may have
> contributed to needing the flush.  

I don't remember. Do you recall what was the argument back then?
(there must have been some).

> I haven't done that yet here
> because it seems like an unnecessary complication if we have a max
> on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> overhead.  If we want to go more, maybe we should make it switch.
> That leads to the final question, we need an upper bound since this
> does allow consumption of extra kernel memory, what should it be?  A
> PCI bus filled with assigned devices can theorically use up to 2048
> slots (32 devices * 8 functions * (6 BARs + ROM + possibly split
> MSI-X BAR)).  For this RFC, I don't change the max, just make it
> grow up to 32 user slots.  Untested on anything but x86 so far.
> Thanks,

Not sure. Some reasonable number based on current usage expectations?
(can be increased later if necessary).


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/6] kvm: Growable memory slot array
  2012-12-05 21:32 ` Marcelo Tosatti
@ 2012-12-05 22:57   ` Alex Williamson
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-05 22:57 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, linux-kernel

On Wed, 2012-12-05 at 19:32 -0200, Marcelo Tosatti wrote:
> On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > Memory slots are currently a fixed resource with a relatively small
> > limit.  When using PCI device assignment in a qemu guest it's fairly
> > easy to exhaust the number of available slots.  I posted patches
> > exploring growing the number of memory slots a while ago, but it was
> > prior to caching memory slot array misses and thefore had potentially
> > poor performance.  Now that we do that, Avi seemed receptive to
> > increasing the memory slot array to arbitrary lengths.  I think we
> > still don't want to impose unnecessary kernel memory consumptions on
> > guests not making use of this, so I present again a growable memory
> > slot array.
> > 
> > A couple notes/questions; in the previous version we had a
> > kvm_arch_flush_shadow() call when we increased the number of slots.
> > I'm not sure if this is still necessary.  I had also made the x86
> > specific slot_bitmap dynamically grow as well and switch between a
> > direct bitmap and indirect pointer to a bitmap.  That may have
> > contributed to needing the flush.  
> 
> I don't remember. Do you recall what was the argument back then?
> (there must have been some).

I vaguely recall chatting with you on irc about it before posting, so
unfortunately there's no list discussion.  It's been almost 2 years, so
it's not surprising we've all forgotten.  Here's the original post:

http://article.gmane.org/gmane.linux.kernel/1103962

(click on the subject to get to the thread)  That version also included
an optimization to the x86-only slot_bitmap, and it's entirely possible
the flush had more to do with that than the memslots themselves.  I
think Avi alludes to this in his first reply: the flushing is more
aggressive than necessary, and he indicates it could happen only when
crossing BITS_PER_LONG boundaries.

> > I haven't done that yet here
> > because it seems like an unnecessary complication if we have a max
> > on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
> > overhead.  If we want to go more, maybe we should make it switch.
> > That leads to the final question, we need an upper bound since this
> > does allow consumption of extra kernel memory, what should it be?  A
> > PCI bus filled with assigned devices can theorically use up to 2048
> > slots (32 devices * 8 functions * (6 BARs + ROM + possibly split
> > MSI-X BAR)).  For this RFC, I don't change the max, just make it
> > grow up to 32 user slots.  Untested on anything but x86 so far.
> > Thanks,
> 
> Not sure. Some reasonable number based on current usage expectations?
> (can be increased later if necessary).

The first obvious step is to double it to 64 slots.  With typical
devices, that would give us 16+ assigned devices.  There are already
people bumping into the 8 device limit we set in RHEL, so doubling it
doesn't feel like much headroom.

If we double again to 128 slots then we can likely support 32 typical
devices.  That's a full PCI bus of single function devices.  That's
probably the first acceptable step.

It looks like each slot on x86_64 is 64 bytes (somehow I was throwing
around 72 bytes before, not sure where I counted wrong), so we currently
have:

32 user + 4 private slots = 36*64 = 2304
32+4 id_to_index = 36*4 = 144
32+4 entry slot_bitmap = 8
Total = 2456

At 132 (128+4), this becomes 8448 + 528 + 24 = 9000 bytes

We can actually compact struct kvm_memory_slot down to 56 bytes (flags
-> u32, user_alloc -> bool, id -> short), which also cuts id_to_index in
half, so that gives us: 7392 + 264 + 24 = 7680

(I might sacrifice a couple user slots just to make these powers of 2,
ie. 124 user + 4 private = 128, 7440 bytes)
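
(A sketch of the compaction being described; the field list is
paraphrased from the 3.7-era structure, the arch data is omitted and
the final size depends on padding, so this is illustrative rather than
exact:)

struct kvm_memory_slot {
	gfn_t base_gfn;
	unsigned long npages;
	unsigned long *dirty_bitmap;
	/* struct kvm_arch_memory_slot arch; */
	unsigned long userspace_addr;
	u32 flags;		/* was unsigned long */
	short id;		/* was int */
	short id_to_index;	/* was int, merged in by patch 3/6 */
	bool user_alloc;	/* was int */
};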

Should we target that as a first step and ignore all this extra
complication?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 3/6] kvm: Merge id_to_index into memslots
  2012-12-05 21:22   ` Marcelo Tosatti
@ 2012-12-05 22:58     ` Alex Williamson
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-05 22:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, linux-kernel

On Wed, 2012-12-05 at 19:22 -0200, Marcelo Tosatti wrote:
> On Mon, Dec 03, 2012 at 04:39:24PM -0700, Alex Williamson wrote:
> > This allows us to resize this structure and therefore the number of
> > memslots as part of the RCU update.
> 
> Why is this necessary? "struct memslots" is updated, message above
> conflicts with that.
> 
> If there is a reason, "id_to_index" becomes confusing.
> 

I can't have two indeterminately sized arrays in the same structure, so by
moving id_to_index into struct kvm_memory_slot I get back to one array.
I'm playing with whether this is worthwhile.  If we reduce id_to_index
to a short then we can leave it with a static size and still get a
memory savings for the typical case.  Thanks,
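
(One possible shape of that alternative -- id_to_index[] stays a
statically sized array, but of shorts, so only the slot array itself
has to grow.  The flexible-array layout and the nmemslots placement are
assumptions for the sketch, not code from the series:)

struct kvm_memslots {
	u64 generation;
	int nmemslots;				/* entries actually allocated */
	short id_to_index[KVM_MEM_SLOTS_NUM];	/* 2 bytes per possible id */
	struct kvm_memory_slot memslots[];	/* grown on demand */
};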

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array
  2012-12-05 21:24   ` Marcelo Tosatti
@ 2012-12-05 22:58     ` Alex Williamson
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-05 22:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, linux-kernel

On Wed, 2012-12-05 at 19:24 -0200, Marcelo Tosatti wrote:
> On Mon, Dec 03, 2012 at 04:39:30PM -0700, Alex Williamson wrote:
> > In order to make the memslots array grow on demand, move the private
> > slots to the lower indexes of the array.  The private slots are
> > assumed likely to be in use, so if we didn't do this we'd end up
> > allocating the full memslots array all the time.
> 
> You're changing private slot ids. Fail to see why is that relevant
> to on-demand growing.

This series takes a really simple approach and grows the array up to the
given slot->id.  If we have private slots at the end of the array then
we always allocate the whole thing.  I'm playing with a version now that
fixes this, but complexity becomes a much more significant factor.
Thanks,
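
(A very rough sketch of "grow up to the given slot->id", assuming a
flexible memslots[] tail rather than the fixed array in the posted
series.  The real path would also need the id_to_index fixup, the sort
and the RCU switch-over; the function name is made up:)

static struct kvm_memslots *grow_memslots(struct kvm_memslots *old, int id)
{
	int want = max(old->nmemslots, id + 1);
	struct kvm_memslots *new;

	new = kzalloc(sizeof(*new) +
		      want * sizeof(struct kvm_memory_slot), GFP_KERNEL);
	if (!new)
		return NULL;

	/* copy the header plus every slot that already exists */
	memcpy(new, old, sizeof(*old) +
	       old->nmemslots * sizeof(struct kvm_memory_slot));
	new->nmemslots = want;
	return new;
}

The caller would then install the new copy with rcu_assign_pointer()
and free the old one after a grace period, as the existing update path
already does.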

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-05 21:26   ` Marcelo Tosatti
@ 2012-12-05 23:02     ` Alex Williamson
  2012-12-06  1:45       ` Marcelo Tosatti
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-05 23:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, linux-kernel

On Wed, 2012-12-05 at 19:26 -0200, Marcelo Tosatti wrote:
> On Mon, Dec 03, 2012 at 04:39:36PM -0700, Alex Williamson wrote:
> > struct kvm_memory_slot is currently 52 bytes (LP64), not counting the
> > arch data.  On x86 this means the memslot array to support a tiny 32+3
> > entries (user+private) is over 2k.  We'd like to support more slots
> > so that we can support more assigned devices, but it doesn't make
> > sense to penalize everyone by using a statically allocated array.
> > This allows us to start introducing a grow-able array.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  arch/ia64/kvm/kvm-ia64.c     |    2 +-
> >  arch/powerpc/kvm/book3s_hv.c |    2 +-
> >  arch/x86/kvm/vmx.c           |    1 +
> >  arch/x86/kvm/x86.c           |    4 +++-
> >  include/linux/kvm_host.h     |    9 ++++++---
> >  virt/kvm/kvm_main.c          |   10 ++++++----
> >  6 files changed, 18 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> > index 012e5dd..96401b5 100644
> > --- a/arch/ia64/kvm/kvm-ia64.c
> > +++ b/arch/ia64/kvm/kvm-ia64.c
> > @@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> >  
> >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> >  	r = -ENOENT;
> > -	if (!memslot->dirty_bitmap)
> > +	if (!memslots || !memslot->dirty_bitmap)
> >  		goto out;
> >  
> >  	kvm_ia64_sync_dirty_log(kvm, memslot);
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 56067db..0417190 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> >  
> >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> >  	r = -ENOENT;
> > -	if (!memslot->dirty_bitmap)
> > +	if (!memslot || !memslot->dirty_bitmap)
> >  		goto out;
> >  
> >  	n = kvm_dirty_bitmap_bytes(memslot);
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 2bb9157..07fdd90 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
> >  
> >  		slots = kvm_memslots(kvm);
> >  		slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
> > +		BUG_ON(!slot);
> >  		base_gfn = slot->base_gfn + slot->npages - 3;
> >  
> >  		return base_gfn << PAGE_SHIFT;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 8765485..53fe9b2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> >  		goto out;
> >  
> >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > +	r = -ENOENT;
> > +	if (!memslot)
> > +		goto out;
> >  
> >  	dirty_bitmap = memslot->dirty_bitmap;
> > -	r = -ENOENT;
> >  	if (!dirty_bitmap)
> >  		goto out;
> >  
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 7b3d5c4..1955a4e 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
> >   * to get the memslot by its id.
> >   */
> >  struct kvm_memslots {
> > +	int nmemslots;
> >  	u64 generation;
> >  	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
> >  };
> > @@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> >  
> >  #define kvm_for_each_memslot(memslot, slots)	\
> >  	for (memslot = &slots->memslots[0];	\
> > -	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> > +	      memslot < slots->memslots + slots->nmemslots && memslot->npages;\
> >  		memslot++)
> >  
> >  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
> > @@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> >  static inline struct kvm_memory_slot *
> >  id_to_memslot(struct kvm_memslots *slots, int id)
> >  {
> > -	int index = slots->memslots[id].id_to_index;
> >  	struct kvm_memory_slot *slot;
> >  
> > -	slot = &slots->memslots[index];
> > +	if (id >= slots->nmemslots)
> > +		return NULL;
> > +
> > +	slot = &slots->memslots[slots->memslots[id].id_to_index];
> >  
> >  	WARN_ON(slot->id != id);
> >  	return slot;
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 3ce2664..ebd3960 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
> >  	int i;
> >  	struct kvm_memslots *slots = kvm->memslots;
> >  
> > -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > +	slots->nmemslots = KVM_MEM_SLOTS_NUM;
> > +
> > +	for (i = 0; i < kvm->memslots->nmemslots; i++)
> >  		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
> >  }
> >  
> > @@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
> >  {
> >  	int i;
> >  
> > -	sort(slots->memslots, KVM_MEM_SLOTS_NUM,
> > +	sort(slots->memslots, slots->nmemslots,
> >  	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
> >  
> > -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > +	for (i = 0; i < slots->nmemslots; i++)
> >  		slots->memslots[slots->memslots[i].id].id_to_index = i;
> >  }
> >  
> > @@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
> >  
> >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> >  	r = -ENOENT;
> > -	if (!memslot->dirty_bitmap)
> > +	if (!memslot || !memslot->dirty_bitmap)
> >  		goto out;
> >  
> >  	n = kvm_dirty_bitmap_bytes(memslot);
> 
> I suppose this should be checked earlier, not at id_to_memslot time.
> eg for kvm_get_dirty_log at 
> 
>         r = -EINVAL;
>         if (log->slot >= KVM_MEMORY_SLOTS)
>                 goto out;
> 
> time

id_to_memslot seems like a good place to catch all the users since
that's the only way to get a slot from a slot id after the array is
sorted.  We need to check both whether the slot is in bounds (EINVAL)
and whether it is allocated (ENOENT).  id_to_memslot could do both of
these if we wanted to switch it to ERR_PTR.  Thanks,
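
(A sketch of that ERR_PTR flavour, built on the patch's version of
id_to_memslot().  Using npages as the "is it allocated" test is an
assumption for the example, not something taken from the series:)

static inline struct kvm_memory_slot *
id_to_memslot(struct kvm_memslots *slots, int id)
{
	struct kvm_memory_slot *slot;

	if (id >= slots->nmemslots)
		return ERR_PTR(-EINVAL);	/* id out of bounds */

	slot = &slots->memslots[slots->memslots[id].id_to_index];
	if (!slot->npages)
		return ERR_PTR(-ENOENT);	/* valid id, slot never set up */

	WARN_ON(slot->id != id);
	return slot;
}

Callers would then propagate PTR_ERR(slot) instead of choosing between
-EINVAL and -ENOENT themselves.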

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-05 23:02     ` Alex Williamson
@ 2012-12-06  1:45       ` Marcelo Tosatti
  2012-12-06  3:51         ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-06  1:45 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Wed, Dec 05, 2012 at 04:02:53PM -0700, Alex Williamson wrote:
> On Wed, 2012-12-05 at 19:26 -0200, Marcelo Tosatti wrote:
> > On Mon, Dec 03, 2012 at 04:39:36PM -0700, Alex Williamson wrote:
> > > struct kvm_memory_slot is currently 52 bytes (LP64), not counting the
> > > arch data.  On x86 this means the memslot array to support a tiny 32+3
> > > entries (user+private) is over 2k.  We'd like to support more slots
> > > so that we can support more assigned devices, but it doesn't make
> > > sense to penalize everyone by using a statically allocated array.
> > > This allows us to start introducing a grow-able array.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >  arch/ia64/kvm/kvm-ia64.c     |    2 +-
> > >  arch/powerpc/kvm/book3s_hv.c |    2 +-
> > >  arch/x86/kvm/vmx.c           |    1 +
> > >  arch/x86/kvm/x86.c           |    4 +++-
> > >  include/linux/kvm_host.h     |    9 ++++++---
> > >  virt/kvm/kvm_main.c          |   10 ++++++----
> > >  6 files changed, 18 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> > > index 012e5dd..96401b5 100644
> > > --- a/arch/ia64/kvm/kvm-ia64.c
> > > +++ b/arch/ia64/kvm/kvm-ia64.c
> > > @@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> > >  
> > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > >  	r = -ENOENT;
> > > -	if (!memslot->dirty_bitmap)
> > > +	if (!memslots || !memslot->dirty_bitmap)
> > >  		goto out;
> > >  
> > >  	kvm_ia64_sync_dirty_log(kvm, memslot);
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index 56067db..0417190 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> > >  
> > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > >  	r = -ENOENT;
> > > -	if (!memslot->dirty_bitmap)
> > > +	if (!memslot || !memslot->dirty_bitmap)
> > >  		goto out;
> > >  
> > >  	n = kvm_dirty_bitmap_bytes(memslot);
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 2bb9157..07fdd90 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
> > >  
> > >  		slots = kvm_memslots(kvm);
> > >  		slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
> > > +		BUG_ON(!slot);
> > >  		base_gfn = slot->base_gfn + slot->npages - 3;
> > >  
> > >  		return base_gfn << PAGE_SHIFT;
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 8765485..53fe9b2 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> > >  		goto out;
> > >  
> > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > > +	r = -ENOENT;
> > > +	if (!memslot)
> > > +		goto out;
> > >  
> > >  	dirty_bitmap = memslot->dirty_bitmap;
> > > -	r = -ENOENT;
> > >  	if (!dirty_bitmap)
> > >  		goto out;
> > >  
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 7b3d5c4..1955a4e 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
> > >   * to get the memslot by its id.
> > >   */
> > >  struct kvm_memslots {
> > > +	int nmemslots;
> > >  	u64 generation;
> > >  	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
> > >  };
> > > @@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> > >  
> > >  #define kvm_for_each_memslot(memslot, slots)	\
> > >  	for (memslot = &slots->memslots[0];	\
> > > -	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> > > +	      memslot < slots->memslots + slots->nmemslots && memslot->npages;\
> > >  		memslot++)
> > >  
> > >  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
> > > @@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> > >  static inline struct kvm_memory_slot *
> > >  id_to_memslot(struct kvm_memslots *slots, int id)
> > >  {
> > > -	int index = slots->memslots[id].id_to_index;
> > >  	struct kvm_memory_slot *slot;
> > >  
> > > -	slot = &slots->memslots[index];
> > > +	if (id >= slots->nmemslots)
> > > +		return NULL;
> > > +
> > > +	slot = &slots->memslots[slots->memslots[id].id_to_index];
> > >  
> > >  	WARN_ON(slot->id != id);
> > >  	return slot;
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 3ce2664..ebd3960 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
> > >  	int i;
> > >  	struct kvm_memslots *slots = kvm->memslots;
> > >  
> > > -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > > +	slots->nmemslots = KVM_MEM_SLOTS_NUM;
> > > +
> > > +	for (i = 0; i < kvm->memslots->nmemslots; i++)
> > >  		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
> > >  }
> > >  
> > > @@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
> > >  {
> > >  	int i;
> > >  
> > > -	sort(slots->memslots, KVM_MEM_SLOTS_NUM,
> > > +	sort(slots->memslots, slots->nmemslots,
> > >  	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
> > >  
> > > -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > > +	for (i = 0; i < slots->nmemslots; i++)
> > >  		slots->memslots[slots->memslots[i].id].id_to_index = i;
> > >  }
> > >  
> > > @@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
> > >  
> > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > >  	r = -ENOENT;
> > > -	if (!memslot->dirty_bitmap)
> > > +	if (!memslot || !memslot->dirty_bitmap)
> > >  		goto out;
> > >  
> > >  	n = kvm_dirty_bitmap_bytes(memslot);
> > 
> > I suppose this should be checked earlier, not at id_to_memslot time.
> > eg for kvm_get_dirty_log at 
> > 
> >         r = -EINVAL;
> >         if (log->slot >= KVM_MEMORY_SLOTS)
> >                 goto out;
> > 
> > time
> 
> id_to_memslot seems like a good place to catch all the users since
> that's the only way to get a slot from a slot id after the array is
> sorted.  We need to check both is the slot in bounds (EINVAL), but also
> is it allocated (ENOENT).  id_to_memslot could both of these if we
> wanted to switch it to ERR_PTR.  Thanks,
> 
> Alex

There should never be a reference to a slot out of bounds by KVM itself
(BUG_ON). Only userspace can attempt a reference to such a slot.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-06  1:45       ` Marcelo Tosatti
@ 2012-12-06  3:51         ` Alex Williamson
  2012-12-06 23:58           ` Marcelo Tosatti
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2012-12-06  3:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, linux-kernel

On Wed, 2012-12-05 at 23:45 -0200, Marcelo Tosatti wrote:
> On Wed, Dec 05, 2012 at 04:02:53PM -0700, Alex Williamson wrote:
> > On Wed, 2012-12-05 at 19:26 -0200, Marcelo Tosatti wrote:
> > > On Mon, Dec 03, 2012 at 04:39:36PM -0700, Alex Williamson wrote:
> > > > struct kvm_memory_slot is currently 52 bytes (LP64), not counting the
> > > > arch data.  On x86 this means the memslot array to support a tiny 32+3
> > > > entries (user+private) is over 2k.  We'd like to support more slots
> > > > so that we can support more assigned devices, but it doesn't make
> > > > sense to penalize everyone by using a statically allocated array.
> > > > This allows us to start introducing a grow-able array.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > >  arch/ia64/kvm/kvm-ia64.c     |    2 +-
> > > >  arch/powerpc/kvm/book3s_hv.c |    2 +-
> > > >  arch/x86/kvm/vmx.c           |    1 +
> > > >  arch/x86/kvm/x86.c           |    4 +++-
> > > >  include/linux/kvm_host.h     |    9 ++++++---
> > > >  virt/kvm/kvm_main.c          |   10 ++++++----
> > > >  6 files changed, 18 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> > > > index 012e5dd..96401b5 100644
> > > > --- a/arch/ia64/kvm/kvm-ia64.c
> > > > +++ b/arch/ia64/kvm/kvm-ia64.c
> > > > @@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> > > >  
> > > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > > >  	r = -ENOENT;
> > > > -	if (!memslot->dirty_bitmap)
> > > > +	if (!memslots || !memslot->dirty_bitmap)
> > > >  		goto out;
> > > >  
> > > >  	kvm_ia64_sync_dirty_log(kvm, memslot);
> > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > > index 56067db..0417190 100644
> > > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > > @@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> > > >  
> > > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > > >  	r = -ENOENT;
> > > > -	if (!memslot->dirty_bitmap)
> > > > +	if (!memslot || !memslot->dirty_bitmap)
> > > >  		goto out;
> > > >  
> > > >  	n = kvm_dirty_bitmap_bytes(memslot);
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > index 2bb9157..07fdd90 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
> > > >  
> > > >  		slots = kvm_memslots(kvm);
> > > >  		slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
> > > > +		BUG_ON(!slot);
> > > >  		base_gfn = slot->base_gfn + slot->npages - 3;
> > > >  
> > > >  		return base_gfn << PAGE_SHIFT;
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 8765485..53fe9b2 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> > > >  		goto out;
> > > >  
> > > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > > > +	r = -ENOENT;
> > > > +	if (!memslot)
> > > > +		goto out;
> > > >  
> > > >  	dirty_bitmap = memslot->dirty_bitmap;
> > > > -	r = -ENOENT;
> > > >  	if (!dirty_bitmap)
> > > >  		goto out;
> > > >  
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index 7b3d5c4..1955a4e 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
> > > >   * to get the memslot by its id.
> > > >   */
> > > >  struct kvm_memslots {
> > > > +	int nmemslots;
> > > >  	u64 generation;
> > > >  	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
> > > >  };
> > > > @@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> > > >  
> > > >  #define kvm_for_each_memslot(memslot, slots)	\
> > > >  	for (memslot = &slots->memslots[0];	\
> > > > -	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> > > > +	      memslot < slots->memslots + slots->nmemslots && memslot->npages;\
> > > >  		memslot++)
> > > >  
> > > >  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
> > > > @@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> > > >  static inline struct kvm_memory_slot *
> > > >  id_to_memslot(struct kvm_memslots *slots, int id)
> > > >  {
> > > > -	int index = slots->memslots[id].id_to_index;
> > > >  	struct kvm_memory_slot *slot;
> > > >  
> > > > -	slot = &slots->memslots[index];
> > > > +	if (id >= slots->nmemslots)
> > > > +		return NULL;
> > > > +
> > > > +	slot = &slots->memslots[slots->memslots[id].id_to_index];
> > > >  
> > > >  	WARN_ON(slot->id != id);
> > > >  	return slot;
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 3ce2664..ebd3960 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
> > > >  	int i;
> > > >  	struct kvm_memslots *slots = kvm->memslots;
> > > >  
> > > > -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > > > +	slots->nmemslots = KVM_MEM_SLOTS_NUM;
> > > > +
> > > > +	for (i = 0; i < kvm->memslots->nmemslots; i++)
> > > >  		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
> > > >  }
> > > >  
> > > > @@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
> > > >  {
> > > >  	int i;
> > > >  
> > > > -	sort(slots->memslots, KVM_MEM_SLOTS_NUM,
> > > > +	sort(slots->memslots, slots->nmemslots,
> > > >  	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
> > > >  
> > > > -	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > > > +	for (i = 0; i < slots->nmemslots; i++)
> > > >  		slots->memslots[slots->memslots[i].id].id_to_index = i;
> > > >  }
> > > >  
> > > > @@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
> > > >  
> > > >  	memslot = id_to_memslot(kvm->memslots, log->slot);
> > > >  	r = -ENOENT;
> > > > -	if (!memslot->dirty_bitmap)
> > > > +	if (!memslot || !memslot->dirty_bitmap)
> > > >  		goto out;
> > > >  
> > > >  	n = kvm_dirty_bitmap_bytes(memslot);
> > > 
> > > I suppose this should be checked earlier, not at id_to_memslot time.
> > > eg for kvm_get_dirty_log at 
> > > 
> > >         r = -EINVAL;
> > >         if (log->slot >= KVM_MEMORY_SLOTS)
> > >                 goto out;
> > > 
> > > time
> > 
> > id_to_memslot seems like a good place to catch all the users since
> > that's the only way to get a slot from a slot id after the array is
> > sorted.  We need to check both is the slot in bounds (EINVAL), but also
> > is it allocated (ENOENT).  id_to_memslot could both of these if we
> > wanted to switch it to ERR_PTR.  Thanks,
> > 
> > Alex
> 
> There should never be a reference to a slot out of bounds by KVM itself
> (BUG_ON). Only userspace can attempt a reference to such slot.

If I understand correctly, you're saying this last chunk is unique
because kvm_get_dirty_log() is an internal interface and the test should
be restricted to callers from userspace interfaces, namely
kvm_vm_ioctl_get_dirty_log().  That sounds reasonable; book3s_pr seems
to be the only caller that relies on kvm_get_dirty_log() validating the
slot.  Thanks,
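
(Sketching that split -- the user-facing ioctls validate log->slot
before calling in, and id_to_memslot() only warns, since an
out-of-range id from inside KVM would be a bug.  Illustrative only, not
the code from the series:)

/* In kvm_vm_ioctl_get_dirty_log() and friends, before calling in:
 *
 *	r = -EINVAL;
 *	if (log->slot >= KVM_USER_MEM_SLOTS)
 *		goto out;
 *
 * leaving id_to_memslot() itself to warn on what would be a KVM bug:
 */
static inline struct kvm_memory_slot *
id_to_memslot(struct kvm_memslots *slots, int id)
{
	struct kvm_memory_slot *slot;

	if (WARN_ON(id >= slots->nmemslots))
		return NULL;

	slot = &slots->memslots[slots->memslots[id].id_to_index];
	WARN_ON(slot->id != id);
	return slot;
}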

Alex


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-06  3:51         ` Alex Williamson
@ 2012-12-06 23:58           ` Marcelo Tosatti
  2012-12-06 23:59             ` Marcelo Tosatti
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-06 23:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Wed, Dec 05, 2012 at 08:51:37PM -0700, Alex Williamson wrote:
> > > id_to_memslot seems like a good place to catch all the users since
> > > that's the only way to get a slot from a slot id after the array is
> > > sorted.  We need to check both is the slot in bounds (EINVAL), but also
> > > is it allocated (ENOENT).  id_to_memslot could both of these if we
> > > wanted to switch it to ERR_PTR.  Thanks,
> > > 
> > > Alex
> > 
> > There should never be a reference to a slot out of bounds by KVM itself
> > (BUG_ON). Only userspace can attempt a reference to such slot.
> 
> If I understand correctly, you're saying this last chunk is unique
> because kvm_get_dirty_log() is an internal interface and the test should
> be restricted to callers from userspace interfaces, namely
> kvm_vm_ioctl_get_dirty_log().  That sounds reasonable; book3s_pr seems
> to be the only caller that relies on kvm_get_dirty_log() validating the
> slot.  Thanks,
> 
> Alex

Yep - so you can move the check to such userspace interfaces, and bug on 
on WARN otherwise (in id_to_memslot).

Does that make sense??


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-06 23:58           ` Marcelo Tosatti
@ 2012-12-06 23:59             ` Marcelo Tosatti
  2012-12-07  0:07               ` Alex Williamson
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2012-12-06 23:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, gleb, linux-kernel

On Thu, Dec 06, 2012 at 09:58:48PM -0200, Marcelo Tosatti wrote:
> On Wed, Dec 05, 2012 at 08:51:37PM -0700, Alex Williamson wrote:
> > > > id_to_memslot seems like a good place to catch all the users since
> > > > that's the only way to get a slot from a slot id after the array is
> > > > sorted.  We need to check both is the slot in bounds (EINVAL), but also
> > > > is it allocated (ENOENT).  id_to_memslot could both of these if we
> > > > wanted to switch it to ERR_PTR.  Thanks,
> > > > 
> > > > Alex
> > > 
> > > There should never be a reference to a slot out of bounds by KVM itself
> > > (BUG_ON). Only userspace can attempt a reference to such slot.
> > 
> > If I understand correctly, you're saying this last chunk is unique
> > because kvm_get_dirty_log() is an internal interface and the test should
> > be restricted to callers from userspace interfaces, namely
> > kvm_vm_ioctl_get_dirty_log().  That sounds reasonable; book3s_pr seems
> > to be the only caller that relies on kvm_get_dirty_log() validating the
> > slot.  Thanks,
> > 
> > Alex
> 
> Yep - so you can move the check to such userspace interfaces, and bug on 
> on WARN otherwise (in id_to_memslot).

WARN_ON. The point is, if it's not a valid condition, it should be
explicitly so.

> Does that make sense??
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
  2012-12-06 23:59             ` Marcelo Tosatti
@ 2012-12-07  0:07               ` Alex Williamson
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2012-12-07  0:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, linux-kernel

On Thu, 2012-12-06 at 21:59 -0200, Marcelo Tosatti wrote:
> On Thu, Dec 06, 2012 at 09:58:48PM -0200, Marcelo Tosatti wrote:
> > On Wed, Dec 05, 2012 at 08:51:37PM -0700, Alex Williamson wrote:
> > > > > id_to_memslot seems like a good place to catch all the users since
> > > > > that's the only way to get a slot from a slot id after the array is
> > > > > sorted.  We need to check both is the slot in bounds (EINVAL), but also
> > > > > is it allocated (ENOENT).  id_to_memslot could both of these if we
> > > > > wanted to switch it to ERR_PTR.  Thanks,
> > > > > 
> > > > > Alex
> > > > 
> > > > There should never be a reference to a slot out of bounds by KVM itself
> > > > (BUG_ON). Only userspace can attempt a reference to such slot.
> > > 
> > > If I understand correctly, you're saying this last chunk is unique
> > > because kvm_get_dirty_log() is an internal interface and the test should
> > > be restricted to callers from userspace interfaces, namely
> > > kvm_vm_ioctl_get_dirty_log().  That sounds reasonable; book3s_pr seems
> > > to be the only caller that relies on kvm_get_dirty_log() validating the
> > > slot.  Thanks,
> > > 
> > > Alex
> > 
> > Yep - so you can move the check to such userspace interfaces, and bug on 
> > on WARN otherwise (in id_to_memslot).
> 
> WARN_ON. The point is, if its not a valid condition, it should be
> explicitly so.
> 
> > Does that make sense??

Yep, I'll add that if we decide to go that route.  This patch isn't
necessary with the series I just posted since the array is still static.
Thanks,

Alex



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-12-07  0:07 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-03 23:39 [RFC PATCH 0/6] kvm: Growable memory slot array Alex Williamson
2012-12-03 23:39 ` [RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS Alex Williamson
2012-12-03 23:39 ` [RFC PATCH 2/6] kvm: Make KVM_PRIVATE_MEM_SLOTS optional Alex Williamson
2012-12-03 23:39 ` [RFC PATCH 3/6] kvm: Merge id_to_index into memslots Alex Williamson
2012-12-05 21:22   ` Marcelo Tosatti
2012-12-05 22:58     ` Alex Williamson
2012-12-03 23:39 ` [RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array Alex Williamson
2012-12-05 21:24   ` Marcelo Tosatti
2012-12-05 22:58     ` Alex Williamson
2012-12-03 23:39 ` [RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots Alex Williamson
2012-12-05 21:26   ` Marcelo Tosatti
2012-12-05 23:02     ` Alex Williamson
2012-12-06  1:45       ` Marcelo Tosatti
2012-12-06  3:51         ` Alex Williamson
2012-12-06 23:58           ` Marcelo Tosatti
2012-12-06 23:59             ` Marcelo Tosatti
2012-12-07  0:07               ` Alex Williamson
2012-12-03 23:39 ` [RFC PATCH 6/6] kvm: Allow memory slots to grow Alex Williamson
2012-12-04 11:48 ` [RFC PATCH 0/6] kvm: Growable memory slot array Gleb Natapov
2012-12-04 15:21   ` Alex Williamson
2012-12-04 15:30     ` Gleb Natapov
2012-12-04 15:39       ` Alex Williamson
2012-12-04 16:42         ` Gleb Natapov
2012-12-04 17:56           ` Alex Williamson
2012-12-04 14:48 ` Takuya Yoshikawa
2012-12-04 15:26   ` Alex Williamson
2012-12-05 21:32 ` Marcelo Tosatti
2012-12-05 22:57   ` Alex Williamson
