* [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
@ 2011-01-21 23:48 Alex Williamson
  2011-01-21 23:48 ` [RFC PATCH 1/2] kvm: Allow querying free slots Alex Williamson
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Alex Williamson @ 2011-01-21 23:48 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, ddutile, mst, avi, chrisw, jan.kiszka

When doing device assignment, we use cpu_register_physical_memory() to
directly map the qemu mmap of the device resource into the address
space of the guest.  The unadvertised feature of the register physical
memory code path on kvm, at least for this type of mapping, is that it
needs to allocate an index from a small, fixed array of memory slots.
Even better, if it can't get an index, the code aborts deep in the
kvm specific bits, preventing the caller from having a chance to
recover.

It's really easy to hit this by hot adding too many assigned devices
to a guest (pretty easy to hit with too many devices at instantiation
time too, but the abort is slightly more bearable there).

I'm assuming it's pretty difficult to make the memory slot array
dynamically sized.  If that's not the case, please let me know as
that would be a much better solution.

I'm not terribly happy with the solution in this series; it doesn't
provide any guarantee that a cpu_register_physical_memory() call will
succeed, only slightly better educated guesses.

Are there better ideas how we could solve this?  Thanks,

Alex

---

Alex Williamson (2):
      device-assignment: Count required kvm memory slots
      kvm: Allow querying free slots


 hw/device-assignment.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++-
 hw/device-assignment.h |    3 ++
 kvm-all.c              |   16 +++++++++++++
 kvm.h                  |    2 ++
 4 files changed, 79 insertions(+), 1 deletions(-)
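
For illustration, a minimal C sketch of the failure mode described above.
This approximates qemu-kvm's slot allocator of this era (it is not part of
the series): once the fixed slots[] array is exhausted, the allocation
aborts and the caller never gets a chance to recover.

    static KVMSlot *kvm_alloc_slot(KVMState *s)
    {
        int i;

        for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
            /* skip the range reserved for KVM private memory slots */
            if (i >= 8 && i < 12) {
                continue;
            }
            /* a slot with no size is unused and can be handed out */
            if (!s->slots[i].memory_size) {
                return &s->slots[i];
            }
        }

        /* no free entry: abort deep inside the kvm code, no way to recover */
        abort();
    }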


* [RFC PATCH 1/2] kvm: Allow querying free slots
  2011-01-21 23:48 [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Alex Williamson
@ 2011-01-21 23:48 ` Alex Williamson
  2011-01-21 23:48 ` [RFC PATCH 2/2] device-assignment: Count required kvm memory slots Alex Williamson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 50+ messages in thread
From: Alex Williamson @ 2011-01-21 23:48 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, ddutile, mst, avi, chrisw, jan.kiszka

KVM memory slots are used any place we want a guest to have direct
access to a chunk of memory.  Unfortunately, there's only a small,
fixed number of them, and accidentally going over the limit causes
an abort.  Add a trivial interface so that callers can at least
guess if they have a chance to successfully map memory.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 kvm-all.c |   16 ++++++++++++++++
 kvm.h     |    2 ++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2f203dd..4fe3631 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -96,6 +96,22 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
     abort();
 }
 
+int kvm_free_slots(void)
+{
+    KVMState *s = kvm_state;
+    int i, j;
+
+    for (i = 0, j = 0; i < ARRAY_SIZE(s->slots); i++) {
+        /* KVM private memory slots and used slots */
+        if ((i >= 8 && i < 12) || s->slots[i].memory_size) {
+            continue;
+        }
+        j++;
+    }
+
+    return j;
+}
+
 static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
                                          target_phys_addr_t start_addr,
                                          target_phys_addr_t end_addr)
diff --git a/kvm.h b/kvm.h
index 02280a6..93da155 100644
--- a/kvm.h
+++ b/kvm.h
@@ -221,4 +221,6 @@ int kvm_irqchip_in_kernel(void);
 
 int kvm_set_irq(int irq, int level, int *status);
 
+int kvm_free_slots(void);
+
 #endif
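
As a usage sketch (hypothetical caller, mirroring what patch 2/2 below does
in the device-assignment initfn): code that knows how many direct mappings
it will need can bail out gracefully instead of aborting later in
kvm_alloc_slot().  slots_needed here is an assumed variable holding the
number of BARs that will be direct-mapped.

    if (kvm_enabled() && slots_needed > kvm_free_slots()) {
        error_report("not enough free KVM memory slots: need %d, have %d",
                     slots_needed, kvm_free_slots());
        return -1;
    }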



* [RFC PATCH 2/2] device-assignment: Count required kvm memory slots
  2011-01-21 23:48 [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Alex Williamson
  2011-01-21 23:48 ` [RFC PATCH 1/2] kvm: Allow querying free slots Alex Williamson
@ 2011-01-21 23:48 ` Alex Williamson
  2011-01-22 22:11 ` [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Michael S. Tsirkin
  2011-01-24  9:32 ` Marcelo Tosatti
  3 siblings, 0 replies; 50+ messages in thread
From: Alex Williamson @ 2011-01-21 23:48 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, ddutile, mst, avi, chrisw, jan.kiszka

Each MMIO PCI BAR of an assigned device is directly mapped via a KVM
memory slot to avoid bouncing reads and writes through qemu.  KVM only
provides a (small) fixed number of these slots and attempting to
exceed the unadvertised limit results in an abort.  We can't reserve
slots, but let's at least make an attempt to check whether there are
enough available before adding a device.

The non-hotplug case is troublesome here because we have no visibility
into what else might make use of these slots but hasn't yet been
mapped.  We used to limit the number of devices that could be specified
on the command line using the -pcidevice option.  The heuristic here
seems to work and provides a similar limit.

We can also avoid using these memory slots by allowing devices to
bounce mmio access through qemu.  This is trivially accomplished by
adding a force_slow=on option to pci-assign.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++-
 hw/device-assignment.h |    3 ++
 2 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index e97f565..0063a11 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -546,7 +546,9 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                 ? PCI_BASE_ADDRESS_MEM_PREFETCH
                 : PCI_BASE_ADDRESS_SPACE_MEMORY;
 
-            if (cur_region->size & 0xFFF) {
+            if (pci_dev->features & ASSIGNED_DEVICE_FORCE_SLOW_MASK) {
+                slow_map = 1;
+            } else if (cur_region->size & 0xFFF) {
                 fprintf(stderr, "PCI region %d at address 0x%llx "
                         "has size 0x%x, which is not a multiple of 4K. "
                         "You might experience some performance hit "
@@ -556,6 +558,10 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                 slow_map = 1;
             }
 
+            if (!slow_map) {
+                pci_dev->slots_needed++;
+            }
+
             /* map physical memory */
             pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
             pci_dev->v_addrs[i].u.r_virtbase = mmap(NULL, cur_region->size,
@@ -1666,6 +1672,30 @@ static CPUReadMemoryFunc *msix_mmio_read[] = {
 
 static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
 {
+    int i;
+    PCIRegion *pci_region = dev->real_device.regions;
+
+    /* Determine if the MSI-X table splits a BAR, requiring the use of
+     * two memory slots, one to map each remaining part. */
+    if (!(dev->features & ASSIGNED_DEVICE_FORCE_SLOW_MASK)) {
+        for (i = 0; i < dev->real_device.region_number; i++, pci_region++) {
+            if (!pci_region->valid) {
+                continue;
+            }
+
+            if (ranges_overlap(pci_region->base_addr, pci_region->size,
+                               dev->msix_table_addr, 0x1000)) {
+                target_phys_addr_t offset;
+
+                offset = dev->msix_table_addr - pci_region->base_addr;
+                if (offset && pci_region->size > offset + 0x1000) {
+                    dev->slots_needed++;
+                }
+                break;
+            }
+        }
+    }
+
     dev->msix_table_page = mmap(NULL, 0x1000,
                                 PROT_READ|PROT_WRITE,
                                 MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
@@ -1768,6 +1798,31 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
         if (assigned_dev_register_msix_mmio(dev))
             goto assigned_out;
 
+    if (!(dev->features & ASSIGNED_DEVICE_FORCE_SLOW_MASK)) {
+        int free_slots = kvm_free_slots();
+        int total_slots = dev->slots_needed;
+
+        if (!dev->dev.qdev.hotplugged) {
+            AssignedDevice *adev;
+
+            QLIST_FOREACH(adev, &devs, next) {
+                total_slots += adev->slots_needed;
+            }
+
+            /* This seems to work, but it's completely heuristically
+             * determined.  Any number of things might make use of kvm
+             * memory slots before the guest starts mapping memory BARs.
+             * This is really just a guess. */
+            free_slots -= 13;
+        }
+
+        if (total_slots > free_slots) {
+            error_report("pci-assign: Out of memory slots, need %d, have %d\n",
+                         total_slots, free_slots);
+            goto assigned_out;
+        }
+    }
+
     assigned_dev_load_option_rom(dev);
     QLIST_INSERT_HEAD(&devs, dev, next);
 
@@ -1837,6 +1892,8 @@ static PCIDeviceInfo assign_info = {
                         ASSIGNED_DEVICE_USE_IOMMU_BIT, true),
         DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
                         ASSIGNED_DEVICE_PREFER_MSI_BIT, true),
+        DEFINE_PROP_BIT("force_slow", AssignedDevice, features,
+                        ASSIGNED_DEVICE_FORCE_SLOW_BIT, false),
         DEFINE_PROP_STRING("configfd", AssignedDevice, configfd_name),
         DEFINE_PROP_END_OF_LIST(),
     },
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index c94a730..ad3cc80 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -76,9 +76,11 @@ typedef struct {
 
 #define ASSIGNED_DEVICE_USE_IOMMU_BIT	0
 #define ASSIGNED_DEVICE_PREFER_MSI_BIT	1
+#define ASSIGNED_DEVICE_FORCE_SLOW_BIT	2
 
 #define ASSIGNED_DEVICE_USE_IOMMU_MASK	(1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
 #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
+#define ASSIGNED_DEVICE_FORCE_SLOW_MASK	(1 << ASSIGNED_DEVICE_FORCE_SLOW_BIT)
 
 typedef struct AssignedDevice {
     PCIDevice dev;
@@ -111,6 +113,7 @@ typedef struct AssignedDevice {
     int mmio_index;
     int need_emulate_cmd;
     char *configfd_name;
+    int slots_needed;
     QLIST_ENTRY(AssignedDevice) next;
 } AssignedDevice;
 


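To make the slot accounting in this patch concrete, a worked example with
hypothetical numbers: a device has a 16 KB BAR at 0xfe000000 and its 4 KB
MSI-X table lives at 0xfe002000.  The table itself is emulated by qemu, so
only the pieces before and after it are direct-mapped, which costs a second
memory slot on top of the one counted per MMIO BAR in
assigned_dev_register_regions():

    BAR:          base 0xfe000000, size 0x4000            -> slots_needed++  (1)
    MSI-X table:  0xfe002000..0xfe002fff (handled in qemu)
    offset      = 0xfe002000 - 0xfe000000 = 0x2000         (non-zero)
    size 0x4000 > offset + 0x1000 = 0x3000                 -> slots_needed++  (2)
    direct-mapped pieces: [0xfe000000, 0xfe002000) and [0xfe003000, 0xfe004000)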

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-21 23:48 [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Alex Williamson
  2011-01-21 23:48 ` [RFC PATCH 1/2] kvm: Allow querying free slots Alex Williamson
  2011-01-21 23:48 ` [RFC PATCH 2/2] device-assignment: Count required kvm memory slots Alex Williamson
@ 2011-01-22 22:11 ` Michael S. Tsirkin
  2011-01-24  9:32 ` Marcelo Tosatti
  3 siblings, 0 replies; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-22 22:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, ddutile, avi, chrisw, jan.kiszka

On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> When doing device assignment, we use cpu_register_physical_memory() to
> directly map the qemu mmap of the device resource into the address
> space of the guest.  The unadvertised feature of the register physical
> memory code path on kvm, at least for this type of mapping, is that it
> needs to allocate an index from a small, fixed array of memory slots.
> Even better, if it can't get an index, the code aborts deep in the
> kvm specific bits, preventing the caller from having a chance to
> recover.
> 
> It's really easy to hit this by hot adding too many assigned devices
> to a guest (pretty easy to hit with too many devices at instantiation
> time too, but the abort is slightly more bearable there).
> 
> I'm assuming it's pretty difficult to make the memory slot array
> dynamically sized.  If that's not the case, please let me know as
> that would be a much better solution.
> 
> I'm not terribly happy with the solution in this series, it doesn't
> provide any guarantees whether a cpu_register_physical_memory() will
> succeed, only slightly better educated guesses.
> 
> Are there better ideas how we could solve this?  Thanks,
> 
> Alex

Put the table in qemu memory, make kvm access it with copy from/to user?
It can then be any size ...

> ---
> 
> Alex Williamson (2):
>       device-assignment: Count required kvm memory slots
>       kvm: Allow querying free slots
> 
> 
>  hw/device-assignment.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/device-assignment.h |    3 ++
>  kvm-all.c              |   16 +++++++++++++
>  kvm.h                  |    2 ++
>  4 files changed, 79 insertions(+), 1 deletions(-)


* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-21 23:48 [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Alex Williamson
                   ` (2 preceding siblings ...)
  2011-01-22 22:11 ` [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Michael S. Tsirkin
@ 2011-01-24  9:32 ` Marcelo Tosatti
  2011-01-24 14:16   ` Jan Kiszka
  2011-01-25 10:20   ` Avi Kivity
  3 siblings, 2 replies; 50+ messages in thread
From: Marcelo Tosatti @ 2011-01-24  9:32 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, ddutile, mst, avi, chrisw, jan.kiszka

On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> When doing device assignment, we use cpu_register_physical_memory() to
> directly map the qemu mmap of the device resource into the address
> space of the guest.  The unadvertised feature of the register physical
> memory code path on kvm, at least for this type of mapping, is that it
> needs to allocate an index from a small, fixed array of memory slots.
> Even better, if it can't get an index, the code aborts deep in the
> kvm specific bits, preventing the caller from having a chance to
> recover.
> 
> It's really easy to hit this by hot adding too many assigned devices
> to a guest (pretty easy to hit with too many devices at instantiation
> time too, but the abort is slightly more bearable there).
> 
> I'm assuming it's pretty difficult to make the memory slot array
> dynamically sized.  If that's not the case, please let me know as
> that would be a much better solution.

It's not difficult to either increase the maximum number (defined as
32 now in both qemu and kernel) of static slots, or support dynamic
increases, if it turns out to be a performance issue.

But you'd probably want to fix the abort for currently supported kernels
anyway.

> I'm not terribly happy with the solution in this series, it doesn't
> provide any guarantees whether a cpu_register_physical_memory() will
> succeed, only slightly better educated guesses.
> 
> Are there better ideas how we could solve this?  Thanks,

Why can't cpu_register_physical_memory() return an error so you can
fallback to slow mode or cancel device insertion?



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-24  9:32 ` Marcelo Tosatti
@ 2011-01-24 14:16   ` Jan Kiszka
  2011-01-24 15:44     ` Alex Williamson
  2011-01-25 10:20   ` Avi Kivity
  1 sibling, 1 reply; 50+ messages in thread
From: Jan Kiszka @ 2011-01-24 14:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alex Williamson, kvm, ddutile, mst, avi, chrisw

On 2011-01-24 10:32, Marcelo Tosatti wrote:
> On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
>> When doing device assignment, we use cpu_register_physical_memory() to
>> directly map the qemu mmap of the device resource into the address
>> space of the guest.  The unadvertised feature of the register physical
>> memory code path on kvm, at least for this type of mapping, is that it
>> needs to allocate an index from a small, fixed array of memory slots.
>> Even better, if it can't get an index, the code aborts deep in the
>> kvm specific bits, preventing the caller from having a chance to
>> recover.
>>
>> It's really easy to hit this by hot adding too many assigned devices
>> to a guest (pretty easy to hit with too many devices at instantiation
>> time too, but the abort is slightly more bearable there).
>>
>> I'm assuming it's pretty difficult to make the memory slot array
>> dynamically sized.  If that's not the case, please let me know as
>> that would be a much better solution.
> 
> Its not difficult to either increase the maximum number (defined as
> 32 now in both qemu and kernel) of static slots, or support dynamic
> increases, if it turns out to be a performance issue.

Static limits are waiting to be hit again, just a bit later.

I would start thinking about a tree search as well, because iterating
over all slots won't get faster over time.

> 
> But you'd probably want to fix the abort for currently supported kernels
> anyway.

Jep. Depending on how soon we have a smarter solution in the kernel, this
fix may vary in pragmatism.

> 
>> I'm not terribly happy with the solution in this series, it doesn't
>> provide any guarantees whether a cpu_register_physical_memory() will
>> succeed, only slightly better educated guesses.
>>
>> Are there better ideas how we could solve this?  Thanks,
> 
> Why can't cpu_register_physical_memory() return an error so you can
> fallback to slow mode or cancel device insertion?
> 

Doesn't that registration happen much later than the call to
pci_register_bar? In any case, this will require significantly more
invasive work (but it would be much nicer if possible, no question).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-24 14:16   ` Jan Kiszka
@ 2011-01-24 15:44     ` Alex Williamson
  2011-01-25  5:37       ` Alex Williamson
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2011-01-24 15:44 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, ddutile, mst, avi, chrisw

On Mon, 2011-01-24 at 15:16 +0100, Jan Kiszka wrote:
> On 2011-01-24 10:32, Marcelo Tosatti wrote:
> > On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> >> When doing device assignment, we use cpu_register_physical_memory() to
> >> directly map the qemu mmap of the device resource into the address
> >> space of the guest.  The unadvertised feature of the register physical
> >> memory code path on kvm, at least for this type of mapping, is that it
> >> needs to allocate an index from a small, fixed array of memory slots.
> >> Even better, if it can't get an index, the code aborts deep in the
> >> kvm specific bits, preventing the caller from having a chance to
> >> recover.
> >>
> >> It's really easy to hit this by hot adding too many assigned devices
> >> to a guest (pretty easy to hit with too many devices at instantiation
> >> time too, but the abort is slightly more bearable there).
> >>
> >> I'm assuming it's pretty difficult to make the memory slot array
> >> dynamically sized.  If that's not the case, please let me know as
> >> that would be a much better solution.
> > 
> > Its not difficult to either increase the maximum number (defined as
> > 32 now in both qemu and kernel) of static slots, or support dynamic
> > increases, if it turns out to be a performance issue.
> 
> Static limits are waiting to be hit again, just a bit later.

Yep, and I think static limits large enough that we can't hit them would
be performance prohibitive.

> I would start thinking about a tree search as well because iterating
> over all slots won't get faster over the time.
> 
> > 
> > But you'd probably want to fix the abort for currently supported kernels
> > anyway.
> 
> Jep. Depending on how soon we have smarter solution in the kernel, this
> fix may vary in pragmatism.
> 
> >
> >> I'm not terribly happy with the solution in this series, it doesn't
> >> provide any guarantees whether a cpu_register_physical_memory() will
> >> succeed, only slightly better educated guesses.
> >>
> >> Are there better ideas how we could solve this?  Thanks,
> > 
> > Why can't cpu_register_physical_memory() return an error so you can
> > fallback to slow mode or cancel device insertion?

It appears that it'd be pretty intrusive to fix this, since
cpu_register_physical_memory() returns void and the kvm hook into
this is via a set_memory callback for the phys memory client.

> Doesn't that registration happens much later than the call to
> pci_register_bar? In any case, this will require significantly more
> invasive work (but it would be much nicer if possible, no question).

Right, we register BARs in the initfn, but we don't map them until the
guest writes the BARs, mapping them into MMIO space.  I don't think we
want to fall back to slow mapping at that point, so we either need to
fail in the initfn (like this series) or be able to dynamically allocate
more slots so the kvm callback can't fail.  I'll look at how we might be
able to allocate slots on demand.  Thanks,

Alex



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-24 15:44     ` Alex Williamson
@ 2011-01-25  5:37       ` Alex Williamson
  2011-01-25  7:36         ` Jan Kiszka
                           ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Alex Williamson @ 2011-01-25  5:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, ddutile, mst, avi, chrisw

On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> I'll look at how we might be
> able to allocate slots on demand.  Thanks,

Here's a first cut just to see if this looks agreeable.  This allows the
slot array to grow on demand.  This works with current userspace, as
well as userspace trivially modified to double KVMState.slots and
hotplugging enough pci-assign devices to exceed the previous limit (w/ &
w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
the right path?  If so, I'll work on removing the fixed limit from
userspace next.  Thanks,

Alex


kvm: Allow memory slot array to grow on demand

Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
to grow on demand.  Private slots are now allocated at the
front instead of the end.  Only x86 seems to use private slots,
so this is now zero for all other archs.  The memslots pointer
is already updated using rcu, so changing the size of the
array when it's replaced is straightforward.  x86 also keeps
a bitmap of slots used by a kvm_mmu_page, which requires a
shadow tlb flush whenever we increase the number of slots.
This forces the pages to be rebuilt with the new bitmap size.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 arch/ia64/include/asm/kvm_host.h    |    4 --
 arch/ia64/kvm/kvm-ia64.c            |    2 +
 arch/powerpc/include/asm/kvm_host.h |    3 --
 arch/s390/include/asm/kvm_host.h    |    3 --
 arch/x86/include/asm/kvm_host.h     |    3 +-
 arch/x86/include/asm/vmx.h          |    6 ++-
 arch/x86/kvm/mmu.c                  |    7 +++-
 arch/x86/kvm/x86.c                  |    6 ++-
 include/linux/kvm_host.h            |    7 +++-
 virt/kvm/kvm_main.c                 |   65 ++++++++++++++++++++++++-----------
 10 files changed, 63 insertions(+), 43 deletions(-)


diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 2689ee5..11d0ab2 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -23,10 +23,6 @@
 #ifndef __ASM_KVM_HOST_H
 #define __ASM_KVM_HOST_H
 
-#define KVM_MEMORY_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
-
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
 /* define exit reasons from vmm to kvm*/
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 70d224d..f1adda2 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1814,7 +1814,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= kvm->memslots->nmemslots)
 		goto out;
 
 	memslot = &kvm->memslots->memslots[log->slot];
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index bba3b9b..dc80057 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -29,9 +29,6 @@
 #include <asm/kvm_asm.h>
 
 #define KVM_MAX_VCPUS 1
-#define KVM_MEMORY_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index cef7dbf..92a964c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -20,9 +20,6 @@
 #include <asm/cpu.h>
 
 #define KVM_MAX_VCPUS 64
-#define KVM_MEMORY_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 struct sca_entry {
 	atomic_t scn;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffd7f8d..df1382c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -27,7 +27,6 @@
 #include <asm/msr-index.h>
 
 #define KVM_MAX_VCPUS 64
-#define KVM_MEMORY_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
@@ -207,7 +206,7 @@ struct kvm_mmu_page {
 	 * One bit set per slot which has memory
 	 * in this shadow page.
 	 */
-	DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
+	unsigned long *slot_bitmap;
 	bool multimapped;         /* More than one parent_pte? */
 	bool unsync;
 	int root_count;          /* Currently serving as active root */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 84471b8..7fd8c89 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -370,9 +370,9 @@ enum vmcs_field {
 
 #define AR_RESERVD_MASK 0xfffe0f00
 
-#define TSS_PRIVATE_MEMSLOT			(KVM_MEMORY_SLOTS + 0)
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	(KVM_MEMORY_SLOTS + 1)
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	(KVM_MEMORY_SLOTS + 2)
+#define TSS_PRIVATE_MEMSLOT			0
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	1
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	2
 
 #define VMX_NR_VPIDS				(1 << 16)
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT		1
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ccacf0b..8c2533a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1032,6 +1032,7 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	ASSERT(is_empty_shadow_page(sp->spt));
 	hlist_del(&sp->hash_link);
 	list_del(&sp->link);
+	kfree(sp->slot_bitmap);
 	__free_page(virt_to_page(sp->spt));
 	if (!sp->role.direct)
 		__free_page(virt_to_page(sp->gfns));
@@ -1048,6 +1049,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 					       u64 *parent_pte, int direct)
 {
 	struct kvm_mmu_page *sp;
+	struct kvm_memslots *slots = kvm_memslots(vcpu->kvm);
 
 	sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
 	sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
@@ -1056,7 +1058,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 						  PAGE_SIZE);
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
+	sp->slot_bitmap = kzalloc(sizeof(long) *
+				  BITS_TO_LONGS(slots->nmemslots), GFP_KERNEL);
+	if (!sp->slot_bitmap)
+		return NULL;
 	sp->multimapped = 0;
 	sp->parent_pte = parent_pte;
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5eccdba..c002fac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1978,7 +1978,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 		r = KVM_MAX_VCPUS;
 		break;
 	case KVM_CAP_NR_MEMSLOTS:
-		r = KVM_MEMORY_SLOTS;
+		r = INT_MAX - KVM_PRIVATE_MEM_SLOTS;
 		break;
 	case KVM_CAP_PV_MMU:	/* obsolete */
 		r = 0;
@@ -3201,7 +3201,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= kvm->memslots->nmemslots)
 		goto out;
 
 	memslot = &kvm->memslots->memslots[log->slot];
@@ -6068,7 +6068,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
 
 	/* Prevent internal slot pages from being moved by fork()/COW. */
-	if (memslot->id >= KVM_MEMORY_SLOTS)
+	if (memslot->id < KVM_PRIVATE_MEM_SLOTS)
 		map_flags = MAP_SHARED | MAP_ANONYMOUS;
 
 	/*To keep backward compatibility with older userspace,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..4cb9f94 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -27,6 +27,10 @@
 
 #include <asm/kvm_host.h>
 
+#ifndef KVM_PRIVATE_MEM_SLOTS
+ #define KVM_PRIVATE_MEM_SLOTS 0
+#endif
+
 /*
  * vcpu->requests bit members
  */
@@ -206,8 +210,7 @@ struct kvm_irq_routing_table {};
 struct kvm_memslots {
 	int nmemslots;
 	u64 generation;
-	struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
-					KVM_PRIVATE_MEM_SLOTS];
+	struct kvm_memory_slot memslots[];
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fd67bcd..32f023c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -623,13 +623,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			    struct kvm_userspace_memory_region *mem,
 			    int user_alloc)
 {
-	int r;
+	int r, nmemslots;
 	gfn_t base_gfn;
 	unsigned long npages;
 	unsigned long i;
-	struct kvm_memory_slot *memslot;
-	struct kvm_memory_slot old, new;
+	struct kvm_memory_slot *memslot = NULL;
+	struct kvm_memory_slot old = {}, new = {};
 	struct kvm_memslots *slots, *old_memslots;
+	bool flush = false;
 
 	r = -EINVAL;
 	/* General sanity checks */
@@ -639,12 +640,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		goto out;
 	if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1)))
 		goto out;
-	if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
-		goto out;
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[mem->slot];
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
@@ -655,7 +653,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (!npages)
 		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
-	new = old = *memslot;
+	if (mem->slot < kvm->memslots->nmemslots) {
+		memslot = &kvm->memslots->memslots[mem->slot];
+		new = old = *memslot;
+	}
 
 	new.id = mem->slot;
 	new.base_gfn = base_gfn;
@@ -669,7 +670,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	/* Check for overlaps */
 	r = -EEXIST;
-	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
+	for (i = KVM_PRIVATE_MEM_SLOTS; i < kvm->memslots->nmemslots; ++i) {
 		struct kvm_memory_slot *s = &kvm->memslots->memslots[i];
 
 		if (s == memslot || !s->npages)
@@ -752,12 +753,19 @@ skip_lpage:
 
 	if (!npages) {
 		r = -ENOMEM;
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+
+		nmemslots = (mem->slot >= kvm->memslots->nmemslots) ?
+			    mem->slot + 1 : kvm->memslots->nmemslots;
+
+		slots = kzalloc(sizeof(struct kvm_memslots) +
+				nmemslots * sizeof(struct kvm_memory_slot),
+				GFP_KERNEL);
 		if (!slots)
 			goto out_free;
-		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-		if (mem->slot >= slots->nmemslots)
-			slots->nmemslots = mem->slot + 1;
+		memcpy(slots, kvm->memslots,
+		       sizeof(struct kvm_memslots) + kvm->memslots->nmemslots *
+		       sizeof(struct kvm_memory_slot));
+		slots->nmemslots = nmemslots;
 		slots->generation++;
 		slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
 
@@ -787,12 +795,21 @@ skip_lpage:
 	}
 
 	r = -ENOMEM;
-	slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+
+	if (mem->slot >= kvm->memslots->nmemslots) {
+		nmemslots = mem->slot + 1;
+		flush = true;
+	} else
+		nmemslots = kvm->memslots->nmemslots;
+
+	slots = kzalloc(sizeof(struct kvm_memslots) +
+			nmemslots * sizeof(struct kvm_memory_slot),
+			GFP_KERNEL);
 	if (!slots)
 		goto out_free;
-	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-	if (mem->slot >= slots->nmemslots)
-		slots->nmemslots = mem->slot + 1;
+	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
+	       kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
+	slots->nmemslots = nmemslots;
 	slots->generation++;
 
 	/* actual memory is freed via old in kvm_free_physmem_slot below */
@@ -808,6 +825,9 @@ skip_lpage:
 	rcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
 
+	if (flush)
+		kvm_arch_flush_shadow(kvm);
+
 	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
 
 	kvm_free_physmem_slot(&old, &new);
@@ -841,8 +861,6 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 				   kvm_userspace_memory_region *mem,
 				   int user_alloc)
 {
-	if (mem->slot >= KVM_MEMORY_SLOTS)
-		return -EINVAL;
 	return kvm_set_memory_region(kvm, mem, user_alloc);
 }
 
@@ -855,7 +873,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	unsigned long any = 0;
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= kvm->memslots->nmemslots)
 		goto out;
 
 	memslot = &kvm->memslots->memslots[log->slot];
@@ -947,7 +965,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 	int i;
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 
-	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
+	for (i = KVM_PRIVATE_MEM_SLOTS; i < slots->nmemslots; ++i) {
 		struct kvm_memory_slot *memslot = &slots->memslots[i];
 
 		if (memslot->flags & KVM_MEMSLOT_INVALID)
@@ -1832,6 +1850,8 @@ static long kvm_vm_ioctl(struct file *filp,
 						sizeof kvm_userspace_mem))
 			goto out;
 
+		kvm_userspace_mem.slot += KVM_PRIVATE_MEM_SLOTS;
+
 		r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 1);
 		if (r)
 			goto out;
@@ -1843,6 +1863,9 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&log, argp, sizeof log))
 			goto out;
+
+		log.slot += KVM_PRIVATE_MEM_SLOTS;
+
 		r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
 		if (r)
 			goto out;
@@ -1937,7 +1960,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 		if (copy_from_user(&compat_log, (void __user *)arg,
 				   sizeof(compat_log)))
 			goto out;
-		log.slot	 = compat_log.slot;
+		log.slot	 = compat_log.slot + KVM_PRIVATE_MEM_SLOTS;
 		log.padding1	 = compat_log.padding1;
 		log.padding2	 = compat_log.padding2;
 		log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
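
For readers skimming the diff, a condensed sketch of the grow-and-swap
pattern the patch relies on (simplified from __kvm_set_memory_region above,
error handling trimmed): the memslots array is reallocated at its new size,
copied, published with rcu_assign_pointer(), and readers are waited out via
SRCU before the old array is freed.

    slots = kzalloc(sizeof(struct kvm_memslots) +
                    nmemslots * sizeof(struct kvm_memory_slot), GFP_KERNEL);
    if (!slots)
        return -ENOMEM;
    memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
           kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
    slots->nmemslots = nmemslots;
    slots->generation++;

    old_memslots = kvm->memslots;
    rcu_assign_pointer(kvm->memslots, slots);
    synchronize_srcu_expedited(&kvm->srcu);

    /* growing the array invalidates the per-shadow-page slot bitmaps */
    if (slots->nmemslots > old_memslots->nmemslots)
        kvm_arch_flush_shadow(kvm);

    kfree(old_memslots);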





* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25  5:37       ` Alex Williamson
@ 2011-01-25  7:36         ` Jan Kiszka
  2011-01-25 14:41           ` Alex Williamson
  2011-01-25 10:23         ` Avi Kivity
  2011-01-31 19:18         ` Marcelo Tosatti
  2 siblings, 1 reply; 50+ messages in thread
From: Jan Kiszka @ 2011-01-25  7:36 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Marcelo Tosatti, kvm, ddutile, mst, avi, chrisw


On 2011-01-25 06:37, Alex Williamson wrote:
> On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
>> I'll look at how we might be
>> able to allocate slots on demand.  Thanks,
> 
> Here's a first cut just to see if this looks agreeable.  This allows the
> slot array to grow on demand.  This works with current userspace, as
> well as userspace trivially modified to double KVMState.slots and
> hotplugging enough pci-assign devices to exceed the previous limit (w/ &
> w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> the right path?  If so, I'll work on removing the fixed limit from
> userspace next.  Thanks,
> 
> Alex
> 
> 
> kvm: Allow memory slot array to grow on demand
> 
> Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> to grow on demand.  Private slots are now allocated at the
> front instead of the end.  Only x86 seems to use private slots,

Hmm, doesn't current user space expect slots 8..11 to be the private
ones, and wouldn't it cause trouble if slots 0..3 are suddenly reserved?

> so this is now zero for all other archs.  The memslots pointer
> is already updated using rcu, so changing the size off the
> array when it's replaces is straight forward.  x86 also keeps
> a bitmap of slots used by a kvm_mmu_page, which requires a
> shadow tlb flush whenever we increase the number of slots.
> This forces the pages to be rebuilt with the new bitmap size.

Is it possible for user space to increase the slot number to ridiculous
amounts (at least as far as kmalloc allows) and then trigger a kernel
walk through them in non-preemptible contexts? Just wondering, I haven't
checked all contexts of functions like kvm_is_visible_gfn yet.

If yes, we should switch to an rbtree or something like that right away.
Otherwise that may wait a bit, but probably not too long.

Jan




* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-24  9:32 ` Marcelo Tosatti
  2011-01-24 14:16   ` Jan Kiszka
@ 2011-01-25 10:20   ` Avi Kivity
  2011-01-25 14:46     ` Alex Williamson
  2011-01-25 14:55     ` Michael S. Tsirkin
  1 sibling, 2 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 10:20 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alex Williamson, kvm, ddutile, mst, chrisw, jan.kiszka

On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
> On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> >  When doing device assignment, we use cpu_register_physical_memory() to
> >  directly map the qemu mmap of the device resource into the address
> >  space of the guest.  The unadvertised feature of the register physical
> >  memory code path on kvm, at least for this type of mapping, is that it
> >  needs to allocate an index from a small, fixed array of memory slots.
> >  Even better, if it can't get an index, the code aborts deep in the
> >  kvm specific bits, preventing the caller from having a chance to
> >  recover.
> >
> >  It's really easy to hit this by hot adding too many assigned devices
> >  to a guest (pretty easy to hit with too many devices at instantiation
> >  time too, but the abort is slightly more bearable there).
> >
> >  I'm assuming it's pretty difficult to make the memory slot array
> >  dynamically sized.  If that's not the case, please let me know as
> >  that would be a much better solution.
>
> Its not difficult to either increase the maximum number (defined as
> 32 now in both qemu and kernel) of static slots, or support dynamic
> increases, if it turns out to be a performance issue.
>

We can't make it unbounded in the kernel, since a malicious user could 
start creating an infinite amount of memory slots, pinning unbounded 
kernel memory.

If we make the limit much larger, we should start to think about 
efficiency.  Every mmio vmexit is currently a linear scan of the memory 
slot table, which is efficient at a small number of slots, but not at a 
large number.  We could conceivably encode the "no slot" information 
into a bit in the not-present spte.

-- 
error compiling committee.c: too many arguments to function
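
For context on the linear scan mentioned here, a sketch of what an mmio-exit
lookup amounts to (an approximation of gfn_to_memslot() in
virt/kvm/kvm_main.c of this era, not code from the thread): an address that
belongs to no slot, i.e. an mmio access, pays for a walk over the entire
array before kvm concludes "no slot".

    static struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
    {
        int i;
        struct kvm_memslots *slots = kvm_memslots(kvm);

        for (i = 0; i < slots->nmemslots; ++i) {
            struct kvm_memory_slot *s = &slots->memslots[i];

            if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
                return s;
        }
        return NULL;    /* no slot: handle the access as mmio */
    }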



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25  5:37       ` Alex Williamson
  2011-01-25  7:36         ` Jan Kiszka
@ 2011-01-25 10:23         ` Avi Kivity
  2011-01-25 14:57           ` Alex Williamson
  2011-01-31 19:18         ` Marcelo Tosatti
  2 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 10:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On 01/25/2011 07:37 AM, Alex Williamson wrote:
> On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> >  I'll look at how we might be
> >  able to allocate slots on demand.  Thanks,
>
> Here's a first cut just to see if this looks agreeable.  This allows the
> slot array to grow on demand.  This works with current userspace, as
> well as userspace trivially modified to double KVMState.slots and
> hotplugging enough pci-assign devices to exceed the previous limit (w/&
> w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> the right path?

This can be trivially exhausted to pin all RAM.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25  7:36         ` Jan Kiszka
@ 2011-01-25 14:41           ` Alex Williamson
  2011-01-25 14:45             ` Michael S. Tsirkin
  2011-01-25 14:53             ` Avi Kivity
  0 siblings, 2 replies; 50+ messages in thread
From: Alex Williamson @ 2011-01-25 14:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, ddutile, mst, avi, chrisw

On Tue, 2011-01-25 at 08:36 +0100, Jan Kiszka wrote:
> On 2011-01-25 06:37, Alex Williamson wrote:
> > On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> >> I'll look at how we might be
> >> able to allocate slots on demand.  Thanks,
> > 
> > Here's a first cut just to see if this looks agreeable.  This allows the
> > slot array to grow on demand.  This works with current userspace, as
> > well as userspace trivially modified to double KVMState.slots and
> > hotplugging enough pci-assign devices to exceed the previous limit (w/ &
> > w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> > the right path?  If so, I'll work on removing the fixed limit from
> > userspace next.  Thanks,
> > 
> > Alex
> > 
> > 
> > kvm: Allow memory slot array to grow on demand
> > 
> > Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> > to grow on demand.  Private slots are now allocated at the
> > front instead of the end.  Only x86 seems to use private slots,
> 
> Hmm, doesn't current user space expect slots 8..11 to be the private
> ones and wouldn't it cause troubles if slots 0..3 are suddenly reserved?

The private slots aren't currently visible to userspace; they're
actually slots 32..35.  The patch automatically increments user-passed
slot ids so userspace has its own zero-based view of the array.
Frankly, I don't understand why userspace reserves slots 8..11; is this
compatibility with older kernel implementations?

> > so this is now zero for all other archs.  The memslots pointer
> > is already updated using rcu, so changing the size off the
> > array when it's replaces is straight forward.  x86 also keeps
> > a bitmap of slots used by a kvm_mmu_page, which requires a
> > shadow tlb flush whenever we increase the number of slots.
> > This forces the pages to be rebuilt with the new bitmap size.
> 
> Is it possible for user space to increase the slot number to ridiculous
> amounts (at least as far as kmalloc allows) and then trigger a kernel
> walk through them in non-preemptible contexts? Just wondering, I haven't
> checked all contexts of functions like kvm_is_visible_gfn yet.
> 
> If yes, we should already switch to rbtree or something like that.
> Otherwise that may wait a bit, but probably not too long.

Yeah, Avi has brought up the hole that userspace can exploit this
interface with these changes.  However, for 99+% of users, this change
leaves the slot array at about the same size, or makes it smaller.  Only
huge, scale-out guests would probably even see a doubling of slots (my
guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
can safely save a tree implementation as a later optimization should we
determine it's necessary.  We'll have to see how the userspace side
matches up to figure out what's best there.  Thanks,

Alex
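
To illustrate the remapping described above (numbers as in the patch:
KVM_PRIVATE_MEM_SLOTS is 4 on x86 and 0 elsewhere), userspace keeps its
zero-based slot ids and the ioctl layer shifts them past the private range:

    userspace slot id    kernel slot index (x86)
    0                    4    (0 + KVM_PRIVATE_MEM_SLOTS)
    1                    5
    ...                  ...
    (TSS, APIC-access and identity-pagetable private slots sit at 0..2)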





* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:41           ` Alex Williamson
@ 2011-01-25 14:45             ` Michael S. Tsirkin
  2011-01-25 14:54               ` Avi Kivity
  2011-01-25 14:53             ` Avi Kivity
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 14:45 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, avi, chrisw

On Tue, Jan 25, 2011 at 07:41:32AM -0700, Alex Williamson wrote:
> On Tue, 2011-01-25 at 08:36 +0100, Jan Kiszka wrote:
> > On 2011-01-25 06:37, Alex Williamson wrote:
> > > On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> > >> I'll look at how we might be
> > >> able to allocate slots on demand.  Thanks,
> > > 
> > > Here's a first cut just to see if this looks agreeable.  This allows the
> > > slot array to grow on demand.  This works with current userspace, as
> > > well as userspace trivially modified to double KVMState.slots and
> > > hotplugging enough pci-assign devices to exceed the previous limit (w/ &
> > > w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> > > the right path?  If so, I'll work on removing the fixed limit from
> > > userspace next.  Thanks,
> > > 
> > > Alex
> > > 
> > > 
> > > kvm: Allow memory slot array to grow on demand
> > > 
> > > Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> > > to grow on demand.  Private slots are now allocated at the
> > > front instead of the end.  Only x86 seems to use private slots,
> > 
> > Hmm, doesn't current user space expect slots 8..11 to be the private
> > ones and wouldn't it cause troubles if slots 0..3 are suddenly reserved?
> 
> The private slots aren't currently visible to userspace, they're
> actually slots 32..35.  The patch automatically increments user passed
> slot ids so userspace has it's own zero-based view of the array.
> Frankly, I don't understand why userspace reserves slots 8..11, is this
> compatibility with older kernel implementations?
> 
> > > so this is now zero for all other archs.  The memslots pointer
> > > is already updated using rcu, so changing the size off the
> > > array when it's replaces is straight forward.  x86 also keeps
> > > a bitmap of slots used by a kvm_mmu_page, which requires a
> > > shadow tlb flush whenever we increase the number of slots.
> > > This forces the pages to be rebuilt with the new bitmap size.
> > 
> > Is it possible for user space to increase the slot number to ridiculous
> > amounts (at least as far as kmalloc allows) and then trigger a kernel
> > walk through them in non-preemptible contexts? Just wondering, I haven't
> > checked all contexts of functions like kvm_is_visible_gfn yet.
> > 
> > If yes, we should already switch to rbtree or something like that.
> > Otherwise that may wait a bit, but probably not too long.
> 
> Yeah, Avi has brought up the hole that userspace can exploit this
> interface with these changes.  However, for 99+% of users, this change
> leaves the slot array at about the same size, or makes it smaller.  Only
> huge, scale-out guests would probably even see a doubling of slots (my
> guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
> can safely save a tree implementation as a later optimization should we
> determine it's necessary.  We'll have to see how the userspace side
> matches to figure out what's best there.  Thanks,
> 
> Alex

Also, is there any time where we need to scan all slots
on data path?

-- 
MST


* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 10:20   ` Avi Kivity
@ 2011-01-25 14:46     ` Alex Williamson
  2011-01-25 14:56       ` Avi Kivity
  2011-01-25 14:55     ` Michael S. Tsirkin
  1 sibling, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2011-01-25 14:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, ddutile, mst, chrisw, jan.kiszka

On Tue, 2011-01-25 at 12:20 +0200, Avi Kivity wrote:
> On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
> > On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> > >  When doing device assignment, we use cpu_register_physical_memory() to
> > >  directly map the qemu mmap of the device resource into the address
> > >  space of the guest.  The unadvertised feature of the register physical
> > >  memory code path on kvm, at least for this type of mapping, is that it
> > >  needs to allocate an index from a small, fixed array of memory slots.
> > >  Even better, if it can't get an index, the code aborts deep in the
> > >  kvm specific bits, preventing the caller from having a chance to
> > >  recover.
> > >
> > >  It's really easy to hit this by hot adding too many assigned devices
> > >  to a guest (pretty easy to hit with too many devices at instantiation
> > >  time too, but the abort is slightly more bearable there).
> > >
> > >  I'm assuming it's pretty difficult to make the memory slot array
> > >  dynamically sized.  If that's not the case, please let me know as
> > >  that would be a much better solution.
> >
> > Its not difficult to either increase the maximum number (defined as
> > 32 now in both qemu and kernel) of static slots, or support dynamic
> > increases, if it turns out to be a performance issue.
> >
> 
> We can't make it unbounded in the kernel, since a malicious user could 
> start creating an infinite amount of memory slots, pinning unbounded 
> kernel memory.
> 
> If we make the limit much larger, we should start to think about 
> efficiency.  Every mmio vmexit is currently a linear scan of the memory 
> slot table, which is efficient at a small number of slots, but not at a 
> large number.  We could conceivably encode the "no slot" information 
> into a bit in the not-present spte.

On the plus side, very, very few users need more than the current 32
slot limit and the implementation presented likely results in fewer
slots for the majority of the users.  We can maybe save efficiency
issues until we start seeing problems there.  Thanks,

Alex



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:41           ` Alex Williamson
  2011-01-25 14:45             ` Michael S. Tsirkin
@ 2011-01-25 14:53             ` Avi Kivity
  2011-01-25 14:59               ` Michael S. Tsirkin
  2011-01-25 16:35               ` Jan Kiszka
  1 sibling, 2 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 14:53 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On 01/25/2011 04:41 PM, Alex Williamson wrote:
> >  >
> >  >
> >  >  kvm: Allow memory slot array to grow on demand
> >  >
> >  >  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> >  >  to grow on demand.  Private slots are now allocated at the
> >  >  front instead of the end.  Only x86 seems to use private slots,
> >
> >  Hmm, doesn't current user space expect slots 8..11 to be the private
> >  ones and wouldn't it cause troubles if slots 0..3 are suddenly reserved?
>
> The private slots aren't currently visible to userspace, they're
> actually slots 32..35.  The patch automatically increments user passed
> slot ids so userspace has it's own zero-based view of the array.
> Frankly, I don't understand why userspace reserves slots 8..11, is this
> compatibility with older kernel implementations?

I think so.  I believe these kernel versions are too old now to matter, 
but of course I can't be sure.

> >  >  so this is now zero for all other archs.  The memslots pointer
> >  >  is already updated using rcu, so changing the size off the
> >  >  array when it's replaces is straight forward.  x86 also keeps
> >  >  a bitmap of slots used by a kvm_mmu_page, which requires a
> >  >  shadow tlb flush whenever we increase the number of slots.
> >  >  This forces the pages to be rebuilt with the new bitmap size.
> >
> >  Is it possible for user space to increase the slot number to ridiculous
> >  amounts (at least as far as kmalloc allows) and then trigger a kernel
> >  walk through them in non-preemptible contexts? Just wondering, I haven't
> >  checked all contexts of functions like kvm_is_visible_gfn yet.
> >
> >  If yes, we should already switch to rbtree or something like that.
> >  Otherwise that may wait a bit, but probably not too long.
>
> Yeah, Avi has brought up the hole that userspace can exploit this
> interface with these changes.  However, for 99+% of users, this change
> leaves the slot array at about the same size, or makes it smaller.  Only
> huge, scale-out guests would probably even see a doubling of slots (my
> guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
> can safely save a tree implementation as a later optimization should we
> determine it's necessary.  We'll have to see how the userspace side
> matches to figure out what's best there.  Thanks,
>

A tree would probably be a pessimization until we are able to cache the 
result of lookups.  That's because the linear scan generates a very 
simple pattern of branch predictions and memory accesses, while a tree 
uses a whole bunch of cachelines and generates unpredictable branches 
(if the inputs are unpredictable).

Note that with TDP most lookups result in failure, so all we need is a 
fast way to determine whether to perform the lookup at all or not.  That 
can be done by caching the last lookup for this address in the spte by 
setting a reserved bit.  For the other lookups, which we believe will
succeed, we can assume the probability of a match is related to the slot
size, and sort the slots by page count.

-- 
error compiling committee.c: too many arguments to function
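
A minimal sketch of the "sort the slots by page count" idea (hypothetical
helper, not from any patch in this thread): keep the array ordered
largest-first so lookups that are expected to succeed tend to match early
in the linear scan.

    #include <linux/sort.h>

    /* largest slots first: a random guest address most likely falls in them */
    static int slot_size_cmp(const void *a, const void *b)
    {
        const struct kvm_memory_slot *x = a, *y = b;

        return (y->npages > x->npages) - (y->npages < x->npages);
    }

    /* e.g. under slots_lock, after a new memslots array has been built */
    sort(slots->memslots, slots->nmemslots,
         sizeof(struct kvm_memory_slot), slot_size_cmp, NULL);

Note that the real memslots array is also indexed directly by slot id, so an
actual implementation would need a separate sorted index rather than
reordering the array itself; the snippet only illustrates the ordering.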



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:45             ` Michael S. Tsirkin
@ 2011-01-25 14:54               ` Avi Kivity
  0 siblings, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 14:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/25/2011 04:45 PM, Michael S. Tsirkin wrote:
> Also, is there any time where we need to scan all slots
> on data path?

All guest mmio accesses.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 10:20   ` Avi Kivity
  2011-01-25 14:46     ` Alex Williamson
@ 2011-01-25 14:55     ` Michael S. Tsirkin
  2011-01-25 14:58       ` Avi Kivity
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 14:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, Alex Williamson, kvm, ddutile, chrisw, jan.kiszka

On Tue, Jan 25, 2011 at 12:20:03PM +0200, Avi Kivity wrote:
> On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
> >On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> >>  When doing device assignment, we use cpu_register_physical_memory() to
> >>  directly map the qemu mmap of the device resource into the address
> >>  space of the guest.  The unadvertised feature of the register physical
> >>  memory code path on kvm, at least for this type of mapping, is that it
> >>  needs to allocate an index from a small, fixed array of memory slots.
> >>  Even better, if it can't get an index, the code aborts deep in the
> >>  kvm specific bits, preventing the caller from having a chance to
> >>  recover.
> >>
> >>  It's really easy to hit this by hot adding too many assigned devices
> >>  to a guest (pretty easy to hit with too many devices at instantiation
> >>  time too, but the abort is slightly more bearable there).
> >>
> >>  I'm assuming it's pretty difficult to make the memory slot array
> >>  dynamically sized.  If that's not the case, please let me know as
> >>  that would be a much better solution.
> >
> >Its not difficult to either increase the maximum number (defined as
> >32 now in both qemu and kernel) of static slots, or support dynamic
> >increases, if it turns out to be a performance issue.
> >
> 
> We can't make it unbounded in the kernel, since a malicious user
> could start creating an infinite amount of memory slots, pinning
> unbounded kernel memory.

How about keeping the slots in userspace memory, access them with copy
from user?

> If we make the limit much larger, we should start to think about
> efficiency.  Every mmio vmexit is currently a linear scan of the
> memory slot table, which is efficient at a small number of slots,
> but not at a large number.  We could conceivably encode the "no
> slot" information into a bit in the not-present spte.

OK, but the slots that Alex here wants to use are presumably
mostly not resulting in a pagefault at all, right?
Can we split such guys out so they don't slow mmio lookup?
Maybe keep *these* in userspace memory.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:46     ` Alex Williamson
@ 2011-01-25 14:56       ` Avi Kivity
  0 siblings, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 14:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Marcelo Tosatti, kvm, ddutile, mst, chrisw, jan.kiszka

On 01/25/2011 04:46 PM, Alex Williamson wrote:
> On Tue, 2011-01-25 at 12:20 +0200, Avi Kivity wrote:
> >  On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
> >  >  On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
> >  >  >   When doing device assignment, we use cpu_register_physical_memory() to
> >  >  >   directly map the qemu mmap of the device resource into the address
> >  >  >   space of the guest.  The unadvertised feature of the register physical
> >  >  >   memory code path on kvm, at least for this type of mapping, is that it
> >  >  >   needs to allocate an index from a small, fixed array of memory slots.
> >  >  >   Even better, if it can't get an index, the code aborts deep in the
> >  >  >   kvm specific bits, preventing the caller from having a chance to
> >  >  >   recover.
> >  >  >
> >  >  >   It's really easy to hit this by hot adding too many assigned devices
> >  >  >   to a guest (pretty easy to hit with too many devices at instantiation
> >  >  >   time too, but the abort is slightly more bearable there).
> >  >  >
> >  >  >   I'm assuming it's pretty difficult to make the memory slot array
> >  >  >   dynamically sized.  If that's not the case, please let me know as
> >  >  >   that would be a much better solution.
> >  >
> >  >  Its not difficult to either increase the maximum number (defined as
> >  >  32 now in both qemu and kernel) of static slots, or support dynamic
> >  >  increases, if it turns out to be a performance issue.
> >  >
> >
> >  We can't make it unbounded in the kernel, since a malicious user could
> >  start creating an infinite amount of memory slots, pinning unbounded
> >  kernel memory.
> >
> >  If we make the limit much larger, we should start to think about
> >  efficiency.  Every mmio vmexit is currently a linear scan of the memory
> >  slot table, which is efficient at a small number of slots, but not at a
> >  large number.  We could conceivably encode the "no slot" information
> >  into a bit in the not-present spte.
>
> On the plus side, very, very few users need more than the current 32
> slot limit and the implementation presented likely results in fewer
> slots for the majority of the users.  We can maybe save efficiency
> issues until we start seeing problems there.  Thanks,
>

Well, we need a static cap, but certainly limiting the search to the 
number of populated slots is an improvement.

We might keep the array size static (but only use the populated part).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 10:23         ` Avi Kivity
@ 2011-01-25 14:57           ` Alex Williamson
  2011-01-25 17:11             ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2011-01-25 14:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
> On 01/25/2011 07:37 AM, Alex Williamson wrote:
> > On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> > >  I'll look at how we might be
> > >  able to allocate slots on demand.  Thanks,
> >
> > Here's a first cut just to see if this looks agreeable.  This allows the
> > slot array to grow on demand.  This works with current userspace, as
> > well as userspace trivially modified to double KVMState.slots and
> > hotplugging enough pci-assign devices to exceed the previous limit (w/&
> > w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> > the right path?
> 
> This can be trivially exhausted to pin all RAM.

What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
BARs, each taking one slot.  It might also support MSI-X in a way that
splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
only 1 PCI bus for the moment and also assuming assigned devices are
typically single function, 32 devices * 7 slots/device = 224 slots, so
maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
with a slot limit of 512.  It would be easier in the device assignment
code to keep track of a device-count limit than to guess whether slots
will be available when we need them.  Thanks,

Alex
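
A back-of-the-envelope version of the sizing above, with invented macro
names; only the arithmetic comes from the mail:

#define SLOTS_PER_ASSIGNED_DEVICE	7	/* 6 BARs + 1 MSI-X split  */
#define ASSIGNED_DEVICE_LIMIT		64	/* 64 * 7 = 448 slots      */
#define KVM_MEMORY_SLOTS_CAP		512	/* 448 rounded up          */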



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:55     ` Michael S. Tsirkin
@ 2011-01-25 14:58       ` Avi Kivity
  2011-01-25 15:23         ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 14:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, Alex Williamson, kvm, ddutile, chrisw, jan.kiszka

On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
> >
> >  We can't make it unbounded in the kernel, since a malicious user
> >  could start creating an infinite amount of memory slots, pinning
> >  unbounded kernel memory.
>
> How about keeping the slots in userspace memory, access them with copy
> from user?

Some of the data is validated by the kernel, so it needs a kernel copy.  
Other fields are completely internal to the kernel.

> >  If we make the limit much larger, we should start to think about
> >  efficiency.  Every mmio vmexit is currently a linear scan of the
> >  memory slot table, which is efficient at a small number of slots,
> >  but not at a large number.  We could conceivably encode the "no
> >  slot" information into a bit in the not-present spte.
>
> OK, but the slots that Alex here wants to use are presumably
> mostly not resulting in a pagefault at all, right?
> Can we split such guys out so they don't slow mmio lookup?
> Maybe keep *these* in userspace memory.

The algorithm is:

   page fault:
     for each slot:
        if addr in slot:
           populate spte
           return
     # no slot matches
     mmio

so we have to try out all slots before we find out none of them are needed.
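
Rendered as stand-alone C (a sketch only; the types and the FAULT_*
results are illustrative, not KVM symbols):

#include <stdint.h>

struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

struct memslots {
	int nslots;
	struct memslot slot[32];
};

enum fault_result { FAULT_MAPPED, FAULT_MMIO };

static enum fault_result handle_gfn_fault(const struct memslots *slots,
					  uint64_t gfn)
{
	int i;

	for (i = 0; i < slots->nslots; i++) {
		const struct memslot *s = &slots->slot[i];

		if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
			return FAULT_MAPPED;	/* would populate the spte here */
	}

	/* No slot matched: only now do we know this access is mmio, so
	 * every emulated-mmio exit has paid for the full linear scan. */
	return FAULT_MMIO;
}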

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:53             ` Avi Kivity
@ 2011-01-25 14:59               ` Michael S. Tsirkin
  2011-01-25 17:33                 ` Avi Kivity
  2011-01-25 16:35               ` Jan Kiszka
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 14:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> For the other lookups, which we
> believe will succeed, we can assume the probablity of a match is
> related to the slot size, and sort the slots by page count.

Unlikely to be true for assigned device BARs.

-- 
MST

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:58       ` Avi Kivity
@ 2011-01-25 15:23         ` Michael S. Tsirkin
  2011-01-25 17:34           ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 15:23 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, Alex Williamson, kvm, ddutile, chrisw, jan.kiszka

On Tue, Jan 25, 2011 at 04:58:41PM +0200, Avi Kivity wrote:
> On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
> >>
> >>  We can't make it unbounded in the kernel, since a malicious user
> >>  could start creating an infinite amount of memory slots, pinning
> >>  unbounded kernel memory.
> >
> >How about keeping the slots in userspace memory, access them with copy
> >from user?
> 
> Some of the data is validated by the kernel, so it needs a kernel
> copy.  Other fields are completely internal to the kernel.
> 
> >>  If we make the limit much larger, we should start to think about
> >>  efficiency.  Every mmio vmexit is currently a linear scan of the
> >>  memory slot table, which is efficient at a small number of slots,
> >>  but not at a large number.  We could conceivably encode the "no
> >>  slot" information into a bit in the not-present spte.
> >
> >OK, but the slots that Alex here wants to use are presumably
> >mostly not resulting in a pagefault at all, right?
> >Can we split such guys out so they don't slow mmio lookup?
> >Maybe keep *these* in userspace memory.
> 
> The algorithm is:
> 
>   page fault:
>     for each slot:
>        if addr in slot:
>           populate spte
>           return
>     # no slot matches
>     mmio
> 
> so we have to try out all slots before we find out none of them are needed.

I see now.  Yes, a flag in the spte would help.  Changes in slots would then
have to update all these flags.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:53             ` Avi Kivity
  2011-01-25 14:59               ` Michael S. Tsirkin
@ 2011-01-25 16:35               ` Jan Kiszka
  2011-01-25 19:13                 ` Alex Williamson
  1 sibling, 1 reply; 50+ messages in thread
From: Jan Kiszka @ 2011-01-25 16:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On 2011-01-25 15:53, Avi Kivity wrote:
> On 01/25/2011 04:41 PM, Alex Williamson wrote:
>> >  >
>> >  >
>> >  >  kvm: Allow memory slot array to grow on demand
>> >  >
>> >  >  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
>> >  >  to grow on demand.  Private slots are now allocated at the
>> >  >  front instead of the end.  Only x86 seems to use private slots,
>> >
>> >  Hmm, doesn't current user space expect slots 8..11 to be the private
>> >  ones and wouldn't it cause troubles if slots 0..3 are suddenly
>> reserved?
>>
>> The private slots aren't currently visible to userspace, they're
>> actually slots 32..35.  The patch automatically increments user passed
>> slot ids so userspace has it's own zero-based view of the array.
>> Frankly, I don't understand why userspace reserves slots 8..11, is this
>> compatibility with older kernel implementations?
> 
> I think so.  I believe these kernel versions are too old now to matter,
> but of course I can't be sure.

Will check and enable those 4 additional slots for user space if we no
longer need to exclude them.

Jan



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:57           ` Alex Williamson
@ 2011-01-25 17:11             ` Avi Kivity
  2011-01-25 17:43               ` Alex Williamson
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 17:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On 01/25/2011 04:57 PM, Alex Williamson wrote:
> On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
> >  On 01/25/2011 07:37 AM, Alex Williamson wrote:
> >  >  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> >  >  >   I'll look at how we might be
> >  >  >   able to allocate slots on demand.  Thanks,
> >  >
> >  >  Here's a first cut just to see if this looks agreeable.  This allows the
> >  >  slot array to grow on demand.  This works with current userspace, as
> >  >  well as userspace trivially modified to double KVMState.slots and
> >  >  hotplugging enough pci-assign devices to exceed the previous limit (w/&
> >  >  w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> >  >  the right path?
> >
> >  This can be trivially exhausted to pin all RAM.
>
> What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
> BARs, each taking one slot.

A BAR can take no slots, or several slots.  For example, a single BAR 
might contain a framebuffer, followed by an off-screen framebuffer, 
followed by an mmio register area.  You'd want the framebuffer to be 
dirty logged and the off-screen framebuffer left untracked (so one slot 
for each), while the mmio range cannot be used as a slot at all.

That only holds for emulated devices.
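
As an illustration, userspace might carve such a BAR up roughly like
this; KVM_SET_USER_MEMORY_REGION and KVM_MEM_LOG_DIRTY_PAGES are the
real interface, while the slot numbers, sizes and the helper are
invented:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

static int map_framebuffer_bar(int vm_fd, uint64_t gpa, void *hva)
{
	struct kvm_userspace_memory_region fb = {
		.slot            = 5,			/* some free slot */
		.flags           = KVM_MEM_LOG_DIRTY_PAGES, /* dirty logged */
		.guest_phys_addr = gpa,
		.memory_size     = 8 << 20,		/* visible framebuffer */
		.userspace_addr  = (uintptr_t)hva,
	};
	struct kvm_userspace_memory_region offscreen = {
		.slot            = 6,
		.flags           = 0,			/* not tracked */
		.guest_phys_addr = gpa + (8 << 20),
		.memory_size     = 8 << 20,		/* off-screen area */
		.userspace_addr  = (uintptr_t)hva + (8 << 20),
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &fb) < 0)
		return -1;
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &offscreen) < 0)
		return -1;

	/* The trailing mmio register range of the BAR gets no slot at
	 * all; accesses to it exit to userspace for emulation. */
	return 0;
}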

>   It might also support MSI-X in a way that
> splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
> only 1 PCI bus for the moment and also assuming assigned devices are
> typically single function, 32 devices * 7 slots/device = 224 slots, so
> maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
> with a slot limit of 512.  It would be easier in device assignment code
> to keep track of a number of devices limit than trying to guess whether
> slots will be available when we need them.

Sounds reasonable.  But we'd need a faster search for 512 slots.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 14:59               ` Michael S. Tsirkin
@ 2011-01-25 17:33                 ` Avi Kivity
  2011-01-25 17:58                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 17:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >  For the other lookups, which we
> >  believe will succeed, we can assume the probablity of a match is
> >  related to the slot size, and sort the slots by page count.
>
> Unlikely to be true for assigned device BARs.

We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and lots 
of small slots for BARs and such.

The vast majority of faults will either touch one of the two largest 
slots, or will miss all slots.  Relatively few will hit the small slots.
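
As a rough illustration (an invented layout for a 16GB guest, not
measured from a real VM):

#include <stdint.h>

struct example_slot {
	uint64_t gpa;
	uint64_t size;
	const char *what;
};

/* Sorted by size, the two RAM slots absorb nearly every fault that
 * hits a slot; only a miss on all of them (emulated mmio) walks the
 * whole list. */
static const struct example_slot layout[] = {
	{ 0x100000000ULL, 13ULL << 30, "RAM above 4G"          },
	{ 0x000100000ULL,  3ULL << 30, "RAM 1M-3G"             },
	{ 0x0fe000000ULL,  16 << 20,   "assigned device BAR 0" },
	{ 0x0febf0000ULL,  16 << 10,   "assigned device BAR 3" },
};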

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 15:23         ` Michael S. Tsirkin
@ 2011-01-25 17:34           ` Avi Kivity
  2011-01-25 18:00             ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-25 17:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, Alex Williamson, kvm, ddutile, chrisw, jan.kiszka

On 01/25/2011 05:23 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 25, 2011 at 04:58:41PM +0200, Avi Kivity wrote:
> >  On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
> >  >>
> >  >>   We can't make it unbounded in the kernel, since a malicious user
> >  >>   could start creating an infinite amount of memory slots, pinning
> >  >>   unbounded kernel memory.
> >  >
> >  >How about keeping the slots in userspace memory, access them with copy
> >  >from user?
> >
> >  Some of the data is validated by the kernel, so it needs a kernel
> >  copy.  Other fields are completely internal to the kernel.
> >
> >  >>   If we make the limit much larger, we should start to think about
> >  >>   efficiency.  Every mmio vmexit is currently a linear scan of the
> >  >>   memory slot table, which is efficient at a small number of slots,
> >  >>   but not at a large number.  We could conceivably encode the "no
> >  >>   slot" information into a bit in the not-present spte.
> >  >
> >  >OK, but the slots that Alex here wants to use are presumably
> >  >mostly not resulting in a pagefault at all, right?
> >  >Can we split such guys out so they don't slow mmio lookup?
> >  >Maybe keep *these* in userspace memory.
> >
> >  The algorithm is:
> >
> >    page fault:
> >      for each slot:
> >         if addr in slot:
> >            populate spte
> >            return
> >      # no slot matches
> >      mmio
> >
> >  so we have to try out all slots before we find out none of them are needed.
>
> I see now. Yes, a flag in spte would help.  changes in slots would then
> have to update all these flags.

That's easy, we drop all sptes.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 17:11             ` Avi Kivity
@ 2011-01-25 17:43               ` Alex Williamson
  2011-01-26  9:22                 ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2011-01-25 17:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On Tue, 2011-01-25 at 19:11 +0200, Avi Kivity wrote:
> On 01/25/2011 04:57 PM, Alex Williamson wrote:
> > On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
> > >  On 01/25/2011 07:37 AM, Alex Williamson wrote:
> > >  >  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> > >  >  >   I'll look at how we might be
> > >  >  >   able to allocate slots on demand.  Thanks,
> > >  >
> > >  >  Here's a first cut just to see if this looks agreeable.  This allows the
> > >  >  slot array to grow on demand.  This works with current userspace, as
> > >  >  well as userspace trivially modified to double KVMState.slots and
> > >  >  hotplugging enough pci-assign devices to exceed the previous limit (w/&
> > >  >  w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> > >  >  the right path?
> > >
> > >  This can be trivially exhausted to pin all RAM.
> >
> > What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
> > BARs, each taking one slot.
> 
> A BAR can take no slots, or several slots.  For example a BAR might have 
> a framebuffer, followed by an off-screen framebuffer, followed by an 
> mmio register area in one BAR.  You'd want the framebuffer to be dirty 
> logged while the offscreen framebuffer is not tracked (so one slot for 
> each) while the mmio range cannot be used as a slot.
> 
> That only holds for emulated devices.

Sure, emulated devices can do lots of specialty mappings, but I also
expect that more typically, mmio access to emulated devices will get
bounced through qemu and not use any slots.

> >   It might also support MSI-X in a way that
> > splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
> > only 1 PCI bus for the moment and also assuming assigned devices are
> > typically single function, 32 devices * 7 slots/device = 224 slots, so
> > maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
> > with a slot limit of 512.  It would be easier in device assignment code
> > to keep track of a number of devices limit than trying to guess whether
> > slots will be available when we need them.
> 
> Sounds reasonable.  But we'd need a faster search for 512 slots.

Ok, I'll add a limit.  I'm not convinced the typical use case is going
to increase slots at all, so I'm still a little resistant to optimizing
the search at this point.  BTW, simply by the ordering of when the
platform registers memory vs when other devices are mapped, we seem to
end up placing the actual memory regions at the front of the list.  Not
sure if that's by design or luck.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 17:33                 ` Avi Kivity
@ 2011-01-25 17:58                   ` Michael S. Tsirkin
  2011-01-26  9:17                     ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 17:58 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >>  For the other lookups, which we
> >>  believe will succeed, we can assume the probablity of a match is
> >>  related to the slot size, and sort the slots by page count.
> >
> >Unlikely to be true for assigned device BARs.
> 
> We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> lots of small slots for BARs and such.
> 
> The vast majority of faults will either touch one of the two largest
> slots, or will miss all slots.  Relatively few will hit the small
> slots.

Not if you are using one of the assigned devices and don't
do any faults on memory :)

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 17:34           ` Avi Kivity
@ 2011-01-25 18:00             ` Michael S. Tsirkin
  2011-01-26  9:25               ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 18:00 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, Alex Williamson, kvm, ddutile, chrisw, jan.kiszka

On Tue, Jan 25, 2011 at 07:34:18PM +0200, Avi Kivity wrote:
> On 01/25/2011 05:23 PM, Michael S. Tsirkin wrote:
> >On Tue, Jan 25, 2011 at 04:58:41PM +0200, Avi Kivity wrote:
> >>  On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
> >>  >>
> >>  >>   We can't make it unbounded in the kernel, since a malicious user
> >>  >>   could start creating an infinite amount of memory slots, pinning
> >>  >>   unbounded kernel memory.
> >>  >
> >>  >How about keeping the slots in userspace memory, access them with copy
> >>  >from user?
> >>
> >>  Some of the data is validated by the kernel, so it needs a kernel
> >>  copy.  Other fields are completely internal to the kernel.
> >>
> >>  >>   If we make the limit much larger, we should start to think about
> >>  >>   efficiency.  Every mmio vmexit is currently a linear scan of the
> >>  >>   memory slot table, which is efficient at a small number of slots,
> >>  >>   but not at a large number.  We could conceivably encode the "no
> >>  >>   slot" information into a bit in the not-present spte.
> >>  >
> >>  >OK, but the slots that Alex here wants to use are presumably
> >>  >mostly not resulting in a pagefault at all, right?
> >>  >Can we split such guys out so they don't slow mmio lookup?
> >>  >Maybe keep *these* in userspace memory.
> >>
> >>  The algorithm is:
> >>
> >>    page fault:
> >>      for each slot:
> >>         if addr in slot:
> >>            populate spte
> >>            return
> >>      # no slot matches
> >>      mmio
> >>
> >>  so we have to try out all slots before we find out none of them are needed.
> >
> >I see now. Yes, a flag in spte would help.  changes in slots would then
> >have to update all these flags.
> 
> That's easy, we drop all sptes.

Ah, right.  Hmm, doesn't the cpu already have a flag to distinguish mmio sptes somehow?

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 16:35               ` Jan Kiszka
@ 2011-01-25 19:13                 ` Alex Williamson
  2011-01-26  8:14                   ` Jan Kiszka
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2011-01-25 19:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On Tue, 2011-01-25 at 17:35 +0100, Jan Kiszka wrote:
> On 2011-01-25 15:53, Avi Kivity wrote:
> > On 01/25/2011 04:41 PM, Alex Williamson wrote:
> >> >  >
> >> >  >
> >> >  >  kvm: Allow memory slot array to grow on demand
> >> >  >
> >> >  >  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> >> >  >  to grow on demand.  Private slots are now allocated at the
> >> >  >  front instead of the end.  Only x86 seems to use private slots,
> >> >
> >> >  Hmm, doesn't current user space expect slots 8..11 to be the private
> >> >  ones and wouldn't it cause troubles if slots 0..3 are suddenly
> >> reserved?
> >>
> >> The private slots aren't currently visible to userspace, they're
> >> actually slots 32..35.  The patch automatically increments user passed
> >> slot ids so userspace has it's own zero-based view of the array.
> >> Frankly, I don't understand why userspace reserves slots 8..11, is this
> >> compatibility with older kernel implementations?
> > 
> > I think so.  I believe these kernel versions are too old now to matter,
> > but of course I can't be sure.
> 
> Will check and enable those 4 additional slots for user space if we no
> longer need to exclude them.

Appears that the history goes something like this...

      * Once upon a time we had 8 memory slots

      * v2.6.24-4949-ge0d62c7 added 4 reserved slots above those 8

      * v2.6.24-4950-gcbc9402 made use of slot 8 for tss

      * v2.6.24-4962-gf78e0e2 made use of slot 9 for tpr

      * v2.6.25-5341-gef2979b bumped the 8 slots to 32

      * v2.6.26-rc1-13-gb7ebfb0 made use of slot 10 for ept

      * v2.6.28-4952-g6fe6397 moved the previously hard coded slots
        8,9,10 up to KVM_MEMORY_SLOTS + {0,1,2}

So we haven't needed to reserve 8..11 for a while and we've never made
use of all 4 private slots.  I should reduce that to 3 in my patch.
Maybe there's a test we could do in userspace to figure out we don't
have to skip those anymore?

Alex


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 19:13                 ` Alex Williamson
@ 2011-01-26  8:14                   ` Jan Kiszka
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Kiszka @ 2011-01-26  8:14 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Avi Kivity, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On 2011-01-25 20:13, Alex Williamson wrote:
> On Tue, 2011-01-25 at 17:35 +0100, Jan Kiszka wrote:
>> On 2011-01-25 15:53, Avi Kivity wrote:
>>> On 01/25/2011 04:41 PM, Alex Williamson wrote:
>>>>>  >
>>>>>  >
>>>>>  >  kvm: Allow memory slot array to grow on demand
>>>>>  >
>>>>>  >  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
>>>>>  >  to grow on demand.  Private slots are now allocated at the
>>>>>  >  front instead of the end.  Only x86 seems to use private slots,
>>>>>
>>>>>  Hmm, doesn't current user space expect slots 8..11 to be the private
>>>>>  ones and wouldn't it cause troubles if slots 0..3 are suddenly
>>>> reserved?
>>>>
>>>> The private slots aren't currently visible to userspace, they're
>>>> actually slots 32..35.  The patch automatically increments user passed
>>>> slot ids so userspace has it's own zero-based view of the array.
>>>> Frankly, I don't understand why userspace reserves slots 8..11, is this
>>>> compatibility with older kernel implementations?
>>>
>>> I think so.  I believe these kernel versions are too old now to matter,
>>> but of course I can't be sure.
>>
>> Will check and enable those 4 additional slots for user space if we no
>> longer need to exclude them.
> 
> Appears that the history goes something like this...
> 
>       * Once upon a time we had 8 memory slots
> 
>       * v2.6.24-4949-ge0d62c7 added 4 reserved slots above those 8
> 
>       * v2.6.24-4950-gcbc9402 made use of slot 8 for tss
> 
>       * v2.6.24-4962-gf78e0e2 made use of slot 9 for tpr
> 
>       * v2.6.25-5341-gef2979b bumped the 8 slots to 32
> 
>       * v2.6.26-rc1-13-gb7ebfb0 made use of slot 10 for ept
> 
>       * v2.6.28-4952-g6fe6397 moved the previously hard coded slots
>         8,9,10 up to KVM_MEMORY_SLOTS + {0,1,2}
> 

Nice overview!

> So we haven't needed to reserve 8..11 for a while and we've never made
> use of all 4 private slots.  I should reduce that to 3 in my patch.
> Maybe there's a test we could do in userspace to figure out we don't
> have to skip those anymore?

We can simply remove the test; 2.6.29 is our entry barrier for both
upstream and qemu-kvm now.

Jan



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 17:58                   ` Michael S. Tsirkin
@ 2011-01-26  9:17                     ` Avi Kivity
  2011-01-26  9:20                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-26  9:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> >  On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >  >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >  >>   For the other lookups, which we
> >  >>   believe will succeed, we can assume the probablity of a match is
> >  >>   related to the slot size, and sort the slots by page count.
> >  >
> >  >Unlikely to be true for assigned device BARs.
> >
> >  We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> >  lots of small slots for BARs and such.
> >
> >  The vast majority of faults will either touch one of the two largest
> >  slots, or will miss all slots.  Relatively few will hit the small
> >  slots.
>
> Not if you are using one of the assigned devices and don't
> do any faults on memory :)

It's impossible not to fault on memory.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-26  9:17                     ` Avi Kivity
@ 2011-01-26  9:20                       ` Michael S. Tsirkin
  2011-01-26  9:23                         ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-26  9:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
> On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
> >On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> >>  On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >>  >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >>  >>   For the other lookups, which we
> >>  >>   believe will succeed, we can assume the probablity of a match is
> >>  >>   related to the slot size, and sort the slots by page count.
> >>  >
> >>  >Unlikely to be true for assigned device BARs.
> >>
> >>  We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> >>  lots of small slots for BARs and such.
> >>
> >>  The vast majority of faults will either touch one of the two largest
> >>  slots, or will miss all slots.  Relatively few will hit the small
> >>  slots.
> >
> >Not if you are using one of the assigned devices and don't
> >do any faults on memory :)
> 
> It's impossible not to fault on memory.

No I mean the RAM.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 17:43               ` Alex Williamson
@ 2011-01-26  9:22                 ` Avi Kivity
  0 siblings, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-26  9:22 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, ddutile, mst, chrisw

On 01/25/2011 07:43 PM, Alex Williamson wrote:
> On Tue, 2011-01-25 at 19:11 +0200, Avi Kivity wrote:
> >  On 01/25/2011 04:57 PM, Alex Williamson wrote:
> >  >  On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
> >  >  >   On 01/25/2011 07:37 AM, Alex Williamson wrote:
> >  >  >   >   On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> >  >  >   >   >    I'll look at how we might be
> >  >  >   >   >    able to allocate slots on demand.  Thanks,
> >  >  >   >
> >  >  >   >   Here's a first cut just to see if this looks agreeable.  This allows the
> >  >  >   >   slot array to grow on demand.  This works with current userspace, as
> >  >  >   >   well as userspace trivially modified to double KVMState.slots and
> >  >  >   >   hotplugging enough pci-assign devices to exceed the previous limit (w/&
> >  >  >   >   w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> >  >  >   >   the right path?
> >  >  >
> >  >  >   This can be trivially exhausted to pin all RAM.
> >  >
> >  >  What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
> >  >  BARs, each taking one slot.
> >
> >  A BAR can take no slots, or several slots.  For example a BAR might have
> >  a framebuffer, followed by an off-screen framebuffer, followed by an
> >  mmio register area in one BAR.  You'd want the framebuffer to be dirty
> >  logged while the offscreen framebuffer is not tracked (so one slot for
> >  each) while the mmio range cannot be used as a slot.
> >
> >  That only holds for emulated devices.
>
> Sure, emulated devices can do lots of specialty mappings, but I also
> expect that more typically, mmio access to emulated devices will get
> bounced through qemu and not use any slots.

Right; the example I gave (a framebuffer) is the exception rather than 
the rule; and I believe modern framebuffers are usually accessed through 
dma rather than BARs.

> >  >    It might also support MSI-X in a way that
> >  >  splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
> >  >  only 1 PCI bus for the moment and also assuming assigned devices are
> >  >  typically single function, 32 devices * 7 slots/device = 224 slots, so
> >  >  maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
> >  >  with a slot limit of 512.  It would be easier in device assignment code
> >  >  to keep track of a number of devices limit than trying to guess whether
> >  >  slots will be available when we need them.
> >
> >  Sounds reasonable.  But we'd need a faster search for 512 slots.
>
> Ok, I'll add a limit.  I'm not convinced the typical use case is going
> to increase slots at all, so I'm still a little resistant to optimizing
> the search at this point.

I don't want there to be two kvm implementations out there, both 
supporting 512 slots, while one is slow and one is fast.  It means that 
you have no idea what performance to expect.

> BTW, simply by the ordering of when the
> platform registers memory vs when other devices are mapped, we seem to
> end up placing the actual memory regions at the front of the list.  Not
> sure if that's by design or luck.  Thanks,

It was done by design, though as qemu evolves, it becomes more and 
more a matter of luck that the order doesn't change.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-26  9:20                       ` Michael S. Tsirkin
@ 2011-01-26  9:23                         ` Avi Kivity
  2011-01-26  9:39                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-26  9:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
> >  On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
> >  >On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> >  >>   On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >  >>   >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >  >>   >>    For the other lookups, which we
> >  >>   >>    believe will succeed, we can assume the probablity of a match is
> >  >>   >>    related to the slot size, and sort the slots by page count.
> >  >>   >
> >  >>   >Unlikely to be true for assigned device BARs.
> >  >>
> >  >>   We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> >  >>   lots of small slots for BARs and such.
> >  >>
> >  >>   The vast majority of faults will either touch one of the two largest
> >  >>   slots, or will miss all slots.  Relatively few will hit the small
> >  >>   slots.
> >  >
> >  >Not if you are using one of the assigned devices and don't
> >  >do any faults on memory :)
> >
> >  It's impossible not to fault on memory.
>
> No I mean the RAM.

No idea what you mean.  It's impossible not to fault on RAM, either 
(unless you don't use it at all).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25 18:00             ` Michael S. Tsirkin
@ 2011-01-26  9:25               ` Avi Kivity
  0 siblings, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-26  9:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, Alex Williamson, kvm, ddutile, chrisw, jan.kiszka

On 01/25/2011 08:00 PM, Michael S. Tsirkin wrote:
> >  >
> >  >I see now. Yes, a flag in spte would help.  changes in slots would then
> >  >have to update all these flags.
> >
> >  That's easy, we drop all sptes.
>
> Ah, right. Hmm cpu has no flag to distinguish mmio sptes somehow already?

No.  Allocated-but-not-mapped and mmio sptes are identical.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-26  9:23                         ` Avi Kivity
@ 2011-01-26  9:39                           ` Michael S. Tsirkin
  2011-01-26  9:54                             ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-26  9:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Wed, Jan 26, 2011 at 11:23:21AM +0200, Avi Kivity wrote:
> On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
> >On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
> >>  On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
> >>  >On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> >>  >>   On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >>  >>   >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >>  >>   >>    For the other lookups, which we
> >>  >>   >>    believe will succeed, we can assume the probablity of a match is
> >>  >>   >>    related to the slot size, and sort the slots by page count.
> >>  >>   >
> >>  >>   >Unlikely to be true for assigned device BARs.
> >>  >>
> >>  >>   We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> >>  >>   lots of small slots for BARs and such.
> >>  >>
> >>  >>   The vast majority of faults will either touch one of the two largest
> >>  >>   slots, or will miss all slots.  Relatively few will hit the small
> >>  >>   slots.
> >>  >
> >>  >Not if you are using one of the assigned devices and don't
> >>  >do any faults on memory :)
> >>
> >>  It's impossible not to fault on memory.
> >
> >No I mean the RAM.
> 
> No idea what you mean.  It's impossible not to fault on RAM, either
> (unless you don't use it at all).

I just mean that once you fault you map sptes and then you can use them
without exits.  mmio will cause exits each time. Right?

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-26  9:39                           ` Michael S. Tsirkin
@ 2011-01-26  9:54                             ` Avi Kivity
  2011-01-26 12:08                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-26  9:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/26/2011 11:39 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 26, 2011 at 11:23:21AM +0200, Avi Kivity wrote:
> >  On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
> >  >On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
> >  >>   On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
> >  >>   >On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> >  >>   >>    On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >  >>   >>    >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >  >>   >>    >>     For the other lookups, which we
> >  >>   >>    >>     believe will succeed, we can assume the probablity of a match is
> >  >>   >>    >>     related to the slot size, and sort the slots by page count.
> >  >>   >>    >
> >  >>   >>    >Unlikely to be true for assigned device BARs.
> >  >>   >>
> >  >>   >>    We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> >  >>   >>    lots of small slots for BARs and such.
> >  >>   >>
> >  >>   >>    The vast majority of faults will either touch one of the two largest
> >  >>   >>    slots, or will miss all slots.  Relatively few will hit the small
> >  >>   >>    slots.
> >  >>   >
> >  >>   >Not if you are using one of the assigned devices and don't
> >  >>   >do any faults on memory :)
> >  >>
> >  >>   It's impossible not to fault on memory.
> >  >
> >  >No I mean the RAM.
> >
> >  No idea what you mean.  It's impossible not to fault on RAM, either
> >  (unless you don't use it at all).
>
> I just mean that once you fault you map sptes and then you can use them
> without exits.  mmio will cause exits each time. Right?

The swapper scanning sptes, ksmd, khugepaged, and swapping can all cause 
a page to be unmapped.  Though it should certainly happen with a much 
lower frequency than mmio.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-26  9:54                             ` Avi Kivity
@ 2011-01-26 12:08                               ` Michael S. Tsirkin
  2011-01-27  9:21                                 ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-26 12:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Wed, Jan 26, 2011 at 11:54:21AM +0200, Avi Kivity wrote:
> On 01/26/2011 11:39 AM, Michael S. Tsirkin wrote:
> >On Wed, Jan 26, 2011 at 11:23:21AM +0200, Avi Kivity wrote:
> >>  On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
> >>  >On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
> >>  >>   On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
> >>  >>   >On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
> >>  >>   >>    On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
> >>  >>   >>    >On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
> >>  >>   >>    >>     For the other lookups, which we
> >>  >>   >>    >>     believe will succeed, we can assume the probablity of a match is
> >>  >>   >>    >>     related to the slot size, and sort the slots by page count.
> >>  >>   >>    >
> >>  >>   >>    >Unlikely to be true for assigned device BARs.
> >>  >>   >>
> >>  >>   >>    We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
> >>  >>   >>    lots of small slots for BARs and such.
> >>  >>   >>
> >>  >>   >>    The vast majority of faults will either touch one of the two largest
> >>  >>   >>    slots, or will miss all slots.  Relatively few will hit the small
> >>  >>   >>    slots.
> >>  >>   >
> >>  >>   >Not if you are using one of the assigned devices and don't
> >>  >>   >do any faults on memory :)
> >>  >>
> >>  >>   It's impossible not to fault on memory.
> >>  >
> >>  >No I mean the RAM.
> >>
> >>  No idea what you mean.  It's impossible not to fault on RAM, either
> >>  (unless you don't use it at all).
> >
> >I just mean that once you fault you map sptes and then you can use them
> >without exits.  mmio will cause exits each time. Right?
> 
> The swapper scanning sptes, ksmd, khugepaged, and swapping can all
> cause a page to be unmapped.  Though it should certainly happen with
> a much lower frequency than mmio.

Right. That's why I say that sorting by size might not be optimal.
Maybe a cache ...

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-26 12:08                               ` Michael S. Tsirkin
@ 2011-01-27  9:21                                 ` Avi Kivity
  2011-01-27  9:26                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-27  9:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/26/2011 02:08 PM, Michael S. Tsirkin wrote:
> >  >
> >  >I just mean that once you fault you map sptes and then you can use them
> >  >without exits.  mmio will cause exits each time. Right?
> >
> >  The swapper scanning sptes, ksmd, khugepaged, and swapping can all
> >  cause a page to be unmapped.  Though it should certainly happen with
> >  a much lower frequency than mmio.
>
> Right. That's why I say that sorting by size might not be optimal.
> Maybe a cache ...

Why would it not be optimal?

If you have 16GB RAM in two slots and a few megabytes here and there 
scattered in some slots, you have three orders of magnitude to spare.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-27  9:21                                 ` Avi Kivity
@ 2011-01-27  9:26                                   ` Michael S. Tsirkin
  2011-01-27  9:28                                     ` Avi Kivity
  2011-01-27  9:28                                     ` Michael S. Tsirkin
  0 siblings, 2 replies; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-27  9:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Thu, Jan 27, 2011 at 11:21:47AM +0200, Avi Kivity wrote:
> On 01/26/2011 02:08 PM, Michael S. Tsirkin wrote:
> >>  >
> >>  >I just mean that once you fault you map sptes and then you can use them
> >>  >without exits.  mmio will cause exits each time. Right?
> >>
> >>  The swapper scanning sptes, ksmd, khugepaged, and swapping can all
> >>  cause a page to be unmapped.  Though it should certainly happen with
> >>  a much lower frequency than mmio.
> >
> >Right. That's why I say that sorting by size might not be optimal.
> >Maybe a cache ...
> 
> Why would it not be optimal?
> 
> If you have 16GB RAM in two slots and a few megabytes here and there
> scattered in some slots, you have three orders of magnitudes to
> spare.

Yes, but you might not be accessing all that RAM.
Maybe your working set is just tens of megabytes.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-27  9:26                                   ` Michael S. Tsirkin
@ 2011-01-27  9:28                                     ` Avi Kivity
  2011-01-27  9:29                                       ` Michael S. Tsirkin
  2011-01-27  9:28                                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 50+ messages in thread
From: Avi Kivity @ 2011-01-27  9:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/27/2011 11:26 AM, Michael S. Tsirkin wrote:
> >  >Right. That's why I say that sorting by size might not be optimal.
> >  >Maybe a cache ...
> >
> >  Why would it not be optimal?
> >
> >  If you have 16GB RAM in two slots and a few megabytes here and there
> >  scattered in some slots, you have three orders of magnitudes to
> >  spare.
>
> Yes but you might not be accessing all that RAM.
> Maybe your workset is just tens of megabytes.

Great, then you fault it in and never touch the slots array again.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-27  9:26                                   ` Michael S. Tsirkin
  2011-01-27  9:28                                     ` Avi Kivity
@ 2011-01-27  9:28                                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-27  9:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Thu, Jan 27, 2011 at 11:26:19AM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 27, 2011 at 11:21:47AM +0200, Avi Kivity wrote:
> > On 01/26/2011 02:08 PM, Michael S. Tsirkin wrote:
> > >>  >
> > >>  >I just mean that once you fault you map sptes and then you can use them
> > >>  >without exits.  mmio will cause exits each time. Right?
> > >>
> > >>  The swapper scanning sptes, ksmd, khugepaged, and swapping can all
> > >>  cause a page to be unmapped.  Though it should certainly happen with
> > >>  a much lower frequency than mmio.
> > >
> > >Right. That's why I say that sorting by size might not be optimal.
> > >Maybe a cache ...
> > 
> > Why would it not be optimal?
> > 
> > If you have 16GB RAM in two slots and a few megabytes here and there
> > scattered in some slots, you have three orders of magnitudes to
> > spare.
> 
> Yes but you might not be accessing all that RAM.
> Maybe your workset is just tens of megabytes.

Anyway, what I am saying is that this is all just guesswork.
Some kind of cache sounds, to me, like a better
approach if we have a ton of slots.


> > -- 
> > error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-27  9:28                                     ` Avi Kivity
@ 2011-01-27  9:29                                       ` Michael S. Tsirkin
  2011-01-27  9:51                                         ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2011-01-27  9:29 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On Thu, Jan 27, 2011 at 11:28:12AM +0200, Avi Kivity wrote:
> On 01/27/2011 11:26 AM, Michael S. Tsirkin wrote:
> >>  >Right. That's why I say that sorting by size might not be optimal.
> >>  >Maybe a cache ...
> >>
> >>  Why would it not be optimal?
> >>
> >>  If you have 16GB RAM in two slots and a few megabytes here and there
> >>  scattered in some slots, you have three orders of magnitudes to
> >>  spare.
> >
> >Yes but you might not be accessing all that RAM.
> >Maybe your workset is just tens of megabytes.
> 
> Great, then you fault it in and never touch the slots array again.

Yep. Except for emulated mmio :)

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-27  9:29                                       ` Michael S. Tsirkin
@ 2011-01-27  9:51                                         ` Avi Kivity
  0 siblings, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2011-01-27  9:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Jan Kiszka, Marcelo Tosatti, kvm, ddutile, chrisw

On 01/27/2011 11:29 AM, Michael S. Tsirkin wrote:
> On Thu, Jan 27, 2011 at 11:28:12AM +0200, Avi Kivity wrote:
> >  On 01/27/2011 11:26 AM, Michael S. Tsirkin wrote:
> >  >>   >Right. That's why I say that sorting by size might not be optimal.
> >  >>   >Maybe a cache ...
> >  >>
> >  >>   Why would it not be optimal?
> >  >>
> >  >>   If you have 16GB RAM in two slots and a few megabytes here and there
> >  >>   scattered in some slots, you have three orders of magnitudes to
> >  >>   spare.
> >  >
> >  >Yes but you might not be accessing all that RAM.
> >  >Maybe your workset is just tens of megabytes.
> >
> >  Great, then you fault it in and never touch the slots array again.
>
> Yep. Except for emulated mmio :)
>

Which is why I want to cache the result of a negative lookup in the spte.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-25  5:37       ` Alex Williamson
  2011-01-25  7:36         ` Jan Kiszka
  2011-01-25 10:23         ` Avi Kivity
@ 2011-01-31 19:18         ` Marcelo Tosatti
  2011-02-23 21:46           ` Alex Williamson
  2 siblings, 1 reply; 50+ messages in thread
From: Marcelo Tosatti @ 2011-01-31 19:18 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, kvm, ddutile, mst, avi, chrisw

On Mon, Jan 24, 2011 at 10:37:56PM -0700, Alex Williamson wrote:
> On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> > I'll look at how we might be
> > able to allocate slots on demand.  Thanks,
> 
> Here's a first cut just to see if this looks agreeable.  This allows the
> slot array to grow on demand.  This works with current userspace, as
> well as userspace trivially modified to double KVMState.slots and
> hotplugging enough pci-assign devices to exceed the previous limit (w/ &
> w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> the right path?  If so, I'll work on removing the fixed limit from
> userspace next.  Thanks,
> 
> Alex
> 
> 
> kvm: Allow memory slot array to grow on demand
> 
> Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> to grow on demand.  Private slots are now allocated at the
> front instead of the end.  Only x86 seems to use private slots,
> so this is now zero for all other archs.  The memslots pointer
> is already updated using rcu, so changing the size off the
> array when it's replaces is straight forward.  x86 also keeps
> a bitmap of slots used by a kvm_mmu_page, which requires a
> shadow tlb flush whenever we increase the number of slots.
> This forces the pages to be rebuilt with the new bitmap size.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  arch/ia64/include/asm/kvm_host.h    |    4 --
>  arch/ia64/kvm/kvm-ia64.c            |    2 +
>  arch/powerpc/include/asm/kvm_host.h |    3 --
>  arch/s390/include/asm/kvm_host.h    |    3 --
>  arch/x86/include/asm/kvm_host.h     |    3 +-
>  arch/x86/include/asm/vmx.h          |    6 ++-
>  arch/x86/kvm/mmu.c                  |    7 +++-
>  arch/x86/kvm/x86.c                  |    6 ++-
>  include/linux/kvm_host.h            |    7 +++-
>  virt/kvm/kvm_main.c                 |   65 ++++++++++++++++++++++++-----------
>  10 files changed, 63 insertions(+), 43 deletions(-)
> 
> 
> diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
> index 2689ee5..11d0ab2 100644
> --- a/arch/ia64/include/asm/kvm_host.h
> +++ b/arch/ia64/include/asm/kvm_host.h
> @@ -23,10 +23,6 @@
>  #ifndef __ASM_KVM_HOST_H
>  #define __ASM_KVM_HOST_H
>  
> -#define KVM_MEMORY_SLOTS 32
> -/* memory slots that does not exposed to userspace */
> -#define KVM_PRIVATE_MEM_SLOTS 4
> -
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>  
>  /* define exit reasons from vmm to kvm*/
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 70d224d..f1adda2 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1814,7 +1814,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  	mutex_lock(&kvm->slots_lock);
>  
>  	r = -EINVAL;
> -	if (log->slot >= KVM_MEMORY_SLOTS)
> +	if (log->slot >= kvm->memslots->nmemslots)
>  		goto out;
>  
>  	memslot = &kvm->memslots->memslots[log->slot];
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index bba3b9b..dc80057 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -29,9 +29,6 @@
>  #include <asm/kvm_asm.h>
>  
>  #define KVM_MAX_VCPUS 1
> -#define KVM_MEMORY_SLOTS 32
> -/* memory slots that does not exposed to userspace */
> -#define KVM_PRIVATE_MEM_SLOTS 4
>  
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>  
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index cef7dbf..92a964c 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -20,9 +20,6 @@
>  #include <asm/cpu.h>
>  
>  #define KVM_MAX_VCPUS 64
> -#define KVM_MEMORY_SLOTS 32
> -/* memory slots that does not exposed to userspace */
> -#define KVM_PRIVATE_MEM_SLOTS 4
>  
>  struct sca_entry {
>  	atomic_t scn;
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ffd7f8d..df1382c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -27,7 +27,6 @@
>  #include <asm/msr-index.h>
>  
>  #define KVM_MAX_VCPUS 64
> -#define KVM_MEMORY_SLOTS 32
>  /* memory slots that does not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 4
>  
> @@ -207,7 +206,7 @@ struct kvm_mmu_page {
>  	 * One bit set per slot which has memory
>  	 * in this shadow page.
>  	 */
> -	DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
> +	unsigned long *slot_bitmap;
>  	bool multimapped;         /* More than one parent_pte? */
>  	bool unsync;
>  	int root_count;          /* Currently serving as active root */
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 84471b8..7fd8c89 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -370,9 +370,9 @@ enum vmcs_field {
>  
>  #define AR_RESERVD_MASK 0xfffe0f00
>  
> -#define TSS_PRIVATE_MEMSLOT			(KVM_MEMORY_SLOTS + 0)
> -#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	(KVM_MEMORY_SLOTS + 1)
> -#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	(KVM_MEMORY_SLOTS + 2)
> +#define TSS_PRIVATE_MEMSLOT			0
> +#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT	1
> +#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	2
>  
>  #define VMX_NR_VPIDS				(1 << 16)
>  #define VMX_VPID_EXTENT_SINGLE_CONTEXT		1
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ccacf0b..8c2533a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1032,6 +1032,7 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	ASSERT(is_empty_shadow_page(sp->spt));
>  	hlist_del(&sp->hash_link);
>  	list_del(&sp->link);
> +	kfree(sp->slot_bitmap);
>  	__free_page(virt_to_page(sp->spt));
>  	if (!sp->role.direct)
>  		__free_page(virt_to_page(sp->gfns));
> @@ -1048,6 +1049,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>  					       u64 *parent_pte, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> +	struct kvm_memslots *slots = kvm_memslots(vcpu->kvm);
>  
>  	sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
>  	sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
> @@ -1056,7 +1058,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>  						  PAGE_SIZE);
>  	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>  	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
> -	bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
> +	sp->slot_bitmap = kzalloc(sizeof(long) *
> +				  BITS_TO_LONGS(slots->nmemslots), GFP_KERNEL);
> +	if (!sp->slot_bitmap)
> +		return NULL;
>  	sp->multimapped = 0;
>  	sp->parent_pte = parent_pte;
>  	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5eccdba..c002fac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1978,7 +1978,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  		r = KVM_MAX_VCPUS;
>  		break;
>  	case KVM_CAP_NR_MEMSLOTS:
> -		r = KVM_MEMORY_SLOTS;
> +		r = INT_MAX - KVM_PRIVATE_MEM_SLOTS;
>  		break;
>  	case KVM_CAP_PV_MMU:	/* obsolete */
>  		r = 0;
> @@ -3201,7 +3201,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  	mutex_lock(&kvm->slots_lock);
>  
>  	r = -EINVAL;
> -	if (log->slot >= KVM_MEMORY_SLOTS)
> +	if (log->slot >= kvm->memslots->nmemslots)
>  		goto out;
>  
>  	memslot = &kvm->memslots->memslots[log->slot];
> @@ -6068,7 +6068,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
>  
>  	/* Prevent internal slot pages from being moved by fork()/COW. */
> -	if (memslot->id >= KVM_MEMORY_SLOTS)
> +	if (memslot->id < KVM_PRIVATE_MEM_SLOTS)
>  		map_flags = MAP_SHARED | MAP_ANONYMOUS;
>  
>  	/*To keep backward compatibility with older userspace,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b5021db..4cb9f94 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -27,6 +27,10 @@
>  
>  #include <asm/kvm_host.h>
>  
> +#ifndef KVM_PRIVATE_MEM_SLOTS
> + #define KVM_PRIVATE_MEM_SLOTS 0
> +#endif
> +
>  /*
>   * vcpu->requests bit members
>   */
> @@ -206,8 +210,7 @@ struct kvm_irq_routing_table {};
>  struct kvm_memslots {
>  	int nmemslots;
>  	u64 generation;
> -	struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
> -					KVM_PRIVATE_MEM_SLOTS];
> +	struct kvm_memory_slot memslots[];
>  };
>  
>  struct kvm {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fd67bcd..32f023c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -623,13 +623,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  			    struct kvm_userspace_memory_region *mem,
>  			    int user_alloc)
>  {
> -	int r;
> +	int r, nmemslots;
>  	gfn_t base_gfn;
>  	unsigned long npages;
>  	unsigned long i;
> -	struct kvm_memory_slot *memslot;
> -	struct kvm_memory_slot old, new;
> +	struct kvm_memory_slot *memslot = NULL;
> +	struct kvm_memory_slot old = {}, new = {};
>  	struct kvm_memslots *slots, *old_memslots;
> +	bool flush = false;
>  
>  	r = -EINVAL;
>  	/* General sanity checks */
> @@ -639,12 +640,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  		goto out;
>  	if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1)))
>  		goto out;
> -	if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
> -		goto out;
>  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
>  		goto out;
>  
> -	memslot = &kvm->memslots->memslots[mem->slot];
>  	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
>  	npages = mem->memory_size >> PAGE_SHIFT;
>  
> @@ -655,7 +653,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	if (!npages)
>  		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
>  
> -	new = old = *memslot;
> +	if (mem->slot < kvm->memslots->nmemslots) {
> +		memslot = &kvm->memslots->memslots[mem->slot];
> +		new = old = *memslot;
> +	}

>  		if (s == memslot || !s->npages)
> @@ -752,12 +753,19 @@ skip_lpage:
>  
>  	if (!npages) {
>  		r = -ENOMEM;
> -		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> +
> +		nmemslots = (mem->slot >= kvm->memslots->nmemslots) ?
> +			    mem->slot + 1 : kvm->memslots->nmemslots;
> +
> +		slots = kzalloc(sizeof(struct kvm_memslots) +
> +				nmemslots * sizeof(struct kvm_memory_slot),
> +				GFP_KERNEL);
>  		if (!slots)
>  			goto out_free;
> -		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> -		if (mem->slot >= slots->nmemslots)
> -			slots->nmemslots = mem->slot + 1;
> +		memcpy(slots, kvm->memslots,
> +		       sizeof(struct kvm_memslots) + kvm->memslots->nmemslots *
> +		       sizeof(struct kvm_memory_slot));
> +		slots->nmemslots = nmemslots;

Other than the upper limit, you should disallow increasing the number of
slots when the new slot is being deleted (npages == 0). So none of
these additions to the !npages case should be necessary.

Also, must disallow shrinking the number of slots.

>  		slots->generation++;
>  		slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
>  
> @@ -787,12 +795,21 @@ skip_lpage:
>  	}
>  
>  	r = -ENOMEM;
> -	slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> +
> +	if (mem->slot >= kvm->memslots->nmemslots) {
> +		nmemslots = mem->slot + 1;
> +		flush = true;
> +	} else
> +		nmemslots = kvm->memslots->nmemslots;
> +
> +	slots = kzalloc(sizeof(struct kvm_memslots) +
> +			nmemslots * sizeof(struct kvm_memory_slot),
> +			GFP_KERNEL);
>  	if (!slots)
>  		goto out_free;
> -	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> -	if (mem->slot >= slots->nmemslots)
> -		slots->nmemslots = mem->slot + 1;
> +	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
> +	       kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
> +	slots->nmemslots = nmemslots;
>  	slots->generation++;
>  
>  	/* actual memory is freed via old in kvm_free_physmem_slot below */
> @@ -808,6 +825,9 @@ skip_lpage:
>  	rcu_assign_pointer(kvm->memslots, slots);

It should be: 

    spin_lock(kvm->mmu_lock)
    rcu_assign_pointer()
    kvm_arch_flush_shadow()
    spin_unlock(kvm->mmu_lock)

But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
fixing. 

Otherwise looks fine.
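
Spelled out against the patch above, and assuming a variant of
kvm_arch_flush_shadow() that can be called with mmu_lock already held
(the current one takes the lock itself, so this exact form would
deadlock today), the install step would look roughly like:

    static void install_memslots(struct kvm *kvm, struct kvm_memslots *slots)
    {
            spin_lock(&kvm->mmu_lock);
            rcu_assign_pointer(kvm->memslots, slots);
            kvm_arch_flush_shadow_locked(kvm);  /* hypothetical locked variant */
            spin_unlock(&kvm->mmu_lock);
    }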


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-01-31 19:18         ` Marcelo Tosatti
@ 2011-02-23 21:46           ` Alex Williamson
  2011-02-24 12:34             ` Avi Kivity
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2011-02-23 21:46 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jan Kiszka, kvm, ddutile, mst, avi, chrisw

On Mon, 2011-01-31 at 17:18 -0200, Marcelo Tosatti wrote:
> On Mon, Jan 24, 2011 at 10:37:56PM -0700, Alex Williamson wrote:
> > On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> > > I'll look at how we might be
> > > able to allocate slots on demand.  Thanks,
> > 
> > Here's a first cut just to see if this looks agreeable.  This allows the
> > slot array to grow on demand.  This works with current userspace, as
> > well as userspace trivially modified to double KVMState.slots and
> > hotplugging enough pci-assign devices to exceed the previous limit (w/ &
> > w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
> > the right path?  If so, I'll work on removing the fixed limit from
> > userspace next.  Thanks,
> > 
> > Alex
> > 
> > 
> > kvm: Allow memory slot array to grow on demand
> > 
> > Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> > to grow on demand.  Private slots are now allocated at the
> > front instead of the end.  Only x86 seems to use private slots,
> > so this is now zero for all other archs.  The memslots pointer
> > is already updated using rcu, so changing the size of the
> > array when it's replaced is straightforward.  x86 also keeps
> > a bitmap of slots used by a kvm_mmu_page, which requires a
> > shadow tlb flush whenever we increase the number of slots.
> > This forces the pages to be rebuilt with the new bitmap size.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> >  arch/ia64/include/asm/kvm_host.h    |    4 --
> >  arch/ia64/kvm/kvm-ia64.c            |    2 +
> >  arch/powerpc/include/asm/kvm_host.h |    3 --
> >  arch/s390/include/asm/kvm_host.h    |    3 --
> >  arch/x86/include/asm/kvm_host.h     |    3 +-
> >  arch/x86/include/asm/vmx.h          |    6 ++-
> >  arch/x86/kvm/mmu.c                  |    7 +++-
> >  arch/x86/kvm/x86.c                  |    6 ++-
> >  include/linux/kvm_host.h            |    7 +++-
> >  virt/kvm/kvm_main.c                 |   65 ++++++++++++++++++++++++-----------
> >  10 files changed, 63 insertions(+), 43 deletions(-)
[snip]
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fd67bcd..32f023c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
[snip]
> > @@ -752,12 +753,19 @@ skip_lpage:
> >  
> >  	if (!npages) {
> >  		r = -ENOMEM;
> > -		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> > +
> > +		nmemslots = (mem->slot >= kvm->memslots->nmemslots) ?
> > +			    mem->slot + 1 : kvm->memslots->nmemslots;
> > +
> > +		slots = kzalloc(sizeof(struct kvm_memslots) +
> > +				nmemslots * sizeof(struct kvm_memory_slot),
> > +				GFP_KERNEL);
> >  		if (!slots)
> >  			goto out_free;
> > -		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> > -		if (mem->slot >= slots->nmemslots)
> > -			slots->nmemslots = mem->slot + 1;
> > +		memcpy(slots, kvm->memslots,
> > +		       sizeof(struct kvm_memslots) + kvm->memslots->nmemslots *
> > +		       sizeof(struct kvm_memory_slot));
> > +		slots->nmemslots = nmemslots;
> 
> Other than the upper limit, you should disallow increasing the number of
> slots when the new slot is being deleted (npages == 0). So none of
> these additions to the !npages case should be necessary.

The existing code currently allows adding a slot with !npages.  I'll
create a small separate patch to make this behavioral change.

> Also, must disallow shrinking the number of slots.

Right, we only grow, never shrink.
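
A minimal sketch of the two rules being agreed on here, reusing the
structures from the patch above (the helper and its placement are
illustrative, not part of the actual patch):

    static bool slot_change_allowed(struct kvm_memslots *slots,
                                    struct kvm_userspace_memory_region *mem)
    {
            unsigned long npages = mem->memory_size >> PAGE_SHIFT;

            /* a delete (npages == 0) must target an existing slot, so it
             * is never the operation that grows the array; and because
             * deleted slots simply stay empty, nmemslots never shrinks */
            if (!npages && mem->slot >= slots->nmemslots)
                    return false;
            return true;
    }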

> >  		slots->generation++;
> >  		slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
> >  
> > @@ -787,12 +795,21 @@ skip_lpage:
> >  	}
> >  
> >  	r = -ENOMEM;
> > -	slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> > +
> > +	if (mem->slot >= kvm->memslots->nmemslots) {
> > +		nmemslots = mem->slot + 1;
> > +		flush = true;
> > +	} else
> > +		nmemslots = kvm->memslots->nmemslots;
> > +
> > +	slots = kzalloc(sizeof(struct kvm_memslots) +
> > +			nmemslots * sizeof(struct kvm_memory_slot),
> > +			GFP_KERNEL);
> >  	if (!slots)
> >  		goto out_free;
> > -	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> > -	if (mem->slot >= slots->nmemslots)
> > -		slots->nmemslots = mem->slot + 1;
> > +	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
> > +	       kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
> > +	slots->nmemslots = nmemslots;
> >  	slots->generation++;
> >  
> >  	/* actual memory is freed via old in kvm_free_physmem_slot below */
> > @@ -808,6 +825,9 @@ skip_lpage:
> >  	rcu_assign_pointer(kvm->memslots, slots);
> 
> It should be: 
> 
>     spin_lock(kvm->mmu_lock)
>     rcu_assign_pointer()
>     kvm_arch_flush_shadow()
>     spin_unlock(kvm->mmu_lock)
> 
> But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
> fixing. 

Hmm, I tried to follow the example in the !npages path just above this
that does:

rcu_assign_pointer()
synchronize_srcu_expedited()
kvm_arch_flush_shadow()

Do we have an existing issue there with mmu_lock?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-02-23 21:46           ` Alex Williamson
@ 2011-02-24 12:34             ` Avi Kivity
  2011-02-24 12:37               ` Avi Kivity
  2011-02-24 18:10               ` Alex Williamson
  0 siblings, 2 replies; 50+ messages in thread
From: Avi Kivity @ 2011-02-24 12:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Marcelo Tosatti, Jan Kiszka, kvm, ddutile, mst, chrisw

On 02/23/2011 11:46 PM, Alex Williamson wrote:
> >
> >  But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
> >  fixing.
>
> Hmm, I tried to follow the example in the !npages path just above this
> that does:
>
> rcu_assign_pointer()
> synchronize_srcu_expedited()
> kvm_arch_flush_shadow()
>
> Do we have an existing issue there with mmu_lock?  Thanks,
>

It's the issue I pointed out - after rcu_assign_pointer and before 
kvm_arch_flush_shadow() we can still get shadow pages created, so we 
have a mix of direct and indirect bitmaps.  Either my solution (recording
whether the bitmap is direct or not in kvm_mmu_page) or Marcelo's 
(preventing shadow instantiation during this period) should work.
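
For illustration, the first option might look something like the
following (field and helper names are invented, not taken from an
actual patch): keep the bitmap inline while it fits in one word and
record per kvm_mmu_page which representation is in use, so the free
and update paths never misinterpret it.

    struct slot_bitmap {
            union {
                    unsigned long  direct;          /* bits stored inline */
                    unsigned long *indirect;        /* kzalloc'd array */
            };
            bool is_direct;
    };

    static void set_slot_bit(struct slot_bitmap *b, int slot)
    {
            __set_bit(slot, b->is_direct ? &b->direct : b->indirect);
    }

    static void free_slot_bitmap(struct slot_bitmap *b)
    {
            if (!b->is_direct)
                    kfree(b->indirect);
    }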

Hmm, your patch dereferences kvm->memslots without rcu_dereference().  
That's a mortal (as in oops) sin.  Note that sticking an 
rcu_dereference() blindly may or may not work - we might end up using 
information from two different generations of kvm->memslots, and using 
inconsistent data.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-02-24 12:34             ` Avi Kivity
@ 2011-02-24 12:37               ` Avi Kivity
  2011-02-24 18:10               ` Alex Williamson
  1 sibling, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2011-02-24 12:37 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Marcelo Tosatti, Jan Kiszka, kvm, ddutile, mst, chrisw

On 02/24/2011 02:34 PM, Avi Kivity wrote:
>
> Hmm, your patch dereferences kvm->memslots without rcu_dereference().  
> That's a mortal (as in oops) sin.

Sorry, that's wrong, all those places are under ->slots_lock.

>   Note that sticking an rcu_dereference() blindly may or may not work 
> - we might end up using information from two different generations of 
> kvm->memslots, and using inconsistent data.
>

This may still hold.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
  2011-02-24 12:34             ` Avi Kivity
  2011-02-24 12:37               ` Avi Kivity
@ 2011-02-24 18:10               ` Alex Williamson
  1 sibling, 0 replies; 50+ messages in thread
From: Alex Williamson @ 2011-02-24 18:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Jan Kiszka, kvm, ddutile, mst, chrisw

On Thu, 2011-02-24 at 14:34 +0200, Avi Kivity wrote:
> On 02/23/2011 11:46 PM, Alex Williamson wrote:
> > >
> > >  But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
> > >  fixing.
> >
> > Hmm, I tried to follow the example in the !npages path just above this
> > that does:
> >
> > rcu_assign_pointer()
> > synchronize_srcu_expedited()
> > kvm_arch_flush_shadow()
> >
> > Do we have an existing issue there with mmu_lock?  Thanks,
> >
> 
> It's the issue I pointed out - after rcu_assign_pointer and before 
> kvm_arch_flush_shadow() we can still get shadow pages created, so we 
> have a mix of direct and indirect bitmap.  Either my solution (recording 
> whether the bitmap is direct or not in kvm_mmu_page) or Marcelo's 
> (preventing shadow instantiation during this period) should work.

Thanks, I understand better now.

Alex


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2011-02-24 18:10 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-21 23:48 [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Alex Williamson
2011-01-21 23:48 ` [RFC PATCH 1/2] kvm: Allow querying free slots Alex Williamson
2011-01-21 23:48 ` [RFC PATCH 2/2] device-assignment: Count required kvm memory slots Alex Williamson
2011-01-22 22:11 ` [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts Michael S. Tsirkin
2011-01-24  9:32 ` Marcelo Tosatti
2011-01-24 14:16   ` Jan Kiszka
2011-01-24 15:44     ` Alex Williamson
2011-01-25  5:37       ` Alex Williamson
2011-01-25  7:36         ` Jan Kiszka
2011-01-25 14:41           ` Alex Williamson
2011-01-25 14:45             ` Michael S. Tsirkin
2011-01-25 14:54               ` Avi Kivity
2011-01-25 14:53             ` Avi Kivity
2011-01-25 14:59               ` Michael S. Tsirkin
2011-01-25 17:33                 ` Avi Kivity
2011-01-25 17:58                   ` Michael S. Tsirkin
2011-01-26  9:17                     ` Avi Kivity
2011-01-26  9:20                       ` Michael S. Tsirkin
2011-01-26  9:23                         ` Avi Kivity
2011-01-26  9:39                           ` Michael S. Tsirkin
2011-01-26  9:54                             ` Avi Kivity
2011-01-26 12:08                               ` Michael S. Tsirkin
2011-01-27  9:21                                 ` Avi Kivity
2011-01-27  9:26                                   ` Michael S. Tsirkin
2011-01-27  9:28                                     ` Avi Kivity
2011-01-27  9:29                                       ` Michael S. Tsirkin
2011-01-27  9:51                                         ` Avi Kivity
2011-01-27  9:28                                     ` Michael S. Tsirkin
2011-01-25 16:35               ` Jan Kiszka
2011-01-25 19:13                 ` Alex Williamson
2011-01-26  8:14                   ` Jan Kiszka
2011-01-25 10:23         ` Avi Kivity
2011-01-25 14:57           ` Alex Williamson
2011-01-25 17:11             ` Avi Kivity
2011-01-25 17:43               ` Alex Williamson
2011-01-26  9:22                 ` Avi Kivity
2011-01-31 19:18         ` Marcelo Tosatti
2011-02-23 21:46           ` Alex Williamson
2011-02-24 12:34             ` Avi Kivity
2011-02-24 12:37               ` Avi Kivity
2011-02-24 18:10               ` Alex Williamson
2011-01-25 10:20   ` Avi Kivity
2011-01-25 14:46     ` Alex Williamson
2011-01-25 14:56       ` Avi Kivity
2011-01-25 14:55     ` Michael S. Tsirkin
2011-01-25 14:58       ` Avi Kivity
2011-01-25 15:23         ` Michael S. Tsirkin
2011-01-25 17:34           ` Avi Kivity
2011-01-25 18:00             ` Michael S. Tsirkin
2011-01-26  9:25               ` Avi Kivity
