* [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
@ 2016-01-18  5:44 David Gibson
  2016-01-18  5:44 ` [Qemu-devel] [RFC 1/3] pseries: Stub hypercalls for HPT resizing David Gibson
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: David Gibson @ 2016-01-18  5:44 UTC (permalink / raw)
  To: benh, paulus
  Cc: lvivier, thuth, agraf, qemu-devel, qemu-ppc, bharata, David Gibson

Here is a draft qemu implementation of my proposed PAPR extension for
allowing runtime resizing of a KVM/ppc64 guest's hash page table.
That in turn will allow for more flexible memory hotplug.

This should work with the guest kernel side patches I also posted
recently [1].

Still required to make this into a full implementation:
  * Guest needs to auto-resize HPT on memory hotplug events

  * qemu needs to allocate HPT size based on current rather than
    maximum memory if the guest is HPT resize aware

  * KVM host side implementation

  * PAPR standardization


[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/90392

David Gibson (3):
  pseries: Stub hypercalls for HPT resizing
  pseries: Implement HPT resizing
  pseries: Advertise HPT resize capability

 hw/ppc/spapr.c          |   5 +-
 hw/ppc/spapr_hcall.c    | 331 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h  |   9 +-
 target-ppc/mmu-hash64.h |   4 +
 trace-events            |   2 +
 5 files changed, 348 insertions(+), 3 deletions(-)

-- 
2.5.0


* [Qemu-devel] [RFC 1/3] pseries: Stub hypercalls for HPT resizing
  2016-01-18  5:44 [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
@ 2016-01-18  5:44 ` David Gibson
  2016-01-18  5:44 ` [Qemu-devel] [RFC 2/3] pseries: Implement " David Gibson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2016-01-18  5:44 UTC (permalink / raw)
  To: benh, paulus
  Cc: lvivier, thuth, agraf, qemu-devel, qemu-ppc, bharata, David Gibson

This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls, which we hope to add in a PAPR extension to
allow runtime resizing of a guest's hash page table.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_hcall.c   | 29 +++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  4 +++-
 trace-events           |  2 ++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 1a1bea8..01c034c 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -316,6 +316,30 @@ static target_ulong h_read(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return H_SUCCESS;
 }
 
+static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong shift = args[1];
+
+    trace_spapr_h_resize_hpt_prepare(flags, shift);
+    return H_HARDWARE;
+}
+
+static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
+                                        sPAPRMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong shift = args[1];
+
+    trace_spapr_h_resize_hpt_commit(flags, shift);
+    return H_HARDWARE;
+}
+
 static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                target_ulong opcode, target_ulong *args)
 {
@@ -974,6 +998,11 @@ static void hypercall_register_types(void)
     /* hcall-bulk */
     spapr_register_hypercall(H_BULK_REMOVE, h_bulk_remove);
 
+    /* hcall-hpt-resize */
+    spapr_register_hypercall(KVMPPC_H_RESIZE_HPT_PREPARE,
+                             h_resize_hpt_prepare);
+    spapr_register_hypercall(KVMPPC_H_RESIZE_HPT_COMMIT, h_resize_hpt_commit);
+
     /* hcall-dabr */
     spapr_register_hypercall(H_SET_DABR, h_set_dabr);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 53af76a..028afc9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -352,7 +352,9 @@ struct sPAPRMachineState {
 #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
 /* Client Architecture support */
 #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
-#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
+#define KVMPPC_H_RESIZE_HPT_PREPARE (KVMPPC_HCALL_BASE + 0x3)
+#define KVMPPC_H_RESIZE_HPT_COMMIT  (KVMPPC_HCALL_BASE + 0x4)
+#define KVMPPC_HCALL_MAX        KVMPPC_H_RESIZE_HPT_COMMIT
 
 typedef struct sPAPRDeviceTreeUpdateHeader {
     uint32_t version_id;
diff --git a/trace-events b/trace-events
index 934a7b6..f0d6e49 100644
--- a/trace-events
+++ b/trace-events
@@ -1403,6 +1403,8 @@ spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
 # hw/ppc/spapr_hcall.c
 spapr_cas_pvr_try(uint32_t pvr) "%x"
 spapr_cas_pvr(uint32_t cur_pvr, bool cpu_match, uint32_t new_pvr, uint64_t pcr) "current=%x, cpu_match=%u, new=%x, compat flags=%"PRIx64
+spapr_h_resize_hpt_prepare(uint64_t flags, uint64_t shift) "flags=0x%"PRIx64", shift=%"PRIu64
+spapr_h_resize_hpt_commit(uint64_t flags, uint64_t shift) "flags=0x%"PRIx64", shift=%"PRIu64
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
-- 
2.5.0
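
As a reading aid for the patch above, here is a minimal, self-contained
sketch of the "register a stub handler, dispatch by opcode" pattern.  It is
not QEMU code: every sketch_* and SKETCH_* name is hypothetical, and the
numeric values are illustrative stand-ins for the real constants in
include/hw/ppc/spapr.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in QEMU's spapr.h. */
#define SKETCH_H_HARDWARE   (-1)
#define SKETCH_H_FUNCTION   (-2)
#define SKETCH_HCALL_BASE   0xf000
#define SKETCH_H_RESIZE_HPT_PREPARE (SKETCH_HCALL_BASE + 0x3)
#define SKETCH_H_RESIZE_HPT_COMMIT  (SKETCH_HCALL_BASE + 0x4)
#define SKETCH_HCALL_MAX    SKETCH_H_RESIZE_HPT_COMMIT

typedef int64_t (*sketch_hcall_fn)(uint64_t opcode, const uint64_t *args);

/* One slot per opcode in [SKETCH_HCALL_BASE, SKETCH_HCALL_MAX]. */
static sketch_hcall_fn sketch_table[SKETCH_HCALL_MAX - SKETCH_HCALL_BASE + 1];

/* Install a handler; each opcode may be registered only once. */
static void sketch_register(uint64_t opcode, sketch_hcall_fn fn)
{
    assert(opcode >= SKETCH_HCALL_BASE && opcode <= SKETCH_HCALL_MAX);
    assert(sketch_table[opcode - SKETCH_HCALL_BASE] == NULL);
    sketch_table[opcode - SKETCH_HCALL_BASE] = fn;
}

/* A stub in the spirit of the patch: accept the call, report failure. */
static int64_t sketch_stub(uint64_t opcode, const uint64_t *args)
{
    (void)opcode;
    (void)args;
    return SKETCH_H_HARDWARE;
}

/* Unknown or unregistered opcodes are rejected before any handler runs. */
static int64_t sketch_dispatch(uint64_t opcode, const uint64_t *args)
{
    sketch_hcall_fn fn;

    if (opcode < SKETCH_HCALL_BASE || opcode > SKETCH_HCALL_MAX) {
        return SKETCH_H_FUNCTION;
    }
    fn = sketch_table[opcode - SKETCH_HCALL_BASE];
    return fn ? fn(opcode, args) : SKETCH_H_FUNCTION;
}
```

Raising the "max" opcode together with the new definitions, as the spapr.h
hunk does, is what keeps the table large enough for the added entries.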


* [Qemu-devel] [RFC 2/3] pseries: Implement HPT resizing
  2016-01-18  5:44 [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
  2016-01-18  5:44 ` [Qemu-devel] [RFC 1/3] pseries: Stub hypercalls for HPT resizing David Gibson
@ 2016-01-18  5:44 ` David Gibson
  2016-01-18  5:44 ` [Qemu-devel] [RFC 3/3] pseries: Advertise HPT resize capability David Gibson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2016-01-18  5:44 UTC (permalink / raw)
  To: benh, paulus
  Cc: lvivier, thuth, agraf, qemu-devel, qemu-ppc, bharata, David Gibson

This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table.  This will eventually allow for more flexible memory
hotplug.

The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function.  The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.

The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion.  If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.

The H_RESIZE_HPT_COMMIT hypercall completes the transition to a resized
HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has
successfully completed initialization of a new HPT.  The guest must ensure
that there are no concurrent accesses to the existing HPT while this is
called (this effectively means stop_machine() for Linux guests).

For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT.  This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
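
The size figures above can be checked with a little arithmetic.  The sketch
below is illustrative only: the sketch_* names are hypothetical, the 18..46
shift range is taken from the checks in h_resize_hpt_prepare(), the 128-byte
PTEG size from HASH_PTEG_SIZE_64, and the 1/128 HPT-to-RAM ratio is an
assumption inferred from the "2GiB HPT for a 256GiB guest" figure rather
than a rule stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTEG_BYTES 128u  /* 8 HPTEs of 16 bytes each */
#define SKETCH_MIN_SHIFT  18u
#define SKETCH_MAX_SHIFT  46u

/* Number of PTE groups in an HPT of 2^shift bytes. */
static uint64_t sketch_hpt_ptegs(unsigned shift)
{
    assert(shift >= SKETCH_MIN_SHIFT && shift <= SKETCH_MAX_SHIFT);
    return (1ULL << shift) / SKETCH_PTEG_BYTES;
}

/* Mask that reduces a hash value to a PTEG index, as in (size >> 7) - 1. */
static uint64_t sketch_hash_mask(unsigned shift)
{
    return sketch_hpt_ptegs(shift) - 1;
}

/* Smallest valid shift whose HPT is at least 1/128 of ram_bytes (assumed ratio). */
static unsigned sketch_shift_for_ram(uint64_t ram_bytes)
{
    uint64_t want = ram_bytes / 128;
    unsigned shift = SKETCH_MIN_SHIFT;

    while (shift < SKETCH_MAX_SHIFT && (1ULL << shift) < want) {
        shift++;
    }
    return shift;
}
```

Under these assumptions a 256GiB guest gets shift 31, i.e. a 2GiB table
holding 2^24 PTEGs, all of which the commit step has to walk.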

In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs.  That's a project for another day, but should be possible
without any changes to the guest interface.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c          |   2 -
 hw/ppc/spapr_hcall.c    | 308 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr.h  |   5 +
 target-ppc/mmu-hash64.h |   4 +
 4 files changed, 314 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 50e5a26..e26baca 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -90,8 +90,6 @@
 
 #define PHANDLE_XICP            0x00001111
 
-#define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
-
 static XICSState *try_create_xics(const char *type, int nr_servers,
                                   int nr_irqs, Error **errp)
 {
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 01c034c..1d5efef 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1,4 +1,5 @@
 #include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
 #include "cpu.h"
 #include "helper_regs.h"
 #include "hw/ppc/spapr.h"
@@ -316,16 +317,278 @@ static target_ulong h_read(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return H_SUCCESS;
 }
 
+struct sPAPRPendingHPT {
+    /* These fields are read-only after initialization */
+    int shift;
+    QemuThread thread;
+
+    /* These fields are protected by the BQL */
+    bool complete;
+
+    /* These fields are private to the preparation thread if
+     * !complete, otherwise protected by the BQL */
+    int ret;
+    void *hpt;
+};
+
+static void free_pending_hpt(sPAPRPendingHPT *pending)
+{
+    if (pending->hpt) {
+        qemu_vfree(pending->hpt);
+    }
+
+    g_free(pending);
+}
+
+static void *hpt_prepare_thread(void *opaque)
+{
+    sPAPRPendingHPT *pending = opaque;
+    size_t size = 1ULL << pending->shift;
+
+    pending->hpt = qemu_memalign(size, size);
+    if (pending->hpt) {
+        memset(pending->hpt, 0, size);
+        pending->ret = H_SUCCESS;
+    } else {
+        pending->ret = H_NO_MEM;
+    }
+
+    qemu_mutex_lock_iothread();
+
+    if (SPAPR_MACHINE(qdev_get_machine())->pending_hpt != pending) {
+        /* We've been cancelled, clean ourselves up */
+        free_pending_hpt(pending);
+        goto out;
+    }
+
+    pending->complete = true;
+
+out:
+    qemu_mutex_unlock_iothread();
+    return NULL;
+}
+
+/* Must be called with BQL held */
+static void cancel_hpt_prepare(sPAPRMachineState *spapr)
+{
+    sPAPRPendingHPT *pending = spapr->pending_hpt;
+
+    /* Let the thread know it's cancelled */
+    spapr->pending_hpt = NULL;
+
+    if (!pending) {
+        /* Nothing to do */
+        return;
+    }
+
+    if (!pending->complete) {
+        /* thread will clean itself up */
+        return;
+    }
+
+    free_pending_hpt(pending);
+}
+
 static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
                                          sPAPRMachineState *spapr,
                                          target_ulong opcode,
                                          target_ulong *args)
 {
     target_ulong flags = args[0];
-    target_ulong shift = args[1];
+    int shift = args[1];
+    sPAPRPendingHPT *pending = spapr->pending_hpt;
 
     trace_spapr_h_resize_hpt_prepare(flags, shift);
-    return H_HARDWARE;
+
+    if (flags != 0) {
+        return H_PARAMETER;
+    }
+
+    if (shift && ((shift < 18) || (shift > 46))) {
+        return H_PARAMETER;
+    }
+
+    if (pending) {
+        /* something already in progress */
+        if (pending->shift == shift) {
+            /* and it's suitable */
+            if (pending->complete) {
+                return pending->ret;
+            } else {
+                return H_LONG_BUSY_ORDER_100_MSEC;
+            }
+        }
+
+        /* not suitable, cancel and replace */
+        cancel_hpt_prepare(spapr);
+    }
+
+    if (!shift) {
+        /* nothing to do */
+        return H_SUCCESS;
+    }
+
+    /* start new prepare */
+    pending = g_malloc0(sizeof(*pending));
+    pending->shift = shift;
+    pending->ret = H_HARDWARE;
+
+    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
+                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
+
+    spapr->pending_hpt = pending;
+
+    /* In theory we could estimate the time more accurately based on
+     * the new size, but there's not much point */
+    return H_LONG_BUSY_ORDER_100_MSEC;
+}
+
+static uint64_t new_hpte_load0(void *htab, uint64_t pteg, int slot)
+{
+    uint8_t *addr = htab;
+
+    addr += pteg * HASH_PTEG_SIZE_64;
+    addr += slot * HASH_PTE_SIZE_64;
+    return ldq_p(addr);
+}
+
+static void new_hpte_store(void *htab, uint64_t pteg, int slot,
+                           uint64_t pte0, uint64_t pte1)
+{
+    uint8_t *addr = htab;
+
+    addr += pteg * HASH_PTEG_SIZE_64;
+    addr += slot * HASH_PTE_SIZE_64;
+
+    stq_p(addr, pte0);
+    stq_p(addr + HASH_PTE_SIZE_64/2, pte1);
+}
+
+static int rehash_hpte(PowerPCCPU *cpu, uint64_t token,
+                       void *old, uint64_t oldsize,
+                       void *new, uint64_t newsize,
+                       uint64_t pteg, int slot)
+{
+    uint64_t old_hash_mask = (oldsize >> 7) - 1;
+    uint64_t new_hash_mask = (newsize >> 7) - 1;
+    target_ulong pte0 = ppc_hash64_load_hpte0(cpu, token, slot);
+    target_ulong pte1;
+    uint64_t avpn;
+    unsigned shift, spshift;
+    uint64_t hash, new_pteg, replace_pte0;
+
+    if (!(pte0 & HPTE64_V_VALID) || !(pte0 & HPTE64_V_BOLTED)) {
+        return H_SUCCESS;
+    }
+
+    pte1 = ppc_hash64_load_hpte1(cpu, token, slot);
+
+    shift = ppc_hash64_hpte_page_shift_noslb(cpu, pte0, pte1, &spshift);
+    assert(shift); /* H_ENTER should never have allowed a bad encoding */
+    avpn = HPTE64_V_AVPN_VAL(pte0) & ~(((1ULL << shift) - 1) >> 23);
+
+    if (pte0 & HPTE64_V_SECONDARY) {
+        pteg = ~pteg;
+    }
+
+    if ((pte0 & HPTE64_V_SSIZE) == HPTE64_V_SSIZE_256M) {
+        uint64_t offset, vsid;
+
+        /* We only have 28 - 23 bits of offset in avpn */
+        offset = (avpn & 0x1f) << 23;
+        vsid = avpn >> 5;
+        /* We can find more bits from the pteg value */
+        if (shift < 23) {
+            offset |= ((vsid ^ pteg) & old_hash_mask) << shift;
+        }
+
+        hash = vsid ^ (offset >> shift);
+    } else if ((pte0 & HPTE64_V_SSIZE) == HPTE64_V_SSIZE_1T) {
+        uint64_t offset, vsid;
+
+        /* We only have 40 - 23 bits of seg_off in avpn */
+        offset = (avpn & 0x1ffff) << 23;
+        vsid = avpn >> 17;
+        if (shift < 23) {
+            offset |= ((vsid ^ (vsid << 25) ^ pteg) & old_hash_mask) << shift;
+        }
+
+        hash = vsid ^ (vsid << 25) ^ (offset >> shift);
+    } else {
+        error_report("rehash_hpte: Bad segment size in HPTE");
+        return H_HARDWARE;
+    }
+
+    new_pteg = hash & new_hash_mask;
+    if (pte0 & HPTE64_V_SECONDARY) {
+        assert(~pteg == (hash & old_hash_mask));
+        new_pteg = ~new_pteg;
+    } else {
+        assert(pteg == (hash & old_hash_mask));
+    }
+    assert((oldsize != newsize) || (pteg == new_pteg));
+    replace_pte0 = new_hpte_load0(new, new_pteg, slot);
+    if (replace_pte0 & HPTE64_V_VALID) {
+        assert(newsize < oldsize);
+        if (replace_pte0 & HPTE64_V_BOLTED) {
+            if (pte0 & HPTE64_V_BOLTED) {
+                /* Bolted collision, nothing we can do */
+                return H_PTEG_FULL;
+            } else {
+                /* Discard this hpte */
+                return H_SUCCESS;
+            }
+        }
+    }
+
+    new_hpte_store(new, new_pteg, slot, pte0, pte1);
+    return H_SUCCESS;
+}
+
+static int rehash_hpt(PowerPCCPU *cpu,
+                      void *old, uint64_t oldsize,
+                      void *new, uint64_t newsize)
+{
+    CPUPPCState *env = &cpu->env;
+    uint64_t n_ptegs = oldsize >> 7;
+    uint64_t pteg;
+    int slot;
+    int rc;
+
+    assert(env->external_htab == old);
+
+    for (pteg = 0; pteg < n_ptegs; pteg++) {
+        uint64_t token = ppc_hash64_start_access(cpu, pteg * HPTES_PER_GROUP);
+
+        if (!token) {
+            return H_HARDWARE;
+        }
+
+        for (slot = 0; slot < HPTES_PER_GROUP; slot++) {
+            rc = rehash_hpte(cpu, token, old, oldsize, new, newsize,
+                             pteg, slot);
+            if (rc != H_SUCCESS) {
+                ppc_hash64_stop_access(token);
+                return rc;
+            }
+        }
+        ppc_hash64_stop_access(token);
+    }
+
+    return H_SUCCESS;
+}
+
+static void pivot_hpt(void *arg)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = arg;
+    CPUPPCState *env = &POWERPC_CPU(cs)->env;
+
+    cpu_synchronize_state(cs);
+    env->external_htab = spapr->htab;
+    env->htab_mask = (1ULL << (spapr->htab_shift - 7)) - 1;
+    env->spr[SPR_SDR1] = (target_ulong)(uintptr_t)spapr->htab |
+        (spapr->htab_shift - 18);
 }
 
 static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
@@ -335,9 +598,48 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
 {
     target_ulong flags = args[0];
     target_ulong shift = args[1];
+    sPAPRPendingHPT *pending = spapr->pending_hpt;
+    int rc;
+    size_t newsize;
 
     trace_spapr_h_resize_hpt_commit(flags, shift);
-    return H_HARDWARE;
+
+    if (flags != 0) {
+        return H_PARAMETER;
+    }
+
+    if (!pending || (pending->shift != shift)) {
+        /* no matching prepare */
+        return H_CLOSED;
+    }
+
+    if (!pending->complete) {
+        /* prepare has not completed */
+        return H_BUSY;
+    }
+
+    newsize = 1ULL << pending->shift;
+    rc = rehash_hpt(cpu, spapr->htab, HTAB_SIZE(spapr),
+                    pending->hpt, newsize);
+    if (rc == H_SUCCESS) {
+        CPUState *cs;
+
+        qemu_vfree(spapr->htab);
+        spapr->htab = pending->hpt;
+        spapr->htab_shift = pending->shift;
+
+        CPU_FOREACH(cs) {
+            run_on_cpu(cs, pivot_hpt, cs);
+        }
+
+        pending->hpt = NULL; /* so it's not free()d */
+    }
+
+    /* Clean up */
+    spapr->pending_hpt = NULL;
+    free_pending_hpt(pending);
+
+    return rc;
 }
 
 static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 028afc9..02cb950 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -12,6 +12,7 @@ struct sPAPRPHBState;
 struct sPAPRNVRAM;
 typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
+typedef struct sPAPRPendingHPT sPAPRPendingHPT;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -54,6 +55,8 @@ struct sPAPRMachineState {
 
     void *htab;
     uint32_t htab_shift;
+    sPAPRPendingHPT *pending_hpt; /* in-progress resize */
+
     hwaddr rma_size;
     int vrma_adjust;
     hwaddr fdt_addr, rtas_addr;
@@ -649,4 +652,6 @@ int spapr_rng_populate_dt(void *fdt);
  */
 #define SPAPR_LMB_FLAGS_ASSIGNED 0x00000008
 
+#define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/target-ppc/mmu-hash64.h b/target-ppc/mmu-hash64.h
index 1b1695b..6038fd2 100644
--- a/target-ppc/mmu-hash64.h
+++ b/target-ppc/mmu-hash64.h
@@ -58,11 +58,15 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
 #define HASH_PTE_SIZE_64        16
 #define HASH_PTEG_SIZE_64       (HASH_PTE_SIZE_64 * HPTES_PER_GROUP)
 
+#define HPTE64_V_SSIZE          SLB_VSID_B
+#define HPTE64_V_SSIZE_256M     SLB_VSID_B_256M
+#define HPTE64_V_SSIZE_1T       SLB_VSID_B_1T
 #define HPTE64_V_SSIZE_SHIFT    62
 #define HPTE64_V_AVPN_SHIFT     7
 #define HPTE64_V_AVPN           0x3fffffffffffff80ULL
 #define HPTE64_V_AVPN_VAL(x)    (((x) & HPTE64_V_AVPN) >> HPTE64_V_AVPN_SHIFT)
 #define HPTE64_V_COMPARE(x, y)  (!(((x) ^ (y)) & 0xffffffffffffff80ULL))
+#define HPTE64_V_BOLTED         0x0000000000000010ULL
 #define HPTE64_V_LARGE          0x0000000000000004ULL
 #define HPTE64_V_SECONDARY      0x0000000000000002ULL
 #define HPTE64_V_VALID          0x0000000000000001ULL
-- 
2.5.0
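
The prepare/commit protocol implemented above can be summarized as a small
state machine.  The following is a single-threaded model, not the QEMU
code: all sketch_* and SK_* names are hypothetical, the return codes are
symbolic rather than the real PAPR values, and the worker thread is
replaced by an explicit sketch_worker_done() call.  It also assumes that
allocation always succeeds, so the H_NO_MEM path is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Symbolic return codes; not the numeric PAPR values. */
enum sketch_rc { SK_SUCCESS, SK_PARAMETER, SK_BUSY, SK_LONG_BUSY, SK_CLOSED };

struct sketch_resize {
    bool pending;   /* a prepare was started and not yet committed or cancelled */
    bool complete;  /* the new table is allocated and zeroed */
    int shift;      /* requested size, log2 bytes */
};

static enum sketch_rc sketch_prepare(struct sketch_resize *r,
                                     uint64_t flags, int shift)
{
    if (flags != 0) {
        return SK_PARAMETER;
    }
    if (shift && (shift < 18 || shift > 46)) {
        return SK_PARAMETER;
    }
    if (r->pending) {
        if (r->shift == shift) {
            /* Same request again: report progress. */
            return r->complete ? SK_SUCCESS : SK_LONG_BUSY;
        }
        r->pending = false; /* different size: cancel and replace */
    }
    if (!shift) {
        return SK_SUCCESS;  /* shift 0 only cancels */
    }
    r->pending = true;
    r->complete = false;
    r->shift = shift;
    return SK_LONG_BUSY;    /* caller is expected to retry */
}

/* Stands in for hpt_prepare_thread() finishing its allocation. */
static void sketch_worker_done(struct sketch_resize *r)
{
    if (r->pending) {
        r->complete = true;
    }
}

static enum sketch_rc sketch_commit(struct sketch_resize *r,
                                    uint64_t flags, int shift)
{
    if (flags != 0) {
        return SK_PARAMETER;
    }
    if (!r->pending || r->shift != shift) {
        return SK_CLOSED;   /* no matching prepare */
    }
    if (!r->complete) {
        return SK_BUSY;     /* prepare has not finished */
    }
    r->pending = false;     /* rehash and pivot would happen here */
    return SK_SUCCESS;
}
```

A guest would typically call prepare in a loop until it stops returning the
busy code, quiesce all CPUs, and then call commit once.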


* [Qemu-devel] [RFC 3/3] pseries: Advertise HPT resize capability
  2016-01-18  5:44 [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
  2016-01-18  5:44 ` [Qemu-devel] [RFC 1/3] pseries: Stub hypercalls for HPT resizing David Gibson
  2016-01-18  5:44 ` [Qemu-devel] [RFC 2/3] pseries: Implement " David Gibson
@ 2016-01-18  5:44 ` David Gibson
  2016-01-18  5:45 ` [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
  2016-01-19  7:48 ` Bharata B Rao
  4 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2016-01-18  5:44 UTC (permalink / raw)
  To: benh, paulus
  Cc: lvivier, thuth, agraf, qemu-devel, qemu-ppc, bharata, David Gibson

This adds a new string to the hypertas property in the device tree,
advertising to the guest the availability of the HPT resizing hypercalls.
This is a tentative suggested value, and would need to be standardized by
PAPR before being merged.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e26baca..1147382 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -334,6 +334,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     add_str(hypertas, "hcall-splpar");
     add_str(hypertas, "hcall-bulk");
     add_str(hypertas, "hcall-set-mode");
+    if (!kvm_enabled()) { /* Not implemented in KVM yet */
+        add_str(hypertas, "hcall-hpt-resize");
+    }
     add_str(qemu_hypertas, "hcall-memop1");
 
     fdt = g_malloc0(FDT_MAX_SIZE);
-- 
2.5.0
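
For readers unfamiliar with how such a property is laid out: a device tree
string-list value is a concatenation of NUL-terminated strings.  The sketch
below builds one with a conditional entry, in the same spirit as the hunk
above.  It is a standalone illustration; the sketch_* helpers are
hypothetical and are not QEMU's add_str(), and the fixed 256-byte buffer is
an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_strlist {
    char buf[256];
    size_t len;     /* bytes used, including each string's NUL */
};

/* Append s together with its terminating NUL. */
static void sketch_add_str(struct sketch_strlist *l, const char *s)
{
    size_t n = strlen(s) + 1;

    assert(l->len + n <= sizeof(l->buf));
    memcpy(l->buf + l->len, s, n);
    l->len += n;
}

/* Walk the list the way a guest would when probing for a capability. */
static bool sketch_has_str(const struct sketch_strlist *l, const char *s)
{
    size_t off = 0;

    while (off < l->len) {
        if (strcmp(l->buf + off, s) == 0) {
            return true;
        }
        off += strlen(l->buf + off) + 1;
    }
    return false;
}

/* Advertise HPT resizing only when the backend implements it. */
static void sketch_build_hypertas(struct sketch_strlist *l, bool kvm)
{
    l->len = 0;
    sketch_add_str(l, "hcall-bulk");
    sketch_add_str(l, "hcall-set-mode");
    if (!kvm) {
        sketch_add_str(l, "hcall-hpt-resize");
    }
}
```

Gating the string rather than the hypercall registration means a guest that
probes the property never attempts a resize the backend cannot perform.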


* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-18  5:44 [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
                   ` (2 preceding siblings ...)
  2016-01-18  5:44 ` [Qemu-devel] [RFC 3/3] pseries: Advertise HPT resize capability David Gibson
@ 2016-01-18  5:45 ` David Gibson
  2016-01-19  7:48 ` Bharata B Rao
  4 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2016-01-18  5:45 UTC (permalink / raw)
  To: benh, paulus; +Cc: lvivier, thuth, agraf, qemu-devel, qemu-ppc, bharata


On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
> Here is a draft qemu implementation of my proposed PAPR extension for
> allowing runtime resizing of a KVM/ppc64 guest's hash page table.
> That in turn will allow for more flexible memory hotplug.
> 
> This should work with the guest kernel side patches I also posted
> recently [1].
> 
> Still required to make this into a full implementation:
>   * Guest needs to auto-resize HPT on memory hotplug events
> 
>   * qemu needs to allocate HPT size based on current rather than
>     maximum memory if the guest is HPT resize aware
> 
>   * KVM host side implementation
> 
>   * PAPR standardization
> 
> 
> [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/90392

Sorry, forgot to mention that this series applies on top of my page
size handling cleanup series posted recently.

> 
> David Gibson (3):
>   pseries: Stub hypercalls for HPT resizing
>   pseries: Implement HPT resizing
>   pseries: Advertise HPT resize capability
> 
>  hw/ppc/spapr.c          |   5 +-
>  hw/ppc/spapr_hcall.c    | 331 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h  |   9 +-
>  target-ppc/mmu-hash64.h |   4 +
>  trace-events            |   2 +
>  5 files changed, 348 insertions(+), 3 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-18  5:44 [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
                   ` (3 preceding siblings ...)
  2016-01-18  5:45 ` [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side) David Gibson
@ 2016-01-19  7:48 ` Bharata B Rao
  2016-01-19 11:02   ` David Gibson
  4 siblings, 1 reply; 12+ messages in thread
From: Bharata B Rao @ 2016-01-19  7:48 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, qemu-devel, agraf, qemu-ppc, paulus

On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
> Here is a draft qemu implementation of my proposed PAPR extension for
> allowing runtime resizing of a KVM/ppc64 guest's hash page table.
> That in turn will allow for more flexible memory hotplug.
> 
> This should work with the guest kernel side patches I also posted
> recently [1].
> 
> Still required to make this into a full implementation:
>   * Guest needs to auto-resize HPT on memory hotplug events
> 
>   * qemu needs to allocate HPT size based on current rather than
>     maximum memory if the guest is HPT resize aware
> 
>   * KVM host side implementation
> 
>   * PAPR standardization

So with the current patchset (QEMU and guest kernel changes), I should
be able to change the HTAB size of a PR guest right ? I see the below
failure though:

[root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size 
24
[root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
[   65.996845] lpar: Attempting to resize HPT to shift 26
[   65.996845] lpar: Attempting to resize HPT to shift 26
[   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
[   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)

PR guest just hangs here while I see tons of below messages in
the 1st level guest:

KVM can't copy data from 0x3fff99e91400!
...
Couldn't emulate instruction 0x00000000 (op 0 xop 0)
kvmppc_handle_exit_pr: emulation at 700 failed (00000000)

Regards,
Bharata.


* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-19  7:48 ` Bharata B Rao
@ 2016-01-19 11:02   ` David Gibson
  2016-01-28 21:04     ` Alexander Graf
  0 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2016-01-19 11:02 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: lvivier, thuth, qemu-devel, agraf, qemu-ppc, paulus


On Tue, Jan 19, 2016 at 01:18:17PM +0530, Bharata B Rao wrote:
> On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
> > Here is a draft qemu implementation of my proposed PAPR extension for
> > allowing runtime resizing of a KVM/ppc64 guest's hash page table.
> > That in turn will allow for more flexible memory hotplug.
> > 
> > This should work with the guest kernel side patches I also posted
> > recently [1].
> > 
> > Still required to make this into a full implementation:
> >   * Guest needs to auto-resize HPT on memory hotplug events
> > 
> >   * qemu needs to allocate HPT size based on current rather than
> >     maximum memory if the guest is HPT resize aware
> > 
> >   * KVM host side implementation
> > 
> >   * PAPR standardization
> 
> So with the current patchset (QEMU and guest kernel changes), I should
> be able to change the HTAB size of a PR guest right ? I see the below
> failure though:

Uh.. to be honest I haven't really considered the KVM case at all.
I'm kind of surprised it didn't just refuse to do anything.

> [root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size 
> 24
> [root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
> [   65.996845] lpar: Attempting to resize HPT to shift 26
> [   65.996845] lpar: Attempting to resize HPT to shift 26
> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> 
> PR guest just hangs here while I see tons of below messages in
> the 1st level guest:
> 
> KVM can't copy data from 0x3fff99e91400!
> ...
> Couldn't emulate instruction 0x00000000 (op 0 xop 0)
> kvmppc_handle_exit_pr: emulation at 700 failed (00000000)

Hm, not sure why that's happening.  At first I thought it was because
we weren't updating SDR1 with the address of the new htab, but that's
actually in there.  Maybe the KVM PR code isn't rereading it after
initial VM startup.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-19 11:02   ` David Gibson
@ 2016-01-28 21:04     ` Alexander Graf
  2016-01-28 22:09       ` Paul Mackerras
  2016-01-29  2:47       ` David Gibson
  0 siblings, 2 replies; 12+ messages in thread
From: Alexander Graf @ 2016-01-28 21:04 UTC (permalink / raw)
  To: David Gibson, Bharata B Rao; +Cc: lvivier, thuth, qemu-devel, qemu-ppc, paulus



On 01/19/2016 12:02 PM, David Gibson wrote:
> On Tue, Jan 19, 2016 at 01:18:17PM +0530, Bharata B Rao wrote:
>> On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
>>> Here is a draft qemu implementation of my proposed PAPR extension for
>>> allowing runtime resizing of a KVM/ppc64 guest's hash page table.
>>> That in turn will allow for more flexible memory hotplug.
>>>
>>> This should work with the guest kernel side patches I also posted
>>> recently [1].
>>>
>>> Still required to make this into a full implementation:
>>>    * Guest needs to auto-resize HPT on memory hotplug events
>>>
>>>    * qemu needs to allocate HPT size based on current rather than
>>>      maximum memory if the guest is HPT resize aware
>>>
>>>    * KVM host side implementation
>>>
>>>    * PAPR standardization
>> So with the current patchset (QEMU and guest kernel changes), I should
>> be able to change the HTAB size of a PR guest right ? I see the below
>> failure though:
> Uh.. to be honest I haven't really considered the KVM case at all.
> I'm kind of surprised it didn't just refuse to do anything.
>
>> [root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size
>> 24
>> [root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
>> [   65.996845] lpar: Attempting to resize HPT to shift 26
>> [   65.996845] lpar: Attempting to resize HPT to shift 26
>> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
>> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
>>
>> PR guest just hangs here while I see tons of below messages in
>> the 1st level guest:
>>
>> KVM can't copy data from 0x3fff99e91400!
>> ...
>> Couldn't emulate instruction 0x00000000 (op 0 xop 0)
>> kvmppc_handle_exit_pr: emulation at 700 failed (00000000)
> Hm, not sure why that's happening.  At first I thought it was because
> we weren't updating SDR1 with the address of the new htab, but that's
> actually in there.  Maybe the KVM PR code isn't rereading it after
> initial VM startup.

The KVM PR code doesn't care - it just rereads SDR1 on every pteg lookup 
;). There's no caching at all.

Of course, the guest needs to invalidate all pending tlb entries if 
they're now invalid.

Does this work on real hardware? Say, a G5?


Alex


* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-28 21:04     ` Alexander Graf
@ 2016-01-28 22:09       ` Paul Mackerras
  2016-01-29  2:47       ` David Gibson
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2016-01-28 22:09 UTC (permalink / raw)
  To: Alexander Graf
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, Bharata B Rao, David Gibson

On Thu, Jan 28, 2016 at 10:04:58PM +0100, Alexander Graf wrote:
> 
> Does this work on real hardware? Say, a G5?

Do you mean, could a bare-metal kernel change its hashed page table?
It could - it would have to allocate a new table, copy over the bolted
mappings (at least), switch to real mode, change SDR1, and switch back
to virtual mode.
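
The sequence above can be sketched as a toy model in C. Everything here
is hypothetical: the toy_* names, the flag values, and the fact that each
entry stores its full hash (real HPTEs do not, so a real implementation
would have to recover it). The real-mode switch and the SDR1 write cannot
be shown in portable C and appear only as a comment where the table
pointer is swapped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PTEG_SHIFT     7     /* 128-byte PTEGs, as on the hash MMU */
#define HPTES_PER_PTEG 8
#define TOY_VALID      0x1u
#define TOY_BOLTED     0x2u

struct toy_hpte { uint64_t hash; uint32_t flags; };
struct toy_hpt  { unsigned shift; struct toy_hpte *slots; };

/* Number of entries in a table of 2^shift bytes. */
static size_t toy_nslots(unsigned shift)
{
    return ((size_t)1 << (shift - PTEG_SHIFT)) * HPTES_PER_PTEG;
}

/* Step 1: allocate a new, empty table. */
static int toy_hpt_init(struct toy_hpt *t, unsigned shift)
{
    assert(shift >= PTEG_SHIFT && shift < 32);
    t->shift = shift;
    t->slots = calloc(toy_nslots(shift), sizeof(*t->slots));
    return t->slots ? 0 : -1;
}

/* First entry of the PTEG that a hash selects in this table. */
static struct toy_hpte *toy_pteg(const struct toy_hpt *t, uint64_t hash)
{
    uint64_t npteg = 1ULL << (t->shift - PTEG_SHIFT);

    return &t->slots[(hash & (npteg - 1)) * HPTES_PER_PTEG];
}

static int toy_hpt_insert(struct toy_hpt *t, uint64_t hash, uint32_t flags)
{
    struct toy_hpte *pteg = toy_pteg(t, hash);

    for (int i = 0; i < HPTES_PER_PTEG; i++) {
        if (!(pteg[i].flags & TOY_VALID)) {
            pteg[i].hash = hash;
            pteg[i].flags = flags | TOY_VALID;
            return 0;
        }
    }
    return -1; /* PTEG full */
}

static int toy_hpt_find(const struct toy_hpt *t, uint64_t hash)
{
    const struct toy_hpte *pteg = toy_pteg(t, hash);

    for (int i = 0; i < HPTES_PER_PTEG; i++) {
        if ((pteg[i].flags & TOY_VALID) && pteg[i].hash == hash)
            return 1;
    }
    return 0;
}

static int toy_hpt_resize(struct toy_hpt *t, unsigned new_shift)
{
    struct toy_hpt fresh;

    if (toy_hpt_init(&fresh, new_shift) < 0)
        return -1;

    /* Step 2: copy over (at least) the bolted mappings. */
    for (size_t i = 0; i < toy_nslots(t->shift); i++) {
        const struct toy_hpte *e = &t->slots[i];

        if ((e->flags & TOY_VALID) && (e->flags & TOY_BOLTED) &&
            toy_hpt_insert(&fresh, e->hash, e->flags) < 0) {
            free(fresh.slots);
            return -1; /* old table is left untouched */
        }
    }

    /* Steps 3-5 on bare metal: real mode, write SDR1, back to virtual mode. */
    free(t->slots);
    *t = fresh;
    return 0;
}
```

In this model non-bolted entries are simply dropped and would be
refaulted later; a failed resize leaves the old table in place.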

Paul.


* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-28 21:04     ` Alexander Graf
  2016-01-28 22:09       ` Paul Mackerras
@ 2016-01-29  2:47       ` David Gibson
  2016-01-29  6:18         ` Alexander Graf
  1 sibling, 1 reply; 12+ messages in thread
From: David Gibson @ 2016-01-29  2:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, Bharata B Rao, paulus

On Thu, Jan 28, 2016 at 10:04:58PM +0100, Alexander Graf wrote:
> 
> 
> On 01/19/2016 12:02 PM, David Gibson wrote:
> >On Tue, Jan 19, 2016 at 01:18:17PM +0530, Bharata B Rao wrote:
> >>On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
> >>>Here is a draft qemu implementation of my proposed PAPR extension for
> >>>allowing runtime resizing of a KVM/ppc64 guest's hash page table.
> >>>That in turn will allow for more flexible memory hotplug.
> >>>
> >>>This should work with the guest kernel side patches I also posted
> >>>recently [1].
> >>>
> >>>Still required to make this into a full implementation:
> >>>   * Guest needs to auto-resize HPT on memory hotplug events
> >>>
> >>>   * qemu needs to allocate HPT size based on current rather than
> >>>     maximum memory if the guest is HPT resize aware
> >>>
> >>>   * KVM host side implementation
> >>>
> >>>   * PAPR standardization
> >>So with the current patchset (QEMU and guest kernel changes), I should
> >>be able to change the HTAB size of a PR guest right ? I see the below
> >>failure though:
> >Uh.. to be honest I haven't really considered the KVM case at all.
> >I'm kind of surprised it didn't just refuse to do anything.
> >
> >>[root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size
> >>24
> >>[root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
> >>[   65.996845] lpar: Attempting to resize HPT to shift 26
> >>[   65.996845] lpar: Attempting to resize HPT to shift 26
> >>[   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> >>[   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> >>
> >>PR guest just hangs here while I see tons of below messages in
> >>the 1st level guest:
> >>
> >>KVM can't copy data from 0x3fff99e91400!
> >>...
> >>Couldn't emulate instruction 0x00000000 (op 0 xop 0)
> >>kvmppc_handle_exit_pr: emulation at 700 failed (00000000)
> >Hm, not sure why that's happening.  At first I thought it was because
> >we weren't updating SDR1 with the address of the new htab, but that's
> >actually in there.  Maybe the KVM PR code isn't rereading it after
> >initial VM startup.
> 
> The KVM PR code doesn't care - it just rereads SDR1 on every pteg lookup ;).
> There's no caching at all.

Ok, no idea why it's not working then.  I'll investigate when I get a chance.

> Of course, the guest needs to invalidate all pending tlb entries if they're
> now invalid.
> 
> Does this work on real hardware? Say, a G5?

As Paulus says it would be possible to do HPT resizing on real
hardware, but the implementation I've done is specific to PAPR.  And
obviously qemu wouldn't be relevant to that case.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-29  2:47       ` David Gibson
@ 2016-01-29  6:18         ` Alexander Graf
  2016-01-29 23:11           ` David Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Graf @ 2016-01-29  6:18 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, qemu-devel, qemu-ppc, Bharata B Rao, paulus



> On 29.01.2016 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
> 
>> On Thu, Jan 28, 2016 at 10:04:58PM +0100, Alexander Graf wrote:
>> 
>> 
>>> On 01/19/2016 12:02 PM, David Gibson wrote:
>>>> On Tue, Jan 19, 2016 at 01:18:17PM +0530, Bharata B Rao wrote:
>>>>> On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
>>>>> Here is a draft qemu implementation of my proposed PAPR extension for
>>>>> allowing runtime resizing of a KVM/ppc64 guest's hash page table.
>>>>> That in turn will allow for more flexible memory hotplug.
>>>>> 
>>>>> This should work with the guest kernel side patches I also posted
>>>>> recently [1].
>>>>> 
>>>>> Still required to make this into a full implementation:
>>>>>  * Guest needs to auto-resize HPT on memory hotplug events
>>>>> 
>>>>>  * qemu needs to allocate HPT size based on current rather than
>>>>>    maximum memory if the guest is HPT resize aware
>>>>> 
>>>>>  * KVM host side implementation
>>>>> 
>>>>>  * PAPR standardization
>>>> So with the current patchset (QEMU and guest kernel changes), I should
>>>> be able to change the HTAB size of a PR guest right ? I see the below
>>>> failure though:
>>> Uh.. to be honest I haven't really considered the KVM case at all.
>>> I'm kind of surprised it didn't just refuse to do anything.
>>> 
>>>> [root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size
>>>> 24
>>>> [root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
>>>> [   65.996845] lpar: Attempting to resize HPT to shift 26
>>>> [   65.996845] lpar: Attempting to resize HPT to shift 26
>>>> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
>>>> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
>>>> 
>>>> PR guest just hangs here while I see tons of below messages in
>>>> the 1st level guest:
>>>> 
>>>> KVM can't copy data from 0x3fff99e91400!
>>>> ...
>>>> Couldn't emulate instruction 0x00000000 (op 0 xop 0)
>>>> kvmppc_handle_exit_pr: emulation at 700 failed (00000000)
>>> Hm, not sure why that's happening.  At first I thought it was because
>>> we weren't updating SDR1 with the address of the new htab, but that's
>>> actually in there.  Maybe the KVM PR code isn't rereading it after
>>> initial VM startup.
>> 
>> The KVM PR code doesn't care - it just rereads SDR1 on every pteg lookup ;).
>> There's no caching at all.
> 
> Ok, no idea why it's not working then.  I'll investigate when I get a chance.
> 
>> Of course, the guest needs to invalidate all pending tlb entries if they're
>> now invalid.
>> 
>> Does this work on real hardware? Say, a G5?
> 
> As Paulus says it would be possible to do HPT resizing on real
> hardware, but the implementation I've done is specific to PAPR.  And
> obviously qemu wouldn't be relevant to that case.

So why make it specific to PAPR? Wouldn't it make sense to have it as a
(ppc) generic interface in Linux?

For the PR PAPR case, QEMU allocates the HTAB, so it needs to make sure
it pushes the changed address into KVM as the new fake SDR1 value
whenever it changes.


Alex


* Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)
  2016-01-29  6:18         ` Alexander Graf
@ 2016-01-29 23:11           ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2016-01-29 23:11 UTC (permalink / raw)
  To: Alexander Graf
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, Bharata B Rao, paulus

On Fri, Jan 29, 2016 at 08:18:39AM +0200, Alexander Graf wrote:
> 
> 
> > On 29.01.2016 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> >> On Thu, Jan 28, 2016 at 10:04:58PM +0100, Alexander Graf wrote:
> >> 
> >> 
> >>> On 01/19/2016 12:02 PM, David Gibson wrote:
> >>>> On Tue, Jan 19, 2016 at 01:18:17PM +0530, Bharata B Rao wrote:
> >>>>> On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
> >>>>> Here is a draft qemu implementation of my proposed PAPR extension for
> >>>>> allowing runtime resizing of a KVM/ppc64 guest's hash page table.
> >>>>> That in turn will allow for more flexible memory hotplug.
> >>>>> 
> >>>>> This should work with the guest kernel side patches I also posted
> >>>>> recently [1].
> >>>>> 
> >>>>> Still required to make this into a full implementation:
> >>>>>  * Guest needs to auto-resize HPT on memory hotplug events
> >>>>> 
> >>>>>  * qemu needs to allocate HPT size based on current rather than
> >>>>>    maximum memory if the guest is HPT resize aware
> >>>>> 
> >>>>>  * KVM host side implementation
> >>>>> 
> >>>>>  * PAPR standardization
> >>>> So with the current patchset (QEMU and guest kernel changes), I should
> >>>> be able to change the HTAB size of a PR guest right ? I see the below
> >>>> failure though:
> >>> Uh.. to be honest I haven't really considered the KVM case at all.
> >>> I'm kind of surprised it didn't just refuse to do anything.
> >>> 
> >>>> [root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size
> >>>> 24
> >>>> [root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
> >>>> [   65.996845] lpar: Attempting to resize HPT to shift 26
> >>>> [   65.996845] lpar: Attempting to resize HPT to shift 26
> >>>> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> >>>> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> >>>> 
> >>>> PR guest just hangs here while I see tons of below messages in
> >>>> the 1st level guest:
> >>>> 
> >>>> KVM can't copy data from 0x3fff99e91400!
> >>>> ...
> >>>> Couldn't emulate instruction 0x00000000 (op 0 xop 0)
> >>>> kvmppc_handle_exit_pr: emulation at 700 failed (00000000)
> >>> Hm, not sure why that's happening.  At first I thought it was because
> >>> we weren't updating SDR1 with the address of the new htab, but that's
> >>> actually in there.  Maybe the KVM PR code isn't rereading it after
> >>> initial VM startup.
> >> 
> >> The KVM PR code doesn't care - it just rereads SDR1 on every pteg lookup ;).
> >> There's no caching at all.
> > 
> > Ok, no idea why it's not working then.  I'll investigate when I get a chance.
> > 
> >> Of course, the guest needs to invalidate all pending tlb entries if they're
> >> now invalid.
> >> 
> >> Does this work on real hardware? Say, a G5?
> > 
> > As Paulus says it would be possible to do HPT resizing on real
> > hardware, but the implementation I've done is specific to PAPR.  And
> > obviously qemu wouldn't be relevant to that case.
> 
> So why make it specific to papr? Wouldn't it make sense to have it
> as a (ppc) generic interface in Linux?

Well, I sort of did, in that I added a ppc_md call for it.  I just
haven't implemented it for anything other than PAPR yet - the PAPR
implementation is quite different from what the native one would be,
since the hypervisor needs to handle the rehashing.
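
As a rough illustration of that split, here is a minimal sketch in C of
a generic entry point dispatching through a per-platform hook. All names
(toy_machdep_calls, toy_resize_hpt, toy_papr_resize_hpt) are
hypothetical and do not reproduce the kernel patches; the PAPR stand-in
only records the requested shift where a real one would ask the
hypervisor to do the rehashing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-platform hook table; NULL means "not implemented". */
struct toy_machdep_calls {
    int (*resize_hpt)(unsigned long shift);
};

static unsigned long toy_papr_last_shift;

/* Stand-in for a PAPR backend: only records what was requested. */
static int toy_papr_resize_hpt(unsigned long shift)
{
    toy_papr_last_shift = shift;
    return 0;
}

/* Generic entry point: platforms without a hook report -ENODEV. */
static int toy_resize_hpt(const struct toy_machdep_calls *md,
                          unsigned long shift)
{
    assert(md != NULL);
    if (!md->resize_hpt)
        return -ENODEV;
    return md->resize_hpt(shift);
}
```

A native (bare-metal) backend would plug into the same hook but do the
table allocation and rehashing itself.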

> For the PR PAPR case, QEMU allocates the HTAB, so it needs to make
> sure it pushes the changed address as new fake SDR1 value into kvm
> when it changes.

Yes, I'm doing that - have a look at the qemu series.  Not 100% sure
it's correct, since I haven't debugged with PR KVM yet.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


