* [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer
@ 2017-06-08  7:03 Suraj Jitindar Singh
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG Suraj Jitindar Singh
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Suraj Jitindar Singh @ 2017-06-08  7:03 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, david, agraf, Suraj Jitindar Singh

The POWER9 processor introduces a new operating mode of the decrementer
called large decrementer mode.

If large decrementer mode is disabled then the decrementer behaves as
before, as a 32-bit decrementing register. If large decrementer mode is
enabled then the decrementer behaves as a d-bit decrementing register, the
value of which is sign extended to 64 bits (where d is implementation
dependent).

The hypervisor decrementer is now an h-bit decrementing register which is
always sign extended to 64 bits (where h is implementation dependent).
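
As a rough illustration of the sign extension described above, the following
standalone sketch widens a d-bit register value to 64 bits. The function name
decr_sign_extend is hypothetical and does not appear in this series; the
widths 32 and 56 are only example inputs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign extend the low `bits` bits of `value` to 64 bits.
 * `bits` is the decrementer width (d or h in the text), 1..64.
 */
static int64_t decr_sign_extend(uint64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);

    if (bits == 64) {
        return (int64_t)value;
    }

    uint64_t mask = (1ULL << bits) - 1;   /* keep only the low `bits` bits */
    uint64_t sign = 1ULL << (bits - 1);   /* most significant implemented bit */

    value &= mask;
    /* XOR then subtract propagates the sign bit into the upper bits */
    return (int64_t)((value ^ sign) - sign);
}
```

With a 56-bit width, for example, a register holding all ones in its low 56
bits reads back as -1 once extended, whatever the upper 8 bits contained.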

To use the large decrementer both the guest and the host must have support
for it. If the host has support then QEMU will advertise this to the guest
and tell the host that the guest is using the large decrementer. In TCG we
always advertise support and enable the large decrementer if we detect that
the guest will use it.

The large decrementer can be disabled on the command line to allow
migration between hosts with differing levels of support or differing
decrementer sizes.

This patch series is based on the branch dwg/ppc-for-2.10

Suraj Jitindar Singh (5):
  target/ppc: Implement large decrementer support for TCG
  target/ppc: Implement large decrementer support for KVM
  target/ppc: Implement migration support for large decrementer
  target/ppc: Enable the large decrementer for TCG and KVM guests
  target/ppc: Add cmd line option to disable the large decrementer

 hw/ppc/ppc.c                |  81 +++++++++++++++++++---------
 hw/ppc/spapr.c              | 128 ++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_hcall.c        |  36 +++++++++++++
 include/hw/ppc/spapr.h      |   2 +
 target/ppc/cpu-qom.h        |   1 +
 target/ppc/cpu.h            |   8 +--
 target/ppc/kvm.c            |  59 ++++++++++++++++++++
 target/ppc/kvm_ppc.h        |  25 +++++++++
 target/ppc/mmu-hash64.c     |   2 +-
 target/ppc/translate.c      |   2 +-
 target/ppc/translate_init.c |   3 ++
 11 files changed, 317 insertions(+), 30 deletions(-)

-- 
2.9.4


* [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG
  2017-06-08  7:03 [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer Suraj Jitindar Singh
@ 2017-06-08  7:03 ` Suraj Jitindar Singh
  2017-06-13  7:50   ` David Gibson
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 2/5] target/ppc: Implement large decrementer support for KVM Suraj Jitindar Singh
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Suraj Jitindar Singh @ 2017-06-08  7:03 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, david, agraf, Suraj Jitindar Singh

The large decrementer is an operating mode of the decrementer.
The decrementer is normally a 32-bit register.
When operating in large decrementer mode the decrementer is a d-bit
register which is sign extended to 64 bits (where d is implementation
dependent).

Implement support for a TCG guest to use the decrementer in large
decrementer mode. This means updating the decrementer access functions
to accept larger values under the correct conditions.

The operating mode of the decrementer is controlled by the LPCR_LD bit in
the logical partition control register (LPCR).

The operating mode of the hypervisor decrementer depends on the CPU
model: POWER9 and later use the large hypervisor decrementer.
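
The width selection described above could be sketched as follows. This is a
simplified standalone illustration, not the code in the diff below:
EXAMPLE_LPCR_LD is an illustrative stand-in for the real LPCR_LD definition,
and decr_width/hdecr_width are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the LPCR large decrementer bit (assumption) */
#define EXAMPLE_LPCR_LD (1ULL << 17)

/* Guest decrementer width: large only while the LD bit is set in the LPCR */
static unsigned decr_width(uint64_t lpcr, unsigned large_decr_bits)
{
    return (lpcr & EXAMPLE_LPCR_LD) ? large_decr_bits : 32;
}

/* Hypervisor decrementer width: fixed by the CPU generation alone */
static unsigned hdecr_width(bool is_power9_or_later, unsigned large_decr_bits)
{
    return is_power9_or_later ? large_decr_bits : 32;
}
```

The point of the split is that the guest decrementer can change width at run
time when the LPCR is written, while the hypervisor decrementer cannot.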

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 hw/ppc/ppc.c                | 81 +++++++++++++++++++++++++++++++--------------
 target/ppc/cpu-qom.h        |  1 +
 target/ppc/cpu.h            |  8 ++---
 target/ppc/mmu-hash64.c     |  2 +-
 target/ppc/translate.c      |  2 +-
 target/ppc/translate_init.c |  3 ++
 6 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 224184d..49c52ed 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -649,11 +649,11 @@ bool ppc_decr_clear_on_delivery(CPUPPCState *env)
     return ((tb_env->flags & flags) == PPC_DECR_UNDERFLOW_TRIGGERED);
 }
 
-static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
+static inline target_ulong _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next,
+                                              bool large_decr)
 {
     ppc_tb_t *tb_env = env->tb_env;
-    uint32_t decr;
-    int64_t diff;
+    int64_t decr, diff;
 
     diff = next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
     if (diff >= 0) {
@@ -663,12 +663,16 @@ static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
     }  else {
         decr = -muldiv64(-diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
     }
-    LOG_TB("%s: %08" PRIx32 "\n", __func__, decr);
+    LOG_TB("%s: %016" PRIx64 "\n", __func__, decr);
 
-    return decr;
+    /*
+     * If large decrementer is enabled then the decrementer is sign extended
+     * to 64 bits, otherwise it is a 32 bit value.
+     */
+    return large_decr ? decr : (uint32_t) decr;
 }
 
-uint32_t cpu_ppc_load_decr (CPUPPCState *env)
+target_ulong cpu_ppc_load_decr (CPUPPCState *env)
 {
     ppc_tb_t *tb_env = env->tb_env;
 
@@ -676,14 +680,16 @@ uint32_t cpu_ppc_load_decr (CPUPPCState *env)
         return env->spr[SPR_DECR];
     }
 
-    return _cpu_ppc_load_decr(env, tb_env->decr_next);
+    return _cpu_ppc_load_decr(env, tb_env->decr_next,
+                              env->spr[SPR_LPCR] & LPCR_LD);
 }
 
-uint32_t cpu_ppc_load_hdecr (CPUPPCState *env)
+target_ulong cpu_ppc_load_hdecr (CPUPPCState *env)
 {
     ppc_tb_t *tb_env = env->tb_env;
 
-    return _cpu_ppc_load_decr(env, tb_env->hdecr_next);
+    return _cpu_ppc_load_decr(env, tb_env->hdecr_next,
+                              env->mmu_model & POWERPC_MMU_V3);
 }
 
 uint64_t cpu_ppc_load_purr (CPUPPCState *env)
@@ -737,13 +743,20 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
                                  QEMUTimer *timer,
                                  void (*raise_excp)(void *),
                                  void (*lower_excp)(PowerPCCPU *),
-                                 uint32_t decr, uint32_t value)
+                                 target_ulong decr, target_ulong value,
+                                 int decr_bits)
 {
     CPUPPCState *env = &cpu->env;
     ppc_tb_t *tb_env = env->tb_env;
     uint64_t now, next;
 
-    LOG_TB("%s: %08" PRIx32 " => %08" PRIx32 "\n", __func__,
+    /* Truncate value to decr_bits and sign extend for simplicity */
+    value &= ((1ULL << decr_bits) - 1);
+    if (value & (1ULL << (decr_bits - 1))) { /* Negative */
+        value |= (0xFFFFFFFFULL << decr_bits);
+    }
+
+    LOG_TB("%s: " TARGET_FMT_lx " => " TARGET_FMT_lx "\n", __func__,
                 decr, value);
 
     if (kvm_enabled()) {
@@ -765,15 +778,16 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
      * an edge interrupt, so raise it here too.
      */
     if ((value < 3) ||
-        ((tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL) && (value & 0x80000000)) ||
-        ((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) && (value & 0x80000000)
-          && !(decr & 0x80000000))) {
+        ((tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL) && (value & (1ULL << (decr_bits - 1)))) ||
+        ((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) && (value & (1ULL << (decr_bits - 1)))
+          && !(decr & (1ULL << (decr_bits - 1))))) {
         (*raise_excp)(cpu);
         return;
     }
 
     /* On MSB level based systems a 0 for the MSB stops interrupt delivery */
-    if (!(value & 0x80000000) && (tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL)) {
+    if (!(value & (1ULL << (decr_bits - 1))) && (tb_env->flags &
+                                         PPC_DECR_UNDERFLOW_LEVEL)) {
         (*lower_excp)(cpu);
     }
 
@@ -786,17 +800,24 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
     timer_mod(timer, next);
 }
 
-static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, uint32_t decr,
-                                       uint32_t value)
+static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, target_ulong decr,
+                                       target_ulong value)
 {
     ppc_tb_t *tb_env = cpu->env.tb_env;
+    int bits = 32;
+
+    if (cpu->env.spr[SPR_LPCR] & LPCR_LD) {
+        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+
+        bits = pcc->large_decr_bits;
+    }
 
     __cpu_ppc_store_decr(cpu, &tb_env->decr_next, tb_env->decr_timer,
                          tb_env->decr_timer->cb, &cpu_ppc_decr_lower, decr,
-                         value);
+                         value, bits);
 }
 
-void cpu_ppc_store_decr (CPUPPCState *env, uint32_t value)
+void cpu_ppc_store_decr (CPUPPCState *env, target_ulong value)
 {
     PowerPCCPU *cpu = ppc_env_get_cpu(env);
 
@@ -810,19 +831,26 @@ static void cpu_ppc_decr_cb(void *opaque)
     cpu_ppc_decr_excp(cpu);
 }
 
-static inline void _cpu_ppc_store_hdecr(PowerPCCPU *cpu, uint32_t hdecr,
-                                        uint32_t value)
+static inline void _cpu_ppc_store_hdecr(PowerPCCPU *cpu, target_ulong hdecr,
+                                        target_ulong value)
 {
     ppc_tb_t *tb_env = cpu->env.tb_env;
+    int bits = 32;
+
+    if (cpu->env.mmu_model & POWERPC_MMU_V3) {
+        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+
+        bits = pcc->large_decr_bits;
+    }
 
     if (tb_env->hdecr_timer != NULL) {
         __cpu_ppc_store_decr(cpu, &tb_env->hdecr_next, tb_env->hdecr_timer,
                              tb_env->hdecr_timer->cb, &cpu_ppc_hdecr_lower,
-                             hdecr, value);
+                             hdecr, value, bits);
     }
 }
 
-void cpu_ppc_store_hdecr (CPUPPCState *env, uint32_t value)
+void cpu_ppc_store_hdecr (CPUPPCState *env, target_ulong value)
 {
     PowerPCCPU *cpu = ppc_env_get_cpu(env);
 
@@ -848,7 +876,9 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
 {
     CPUPPCState *env = opaque;
     PowerPCCPU *cpu = ppc_env_get_cpu(env);
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
     ppc_tb_t *tb_env = env->tb_env;
+    int decr_bits = 32;
 
     tb_env->tb_freq = freq;
     tb_env->decr_freq = freq;
@@ -857,7 +887,10 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
      * it's not ready to handle it...
      */
     _cpu_ppc_store_decr(cpu, 0xFFFFFFFF, 0xFFFFFFFF);
-    _cpu_ppc_store_hdecr(cpu, 0xFFFFFFFF, 0xFFFFFFFF);
+    if (env->mmu_model & POWERPC_MMU_V3) {
+        decr_bits = pcc->large_decr_bits;
+    }
+    _cpu_ppc_store_hdecr(cpu, (1ULL << decr_bits) - 1, (1ULL << decr_bits) - 1);
     cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
 }
 
diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index d0cf6ca..523979c 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -198,6 +198,7 @@ typedef struct PowerPCCPUClass {
     uint32_t l1_dcache_size, l1_icache_size;
     const struct ppc_segment_page_sizes *sps;
     struct ppc_radix_page_info *radix_page_info;
+    uint32_t large_decr_bits;
     void (*init_proc)(CPUPPCState *env);
     int  (*check_pow)(CPUPPCState *env);
     int (*handle_mmu_fault)(PowerPCCPU *cpu, vaddr eaddr, int rwx, int mmu_idx);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 401e10e..f6e86b6 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1309,10 +1309,10 @@ uint32_t cpu_ppc_load_atbu (CPUPPCState *env);
 void cpu_ppc_store_atbl (CPUPPCState *env, uint32_t value);
 void cpu_ppc_store_atbu (CPUPPCState *env, uint32_t value);
 bool ppc_decr_clear_on_delivery(CPUPPCState *env);
-uint32_t cpu_ppc_load_decr (CPUPPCState *env);
-void cpu_ppc_store_decr (CPUPPCState *env, uint32_t value);
-uint32_t cpu_ppc_load_hdecr (CPUPPCState *env);
-void cpu_ppc_store_hdecr (CPUPPCState *env, uint32_t value);
+target_ulong cpu_ppc_load_decr (CPUPPCState *env);
+void cpu_ppc_store_decr (CPUPPCState *env, target_ulong value);
+target_ulong cpu_ppc_load_hdecr (CPUPPCState *env);
+void cpu_ppc_store_hdecr (CPUPPCState *env, target_ulong value);
 uint64_t cpu_ppc_load_purr (CPUPPCState *env);
 uint32_t cpu_ppc601_load_rtcl (CPUPPCState *env);
 uint32_t cpu_ppc601_load_rtcu (CPUPPCState *env);
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 14d34e5..b1e1764 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -1081,7 +1081,7 @@ void helper_store_lpcr(CPUPPCState *env, target_ulong val)
     case POWERPC_MMU_VER_3_00: /* P9 */
         lpcr = val & (LPCR_VPM1 | LPCR_ISL | LPCR_KBV | LPCR_DPFD |
                       (LPCR_PECE_U_MASK & LPCR_HVEE) | LPCR_ILE | LPCR_AIL |
-                      LPCR_UPRT | LPCR_EVIRT | LPCR_ONL |
+                      LPCR_UPRT | LPCR_EVIRT | LPCR_ONL | LPCR_LD |
                       (LPCR_PECE_L_MASK & (LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
                       LPCR_DEE | LPCR_OEE)) | LPCR_MER | LPCR_GTSE | LPCR_TC |
                       LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE | LPCR_HDICE);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c0cd64d..ebe1fa5 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7006,7 +7006,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
 #if !defined(NO_TIMER_DUMP)
     cpu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
 #if !defined(CONFIG_USER_ONLY)
-                " DECR %08" PRIu32
+                " DECR " TARGET_FMT_lu
 #endif
                 "\n",
                 cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 56a0ab2..a0b2934 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8995,6 +8995,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
     /* segment page size remain the same */
     pcc->sps = &POWER7_POWER8_sps;
     pcc->radix_page_info = &POWER9_radix_page_info;
+    pcc->large_decr_bits = 56;
 #endif
     pcc->excp_model = POWERPC_EXCP_POWER8;
     pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
@@ -9047,6 +9048,8 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
          * tables and guest translation shootdown by default
          */
         lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
+        /* Disable Large Decrementer by Default */
+        lpcr->default_value &= ~LPCR_LD;
         lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
                                LPCR_OEE;
         break;
-- 
2.9.4


* [Qemu-devel] [PATCH 2/5] target/ppc: Implement large decrementer support for KVM
  2017-06-08  7:03 [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer Suraj Jitindar Singh
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG Suraj Jitindar Singh
@ 2017-06-08  7:03 ` Suraj Jitindar Singh
  2017-06-13  8:15   ` David Gibson
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 3/5] target/ppc: Implement migration support for large decrementer Suraj Jitindar Singh
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Suraj Jitindar Singh @ 2017-06-08  7:03 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, david, agraf, Suraj Jitindar Singh

The large decrementer is an operating mode of the decrementer.
The decrementer is normally a 32-bit register.
When operating in large decrementer mode the decrementer is a d-bit
register which is sign extended to 64 bits (where d is implementation
dependent).

Implement support for a KVM guest to use the decrementer in large
decrementer mode. This means adding functions to query the large
decrementer support of the hypervisor and to enable the large
decrementer with the hypervisor.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 target/ppc/kvm.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 target/ppc/kvm_ppc.h | 25 ++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 51249ce..b2c94a0 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -88,6 +88,7 @@ static int cap_fixup_hcalls;
 static int cap_htm;             /* Hardware transactional memory support */
 static int cap_mmu_radix;
 static int cap_mmu_hash_v3;
+static int cap_large_decr;
 
 static uint32_t debug_inst_opcode;
 
@@ -393,6 +394,19 @@ target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
     }
 }
 
+void kvmppc_configure_large_decrementer(CPUState *cs, bool enable_ld)
+{
+    uint64_t lpcr;
+
+    kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+    if (enable_ld) {
+        lpcr |= LPCR_LD;
+    } else {
+        lpcr &= ~LPCR_LD;
+    }
+    kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+}
+
 static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
 {
     if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) {
@@ -2004,6 +2018,11 @@ uint32_t kvmppc_get_dfp(void)
     return kvmppc_read_int_cpu_dt("ibm,dfp");
 }
 
+uint32_t kvmppc_get_dec_bits(void)
+{
+    return kvmppc_read_int_cpu_dt("ibm,dec-bits");
+}
+
 static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
  {
      PowerPCCPU *cpu = ppc_env_get_cpu(env);
@@ -2380,6 +2399,7 @@ static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
 
 #if defined(TARGET_PPC64)
     pcc->radix_page_info = kvm_get_radix_page_info();
+    pcc->large_decr_bits = kvmppc_get_dec_bits();
 
     if ((pcc->pvr & 0xffffff00) == CPU_POWERPC_POWER9_DD1) {
         /*
@@ -2424,6 +2444,45 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
     return cap_mmu_hash_v3;
 }
 
+void kvmppc_check_cap_large_decr(void)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+    CPUState *cs = CPU(cpu);
+    bool large_dec_support;
+    uint32_t dec_bits;
+    uint64_t lpcr;
+
+    /*
+     * Try and set the LPCR_LD (large decrementer) bit to enable the large
+     * decrementer. A hypervisor with large decrementer capabilities will allow
+     * this so the value read back after this will have the LPCR_LD bit set.
+     * Otherwise the bit will be cleared meaning we can't use the large
+     * decrementer.
+     */
+    kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+    lpcr |= LPCR_LD;
+    kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+    kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+    large_dec_support = !!(lpcr & LPCR_LD);
+    /* Probably a good idea to clear it again */
+    lpcr &= ~LPCR_LD;
+    kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+
+    /*
+     * Check for the ibm,dec-bits property on the host, if it isn't there then
+     * something has gone wrong and we're better off not letting the guest use
+     * the large decrementer.
+     */
+    dec_bits = kvmppc_get_dec_bits();
+
+    cap_large_decr = large_dec_support && dec_bits;
+}
+
+bool kvmppc_has_cap_large_decr(void)
+{
+    return cap_large_decr;
+}
+
 PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
 {
     uint32_t host_pvr = mfpvr();
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index f48243d..c49c7f0 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+uint32_t kvmppc_get_dec_bits(void);
 bool kvmppc_get_host_model(char **buf);
 bool kvmppc_get_host_serial(char **buf);
 int kvmppc_get_hasidle(CPUPPCState *env);
@@ -36,6 +37,8 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
 target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
                                      bool radix, bool gtse,
                                      uint64_t proc_tbl);
+void kvmppc_check_cap_large_decr(void);
+void kvmppc_configure_large_decrementer(CPUState *cs, bool enable_ld);
 #ifndef CONFIG_USER_ONLY
 off_t kvmppc_alloc_rma(void **rma);
 bool kvmppc_spapr_use_multitce(void);
@@ -60,6 +63,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
 bool kvmppc_has_cap_htm(void);
 bool kvmppc_has_cap_mmu_radix(void);
 bool kvmppc_has_cap_mmu_hash_v3(void);
+bool kvmppc_has_cap_large_decr(void);
 int kvmppc_enable_hwrng(void);
 int kvmppc_put_books_sregs(PowerPCCPU *cpu);
 PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void);
@@ -98,6 +102,11 @@ static inline uint32_t kvmppc_get_dfp(void)
     return 0;
 }
 
+static inline uint32_t kvmppc_get_dec_bits(void)
+{
+    return 0;
+}
+
 static inline int kvmppc_get_hasidle(CPUPPCState *env)
 {
     return 0;
@@ -170,6 +179,17 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
     return 0;
 }
 
+static inline void kvmppc_check_cap_large_decr(void)
+{
+    return;
+}
+
+static inline void kvmppc_configure_large_decrementer(CPUState *cs,
+                                                      bool enable_ld)
+{
+    return;
+}
+
 #ifndef CONFIG_USER_ONLY
 static inline off_t kvmppc_alloc_rma(void **rma)
 {
@@ -282,6 +302,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
     return false;
 }
 
+static inline bool kvmppc_has_cap_large_decr(void)
+{
+    return false;
+}
+
 static inline int kvmppc_enable_hwrng(void)
 {
     return -1;
-- 
2.9.4


* [Qemu-devel] [PATCH 3/5] target/ppc: Implement migration support for large decrementer
  2017-06-08  7:03 [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer Suraj Jitindar Singh
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG Suraj Jitindar Singh
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 2/5] target/ppc: Implement large decrementer support for KVM Suraj Jitindar Singh
@ 2017-06-08  7:03 ` Suraj Jitindar Singh
  2017-06-13  8:20   ` David Gibson
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 4/5] target/ppc: Enable the large decrementer for TCG and KVM guests Suraj Jitindar Singh
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 5/5] target/ppc: Add cmd line option to disable the large decrementer Suraj Jitindar Singh
  4 siblings, 1 reply; 9+ messages in thread
From: Suraj Jitindar Singh @ 2017-06-08  7:03 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, david, agraf, Suraj Jitindar Singh

Implement support to migrate a guest which is using the large
decrementer.

We need to save the decrementer width to the migration stream.
On incoming migration we then need to check that the hypervisor is
capable of letting the guest use the large decrementer and that the
decrementer width is the same on the receiving side. Since there is no
way to tell the guest when the width of the decrementer changes we have
to terminate if the decrementer width is not what the guest expects.
If we can use the large decrementer then we have to tell the hypervisor
to enable it.
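
The incoming-side checks described above amount to something like the
following standalone sketch. The enum and the function check_incoming_decr
are hypothetical names used only for illustration; a guest width of 0 is
taken to mean that the guest is not using the large decrementer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for the incoming migration check */
enum decr_migrate_result {
    DECR_OK,               /* safe to continue the migration */
    DECR_NO_HOST_SUPPORT,  /* guest needs the large decrementer, host lacks it */
    DECR_WIDTH_MISMATCH    /* host width differs from what the guest expects */
};

static enum decr_migrate_result
check_incoming_decr(uint32_t guest_bits, bool host_has_large_decr,
                    uint32_t host_bits)
{
    if (guest_bits == 0) {
        return DECR_OK;    /* guest uses the plain 32-bit decrementer */
    }
    if (!host_has_large_decr) {
        return DECR_NO_HOST_SUPPORT;
    }
    if (guest_bits != host_bits) {
        return DECR_WIDTH_MISMATCH;
    }
    return DECR_OK;
}
```

Either failure has to abort the incoming migration, because the guest cannot
be told that the decrementer width has changed underneath it.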

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 hw/ppc/spapr.c         | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 64 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5d10366..6ba869a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1420,6 +1420,45 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
     }
 }
 
+static int spapr_import_large_decr_bits(sPAPRMachineState *spapr)
+{
+    /*
+     * If the guest uses the large decrementer then this hypervisor must also
+     * support it and have the exact same width. We must also enable the large
+     * decrementer because we have no way to tell the guest to stop using it.
+     */
+    if (spapr->large_decr_bits) {
+        uint32_t dec_bits = 32;
+        PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+        CPUState *cs = CPU(cpu);
+        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
+
+        if (kvm_enabled()) {
+            if (!kvmppc_has_cap_large_decr()) {
+                error_report("Host doesn't support large decrementer and guest requires it");
+                return -EINVAL;
+            }
+            dec_bits = kvmppc_get_dec_bits();
+        } else {
+            dec_bits = pcc->large_decr_bits;
+        }
+
+        if (spapr->large_decr_bits != dec_bits) {
+            error_report("Host large decrementer size [%u] doesn't match what guest expects [%u]",
+                         dec_bits, spapr->large_decr_bits);
+            return -EINVAL;
+        }
+
+        if (kvm_enabled()) {
+            CPU_FOREACH(cs) {
+                kvmppc_configure_large_decrementer(cs, true);
+            }
+        }
+    }
+
+    return 0;
+}
+
 static int spapr_post_load(void *opaque, int version_id)
 {
     sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
@@ -1439,8 +1478,13 @@ static int spapr_post_load(void *opaque, int version_id)
      * value into the RTC device */
     if (version_id < 3) {
         err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset);
+        if (err) {
+            return err;
+        }
     }
 
+    err = spapr_import_large_decr_bits(spapr);
+
     return err;
 }
 
@@ -1529,6 +1573,24 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
     },
 };
 
+static bool spapr_large_decr_entry_needed(void *opaque)
+{
+    sPAPRMachineState *spapr = opaque;
+
+    return !!spapr->large_decr_bits;
+}
+
+static const VMStateDescription vmstate_spapr_large_decr_entry = {
+    .name = "spapr_large_decr_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_large_decr_entry_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(large_decr_bits, sPAPRMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -1547,6 +1609,7 @@ static const VMStateDescription vmstate_spapr = {
     .subsections = (const VMStateDescription*[]) {
         &vmstate_spapr_ov5_cas,
         &vmstate_spapr_patb_entry,
+        &vmstate_spapr_large_decr_entry,
         NULL
     }
 };
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 98fb78b..4ba9b89 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -91,6 +91,7 @@ struct sPAPRMachineState {
     sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
     bool cas_reboot;
     bool cas_legacy_guest_workaround;
+    uint32_t large_decr_bits; /* Large decrementer width (0 -> not in use) */
 
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
-- 
2.9.4


* [Qemu-devel] [PATCH 4/5] target/ppc: Enable the large decrementer for TCG and KVM guests
  2017-06-08  7:03 [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer Suraj Jitindar Singh
                   ` (2 preceding siblings ...)
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 3/5] target/ppc: Implement migration support for large decrementer Suraj Jitindar Singh
@ 2017-06-08  7:03 ` Suraj Jitindar Singh
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 5/5] target/ppc: Add cmd line option to disable the large decrementer Suraj Jitindar Singh
  4 siblings, 0 replies; 9+ messages in thread
From: Suraj Jitindar Singh @ 2017-06-08  7:03 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, david, agraf, Suraj Jitindar Singh

Let the guest use the large decrementer.

We have support for TCG and KVM guests to use the large decrementer and
to migrate guests using the large decrementer, so add the final bits to
indicate this capability to the guest.

The guest will use the large decrementer if the CPU model is >= POWER9
and the ibm,dec-bits device-tree property of the CPU node is present.
Add the ibm,dec-bits property to the device-tree when the hypervisor can
support it. After CAS, enable the large decrementer if the guest is going
to use it; this means setting the LPCR_LD bit.
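
The condition for adding the ibm,dec-bits property can be summarised by the
standalone sketch below. The function advertise_dec_bits and its parameters
are hypothetical; they stand in for the CPU class width and the KVM
capability query used in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether to expose ibm,dec-bits to the guest.
 * cpu_large_decr_bits: width from the CPU model, 0 if it has no large
 *                      decrementer.
 * using_kvm:           true for a KVM guest, false for TCG.
 * kvm_has_large_decr:  result of the host capability probe (KVM only).
 */
static bool advertise_dec_bits(uint32_t cpu_large_decr_bits, bool using_kvm,
                               bool kvm_has_large_decr)
{
    if (cpu_large_decr_bits == 0) {
        return false;      /* CPU model has no large decrementer at all */
    }
    /* TCG always supports it; KVM depends on the hypervisor */
    return !using_kvm || kvm_has_large_decr;
}
```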

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 hw/ppc/spapr.c       | 18 ++++++++++++++++++
 hw/ppc/spapr_hcall.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6ba869a..6f38939 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -554,6 +554,19 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
                           pcc->radix_page_info->count *
                           sizeof(radix_AP_encodings[0]))));
     }
+
+    /*
+     * We set this property to let the guest know that it can use the large
+     * decrementer and its width in bits. This means we must be on a processor
+     * with a large decrementer and the hypervisor must support it. In TCG the
+     * large decrementer is always supported, in KVM we check the hypervisor
+     * capability.
+     */
+    if (pcc->large_decr_bits && ((!kvm_enabled()) ||
+                                 kvmppc_has_cap_large_decr())) {
+        _FDT((fdt_setprop_u32(fdt, offset, "ibm,dec-bits",
+                              pcc->large_decr_bits)));
+    }
 }
 
 static void spapr_populate_cpus_dt_node(void *fdt, sPAPRMachineState *spapr)
@@ -1328,6 +1341,11 @@ static void ppc_spapr_reset(void)
         spapr_setup_hpt_and_vrma(spapr);
     }
 
+    /* We have to do this after vcpus are created since it calls ioctls */
+    if (kvm_enabled()) {
+        kvmppc_check_cap_large_decr();
+    }
+
     qemu_devices_reset();
 
     /*
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index aae5a62..c06421b 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1091,6 +1091,37 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, target_ulong *addr,
     return best_compat;
 }
 
+static void cas_enable_large_decr(PowerPCCPU *cpu, sPAPRMachineState *spapr)
+{
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    bool guest_large_decr = false;
+
+    if (cpu->compat_pvr) {
+        guest_large_decr = cpu->compat_pvr >= CPU_POWERPC_LOGICAL_3_00;
+    } else {
+        guest_large_decr = (cpu->env.spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK)
+                           >= CPU_POWERPC_POWER9_BASE;
+    }
+
+    if (guest_large_decr && ((!kvm_enabled()) ||
+                             kvmppc_has_cap_large_decr())) {
+        CPUState *cs;
+
+        CPU_FOREACH(cs) {
+            if (kvm_enabled()) {
+                kvmppc_configure_large_decrementer(cs, true);
+            } else {
+                set_spr(cs, SPR_LPCR, LPCR_LD, LPCR_LD);
+            }
+        }
+
+        spapr->large_decr_bits = pcc->large_decr_bits;
+    } else {
+        /* By default the large decrementer is already disabled */
+        spapr->large_decr_bits = 0;
+    }
+}
+
 static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
                                                   sPAPRMachineState *spapr,
                                                   target_ulong opcode,
@@ -1166,6 +1197,9 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     }
     spapr->cas_legacy_guest_workaround = !spapr_ovec_test(ov1_guest,
                                                           OV1_PPC_3_00);
+
+    cas_enable_large_decr(cpu, spapr);
+
     if (!spapr->cas_reboot) {
         spapr->cas_reboot =
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
-- 
2.9.4


* [Qemu-devel] [PATCH 5/5] target/ppc: Add cmd line option to disable the large decrementer
  2017-06-08  7:03 [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer Suraj Jitindar Singh
                   ` (3 preceding siblings ...)
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 4/5] target/ppc: Enable the large decrementer for TCG and KVM guests Suraj Jitindar Singh
@ 2017-06-08  7:03 ` Suraj Jitindar Singh
  4 siblings, 0 replies; 9+ messages in thread
From: Suraj Jitindar Singh @ 2017-06-08  7:03 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, david, agraf, Suraj Jitindar Singh

Given there is no way to tell the guest if the size of the large
decrementer changes, it is not possible to migrate a guest between
machines where the decrementer size differs.

Add a command line option to disable the large decrementer for a guest
on boot. This means we will not advertise the availability of the large
decrementer to the guest and thus it won't try to use it.

This allows a guest to be started that remains compatible with live
migration to a system with a differing decrementer size (assuming that
system still implements the basic 32-bit decrementer mode).

A "required" option is supplied to force the large decrementer; qemu will
fail to start if the host doesn't support it. There is also a "default"
option, where the large decrementer is enabled or disabled based on the
capabilities of the hypervisor.
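
As a rough illustration of the three settings described above, the sketch
below parses the option string and resolves it against host support. All
names here (LargeDecrMode, parse_large_decr_mode(), resolve_large_decr())
are hypothetical and not part of this patch. It uses strcmp() rather than
the patch's strncmp() prefix match, so a value with trailing characters is
rejected, and it ignores whether the guest CPU has a large decrementer at
all.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the patch's encoding: 1 required, 0 default, -1 disabled */
typedef enum {
    LARGE_DECR_DISABLED = -1,
    LARGE_DECR_DEFAULT  = 0,
    LARGE_DECR_REQUIRED = 1,
} LargeDecrMode;

/* Returns 0 and stores the mode on success, -1 for an unknown string */
static int parse_large_decr_mode(const char *value, LargeDecrMode *mode)
{
    if (strcmp(value, "disabled") == 0) {
        *mode = LARGE_DECR_DISABLED;
    } else if (strcmp(value, "required") == 0) {
        *mode = LARGE_DECR_REQUIRED;
    } else if (strcmp(value, "default") == 0) {
        *mode = LARGE_DECR_DEFAULT;
    } else {
        return -1;
    }
    return 0;
}

/*
 * Returns 1 if the large decrementer ends up enabled, 0 if it ends up
 * disabled, and -1 if startup should fail (required, but the host lacks
 * support).
 */
static int resolve_large_decr(LargeDecrMode mode, bool host_supports)
{
    switch (mode) {
    case LARGE_DECR_DISABLED:
        return 0;
    case LARGE_DECR_REQUIRED:
        return host_supports ? 1 : -1;
    default:
        return host_supports ? 1 : 0;
    }
}
```

Under TCG, host_supports would always be true, which matches the cover
letter's statement that TCG always advertises support.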

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 hw/ppc/spapr.c         | 59 +++++++++++++++++++++++++++++++++++++++++++++-----
 hw/ppc/spapr_hcall.c   |  4 +++-
 include/hw/ppc/spapr.h |  1 +
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6f38939..4290dd8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -558,12 +558,12 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
     /*
      * We set this property to let the guest know that it can use the large
      * decrementer and its width in bits. This means we must be on a processor
-     * with a large decrementer and the hypervisor must support it. In TCG the
-     * large decrementer is always supported, in KVM we check the hypervisor
-     * capability.
+     * with a large decrementer, it must not have been disabled and the
+     * hypervisor must support it. In TCG the large decrementer is always
+     * supported, in KVM we check the hypervisor capability.
      */
-    if (pcc->large_decr_bits && ((!kvm_enabled()) ||
-                                 kvmppc_has_cap_large_decr())) {
+    if (pcc->large_decr_bits && (spapr->large_decr_support != -1) &&
+            ((!kvm_enabled()) || kvmppc_has_cap_large_decr())) {
         _FDT((fdt_setprop_u32(fdt, offset, "ibm,dec-bits",
                               pcc->large_decr_bits)));
     }
@@ -1344,6 +1344,11 @@ static void ppc_spapr_reset(void)
     /* We have to do this after vcpus are created since it calls ioctls */
     if (kvm_enabled()) {
         kvmppc_check_cap_large_decr();
+
+        if ((spapr->large_decr_support == 1) && !kvmppc_has_cap_large_decr()) {
+            error_report("Large decrementer unsupported by hypervisor");
+            exit(1);
+        }
     }
 
     qemu_devices_reset();
@@ -1451,7 +1456,9 @@ static int spapr_import_large_decr_bits(sPAPRMachineState *spapr)
         CPUState *cs = CPU(cpu);
         PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
 
-        if (kvm_enabled()) {
+        if (spapr->large_decr_support == -1) {
+            /* Large decrementer disabled on the command line */
+        } else if (kvm_enabled()) {
             if (!kvmppc_has_cap_large_decr()) {
                 error_report("Host doesn't support large decrementer and guest requires it");
                 return -EINVAL;
@@ -2554,6 +2561,37 @@ static void spapr_set_modern_hotplug_events(Object *obj, bool value,
     spapr->use_hotplug_event_source = value;
 }
 
+static char *spapr_get_large_decr_support(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    switch (spapr->large_decr_support) {
+    case -1:
+        return g_strdup("disabled");
+    case 1:
+        return g_strdup("required");
+    default:
+        return g_strdup("default");
+    }
+}
+
+static void spapr_set_large_decr_support(Object *obj, const char *value,
+                                         Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    if (!strncmp(value, "disabled", strlen("disabled"))) {
+        spapr->large_decr_support = -1;
+    } else if (!strncmp(value, "required", strlen("required"))) {
+        spapr->large_decr_support = 1;
+    } else if (!strncmp(value, "default", strlen("default"))) {
+        spapr->large_decr_support = 0;
+    } else {
+        error_report("Unknown large-decr-support specified '%s'", value);
+        exit(1);
+    }
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -2574,6 +2612,15 @@ static void spapr_machine_initfn(Object *obj)
                                     " place of standard EPOW events when possible"
                                     " (required for memory hot-unplug support)",
                                     NULL);
+    object_property_add_str(obj, "large-decr-support",
+                            spapr_get_large_decr_support,
+                            spapr_set_large_decr_support, NULL);
+    object_property_set_description(obj, "large-decr-support",
+                                    "Specifies the level of large decrementer support"
+                                    " {required - don't start if not available "
+                                    "| disabled - disable the large decrementer"
+                                    " | default - depend on hypervisor support}"
+                                    , NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index c06421b..b4b22cb 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1096,7 +1096,9 @@ static void cas_enable_large_decr(PowerPCCPU *cpu, sPAPRMachineState *spapr)
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
     bool guest_large_decr = false;
 
-    if (cpu->compat_pvr) {
+    if (spapr->large_decr_support == -1) {
+        /* Large decrementer disabled on the command line */
+    } else if (cpu->compat_pvr) {
         guest_large_decr = cpu->compat_pvr >= CPU_POWERPC_LOGICAL_3_00;
     } else {
         guest_large_decr = (cpu->env.spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4ba9b89..65c5659 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -114,6 +114,7 @@ struct sPAPRMachineState {
     /*< public >*/
     char *kvm_type;
     MemoryHotplugState hotplug_memory;
+    int large_decr_support; /* 1 -> required | 0 -> default | -1 -> disable */
 
     const char *icp_type;
 };
-- 
2.9.4


* Re: [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG Suraj Jitindar Singh
@ 2017-06-13  7:50   ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2017-06-13  7:50 UTC (permalink / raw)
  To: Suraj Jitindar Singh; +Cc: qemu-ppc, qemu-devel, agraf

On Thu, Jun 08, 2017 at 05:03:47PM +1000, Suraj Jitindar Singh wrote:
> The large decrementer is an operating mode of the decrementer.
> The decrementer is normally a 32-bit register.
> When operating in large decrementer mode the decrementer is a d-bit
> register which is sign extended to 64-bits (where d is implementation
> dependent).
> 
> Implement support for a TCG guest to use the decrementer in large
> decrementer mode. This means updating the decrementer access functions
> to accept larger values under the correct conditions.
> 
> The operating mode of the decrementer is controlled by the LPCR_LD bit in
> the logical partition control register (LPCR).
> 
> The operating mode of the hypervisor decrementer is dependent on the cpu
> model, >= POWER9 -> large hypervisor decrementer.
> 
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> ---
>  hw/ppc/ppc.c                | 81 +++++++++++++++++++++++++++++++--------------
>  target/ppc/cpu-qom.h        |  1 +
>  target/ppc/cpu.h            |  8 ++---
>  target/ppc/mmu-hash64.c     |  2 +-
>  target/ppc/translate.c      |  2 +-
>  target/ppc/translate_init.c |  3 ++
>  6 files changed, 67 insertions(+), 30 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 224184d..49c52ed 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -649,11 +649,11 @@ bool ppc_decr_clear_on_delivery(CPUPPCState *env)
>      return ((tb_env->flags & flags) == PPC_DECR_UNDERFLOW_TRIGGERED);
>  }
>  
> -static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
> +static inline target_ulong _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next,
> +                                              bool large_decr)

I think this low-level internal function should always return the full
number of available bits.  It makes more sense to clamp to 32-bits in
the callers implementing the actual interfaces which require that.
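
One way to read this suggestion, as a minimal sketch rather than the actual
QEMU code: the low-level helper returns the full signed value and each
caller applies the 32-bit clamp only where its interface needs it. The
names sketch_raw_decr() and sketch_load_decr() are hypothetical, and the
tick count is passed in directly instead of being derived from the clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical low-level helper: always returns the full signed value */
static int64_t sketch_raw_decr(int64_t ticks_remaining)
{
    return ticks_remaining;
}

/* Hypothetical caller for the DECR interface: clamps only when needed */
static uint64_t sketch_load_decr(int64_t ticks_remaining, bool large_decr)
{
    int64_t decr = sketch_raw_decr(ticks_remaining);

    /* Without large decrementer mode only the low 32 bits are visible */
    return large_decr ? (uint64_t)decr : (uint64_t)(uint32_t)decr;
}
```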

>  {
>      ppc_tb_t *tb_env = env->tb_env;
> -    uint32_t decr;
> -    int64_t diff;
> +    int64_t decr, diff;
>  
>      diff = next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
>      if (diff >= 0) {
> @@ -663,12 +663,16 @@ static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
>      }  else {
>          decr = -muldiv64(-diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
>      }
> -    LOG_TB("%s: %08" PRIx32 "\n", __func__, decr);
> +    LOG_TB("%s: %016" PRIx64 "\n", __func__, decr);
>  
> -    return decr;
> +    /*
> +     * If large decrementer is enabled then the decrementer is sign extended
> +     * to 64 bits, otherwise it is a 32 bit value.
> +     */
> +    return large_decr ? decr : (uint32_t) decr;
>  }
>  
> -uint32_t cpu_ppc_load_decr (CPUPPCState *env)
> +target_ulong cpu_ppc_load_decr (CPUPPCState *env)
>  {
>      ppc_tb_t *tb_env = env->tb_env;
>  
> @@ -676,14 +680,16 @@ uint32_t cpu_ppc_load_decr (CPUPPCState *env)
>          return env->spr[SPR_DECR];
>      }
>  
> -    return _cpu_ppc_load_decr(env, tb_env->decr_next);
> +    return _cpu_ppc_load_decr(env, tb_env->decr_next,
> +                              env->spr[SPR_LPCR] & LPCR_LD);
>  }
>  
> -uint32_t cpu_ppc_load_hdecr (CPUPPCState *env)
> +target_ulong cpu_ppc_load_hdecr (CPUPPCState *env)
>  {
>      ppc_tb_t *tb_env = env->tb_env;
>  
> -    return _cpu_ppc_load_decr(env, tb_env->hdecr_next);
> +    return _cpu_ppc_load_decr(env, tb_env->hdecr_next,
> +                              env->mmu_model & POWERPC_MMU_V3);
>  }
>  
>  uint64_t cpu_ppc_load_purr (CPUPPCState *env)
> @@ -737,13 +743,20 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
>                                   QEMUTimer *timer,
>                                   void (*raise_excp)(void *),
>                                   void (*lower_excp)(PowerPCCPU *),
> -                                 uint32_t decr, uint32_t value)
> +                                 target_ulong decr, target_ulong value,
> +                                 int decr_bits)
>  {
>      CPUPPCState *env = &cpu->env;
>      ppc_tb_t *tb_env = env->tb_env;
>      uint64_t now, next;
>  
> -    LOG_TB("%s: %08" PRIx32 " => %08" PRIx32 "\n", __func__,
> +    /* Truncate value to decr_bits and sign extend for simplicity */
> +    value &= ((1ULL << decr_bits) - 1);
> +    if (value & (1ULL << (decr_bits - 1))) { /* Negative */
> +        value |= (0xFFFFFFFFULL << decr_bits);
> +    }
> +
> +    LOG_TB("%s: " TARGET_FMT_lx " => " TARGET_FMT_lx "\n", __func__,
>                  decr, value);
>  
>      if (kvm_enabled()) {
> @@ -765,15 +778,16 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
>       * an edge interrupt, so raise it here too.
>       */
>      if ((value < 3) ||
> -        ((tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL) && (value & 0x80000000)) ||
> -        ((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) && (value & 0x80000000)
> -          && !(decr & 0x80000000))) {
> +        ((tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL) && (value & (1ULL << decr_bits))) ||
> +        ((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) && (value & (1ULL << decr_bits))
> +          && !(decr & (1ULL << decr_bits)))) {
>          (*raise_excp)(cpu);
>          return;
>      }
>  
>      /* On MSB level based systems a 0 for the MSB stops interrupt delivery */
> -    if (!(value & 0x80000000) && (tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL)) {
> +    if (!(value & (1ULL << decr_bits)) && (tb_env->flags &
> +                                         PPC_DECR_UNDERFLOW_LEVEL)) {
>          (*lower_excp)(cpu);
>      }
>  
> @@ -786,17 +800,24 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
>      timer_mod(timer, next);
>  }
>  
> -static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, uint32_t decr,
> -                                       uint32_t value)
> +static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, target_ulong decr,
> +                                       target_ulong value)
>  {
>      ppc_tb_t *tb_env = cpu->env.tb_env;
> +    int bits = 32;
> +
> +    if (cpu->env.spr[SPR_LPCR] & LPCR_LD) {
> +        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +
> +        bits = pcc->large_decr_bits;
> +    }
>  
>      __cpu_ppc_store_decr(cpu, &tb_env->decr_next, tb_env->decr_timer,
>                           tb_env->decr_timer->cb, &cpu_ppc_decr_lower, decr,
> -                         value);
> +                         value, bits);
>  }
>  
> -void cpu_ppc_store_decr (CPUPPCState *env, uint32_t value)
> +void cpu_ppc_store_decr (CPUPPCState *env, target_ulong value)
>  {
>      PowerPCCPU *cpu = ppc_env_get_cpu(env);
>  
> @@ -810,19 +831,26 @@ static void cpu_ppc_decr_cb(void *opaque)
>      cpu_ppc_decr_excp(cpu);
>  }
>  
> -static inline void _cpu_ppc_store_hdecr(PowerPCCPU *cpu, uint32_t hdecr,
> -                                        uint32_t value)
> +static inline void _cpu_ppc_store_hdecr(PowerPCCPU *cpu, target_ulong hdecr,
> +                                        target_ulong value)
>  {
>      ppc_tb_t *tb_env = cpu->env.tb_env;
> +    int bits = 32;
> +
> +    if (cpu->env.mmu_model & POWERPC_MMU_V3) {
> +        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +
> +        bits = pcc->large_decr_bits;

The pcc already knows if it is POWER9 or not, so you could just have
decr_bits in there unconditionally, and set it to 32 for pre-POWER9
CPUs.  Only for the non-HV decr do you need to runtime clamp.
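
A small sketch of that arrangement, assuming a per-class width field that
is 32 on pre-POWER9 CPUs. CpuClassSketch and the sketch_* helpers are
hypothetical names, not QEMU's. The hypervisor decrementer uses the class
width directly, the guest decrementer falls back to 32 bits when LPCR_LD is
clear, and a generic helper does the truncate-and-sign-extend step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU class: width is always populated */
typedef struct {
    int decr_bits; /* 32 before POWER9, implementation width afterwards */
} CpuClassSketch;

/* Hypervisor decrementer: the width is fixed by the CPU model */
static int sketch_hdecr_width(const CpuClassSketch *pcc)
{
    return pcc->decr_bits;
}

/* Guest decrementer: clamp to 32 bits at runtime unless LPCR_LD is set */
static int sketch_decr_width(const CpuClassSketch *pcc, bool lpcr_ld)
{
    return lpcr_ld ? pcc->decr_bits : 32;
}

/* Truncate value to 'bits' (1..64) and sign extend the result to 64 bits */
static int64_t sketch_sign_extend(uint64_t value, int bits)
{
    uint64_t sign;

    if (bits >= 64) {
        return (int64_t)value;
    }
    sign = UINT64_C(1) << (bits - 1);
    value &= (UINT64_C(1) << bits) - 1;
    /* Flipping and subtracting the sign bit propagates it upwards */
    return (int64_t)((value ^ sign) - sign);
}
```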

> +    }
>  
>      if (tb_env->hdecr_timer != NULL) {
>          __cpu_ppc_store_decr(cpu, &tb_env->hdecr_next, tb_env->hdecr_timer,
>                               tb_env->hdecr_timer->cb, &cpu_ppc_hdecr_lower,
> -                             hdecr, value);
> +                             hdecr, value, bits);
>      }
>  }
>  
> -void cpu_ppc_store_hdecr (CPUPPCState *env, uint32_t value)
> +void cpu_ppc_store_hdecr (CPUPPCState *env, target_ulong value)
>  {
>      PowerPCCPU *cpu = ppc_env_get_cpu(env);
>  
> @@ -848,7 +876,9 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
>  {
>      CPUPPCState *env = opaque;
>      PowerPCCPU *cpu = ppc_env_get_cpu(env);
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>      ppc_tb_t *tb_env = env->tb_env;
> +    int decr_bits = 32;
>  
>      tb_env->tb_freq = freq;
>      tb_env->decr_freq = freq;
> @@ -857,7 +887,10 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
>       * it's not ready to handle it...
>       */
>      _cpu_ppc_store_decr(cpu, 0xFFFFFFFF, 0xFFFFFFFF);
> -    _cpu_ppc_store_hdecr(cpu, 0xFFFFFFFF, 0xFFFFFFFF);
> +    if (env->mmu_model & POWERPC_MMU_V3) {
> +        decr_bits = pcc->large_decr_bits;
> +    }
> +    _cpu_ppc_store_hdecr(cpu, (1 << decr_bits) - 1, (1 << decr_bits) - 1);
>      cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>  }
>  
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index d0cf6ca..523979c 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -198,6 +198,7 @@ typedef struct PowerPCCPUClass {
>      uint32_t l1_dcache_size, l1_icache_size;
>      const struct ppc_segment_page_sizes *sps;
>      struct ppc_radix_page_info *radix_page_info;
> +    uint32_t large_decr_bits;
>      void (*init_proc)(CPUPPCState *env);
>      int  (*check_pow)(CPUPPCState *env);
>      int (*handle_mmu_fault)(PowerPCCPU *cpu, vaddr eaddr, int rwx, int mmu_idx);
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 401e10e..f6e86b6 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1309,10 +1309,10 @@ uint32_t cpu_ppc_load_atbu (CPUPPCState *env);
>  void cpu_ppc_store_atbl (CPUPPCState *env, uint32_t value);
>  void cpu_ppc_store_atbu (CPUPPCState *env, uint32_t value);
>  bool ppc_decr_clear_on_delivery(CPUPPCState *env);
> -uint32_t cpu_ppc_load_decr (CPUPPCState *env);
> -void cpu_ppc_store_decr (CPUPPCState *env, uint32_t value);
> -uint32_t cpu_ppc_load_hdecr (CPUPPCState *env);
> -void cpu_ppc_store_hdecr (CPUPPCState *env, uint32_t value);
> +target_ulong cpu_ppc_load_decr (CPUPPCState *env);
> +void cpu_ppc_store_decr (CPUPPCState *env, target_ulong value);
> +target_ulong cpu_ppc_load_hdecr (CPUPPCState *env);
> +void cpu_ppc_store_hdecr (CPUPPCState *env, target_ulong value);
>  uint64_t cpu_ppc_load_purr (CPUPPCState *env);
>  uint32_t cpu_ppc601_load_rtcl (CPUPPCState *env);
>  uint32_t cpu_ppc601_load_rtcu (CPUPPCState *env);
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 14d34e5..b1e1764 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -1081,7 +1081,7 @@ void helper_store_lpcr(CPUPPCState *env, target_ulong val)
>      case POWERPC_MMU_VER_3_00: /* P9 */
>          lpcr = val & (LPCR_VPM1 | LPCR_ISL | LPCR_KBV | LPCR_DPFD |
>                        (LPCR_PECE_U_MASK & LPCR_HVEE) | LPCR_ILE | LPCR_AIL |
> -                      LPCR_UPRT | LPCR_EVIRT | LPCR_ONL |
> +                      LPCR_UPRT | LPCR_EVIRT | LPCR_ONL | LPCR_LD |
>                        (LPCR_PECE_L_MASK & (LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>                        LPCR_DEE | LPCR_OEE)) | LPCR_MER | LPCR_GTSE | LPCR_TC |
>                        LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE | LPCR_HDICE);
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index c0cd64d..ebe1fa5 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -7006,7 +7006,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>  #if !defined(NO_TIMER_DUMP)
>      cpu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
>  #if !defined(CONFIG_USER_ONLY)
> -                " DECR %08" PRIu32
> +                " DECR " TARGET_FMT_lu
>  #endif
>                  "\n",
>                  cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 56a0ab2..a0b2934 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8995,6 +8995,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>      /* segment page size remain the same */
>      pcc->sps = &POWER7_POWER8_sps;
>      pcc->radix_page_info = &POWER9_radix_page_info;
> +    pcc->large_decr_bits = 56;
>  #endif
>      pcc->excp_model = POWERPC_EXCP_POWER8;
>      pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
> @@ -9047,6 +9048,8 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>           * tables and guest translation shootdown by default
>           */
>          lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
> +        /* Disable Large Decrementer by Default */
> +        lpcr->default_value &= ~LPCR_LD;
>          lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>                                 LPCR_OEE;
>          break;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 2/5] target/ppc: Implement large decrementer support for KVM
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 2/5] target/ppc: Implement large decrementer support for KVM Suraj Jitindar Singh
@ 2017-06-13  8:15   ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2017-06-13  8:15 UTC (permalink / raw)
  To: Suraj Jitindar Singh; +Cc: qemu-ppc, qemu-devel, agraf

On Thu, Jun 08, 2017 at 05:03:48PM +1000, Suraj Jitindar Singh wrote:
> The large decrementer is an operating mode of the decrementer.
> The decrementer is normally a 32-bit register.
> When operating in large decrementer mode the decrementer is a d-bit
> register which is sign extended to 64-bits (where d is implementation
> dependent).
> 
> Implement support for a KVM guest to use the decrementer in large
> decrementer mode. This means adding functions to query the large
> decrementer support of the hypervisor and to enable the large
> decrementer with the hypervisor.
> 
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> ---
>  target/ppc/kvm.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  target/ppc/kvm_ppc.h | 25 ++++++++++++++++++++++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 51249ce..b2c94a0 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -88,6 +88,7 @@ static int cap_fixup_hcalls;
>  static int cap_htm;             /* Hardware transactional memory support */
>  static int cap_mmu_radix;
>  static int cap_mmu_hash_v3;
> +static int cap_large_decr;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -393,6 +394,19 @@ target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
>      }
>  }
>  
> +void kvmppc_configure_large_decrementer(CPUState *cs, bool enable_ld)
> +{
> +    uint64_t lpcr;
> +
> +    kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +    if (enable_ld) {
> +        lpcr |= LPCR_LD;
> +    } else {
> +        lpcr &= ~LPCR_LD;
> +    }
> +    kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +}

This is never called, which seems bogus.  The LPCR should already be
synchronized with KVM, so I'm not sure why this needs a special call.
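
For reference, the register update such a helper performs is an ordinary
read-modify-write of a single bit. The sketch below shows only that bit
manipulation in isolation. SKETCH_LPCR_LD is a placeholder mask chosen for
illustration and sketch_update_ld() is a hypothetical name; the KVM
get/set calls are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder mask for illustration only; not taken from the patch */
#define SKETCH_LPCR_LD (UINT64_C(1) << 17)

/* Set or clear the one bit, leaving every other LPCR bit untouched */
static uint64_t sketch_update_ld(uint64_t lpcr, bool enable_ld)
{
    if (enable_ld) {
        lpcr |= SKETCH_LPCR_LD;
    } else {
        lpcr &= ~SKETCH_LPCR_LD; /* note the inverted mask when clearing */
    }
    return lpcr;
}
```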

>  static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
>  {
>      if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) {
> @@ -2004,6 +2018,11 @@ uint32_t kvmppc_get_dfp(void)
>      return kvmppc_read_int_cpu_dt("ibm,dfp");
>  }
>  
> +uint32_t kvmppc_get_dec_bits(void)
> +{
> +    return kvmppc_read_int_cpu_dt("ibm,dec-bits");
> +}
> +
>  static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
>   {
>       PowerPCCPU *cpu = ppc_env_get_cpu(env);
> @@ -2380,6 +2399,7 @@ static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
>  
>  #if defined(TARGET_PPC64)
>      pcc->radix_page_info = kvm_get_radix_page_info();
> +    pcc->large_decr_bits = kvmppc_get_dec_bits();

As you may have heard from SamB, autodetecting properties from KVM to
set on the guest CPU is.. problematic.  We already use it in a bunch
of places, but I'd like to discourage more examples of it.

>      if ((pcc->pvr & 0xffffff00) == CPU_POWERPC_POWER9_DD1) {
>          /*
> @@ -2424,6 +2444,45 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +void kvmppc_check_cap_large_decr(void)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +    CPUState *cs = CPU(cpu);
> +    bool large_dec_support;
> +    uint32_t dec_bits;
> +    uint64_t lpcr;
> +
> +    /*
> +     * Try and set the LPCR_LD (large decrementer) bit to enable the large
> +     * decrementer. A hypervisor with large decrementer capabilities will allow
> +     * this so the value read back after this will have the LPCR_LD bit set.
> +     * Otherwise the bit will be cleared meaning we can't use the large
> +     * decrementer.
> +     */
> +    kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +    lpcr |= LPCR_LD;
> +    kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +    kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +    large_dec_support = !!(lpcr & LPCR_LD);
> +    /* Probably a good idea to clear it again */
> +    lpcr &= ~LPCR_LD;
> +    kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +
> +    /*
> +     * Check for the ibm,dec-bits property on the host, if it isn't there then
> +     * something has gone wrong and we're better off not letting the guest use
> +     * the large decrementer.
> +     */
> +    dec_bits = kvmppc_get_dec_bits();
> +
> +    cap_large_decr = large_dec_support && dec_bits;
> +}
> +
> +bool kvmppc_has_cap_large_decr(void)
> +{
> +    return cap_large_decr;
> +}
> +
>  PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
>  {
>      uint32_t host_pvr = mfpvr();
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index f48243d..c49c7f0 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void);
>  uint64_t kvmppc_get_clockfreq(void);
>  uint32_t kvmppc_get_vmx(void);
>  uint32_t kvmppc_get_dfp(void);
> +uint32_t kvmppc_get_dec_bits(void);
>  bool kvmppc_get_host_model(char **buf);
>  bool kvmppc_get_host_serial(char **buf);
>  int kvmppc_get_hasidle(CPUPPCState *env);
> @@ -36,6 +37,8 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
>  target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
>                                       bool radix, bool gtse,
>                                       uint64_t proc_tbl);
> +void kvmppc_check_cap_large_decr(void);
> +void kvmppc_configure_large_decrementer(CPUState *cs, bool enable_ld);
>  #ifndef CONFIG_USER_ONLY
>  off_t kvmppc_alloc_rma(void **rma);
>  bool kvmppc_spapr_use_multitce(void);
> @@ -60,6 +63,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
>  bool kvmppc_has_cap_htm(void);
>  bool kvmppc_has_cap_mmu_radix(void);
>  bool kvmppc_has_cap_mmu_hash_v3(void);
> +bool kvmppc_has_cap_large_decr(void);
>  int kvmppc_enable_hwrng(void);
>  int kvmppc_put_books_sregs(PowerPCCPU *cpu);
>  PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void);
> @@ -98,6 +102,11 @@ static inline uint32_t kvmppc_get_dfp(void)
>      return 0;
>  }
>  
> +static inline uint32_t kvmppc_get_dec_bits(void)
> +{
> +    return 0;

IIUC, this should never be called on non-KVM, so this should have an
abort() (or g_assert_not_reached()) rather than returning a
clearly-wrong value.
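
A sketch of what that could look like, using plain abort() from the C
library so the example stays self-contained. The sketch_* names are
hypothetical, and the caller is only there to show that the stub is
unreachable when KVM is not in use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Non-KVM stub: reaching it is a programming error, so fail loudly */
static inline uint32_t sketch_get_dec_bits_stub(void)
{
    abort();
}

/* Hypothetical caller: the stub is only consulted on the KVM path */
static uint32_t sketch_dec_bits(bool kvm_enabled, uint32_t tcg_bits)
{
    return kvm_enabled ? sketch_get_dec_bits_stub() : tcg_bits;
}
```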

> +}
> +
>  static inline int kvmppc_get_hasidle(CPUPPCState *env)
>  {
>      return 0;
> @@ -170,6 +179,17 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
>      return 0;
>  }
>  
> +static inline void kvmppc_check_cap_large_decr(void)
> +{
> +    return;
> +}
> +
> +static inline void kvmppc_configure_large_decrementer(CPUState *cs,
> +                                                      bool enable_ld)
> +{
> +    return;
> +}
> +
>  #ifndef CONFIG_USER_ONLY
>  static inline off_t kvmppc_alloc_rma(void **rma)
>  {
> @@ -282,6 +302,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
>      return false;
>  }
>  
> +static inline bool kvmppc_has_cap_large_decr(void)
> +{
> +    return false;
> +}
> +
>  static inline int kvmppc_enable_hwrng(void)
>  {
>      return -1;


* Re: [Qemu-devel] [PATCH 3/5] target/ppc: Implement migration support for large decrementer
  2017-06-08  7:03 ` [Qemu-devel] [PATCH 3/5] target/ppc: Implement migration support for large decrementer Suraj Jitindar Singh
@ 2017-06-13  8:20   ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2017-06-13  8:20 UTC (permalink / raw)
  To: Suraj Jitindar Singh; +Cc: qemu-ppc, qemu-devel, agraf

On Thu, Jun 08, 2017 at 05:03:49PM +1000, Suraj Jitindar Singh wrote:
> Implement support to migrate a guest which is using the large
> decrementer.
> 
> We need to save the decrementer width to the migration stream.
> On incoming migration we then need to check that the hypervisor is
> capable of letting the guest use the large decrementer and that the
> decrementer width is the same on the receiving side. Since there is no
> way to tell the guest when the width of the decrementer changes we have
> to terminate if the decrementer width is not what the guest expects.
> If we can use the large decrementer then we have to tell the hypervisor
> to enable it.
> 
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> ---
>  hw/ppc/spapr.c         | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |  1 +
>  2 files changed, 64 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5d10366..6ba869a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1420,6 +1420,45 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
>      }
>  }
>  
> +static int spapr_import_large_decr_bits(sPAPRMachineState *spapr)
> +{
> +    /*
> +     * If the guest uses the large decrementer then this hypervisor must also
> +     * support it and have the exact same width. We must also enable the large
> +     * decrementer because we have no way to tell the guest to stop using it.
> +     */
> +    if (spapr->large_decr_bits) {
> +        uint32_t dec_bits = 32;
> +        PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +        CPUState *cs = CPU(cpu);
> +        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
> +
> +        if (kvm_enabled()) {
> +            if (!kvmppc_has_cap_large_decr()) {
> +                error_report("Host doesn't support large decrementer and guest requires it");
> +                return -EINVAL;
> +            }
> +            dec_bits = kvmppc_get_dec_bits();
> +        } else {
> +            dec_bits = pcc->large_decr_bits;
> +        }
> +
> +        if (spapr->large_decr_bits != dec_bits) {
> +            error_report("Host large decrementer size [%u] doesn't match what guest expects [%u]",
> +                         dec_bits, spapr->large_decr_bits);
> +            return -EINVAL;
> +        }

Could you just use a VMSTATE_EQUAL() rather than explicit post_load logic?
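
The effect being asked for is that the load fails whenever the width in the
incoming stream differs from the width already configured on the
destination. A plain-C sketch of that comparison follows. The function
name is hypothetical and this is not QEMU's vmstate machinery; it only
shows the check that an equality-style field would stand in for.

```c
#include <errno.h>
#include <stdint.h>

/* Returns 0 when the widths agree, -EINVAL when the migration must fail */
static int sketch_load_u32_equal(uint32_t incoming, uint32_t local)
{
    return incoming == local ? 0 : -EINVAL;
}
```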

> +
> +        if (kvm_enabled()) {
> +            CPU_FOREACH(cs) {
> +                kvmppc_configure_large_decrementer(cs, true);
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  static int spapr_post_load(void *opaque, int version_id)
>  {
>      sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> @@ -1439,8 +1478,13 @@ static int spapr_post_load(void *opaque, int version_id)
>       * value into the RTC device */
>      if (version_id < 3) {
>          err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset);
> +        if (err) {
> +            return err;
> +        }
>      }
>  
> +    err = spapr_import_large_decr_bits(spapr);
> +
>      return err;
>  }
>  
> @@ -1529,6 +1573,24 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
>      },
>  };
>  
> +static bool spapr_large_decr_entry_needed(void *opaque)
> +{
> +    sPAPRMachineState *spapr = opaque;
> +
> +    return !!spapr->large_decr_bits;

Hrm.  We have no existing releases - upstream or downstream - that
support POWER9, which is the first CPU with the large decrementer.
Rather than fancy conditional logic, could we just always transfer
the decr size for POWER9?

Do we need to support qemu on POWER9 with a pre-large-decr host
kernel?  Or is POWER9 support new enough that we can just say that you
require a host kernel with large decr support to run POWER9 guests?
That could simplify several things.
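
As a rough illustration of the difference between the two approaches,
here is a toy model (not QEMU code) of a save path that sends the
width only when a predicate holds, next to one that always sends it.
`struct toy_stream`, `toy_put()`, `save_conditional()` and
`save_always()` are all hypothetical names invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy stream: a small fixed-size array of 32-bit words. */
struct toy_stream {
    uint32_t words[4];
    size_t len;
};

/* Append one word; silently drops the word if the toy stream is full. */
static void toy_put(struct toy_stream *s, uint32_t v)
{
    if (s->len < sizeof(s->words) / sizeof(s->words[0])) {
        s->words[s->len++] = v;
    }
}

/* Subsection style: the width is sent only when the predicate holds. */
static void save_conditional(struct toy_stream *s, uint32_t large_decr_bits)
{
    if (large_decr_bits != 0) {     /* plays the role of a .needed callback */
        toy_put(s, large_decr_bits);
    }
}

/* Unconditional style: the width is always part of the stream. */
static void save_always(struct toy_stream *s, uint32_t large_decr_bits)
{
    toy_put(s, large_decr_bits);
}
```

In the unconditional variant the stream layout no longer depends on
the source's configuration, so the destination can check the width
every time instead of handling a "not present" case.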

> +}
> +
> +static const VMStateDescription vmstate_spapr_large_decr_entry = {
> +    .name = "spapr_large_decr_entry",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_large_decr_entry_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(large_decr_bits, sPAPRMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -1547,6 +1609,7 @@ static const VMStateDescription vmstate_spapr = {
>      .subsections = (const VMStateDescription*[]) {
>          &vmstate_spapr_ov5_cas,
>          &vmstate_spapr_patb_entry,
> +        &vmstate_spapr_large_decr_entry,
>          NULL
>      }
>  };
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 98fb78b..4ba9b89 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -91,6 +91,7 @@ struct sPAPRMachineState {
>      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
>      bool cas_reboot;
>      bool cas_legacy_guest_workaround;
> +    uint32_t large_decr_bits; /* Large decrementer width (0 -> not in use) */

Having this here as well as in the CPU class seems a bit icky.

>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


end of thread, other threads:[~2017-06-13  8:20 UTC | newest]

Thread overview: 9+ messages
2017-06-08  7:03 [Qemu-devel] [PATCH 0/5] target/ppc: Implement support for the Large Decrementer Suraj Jitindar Singh
2017-06-08  7:03 ` [Qemu-devel] [PATCH 1/5] target/ppc: Implement large decrementer support for TCG Suraj Jitindar Singh
2017-06-13  7:50   ` David Gibson
2017-06-08  7:03 ` [Qemu-devel] [PATCH 2/5] target/ppc: Implement large decrementer support for KVM Suraj Jitindar Singh
2017-06-13  8:15   ` David Gibson
2017-06-08  7:03 ` [Qemu-devel] [PATCH 3/5] target/ppc: Implement migration support for large decrementer Suraj Jitindar Singh
2017-06-13  8:20   ` David Gibson
2017-06-08  7:03 ` [Qemu-devel] [PATCH 4/5] target/ppc: Enable the large decrementer for TCG and KVM guests Suraj Jitindar Singh
2017-06-08  7:03 ` [Qemu-devel] [PATCH 5/5] target/ppc: Add cmd line option to disable the large decrementer Suraj Jitindar Singh
