* [PATCH RFC 0/5] powerpc & KVM: Work around POWER9 TM hardware bugs
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
POWER9 has some shortcomings in its implementation of transactional
memory. Starting with v2.2 of the "Nimbus" chip, some changes have
been made to the hardware which make it able to generate hypervisor
interrupts in the situations where hardware needs the hypervisor to
provide some assistance with the implementation. Specifically, the
core does not have enough storage to store a complete checkpoint of
all the architected state for all 4 threads, and therefore needs to
be able to offload the checkpointed state of threads which are in
transactional suspended state (for threads that are in transactional
state, the hardware can simply abort the transaction).
This series implements the hypervisor assistance for TM for KVM
guests, thus allowing them to use TM. This then means that we can
allow live migration of guests on POWER8 that may be using TM to
POWER9 hosts.
arch/powerpc/include/asm/asm-prototypes.h | 3 +
arch/powerpc/include/asm/cputable.h | 5 +-
arch/powerpc/include/asm/kvm_asm.h | 2 +
arch/powerpc/include/asm/kvm_book3s.h | 4 +
arch/powerpc/include/asm/kvm_book3s_64.h | 43 ++++++
arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/paca.h | 3 +
arch/powerpc/include/asm/powernv.h | 1 +
arch/powerpc/include/asm/ppc-opcode.h | 4 +
arch/powerpc/include/asm/reg.h | 7 +
arch/powerpc/kernel/asm-offsets.c | 3 +
arch/powerpc/kernel/cputable.c | 23 +++-
arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +
arch/powerpc/kernel/exceptions-64s.S | 4 +-
arch/powerpc/kernel/idle_book3s.S | 19 +++
arch/powerpc/kvm/Makefile | 7 +
arch/powerpc/kvm/book3s_hv.c | 18 ++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 150 ++++++++++++++++++++-
arch/powerpc/kvm/book3s_hv_tm.c | 216 ++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_tm_builtin.c | 109 +++++++++++++++
arch/powerpc/kvm/powerpc.c | 5 +-
arch/powerpc/platforms/powernv/idle.c | 77 +++++++++++
23 files changed, 694 insertions(+), 14 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH RFC 0/5] powerpc & KVM: Work around POWER9 TM hardware bugs
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
POWER9 has some shortcomings in its implementation of transactional
memory. Starting with v2.2 of the "Nimbus" chip, some changes have
been made to the hardware which make it able to generate hypervisor
interrupts in the situations where hardware needs the hypervisor to
provide some assistance with the implementation. Specifically, the
core does not have enough storage to store a complete checkpoint of
all the architected state for all 4 threads, and therefore needs to
be able to offload the checkpointed state of threads which are in
transactional suspended state (for threads that are in transactional
state, the hardware can simply abort the transaction).
This series implements the hypervisor assistance for TM for KVM
guests, thus allowing them to use TM. This then means that we can
allow live migration of guests on POWER8 that may be using TM to
POWER9 hosts.
arch/powerpc/include/asm/asm-prototypes.h | 3 +
arch/powerpc/include/asm/cputable.h | 5 +-
arch/powerpc/include/asm/kvm_asm.h | 2 +
arch/powerpc/include/asm/kvm_book3s.h | 4 +
arch/powerpc/include/asm/kvm_book3s_64.h | 43 ++++++
arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/paca.h | 3 +
arch/powerpc/include/asm/powernv.h | 1 +
arch/powerpc/include/asm/ppc-opcode.h | 4 +
arch/powerpc/include/asm/reg.h | 7 +
arch/powerpc/kernel/asm-offsets.c | 3 +
arch/powerpc/kernel/cputable.c | 23 +++-
arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +
arch/powerpc/kernel/exceptions-64s.S | 4 +-
arch/powerpc/kernel/idle_book3s.S | 19 +++
arch/powerpc/kvm/Makefile | 7 +
arch/powerpc/kvm/book3s_hv.c | 18 ++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 150 ++++++++++++++++++++-
arch/powerpc/kvm/book3s_hv_tm.c | 216 ++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_tm_builtin.c | 109 +++++++++++++++
arch/powerpc/kvm/powerpc.c | 5 +-
arch/powerpc/platforms/powernv/idle.c | 77 +++++++++++
23 files changed, 694 insertions(+), 14 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH RFC 1/5] powerpc: Add a CPU feature bit for TM bug workarounds on POWER9 v2.2
2018-03-08 7:02 ` Paul Mackerras
@ 2018-03-08 7:02 ` Paul Mackerras
-1 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
processors which will be used to enable the hypervisor to assist
hardware with the handling of checkpointed register values while
the CPU is in suspend state, in order to work around hardware
bugs.
When the dt_cpu_ftrs subsystem is in use, the software assistance
can be enabled using a "tm-suspend-hypervisor-assist" node in the
device tree. In the absence of such a node, a quirk enables the
assistance for POWER9 "Nimbus" DD2.2 processors.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/cputable.h | 5 ++++-
arch/powerpc/kernel/cputable.c | 24 ++++++++++++++++++++++--
arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +++
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index a2c5c95..46343ac 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -203,6 +203,7 @@ static inline void cpu_feature_keys_init(void) { }
#define CPU_FTR_DAWR LONG_ASM_CONST(0x0400000000000000)
#define CPU_FTR_DABRX LONG_ASM_CONST(0x0800000000000000)
#define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000000000000000)
+#define CPU_FTR_P9_TM_EMUL LONG_ASM_CONST(0x2000000000000000)
#define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000000000000000)
#define CPU_FTR_POWER9_DD2_1 LONG_ASM_CONST(0x8000000000000000)
@@ -470,6 +471,7 @@ static inline void cpu_feature_keys_init(void) { }
(~CPU_FTR_SAO))
#define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
#define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
+#define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_P9_TM_EMUL)
#define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -489,7 +491,8 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \
- CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1)
+ CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1 | \
+ CPU_FTRS_POWER9_DD2_2)
#endif
#else
enum {
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index c40a9fc..68052ea 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -553,11 +553,31 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check_early = __machine_check_early_realmode_p9,
.platform = "power9",
},
- { /* Power9 DD 2.1 or later (see DD2.0 above) */
+ { /* Power9 DD 2.1 */
+ .pvr_mask = 0xffffefff,
+ .pvr_value = 0x004e0201,
+ .cpu_name = "POWER9 (raw)",
+ .cpu_features = CPU_FTRS_POWER9_DD2_1,
+ .cpu_user_features = COMMON_USER_POWER9,
+ .cpu_user_features2 = COMMON_USER2_POWER9,
+ .mmu_features = MMU_FTRS_POWER9,
+ .icache_bsize = 128,
+ .dcache_bsize = 128,
+ .num_pmcs = 6,
+ .pmc_type = PPC_PMC_IBM,
+ .oprofile_cpu_type = "ppc64/power9",
+ .oprofile_type = PPC_OPROFILE_INVALID,
+ .cpu_setup = __setup_cpu_power9,
+ .cpu_restore = __restore_cpu_power9,
+ .flush_tlb = __flush_tlb_power9,
+ .machine_check_early = __machine_check_early_realmode_p9,
+ .platform = "power9",
+ },
+ { /* Power9 DD2.2 or later */
.pvr_mask = 0xffff0000,
.pvr_value = 0x004e0000,
.cpu_name = "POWER9 (raw)",
- .cpu_features = CPU_FTRS_POWER9_DD2_1,
+ .cpu_features = CPU_FTRS_POWER9_DD2_2,
.cpu_user_features = COMMON_USER_POWER9,
.cpu_user_features2 = COMMON_USER2_POWER9,
.mmu_features = MMU_FTRS_POWER9,
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 945e2c2..e181266 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -590,6 +590,7 @@ static struct dt_cpu_feature_match __initdata
{"virtual-page-class-key-protection", feat_enable, 0},
{"transactional-memory", feat_enable_tm, CPU_FTR_TM},
{"transactional-memory-v3", feat_enable_tm, 0},
+ {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_EMUL},
{"idle-nap", feat_enable_idle_nap, 0},
{"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
{"idle-stop", feat_enable_idle_stop, 0},
@@ -709,6 +710,8 @@ static __init void cpufeatures_cpu_quirks(void)
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD1;
else if ((version & 0xffffefff) == 0x004e0201)
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD2_1;
+ else if ((version & 0xffffefff) == 0x004e0202)
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TM_EMUL;
}
static void __init cpufeatures_setup_finished(void)
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 1/5] powerpc: Add a CPU feature bit for TM bug workarounds on POWER9 v2.2
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
processors which will be used to enable the hypervisor to assist
hardware with the handling of checkpointed register values while
the CPU is in suspend state, in order to work around hardware
bugs.
When the dt_cpu_ftrs subsystem is in use, the software assistance
can be enabled using a "tm-suspend-hypervisor-assist" node in the
device tree. In the absence of such a node, a quirk enables the
assistance for POWER9 "Nimbus" DD2.2 processors.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/cputable.h | 5 ++++-
arch/powerpc/kernel/cputable.c | 24 ++++++++++++++++++++++--
arch/powerpc/kernel/dt_cpu_ftrs.c | 3 +++
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index a2c5c95..46343ac 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -203,6 +203,7 @@ static inline void cpu_feature_keys_init(void) { }
#define CPU_FTR_DAWR LONG_ASM_CONST(0x0400000000000000)
#define CPU_FTR_DABRX LONG_ASM_CONST(0x0800000000000000)
#define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000000000000000)
+#define CPU_FTR_P9_TM_EMUL LONG_ASM_CONST(0x2000000000000000)
#define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000000000000000)
#define CPU_FTR_POWER9_DD2_1 LONG_ASM_CONST(0x8000000000000000)
@@ -470,6 +471,7 @@ static inline void cpu_feature_keys_init(void) { }
(~CPU_FTR_SAO))
#define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
#define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
+#define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_P9_TM_EMUL)
#define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -489,7 +491,8 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \
- CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1)
+ CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1 | \
+ CPU_FTRS_POWER9_DD2_2)
#endif
#else
enum {
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index c40a9fc..68052ea 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -553,11 +553,31 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check_early = __machine_check_early_realmode_p9,
.platform = "power9",
},
- { /* Power9 DD 2.1 or later (see DD2.0 above) */
+ { /* Power9 DD 2.1 */
+ .pvr_mask = 0xffffefff,
+ .pvr_value = 0x004e0201,
+ .cpu_name = "POWER9 (raw)",
+ .cpu_features = CPU_FTRS_POWER9_DD2_1,
+ .cpu_user_features = COMMON_USER_POWER9,
+ .cpu_user_features2 = COMMON_USER2_POWER9,
+ .mmu_features = MMU_FTRS_POWER9,
+ .icache_bsize = 128,
+ .dcache_bsize = 128,
+ .num_pmcs = 6,
+ .pmc_type = PPC_PMC_IBM,
+ .oprofile_cpu_type = "ppc64/power9",
+ .oprofile_type = PPC_OPROFILE_INVALID,
+ .cpu_setup = __setup_cpu_power9,
+ .cpu_restore = __restore_cpu_power9,
+ .flush_tlb = __flush_tlb_power9,
+ .machine_check_early = __machine_check_early_realmode_p9,
+ .platform = "power9",
+ },
+ { /* Power9 DD2.2 or later */
.pvr_mask = 0xffff0000,
.pvr_value = 0x004e0000,
.cpu_name = "POWER9 (raw)",
- .cpu_features = CPU_FTRS_POWER9_DD2_1,
+ .cpu_features = CPU_FTRS_POWER9_DD2_2,
.cpu_user_features = COMMON_USER_POWER9,
.cpu_user_features2 = COMMON_USER2_POWER9,
.mmu_features = MMU_FTRS_POWER9,
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 945e2c2..e181266 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -590,6 +590,7 @@ static struct dt_cpu_feature_match __initdata
{"virtual-page-class-key-protection", feat_enable, 0},
{"transactional-memory", feat_enable_tm, CPU_FTR_TM},
{"transactional-memory-v3", feat_enable_tm, 0},
+ {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_EMUL},
{"idle-nap", feat_enable_idle_nap, 0},
{"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
{"idle-stop", feat_enable_idle_stop, 0},
@@ -709,6 +710,8 @@ static __init void cpufeatures_cpu_quirks(void)
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD1;
else if ((version & 0xffffefff) = 0x004e0201)
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD2_1;
+ else if ((version & 0xffffefff) = 0x004e0202)
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TM_EMUL;
}
static void __init cpufeatures_setup_finished(void)
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 2/5] powerpc/powernv: Provide a way to force a core into SMT4 mode
2018-03-08 7:02 ` Paul Mackerras
@ 2018-03-08 7:02 ` Paul Mackerras
-1 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily. This workaround is only needed when
running bare-metal.
This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state. Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.
To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state. If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0. The pnv_power9_force_smt4_catch() function does the following:
1. Set the dont_stop flag for each thread in the core, except
ourselves (in fact we use an atomic_inc() in case more than
one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
requested_psscr field in the paca being 0. If this is at
least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
we sent a doorbell interrupt and check if they are awake now.
This relies on the following properties:
- Once dont_stop is non-zero, requested_psccr can't go from zero to
non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
must be in SMT4 mode, since SMT modes are powers of 2.
This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop. The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.
In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/asm-prototypes.h | 3 ++
arch/powerpc/include/asm/paca.h | 3 ++
arch/powerpc/include/asm/powernv.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/idle_book3s.S | 19 ++++++++
arch/powerpc/platforms/powernv/idle.c | 77 +++++++++++++++++++++++++++++++
6 files changed, 104 insertions(+)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 7330150..4e14d23 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -126,4 +126,7 @@ extern int __ucmpdi2(u64, u64);
void _mcount(void);
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip);
+void pnv_power9_force_smt4_catch(void);
+void pnv_power9_force_smt4_release(void);
+
#endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index b62c310..4803cc1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -32,6 +32,7 @@
#include <asm/accounting.h>
#include <asm/hmi.h>
#include <asm/cpuidle.h>
+#include <asm/atomic.h>
register struct paca_struct *local_paca asm("r13");
@@ -177,6 +178,8 @@ struct paca_struct {
u8 thread_mask;
/* Mask to denote subcore sibling threads */
u8 subcore_sibling_mask;
+ /* Flag to request this thread not to stop */
+ atomic_t dont_stop;
/*
* Pointer to an array which contains pointer
* to the sibling threads' paca.
diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
index dc5f6a5..d1c2d2e6 100644
--- a/arch/powerpc/include/asm/powernv.h
+++ b/arch/powerpc/include/asm/powernv.h
@@ -40,6 +40,7 @@ static inline int pnv_npu2_handle_fault(struct npu_context *context,
}
static inline void pnv_tm_init(void) { }
+static inline void pnv_power9_force_smt4(void) { }
#endif
#endif /* _ASM_POWERNV_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ea5eb91..dbefe30 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -759,6 +759,7 @@ int main(void)
OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
+ OFFSET(PACA_DONT_STOP, paca_struct, dont_stop);
#define STOP_SPR(x, f) OFFSET(x, paca_struct, stop_sprs.f)
STOP_SPR(STOP_PID, pid);
STOP_SPR(STOP_LDBAR, ldbar);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 01e1c19..bffbed2 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -339,6 +339,7 @@ power_enter_stop:
bne .Lhandle_esl_ec_set
PPC_STOP
li r3,0 /* Since we didn't lose state, return 0 */
+ std r3, PACA_REQ_PSSCR(r13)
/*
* pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
@@ -429,11 +430,27 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
* r3 contains desired PSSCR register value.
*/
_GLOBAL(power9_idle_stop)
+ lwz r5, PACA_DONT_STOP(r13)
+ cmpwi r5, 0
+ bne 1f
std r3, PACA_REQ_PSSCR(r13)
+ sync
+ lwz r5, PACA_DONT_STOP(r13)
+ cmpwi r5, 0
+ bne 1f
mtspr SPRN_PSSCR,r3
LOAD_REG_ADDR(r4,power_enter_stop)
b pnv_powersave_common
/* No return */
+1:
+ /*
+ * We get here when TM / thread reconfiguration bug workaround
+ * code wants to get the CPU into SMT4 mode, and therefore
+ * we are being asked not to stop.
+ */
+ li r3, 0
+ std r3, PACA_REQ_PSSCR(r13)
+ blr /* return 0 for wakeup cause / SRR1 value */
/*
* On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
@@ -584,6 +601,8 @@ FTR_SECTION_ELSE_NESTED(71)
mfspr r5, SPRN_PSSCR
rldicl r5,r5,4,60
ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 71)
+ li r0, 0 /* clear requested_psscr to say we're awake */
+ std r0, PACA_REQ_PSSCR(r13)
cmpd cr4,r5,r4
bge cr4,pnv_wakeup_tb_loss /* returns to caller */
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 443d5ca..c1ad222 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -24,6 +24,7 @@
#include <asm/code-patching.h>
#include <asm/smp.h>
#include <asm/runlatch.h>
+#include <asm/dbell.h>
#include "powernv.h"
#include "subcore.h"
@@ -387,6 +388,82 @@ void power9_idle(void)
power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
}
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+/*
+ * This is used in working around bugs in thread reconfiguration
+ * on POWER9 (at least up to Nimbus DD2.2) relating to transactional
+ * memory and the way that XER[SO] is checkpointed.
+ * This function forces the core into SMT4 in order by asking
+ * all other threads not to stop, and sending a message to any
+ * that are in a stop state.
+ * Must be called with preemption disabled.
+ */
+void pnv_power9_force_smt4_catch(void)
+{
+ int cpu, cpu0, thr;
+ struct paca_struct *tpaca;
+ int awake_threads = 1; /* this thread is awake */
+ int poke_threads = 0;
+ int need_awake = threads_per_core;
+
+ cpu = smp_processor_id();
+ cpu0 = cpu & ~(threads_per_core - 1);
+ tpaca = &paca[cpu0];
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (cpu != cpu0 + thr)
+ atomic_inc(&tpaca[thr].dont_stop);
+ }
+ /* order setting dont_stop vs testing requested_psscr */
+ mb();
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (!tpaca[thr].requested_psscr)
+ ++awake_threads;
+ else
+ poke_threads |= (1 << thr);
+ }
+
+ /* If at least 3 threads are awake, the core is in SMT4 already */
+ if (awake_threads < need_awake) {
+ /* We have to wake some threads; we'll use msgsnd */
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (poke_threads & (1 << thr)) {
+ ppc_msgsnd_sync();
+ ppc_msgsnd(PPC_DBELL_MSGTYPE, 0,
+ tpaca[thr].hw_cpu_id);
+ }
+ }
+ /* now spin until at least 3 threads are awake */
+ do {
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if ((poke_threads & (1 << thr)) &&
+ !tpaca[thr].requested_psscr) {
+ ++awake_threads;
+ poke_threads &= ~(1 << thr);
+ }
+ }
+ } while (awake_threads < need_awake);
+ }
+}
+EXPORT_SYMBOL_GPL(pnv_power9_force_smt4_catch);
+
+void pnv_power9_force_smt4_release(void)
+{
+ int cpu, cpu0, thr;
+ struct paca_struct *tpaca;
+
+ cpu = smp_processor_id();
+ cpu0 = cpu & ~(threads_per_core - 1);
+ tpaca = &paca[cpu0];
+
+ /* clear all the dont_stop flags */
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (cpu != cpu0 + thr)
+ atomic_dec(&tpaca[thr].dont_stop);
+ }
+}
+EXPORT_SYMBOL_GPL(pnv_power9_force_smt4_release);
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
+
#ifdef CONFIG_HOTPLUG_CPU
static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
{
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 2/5] powerpc/powernv: Provide a way to force a core into SMT4 mode
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily. This workaround is only needed when
running bare-metal.
This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state. Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.
To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state. If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0. The pnv_power9_force_smt4_catch() function does the following:
1. Set the dont_stop flag for each thread in the core, except
ourselves (in fact we use an atomic_inc() in case more than
one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
requested_psscr field in the paca being 0. If this is at
least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
we sent a doorbell interrupt and check if they are awake now.
This relies on the following properties:
- Once dont_stop is non-zero, requested_psccr can't go from zero to
non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
must be in SMT4 mode, since SMT modes are powers of 2.
This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop. The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.
In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/asm-prototypes.h | 3 ++
arch/powerpc/include/asm/paca.h | 3 ++
arch/powerpc/include/asm/powernv.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/idle_book3s.S | 19 ++++++++
arch/powerpc/platforms/powernv/idle.c | 77 +++++++++++++++++++++++++++++++
6 files changed, 104 insertions(+)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 7330150..4e14d23 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -126,4 +126,7 @@ extern int __ucmpdi2(u64, u64);
void _mcount(void);
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip);
+void pnv_power9_force_smt4_catch(void);
+void pnv_power9_force_smt4_release(void);
+
#endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index b62c310..4803cc1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -32,6 +32,7 @@
#include <asm/accounting.h>
#include <asm/hmi.h>
#include <asm/cpuidle.h>
+#include <asm/atomic.h>
register struct paca_struct *local_paca asm("r13");
@@ -177,6 +178,8 @@ struct paca_struct {
u8 thread_mask;
/* Mask to denote subcore sibling threads */
u8 subcore_sibling_mask;
+ /* Flag to request this thread not to stop */
+ atomic_t dont_stop;
/*
* Pointer to an array which contains pointer
* to the sibling threads' paca.
diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
index dc5f6a5..d1c2d2e6 100644
--- a/arch/powerpc/include/asm/powernv.h
+++ b/arch/powerpc/include/asm/powernv.h
@@ -40,6 +40,7 @@ static inline int pnv_npu2_handle_fault(struct npu_context *context,
}
static inline void pnv_tm_init(void) { }
+static inline void pnv_power9_force_smt4(void) { }
#endif
#endif /* _ASM_POWERNV_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ea5eb91..dbefe30 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -759,6 +759,7 @@ int main(void)
OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
+ OFFSET(PACA_DONT_STOP, paca_struct, dont_stop);
#define STOP_SPR(x, f) OFFSET(x, paca_struct, stop_sprs.f)
STOP_SPR(STOP_PID, pid);
STOP_SPR(STOP_LDBAR, ldbar);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 01e1c19..bffbed2 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -339,6 +339,7 @@ power_enter_stop:
bne .Lhandle_esl_ec_set
PPC_STOP
li r3,0 /* Since we didn't lose state, return 0 */
+ std r3, PACA_REQ_PSSCR(r13)
/*
* pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
@@ -429,11 +430,27 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
* r3 contains desired PSSCR register value.
*/
_GLOBAL(power9_idle_stop)
+ lwz r5, PACA_DONT_STOP(r13)
+ cmpwi r5, 0
+ bne 1f
std r3, PACA_REQ_PSSCR(r13)
+ sync
+ lwz r5, PACA_DONT_STOP(r13)
+ cmpwi r5, 0
+ bne 1f
mtspr SPRN_PSSCR,r3
LOAD_REG_ADDR(r4,power_enter_stop)
b pnv_powersave_common
/* No return */
+1:
+ /*
+ * We get here when TM / thread reconfiguration bug workaround
+ * code wants to get the CPU into SMT4 mode, and therefore
+ * we are being asked not to stop.
+ */
+ li r3, 0
+ std r3, PACA_REQ_PSSCR(r13)
+ blr /* return 0 for wakeup cause / SRR1 value */
/*
* On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
@@ -584,6 +601,8 @@ FTR_SECTION_ELSE_NESTED(71)
mfspr r5, SPRN_PSSCR
rldicl r5,r5,4,60
ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 71)
+ li r0, 0 /* clear requested_psscr to say we're awake */
+ std r0, PACA_REQ_PSSCR(r13)
cmpd cr4,r5,r4
bge cr4,pnv_wakeup_tb_loss /* returns to caller */
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 443d5ca..c1ad222 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -24,6 +24,7 @@
#include <asm/code-patching.h>
#include <asm/smp.h>
#include <asm/runlatch.h>
+#include <asm/dbell.h>
#include "powernv.h"
#include "subcore.h"
@@ -387,6 +388,82 @@ void power9_idle(void)
power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
}
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+/*
+ * This is used in working around bugs in thread reconfiguration
+ * on POWER9 (at least up to Nimbus DD2.2) relating to transactional
+ * memory and the way that XER[SO] is checkpointed.
+ * This function forces the core into SMT4 in order by asking
+ * all other threads not to stop, and sending a message to any
+ * that are in a stop state.
+ * Must be called with preemption disabled.
+ */
+void pnv_power9_force_smt4_catch(void)
+{
+ int cpu, cpu0, thr;
+ struct paca_struct *tpaca;
+ int awake_threads = 1; /* this thread is awake */
+ int poke_threads = 0;
+ int need_awake = threads_per_core;
+
+ cpu = smp_processor_id();
+ cpu0 = cpu & ~(threads_per_core - 1);
+ tpaca = &paca[cpu0];
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (cpu != cpu0 + thr)
+ atomic_inc(&tpaca[thr].dont_stop);
+ }
+ /* order setting dont_stop vs testing requested_psscr */
+ mb();
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (!tpaca[thr].requested_psscr)
+ ++awake_threads;
+ else
+ poke_threads |= (1 << thr);
+ }
+
+ /* If at least 3 threads are awake, the core is in SMT4 already */
+ if (awake_threads < need_awake) {
+ /* We have to wake some threads; we'll use msgsnd */
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (poke_threads & (1 << thr)) {
+ ppc_msgsnd_sync();
+ ppc_msgsnd(PPC_DBELL_MSGTYPE, 0,
+ tpaca[thr].hw_cpu_id);
+ }
+ }
+ /* now spin until at least 3 threads are awake */
+ do {
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if ((poke_threads & (1 << thr)) &&
+ !tpaca[thr].requested_psscr) {
+ ++awake_threads;
+ poke_threads &= ~(1 << thr);
+ }
+ }
+ } while (awake_threads < need_awake);
+ }
+}
+EXPORT_SYMBOL_GPL(pnv_power9_force_smt4_catch);
+
+void pnv_power9_force_smt4_release(void)
+{
+ int cpu, cpu0, thr;
+ struct paca_struct *tpaca;
+
+ cpu = smp_processor_id();
+ cpu0 = cpu & ~(threads_per_core - 1);
+ tpaca = &paca[cpu0];
+
+ /* clear all the dont_stop flags */
+ for (thr = 0; thr < threads_per_core; ++thr) {
+ if (cpu != cpu0 + thr)
+ atomic_dec(&tpaca[thr].dont_stop);
+ }
+}
+EXPORT_SYMBOL_GPL(pnv_power9_force_smt4_release);
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
+
#ifdef CONFIG_HOTPLUG_CPU
static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
{
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 3/5] KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
2018-03-08 7:02 ` Paul Mackerras
@ 2018-03-08 7:02 ` Paul Mackerras
-1 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode). Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads. The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems. This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.
The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional. The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated. The trechkpt
instruction also causes a soft patch interrupt.
On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present. The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state. Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and
reads back as 0.
On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.
Emulation of the instructions that cause a softpatch interrupt is
handled in two paths. If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state. This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active. If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on. This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.
The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0. The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.
With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_asm.h | 2 +
arch/powerpc/include/asm/kvm_book3s.h | 4 +
arch/powerpc/include/asm/kvm_book3s_64.h | 43 ++++++
arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/ppc-opcode.h | 4 +
arch/powerpc/include/asm/reg.h | 7 +
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kernel/cputable.c | 1 -
arch/powerpc/kernel/exceptions-64s.S | 4 +-
arch/powerpc/kvm/Makefile | 7 +
arch/powerpc/kvm/book3s_hv.c | 18 ++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 131 +++++++++++++++++-
arch/powerpc/kvm/book3s_hv_tm.c | 216 ++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_tm_builtin.c | 109 +++++++++++++++
arch/powerpc/kvm/powerpc.c | 5 +-
16 files changed, 545 insertions(+), 10 deletions(-)
create mode 100644 arch/powerpc/kvm/book3s_hv_tm.c
create mode 100644 arch/powerpc/kvm/book3s_hv_tm_builtin.c
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 09a802b..a790d5c 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -108,6 +108,8 @@
/* book3s_hv */
+#define BOOK3S_INTERRUPT_HV_SOFTPATCH 0x1500
+
/*
* Special trap used to indicate to host that this is a
* passthrough interrupt that could not be handled
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 376ae80..4c02a73 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -241,6 +241,10 @@ extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
unsigned long mask);
extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
+extern int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu);
+extern int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu);
+extern void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu);
+
extern void kvmppc_entry_trampoline(void);
extern void kvmppc_hv_entry_trampoline(void);
extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 998f7b7..c424e44 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -472,6 +472,49 @@ static inline void set_dirty_bits_atomic(unsigned long *map, unsigned long i,
set_bit_le(i, map);
}
+static inline u64 sanitize_msr(u64 msr)
+{
+ msr &= ~MSR_HV;
+ msr |= MSR_ME;
+ return msr;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.cr = vcpu->arch.cr_tm;
+ vcpu->arch.xer = vcpu->arch.xer_tm;
+ vcpu->arch.lr = vcpu->arch.lr_tm;
+ vcpu->arch.ctr = vcpu->arch.ctr_tm;
+ vcpu->arch.amr = vcpu->arch.amr_tm;
+ vcpu->arch.ppr = vcpu->arch.ppr_tm;
+ vcpu->arch.dscr = vcpu->arch.dscr_tm;
+ vcpu->arch.tar = vcpu->arch.tar_tm;
+ memcpy(vcpu->arch.gpr, vcpu->arch.gpr_tm,
+ sizeof(vcpu->arch.gpr));
+ vcpu->arch.fp = vcpu->arch.fp_tm;
+ vcpu->arch.vr = vcpu->arch.vr_tm;
+ vcpu->arch.vrsave = vcpu->arch.vrsave_tm;
+}
+
+static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.cr_tm = vcpu->arch.cr;
+ vcpu->arch.xer_tm = vcpu->arch.xer;
+ vcpu->arch.lr_tm = vcpu->arch.lr;
+ vcpu->arch.ctr_tm = vcpu->arch.ctr;
+ vcpu->arch.amr_tm = vcpu->arch.amr;
+ vcpu->arch.ppr_tm = vcpu->arch.ppr;
+ vcpu->arch.dscr_tm = vcpu->arch.dscr;
+ vcpu->arch.tar_tm = vcpu->arch.tar;
+ memcpy(vcpu->arch.gpr_tm, vcpu->arch.gpr,
+ sizeof(vcpu->arch.gpr));
+ vcpu->arch.fp_tm = vcpu->arch.fp;
+ vcpu->arch.vr_tm = vcpu->arch.vr;
+ vcpu->arch.vrsave_tm = vcpu->arch.vrsave;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
#endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index ab386af..d978fdf 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -119,6 +119,7 @@ struct kvmppc_host_state {
u8 host_ipi;
u8 ptid; /* thread number within subcore when split */
u8 tid; /* thread number within whole core */
+ u8 fake_suspend;
struct kvm_vcpu *kvm_vcpu;
struct kvmppc_vcore *kvm_vcore;
void __iomem *xics_phys;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 6b69d79..17498e9 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -609,6 +609,7 @@ struct kvm_vcpu_arch {
u64 tfhar;
u64 texasr;
u64 tfiar;
+ u64 orig_texasr;
u32 cr_tm;
u64 xer_tm;
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index f1083bc..772eff7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -232,6 +232,7 @@
#define PPC_INST_MSGSYNC 0x7c0006ec
#define PPC_INST_MSGSNDP 0x7c00011c
#define PPC_INST_MSGCLRP 0x7c00015c
+#define PPC_INST_MTMSRD 0x7c000164
#define PPC_INST_MTTMR 0x7c0003dc
#define PPC_INST_NOP 0x60000000
#define PPC_INST_PASTE 0x7c20070d
@@ -239,8 +240,10 @@
#define PPC_INST_POPCNTB_MASK 0xfc0007fe
#define PPC_INST_POPCNTD 0x7c0003f4
#define PPC_INST_POPCNTW 0x7c0002f4
+#define PPC_INST_RFEBB 0x4c000124
#define PPC_INST_RFCI 0x4c000066
#define PPC_INST_RFDI 0x4c00004e
+#define PPC_INST_RFID 0x4c000024
#define PPC_INST_RFMCI 0x4c00004c
#define PPC_INST_MFSPR 0x7c0002a6
#define PPC_INST_MFSPR_DSCR 0x7c1102a6
@@ -277,6 +280,7 @@
#define PPC_INST_TRECHKPT 0x7c0007dd
#define PPC_INST_TRECLAIM 0x7c00075d
#define PPC_INST_TABORT 0x7c00071d
+#define PPC_INST_TSR 0x7c0005dd
#define PPC_INST_NAP 0x4c000364
#define PPC_INST_SLEEP 0x4c0003a4
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index e6c7ead..cb0f272 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -156,6 +156,8 @@
#define PSSCR_SD 0x00400000 /* Status Disable */
#define PSSCR_PLS 0xf000000000000000 /* Power-saving Level Status */
#define PSSCR_GUEST_VIS 0xf0000000000003ff /* Guest-visible PSSCR fields */
+#define PSSCR_FAKE_SUSPEND 0x00000400 /* Fake-suspend bit (P9 DD2.2) */
+#define PSSCR_FAKE_SUSPEND_LG 10 /* Fake-suspend bit position */
/* Floating Point Status and Control Register (FPSCR) Fields */
#define FPSCR_FX 0x80000000 /* FPU exception summary */
@@ -237,7 +239,12 @@
#define SPRN_TFIAR 0x81 /* Transaction Failure Inst Addr */
#define SPRN_TEXASR 0x82 /* Transaction EXception & Summary */
#define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
+#define TEXASR_ABORT __MASK(63-31) /* terminated by tabort or treclaim */
+#define TEXASR_SUSP __MASK(63-32) /* tx failed in suspended state */
+#define TEXASR_HV __MASK(63-34) /* MSR[HV] when failure occurred */
+#define TEXASR_PR __MASK(63-35) /* MSR[PR] when failure occurred */
#define TEXASR_FS __MASK(63-36) /* TEXASR Failure Summary */
+#define TEXASR_EXACT __MASK(63-37) /* TFIAR value is exact */
#define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
#define SPRN_TIDR 144 /* Thread ID register */
#define SPRN_CTRLF 0x088
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index dbefe30..daf809a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -568,6 +568,7 @@ int main(void)
OFFSET(VCPU_TFHAR, kvm_vcpu, arch.tfhar);
OFFSET(VCPU_TFIAR, kvm_vcpu, arch.tfiar);
OFFSET(VCPU_TEXASR, kvm_vcpu, arch.texasr);
+ OFFSET(VCPU_ORIG_TEXASR, kvm_vcpu, arch.orig_texasr);
OFFSET(VCPU_GPR_TM, kvm_vcpu, arch.gpr_tm);
OFFSET(VCPU_FPRS_TM, kvm_vcpu, arch.fp_tm.fpr);
OFFSET(VCPU_VRS_TM, kvm_vcpu, arch.vr_tm.vr);
@@ -650,6 +651,7 @@ int main(void)
HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
HSTATE_FIELD(HSTATE_PTID, ptid);
HSTATE_FIELD(HSTATE_TID, tid);
+ HSTATE_FIELD(HSTATE_FAKE_SUSPEND, fake_suspend);
HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
HSTATE_FIELD(HSTATE_MMCRA, host_mmcr[2]);
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 68052ea..b3de017 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -569,7 +569,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
.oprofile_type = PPC_OPROFILE_INVALID,
.cpu_setup = __setup_cpu_power9,
.cpu_restore = __restore_cpu_power9,
- .flush_tlb = __flush_tlb_power9,
.machine_check_early = __machine_check_early_realmode_p9,
.platform = "power9",
},
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 243d072..9df9e0a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1273,7 +1273,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)
bne+ denorm_assist
#endif
- KVMTEST_PR(0x1500)
+ KVMTEST_HV(0x1500)
EXCEPTION_PROLOG_PSERIES_1(denorm_common, EXC_HV)
EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100)
@@ -1285,7 +1285,7 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
EXC_VIRT_NONE(0x5500, 0x100)
#endif
-TRAMP_KVM_SKIP(PACA_EXGEN, 0x1500)
+TRAMP_KVM_HV(PACA_EXGEN, 0x1500)
#ifdef CONFIG_PPC_DENORMALISATION
TRAMP_REAL_BEGIN(denorm_assist)
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 85ba80d..4b19da8 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -74,9 +74,15 @@ kvm-hv-y += \
book3s_64_mmu_hv.o \
book3s_64_mmu_radix.o
+kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
+ book3s_hv_tm.o
+
kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
book3s_hv_rm_xics.o book3s_hv_rm_xive.o
+kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
+ book3s_hv_tm_builtin.o
+
ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
book3s_hv_hmi.o \
@@ -84,6 +90,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
book3s_hv_rm_mmu.o \
book3s_hv_ras.o \
book3s_hv_builtin.o \
+ $(kvm-book3s_64-builtin-tm-objs-y) \
$(kvm-book3s_64-builtin-xics-objs-y)
endif
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4863ab8..13df99fe 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1206,6 +1206,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
}
break;
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ case BOOK3S_INTERRUPT_HV_SOFTPATCH:
+ /*
+ * This occurs for various TM-related instructions that
+ * we need to emulate on POWER9 DD2.2. We have already
+ * handled the cases where the guest was in real-suspend
+ * mode and was transitioning to transactional state.
+ */
+ r = kvmhv_p9_tm_emulation(vcpu);
+ break;
+#endif
+
case BOOK3S_INTERRUPT_HV_RM_HARD:
r = RESUME_PASSTHROUGH;
break;
@@ -1978,7 +1991,9 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
* turn off the HFSCR bit, which causes those instructions to trap.
*/
vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
- if (!cpu_has_feature(CPU_FTR_TM))
+ if (cpu_has_feature(CPU_FTR_P9_TM_EMUL))
+ vcpu->arch.hfscr |= HFSCR_TM;
+ else if (!cpu_has_feature(CPU_FTR_TM_COMP))
vcpu->arch.hfscr &= ~HFSCR_TM;
if (cpu_has_feature(CPU_FTR_ARCH_300))
vcpu->arch.hfscr &= ~HFSCR_MSGP;
@@ -2242,6 +2257,7 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
tpaca = &paca[cpu];
tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.ptid = cpu - vc->pcpu;
+ tpaca->kvm_hstate.fake_suspend = 0;
/* Order stores to hstate.kvm_vcpu etc. before store to kvm_vcore */
smp_wmb();
tpaca->kvm_hstate.kvm_vcore = vc;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f31f357..f73eba6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -787,12 +787,15 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
bl kvmppc_restore_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/* Load guest PMU registers */
@@ -915,11 +918,14 @@ BEGIN_FTR_SECTION
mtspr SPRN_ACOP, r6
mtspr SPRN_CSIGR, r7
mtspr SPRN_TACR, r8
+ nop
FTR_SECTION_ELSE
/* POWER9-only registers */
ld r5, VCPU_TID(r4)
ld r6, VCPU_PSSCR(r4)
+ lbz r8, HSTATE_FAKE_SUSPEND(r13)
oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */
+ rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
ld r7, VCPU_HFSCR(r4)
mtspr SPRN_TIDR, r5
mtspr SPRN_PSSCR, r6
@@ -1370,6 +1376,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
std r3, VCPU_CTR(r9)
std r4, VCPU_XER(r9)
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ /* For softpatch interrupt, go off and do TM instruction emulation */
+ cmpwi r12, BOOK3S_INTERRUPT_HV_SOFTPATCH
+ beq kvmppc_tm_emul
+#endif
+
/* If this is a page table miss then see if it's theirs or ours */
cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
beq kvmppc_hdsi
@@ -1729,12 +1741,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
bl kvmppc_save_fp
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
bl kvmppc_save_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/* Increment yield count if they have a VPA */
@@ -2054,6 +2069,42 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
mtlr r0
blr
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/*
+ * Softpatch interrupt for transactional memory emulation cases
+ * on POWER9 DD2.2. This is early in the guest exit path - we
+ * haven't saved registers or done a treclaim yet.
+ */
+kvmppc_tm_emul:
+ /* Save instruction image in HEIR */
+ mfspr r3, SPRN_HEIR
+ stw r3, VCPU_HEIR(r9)
+
+ /*
+ * The cases we want to handle here are those where the guest
+ * is in real suspend mode and is trying to transition to
+ * transactional mode.
+ */
+ lbz r0, HSTATE_FAKE_SUSPEND(r13)
+ cmpwi r0, 0 /* keep exiting guest if in fake suspend */
+ bne guest_exit_cont
+ rldicl r3, r11, 64 - MSR_TS_S_LG, 62
+ cmpwi r3, 1 /* or if not in suspend state */
+ bne guest_exit_cont
+
+ /* Call C code to do the emulation */
+ mr r3, r9
+ bl kvmhv_p9_tm_emulation_early
+ nop
+ ld r9, HSTATE_KVM_VCPU(r13)
+ li r12, BOOK3S_INTERRUPT_HV_SOFTPATCH
+ cmpwi r3, 0
+ beq guest_exit_cont /* continue exiting if not handled */
+ ld r10, VCPU_PC(r9)
+ ld r11, VCPU_MSR(r9)
+ b fast_interrupt_c_return /* go back to guest if handled */
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
/*
* Check whether an HDSI is an HPTE not found fault or something else.
* If it is an HPTE not found fault that is due to the guest accessing
@@ -2587,13 +2638,16 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */
bl kvmppc_save_fp
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
ld r9, HSTATE_KVM_VCPU(r13)
bl kvmppc_save_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/*
@@ -2700,12 +2754,15 @@ kvm_end_cede:
#endif
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
bl kvmppc_restore_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/* load up FP state */
@@ -3046,6 +3103,15 @@ kvmppc_save_tm:
std r1, HSTATE_HOST_R1(r13)
li r3, TM_CAUSE_KVM_RESCHED
+BEGIN_FTR_SECTION
+ /* Emulation of the treclaim instruction needs TEXASR before treclaim */
+ mfspr r6, SPRN_TEXASR
+ std r6, VCPU_ORIG_TEXASR(r9)
+
+ rldicl. r8, r8, 64 - MSR_TS_S_LG, 62
+ beq 3f
+END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
+
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
li r5, 0
mtmsrd r5, 1
@@ -3057,6 +3123,38 @@ kvmppc_save_tm:
SET_SCRATCH0(r13)
GET_PACA(r13)
std r9, PACATMSCRATCH(r13)
+
+ /* If doing TM emulation on POWER9 DD2.2, check for fake suspend mode */
+BEGIN_FTR_SECTION
+3:
+ lbz r9, HSTATE_FAKE_SUSPEND(r13)
+ cmpwi r9, 0
+ beq 2f
+ /*
+ * We were in fake suspend, so we are not going to save the
+ * register state as the guest checkpointed state (since
+ * we already have it), therefore we can now use any volatile GPR.
+ */
+ /* Reload stack pointer and TOC. */
+ ld r1, HSTATE_HOST_R1(r13)
+ ld r2, PACATOC(r13)
+ li r5, MSR_RI
+ mtmsrd r5, 1
+ HMT_MEDIUM
+ ld r6, HSTATE_DSCR(r13)
+ mtspr SPRN_DSCR, r6
+ li r0, 0
+ stb r0, HSTATE_FAKE_SUSPEND(r13)
+ mfspr r3, SPRN_PSSCR
+ /* PSSCR_FAKE_SUSPEND is a write-only bit, but clear it anyway */
+ li r0, PSSCR_FAKE_SUSPEND
+ andc r3, r3, r0
+ mtspr SPRN_PSSCR, r3
+ ld r9, HSTATE_KVM_VCPU(r13)
+ b 1f
+2:
+END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
+
ld r9, HSTATE_KVM_VCPU(r13)
/* Get a few more GPRs free. */
@@ -3182,6 +3280,15 @@ kvmppc_restore_tm:
mtspr SPRN_TEXASR, r7
/*
+ * If we are doing TM emulation for the guest on a POWER9 DD2,
+ * then we don't actually do a trechkpt -- we either set up
+ * fake-suspend mode, or emulate a TM rollback.
+ */
+BEGIN_FTR_SECTION
+ b .Ldo_tm_fake_load
+END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
+
+ /*
* We need to load up the checkpointed state for the guest.
* We need to do this early as it will blow away any GPRs, VSRs and
* some SPRs.
@@ -3253,10 +3360,24 @@ kvmppc_restore_tm:
/* Set the MSR RI since we have our registers back. */
li r5, MSR_RI
mtmsrd r5, 1
-
+9:
ld r0, PPC_LR_STKOFF(r1)
mtlr r0
blr
+
+.Ldo_tm_fake_load:
+ cmpwi r5, 1 /* check for suspended state */
+ bgt 10f
+ stb r5, HSTATE_FAKE_SUSPEND(r13)
+ b 9b /* and return */
+10: stdu r1, -PPC_MIN_STKFRM(r1)
+ /* guest is in transactional state, so simulate rollback */
+ mr r3, r4
+ bl kvmhv_emulate_tm_rollback
+ nop
+ ld r4, HSTATE_KVM_VCPU(r13) /* our vcpu pointer has been trashed */
+ addi r1, r1, PPC_MIN_STKFRM
+ b 9b
#endif
/*
diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
new file mode 100644
index 0000000..bf710ad
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_tm.c
@@ -0,0 +1,216 @@
+/*
+ * Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_book3s_64.h>
+#include <asm/reg.h>
+#include <asm/ppc-opcode.h>
+
+static void emulate_tx_failure(struct kvm_vcpu *vcpu, u64 failure_cause)
+{
+ u64 texasr, tfiar;
+ u64 msr = vcpu->arch.shregs.msr;
+
+ tfiar = vcpu->arch.pc & ~0x3ull;
+ texasr = (failure_cause << 56) | TEXASR_ABORT | TEXASR_FS | TEXASR_EXACT;
+ if (MSR_TM_SUSPENDED(vcpu->arch.shregs.msr))
+ texasr |= TEXASR_SUSP;
+ if (msr & MSR_PR) {
+ texasr |= TEXASR_PR;
+ tfiar |= 1;
+ }
+ vcpu->arch.tfiar = tfiar;
+ /* Preserve ROT and TL fields of existing TEXASR */
+ vcpu->arch.texasr = (vcpu->arch.texasr & 0x3ffffff) | texasr;
+}
+
+/*
+ * This gets called on a softpatch interrupt on POWER9 DD2.2 processors.
+ * We expect to find a TM-related instruction to be emulated. The
+ * instruction image is in vcpu->arch.emul_inst. If the guest was in
+ * TM suspended or transactional state, the checkpointed state has been
+ * reclaimed and is in the vcpu struct. The CPU is in virtual mode in
+ * host context.
+ */
+int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
+{
+ u32 instr = vcpu->arch.emul_inst;
+ u64 msr = vcpu->arch.shregs.msr;
+ u64 newmsr, bescr;
+ int ra, rs;
+
+ switch (instr & 0xfc0007ff) {
+ case PPC_INST_RFID:
+ /* XXX do we need to check for PR=0 here? */
+ newmsr = vcpu->arch.shregs.srr1;
+ /* should only get here for Sx -> T1 transition */
+ WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
+ MSR_TM_TRANSACTIONAL(newmsr) &&
+ (newmsr & MSR_TM)));
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = vcpu->arch.shregs.srr0;
+ return RESUME_GUEST;
+
+ case PPC_INST_RFEBB:
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ /* check EBB facility is available */
+ if (!(vcpu->arch.hfscr & HFSCR_EBB)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if ((msr & MSR_PR) && !(vcpu->arch.fscr & FSCR_EBB)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_EBB_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ bescr = vcpu->arch.bescr;
+ /* expect to see a S->T transition requested */
+ WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
+ ((bescr >> 30) & 3) == 2));
+ bescr &= ~BESCR_GE;
+ if (instr & (1 << 11))
+ bescr |= BESCR_GE;
+ vcpu->arch.bescr = bescr;
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ vcpu->arch.shregs.msr = msr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = vcpu->arch.ebbrr;
+ return RESUME_GUEST;
+
+ case PPC_INST_MTMSRD:
+ /* XXX do we need to check for PR=0 here? */
+ rs = (instr >> 21) & 0x1f;
+ newmsr = kvmppc_get_gpr(vcpu, rs);
+ /* check this is a Sx -> T1 transition */
+ WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
+ MSR_TM_TRANSACTIONAL(newmsr) &&
+ (newmsr & MSR_TM)));
+ /* mtmsrd doesn't change LE */
+ newmsr = (newmsr & ~MSR_LE) | (msr & MSR_LE);
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ return RESUME_GUEST;
+
+ case PPC_INST_TSR:
+ /* check for PR=1 and arch 2.06 bit set in PCR */
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if (!(msr & MSR_TM)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_TM_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu,
+ BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ /* Set CR0 to indicate previous transactional state */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) |
+ (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28);
+ /* L=1 => tresume, L=0 => tsuspend */
+ if (instr & (1 << 21)) {
+ if (MSR_TM_SUSPENDED(msr))
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ } else {
+ if (MSR_TM_TRANSACTIONAL(msr))
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_S;
+ }
+ vcpu->arch.shregs.msr = msr;
+ return RESUME_GUEST;
+
+ case PPC_INST_TRECLAIM:
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if (!(msr & MSR_TM)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_TM_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu,
+ BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ /* If no transaction active, generate TM bad thing */
+ if (!MSR_TM_ACTIVE(msr)) {
+ kvmppc_core_queue_program(vcpu, SRR1_PROGTM);
+ return RESUME_GUEST;
+ }
+ /* If failure was not previously recorded, recompute TEXASR */
+ if (!(vcpu->arch.orig_texasr & TEXASR_FS)) {
+ ra = (instr >> 16) & 0x1f;
+ if (ra)
+ ra = kvmppc_get_gpr(vcpu, ra) & 0xff;
+ emulate_tx_failure(vcpu, ra);
+ }
+
+ copy_from_checkpoint(vcpu);
+
+ /* Set CR0 to indicate previous transactional state */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) |
+ (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28);
+ vcpu->arch.shregs.msr &= ~MSR_TS_MASK;
+ return RESUME_GUEST;
+
+ case PPC_INST_TRECHKPT:
+ /* XXX do we need to check for PR=0 here? */
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if (!(msr & MSR_TM)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_TM_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu,
+ BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ /* If transaction active or TEXASR[FS] = 0, bad thing */
+ if (MSR_TM_ACTIVE(msr) || !(vcpu->arch.texasr & TEXASR_FS)) {
+ kvmppc_core_queue_program(vcpu, SRR1_PROGTM);
+ return RESUME_GUEST;
+ }
+
+ copy_to_checkpoint(vcpu);
+
+ /* Set CR0 to indicate previous transactional state */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) |
+ (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28);
+ vcpu->arch.shregs.msr = msr | MSR_TS_S;
+ return RESUME_GUEST;
+ }
+
+ /* What should we do here? We didn't recognize the instruction */
+ WARN_ON_ONCE(1);
+ return RESUME_GUEST;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
new file mode 100644
index 0000000..d98ccfd
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_book3s_64.h>
+#include <asm/reg.h>
+#include <asm/ppc-opcode.h>
+
+/*
+ * This handles the cases where the guest is in real suspend mode
+ * and we want to get back to the guest without dooming the transaction.
+ * The caller has checked that the guest is in real-suspend mode
+ * (MSR[TS] = S and the fake-suspend flag is not set).
+ */
+int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
+{
+ u32 instr = vcpu->arch.emul_inst;
+ u64 newmsr, msr, bescr;
+ int rs;
+
+ switch (instr & 0xfc0007ff) {
+ case PPC_INST_RFID:
+ /* XXX do we need to check for PR=0 here? */
+ newmsr = vcpu->arch.shregs.srr1;
+ /* should only get here for Sx -> T1 transition */
+ if (!(MSR_TM_TRANSACTIONAL(newmsr) && (newmsr & MSR_TM)))
+ return 0;
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = vcpu->arch.shregs.srr0;
+ return 1;
+
+ case PPC_INST_RFEBB:
+ /* check for PR=1 and arch 2.06 bit set in PCR */
+ msr = vcpu->arch.shregs.msr;
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206))
+ return 0;
+ /* check EBB facility is available */
+ if (!(vcpu->arch.hfscr & HFSCR_EBB) ||
+ ((msr & MSR_PR) && !(mfspr(SPRN_FSCR) & FSCR_EBB)))
+ return 0;
+ bescr = mfspr(SPRN_BESCR);
+ /* expect to see a S->T transition requested */
+ if (((bescr >> 30) & 3) != 2)
+ return 0;
+ bescr &= ~BESCR_GE;
+ if (instr & (1 << 11))
+ bescr |= BESCR_GE;
+ mtspr(SPRN_BESCR, bescr);
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ vcpu->arch.shregs.msr = msr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = mfspr(SPRN_EBBRR);
+ return 1;
+
+ case PPC_INST_MTMSRD:
+ /* XXX do we need to check for PR=0 here? */
+ rs = (instr >> 21) & 0x1f;
+ newmsr = kvmppc_get_gpr(vcpu, rs);
+ msr = vcpu->arch.shregs.msr;
+ /* check this is a Sx -> T1 transition */
+ if (!(MSR_TM_TRANSACTIONAL(newmsr) && (newmsr & MSR_TM)))
+ return 0;
+ /* mtmsrd doesn't change LE */
+ newmsr = (newmsr & ~MSR_LE) | (msr & MSR_LE);
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ return 1;
+
+ case PPC_INST_TSR:
+ /* we know the MSR has the TS field = S (0b01) here */
+ msr = vcpu->arch.shregs.msr;
+ /* check for PR=1 and arch 2.06 bit set in PCR */
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206))
+ return 0;
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM) || !(msr & MSR_TM))
+ return 0;
+ /* L=1 => tresume => set TS to T (0b10) */
+ if (instr & (1 << 21))
+ vcpu->arch.shregs.msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ /* Set CR0 to 0b0010 */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) | 0x20000000;
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * This is called when we are returning to a guest in TM transactional
+ * state. We roll the guest state back to the checkpointed state.
+ */
+void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.shregs.msr &= ~MSR_TS_MASK; /* go to N state */
+ vcpu->arch.pc = vcpu->arch.tfhar;
+ copy_from_checkpoint(vcpu);
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) | 0xa0000000;
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 52c2053..14e3265 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -646,10 +646,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = hv_enabled;
break;
#endif
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case KVM_CAP_PPC_HTM:
r = hv_enabled &&
- (cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM_COMP);
+ (!!(cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM) ||
+ cpu_has_feature(CPU_FTR_P9_TM_EMUL));
break;
+#endif
default:
r = 0;
break;
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 3/5] KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode). Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads. The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems. This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.
The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional. The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated. The trechkpt
instruction also causes a soft patch interrupt.
On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present. The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state. Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and
reads back as 0.
On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.
Emulation of the instructions that cause a softpatch interrupt is
handled in two paths. If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state. This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active. If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on. This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.
The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0. The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.
With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_asm.h | 2 +
arch/powerpc/include/asm/kvm_book3s.h | 4 +
arch/powerpc/include/asm/kvm_book3s_64.h | 43 ++++++
arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/ppc-opcode.h | 4 +
arch/powerpc/include/asm/reg.h | 7 +
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kernel/cputable.c | 1 -
arch/powerpc/kernel/exceptions-64s.S | 4 +-
arch/powerpc/kvm/Makefile | 7 +
arch/powerpc/kvm/book3s_hv.c | 18 ++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 131 +++++++++++++++++-
arch/powerpc/kvm/book3s_hv_tm.c | 216 ++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_tm_builtin.c | 109 +++++++++++++++
arch/powerpc/kvm/powerpc.c | 5 +-
16 files changed, 545 insertions(+), 10 deletions(-)
create mode 100644 arch/powerpc/kvm/book3s_hv_tm.c
create mode 100644 arch/powerpc/kvm/book3s_hv_tm_builtin.c
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 09a802b..a790d5c 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -108,6 +108,8 @@
/* book3s_hv */
+#define BOOK3S_INTERRUPT_HV_SOFTPATCH 0x1500
+
/*
* Special trap used to indicate to host that this is a
* passthrough interrupt that could not be handled
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 376ae80..4c02a73 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -241,6 +241,10 @@ extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
unsigned long mask);
extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
+extern int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu);
+extern int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu);
+extern void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu);
+
extern void kvmppc_entry_trampoline(void);
extern void kvmppc_hv_entry_trampoline(void);
extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 998f7b7..c424e44 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -472,6 +472,49 @@ static inline void set_dirty_bits_atomic(unsigned long *map, unsigned long i,
set_bit_le(i, map);
}
+static inline u64 sanitize_msr(u64 msr)
+{
+ msr &= ~MSR_HV;
+ msr |= MSR_ME;
+ return msr;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.cr = vcpu->arch.cr_tm;
+ vcpu->arch.xer = vcpu->arch.xer_tm;
+ vcpu->arch.lr = vcpu->arch.lr_tm;
+ vcpu->arch.ctr = vcpu->arch.ctr_tm;
+ vcpu->arch.amr = vcpu->arch.amr_tm;
+ vcpu->arch.ppr = vcpu->arch.ppr_tm;
+ vcpu->arch.dscr = vcpu->arch.dscr_tm;
+ vcpu->arch.tar = vcpu->arch.tar_tm;
+ memcpy(vcpu->arch.gpr, vcpu->arch.gpr_tm,
+ sizeof(vcpu->arch.gpr));
+ vcpu->arch.fp = vcpu->arch.fp_tm;
+ vcpu->arch.vr = vcpu->arch.vr_tm;
+ vcpu->arch.vrsave = vcpu->arch.vrsave_tm;
+}
+
+static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.cr_tm = vcpu->arch.cr;
+ vcpu->arch.xer_tm = vcpu->arch.xer;
+ vcpu->arch.lr_tm = vcpu->arch.lr;
+ vcpu->arch.ctr_tm = vcpu->arch.ctr;
+ vcpu->arch.amr_tm = vcpu->arch.amr;
+ vcpu->arch.ppr_tm = vcpu->arch.ppr;
+ vcpu->arch.dscr_tm = vcpu->arch.dscr;
+ vcpu->arch.tar_tm = vcpu->arch.tar;
+ memcpy(vcpu->arch.gpr_tm, vcpu->arch.gpr,
+ sizeof(vcpu->arch.gpr));
+ vcpu->arch.fp_tm = vcpu->arch.fp;
+ vcpu->arch.vr_tm = vcpu->arch.vr;
+ vcpu->arch.vrsave_tm = vcpu->arch.vrsave;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
#endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index ab386af..d978fdf 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -119,6 +119,7 @@ struct kvmppc_host_state {
u8 host_ipi;
u8 ptid; /* thread number within subcore when split */
u8 tid; /* thread number within whole core */
+ u8 fake_suspend;
struct kvm_vcpu *kvm_vcpu;
struct kvmppc_vcore *kvm_vcore;
void __iomem *xics_phys;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 6b69d79..17498e9 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -609,6 +609,7 @@ struct kvm_vcpu_arch {
u64 tfhar;
u64 texasr;
u64 tfiar;
+ u64 orig_texasr;
u32 cr_tm;
u64 xer_tm;
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index f1083bc..772eff7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -232,6 +232,7 @@
#define PPC_INST_MSGSYNC 0x7c0006ec
#define PPC_INST_MSGSNDP 0x7c00011c
#define PPC_INST_MSGCLRP 0x7c00015c
+#define PPC_INST_MTMSRD 0x7c000164
#define PPC_INST_MTTMR 0x7c0003dc
#define PPC_INST_NOP 0x60000000
#define PPC_INST_PASTE 0x7c20070d
@@ -239,8 +240,10 @@
#define PPC_INST_POPCNTB_MASK 0xfc0007fe
#define PPC_INST_POPCNTD 0x7c0003f4
#define PPC_INST_POPCNTW 0x7c0002f4
+#define PPC_INST_RFEBB 0x4c000124
#define PPC_INST_RFCI 0x4c000066
#define PPC_INST_RFDI 0x4c00004e
+#define PPC_INST_RFID 0x4c000024
#define PPC_INST_RFMCI 0x4c00004c
#define PPC_INST_MFSPR 0x7c0002a6
#define PPC_INST_MFSPR_DSCR 0x7c1102a6
@@ -277,6 +280,7 @@
#define PPC_INST_TRECHKPT 0x7c0007dd
#define PPC_INST_TRECLAIM 0x7c00075d
#define PPC_INST_TABORT 0x7c00071d
+#define PPC_INST_TSR 0x7c0005dd
#define PPC_INST_NAP 0x4c000364
#define PPC_INST_SLEEP 0x4c0003a4
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index e6c7ead..cb0f272 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -156,6 +156,8 @@
#define PSSCR_SD 0x00400000 /* Status Disable */
#define PSSCR_PLS 0xf000000000000000 /* Power-saving Level Status */
#define PSSCR_GUEST_VIS 0xf0000000000003ff /* Guest-visible PSSCR fields */
+#define PSSCR_FAKE_SUSPEND 0x00000400 /* Fake-suspend bit (P9 DD2.2) */
+#define PSSCR_FAKE_SUSPEND_LG 10 /* Fake-suspend bit position */
/* Floating Point Status and Control Register (FPSCR) Fields */
#define FPSCR_FX 0x80000000 /* FPU exception summary */
@@ -237,7 +239,12 @@
#define SPRN_TFIAR 0x81 /* Transaction Failure Inst Addr */
#define SPRN_TEXASR 0x82 /* Transaction EXception & Summary */
#define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
+#define TEXASR_ABORT __MASK(63-31) /* terminated by tabort or treclaim */
+#define TEXASR_SUSP __MASK(63-32) /* tx failed in suspended state */
+#define TEXASR_HV __MASK(63-34) /* MSR[HV] when failure occurred */
+#define TEXASR_PR __MASK(63-35) /* MSR[PR] when failure occurred */
#define TEXASR_FS __MASK(63-36) /* TEXASR Failure Summary */
+#define TEXASR_EXACT __MASK(63-37) /* TFIAR value is exact */
#define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
#define SPRN_TIDR 144 /* Thread ID register */
#define SPRN_CTRLF 0x088
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index dbefe30..daf809a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -568,6 +568,7 @@ int main(void)
OFFSET(VCPU_TFHAR, kvm_vcpu, arch.tfhar);
OFFSET(VCPU_TFIAR, kvm_vcpu, arch.tfiar);
OFFSET(VCPU_TEXASR, kvm_vcpu, arch.texasr);
+ OFFSET(VCPU_ORIG_TEXASR, kvm_vcpu, arch.orig_texasr);
OFFSET(VCPU_GPR_TM, kvm_vcpu, arch.gpr_tm);
OFFSET(VCPU_FPRS_TM, kvm_vcpu, arch.fp_tm.fpr);
OFFSET(VCPU_VRS_TM, kvm_vcpu, arch.vr_tm.vr);
@@ -650,6 +651,7 @@ int main(void)
HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
HSTATE_FIELD(HSTATE_PTID, ptid);
HSTATE_FIELD(HSTATE_TID, tid);
+ HSTATE_FIELD(HSTATE_FAKE_SUSPEND, fake_suspend);
HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
HSTATE_FIELD(HSTATE_MMCRA, host_mmcr[2]);
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 68052ea..b3de017 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -569,7 +569,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
.oprofile_type = PPC_OPROFILE_INVALID,
.cpu_setup = __setup_cpu_power9,
.cpu_restore = __restore_cpu_power9,
- .flush_tlb = __flush_tlb_power9,
.machine_check_early = __machine_check_early_realmode_p9,
.platform = "power9",
},
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 243d072..9df9e0a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1273,7 +1273,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)
bne+ denorm_assist
#endif
- KVMTEST_PR(0x1500)
+ KVMTEST_HV(0x1500)
EXCEPTION_PROLOG_PSERIES_1(denorm_common, EXC_HV)
EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100)
@@ -1285,7 +1285,7 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
EXC_VIRT_NONE(0x5500, 0x100)
#endif
-TRAMP_KVM_SKIP(PACA_EXGEN, 0x1500)
+TRAMP_KVM_HV(PACA_EXGEN, 0x1500)
#ifdef CONFIG_PPC_DENORMALISATION
TRAMP_REAL_BEGIN(denorm_assist)
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 85ba80d..4b19da8 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -74,9 +74,15 @@ kvm-hv-y += \
book3s_64_mmu_hv.o \
book3s_64_mmu_radix.o
+kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
+ book3s_hv_tm.o
+
kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
book3s_hv_rm_xics.o book3s_hv_rm_xive.o
+kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
+ book3s_hv_tm_builtin.o
+
ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
book3s_hv_hmi.o \
@@ -84,6 +90,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
book3s_hv_rm_mmu.o \
book3s_hv_ras.o \
book3s_hv_builtin.o \
+ $(kvm-book3s_64-builtin-tm-objs-y) \
$(kvm-book3s_64-builtin-xics-objs-y)
endif
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4863ab8..13df99fe 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1206,6 +1206,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
}
break;
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ case BOOK3S_INTERRUPT_HV_SOFTPATCH:
+ /*
+ * This occurs for various TM-related instructions that
+ * we need to emulate on POWER9 DD2.2. We have already
+ * handled the cases where the guest was in real-suspend
+ * mode and was transitioning to transactional state.
+ */
+ r = kvmhv_p9_tm_emulation(vcpu);
+ break;
+#endif
+
case BOOK3S_INTERRUPT_HV_RM_HARD:
r = RESUME_PASSTHROUGH;
break;
@@ -1978,7 +1991,9 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
* turn off the HFSCR bit, which causes those instructions to trap.
*/
vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
- if (!cpu_has_feature(CPU_FTR_TM))
+ if (cpu_has_feature(CPU_FTR_P9_TM_EMUL))
+ vcpu->arch.hfscr |= HFSCR_TM;
+ else if (!cpu_has_feature(CPU_FTR_TM_COMP))
vcpu->arch.hfscr &= ~HFSCR_TM;
if (cpu_has_feature(CPU_FTR_ARCH_300))
vcpu->arch.hfscr &= ~HFSCR_MSGP;
@@ -2242,6 +2257,7 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
tpaca = &paca[cpu];
tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.ptid = cpu - vc->pcpu;
+ tpaca->kvm_hstate.fake_suspend = 0;
/* Order stores to hstate.kvm_vcpu etc. before store to kvm_vcore */
smp_wmb();
tpaca->kvm_hstate.kvm_vcore = vc;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f31f357..f73eba6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -787,12 +787,15 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
bl kvmppc_restore_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/* Load guest PMU registers */
@@ -915,11 +918,14 @@ BEGIN_FTR_SECTION
mtspr SPRN_ACOP, r6
mtspr SPRN_CSIGR, r7
mtspr SPRN_TACR, r8
+ nop
FTR_SECTION_ELSE
/* POWER9-only registers */
ld r5, VCPU_TID(r4)
ld r6, VCPU_PSSCR(r4)
+ lbz r8, HSTATE_FAKE_SUSPEND(r13)
oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */
+ rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
ld r7, VCPU_HFSCR(r4)
mtspr SPRN_TIDR, r5
mtspr SPRN_PSSCR, r6
@@ -1370,6 +1376,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
std r3, VCPU_CTR(r9)
std r4, VCPU_XER(r9)
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ /* For softpatch interrupt, go off and do TM instruction emulation */
+ cmpwi r12, BOOK3S_INTERRUPT_HV_SOFTPATCH
+ beq kvmppc_tm_emul
+#endif
+
/* If this is a page table miss then see if it's theirs or ours */
cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
beq kvmppc_hdsi
@@ -1729,12 +1741,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
bl kvmppc_save_fp
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
bl kvmppc_save_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/* Increment yield count if they have a VPA */
@@ -2054,6 +2069,42 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
mtlr r0
blr
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/*
+ * Softpatch interrupt for transactional memory emulation cases
+ * on POWER9 DD2.2. This is early in the guest exit path - we
+ * haven't saved registers or done a treclaim yet.
+ */
+kvmppc_tm_emul:
+ /* Save instruction image in HEIR */
+ mfspr r3, SPRN_HEIR
+ stw r3, VCPU_HEIR(r9)
+
+ /*
+ * The cases we want to handle here are those where the guest
+ * is in real suspend mode and is trying to transition to
+ * transactional mode.
+ */
+ lbz r0, HSTATE_FAKE_SUSPEND(r13)
+ cmpwi r0, 0 /* keep exiting guest if in fake suspend */
+ bne guest_exit_cont
+ rldicl r3, r11, 64 - MSR_TS_S_LG, 62
+ cmpwi r3, 1 /* or if not in suspend state */
+ bne guest_exit_cont
+
+ /* Call C code to do the emulation */
+ mr r3, r9
+ bl kvmhv_p9_tm_emulation_early
+ nop
+ ld r9, HSTATE_KVM_VCPU(r13)
+ li r12, BOOK3S_INTERRUPT_HV_SOFTPATCH
+ cmpwi r3, 0
+ beq guest_exit_cont /* continue exiting if not handled */
+ ld r10, VCPU_PC(r9)
+ ld r11, VCPU_MSR(r9)
+ b fast_interrupt_c_return /* go back to guest if handled */
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
/*
* Check whether an HDSI is an HPTE not found fault or something else.
* If it is an HPTE not found fault that is due to the guest accessing
@@ -2587,13 +2638,16 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */
bl kvmppc_save_fp
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
ld r9, HSTATE_KVM_VCPU(r13)
bl kvmppc_save_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/*
@@ -2700,12 +2754,15 @@ kvm_end_cede:
#endif
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+/* Branch around the call if both CPU_FTR_TM and CPU_FTR_P9_TM_EMUL are off */
BEGIN_FTR_SECTION
+ b 91f
+END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_EMUL, 0)
/*
* NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR
*/
bl kvmppc_restore_tm
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
+91:
#endif
/* load up FP state */
@@ -3046,6 +3103,15 @@ kvmppc_save_tm:
std r1, HSTATE_HOST_R1(r13)
li r3, TM_CAUSE_KVM_RESCHED
+BEGIN_FTR_SECTION
+ /* Emulation of the treclaim instruction needs TEXASR before treclaim */
+ mfspr r6, SPRN_TEXASR
+ std r6, VCPU_ORIG_TEXASR(r9)
+
+ rldicl. r8, r8, 64 - MSR_TS_S_LG, 62
+ beq 3f
+END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
+
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
li r5, 0
mtmsrd r5, 1
@@ -3057,6 +3123,38 @@ kvmppc_save_tm:
SET_SCRATCH0(r13)
GET_PACA(r13)
std r9, PACATMSCRATCH(r13)
+
+ /* If doing TM emulation on POWER9 DD2.2, check for fake suspend mode */
+BEGIN_FTR_SECTION
+3:
+ lbz r9, HSTATE_FAKE_SUSPEND(r13)
+ cmpwi r9, 0
+ beq 2f
+ /*
+ * We were in fake suspend, so we are not going to save the
+ * register state as the guest checkpointed state (since
+ * we already have it), therefore we can now use any volatile GPR.
+ */
+ /* Reload stack pointer and TOC. */
+ ld r1, HSTATE_HOST_R1(r13)
+ ld r2, PACATOC(r13)
+ li r5, MSR_RI
+ mtmsrd r5, 1
+ HMT_MEDIUM
+ ld r6, HSTATE_DSCR(r13)
+ mtspr SPRN_DSCR, r6
+ li r0, 0
+ stb r0, HSTATE_FAKE_SUSPEND(r13)
+ mfspr r3, SPRN_PSSCR
+ /* PSSCR_FAKE_SUSPEND is a write-only bit, but clear it anyway */
+ li r0, PSSCR_FAKE_SUSPEND
+ andc r3, r3, r0
+ mtspr SPRN_PSSCR, r3
+ ld r9, HSTATE_KVM_VCPU(r13)
+ b 1f
+2:
+END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
+
ld r9, HSTATE_KVM_VCPU(r13)
/* Get a few more GPRs free. */
@@ -3182,6 +3280,15 @@ kvmppc_restore_tm:
mtspr SPRN_TEXASR, r7
/*
+ * If we are doing TM emulation for the guest on a POWER9 DD2,
+ * then we don't actually do a trechkpt -- we either set up
+ * fake-suspend mode, or emulate a TM rollback.
+ */
+BEGIN_FTR_SECTION
+ b .Ldo_tm_fake_load
+END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
+
+ /*
* We need to load up the checkpointed state for the guest.
* We need to do this early as it will blow away any GPRs, VSRs and
* some SPRs.
@@ -3253,10 +3360,24 @@ kvmppc_restore_tm:
/* Set the MSR RI since we have our registers back. */
li r5, MSR_RI
mtmsrd r5, 1
-
+9:
ld r0, PPC_LR_STKOFF(r1)
mtlr r0
blr
+
+.Ldo_tm_fake_load:
+ cmpwi r5, 1 /* check for suspended state */
+ bgt 10f
+ stb r5, HSTATE_FAKE_SUSPEND(r13)
+ b 9b /* and return */
+10: stdu r1, -PPC_MIN_STKFRM(r1)
+ /* guest is in transactional state, so simulate rollback */
+ mr r3, r4
+ bl kvmhv_emulate_tm_rollback
+ nop
+ ld r4, HSTATE_KVM_VCPU(r13) /* our vcpu pointer has been trashed */
+ addi r1, r1, PPC_MIN_STKFRM
+ b 9b
#endif
/*
diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
new file mode 100644
index 0000000..bf710ad
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_tm.c
@@ -0,0 +1,216 @@
+/*
+ * Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_book3s_64.h>
+#include <asm/reg.h>
+#include <asm/ppc-opcode.h>
+
+static void emulate_tx_failure(struct kvm_vcpu *vcpu, u64 failure_cause)
+{
+ u64 texasr, tfiar;
+ u64 msr = vcpu->arch.shregs.msr;
+
+ tfiar = vcpu->arch.pc & ~0x3ull;
+ texasr = (failure_cause << 56) | TEXASR_ABORT | TEXASR_FS | TEXASR_EXACT;
+ if (MSR_TM_SUSPENDED(vcpu->arch.shregs.msr))
+ texasr |= TEXASR_SUSP;
+ if (msr & MSR_PR) {
+ texasr |= TEXASR_PR;
+ tfiar |= 1;
+ }
+ vcpu->arch.tfiar = tfiar;
+ /* Preserve ROT and TL fields of existing TEXASR */
+ vcpu->arch.texasr = (vcpu->arch.texasr & 0x3ffffff) | texasr;
+}
+
+/*
+ * This gets called on a softpatch interrupt on POWER9 DD2.2 processors.
+ * We expect to find a TM-related instruction to be emulated. The
+ * instruction image is in vcpu->arch.emul_inst. If the guest was in
+ * TM suspended or transactional state, the checkpointed state has been
+ * reclaimed and is in the vcpu struct. The CPU is in virtual mode in
+ * host context.
+ */
+int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
+{
+ u32 instr = vcpu->arch.emul_inst;
+ u64 msr = vcpu->arch.shregs.msr;
+ u64 newmsr, bescr;
+ int ra, rs;
+
+ switch (instr & 0xfc0007ff) {
+ case PPC_INST_RFID:
+ /* XXX do we need to check for PR=0 here? */
+ newmsr = vcpu->arch.shregs.srr1;
+ /* should only get here for Sx -> T1 transition */
+ WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
+ MSR_TM_TRANSACTIONAL(newmsr) &&
+ (newmsr & MSR_TM)));
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = vcpu->arch.shregs.srr0;
+ return RESUME_GUEST;
+
+ case PPC_INST_RFEBB:
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ /* check EBB facility is available */
+ if (!(vcpu->arch.hfscr & HFSCR_EBB)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if ((msr & MSR_PR) && !(vcpu->arch.fscr & FSCR_EBB)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_EBB_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ bescr = vcpu->arch.bescr;
+ /* expect to see a S->T transition requested */
+ WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
+ ((bescr >> 30) & 3) = 2));
+ bescr &= ~BESCR_GE;
+ if (instr & (1 << 11))
+ bescr |= BESCR_GE;
+ vcpu->arch.bescr = bescr;
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ vcpu->arch.shregs.msr = msr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = vcpu->arch.ebbrr;
+ return RESUME_GUEST;
+
+ case PPC_INST_MTMSRD:
+ /* XXX do we need to check for PR=0 here? */
+ rs = (instr >> 21) & 0x1f;
+ newmsr = kvmppc_get_gpr(vcpu, rs);
+ /* check this is a Sx -> T1 transition */
+ WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
+ MSR_TM_TRANSACTIONAL(newmsr) &&
+ (newmsr & MSR_TM)));
+ /* mtmsrd doesn't change LE */
+ newmsr = (newmsr & ~MSR_LE) | (msr & MSR_LE);
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ return RESUME_GUEST;
+
+ case PPC_INST_TSR:
+ /* check for PR=1 and arch 2.06 bit set in PCR */
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if (!(msr & MSR_TM)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_TM_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu,
+ BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ /* Set CR0 to indicate previous transactional state */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) |
+ (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28);
+ /* L=1 => tresume, L=0 => tsuspend */
+ if (instr & (1 << 21)) {
+ if (MSR_TM_SUSPENDED(msr))
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ } else {
+ if (MSR_TM_TRANSACTIONAL(msr))
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_S;
+ }
+ vcpu->arch.shregs.msr = msr;
+ return RESUME_GUEST;
+
+ case PPC_INST_TRECLAIM:
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if (!(msr & MSR_TM)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_TM_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu,
+ BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ /* If no transaction active, generate TM bad thing */
+ if (!MSR_TM_ACTIVE(msr)) {
+ kvmppc_core_queue_program(vcpu, SRR1_PROGTM);
+ return RESUME_GUEST;
+ }
+ /* If failure was not previously recorded, recompute TEXASR */
+ if (!(vcpu->arch.orig_texasr & TEXASR_FS)) {
+ ra = (instr >> 16) & 0x1f;
+ if (ra)
+ ra = kvmppc_get_gpr(vcpu, ra) & 0xff;
+ emulate_tx_failure(vcpu, ra);
+ }
+
+ copy_from_checkpoint(vcpu);
+
+ /* Set CR0 to indicate previous transactional state */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) |
+ (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28);
+ vcpu->arch.shregs.msr &= ~MSR_TS_MASK;
+ return RESUME_GUEST;
+
+ case PPC_INST_TRECHKPT:
+ /* XXX do we need to check for PR=0 here? */
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM)) {
+ /* generate an illegal instruction interrupt */
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ return RESUME_GUEST;
+ }
+ if (!(msr & MSR_TM)) {
+ /* generate a facility unavailable interrupt */
+ vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
+ ((u64)FSCR_TM_LG << 56);
+ kvmppc_book3s_queue_irqprio(vcpu,
+ BOOK3S_INTERRUPT_FAC_UNAVAIL);
+ return RESUME_GUEST;
+ }
+ /* If transaction active or TEXASR[FS] = 0, bad thing */
+ if (MSR_TM_ACTIVE(msr) || !(vcpu->arch.texasr & TEXASR_FS)) {
+ kvmppc_core_queue_program(vcpu, SRR1_PROGTM);
+ return RESUME_GUEST;
+ }
+
+ copy_to_checkpoint(vcpu);
+
+ /* Set CR0 to indicate previous transactional state */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) |
+ (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28);
+ vcpu->arch.shregs.msr = msr | MSR_TS_S;
+ return RESUME_GUEST;
+ }
+
+ /* What should we do here? We didn't recognize the instruction */
+ WARN_ON_ONCE(1);
+ return RESUME_GUEST;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
new file mode 100644
index 0000000..d98ccfd
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_book3s_64.h>
+#include <asm/reg.h>
+#include <asm/ppc-opcode.h>
+
+/*
+ * This handles the cases where the guest is in real suspend mode
+ * and we want to get back to the guest without dooming the transaction.
+ * The caller has checked that the guest is in real-suspend mode
+ * (MSR[TS] = S and the fake-suspend flag is not set).
+ */
+int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
+{
+ u32 instr = vcpu->arch.emul_inst;
+ u64 newmsr, msr, bescr;
+ int rs;
+
+ switch (instr & 0xfc0007ff) {
+ case PPC_INST_RFID:
+ /* XXX do we need to check for PR=0 here? */
+ newmsr = vcpu->arch.shregs.srr1;
+ /* should only get here for Sx -> T1 transition */
+ if (!(MSR_TM_TRANSACTIONAL(newmsr) && (newmsr & MSR_TM)))
+ return 0;
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = vcpu->arch.shregs.srr0;
+ return 1;
+
+ case PPC_INST_RFEBB:
+ /* check for PR=1 and arch 2.06 bit set in PCR */
+ msr = vcpu->arch.shregs.msr;
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206))
+ return 0;
+ /* check EBB facility is available */
+ if (!(vcpu->arch.hfscr & HFSCR_EBB) ||
+ ((msr & MSR_PR) && !(mfspr(SPRN_FSCR) & FSCR_EBB)))
+ return 0;
+ bescr = mfspr(SPRN_BESCR);
+ /* expect to see a S->T transition requested */
+ if (((bescr >> 30) & 3) != 2)
+ return 0;
+ bescr &= ~BESCR_GE;
+ if (instr & (1 << 11))
+ bescr |= BESCR_GE;
+ mtspr(SPRN_BESCR, bescr);
+ msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ vcpu->arch.shregs.msr = msr;
+ vcpu->arch.cfar = vcpu->arch.pc - 4;
+ vcpu->arch.pc = mfspr(SPRN_EBBRR);
+ return 1;
+
+ case PPC_INST_MTMSRD:
+ /* XXX do we need to check for PR=0 here? */
+ rs = (instr >> 21) & 0x1f;
+ newmsr = kvmppc_get_gpr(vcpu, rs);
+ msr = vcpu->arch.shregs.msr;
+ /* check this is a Sx -> T1 transition */
+ if (!(MSR_TM_TRANSACTIONAL(newmsr) && (newmsr & MSR_TM)))
+ return 0;
+ /* mtmsrd doesn't change LE */
+ newmsr = (newmsr & ~MSR_LE) | (msr & MSR_LE);
+ newmsr = sanitize_msr(newmsr);
+ vcpu->arch.shregs.msr = newmsr;
+ return 1;
+
+ case PPC_INST_TSR:
+ /* we know the MSR has the TS field = S (0b01) here */
+ msr = vcpu->arch.shregs.msr;
+ /* check for PR=1 and arch 2.06 bit set in PCR */
+ if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206))
+ return 0;
+ /* check for TM disabled in the HFSCR or MSR */
+ if (!(vcpu->arch.hfscr & HFSCR_TM) || !(msr & MSR_TM))
+ return 0;
+ /* L=1 => tresume => set TS to T (0b10) */
+ if (instr & (1 << 21))
+ vcpu->arch.shregs.msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
+ /* Set CR0 to 0b0010 */
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) | 0x20000000;
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * This is called when we are returning to a guest in TM transactional
+ * state. We roll the guest state back to the checkpointed state.
+ */
+void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.shregs.msr &= ~MSR_TS_MASK; /* go to N state */
+ vcpu->arch.pc = vcpu->arch.tfhar;
+ copy_from_checkpoint(vcpu);
+ vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) | 0xa0000000;
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 52c2053..14e3265 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -646,10 +646,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = hv_enabled;
break;
#endif
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case KVM_CAP_PPC_HTM:
r = hv_enabled &&
- (cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM_COMP);
+ (!!(cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM) ||
+ cpu_has_feature(CPU_FTR_P9_TM_EMUL));
break;
+#endif
default:
r = 0;
break;
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 4/5] KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
2018-03-08 7:02 ` Paul Mackerras
@ 2018-03-08 7:02 ` Paul Mackerras
-1 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where a treclaim performed in fake suspend mode can cause subsequent
reads from the XER register to return inconsistent values for the SO
(summary overflow) bit. The inconsistent SO bit state can potentially
be observed on any thread in the core. We have to do the treclaim
because that is the only way to get the thread out of suspend state
(fake or real) and into non-transactional state.
The workaround for the bug is to force the core into SMT4 mode before
doing the treclaim. This patch adds the code to do that.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f73eba6..7b932f1 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3089,6 +3089,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
kvmppc_save_tm:
mflr r0
std r0, PPC_LR_STKOFF(r1)
+ stdu r1, -PPC_MIN_STKFRM(r1)
/* Turn on TM. */
mfmsr r8
@@ -3108,8 +3109,14 @@ BEGIN_FTR_SECTION
mfspr r6, SPRN_TEXASR
std r6, VCPU_ORIG_TEXASR(r9)
- rldicl. r8, r8, 64 - MSR_TS_S_LG, 62
+ lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */
+ cmpwi r0, 0
beq 3f
+ rldicl. r8, r8, 64 - MSR_TS_S_LG, 62 /* Did we actually hrfid? */
+ beq 4f
+ bl pnv_power9_force_smt4_catch
+ nop
+3:
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
@@ -3126,7 +3133,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
/* If doing TM emulation on POWER9 DD2.2, check for fake suspend mode */
BEGIN_FTR_SECTION
-3:
lbz r9, HSTATE_FAKE_SUSPEND(r13)
cmpwi r9, 0
beq 2f
@@ -3138,13 +3144,16 @@ BEGIN_FTR_SECTION
/* Reload stack pointer and TOC. */
ld r1, HSTATE_HOST_R1(r13)
ld r2, PACATOC(r13)
+ /* Set MSR RI now we have r1 and r13 back. */
li r5, MSR_RI
mtmsrd r5, 1
HMT_MEDIUM
ld r6, HSTATE_DSCR(r13)
mtspr SPRN_DSCR, r6
- li r0, 0
- stb r0, HSTATE_FAKE_SUSPEND(r13)
+ bl pnv_power9_force_smt4_release
+ nop
+
+4:
mfspr r3, SPRN_PSSCR
/* PSSCR_FAKE_SUSPEND is a write-only bit, but clear it anyway */
li r0, PSSCR_FAKE_SUSPEND
@@ -3232,6 +3241,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
std r6, VCPU_TFIAR(r9)
std r7, VCPU_TEXASR(r9)
+ addi r1, r1, PPC_MIN_STKFRM
ld r0, PPC_LR_STKOFF(r1)
mtlr r0
blr
@@ -3266,6 +3276,8 @@ kvmppc_restore_tm:
mtspr SPRN_TFIAR, r6
mtspr SPRN_TEXASR, r7
+ li r0, 0
+ stb r0, HSTATE_FAKE_SUSPEND(r13)
ld r5, VCPU_MSR(r4)
rldicl. r5, r5, 64 - MSR_TS_S_LG, 62
beqlr /* TM not active in guest */
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 4/5] KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where a treclaim performed in fake suspend mode can cause subsequent
reads from the XER register to return inconsistent values for the SO
(summary overflow) bit. The inconsistent SO bit state can potentially
be observed on any thread in the core. We have to do the treclaim
because that is the only way to get the thread out of suspend state
(fake or real) and into non-transactional state.
The workaround for the bug is to force the core into SMT4 mode before
doing the treclaim. This patch adds the code to do that.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f73eba6..7b932f1 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3089,6 +3089,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
kvmppc_save_tm:
mflr r0
std r0, PPC_LR_STKOFF(r1)
+ stdu r1, -PPC_MIN_STKFRM(r1)
/* Turn on TM. */
mfmsr r8
@@ -3108,8 +3109,14 @@ BEGIN_FTR_SECTION
mfspr r6, SPRN_TEXASR
std r6, VCPU_ORIG_TEXASR(r9)
- rldicl. r8, r8, 64 - MSR_TS_S_LG, 62
+ lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */
+ cmpwi r0, 0
beq 3f
+ rldicl. r8, r8, 64 - MSR_TS_S_LG, 62 /* Did we actually hrfid? */
+ beq 4f
+ bl pnv_power9_force_smt4_catch
+ nop
+3:
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
@@ -3126,7 +3133,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
/* If doing TM emulation on POWER9 DD2.2, check for fake suspend mode */
BEGIN_FTR_SECTION
-3:
lbz r9, HSTATE_FAKE_SUSPEND(r13)
cmpwi r9, 0
beq 2f
@@ -3138,13 +3144,16 @@ BEGIN_FTR_SECTION
/* Reload stack pointer and TOC. */
ld r1, HSTATE_HOST_R1(r13)
ld r2, PACATOC(r13)
+ /* Set MSR RI now we have r1 and r13 back. */
li r5, MSR_RI
mtmsrd r5, 1
HMT_MEDIUM
ld r6, HSTATE_DSCR(r13)
mtspr SPRN_DSCR, r6
- li r0, 0
- stb r0, HSTATE_FAKE_SUSPEND(r13)
+ bl pnv_power9_force_smt4_release
+ nop
+
+4:
mfspr r3, SPRN_PSSCR
/* PSSCR_FAKE_SUSPEND is a write-only bit, but clear it anyway */
li r0, PSSCR_FAKE_SUSPEND
@@ -3232,6 +3241,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
std r6, VCPU_TFIAR(r9)
std r7, VCPU_TEXASR(r9)
+ addi r1, r1, PPC_MIN_STKFRM
ld r0, PPC_LR_STKOFF(r1)
mtlr r0
blr
@@ -3266,6 +3276,8 @@ kvmppc_restore_tm:
mtspr SPRN_TFIAR, r6
mtspr SPRN_TEXASR, r7
+ li r0, 0
+ stb r0, HSTATE_FAKE_SUSPEND(r13)
ld r5, VCPU_MSR(r4)
rldicl. r5, r5, 64 - MSR_TS_S_LG, 62
beqlr /* TM not active in guest */
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 5/5] KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend mode
2018-03-08 7:02 ` Paul Mackerras
@ 2018-03-08 7:02 ` Paul Mackerras
-1 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where the contents of the TEXASR can get corrupted while a thread is
in fake suspend state. The workaround is for the instruction emulation
code to use the value saved at the most recent guest exit in real
suspend mode. We achieve this by simply not saving the TEXASR into
the vcpu struct on an exit in fake suspend state. We also have to
take care to set the orig_texasr field only on guest exit in real
suspend state.
This also means that on guest entry in fake suspend state, TEXASR
will be restored to the value it had on the last exit in real suspend
state, effectively counteracting any hardware-caused corruption. This
works because TEXASR may not be written in suspend state.
With this, the guest might see the wrong values in TEXASR if it reads
it while in suspend state, but will see the correct value in
non-transactional state (e.g. after a treclaim), and treclaim will
work correctly.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 7b932f1..f374073 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3105,10 +3105,6 @@ kvmppc_save_tm:
li r3, TM_CAUSE_KVM_RESCHED
BEGIN_FTR_SECTION
- /* Emulation of the treclaim instruction needs TEXASR before treclaim */
- mfspr r6, SPRN_TEXASR
- std r6, VCPU_ORIG_TEXASR(r9)
-
lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */
cmpwi r0, 0
beq 3f
@@ -3116,7 +3112,12 @@ BEGIN_FTR_SECTION
beq 4f
bl pnv_power9_force_smt4_catch
nop
+ b 6f
3:
+ /* Emulation of the treclaim instruction needs TEXASR before treclaim */
+ mfspr r6, SPRN_TEXASR
+ std r6, VCPU_ORIG_TEXASR(r9)
+6:
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
@@ -3160,7 +3161,8 @@ BEGIN_FTR_SECTION
andc r3, r3, r0
mtspr SPRN_PSSCR, r3
ld r9, HSTATE_KVM_VCPU(r13)
- b 1f
+ /* Don't save TEXASR, use value from last exit in real suspend state */
+ b 11f
2:
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
@@ -3234,12 +3236,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
* change these outside of a transaction, so they must always be
* context switched.
*/
+ mfspr r7, SPRN_TEXASR
+ std r7, VCPU_TEXASR(r9)
+11:
mfspr r5, SPRN_TFHAR
mfspr r6, SPRN_TFIAR
- mfspr r7, SPRN_TEXASR
std r5, VCPU_TFHAR(r9)
std r6, VCPU_TFIAR(r9)
- std r7, VCPU_TEXASR(r9)
addi r1, r1, PPC_MIN_STKFRM
ld r0, PPC_LR_STKOFF(r1)
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 5/5] KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend mode
@ 2018-03-08 7:02 ` Paul Mackerras
0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2018-03-08 7:02 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where the contents of the TEXASR can get corrupted while a thread is
in fake suspend state. The workaround is for the instruction emulation
code to use the value saved at the most recent guest exit in real
suspend mode. We achieve this by simply not saving the TEXASR into
the vcpu struct on an exit in fake suspend state. We also have to
take care to set the orig_texasr field only on guest exit in real
suspend state.
This also means that on guest entry in fake suspend state, TEXASR
will be restored to the value it had on the last exit in real suspend
state, effectively counteracting any hardware-caused corruption. This
works because TEXASR may not be written in suspend state.
With this, the guest might see the wrong values in TEXASR if it reads
it while in suspend state, but will see the correct value in
non-transactional state (e.g. after a treclaim), and treclaim will
work correctly.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 7b932f1..f374073 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3105,10 +3105,6 @@ kvmppc_save_tm:
li r3, TM_CAUSE_KVM_RESCHED
BEGIN_FTR_SECTION
- /* Emulation of the treclaim instruction needs TEXASR before treclaim */
- mfspr r6, SPRN_TEXASR
- std r6, VCPU_ORIG_TEXASR(r9)
-
lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */
cmpwi r0, 0
beq 3f
@@ -3116,7 +3112,12 @@ BEGIN_FTR_SECTION
beq 4f
bl pnv_power9_force_smt4_catch
nop
+ b 6f
3:
+ /* Emulation of the treclaim instruction needs TEXASR before treclaim */
+ mfspr r6, SPRN_TEXASR
+ std r6, VCPU_ORIG_TEXASR(r9)
+6:
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
@@ -3160,7 +3161,8 @@ BEGIN_FTR_SECTION
andc r3, r3, r0
mtspr SPRN_PSSCR, r3
ld r9, HSTATE_KVM_VCPU(r13)
- b 1f
+ /* Don't save TEXASR, use value from last exit in real suspend state */
+ b 11f
2:
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
@@ -3234,12 +3236,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
* change these outside of a transaction, so they must always be
* context switched.
*/
+ mfspr r7, SPRN_TEXASR
+ std r7, VCPU_TEXASR(r9)
+11:
mfspr r5, SPRN_TFHAR
mfspr r6, SPRN_TFIAR
- mfspr r7, SPRN_TEXASR
std r5, VCPU_TFHAR(r9)
std r6, VCPU_TFIAR(r9)
- std r7, VCPU_TEXASR(r9)
addi r1, r1, PPC_MIN_STKFRM
ld r0, PPC_LR_STKOFF(r1)
--
2.7.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2018-03-08 7:02 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-08 7:02 [PATCH RFC 0/5] powerpc & KVM: Work around POWER9 TM hardware bugs Paul Mackerras
2018-03-08 7:02 ` Paul Mackerras
2018-03-08 7:02 ` [PATCH RFC 1/5] powerpc: Add a CPU feature bit for TM bug workarounds on POWER9 v2.2 Paul Mackerras
2018-03-08 7:02 ` Paul Mackerras
2018-03-08 7:02 ` [PATCH RFC 2/5] powerpc/powernv: Provide a way to force a core into SMT4 mode Paul Mackerras
2018-03-08 7:02 ` Paul Mackerras
2018-03-08 7:02 ` [PATCH RFC 3/5] KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 Paul Mackerras
2018-03-08 7:02 ` Paul Mackerras
2018-03-08 7:02 ` [PATCH RFC 4/5] KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode Paul Mackerras
2018-03-08 7:02 ` Paul Mackerras
2018-03-08 7:02 ` [PATCH RFC 5/5] KVM: PPC: Book3S HV: Work around TEXASR " Paul Mackerras
2018-03-08 7:02 ` Paul Mackerras
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.