* [PATCH 00/13] KVM: PPC: Support POWER9 guests
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
This series of patches adds support to HV KVM for running KVM guests
on POWER9 systems. This allows us to run KVM guests that use HPT
(hashed page table) address translation and know about the POWER9
processor. With this, Suraj Jitindar Singh's recent patch series
"powerpc: add support for ISA v2.07 compat level" and suitable changes
to the user-mode driver will allow us to run guests on POWER9 in
POWER8 (or POWER7) compatibility mode.
For now we require the host to be in HPT mode (not radix).
This series of patches is based on v4.9-rc4 plus my patch "powerpc/64:
Simplify adaptation to new ISA v3.00 HPTE format" and Yongji Xie's
two-patch series "KVM: PPC: Book3S HV: Optimize for MMIO emulation".
Paul.
---
Documentation/virtual/kvm/api.txt | 2 +
arch/powerpc/include/asm/kvm_host.h | 3 +
arch/powerpc/include/asm/kvm_ppc.h | 7 +-
arch/powerpc/include/asm/mmu.h | 5 +
arch/powerpc/include/asm/opal.h | 3 +
arch/powerpc/include/asm/reg.h | 5 +
arch/powerpc/include/uapi/asm/kvm.h | 4 +
arch/powerpc/kernel/asm-offsets.c | 3 +
arch/powerpc/kvm/book3s_64_mmu_hv.c | 39 +++++--
arch/powerpc/kvm/book3s_hv.c | 140 ++++++++++++++++++++++---
arch/powerpc/kvm/book3s_hv_builtin.c | 69 +++++++++---
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 113 ++++++++++++++------
arch/powerpc/kvm/book3s_hv_rm_xics.c | 23 ++--
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 132 ++++++++++++++++-------
arch/powerpc/kvm/powerpc.c | 11 +-
arch/powerpc/mm/hash_utils_64.c | 28 +----
arch/powerpc/mm/pgtable-radix.c | 18 ++--
arch/powerpc/mm/pgtable_64.c | 33 ++++++
arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +
arch/powerpc/platforms/powernv/opal.c | 2 +
20 files changed, 483 insertions(+), 160 deletions(-)
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH 01/13] powerpc/64: Add some more SPRs and SPR bits for POWER9
@ 2016-11-18 7:28 ` Paul Mackerras
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
These definitions will be needed by KVM.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/reg.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 9cd4e8c..df81411 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -153,6 +153,8 @@
#define PSSCR_EC 0x00100000 /* Exit Criterion */
#define PSSCR_ESL 0x00200000 /* Enable State Loss */
#define PSSCR_SD 0x00400000 /* Status Disable */
+#define PSSCR_PLS 0xf000000000000000 /* Power-saving Level Status */
+#define PSSCR_GUEST_VIS 0xf0000000000003ff /* Guest-visible PSSCR fields */
/* Floating Point Status and Control Register (FPSCR) Fields */
#define FPSCR_FX 0x80000000 /* FPU exception summary */
@@ -236,6 +238,7 @@
#define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
#define TEXASR_FS __MASK(63-36) /* TEXASR Failure Summary */
#define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
+#define SPRN_TIDR 144 /* Thread ID register */
#define SPRN_CTRLF 0x088
#define SPRN_CTRLT 0x098
#define CTRL_CT 0xc0000000 /* current thread */
@@ -294,6 +297,7 @@
#define SPRN_HSRR1 0x13B /* Hypervisor Save/Restore 1 */
#define SPRN_LMRR 0x32D /* Load Monitor Region Register */
#define SPRN_LMSER 0x32E /* Load Monitor Section Enable Register */
+#define SPRN_ASDR 0x330 /* Access segment descriptor register */
#define SPRN_IC 0x350 /* Virtual Instruction Count */
#define SPRN_VTB 0x351 /* Virtual Time Base */
#define SPRN_LDBAR 0x352 /* LD Base Address Register */
@@ -357,6 +361,7 @@
#define LPCR_PECE2 ASM_CONST(0x0000000000001000) /* machine check etc can cause exit */
#define LPCR_MER ASM_CONST(0x0000000000000800) /* Mediated External Exception */
#define LPCR_MER_SH 11
+#define LPCR_GTSE ASM_CONST(0x0000000000000400) /* Guest Translation Shootdown Enable */
#define LPCR_TC ASM_CONST(0x0000000000000200) /* Translation control */
#define LPCR_LPES 0x0000000c
#define LPCR_LPES0 ASM_CONST(0x0000000000000008) /* LPAR Env selector 0 */
--
2.7.4
* [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
@ 2016-11-18 7:28 ` Paul Mackerras
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
POWER9 requires the host to set up a partition table: a table in
memory, indexed by logical partition ID (LPID), that contains
pointers to the page tables and process tables for the host and
for each guest.
This factors out the initialization of the partition table into
a single function. This code was previously duplicated between
hash_utils_64.c and pgtable-radix.c.
This provides a function for setting a partition table entry,
which is used in early MMU initialization, and will be used by
KVM whenever a guest is created. This function includes a tlbie
instruction which will flush all TLB entries for the LPID and
all caches of the partition table entry for the LPID, across the
system.
This also moves a call to memblock_set_current_limit(), which was
in radix_init_partition_table(), but has nothing to do with the
partition table. By analogy with the similar code for hash, the
call gets moved to near the end of radix__early_init_mmu(). It
now gets called when running as a guest, whereas previously it
would only be called when the kernel was running as the host.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/mmu.h | 5 +++++
arch/powerpc/mm/hash_utils_64.c | 28 ++++------------------------
arch/powerpc/mm/pgtable-radix.c | 18 ++++++------------
arch/powerpc/mm/pgtable_64.c | 33 +++++++++++++++++++++++++++++++++
4 files changed, 48 insertions(+), 36 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e883683..060b40b 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -208,6 +208,11 @@ extern u64 ppc64_rma_size;
/* Cleanup function used by kexec */
extern void mmu_cleanup_all(void);
extern void radix__mmu_cleanup_all(void);
+
+/* Functions for creating and updating partition table on POWER9 */
+extern void mmu_partition_table_init(void);
+extern void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+ unsigned long dw1);
#endif /* CONFIG_PPC64 */
struct mm_struct;
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 44d3c3a..b9a062f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -792,37 +792,17 @@ static void update_hid_for_hash(void)
static void __init hash_init_partition_table(phys_addr_t hash_table,
unsigned long htab_size)
{
- unsigned long ps_field;
- unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
+ mmu_partition_table_init();
/*
- * slb llp encoding for the page size used in VPM real mode.
- * We can ignore that for lpid 0
+ * PS field (VRMA page size) is not used for LPID 0, hence set to 0.
+ * For now, UPRT is 0 and we have no segment table.
*/
- ps_field = 0;
htab_size = __ilog2(htab_size) - 18;
-
- BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
- partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
- MEMBLOCK_ALLOC_ANYWHERE));
-
- /* Initialize the Partition Table with no entries */
- memset((void *)partition_tb, 0, patb_size);
- partition_tb->patb0 = cpu_to_be64(ps_field | hash_table | htab_size);
- /*
- * FIXME!! This should be done via update_partition table
- * For now UPRT is 0 for us.
- */
- partition_tb->patb1 = 0;
+ mmu_partition_table_set_entry(0, hash_table | htab_size, 0);
pr_info("Partition table %p\n", partition_tb);
if (cpu_has_feature(CPU_FTR_POWER9_DD1))
update_hid_for_hash();
- /*
- * update partition table control register,
- * 64 K size.
- */
- mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
-
}
static void __init htab_initialize(void)
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index ed7bddc..186f1ad 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -177,23 +177,15 @@ static void __init radix_init_pgtable(void)
static void __init radix_init_partition_table(void)
{
- unsigned long rts_field;
+ unsigned long rts_field, dw0;
+ mmu_partition_table_init();
rts_field = radix__get_tree_size();
+ dw0 = rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE | PATB_HR;
+ mmu_partition_table_set_entry(0, dw0, 0);
- BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
- partition_tb = early_alloc_pgtable(1UL << PATB_SIZE_SHIFT);
- partition_tb->patb0 = cpu_to_be64(rts_field | __pa(init_mm.pgd) |
- RADIX_PGD_INDEX_SIZE | PATB_HR);
pr_info("Initializing Radix MMU\n");
pr_info("Partition table %p\n", partition_tb);
-
- memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
- /*
- * update partition table control register,
- * 64 K size.
- */
- mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
}
void __init radix_init_native(void)
@@ -378,6 +370,8 @@ void __init radix__early_init_mmu(void)
radix_init_partition_table();
}
+ memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
+
radix_init_pgtable();
}
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index f5e8d4e..fef0890 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -431,3 +431,36 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
}
}
#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+void mmu_partition_table_init(void)
+{
+ unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
+
+ BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
+ partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
+ MEMBLOCK_ALLOC_ANYWHERE));
+
+ /* Initialize the Partition Table with no entries */
+ memset((void *)partition_tb, 0, patb_size);
+
+ /*
+ * update partition table control register,
+ * 64 K size.
+ */
+ mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+}
+
+void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+ unsigned long dw1)
+{
+ partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+ partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+
+ /* Global flush of TLBs and partition table caches for this lpid */
+ asm volatile("ptesync");
+ asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
+ asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
+#endif /* CONFIG_PPC_BOOK3S_64 */
--
2.7.4
* [PATCH 03/13] powerpc/powernv: Define real-mode versions of OPAL XICS accessors
@ 2016-11-18 7:28 ` Paul Mackerras
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi()
and opal_int_set_mfrr(), for use by KVM real-mode code.
It also exports opal_int_set_mfrr() so that the modular part of KVM
can use it to send IPIs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/opal.h | 3 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +++
arch/powerpc/platforms/powernv/opal.c | 2 ++
3 files changed, 8 insertions(+)
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e958b70..5c7db0f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -220,9 +220,12 @@ int64_t opal_pci_set_power_state(uint64_t async_token, uint64_t id,
int64_t opal_pci_poll2(uint64_t id, uint64_t data);
int64_t opal_int_get_xirr(uint32_t *out_xirr, bool just_poll);
+int64_t opal_rm_int_get_xirr(__be32 *out_xirr, bool just_poll);
int64_t opal_int_set_cppr(uint8_t cppr);
int64_t opal_int_eoi(uint32_t xirr);
+int64_t opal_rm_int_eoi(uint32_t xirr);
int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
+int64_t opal_rm_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
uint32_t pe_num, uint32_t tce_size,
uint64_t dma_addr, uint32_t npages);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 44d2d84..3aa40f1 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -304,8 +304,11 @@ OPAL_CALL(opal_pci_get_presence_state, OPAL_PCI_GET_PRESENCE_STATE);
OPAL_CALL(opal_pci_get_power_state, OPAL_PCI_GET_POWER_STATE);
OPAL_CALL(opal_pci_set_power_state, OPAL_PCI_SET_POWER_STATE);
OPAL_CALL(opal_int_get_xirr, OPAL_INT_GET_XIRR);
+OPAL_CALL_REAL(opal_rm_int_get_xirr, OPAL_INT_GET_XIRR);
OPAL_CALL(opal_int_set_cppr, OPAL_INT_SET_CPPR);
OPAL_CALL(opal_int_eoi, OPAL_INT_EOI);
+OPAL_CALL_REAL(opal_rm_int_eoi, OPAL_INT_EOI);
OPAL_CALL(opal_int_set_mfrr, OPAL_INT_SET_MFRR);
+OPAL_CALL_REAL(opal_rm_int_set_mfrr, OPAL_INT_SET_MFRR);
OPAL_CALL(opal_pci_tce_kill, OPAL_PCI_TCE_KILL);
OPAL_CALL_REAL(opal_rm_pci_tce_kill, OPAL_PCI_TCE_KILL);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 6c9a65b..b3b8930 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -896,3 +896,5 @@ EXPORT_SYMBOL_GPL(opal_leds_get_ind);
EXPORT_SYMBOL_GPL(opal_leds_set_ind);
/* Export this symbol for PowerNV Operator Panel class driver */
EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
+/* Export this for KVM */
+EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
--
2.7.4
* [PATCH 04/13] KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT
@ 2016-11-18 7:28 ` Paul Mackerras
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
The hashed page table MMU in POWER processors can update the R
(reference) and C (change) bits in a HPTE at any time until the
HPTE has been invalidated and the TLB invalidation sequence has
completed. In kvmppc_h_protect, which implements the H_PROTECT
hypercall, we read the HPTE, modify the second doubleword,
invalidate the HPTE in memory, do the TLB invalidation sequence,
and then write the modified value of the second doubleword back
to memory. In doing so we could overwrite an R/C bit update done
by hardware between when we read the HPTE and when the TLB
invalidation completed. To fix this we re-read the second
doubleword after the TLB invalidation and OR in the (possibly)
new values of R and C. We can use an OR since hardware only ever
sets R and C, never clears them.
This race was found by code inspection. In principle this bug could
cause occasional guest memory corruption under host memory pressure.
Fixes: a8606e20e41a ("KVM: PPC: Handle some PAPR hcalls in the kernel", 2011-06-29)
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 752451f3..02786b3 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -670,6 +670,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
HPTE_V_ABSENT);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
true);
+ /* Don't lose R/C bit updates done by hardware */
+ r |= be64_to_cpu(hpte[1]) & (HPTE_R_R | HPTE_R_C);
hpte[1] = cpu_to_be64(r);
}
}
--
2.7.4
* [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
@ 2016-11-18 7:28 ` Paul Mackerras
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
This adapts the KVM-HV hashed page table (HPT) code to read and write
HPT entries in the new format defined in Power ISA v3.00 on POWER9
machines. The new format moves the B (segment size) field from the
first doubleword to the second, and trims some bits from the AVA
(abbreviated virtual address) and ARPN (abbreviated real page number)
fields. As far as possible, the conversion is done when reading or
writing the HPT entries, and the rest of the code continues to use
the old format.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 39 ++++++++++----
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 101 +++++++++++++++++++++++++-----------
2 files changed, 100 insertions(+), 40 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7755bd0..20a8e8e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -314,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_slb *slbe;
unsigned long slb_v;
unsigned long pp, key;
- unsigned long v, gr;
+ unsigned long v, orig_v, gr;
__be64 *hptep;
int index;
int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
@@ -339,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
return -ENOENT;
}
hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
- v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+ v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
gr = kvm->arch.revmap[index].guest_rpte;
- unlock_hpte(hptep, v);
+ unlock_hpte(hptep, orig_v);
preempt_enable();
gpte->eaddr = eaddr;
@@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
{
struct kvm *kvm = vcpu->kvm;
unsigned long hpte[3], r;
+ unsigned long hnow_v, hnow_r;
__be64 *hptep;
unsigned long mmu_seq, psize, pte_size;
unsigned long gpa_base, gfn_base;
@@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
unlock_hpte(hptep, hpte[0]);
preempt_enable();
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
+ hpte[1] = hpte_new_to_old_r(hpte[1]);
+ }
if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
hpte[1] != vcpu->arch.pgfault_hpte[1])
return RESUME_GUEST;
@@ -599,9 +606,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
preempt_disable();
while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
cpu_relax();
- if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
- be64_to_cpu(hptep[1]) != hpte[1] ||
- rev->guest_rpte != hpte[2])
+ hnow_v = be64_to_cpu(hptep[0]);
+ hnow_r = be64_to_cpu(hptep[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
+ hnow_r = hpte_new_to_old_r(hnow_r);
+ }
+ if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
+ rev->guest_rpte != hpte[2])
/* HPTE has been changed under us; let the guest retry */
goto out_unlock;
hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
@@ -632,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
}
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ r = hpte_old_to_new_r(hpte[0], r);
+ hpte[0] = hpte_old_to_new_v(hpte[0]);
+ }
hptep[1] = cpu_to_be64(r);
eieio();
__unlock_hpte(hptep, hpte[0]);
@@ -1183,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
unsigned long *hpte, struct revmap_entry *revp,
int want_valid, int first_pass)
{
- unsigned long v, r;
+ unsigned long v, r, hr;
unsigned long rcbits_unset;
int ok = 1;
int valid, dirty;
@@ -1210,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
cpu_relax();
v = be64_to_cpu(hptp[0]);
+ hr = be64_to_cpu(hptp[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ v = hpte_new_to_old_v(v, hr);
+ hr = hpte_new_to_old_r(hr);
+ }
/* re-evaluate valid and dirty from synchronized HPTE value */
valid = !!(v & HPTE_V_VALID);
@@ -1217,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
/* Harvest R and C into guest view if necessary */
rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
- if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
- revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
+ if (valid && (rcbits_unset & hr)) {
+ revp->guest_rpte |= (hr &
(HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
dirty = 1;
}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 02786b3..1179e40 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -364,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
}
}
+ /* Convert to new format on P9 */
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ ptel = hpte_old_to_new_r(pteh, ptel);
+ pteh = hpte_old_to_new_v(pteh);
+ }
hpte[1] = cpu_to_be64(ptel);
/* Write the first HPTE dword, unlocking the HPTE and making it valid */
@@ -445,27 +450,31 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
__be64 *hpte;
unsigned long v, r, rb;
struct revmap_entry *rev;
- u64 pte;
+ u64 pte, orig_pte, pte_r;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
- pte = be64_to_cpu(hpte[0]);
+ pte = orig_pte = be64_to_cpu(hpte[0]);
+ pte_r = be64_to_cpu(hpte[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ pte = hpte_new_to_old_v(pte, pte_r);
+ pte_r = hpte_new_to_old_r(pte_r);
+ }
if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
((flags & H_ANDCOND) && (pte & avpn) != 0)) {
- __unlock_hpte(hpte, pte);
+ __unlock_hpte(hpte, orig_pte);
return H_NOT_FOUND;
}
rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
v = pte & ~HPTE_V_HVLOCK;
- pte = be64_to_cpu(hpte[1]);
if (v & HPTE_V_VALID) {
hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
- rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
+ rb = compute_tlbie_rb(v, pte_r, pte_index);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
/*
* The reference (R) and change (C) bits in a HPT
@@ -483,7 +492,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
note_hpte_modification(kvm, rev);
unlock_hpte(hpte, 0);
- if (is_mmio_hpte(v, pte))
+ if (is_mmio_hpte(v, pte_r))
atomic64_inc(&kvm->arch.mmio_update);
if (v & HPTE_V_ABSENT)
@@ -546,6 +555,10 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
found = 0;
hp0 = be64_to_cpu(hp[0]);
hp1 = be64_to_cpu(hp[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ hp0 = hpte_new_to_old_v(hp0, hp1);
+ hp1 = hpte_new_to_old_r(hp1);
+ }
if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
switch (flags & 3) {
case 0: /* absolute */
@@ -583,8 +596,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
/* leave it locked */
hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
- tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
- be64_to_cpu(hp[1]), pte_index);
+ tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
indexes[n] = j;
hptes[n] = hp;
revs[n] = rev;
@@ -622,7 +634,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
__be64 *hpte;
struct revmap_entry *rev;
unsigned long v, r, rb, mask, bits;
- u64 pte;
+ u64 pte_v, pte_r;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
@@ -630,15 +642,16 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
- pte = be64_to_cpu(hpte[0]);
- if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
- ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
- __unlock_hpte(hpte, pte);
+ v = pte_v = be64_to_cpu(hpte[0]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
+ if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+ ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
+ __unlock_hpte(hpte, pte_v);
return H_NOT_FOUND;
}
- v = pte;
- pte = be64_to_cpu(hpte[1]);
+ pte_r = be64_to_cpu(hpte[1]);
bits = (flags << 55) & HPTE_R_PP0;
bits |= (flags << 48) & HPTE_R_KEY_HI;
bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
@@ -660,13 +673,13 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
* readonly to writable. If it should be writable, we'll
* take a trap and let the page fault code sort it out.
*/
- r = (pte & ~mask) | bits;
- if (hpte_is_writable(r) && !hpte_is_writable(pte))
+ r = (pte_r & ~mask) | bits;
+ if (hpte_is_writable(r) && !hpte_is_writable(pte_r))
r = hpte_make_readonly(r);
/* If the PTE is changing, invalidate it first */
- if (r != pte) {
+ if (r != pte_r) {
rb = compute_tlbie_rb(v, r, pte_index);
- hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) |
+ hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
HPTE_V_ABSENT);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
true);
@@ -675,9 +688,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
hpte[1] = cpu_to_be64(r);
}
}
- unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+ unlock_hpte(hpte, pte_v & ~HPTE_V_HVLOCK);
asm volatile("ptesync" : : : "memory");
- if (is_mmio_hpte(v, pte))
+ if (is_mmio_hpte(v, pte_r))
atomic64_inc(&kvm->arch.mmio_update);
return H_SUCCESS;
@@ -703,6 +716,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
r = be64_to_cpu(hpte[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ v = hpte_new_to_old_v(v, r);
+ r = hpte_new_to_old_r(r);
+ }
if (v & HPTE_V_ABSENT) {
v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
@@ -820,10 +837,16 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index)
{
unsigned long rb;
+ u64 hp0, hp1;
hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
- rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
- pte_index);
+ hp0 = be64_to_cpu(hptep[0]);
+ hp1 = be64_to_cpu(hptep[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ hp0 = hpte_new_to_old_v(hp0, hp1);
+ hp1 = hpte_new_to_old_r(hp1);
+ }
+ rb = compute_tlbie_rb(hp0, hp1, pte_index);
do_tlbies(kvm, &rb, 1, 1, true);
}
EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
@@ -833,9 +856,15 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
{
unsigned long rb;
unsigned char rbyte;
+ u64 hp0, hp1;
- rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
- pte_index);
+ hp0 = be64_to_cpu(hptep[0]);
+ hp1 = be64_to_cpu(hptep[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ hp0 = hpte_new_to_old_v(hp0, hp1);
+ hp1 = hpte_new_to_old_r(hp1);
+ }
+ rb = compute_tlbie_rb(hp0, hp1, pte_index);
rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
/* modify only the second-last byte, which contains the ref bit */
*((char *)hptep + 14) = rbyte;
@@ -895,7 +924,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
unsigned long avpn;
__be64 *hpte;
unsigned long mask, val;
- unsigned long v, r;
+ unsigned long v, r, orig_v;
/* Get page shift, work out hash and AVPN etc. */
mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
@@ -930,6 +959,8 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
for (i = 0; i < 16; i += 2) {
/* Read the PTE racily */
v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ v = hpte_new_to_old_v(v, be64_to_cpu(hpte[i+1]));
/* Check valid/absent, hash, segment size and AVPN */
if (!(v & valid) || (v & mask) != val)
@@ -938,8 +969,12 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
/* Lock the PTE and read it under the lock */
while (!try_lock_hpte(&hpte[i], HPTE_V_HVLOCK))
cpu_relax();
- v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
+ v = orig_v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
r = be64_to_cpu(hpte[i+1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ v = hpte_new_to_old_v(v, r);
+ r = hpte_new_to_old_r(r);
+ }
/*
* Check the HPTE again, including base page size
@@ -949,7 +984,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
/* Return with the HPTE still locked */
return (hash << 3) + (i >> 1);
- __unlock_hpte(&hpte[i], v);
+ __unlock_hpte(&hpte[i], orig_v);
}
if (val & HPTE_V_SECONDARY)
@@ -977,7 +1012,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
{
struct kvm *kvm = vcpu->kvm;
long int index;
- unsigned long v, r, gr;
+ unsigned long v, r, gr, orig_v;
__be64 *hpte;
unsigned long valid;
struct revmap_entry *rev;
@@ -1005,12 +1040,16 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
return 0; /* for prot fault, HPTE disappeared */
}
hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
- v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
+ v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
r = be64_to_cpu(hpte[1]);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ v = hpte_new_to_old_v(v, r);
+ r = hpte_new_to_old_r(r);
+ }
rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
gr = rev->guest_rpte;
- unlock_hpte(hpte, v);
+ unlock_hpte(hpte, orig_v);
}
/* For not found, if the HPTE is valid by now, retry the instruction */
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
On POWER9, the SDR1 register (hashed page table base address) is no
longer used, and instead the hardware reads the HPT base address
and size from the partition table. The partition table entry also
contains the bits that specify the page size for the VRMA mapping,
which were previously in the LPCR. The VPM0 bit of the LPCR is
now reserved; the processor now always uses the VRMA (virtual
real-mode area) mechanism for guest real-mode accesses in HPT mode,
and the RMO (real-mode offset) mechanism has been dropped.
When entering or exiting the guest, we now only have to set the
LPIDR (logical partition ID register), not the SDR1 register.
There is also no requirement now to transition via a reserved
LPID value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv.c | 36 +++++++++++++++++++++++++++------
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 ++++++---
2 files changed, 37 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 40b2b6d..5cbe3c3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -54,6 +54,7 @@
#include <asm/dbell.h>
#include <asm/hmi.h>
#include <asm/pnv-pci.h>
+#include <asm/mmu.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/highmem.h>
@@ -3024,6 +3025,22 @@ static void kvmppc_mmu_destroy_hv(struct kvm_vcpu *vcpu)
return;
}
+static void kvmppc_setup_partition_table(struct kvm *kvm)
+{
+ unsigned long dw0, dw1;
+
+ /* PS field - page size for VRMA */
+ dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
+ ((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
+ /* HTABSIZE and HTABORG fields */
+ dw0 |= kvm->arch.sdr1;
+
+ /* Second dword has GR=0; other fields are unused since UPRT=0 */
+ dw1 = 0;
+
+ mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1);
+}
+
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
{
int err = 0;
@@ -3075,17 +3092,20 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
psize == 0x1000000))
goto out_srcu;
- /* Update VRMASD field in the LPCR */
senc = slb_pgsize_encoding(psize);
kvm->arch.vrma_slb_v = senc | SLB_VSID_B_1T |
(VRMA_VSID << SLB_VSID_SHIFT_1T);
- /* the -4 is to account for senc values starting at 0x10 */
- lpcr = senc << (LPCR_VRMASD_SH - 4);
-
/* Create HPTEs in the hash page table for the VRMA */
kvmppc_map_vrma(vcpu, memslot, porder);
- kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
+ /* Update VRMASD field in the LPCR */
+ if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+ /* the -4 is to account for senc values starting at 0x10 */
+ lpcr = senc << (LPCR_VRMASD_SH - 4);
+ kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
+ } else {
+ kvmppc_setup_partition_table(kvm);
+ }
/* Order updates to kvm->arch.lpcr etc. vs. hpte_setup_done */
smp_wmb();
@@ -3235,7 +3255,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
sizeof(kvm->arch.enabled_hcalls));
- kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
+ if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
/* Init LPCR for virtual RMA mode */
kvm->arch.host_lpid = mfspr(SPRN_LPID);
@@ -3248,6 +3269,9 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
/* On POWER8 turn on online bit to enable PURR/SPURR */
if (cpu_has_feature(CPU_FTR_ARCH_207S))
lpcr |= LPCR_ONL;
+ /* On POWER9, VPM0 bit is reserved (VPM0=1 behaviour is assumed) */
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ lpcr &= ~LPCR_VPM0;
kvm->arch.lpcr = lpcr;
/*
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3c1d1b..dc25467 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -581,12 +581,14 @@ kvmppc_hv_entry:
ld r9,VCORE_KVM(r5) /* pointer to struct kvm */
cmpwi r6,0
bne 10f
- ld r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
+BEGIN_FTR_SECTION
+ ld r6,KVM_SDR1(r9)
li r0,LPID_RSVD /* switch to reserved LPID */
mtspr SPRN_LPID,r0
ptesync
mtspr SPRN_SDR1,r6 /* switch to partition page table */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
mtspr SPRN_LPID,r7
isync
@@ -1552,12 +1554,14 @@ kvmhv_switch_to_host:
beq 19f
/* Primary thread switches back to host partition */
- ld r6,KVM_HOST_SDR1(r4)
lwz r7,KVM_HOST_LPID(r4)
+BEGIN_FTR_SECTION
+ ld r6,KVM_HOST_SDR1(r4)
li r8,LPID_RSVD /* switch to reserved LPID */
mtspr SPRN_LPID,r8
ptesync
- mtspr SPRN_SDR1,r6 /* switch to partition page table */
+ mtspr SPRN_SDR1,r6 /* switch to host page table */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
mtspr SPRN_LPID,r7
isync
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
@@ -3075,17 +3092,20 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
psize = 0x1000000))
goto out_srcu;
- /* Update VRMASD field in the LPCR */
senc = slb_pgsize_encoding(psize);
kvm->arch.vrma_slb_v = senc | SLB_VSID_B_1T |
(VRMA_VSID << SLB_VSID_SHIFT_1T);
- /* the -4 is to account for senc values starting at 0x10 */
- lpcr = senc << (LPCR_VRMASD_SH - 4);
-
/* Create HPTEs in the hash page table for the VRMA */
kvmppc_map_vrma(vcpu, memslot, porder);
- kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
+ /* Update VRMASD field in the LPCR */
+ if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+ /* the -4 is to account for senc values starting at 0x10 */
+ lpcr = senc << (LPCR_VRMASD_SH - 4);
+ kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
+ } else {
+ kvmppc_setup_partition_table(kvm);
+ }
/* Order updates to kvm->arch.lpcr etc. vs. hpte_setup_done */
smp_wmb();
@@ -3235,7 +3255,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
sizeof(kvm->arch.enabled_hcalls));
- kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
+ if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
/* Init LPCR for virtual RMA mode */
kvm->arch.host_lpid = mfspr(SPRN_LPID);
@@ -3248,6 +3269,9 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
/* On POWER8 turn on online bit to enable PURR/SPURR */
if (cpu_has_feature(CPU_FTR_ARCH_207S))
lpcr |= LPCR_ONL;
+ /* On POWER9, VPM0 bit is reserved (VPM0=1 behaviour is assumed) */
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ lpcr &= ~LPCR_VPM0;
kvm->arch.lpcr = lpcr;
/*
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3c1d1b..dc25467 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -581,12 +581,14 @@ kvmppc_hv_entry:
ld r9,VCORE_KVM(r5) /* pointer to struct kvm */
cmpwi r6,0
bne 10f
- ld r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
+BEGIN_FTR_SECTION
+ ld r6,KVM_SDR1(r9)
li r0,LPID_RSVD /* switch to reserved LPID */
mtspr SPRN_LPID,r0
ptesync
mtspr SPRN_SDR1,r6 /* switch to partition page table */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
mtspr SPRN_LPID,r7
isync
@@ -1552,12 +1554,14 @@ kvmhv_switch_to_host:
beq 19f
/* Primary thread switches back to host partition */
- ld r6,KVM_HOST_SDR1(r4)
lwz r7,KVM_HOST_LPID(r4)
+BEGIN_FTR_SECTION
+ ld r6,KVM_HOST_SDR1(r4)
li r8,LPID_RSVD /* switch to reserved LPID */
mtspr SPRN_LPID,r8
ptesync
- mtspr SPRN_SDR1,r6 /* switch to partition page table */
+ mtspr SPRN_SDR1,r6 /* switch to host page table */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
mtspr SPRN_LPID,r7
isync
--
2.7.4
* [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
Some special-purpose registers that were present and accessible
by guests on POWER8 no longer exist on POWER9, so this adds
feature sections to ensure that we don't try to context-switch
them when going into or out of a guest on POWER9. These are
all relatively obscure, rarely-used registers, but we had to
context-switch them on POWER8 to avoid creating a covert channel.
They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 ++++++++++++++++++++-------------
1 file changed, 30 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dc25467..d422014 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -752,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
BEGIN_FTR_SECTION
ld r5, VCPU_MMCR + 24(r4)
ld r6, VCPU_SIER(r4)
+ mtspr SPRN_MMCR2, r5
+ mtspr SPRN_SIER, r6
+BEGIN_FTR_SECTION_NESTED(96)
lwz r7, VCPU_PMC + 24(r4)
lwz r8, VCPU_PMC + 28(r4)
ld r9, VCPU_MMCR + 32(r4)
- mtspr SPRN_MMCR2, r5
- mtspr SPRN_SIER, r6
mtspr SPRN_SPMC1, r7
mtspr SPRN_SPMC2, r8
mtspr SPRN_MMCRS, r9
+END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr SPRN_MMCR0, r3
isync
@@ -815,20 +817,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
mtspr SPRN_EBBHR, r8
ld r5, VCPU_EBBRR(r4)
ld r6, VCPU_BESCR(r4)
- ld r7, VCPU_CSIGR(r4)
- ld r8, VCPU_TACR(r4)
+ lwz r7, VCPU_GUEST_PID(r4)
+ ld r8, VCPU_WORT(r4)
mtspr SPRN_EBBRR, r5
mtspr SPRN_BESCR, r6
- mtspr SPRN_CSIGR, r7
- mtspr SPRN_TACR, r8
+ mtspr SPRN_PID, r7
+ mtspr SPRN_WORT, r8
+BEGIN_FTR_SECTION
ld r5, VCPU_TCSCR(r4)
ld r6, VCPU_ACOP(r4)
- lwz r7, VCPU_GUEST_PID(r4)
- ld r8, VCPU_WORT(r4)
+ ld r7, VCPU_CSIGR(r4)
+ ld r8, VCPU_TACR(r4)
mtspr SPRN_TCSCR, r5
mtspr SPRN_ACOP, r6
- mtspr SPRN_PID, r7
- mtspr SPRN_WORT, r8
+ mtspr SPRN_CSIGR, r7
+ mtspr SPRN_TACR, r8
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
8:
/*
@@ -1343,20 +1347,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
std r8, VCPU_EBBHR(r9)
mfspr r5, SPRN_EBBRR
mfspr r6, SPRN_BESCR
- mfspr r7, SPRN_CSIGR
- mfspr r8, SPRN_TACR
+ mfspr r7, SPRN_PID
+ mfspr r8, SPRN_WORT
std r5, VCPU_EBBRR(r9)
std r6, VCPU_BESCR(r9)
- std r7, VCPU_CSIGR(r9)
- std r8, VCPU_TACR(r9)
+ stw r7, VCPU_GUEST_PID(r9)
+ std r8, VCPU_WORT(r9)
+BEGIN_FTR_SECTION
mfspr r5, SPRN_TCSCR
mfspr r6, SPRN_ACOP
- mfspr r7, SPRN_PID
- mfspr r8, SPRN_WORT
+ mfspr r7, SPRN_CSIGR
+ mfspr r8, SPRN_TACR
std r5, VCPU_TCSCR(r9)
std r6, VCPU_ACOP(r9)
- stw r7, VCPU_GUEST_PID(r9)
- std r8, VCPU_WORT(r9)
+ std r7, VCPU_CSIGR(r9)
+ std r8, VCPU_TACR(r9)
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
@@ -1365,12 +1371,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
mtspr SPRN_IAMR, r0
mtspr SPRN_CIABR, r0
mtspr SPRN_DAWRX, r0
- mtspr SPRN_TCSCR, r0
mtspr SPRN_WORT, r0
+BEGIN_FTR_SECTION
+ mtspr SPRN_TCSCR, r0
/* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
li r0, 1
sldi r0, r0, 31
mtspr SPRN_MMCRS, r0
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
8:
/* Save and reset AMR and UAMOR before turning on the MMU */
@@ -1504,15 +1512,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
stw r8, VCPU_PMC + 20(r9)
BEGIN_FTR_SECTION
mfspr r5, SPRN_SIER
+ std r5, VCPU_SIER(r9)
+BEGIN_FTR_SECTION_NESTED(96)
mfspr r6, SPRN_SPMC1
mfspr r7, SPRN_SPMC2
mfspr r8, SPRN_MMCRS
- std r5, VCPU_SIER(r9)
stw r6, VCPU_PMC + 24(r9)
stw r7, VCPU_PMC + 28(r9)
std r8, VCPU_MMCR + 32(r9)
lis r4, 0x8000
mtspr SPRN_MMCRS, r4
+END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
22:
/* Clear out SLB */
--
2.7.4
* [PATCH 08/13] KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
This adds code to handle two new guest-accessible special-purpose
registers on POWER9: TIDR (thread ID register) and PSSCR (processor
stop status and control register). They are context-switched
between host and guest, and the guest values can be read and set
via the one_reg interface.
The PSSCR contains some fields which are guest-accessible and some
which are only accessible in hypervisor mode. We only allow the
guest-accessible fields to be read or set by userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
Documentation/virtual/kvm/api.txt | 2 ++
arch/powerpc/include/asm/kvm_host.h | 2 ++
arch/powerpc/include/uapi/asm/kvm.h | 4 ++++
arch/powerpc/kernel/asm-offsets.c | 2 ++
arch/powerpc/kvm/book3s_hv.c | 12 ++++++++++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 39 +++++++++++++++++++++++++++++++--
6 files changed, 59 insertions(+), 2 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 739db9a..40b2bfc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2023,6 +2023,8 @@ registers, find a list below:
PPC | KVM_REG_PPC_WORT | 64
PPC | KVM_REG_PPC_SPRG9 | 64
PPC | KVM_REG_PPC_DBSR | 32
+ PPC | KVM_REG_PPC_TIDR | 64
+ PPC | KVM_REG_PPC_PSSCR | 64
PPC | KVM_REG_PPC_TM_GPR0 | 64
...
PPC | KVM_REG_PPC_TM_GPR31 | 64
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 20ef27d..0d94608 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -517,6 +517,8 @@ struct kvm_vcpu_arch {
ulong tcscr;
ulong acop;
ulong wort;
+ ulong tid;
+ ulong psscr;
ulong shadow_srr1;
#endif
u32 vrsave; /* also USPRG0 */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index c93cf35..f0bae66 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -573,6 +573,10 @@ struct kvm_get_htab_header {
#define KVM_REG_PPC_SPRG9 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba)
#define KVM_REG_PPC_DBSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb)
+/* POWER9 registers */
+#define KVM_REG_PPC_TIDR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc)
+#define KVM_REG_PPC_PSSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd)
+
/* Transactional Memory checkpointed state:
* This is all GPRs, all VSX regs and a subset of SPRs
*/
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index caec7bf..494241b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -548,6 +548,8 @@ int main(void)
DEFINE(VCPU_TCSCR, offsetof(struct kvm_vcpu, arch.tcscr));
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
+ DEFINE(VCPU_TID, offsetof(struct kvm_vcpu, arch.tid));
+ DEFINE(VCPU_PSSCR, offsetof(struct kvm_vcpu, arch.psscr));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5cbe3c3..59e18dfb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1230,6 +1230,12 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_WORT:
*val = get_reg_val(id, vcpu->arch.wort);
break;
+ case KVM_REG_PPC_TIDR:
+ *val = get_reg_val(id, vcpu->arch.tid);
+ break;
+ case KVM_REG_PPC_PSSCR:
+ *val = get_reg_val(id, vcpu->arch.psscr);
+ break;
case KVM_REG_PPC_VPA_ADDR:
spin_lock(&vcpu->arch.vpa_update_lock);
*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
@@ -1428,6 +1434,12 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_WORT:
vcpu->arch.wort = set_reg_val(id, *val);
break;
+ case KVM_REG_PPC_TIDR:
+ vcpu->arch.tid = set_reg_val(id, *val);
+ break;
+ case KVM_REG_PPC_PSSCR:
+ vcpu->arch.psscr = set_reg_val(id, *val) & PSSCR_GUEST_VIS;
+ break;
case KVM_REG_PPC_VPA_ADDR:
addr = set_reg_val(id, *val);
r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d422014..219a04f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -523,6 +523,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
* *
*****************************************************************************/
+/* Stack frame offsets */
+#define STACK_SLOT_TID (112-16)
+#define STACK_SLOT_PSSCR (112-24)
+
.global kvmppc_hv_entry
kvmppc_hv_entry:
@@ -700,6 +704,14 @@ kvmppc_got_guest:
mtspr SPRN_PURR,r7
mtspr SPRN_SPURR,r8
+ /* Save host values of some registers */
+BEGIN_FTR_SECTION
+ mfspr r5, SPRN_TIDR
+ mfspr r6, SPRN_PSSCR
+ std r5, STACK_SLOT_TID(r1)
+ std r6, STACK_SLOT_PSSCR(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
BEGIN_FTR_SECTION
/* Set partition DABR */
/* Do this before re-enabling PMU to avoid P7 DABR corruption bug */
@@ -824,6 +836,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
mtspr SPRN_PID, r7
mtspr SPRN_WORT, r8
BEGIN_FTR_SECTION
+ /* POWER8-only registers */
ld r5, VCPU_TCSCR(r4)
ld r6, VCPU_ACOP(r4)
ld r7, VCPU_CSIGR(r4)
@@ -832,7 +845,14 @@ BEGIN_FTR_SECTION
mtspr SPRN_ACOP, r6
mtspr SPRN_CSIGR, r7
mtspr SPRN_TACR, r8
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+FTR_SECTION_ELSE
+ /* POWER9-only registers */
+ ld r5, VCPU_TID(r4)
+ ld r6, VCPU_PSSCR(r4)
+ oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */
+ mtspr SPRN_TIDR, r5
+ mtspr SPRN_PSSCR, r6
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
8:
/*
@@ -1362,7 +1382,14 @@ BEGIN_FTR_SECTION
std r6, VCPU_ACOP(r9)
std r7, VCPU_CSIGR(r9)
std r8, VCPU_TACR(r9)
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+FTR_SECTION_ELSE
+ mfspr r5, SPRN_TIDR
+ mfspr r6, SPRN_PSSCR
+ std r5, VCPU_TID(r9)
+ rldicl r6, r6, 4, 50 /* r6 &= PSSCR_GUEST_VIS */
+ rotldi r6, r6, 60
+ std r6, VCPU_PSSCR(r9)
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
@@ -1531,6 +1558,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
slbia
ptesync
+ /* Restore host values of some registers */
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_TID(r1)
+ ld r6, STACK_SLOT_PSSCR(r1)
+ mtspr SPRN_TIDR, r5
+ mtspr SPRN_PSSCR, r6
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
/*
* POWER7/POWER8 guest -> host partition switch code.
* We don't have to lock against tlbies but we do
--
2.7.4
* [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
and tlbiel (local tlbie) instructions. Both instructions get a
set of new parameters (RIC, PRS and R) which appear as bits in the
instruction word. The tlbiel instruction now has a second register
operand, which contains a PID and/or LPID value if needed, and
should otherwise contain 0.
This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
as well as older processors. Since we only handle HPT guests so
far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
word as on previous processors, so we don't need to conditionally
execute different instructions depending on the processor.
The local flush on first entry to a guest in book3s_hv_rmhandlers.S
is a loop which depends on the number of TLB sets. Rather than
using feature sections to set the number of iterations based on
which CPU we're on, we now work out this number at VM creation time
and store it in the kvm_arch struct. That will make it possible to
get the number from the device tree in future, which will help with
compatibility with future processors.
Since mmu_partition_table_set_entry() does a global flush of the
whole LPID, we don't need to do the TLB flush on first entry to the
guest on each processor. Therefore we don't set all bits in the
tlb_need_flush bitmap on VM startup on POWER9.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kvm/book3s_hv.c | 17 ++++++++++++++++-
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 10 ++++++++--
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++------
5 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0d94608..ea78864 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -244,6 +244,7 @@ struct kvm_arch_memory_slot {
struct kvm_arch {
unsigned int lpid;
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ unsigned int tlb_sets;
unsigned long hpt_virt;
struct revmap_entry *revmap;
atomic64_t mmio_update;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 494241b..b9c8386 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -487,6 +487,7 @@ int main(void)
/* book3s */
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 59e18dfb..8395a7f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3260,8 +3260,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
* Since we don't flush the TLB when tearing down a VM,
* and this lpid might have previously been used,
* make sure we flush on each core before running the new VM.
+ * On POWER9, the tlbie in mmu_partition_table_set_entry()
+ * does this flush for us.
*/
- cpumask_setall(&kvm->arch.need_tlb_flush);
+ if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ cpumask_setall(&kvm->arch.need_tlb_flush);
/* Start out with the default set of hcalls enabled */
memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
@@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
kvm->arch.lpcr = lpcr;
/*
+ * Work out how many sets the TLB has, for the use of
+ * the TLB invalidation loop in book3s_hv_rmhandlers.S.
+ */
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ kvm->arch.tlb_sets = 256; /* POWER9 */
+ else if (cpu_has_feature(CPU_FTR_ARCH_207S))
+ kvm->arch.tlb_sets = 512; /* POWER8 */
+ else
+ kvm->arch.tlb_sets = 128; /* POWER7 */
+
+ /*
* Track that we now have a HV mode VM active. This blocks secondary
* CPU threads from coming online.
*/
@@ -3728,3 +3742,4 @@ module_exit(kvmppc_book3s_exit_hv);
MODULE_LICENSE("GPL");
MODULE_ALIAS_MISCDEV(KVM_MINOR);
MODULE_ALIAS("devname:kvm");
+
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1179e40..9ef3c4b 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -424,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
{
long i;
+ /*
+ * We use the POWER9 5-operand versions of tlbie and tlbiel here.
+ * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
+ * the RS field, this is backwards-compatible with P7 and P8.
+ */
if (global) {
while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
cpu_relax();
if (need_sync)
asm volatile("ptesync" : : : "memory");
for (i = 0; i < npages; ++i)
- asm volatile(PPC_TLBIE(%1,%0) : :
+ asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
"r" (rbvalues[i]), "r" (kvm->arch.lpid));
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
kvm->arch.tlbie_lock = 0;
@@ -438,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
if (need_sync)
asm volatile("ptesync" : : : "memory");
for (i = 0; i < npages; ++i)
- asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
+ asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
+ "r" (rbvalues[i]), "r" (0));
asm volatile("ptesync" : : : "memory");
}
}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 219a04f..acae5c3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -613,12 +613,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
stdcx. r7,0,r6
bne 23b
/* Flush the TLB of any entries for this LPID */
- /* use arch 2.07S as a proxy for POWER8 */
-BEGIN_FTR_SECTION
- li r6,512 /* POWER8 has 512 sets */
-FTR_SECTION_ELSE
- li r6,128 /* POWER7 has 128 sets */
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
+ lwz r6,KVM_TLB_SETS(r9)
+ li r0,0 /* RS for P9 version of tlbiel */
mtctr r6
li r7,0x800 /* IS field = 0b10 */
ptesync
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
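For reference, the per-VM TLB set-count selection added to kvmppc_core_init_vm_hv() in the hunk above can be modelled as a small userspace-testable function. The enum below is a hypothetical stand-in for the kernel's CPU_FTR_* feature tests; the set counts themselves come straight from the patch:

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's cpu_has_feature() bits. */
enum cpu_arch { ARCH_206, ARCH_207S, ARCH_300 };

/*
 * Mirrors the selection in kvmppc_core_init_vm_hv(): the number of
 * congruence-class sets in the TLB, used to size the tlbiel flush
 * loop in book3s_hv_rmhandlers.S (one iteration per set).
 */
static unsigned int tlb_sets_for(enum cpu_arch arch)
{
	if (arch == ARCH_300)
		return 256;	/* POWER9 */
	if (arch == ARCH_207S)
		return 512;	/* POWER8 */
	return 128;		/* POWER7 */
}
```

Storing this in kvm_arch (rather than patching the assembly with feature sections) is what later allows the value to come from the device tree instead of a hard-coded table.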
* [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 7:28 ` Paul Mackerras
1 sibling, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
On POWER9, the msgsnd instruction is able to send interrupts to
other cores, as well as other threads on the local core. Since
msgsnd is generally simpler and faster than sending an IPI via the
XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv.c | 11 ++++++++++-
arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8395a7f..ace89df 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -147,12 +147,21 @@ static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
static bool kvmppc_ipi_thread(int cpu)
{
+ unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+
+ /* On POWER9 we can use msgsnd to IPI any cpu */
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ msg |= get_hard_smp_processor_id(cpu);
+ smp_mb();
+ __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+ return true;
+ }
+
/* On POWER8 for IPIs to threads in the same core, use msgsnd */
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
preempt_disable();
if (cpu_first_thread_sibling(cpu) ==
cpu_first_thread_sibling(smp_processor_id())) {
- unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
msg |= cpu_thread_in_core(cpu);
smp_mb();
__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 0c84d6b..37ed045 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
void kvmhv_rm_send_ipi(int cpu)
{
unsigned long xics_phys;
+ unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
- /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+ /* On POWER9 we can use msgsnd for any destination cpu. */
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ msg |= get_hard_smp_processor_id(cpu);
+ __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+ return;
+ }
+ /* On POWER8 for IPIs to threads in the same core, use msgsnd. */
if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
cpu_first_thread_sibling(cpu) ==
cpu_first_thread_sibling(raw_smp_processor_id())) {
- unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
msg |= cpu_thread_in_core(cpu);
__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
return;
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
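The addressing difference between the POWER8 and POWER9 doorbell paths in kvmppc_ipi_thread() can be sketched as below. The type value (5) and the shift (27) mirror the kernel's PPC_DBELL_SERVER and PPC_DBELL_TYPE() encoding for the message word, but treat both as assumptions of this sketch rather than a definitive encoding; the point is only that POWER9 puts a global hard CPU id in the low bits, while POWER8 can only address a thread within the sender's core:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the msgsnd message word (see lead-in). */
#define DBELL_TYPE(t)	(((uint64_t)(t) & 0xf) << 27)
#define DBELL_SERVER	5

static uint64_t ipi_msg(bool power9, unsigned int hard_cpu_id,
			unsigned int thread_in_core)
{
	uint64_t msg = DBELL_TYPE(DBELL_SERVER);

	if (power9)
		msg |= hard_cpu_id;	/* any CPU in the system */
	else
		msg |= thread_in_core;	/* P8: same-core threads only */
	return msg;
}
```

This is why the patch hoists the `msg` initialization out of the POWER8-only branch: both paths now share the same type field and differ only in the target bits.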
* [PATCH 11/13] KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 7:28 ` Paul Mackerras
1 sibling, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
POWER9 includes a new interrupt controller, called XIVE, which is
quite different from the XICS interrupt controller on POWER7 and
POWER8 machines. KVM-HV accesses the XICS directly in several places
in order to send and clear IPIs and handle interrupts from PCI
devices being passed through to the guest.
In order to make the transition to XIVE easier, OPAL firmware will
include an emulation of XICS on top of XIVE. Access to the emulated
XICS is via OPAL calls. The one complication is that the EOI
(end-of-interrupt) function can now return a value indicating that
another interrupt is pending; in this case, the XIVE will not signal
an interrupt in hardware to the CPU, and software is supposed to
acknowledge the new interrupt without waiting for another interrupt
to be delivered in hardware.
This adapts KVM-HV to use the OPAL calls on machines where there is
no XICS hardware. When there is no XICS, we look for a device-tree
node with "ibm,opal-intc" in its compatible property, which is how
OPAL indicates that it provides XICS emulation.
In order to handle the EOI return value, kvmppc_read_intr() has
become kvmppc_read_one_intr(), with a boolean variable passed by
reference which can be set by the EOI functions to indicate that
another interrupt is pending. The new kvmppc_read_intr() keeps
calling kvmppc_read_one_intr() until there are no more interrupts
to process. The return value from kvmppc_read_intr() is the
largest non-zero value returned by kvmppc_read_one_intr().
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 7 +++--
arch/powerpc/kvm/book3s_hv.c | 28 +++++++++++++++--
arch/powerpc/kvm/book3s_hv_builtin.c | 59 ++++++++++++++++++++++++++++++------
arch/powerpc/kvm/book3s_hv_rm_xics.c | 23 ++++++++++----
4 files changed, 96 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index f6e4964..a5b94be 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -483,9 +483,10 @@ extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
unsigned long host_irq);
extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
unsigned long host_irq);
-extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
- struct kvmppc_irq_map *irq_map,
- struct kvmppc_passthru_irqmap *pimap);
+extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, __be32 xirr,
+ struct kvmppc_irq_map *irq_map,
+ struct kvmppc_passthru_irqmap *pimap,
+ bool *again);
extern int h_ipi_redirect;
#else
static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ace89df..a1d2b5f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -55,6 +55,8 @@
#include <asm/hmi.h>
#include <asm/pnv-pci.h>
#include <asm/mmu.h>
+#include <asm/opal.h>
+#include <asm/xics.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/highmem.h>
@@ -63,6 +65,7 @@
#include <linux/irqbypass.h>
#include <linux/module.h>
#include <linux/compiler.h>
+#include <linux/of.h>
#include "book3s.h"
@@ -172,8 +175,12 @@ static bool kvmppc_ipi_thread(int cpu)
}
#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
- if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
- xics_wake_cpu(cpu);
+ if (cpu >= 0 && cpu < nr_cpu_ids) {
+ if (paca[cpu].kvm_hstate.xics_phys) {
+ xics_wake_cpu(cpu);
+ return true;
+ }
+ opal_int_set_mfrr(get_hard_smp_processor_id(cpu), IPI_PRIORITY);
return true;
}
#endif
@@ -3729,6 +3736,23 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;
+ /*
+ * We need a way of accessing the XICS interrupt controller,
+ * either directly, via paca[cpu].kvm_hstate.xics_phys, or
+ * indirectly, via OPAL.
+ */
+#ifdef CONFIG_SMP
+ if (!get_paca()->kvm_hstate.xics_phys) {
+ struct device_node *np;
+
+ np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
+ if (!np) {
+ pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
+ return -ENODEV;
+ }
+ }
+#endif
+
kvm_ops_hv.owner = THIS_MODULE;
kvmppc_hv_ops = &kvm_ops_hv;
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 37ed045..a09c917 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -26,6 +26,7 @@
#include <asm/dbell.h>
#include <asm/cputhreads.h>
#include <asm/io.h>
+#include <asm/opal.h>
#define KVM_CMA_CHUNK_ORDER 18
@@ -224,7 +225,11 @@ void kvmhv_rm_send_ipi(int cpu)
/* Else poke the target with an IPI */
xics_phys = paca[cpu].kvm_hstate.xics_phys;
- rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ if (xics_phys)
+ rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ else
+ opal_rm_int_set_mfrr(get_hard_smp_processor_id(cpu),
+ IPI_PRIORITY);
}
/*
@@ -335,7 +340,7 @@ static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
* saved a copy of the XIRR in the PACA, it will be picked up by
* the host ICP driver.
*/
-static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
+static int kvmppc_check_passthru(u32 xisr, __be32 xirr, bool *again)
{
struct kvmppc_passthru_irqmap *pimap;
struct kvmppc_irq_map *irq_map;
@@ -354,7 +359,7 @@ static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
/* We're handling this interrupt, generic code doesn't need to */
local_paca->kvm_hstate.saved_xirr = 0;
- return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
+ return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap, again);
}
#else
@@ -373,14 +378,31 @@ static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
* -1 if there was a guest wakeup IPI (which has now been cleared)
* -2 if there is PCI passthrough external interrupt that was handled
*/
+static long kvmppc_read_one_intr(bool *again);
long kvmppc_read_intr(void)
{
+ long ret = 0;
+ long rc;
+ bool again;
+
+ do {
+ again = false;
+ rc = kvmppc_read_one_intr(&again);
+ if (rc && (ret == 0 || rc > ret))
+ ret = rc;
+ } while (again);
+ return ret;
+}
+
+static long kvmppc_read_one_intr(bool *again)
+{
unsigned long xics_phys;
u32 h_xirr;
__be32 xirr;
u32 xisr;
u8 host_ipi;
+ int64_t rc;
/* see if a host IPI is pending */
host_ipi = local_paca->kvm_hstate.host_ipi;
@@ -389,8 +411,14 @@ long kvmppc_read_intr(void)
/* Now read the interrupt from the ICP */
xics_phys = local_paca->kvm_hstate.xics_phys;
- if (unlikely(!xics_phys))
- return 1;
+ if (!xics_phys) {
+ /* Use OPAL to read the XIRR */
+ rc = opal_rm_int_get_xirr(&xirr, false);
+ if (rc < 0)
+ return 1;
+ } else {
+ xirr = _lwzcix(xics_phys + XICS_XIRR);
+ }
/*
* Save XIRR for later. Since we get control in reverse endian
@@ -398,7 +426,6 @@ long kvmppc_read_intr(void)
* host endian. Note that xirr is the value read from the
* XIRR register, while h_xirr is the host endian version.
*/
- xirr = _lwzcix(xics_phys + XICS_XIRR);
h_xirr = be32_to_cpu(xirr);
local_paca->kvm_hstate.saved_xirr = h_xirr;
xisr = h_xirr & 0xffffff;
@@ -417,8 +444,16 @@ long kvmppc_read_intr(void)
* If it is an IPI, clear the MFRR and EOI it.
*/
if (xisr == XICS_IPI) {
- _stbcix(xics_phys + XICS_MFRR, 0xff);
- _stwcix(xics_phys + XICS_XIRR, xirr);
+ if (xics_phys) {
+ _stbcix(xics_phys + XICS_MFRR, 0xff);
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+ } else {
+ opal_rm_int_set_mfrr(hard_smp_processor_id(), 0xff);
+ rc = opal_rm_int_eoi(h_xirr);
+ /* If rc > 0, there is another interrupt pending */
+ *again = rc > 0;
+ }
+
/*
* Need to ensure side effects of above stores
* complete before proceeding.
@@ -435,7 +470,11 @@ long kvmppc_read_intr(void)
/* We raced with the host,
* we need to resend that IPI, bummer
*/
- _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ if (xics_phys)
+ _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ else
+ opal_rm_int_set_mfrr(hard_smp_processor_id(),
+ IPI_PRIORITY);
/* Let side effects complete */
smp_mb();
return 1;
@@ -446,5 +485,5 @@ long kvmppc_read_intr(void)
return -1;
}
- return kvmppc_check_passthru(xisr, xirr);
+ return kvmppc_check_passthru(xisr, xirr, again);
}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index a0ea63a..06edc43 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -70,7 +70,11 @@ static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
hcpu = hcore << threads_shift;
kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
- icp_native_cause_ipi_rm(hcpu);
+ if (paca[hcpu].kvm_hstate.xics_phys)
+ icp_native_cause_ipi_rm(hcpu);
+ else
+ opal_rm_int_set_mfrr(get_hard_smp_processor_id(hcpu),
+ IPI_PRIORITY);
}
#else
static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
@@ -737,7 +741,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
unsigned long eoi_rc;
-static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
+static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
{
unsigned long xics_phys;
int64_t rc;
@@ -751,7 +755,12 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
/* EOI it */
xics_phys = local_paca->kvm_hstate.xics_phys;
- _stwcix(xics_phys + XICS_XIRR, xirr);
+ if (xics_phys) {
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+ } else {
+ rc = opal_rm_int_eoi(be32_to_cpu(xirr));
+ *again = rc > 0;
+ }
}
static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu)
@@ -809,9 +818,10 @@ static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
}
long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
- u32 xirr,
+ __be32 xirr,
struct kvmppc_irq_map *irq_map,
- struct kvmppc_passthru_irqmap *pimap)
+ struct kvmppc_passthru_irqmap *pimap,
+ bool *again)
{
struct kvmppc_xics *xics;
struct kvmppc_icp *icp;
@@ -825,7 +835,8 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
icp_rm_deliver_irq(xics, icp, irq);
/* EOI the interrupt */
- icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
+ icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
+ again);
if (check_too_hard(xics, icp) == H_TOO_HARD)
return 2;
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
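The aggregation loop described above (keep reading interrupts while the EOI reports another one pending, and return the largest non-zero result) can be exercised in isolation with a scripted stand-in for kvmppc_read_one_intr(); the loop body below is a direct transcription of the new kvmppc_read_intr():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scripted stand-in for kvmppc_read_one_intr(): replays canned
 * return values and "another interrupt pending" flags. */
struct intr_source {
	const long *rcs;
	const bool *agains;
	size_t n, i;
};

static long read_one_intr(struct intr_source *s, bool *again)
{
	long rc = s->rcs[s->i];

	*again = s->agains[s->i];
	if (s->i + 1 < s->n)
		s->i++;
	return rc;
}

/* Transcription of the new kvmppc_read_intr() loop: iterate while
 * an EOI reported another pending interrupt, keeping the largest
 * non-zero return value seen. */
static long read_intr(struct intr_source *s)
{
	long ret = 0;
	long rc;
	bool again;

	do {
		again = false;
		rc = read_one_intr(s, &again);
		if (rc && (ret == 0 || rc > ret))
			ret = rc;
	} while (again);
	return ret;
}
```

For example, a guest-wakeup IPI (-1) followed by a handled passthrough interrupt (2) yields 2, since 2 > -1; a lone -1 would be returned unchanged.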
* [PATCH 11/13] KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
POWER9 includes a new interrupt controller, called XIVE, which is
quite different from the XICS interrupt controller on POWER7 and
POWER8 machines. KVM-HV accesses the XICS directly in several places
in order to send and clear IPIs and handle interrupts from PCI
devices being passed through to the guest.
In order to make the transition to XIVE easier, OPAL firmware will
include an emulation of XICS on top of XIVE. Access to the emulated
XICS is via OPAL calls. The one complication is that the EOI
(end-of-interrupt) function can now return a value indicating that
another interrupt is pending; in this case, the XIVE will not signal
an interrupt in hardware to the CPU, and software is supposed to
acknowledge the new interrupt without waiting for another interrupt
to be delivered in hardware.
This adapts KVM-HV to use the OPAL calls on machines where there is
no XICS hardware. When there is no XICS, we look for a device-tree
node with "ibm,opal-intc" in its compatible property, which is how
OPAL indicates that it provides XICS emulation.
In order to handle the EOI return value, kvmppc_read_intr() has
become kvmppc_read_one_intr(), with a boolean variable passed by
reference which can be set by the EOI functions to indicate that
another interrupt is pending. The new kvmppc_read_intr() keeps
calling kvmppc_read_one_intr() until there are no more interrupts
to process. The return value from kvmppc_read_intr() is the
largest non-zero value of the returns from kvmppc_read_one_intr().
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 7 +++--
arch/powerpc/kvm/book3s_hv.c | 28 +++++++++++++++--
arch/powerpc/kvm/book3s_hv_builtin.c | 59 ++++++++++++++++++++++++++++++------
arch/powerpc/kvm/book3s_hv_rm_xics.c | 23 ++++++++++----
4 files changed, 96 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index f6e4964..a5b94be 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -483,9 +483,10 @@ extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
unsigned long host_irq);
extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
unsigned long host_irq);
-extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
- struct kvmppc_irq_map *irq_map,
- struct kvmppc_passthru_irqmap *pimap);
+extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, __be32 xirr,
+ struct kvmppc_irq_map *irq_map,
+ struct kvmppc_passthru_irqmap *pimap,
+ bool *again);
extern int h_ipi_redirect;
#else
static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ace89df..a1d2b5f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -55,6 +55,8 @@
#include <asm/hmi.h>
#include <asm/pnv-pci.h>
#include <asm/mmu.h>
+#include <asm/opal.h>
+#include <asm/xics.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/highmem.h>
@@ -63,6 +65,7 @@
#include <linux/irqbypass.h>
#include <linux/module.h>
#include <linux/compiler.h>
+#include <linux/of.h>
#include "book3s.h"
@@ -172,8 +175,12 @@ static bool kvmppc_ipi_thread(int cpu)
}
#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
- if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
- xics_wake_cpu(cpu);
+ if (cpu >= 0 && cpu < nr_cpu_ids) {
+ if (paca[cpu].kvm_hstate.xics_phys) {
+ xics_wake_cpu(cpu);
+ return true;
+ }
+ opal_int_set_mfrr(get_hard_smp_processor_id(cpu), IPI_PRIORITY);
return true;
}
#endif
@@ -3729,6 +3736,23 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;
+ /*
+ * We need a way of accessing the XICS interrupt controller,
+ * either directly, via paca[cpu].kvm_hstate.xics_phys, or
+ * indirectly, via OPAL.
+ */
+#ifdef CONFIG_SMP
+ if (!get_paca()->kvm_hstate.xics_phys) {
+ struct device_node *np;
+
+ np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
+ if (!np) {
+ pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
+ return -ENODEV;
+ }
+ }
+#endif
+
kvm_ops_hv.owner = THIS_MODULE;
kvmppc_hv_ops = &kvm_ops_hv;
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 37ed045..a09c917 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -26,6 +26,7 @@
#include <asm/dbell.h>
#include <asm/cputhreads.h>
#include <asm/io.h>
+#include <asm/opal.h>
#define KVM_CMA_CHUNK_ORDER 18
@@ -224,7 +225,11 @@ void kvmhv_rm_send_ipi(int cpu)
/* Else poke the target with an IPI */
xics_phys = paca[cpu].kvm_hstate.xics_phys;
- rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ if (xics_phys)
+ rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ else
+ opal_rm_int_set_mfrr(get_hard_smp_processor_id(cpu),
+ IPI_PRIORITY);
}
/*
@@ -335,7 +340,7 @@ static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
* saved a copy of the XIRR in the PACA, it will be picked up by
* the host ICP driver.
*/
-static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
+static int kvmppc_check_passthru(u32 xisr, __be32 xirr, bool *again)
{
struct kvmppc_passthru_irqmap *pimap;
struct kvmppc_irq_map *irq_map;
@@ -354,7 +359,7 @@ static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
/* We're handling this interrupt, generic code doesn't need to */
local_paca->kvm_hstate.saved_xirr = 0;
- return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
+ return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap, again);
}
#else
@@ -373,14 +378,31 @@ static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
* -1 if there was a guest wakeup IPI (which has now been cleared)
* -2 if there is PCI passthrough external interrupt that was handled
*/
+static long kvmppc_read_one_intr(bool *again);
long kvmppc_read_intr(void)
{
+ long ret = 0;
+ long rc;
+ bool again;
+
+ do {
+ again = false;
+ rc = kvmppc_read_one_intr(&again);
+ if (rc && (ret = 0 || rc > ret))
+ ret = rc;
+ } while (again);
+ return ret;
+}
+
+static long kvmppc_read_one_intr(bool *again)
+{
unsigned long xics_phys;
u32 h_xirr;
__be32 xirr;
u32 xisr;
u8 host_ipi;
+ int64_t rc;
/* see if a host IPI is pending */
host_ipi = local_paca->kvm_hstate.host_ipi;
@@ -389,8 +411,14 @@ long kvmppc_read_intr(void)
/* Now read the interrupt from the ICP */
xics_phys = local_paca->kvm_hstate.xics_phys;
- if (unlikely(!xics_phys))
- return 1;
+ if (!xics_phys) {
+ /* Use OPAL to read the XIRR */
+ rc = opal_rm_int_get_xirr(&xirr, false);
+ if (rc < 0)
+ return 1;
+ } else {
+ xirr = _lwzcix(xics_phys + XICS_XIRR);
+ }
/*
* Save XIRR for later. Since we get control in reverse endian
@@ -398,7 +426,6 @@ long kvmppc_read_intr(void)
* host endian. Note that xirr is the value read from the
* XIRR register, while h_xirr is the host endian version.
*/
- xirr = _lwzcix(xics_phys + XICS_XIRR);
h_xirr = be32_to_cpu(xirr);
local_paca->kvm_hstate.saved_xirr = h_xirr;
xisr = h_xirr & 0xffffff;
@@ -417,8 +444,16 @@ long kvmppc_read_intr(void)
* If it is an IPI, clear the MFRR and EOI it.
*/
if (xisr = XICS_IPI) {
- _stbcix(xics_phys + XICS_MFRR, 0xff);
- _stwcix(xics_phys + XICS_XIRR, xirr);
+ if (xics_phys) {
+ _stbcix(xics_phys + XICS_MFRR, 0xff);
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+ } else {
+ opal_rm_int_set_mfrr(hard_smp_processor_id(), 0xff);
+ rc = opal_rm_int_eoi(h_xirr);
+ /* If rc > 0, there is another interrupt pending */
+ *again = rc > 0;
+ }
+
/*
* Need to ensure side effects of above stores
* complete before proceeding.
@@ -435,7 +470,11 @@ long kvmppc_read_intr(void)
/* We raced with the host,
* we need to resend that IPI, bummer
*/
- _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ if (xics_phys)
+ _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ else
+ opal_rm_int_set_mfrr(hard_smp_processor_id(),
+ IPI_PRIORITY);
/* Let side effects complete */
smp_mb();
return 1;
@@ -446,5 +485,5 @@ long kvmppc_read_intr(void)
return -1;
}
- return kvmppc_check_passthru(xisr, xirr);
+ return kvmppc_check_passthru(xisr, xirr, again);
}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index a0ea63a..06edc43 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -70,7 +70,11 @@ static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
hcpu = hcore << threads_shift;
kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
- icp_native_cause_ipi_rm(hcpu);
+ if (paca[hcpu].kvm_hstate.xics_phys)
+ icp_native_cause_ipi_rm(hcpu);
+ else
+ opal_rm_int_set_mfrr(get_hard_smp_processor_id(hcpu),
+ IPI_PRIORITY);
}
#else
static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
@@ -737,7 +741,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
unsigned long eoi_rc;
-static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
+static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
{
unsigned long xics_phys;
int64_t rc;
@@ -751,7 +755,12 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
/* EOI it */
xics_phys = local_paca->kvm_hstate.xics_phys;
- _stwcix(xics_phys + XICS_XIRR, xirr);
+ if (xics_phys) {
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+ } else {
+ rc = opal_rm_int_eoi(be32_to_cpu(xirr));
+ *again = rc > 0;
+ }
}
static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu)
@@ -809,9 +818,10 @@ static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
}
long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
- u32 xirr,
+ __be32 xirr,
struct kvmppc_irq_map *irq_map,
- struct kvmppc_passthru_irqmap *pimap)
+ struct kvmppc_passthru_irqmap *pimap,
+ bool *again)
{
struct kvmppc_xics *xics;
struct kvmppc_icp *icp;
@@ -825,7 +835,8 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
icp_rm_deliver_irq(xics, icp, irq);
/* EOI the interrupt */
- icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
+ icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
+ again);
if (check_too_hard(xics, icp) == H_TOO_HARD)
return 2;
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
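The xics_phys test in the hunks above distinguishes direct XICS MMIO access from the OPAL real-mode fallback used on POWER9. A hypothetical sketch of that EOI logic, with `opal_rm_int_eoi_stub()` standing in for the real OPAL call (the stub and its return value are assumptions of this sketch, not the actual firmware interface):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the EOI fallback: when the XICS registers are not directly
 * mapped (xics_phys == 0), call into OPAL instead.  A positive return
 * code from the OPAL EOI call means another interrupt is pending, which
 * the caller signals via *again.
 */
static int64_t opal_rm_int_eoi_stub(uint32_t h_xirr)
{
	(void)h_xirr;
	return 1;	/* pretend another interrupt is pending */
}

static void eoi_ipi(uint64_t xics_phys, uint32_t h_xirr, bool *again)
{
	if (xics_phys) {
		/* direct MMIO path: _stwcix(xics_phys + XICS_XIRR, xirr) */
		return;
	}
	/* OPAL path: rc > 0 means re-run the interrupt scan */
	*again = opal_rm_int_eoi_stub(h_xirr) > 0;
}
```

Calling this with a zero `xics_phys` sets `*again`, so the real-mode entry path loops back and rescans for pending interrupts instead of returning to the guest.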
* [PATCH 12/13] KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
POWER9 replaces the various power-saving mode instructions on POWER8
(doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus
a register, PSSCR, which controls the depth of the power-saving mode.
This replaces the use of the nap instruction when threads are idle
during guest execution with the stop instruction, and adds code to
set PSSCR to a value which will allow an SMT mode switch while the
thread is idle (given that the core as a whole won't be idle in these
cases).
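The PSSCR value the patch loads before executing "stop" can be sketched as below. The bit masks follow the kernel's asm/reg.h definitions as I understand them; treat the exact values as assumptions of this sketch:

```c
#include <stdint.h>

/* Assumed PSSCR field masks (per asm/reg.h on POWER9) */
#define PSSCR_EC	0x00100000UL	/* exit criterion: wake per LPCR, via sreset */
#define PSSCR_ESL	0x00200000UL	/* enable state loss: allows SMT mode switch */

static uint64_t kvm_nap_psscr(void)
{
	/* requested level (RL) = 0: just stop dispatching, no deeper state */
	return PSSCR_EC | PSSCR_ESL;
}
```

Since both bits sit in the upper halfword, the assembly can load the value with a single `lis r3, (PSSCR_EC | PSSCR_ESL)@h`, leaving r3 = 0x00300000.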
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index acae5c3..e9eaff4 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -501,17 +501,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi r0, 0
beq 57f
li r3, (LPCR_PECEDH | LPCR_PECE0) >> 4
- mfspr r4, SPRN_LPCR
- rlwimi r4, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
- mtspr SPRN_LPCR, r4
- isync
- std r0, HSTATE_SCRATCH0(r13)
- ptesync
- ld r0, HSTATE_SCRATCH0(r13)
-1: cmpd r0, r0
- bne 1b
- nap
- b .
+ mfspr r5, SPRN_LPCR
+ rlwimi r5, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
+ b kvm_nap_sequence
57: li r0, 0
stbx r0, r3, r4
@@ -2256,6 +2248,17 @@ BEGIN_FTR_SECTION
ori r5, r5, LPCR_PECEDH
rlwimi r5, r3, 0, LPCR_PECEDP
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
+kvm_nap_sequence: /* desired LPCR value in r5 */
+BEGIN_FTR_SECTION
+ /*
+ * PSSCR bits: exit criterion = 1 (wakeup based on LPCR at sreset)
+ * enable state loss = 1 (allow SMT mode switch)
+ * requested level = 0 (just stop dispatching)
+ */
+ lis r3, (PSSCR_EC | PSSCR_ESL)@h
+ mtspr SPRN_PSSCR, r3
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
mtspr SPRN_LPCR,r5
isync
li r0, 0
@@ -2264,7 +2267,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
ld r0, HSTATE_SCRATCH0(r13)
1: cmpd r0, r0
bne 1b
+BEGIN_FTR_SECTION
nap
+FTR_SECTION_ELSE
+ PPC_STOP
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
b .
33: mr r4, r3
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH 13/13] KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 7:28 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18 7:28 UTC (permalink / raw)
To: kvm, kvm-ppc, linuxppc-dev
With POWER9, each CPU thread has its own MMU context and can be
in the host or a guest independently of the other threads; there is
still however a restriction that all threads must use the same type
of address translation, either radix tree or hashed page table (HPT).
Since we only support HPT guests on a HPT host at this point, we
can treat the threads as being independent, and avoid all of the
work of coordinating the CPU threads. To make this simpler, we
introduce a new threads_per_vcore() function that returns 1 on
POWER9 and threads_per_subcore on POWER7/8, and use that instead
of threads_per_subcore or threads_per_core in various places.
This also changes the value of the KVM_CAP_PPC_SMT capability on
POWER9 systems from 4 to 1, so that userspace will not try to
create VMs with multiple vcpus per vcore. (If userspace did create
a VM that thought it was in an SMT mode, the VM might try to use
the msgsndp instruction, which will not work as expected. In
future it may be possible to trap and emulate msgsndp in order to
allow VMs to think they are in an SMT mode, if only for the purpose
of allowing migration from POWER8 systems.)
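The vcpu-id-to-vcore mapping this implies can be sketched as follows. The feature test and subcore width are stubbed as plain parameters here; this is an illustration of the sizing arithmetic, not the kernel's actual interface:

```c
/*
 * threads_per_vcore() returns 1 on POWER9 (ISA v3.00) and
 * threads_per_subcore on POWER7/8.  With KVM_CAP_PPC_SMT reporting 1,
 * userspace creates one vcpu per vcore, so each vcpu id maps to its
 * own core number on POWER9.
 */
static int threads_per_vcore(int is_power9, int threads_per_subcore)
{
	return is_power9 ? 1 : threads_per_subcore;
}

static int vcpu_id_to_core(int id, int is_power9, int threads_per_subcore)
{
	return id / threads_per_vcore(is_power9, threads_per_subcore);
}
```

For example, with 8 threads per subcore, vcpu id 5 lands in core 0 on POWER8 but in core 5 on POWER9.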
With all this, we can now run guests on POWER9 as long as the host
is running with HPT translation. Since userspace currently has no
way to request radix tree translation for the guest, the guest has
no choice but to use HPT translation.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv.c | 36 +++++++++++++++++++++++++++++-------
arch/powerpc/kvm/powerpc.c | 11 +++++++----
2 files changed, 36 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a1d2b5f..591ac84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1569,6 +1569,20 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
return r;
}
+/*
+ * On POWER9, threads are independent and can be in different partitions.
+ * Therefore we consider each thread to be a subcore.
+ * There is a restriction that all threads have to be in the same
+ * MMU mode (radix or HPT), unfortunately, but since we only support
+ * HPT guests on a HPT host so far, that isn't an impediment yet.
+ */
+static int threads_per_vcore(void)
+{
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ return 1;
+ return threads_per_subcore;
+}
+
static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
{
struct kvmppc_vcore *vcore;
@@ -1583,7 +1597,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
init_swait_queue_head(&vcore->wq);
vcore->preempt_tb = TB_NIL;
vcore->lpcr = kvm->arch.lpcr;
- vcore->first_vcpuid = core * threads_per_subcore;
+ vcore->first_vcpuid = core * threads_per_vcore();
vcore->kvm = kvm;
INIT_LIST_HEAD(&vcore->preempt_list);
@@ -1746,7 +1760,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
int core;
struct kvmppc_vcore *vcore;
- core = id / threads_per_subcore;
+ core = id / threads_per_vcore();
if (core >= KVM_MAX_VCORES)
goto out;
@@ -2336,6 +2350,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
unsigned long cmd_bit, stat_bit;
int pcpu, thr;
int target_threads;
+ int controlled_threads;
/*
* Remove from the list any threads that have a signal pending
@@ -2354,11 +2369,18 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
vc->preempt_tb = TB_NIL;
/*
+ * Number of threads that we will be controlling: the same as
+ * the number of threads per subcore, except on POWER9,
+ * where it's 1 because the threads are (mostly) independent.
+ */
+ controlled_threads = threads_per_vcore();
+
+ /*
* Make sure we are running on primary threads, and that secondary
* threads are offline. Also check if the number of threads in this
* guest are greater than the current system threads per guest.
*/
- if ((threads_per_core > 1) &&
+ if ((controlled_threads > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
for_each_runnable_thread(i, vcpu, vc) {
vcpu->arch.ret = -EBUSY;
@@ -2374,7 +2396,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
*/
init_core_info(&core_info, vc);
pcpu = smp_processor_id();
- target_threads = threads_per_subcore;
+ target_threads = controlled_threads;
if (target_smt_mode && target_smt_mode < target_threads)
target_threads = target_smt_mode;
if (vc->num_threads < target_threads)
@@ -2410,7 +2432,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
smp_wmb();
}
pcpu = smp_processor_id();
- for (thr = 0; thr < threads_per_subcore; ++thr)
+ for (thr = 0; thr < controlled_threads; ++thr)
paca[pcpu + thr].kvm_hstate.kvm_split_mode = sip;
/* Initiate micro-threading (split-core) if required */
@@ -3380,9 +3402,9 @@ static int kvmppc_core_check_processor_compat_hv(void)
!cpu_has_feature(CPU_FTR_ARCH_206))
return -EIO;
/*
- * Disable KVM for Power9, untill the required bits merged.
+ * Disable KVM for Power9 in radix mode.
*/
- if (cpu_has_feature(CPU_FTR_ARCH_300))
+ if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
return -EIO;
return 0;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 70963c8..b5e4705 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -548,10 +548,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
#endif /* CONFIG_PPC_BOOK3S_64 */
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_SMT:
- if (hv_enabled)
- r = threads_per_subcore;
- else
- r = 0;
+ r = 0;
+ if (hv_enabled) {
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ r = 1;
+ else
+ r = threads_per_subcore;
+ }
break;
case KVM_CAP_PPC_RMA:
r = 0;
--
2.7.4
^ permalink raw reply related [flat|nested] 64+ messages in thread
* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 14:39 ` Aneesh Kumar K.V
-1 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:27 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
+
> + /* Global flush of TLBs and partition table caches for this lpid */
> + asm volatile("ptesync");
> + asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> + asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> +}
It would be nice to convert that 0x800 to a documented IS value, or better,
use radix__flush_tlb_pid()?
-aneesh
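For reference, the 0x800 questioned above is the tlbie RB value with the IS (invalidation selector) field set to 0b10, i.e. "invalidate all entries matching this LPID". Assuming IS occupies a two-bit field starting at bit 10 of RB (as the kernel's tlbiel loops suggest), the constant decodes as sketched here; the macro name is hypothetical:

```c
/*
 * Hypothetical named form of the magic constant: IS = 0b10 placed at
 * bit 10 of the tlbie/tlbiel RB operand.
 */
#define TLBIE_RB_IS_SHIFT	10
#define TLBIE_RB_INVAL_LPID	(2u << TLBIE_RB_IS_SHIFT)	/* IS = 0b10 */
```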
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 14:47 ` Aneesh Kumar K.V
0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:35 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
> Some special-purpose registers that were present and accessible
> by guests on POWER8 no longer exist on POWER9, so this adds
> feature sections to ensure that we don't try to context-switch
> them when going into or out of a guest on POWER9. These are
> all relatively obscure, rarely-used registers, but we had to
> context-switch them on POWER8 to avoid creating a covert channel.
> They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
We don't need to context-switch them even when running a POWER8 compat
guest?
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 ++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index dc25467..d422014 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -752,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
> BEGIN_FTR_SECTION
> ld r5, VCPU_MMCR + 24(r4)
> ld r6, VCPU_SIER(r4)
> + mtspr SPRN_MMCR2, r5
> + mtspr SPRN_SIER, r6
> +BEGIN_FTR_SECTION_NESTED(96)
> lwz r7, VCPU_PMC + 24(r4)
> lwz r8, VCPU_PMC + 28(r4)
> ld r9, VCPU_MMCR + 32(r4)
> - mtspr SPRN_MMCR2, r5
> - mtspr SPRN_SIER, r6
> mtspr SPRN_SPMC1, r7
> mtspr SPRN_SPMC2, r8
> mtspr SPRN_MMCRS, r9
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
> END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> mtspr SPRN_MMCR0, r3
> isync
> @@ -815,20 +817,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> mtspr SPRN_EBBHR, r8
> ld r5, VCPU_EBBRR(r4)
> ld r6, VCPU_BESCR(r4)
> - ld r7, VCPU_CSIGR(r4)
> - ld r8, VCPU_TACR(r4)
> + lwz r7, VCPU_GUEST_PID(r4)
> + ld r8, VCPU_WORT(r4)
> mtspr SPRN_EBBRR, r5
> mtspr SPRN_BESCR, r6
> - mtspr SPRN_CSIGR, r7
> - mtspr SPRN_TACR, r8
> + mtspr SPRN_PID, r7
> + mtspr SPRN_WORT, r8
> +BEGIN_FTR_SECTION
> ld r5, VCPU_TCSCR(r4)
> ld r6, VCPU_ACOP(r4)
> - lwz r7, VCPU_GUEST_PID(r4)
> - ld r8, VCPU_WORT(r4)
> + ld r7, VCPU_CSIGR(r4)
> + ld r8, VCPU_TACR(r4)
> mtspr SPRN_TCSCR, r5
> mtspr SPRN_ACOP, r6
> - mtspr SPRN_PID, r7
> - mtspr SPRN_WORT, r8
> + mtspr SPRN_CSIGR, r7
> + mtspr SPRN_TACR, r8
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> 8:
>
> /*
> @@ -1343,20 +1347,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> std r8, VCPU_EBBHR(r9)
> mfspr r5, SPRN_EBBRR
> mfspr r6, SPRN_BESCR
> - mfspr r7, SPRN_CSIGR
> - mfspr r8, SPRN_TACR
> + mfspr r7, SPRN_PID
> + mfspr r8, SPRN_WORT
> std r5, VCPU_EBBRR(r9)
> std r6, VCPU_BESCR(r9)
> - std r7, VCPU_CSIGR(r9)
> - std r8, VCPU_TACR(r9)
> + stw r7, VCPU_GUEST_PID(r9)
> + std r8, VCPU_WORT(r9)
> +BEGIN_FTR_SECTION
> mfspr r5, SPRN_TCSCR
> mfspr r6, SPRN_ACOP
> - mfspr r7, SPRN_PID
> - mfspr r8, SPRN_WORT
> + mfspr r7, SPRN_CSIGR
> + mfspr r8, SPRN_TACR
> std r5, VCPU_TCSCR(r9)
> std r6, VCPU_ACOP(r9)
> - stw r7, VCPU_GUEST_PID(r9)
> - std r8, VCPU_WORT(r9)
> + std r7, VCPU_CSIGR(r9)
> + std r8, VCPU_TACR(r9)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> /*
> * Restore various registers to 0, where non-zero values
> * set by the guest could disrupt the host.
> @@ -1365,12 +1371,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> mtspr SPRN_IAMR, r0
> mtspr SPRN_CIABR, r0
> mtspr SPRN_DAWRX, r0
> - mtspr SPRN_TCSCR, r0
> mtspr SPRN_WORT, r0
> +BEGIN_FTR_SECTION
> + mtspr SPRN_TCSCR, r0
> /* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
> li r0, 1
> sldi r0, r0, 31
> mtspr SPRN_MMCRS, r0
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> 8:
>
> /* Save and reset AMR and UAMOR before turning on the MMU */
> @@ -1504,15 +1512,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> stw r8, VCPU_PMC + 20(r9)
> BEGIN_FTR_SECTION
> mfspr r5, SPRN_SIER
> + std r5, VCPU_SIER(r9)
> +BEGIN_FTR_SECTION_NESTED(96)
> mfspr r6, SPRN_SPMC1
> mfspr r7, SPRN_SPMC2
> mfspr r8, SPRN_MMCRS
> - std r5, VCPU_SIER(r9)
> stw r6, VCPU_PMC + 24(r9)
> stw r7, VCPU_PMC + 28(r9)
> std r8, VCPU_MMCR + 32(r9)
> lis r4, 0x8000
> mtspr SPRN_MMCRS, r4
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
> END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> 22:
> /* Clear out SLB */
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 14:53 ` Aneesh Kumar K.V
0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:41 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
> POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
> and tlbiel (local tlbie) instructions. Both instructions get a
> set of new parameters (RIC, PRS and R) which appear as bits in the
> instruction word. The tlbiel instruction now has a second register
> operand, which contains a PID and/or LPID value if needed, and
> should otherwise contain 0.
>
> This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
> as well as older processors. Since we only handle HPT guests so
> far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
> word as on previous processors, so we don't need to conditionally
> execute different instructions depending on the processor.
>
> The local flush on first entry to a guest in book3s_hv_rmhandlers.S
> is a loop which depends on the number of TLB sets. Rather than
> using feature sections to set the number of iterations based on
> which CPU we're on, we now work out this number at VM creation time
> and store it in the kvm_arch struct. That will make it possible to
> get the number from the device tree in future, which will help with
> compatibility with future processors.
>
> Since mmu_partition_table_set_entry() does a global flush of the
> whole LPID, we don't need to do the TLB flush on first entry to the
> guest on each processor. Therefore we don't set all bits in the
> tlb_need_flush bitmap on VM startup on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/include/asm/kvm_host.h | 1 +
> arch/powerpc/kernel/asm-offsets.c | 1 +
> arch/powerpc/kvm/book3s_hv.c | 17 ++++++++++++++++-
> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 10 ++++++++--
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++------
> 5 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 0d94608..ea78864 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -244,6 +244,7 @@ struct kvm_arch_memory_slot {
> struct kvm_arch {
> unsigned int lpid;
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + unsigned int tlb_sets;
> unsigned long hpt_virt;
> struct revmap_entry *revmap;
> atomic64_t mmio_update;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 494241b..b9c8386 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -487,6 +487,7 @@ int main(void)
>
> /* book3s */
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
> DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
> DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
> DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 59e18dfb..8395a7f 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3260,8 +3260,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> * Since we don't flush the TLB when tearing down a VM,
> * and this lpid might have previously been used,
> * make sure we flush on each core before running the new VM.
> + * On POWER9, the tlbie in mmu_partition_table_set_entry()
> + * does this flush for us.
> */
> - cpumask_setall(&kvm->arch.need_tlb_flush);
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + cpumask_setall(&kvm->arch.need_tlb_flush);
>
> /* Start out with the default set of hcalls enabled */
> memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
> @@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> kvm->arch.lpcr = lpcr;
>
> /*
> + * Work out how many sets the TLB has, for the use of
> + * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> + */
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + kvm->arch.tlb_sets = 256; /* POWER9 */
> + else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + kvm->arch.tlb_sets = 512; /* POWER8 */
> + else
> + kvm->arch.tlb_sets = 128; /* POWER7 */
> +
We have
#define POWER7_TLB_SETS 128 /* # sets in POWER7 TLB */
#define POWER8_TLB_SETS 512 /* # sets in POWER8 TLB */
#define POWER9_TLB_SETS_HASH 256 /* # sets in POWER9 TLB Hash mode */
#define POWER9_TLB_SETS_RADIX 128 /* # sets in POWER9 TLB Radix mode */
Maybe use those instead of open-coding?
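A sketch of that suggestion: pick arch.tlb_sets from the named constants rather than open-coded numbers. The feature tests are stubbed as flags here, and the constants are reproduced for illustration (the real ones live in asm/mmu.h):

```c
/* Constants as quoted above from asm/mmu.h */
#define POWER7_TLB_SETS		128	/* # sets in POWER7 TLB */
#define POWER8_TLB_SETS		512	/* # sets in POWER8 TLB */
#define POWER9_TLB_SETS_HASH	256	/* # sets in POWER9 TLB, hash mode */

static unsigned int kvm_tlb_sets(int has_arch_300, int has_arch_207s)
{
	if (has_arch_300)
		return POWER9_TLB_SETS_HASH;	/* HPT guests only, so far */
	if (has_arch_207s)
		return POWER8_TLB_SETS;
	return POWER7_TLB_SETS;
}
```

This keeps the selection logic identical to the patch while making each number self-documenting.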
> + /*
> * Track that we now have a HV mode VM active. This blocks secondary
> * CPU threads from coming online.
> */
> @@ -3728,3 +3742,4 @@ module_exit(kvmppc_book3s_exit_hv);
> MODULE_LICENSE("GPL");
> MODULE_ALIAS_MISCDEV(KVM_MINOR);
> MODULE_ALIAS("devname:kvm");
> +
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 1179e40..9ef3c4b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -424,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
> {
> long i;
>
> + /*
> + * We use the POWER9 5-operand versions of tlbie and tlbiel here.
> + * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
> + * the RS field, this is backwards-compatible with P7 and P8.
> + */
> if (global) {
> while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
> cpu_relax();
> if (need_sync)
> asm volatile("ptesync" : : : "memory");
> for (i = 0; i < npages; ++i)
> - asm volatile(PPC_TLBIE(%1,%0) : :
> + asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
> "r" (rbvalues[i]), "r" (kvm->arch.lpid));
> asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> kvm->arch.tlbie_lock = 0;
> @@ -438,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
> if (need_sync)
> asm volatile("ptesync" : : : "memory");
> for (i = 0; i < npages; ++i)
> - asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
> + asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
> + "r" (rbvalues[i]), "r" (0));
> asm volatile("ptesync" : : : "memory");
> }
> }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 219a04f..acae5c3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -613,12 +613,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> stdcx. r7,0,r6
> bne 23b
> /* Flush the TLB of any entries for this LPID */
> - /* use arch 2.07S as a proxy for POWER8 */
> -BEGIN_FTR_SECTION
> - li r6,512 /* POWER8 has 512 sets */
> -FTR_SECTION_ELSE
> - li r6,128 /* POWER7 has 128 sets */
> -ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
> + lwz r6,KVM_TLB_SETS(r9)
> + li r0,0 /* RS for P9 version of tlbiel */
> mtctr r6
> li r7,0x800 /* IS field = 0b10 */
> ptesync
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-18 14:59 ` Aneesh Kumar K.V
0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:47 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
> On POWER9, the msgsnd instruction is able to send interrupts to
> other cores, as well as other threads on the local core. Since
> msgsnd is generally simpler and faster than sending an IPI via the
> XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/kvm/book3s_hv.c | 11 ++++++++++-
> arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 8395a7f..ace89df 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -147,12 +147,21 @@ static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
>
> static bool kvmppc_ipi_thread(int cpu)
> {
> + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> +
> + /* On POWER9 we can use msgsnd to IPI any cpu */
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + msg |= get_hard_smp_processor_id(cpu);
> + smp_mb();
> + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> + return true;
> + }
> +
> /* On POWER8 for IPIs to threads in the same core, use msgsnd */
> if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
> preempt_disable();
> if (cpu_first_thread_sibling(cpu) ==
> cpu_first_thread_sibling(smp_processor_id())) {
> - unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> msg |= cpu_thread_in_core(cpu);
> smp_mb();
> __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 0c84d6b..37ed045 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
> void kvmhv_rm_send_ipi(int cpu)
> {
> unsigned long xics_phys;
> + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>
> - /* On POWER8 for IPIs to threads in the same core, use msgsnd */
> + /* On POWER9 we can use msgsnd for any destination cpu. */
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + msg |= get_hard_smp_processor_id(cpu);
> + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> + return;
Do we need a "sync" there before msgsnd?
> + }
> + /* On POWER8 for IPIs to threads in the same core, use msgsnd. */
> if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> cpu_first_thread_sibling(cpu) ==
> cpu_first_thread_sibling(raw_smp_processor_id())) {
> - unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> msg |= cpu_thread_in_core(cpu);
> __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> return;
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
@ 2016-11-18 14:47 ` Aneesh Kumar K.V
0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:47 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
> Some special-purpose registers that were present and accessible
> by guests on POWER8 no longer exist on POWER9, so this adds
> feature sections to ensure that we don't try to context-switch
> them when going into or out of a guest on POWER9. These are
> all relatively obscure, rarely-used registers, but we had to
> context-switch them on POWER8 to avoid creating a covert channel.
> They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
We don't need to context-switch them even when running a POWER8 compat
guest?
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 ++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index dc25467..d422014 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -752,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
> BEGIN_FTR_SECTION
> ld r5, VCPU_MMCR + 24(r4)
> ld r6, VCPU_SIER(r4)
> + mtspr SPRN_MMCR2, r5
> + mtspr SPRN_SIER, r6
> +BEGIN_FTR_SECTION_NESTED(96)
> lwz r7, VCPU_PMC + 24(r4)
> lwz r8, VCPU_PMC + 28(r4)
> ld r9, VCPU_MMCR + 32(r4)
> - mtspr SPRN_MMCR2, r5
> - mtspr SPRN_SIER, r6
> mtspr SPRN_SPMC1, r7
> mtspr SPRN_SPMC2, r8
> mtspr SPRN_MMCRS, r9
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
> END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> mtspr SPRN_MMCR0, r3
> isync
> @@ -815,20 +817,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> mtspr SPRN_EBBHR, r8
> ld r5, VCPU_EBBRR(r4)
> ld r6, VCPU_BESCR(r4)
> - ld r7, VCPU_CSIGR(r4)
> - ld r8, VCPU_TACR(r4)
> + lwz r7, VCPU_GUEST_PID(r4)
> + ld r8, VCPU_WORT(r4)
> mtspr SPRN_EBBRR, r5
> mtspr SPRN_BESCR, r6
> - mtspr SPRN_CSIGR, r7
> - mtspr SPRN_TACR, r8
> + mtspr SPRN_PID, r7
> + mtspr SPRN_WORT, r8
> +BEGIN_FTR_SECTION
> ld r5, VCPU_TCSCR(r4)
> ld r6, VCPU_ACOP(r4)
> - lwz r7, VCPU_GUEST_PID(r4)
> - ld r8, VCPU_WORT(r4)
> + ld r7, VCPU_CSIGR(r4)
> + ld r8, VCPU_TACR(r4)
> mtspr SPRN_TCSCR, r5
> mtspr SPRN_ACOP, r6
> - mtspr SPRN_PID, r7
> - mtspr SPRN_WORT, r8
> + mtspr SPRN_CSIGR, r7
> + mtspr SPRN_TACR, r8
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> 8:
>
> /*
> @@ -1343,20 +1347,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> std r8, VCPU_EBBHR(r9)
> mfspr r5, SPRN_EBBRR
> mfspr r6, SPRN_BESCR
> - mfspr r7, SPRN_CSIGR
> - mfspr r8, SPRN_TACR
> + mfspr r7, SPRN_PID
> + mfspr r8, SPRN_WORT
> std r5, VCPU_EBBRR(r9)
> std r6, VCPU_BESCR(r9)
> - std r7, VCPU_CSIGR(r9)
> - std r8, VCPU_TACR(r9)
> + stw r7, VCPU_GUEST_PID(r9)
> + std r8, VCPU_WORT(r9)
> +BEGIN_FTR_SECTION
> mfspr r5, SPRN_TCSCR
> mfspr r6, SPRN_ACOP
> - mfspr r7, SPRN_PID
> - mfspr r8, SPRN_WORT
> + mfspr r7, SPRN_CSIGR
> + mfspr r8, SPRN_TACR
> std r5, VCPU_TCSCR(r9)
> std r6, VCPU_ACOP(r9)
> - stw r7, VCPU_GUEST_PID(r9)
> - std r8, VCPU_WORT(r9)
> + std r7, VCPU_CSIGR(r9)
> + std r8, VCPU_TACR(r9)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> /*
> * Restore various registers to 0, where non-zero values
> * set by the guest could disrupt the host.
> @@ -1365,12 +1371,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> mtspr SPRN_IAMR, r0
> mtspr SPRN_CIABR, r0
> mtspr SPRN_DAWRX, r0
> - mtspr SPRN_TCSCR, r0
> mtspr SPRN_WORT, r0
> +BEGIN_FTR_SECTION
> + mtspr SPRN_TCSCR, r0
> /* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
> li r0, 1
> sldi r0, r0, 31
> mtspr SPRN_MMCRS, r0
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> 8:
>
> /* Save and reset AMR and UAMOR before turning on the MMU */
> @@ -1504,15 +1512,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> stw r8, VCPU_PMC + 20(r9)
> BEGIN_FTR_SECTION
> mfspr r5, SPRN_SIER
> + std r5, VCPU_SIER(r9)
> +BEGIN_FTR_SECTION_NESTED(96)
> mfspr r6, SPRN_SPMC1
> mfspr r7, SPRN_SPMC2
> mfspr r8, SPRN_MMCRS
> - std r5, VCPU_SIER(r9)
> stw r6, VCPU_PMC + 24(r9)
> stw r7, VCPU_PMC + 28(r9)
> std r8, VCPU_MMCR + 32(r9)
> lis r4, 0x8000
> mtspr SPRN_MMCRS, r4
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
> END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> 22:
> /* Clear out SLB */
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
@ 2016-11-18 14:53 ` Aneesh Kumar K.V
0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:53 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
> POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
> and tlbiel (local tlbie) instructions. Both instructions get a
> set of new parameters (RIC, PRS and R) which appear as bits in the
> instruction word. The tlbiel instruction now has a second register
> operand, which contains a PID and/or LPID value if needed, and
> should otherwise contain 0.
>
> This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
> as well as older processors. Since we only handle HPT guests so
> far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
> word as on previous processors, so we don't need to conditionally
> execute different instructions depending on the processor.
>
> The local flush on first entry to a guest in book3s_hv_rmhandlers.S
> is a loop which depends on the number of TLB sets. Rather than
> using feature sections to set the number of iterations based on
> which CPU we're on, we now work out this number at VM creation time
> and store it in the kvm_arch struct. That will make it possible to
> get the number from the device tree in future, which will help with
> compatibility with future processors.
>
> Since mmu_partition_table_set_entry() does a global flush of the
> whole LPID, we don't need to do the TLB flush on first entry to the
> guest on each processor. Therefore we don't set all bits in the
> tlb_need_flush bitmap on VM startup on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/include/asm/kvm_host.h | 1 +
> arch/powerpc/kernel/asm-offsets.c | 1 +
> arch/powerpc/kvm/book3s_hv.c | 17 ++++++++++++++++-
> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 10 ++++++++--
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++------
> 5 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 0d94608..ea78864 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -244,6 +244,7 @@ struct kvm_arch_memory_slot {
> struct kvm_arch {
> unsigned int lpid;
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + unsigned int tlb_sets;
> unsigned long hpt_virt;
> struct revmap_entry *revmap;
> atomic64_t mmio_update;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 494241b..b9c8386 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -487,6 +487,7 @@ int main(void)
>
> /* book3s */
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
> DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
> DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
> DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 59e18dfb..8395a7f 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3260,8 +3260,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> * Since we don't flush the TLB when tearing down a VM,
> * and this lpid might have previously been used,
> * make sure we flush on each core before running the new VM.
> + * On POWER9, the tlbie in mmu_partition_table_set_entry()
> + * does this flush for us.
> */
> - cpumask_setall(&kvm->arch.need_tlb_flush);
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + cpumask_setall(&kvm->arch.need_tlb_flush);
>
> /* Start out with the default set of hcalls enabled */
> memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
> @@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> kvm->arch.lpcr = lpcr;
>
> /*
> + * Work out how many sets the TLB has, for the use of
> + * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> + */
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + kvm->arch.tlb_sets = 256; /* POWER9 */
> + else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + kvm->arch.tlb_sets = 512; /* POWER8 */
> + else
> + kvm->arch.tlb_sets = 128; /* POWER7 */
> +
We have
#define POWER7_TLB_SETS 128 /* # sets in POWER7 TLB */
#define POWER8_TLB_SETS 512 /* # sets in POWER8 TLB */
#define POWER9_TLB_SETS_HASH 256 /* # sets in POWER9 TLB Hash mode */
#define POWER9_TLB_SETS_RADIX 128 /* # sets in POWER9 TLB Radix mode */
Maybe use that instead of open-coding?
> + /*
> * Track that we now have a HV mode VM active. This blocks secondary
> * CPU threads from coming online.
> */
> @@ -3728,3 +3742,4 @@ module_exit(kvmppc_book3s_exit_hv);
> MODULE_LICENSE("GPL");
> MODULE_ALIAS_MISCDEV(KVM_MINOR);
> MODULE_ALIAS("devname:kvm");
> +
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 1179e40..9ef3c4b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -424,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
> {
> long i;
>
> + /*
> + * We use the POWER9 5-operand versions of tlbie and tlbiel here.
> + * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
> + * the RS field, this is backwards-compatible with P7 and P8.
> + */
> if (global) {
> while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
> cpu_relax();
> if (need_sync)
> asm volatile("ptesync" : : : "memory");
> for (i = 0; i < npages; ++i)
> - asm volatile(PPC_TLBIE(%1,%0) : :
> + asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
> "r" (rbvalues[i]), "r" (kvm->arch.lpid));
> asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> kvm->arch.tlbie_lock = 0;
> @@ -438,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
> if (need_sync)
> asm volatile("ptesync" : : : "memory");
> for (i = 0; i < npages; ++i)
> - asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
> + asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
> + "r" (rbvalues[i]), "r" (0));
> asm volatile("ptesync" : : : "memory");
> }
> }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 219a04f..acae5c3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -613,12 +613,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> stdcx. r7,0,r6
> bne 23b
> /* Flush the TLB of any entries for this LPID */
> - /* use arch 2.07S as a proxy for POWER8 */
> -BEGIN_FTR_SECTION
> - li r6,512 /* POWER8 has 512 sets */
> -FTR_SECTION_ELSE
> - li r6,128 /* POWER7 has 128 sets */
> -ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
> + lwz r6,KVM_TLB_SETS(r9)
> + li r0,0 /* RS for P9 version of tlbiel */
> mtctr r6
> li r7,0x800 /* IS field = 0b10 */
> ptesync
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
2016-11-18 14:53 ` Aneesh Kumar K.V
@ 2016-11-18 21:57 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 64+ messages in thread
From: Benjamin Herrenschmidt @ 2016-11-18 21:57 UTC (permalink / raw)
To: Aneesh Kumar K.V, Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
On Fri, 2016-11-18 at 20:11 +0530, Aneesh Kumar K.V wrote:
> > + * Work out how many sets the TLB has, for the use of
> > + * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > + */
> > + if (cpu_has_feature(CPU_FTR_ARCH_300))
> > + kvm->arch.tlb_sets = 256; /* POWER9 */
> > + else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > + kvm->arch.tlb_sets = 512; /* POWER8 */
> > + else
> > + kvm->arch.tlb_sets = 128; /* POWER7 */
> > +
>
> We have
>
> #define POWER7_TLB_SETS 128 /* # sets in POWER7 TLB */
> #define POWER8_TLB_SETS 512 /* # sets in POWER8 TLB */
> #define POWER9_TLB_SETS_HASH 256 /* # sets in POWER9 TLB Hash mode */
> #define POWER9_TLB_SETS_RADIX 128 /* # sets in POWER9 TLB Radix mode */
>
> May be use that instead of opencoding ?
Both are bad and are going to kill us for future backward
compatibility.
These should be a device-tree property. We can fall back to hard-wired
values if it doesn't exist, but we should at least look for one.
Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
in the CPU node, so let's create a new one instead, with 2 entries
(hash vs. radix) or 2 new ones, one for hash and one for radix (when
available).
Cheers,
Ben.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-19 0:38 ` Balbir Singh
-1 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19 0:38 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
On 18/11/16 18:28, Paul Mackerras wrote:
> This adapts the KVM-HV hashed page table (HPT) code to read and write
> HPT entries in the new format defined in Power ISA v3.00 on POWER9
> machines. The new format moves the B (segment size) field from the
> first doubleword to the second, and trims some bits from the AVA
> (abbreviated virtual address) and ARPN (abbreviated real page number)
> fields. As far as possible, the conversion is done when reading or
> writing the HPT entries, and the rest of the code continues to use
> the old format.
>
I had a version to do this, but it assumed we supported both PTE formats (old
and new) and that kvm would be aware of the format supported (the one you reviewed).
This is much nicer now that we know that we support *only* the older format
for KVM guests.
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 39 ++++++++++----
> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 101 +++++++++++++++++++++++++-----------
> 2 files changed, 100 insertions(+), 40 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 7755bd0..20a8e8e 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -314,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> struct kvmppc_slb *slbe;
> unsigned long slb_v;
> unsigned long pp, key;
> - unsigned long v, gr;
> + unsigned long v, orig_v, gr;
> __be64 *hptep;
> int index;
> int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
> @@ -339,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> return -ENOENT;
> }
> hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> - v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> + v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
> gr = kvm->arch.revmap[index].guest_rpte;
>
> - unlock_hpte(hptep, v);
> + unlock_hpte(hptep, orig_v);
> preempt_enable();
>
> gpte->eaddr = eaddr;
> @@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> {
> struct kvm *kvm = vcpu->kvm;
> unsigned long hpte[3], r;
> + unsigned long hnow_v, hnow_r;
> __be64 *hptep;
> unsigned long mmu_seq, psize, pte_size;
> unsigned long gpa_base, gfn_base;
> @@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> unlock_hpte(hptep, hpte[0]);
> preempt_enable();
>
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
> + hpte[1] = hpte_new_to_old_r(hpte[1]);
> + }
I think we can avoid this if we avoid the conversion in kvmppc_hpte_hv_fault().
If we decide not to do this, then gpa will need to use a new mask to extract
the correct gpa.
> if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
> hpte[1] != vcpu->arch.pgfault_hpte[1])
> return RESUME_GUEST;
> @@ -599,9 +606,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> preempt_disable();
> while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
> cpu_relax();
> - if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
> - be64_to_cpu(hptep[1]) != hpte[1] ||
> - rev->guest_rpte != hpte[2])
> + hnow_v = be64_to_cpu(hptep[0]);
> + hnow_r = be64_to_cpu(hptep[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
> + hnow_r = hpte_new_to_old_r(hnow_r);
> + }
> + if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
> + rev->guest_rpte != hpte[2])
These changes can be avoided as well (based on the comment above)
> /* HPTE has been changed under us; let the guest retry */
> goto out_unlock;
> hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
> @@ -632,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
> }
>
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + r = hpte_old_to_new_r(hpte[0], r);
> + hpte[0] = hpte_old_to_new_v(hpte[0]);
> + }
> hptep[1] = cpu_to_be64(r);
> eieio();
> __unlock_hpte(hptep, hpte[0]);
> @@ -1183,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
> unsigned long *hpte, struct revmap_entry *revp,
> int want_valid, int first_pass)
> {
> - unsigned long v, r;
> + unsigned long v, r, hr;
> unsigned long rcbits_unset;
> int ok = 1;
> int valid, dirty;
> @@ -1210,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
> while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
> cpu_relax();
> v = be64_to_cpu(hptp[0]);
> + hr = be64_to_cpu(hptp[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + v = hpte_new_to_old_v(v, hr);
> + hr = hpte_new_to_old_r(hr);
> + }
>
> /* re-evaluate valid and dirty from synchronized HPTE value */
> valid = !!(v & HPTE_V_VALID);
> @@ -1217,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>
> /* Harvest R and C into guest view if necessary */
> rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
> - if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
> - revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
> + if (valid && (rcbits_unset & hr)) {
> + revp->guest_rpte |= (hr &
> (HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
> dirty = 1;
> }
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 02786b3..1179e40 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -364,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
> }
> }
>
> + /* Convert to new format on P9 */
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + ptel = hpte_old_to_new_r(pteh, ptel);
> + pteh = hpte_old_to_new_v(pteh);
> + }
So much nicer when we support just one format; otherwise my patches did a whole
bunch of unnecessary changes.
> hpte[1] = cpu_to_be64(ptel);
>
> /* Write the first HPTE dword, unlocking the HPTE and making it valid */
> @@ -445,27 +450,31 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
> __be64 *hpte;
> unsigned long v, r, rb;
> struct revmap_entry *rev;
> - u64 pte;
> + u64 pte, orig_pte, pte_r;
>
> if (pte_index >= kvm->arch.hpt_npte)
> return H_PARAMETER;
> hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
> while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
> cpu_relax();
> - pte = be64_to_cpu(hpte[0]);
> + pte = orig_pte = be64_to_cpu(hpte[0]);
> + pte_r = be64_to_cpu(hpte[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + pte = hpte_new_to_old_v(pte, pte_r);
> + pte_r = hpte_new_to_old_r(pte_r);
> + }
> if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
> ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
> - __unlock_hpte(hpte, pte);
> + __unlock_hpte(hpte, orig_pte);
> return H_NOT_FOUND;
> }
>
> rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
> v = pte & ~HPTE_V_HVLOCK;
> - pte = be64_to_cpu(hpte[1]);
> if (v & HPTE_V_VALID) {
> hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
> - rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
> + rb = compute_tlbie_rb(v, pte_r, pte_index);
This is good; I think it makes sense to retain the old format for compute_tlbie_rb().
> do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
> /*
> * The reference (R) and change (C) bits in a HPT
> @@ -483,7 +492,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
> note_hpte_modification(kvm, rev);
> unlock_hpte(hpte, 0);
>
> - if (is_mmio_hpte(v, pte))
> + if (is_mmio_hpte(v, pte_r))
> atomic64_inc(&kvm->arch.mmio_update);
>
> if (v & HPTE_V_ABSENT)
> @@ -546,6 +555,10 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
> found = 0;
> hp0 = be64_to_cpu(hp[0]);
> hp1 = be64_to_cpu(hp[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hp0 = hpte_new_to_old_v(hp0, hp1);
> + hp1 = hpte_new_to_old_r(hp1);
> + }
> if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
> switch (flags & 3) {
> case 0: /* absolute */
> @@ -583,8 +596,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>
> /* leave it locked */
> hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
> - tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
> - be64_to_cpu(hp[1]), pte_index);
> + tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
> indexes[n] = j;
> hptes[n] = hp;
> revs[n] = rev;
* Re: [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
@ 2016-11-19 0:38 ` Balbir Singh
0 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19 0:38 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
On 18/11/16 18:28, Paul Mackerras wrote:
> This adapts the KVM-HV hashed page table (HPT) code to read and write
> HPT entries in the new format defined in Power ISA v3.00 on POWER9
> machines. The new format moves the B (segment size) field from the
> first doubleword to the second, and trims some bits from the AVA
> (abbreviated virtual address) and ARPN (abbreviated real page number)
> fields. As far as possible, the conversion is done when reading or
> writing the HPT entries, and the rest of the code continues to use
> the old format.
>
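[As a rough illustration of what such conversion helpers do — shuffling the B (segment size) field between the two doublewords — here is a standalone userspace sketch. The shift/mask values below are made-up stand-ins; the real HPTE_* constants come from Power ISA v3.00 and the kernel headers.]

```c
#include <stdint.h>

/* Illustrative stand-ins only; real positions are in the kernel headers. */
#define B_V_SHIFT   62                       /* B field in old-format dword 0 */
#define B_R_SHIFT   58                       /* B field in new-format dword 1 */
#define B_R_MASK    (3ULL << B_R_SHIFT)

/* old -> new: drop B from the first doubleword... */
static uint64_t hpte_old_to_new_v(uint64_t v)
{
	return v & ~(3ULL << B_V_SHIFT);
}

/* ...and move it into the second doubleword (those bits were trimmed
 * from the ARPN in the new format). */
static uint64_t hpte_old_to_new_r(uint64_t v, uint64_t r)
{
	return (r & ~B_R_MASK) | (((v >> B_V_SHIFT) & 3ULL) << B_R_SHIFT);
}

/* new -> old: reinsert B into the first doubleword... */
static uint64_t hpte_new_to_old_v(uint64_t v, uint64_t r)
{
	return v | (((r & B_R_MASK) >> B_R_SHIFT) << B_V_SHIFT);
}

/* ...and clear it out of the second. */
static uint64_t hpte_new_to_old_r(uint64_t r)
{
	return r & ~B_R_MASK;
}
```

With these definitions the two directions round-trip, which is the property the KVM code relies on when it converts at the read/write boundary and keeps the old format internally.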
I had a version to do this, but it assumed we supported both PTE formats (old
and new) and that KVM would be aware of which format was in use (the version you reviewed).
This is much nicer now that we know we support *only* the older format
for KVM guests.
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 39 ++++++++++----
> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 101 +++++++++++++++++++++++++-----------
> 2 files changed, 100 insertions(+), 40 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 7755bd0..20a8e8e 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -314,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> struct kvmppc_slb *slbe;
> unsigned long slb_v;
> unsigned long pp, key;
> - unsigned long v, gr;
> + unsigned long v, orig_v, gr;
> __be64 *hptep;
> int index;
> int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
> @@ -339,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> return -ENOENT;
> }
> hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> - v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> + v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
> gr = kvm->arch.revmap[index].guest_rpte;
>
> - unlock_hpte(hptep, v);
> + unlock_hpte(hptep, orig_v);
> preempt_enable();
>
> gpte->eaddr = eaddr;
> @@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> {
> struct kvm *kvm = vcpu->kvm;
> unsigned long hpte[3], r;
> + unsigned long hnow_v, hnow_r;
> __be64 *hptep;
> unsigned long mmu_seq, psize, pte_size;
> unsigned long gpa_base, gfn_base;
> @@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> unlock_hpte(hptep, hpte[0]);
> preempt_enable();
>
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
> + hpte[1] = hpte_new_to_old_r(hpte[1]);
> + }
I think we can avoid this if we avoid the conversion in kvmppc_hpte_hv_fault().
If we decide not to do that, then gpa will need a new mask to extract
the correct value.
> if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
> hpte[1] != vcpu->arch.pgfault_hpte[1])
> return RESUME_GUEST;
> @@ -599,9 +606,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> preempt_disable();
> while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
> cpu_relax();
> - if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
> - be64_to_cpu(hptep[1]) != hpte[1] ||
> - rev->guest_rpte != hpte[2])
> + hnow_v = be64_to_cpu(hptep[0]);
> + hnow_r = be64_to_cpu(hptep[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
> + hnow_r = hpte_new_to_old_r(hnow_r);
> + }
> + if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
> + rev->guest_rpte != hpte[2])
These changes can be avoided as well (based on the comment above)
> /* HPTE has been changed under us; let the guest retry */
> goto out_unlock;
> hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
> @@ -632,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
> }
>
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + r = hpte_old_to_new_r(hpte[0], r);
> + hpte[0] = hpte_old_to_new_v(hpte[0]);
> + }
> hptep[1] = cpu_to_be64(r);
> eieio();
> __unlock_hpte(hptep, hpte[0]);
> @@ -1183,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
> unsigned long *hpte, struct revmap_entry *revp,
> int want_valid, int first_pass)
> {
> - unsigned long v, r;
> + unsigned long v, r, hr;
> unsigned long rcbits_unset;
> int ok = 1;
> int valid, dirty;
> @@ -1210,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
> while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
> cpu_relax();
> v = be64_to_cpu(hptp[0]);
> + hr = be64_to_cpu(hptp[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + v = hpte_new_to_old_v(v, hr);
> + hr = hpte_new_to_old_r(hr);
> + }
>
> /* re-evaluate valid and dirty from synchronized HPTE value */
> valid = !!(v & HPTE_V_VALID);
> @@ -1217,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>
> /* Harvest R and C into guest view if necessary */
> rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
> - if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
> - revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
> + if (valid && (rcbits_unset & hr)) {
> + revp->guest_rpte |= (hr &
> (HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
> dirty = 1;
> }
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 02786b3..1179e40 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -364,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
> }
> }
>
> + /* Convert to new format on P9 */
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + ptel = hpte_old_to_new_r(pteh, ptel);
> + pteh = hpte_old_to_new_v(pteh);
> + }
So much nicer when we support just one format; otherwise my patches needed a whole
bunch of extra changes to handle both.
> hpte[1] = cpu_to_be64(ptel);
>
> /* Write the first HPTE dword, unlocking the HPTE and making it valid */
> @@ -445,27 +450,31 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
> __be64 *hpte;
> unsigned long v, r, rb;
> struct revmap_entry *rev;
> - u64 pte;
> + u64 pte, orig_pte, pte_r;
>
> if (pte_index >= kvm->arch.hpt_npte)
> return H_PARAMETER;
> hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
> while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
> cpu_relax();
> - pte = be64_to_cpu(hpte[0]);
> + pte = orig_pte = be64_to_cpu(hpte[0]);
> + pte_r = be64_to_cpu(hpte[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + pte = hpte_new_to_old_v(pte, pte_r);
> + pte_r = hpte_new_to_old_r(pte_r);
> + }
> if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
> ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
> - __unlock_hpte(hpte, pte);
> + __unlock_hpte(hpte, orig_pte);
> return H_NOT_FOUND;
> }
>
> rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
> v = pte & ~HPTE_V_HVLOCK;
> - pte = be64_to_cpu(hpte[1]);
> if (v & HPTE_V_VALID) {
> hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
> - rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
> + rb = compute_tlbie_rb(v, pte_r, pte_index);
This is good; I think it makes sense to retain the old format for compute_tlbie_rb().
> do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
> /*
> * The reference (R) and change (C) bits in a HPT
> @@ -483,7 +492,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
> note_hpte_modification(kvm, rev);
> unlock_hpte(hpte, 0);
>
> - if (is_mmio_hpte(v, pte))
> + if (is_mmio_hpte(v, pte_r))
> atomic64_inc(&kvm->arch.mmio_update);
>
> if (v & HPTE_V_ABSENT)
> @@ -546,6 +555,10 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
> found = 0;
> hp0 = be64_to_cpu(hp[0]);
> hp1 = be64_to_cpu(hp[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hp0 = hpte_new_to_old_v(hp0, hp1);
> + hp1 = hpte_new_to_old_r(hp1);
> + }
> if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
> switch (flags & 3) {
> case 0: /* absolute */
> @@ -583,8 +596,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>
> /* leave it locked */
> hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
> - tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
> - be64_to_cpu(hp[1]), pte_index);
> + tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
> indexes[n] = j;
> hptes[n] = hp;
> revs[n] = rev;
> @@ -622,7 +634,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
> __be64 *hpte;
> struct revmap_entry *rev;
> unsigned long v, r, rb, mask, bits;
> - u64 pte;
> + u64 pte_v, pte_r;
>
> if (pte_index >= kvm->arch.hpt_npte)
> return H_PARAMETER;
> @@ -630,15 +642,16 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
> hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
> while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
> cpu_relax();
> - pte = be64_to_cpu(hpte[0]);
> - if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> - ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
> - __unlock_hpte(hpte, pte);
> + v = pte_v = be64_to_cpu(hpte[0]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
> + if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> + ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
> + __unlock_hpte(hpte, pte_v);
> return H_NOT_FOUND;
> }
>
> - v = pte;
> - pte = be64_to_cpu(hpte[1]);
> + pte_r = be64_to_cpu(hpte[1]);
> bits = (flags << 55) & HPTE_R_PP0;
> bits |= (flags << 48) & HPTE_R_KEY_HI;
> bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
> @@ -660,13 +673,13 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
> * readonly to writable. If it should be writable, we'll
> * take a trap and let the page fault code sort it out.
> */
> - r = (pte & ~mask) | bits;
> - if (hpte_is_writable(r) && !hpte_is_writable(pte))
> + r = (pte_r & ~mask) | bits;
> + if (hpte_is_writable(r) && !hpte_is_writable(pte_r))
> r = hpte_make_readonly(r);
> /* If the PTE is changing, invalidate it first */
> - if (r != pte) {
> + if (r != pte_r) {
> rb = compute_tlbie_rb(v, r, pte_index);
> - hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) |
> + hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
> HPTE_V_ABSENT);
> do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
> true);
> @@ -675,9 +688,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
> hpte[1] = cpu_to_be64(r);
> }
> }
> - unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
> + unlock_hpte(hpte, pte_v & ~HPTE_V_HVLOCK);
> asm volatile("ptesync" : : : "memory");
> - if (is_mmio_hpte(v, pte))
> + if (is_mmio_hpte(v, pte_r))
> atomic64_inc(&kvm->arch.mmio_update);
>
> return H_SUCCESS;
> @@ -703,6 +716,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
> hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
> v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
> r = be64_to_cpu(hpte[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + v = hpte_new_to_old_v(v, r);
> + r = hpte_new_to_old_r(r);
> + }
> if (v & HPTE_V_ABSENT) {
> v &= ~HPTE_V_ABSENT;
> v |= HPTE_V_VALID;
> @@ -820,10 +837,16 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
> unsigned long pte_index)
> {
> unsigned long rb;
> + u64 hp0, hp1;
>
> hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
> - rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
> - pte_index);
> + hp0 = be64_to_cpu(hptep[0]);
> + hp1 = be64_to_cpu(hptep[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hp0 = hpte_new_to_old_v(hp0, hp1);
> + hp1 = hpte_new_to_old_r(hp1);
> + }
> + rb = compute_tlbie_rb(hp0, hp1, pte_index);
> do_tlbies(kvm, &rb, 1, 1, true);
> }
> EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
> @@ -833,9 +856,15 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
> {
> unsigned long rb;
> unsigned char rbyte;
> + u64 hp0, hp1;
>
> - rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
> - pte_index);
> + hp0 = be64_to_cpu(hptep[0]);
> + hp1 = be64_to_cpu(hptep[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + hp0 = hpte_new_to_old_v(hp0, hp1);
> + hp1 = hpte_new_to_old_r(hp1);
> + }
> + rb = compute_tlbie_rb(hp0, hp1, pte_index);
> rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
> /* modify only the second-last byte, which contains the ref bit */
> *((char *)hptep + 14) = rbyte;
> @@ -895,7 +924,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
> unsigned long avpn;
> __be64 *hpte;
> unsigned long mask, val;
> - unsigned long v, r;
> + unsigned long v, r, orig_v;
>
> /* Get page shift, work out hash and AVPN etc. */
> mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
> @@ -930,6 +959,8 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
> for (i = 0; i < 16; i += 2) {
> /* Read the PTE racily */
> v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + v = hpte_new_to_old_v(v, be64_to_cpu(hpte[i+1]));
>
> /* Check valid/absent, hash, segment size and AVPN */
> if (!(v & valid) || (v & mask) != val)
> @@ -938,8 +969,12 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
> /* Lock the PTE and read it under the lock */
> while (!try_lock_hpte(&hpte[i], HPTE_V_HVLOCK))
> cpu_relax();
> - v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> + v = orig_v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> r = be64_to_cpu(hpte[i+1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + v = hpte_new_to_old_v(v, r);
> + r = hpte_new_to_old_r(r);
> + }
>
> /*
> * Check the HPTE again, including base page size
> @@ -949,7 +984,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
> /* Return with the HPTE still locked */
> return (hash << 3) + (i >> 1);
>
> - __unlock_hpte(&hpte[i], v);
> + __unlock_hpte(&hpte[i], orig_v);
> }
>
> if (val & HPTE_V_SECONDARY)
> @@ -977,7 +1012,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
> {
> struct kvm *kvm = vcpu->kvm;
> long int index;
> - unsigned long v, r, gr;
> + unsigned long v, r, gr, orig_v;
> __be64 *hpte;
> unsigned long valid;
> struct revmap_entry *rev;
> @@ -1005,12 +1040,16 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
> return 0; /* for prot fault, HPTE disappeared */
> }
> hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> - v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
> + v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
> r = be64_to_cpu(hpte[1]);
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + v = hpte_new_to_old_v(v, r);
> + r = hpte_new_to_old_r(r);
> + }
> rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
> gr = rev->guest_rpte;
>
> - unlock_hpte(hpte, v);
> + unlock_hpte(hpte, orig_v);
> }
>
> /* For not found, if the HPTE is valid by now, retry the instruction */
>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-19 0:45 ` Balbir Singh
-1 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19 0:45 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
> +#ifdef CONFIG_PPC_BOOK3S_64
> +void mmu_partition_table_init(void)
> +{
> + unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
> +
> + BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
This should be 36 (12 + 24)
> + partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
> + MEMBLOCK_ALLOC_ANYWHERE));
> +
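[For reference, the guard being discussed is just a compile-time bound on the allocation size; a standalone sketch, using a hypothetical shift value — Balbir's point being that the architected upper bound is 36 (12 + 24), not 24:]

```c
#include <stdint.h>

/* Hypothetical value; the real PATB_SIZE_SHIFT is defined in the
 * powerpc MMU headers. */
#define PATB_SIZE_SHIFT 16

/* Mimics BUILD_BUG_ON_MSG(): fail the build if the partition table
 * would exceed the architected maximum of 2^36 bytes (12 + 24). */
_Static_assert(PATB_SIZE_SHIFT <= 36, "Partition table size too large.");

/* The table is allocated naturally aligned, so size == alignment. */
static uint64_t patb_bytes(void)
{
	return 1ULL << PATB_SIZE_SHIFT;
}
```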
Balbir
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9
2016-11-18 7:28 ` Paul Mackerras
@ 2016-11-19 1:01 ` Balbir Singh
-1 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19 1:01 UTC (permalink / raw)
To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev
On 18/11/16 18:28, Paul Mackerras wrote:
> On POWER9, the SDR1 register (hashed page table base address) is no
> longer used, and instead the hardware reads the HPT base address
> and size from the partition table. The partition table entry also
> contains the bits that specify the page size for the VRMA mapping,
> which were previously in the LPCR. The VPM0 bit of the LPCR is
> now reserved; the processor now always uses the VRMA (virtual
> real-mode area) mechanism for guest real-mode accesses in HPT mode,
> and the RMO (real-mode offset) mechanism has been dropped.
>
> When entering or exiting the guest, we now only have to set the
> LPIDR (logical partition ID register), not the SDR1 register.
> There is also no requirement now to transition via a reserved
> LPID value.
>
I had similar changes, but they did not include the VPM or host SDR1
switching bits.
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/kvm/book3s_hv.c | 36 +++++++++++++++++++++++++++------
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 ++++++---
> 2 files changed, 37 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 40b2b6d..5cbe3c3 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -54,6 +54,7 @@
> #include <asm/dbell.h>
> #include <asm/hmi.h>
> #include <asm/pnv-pci.h>
> +#include <asm/mmu.h>
> #include <linux/gfp.h>
> #include <linux/vmalloc.h>
> #include <linux/highmem.h>
> @@ -3024,6 +3025,22 @@ static void kvmppc_mmu_destroy_hv(struct kvm_vcpu *vcpu)
> return;
> }
>
> +static void kvmppc_setup_partition_table(struct kvm *kvm)
> +{
> + unsigned long dw0, dw1;
> +
> + /* PS field - page size for VRMA */
> + dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
> + ((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
> + /* HTABSIZE and HTABORG fields */
> + dw0 |= kvm->arch.sdr1;
> +
> + /* Second dword has GR=0; other fields are unused since UPRT=0 */
> + dw1 = 0;
Don't we need to set LPCR_GTSE for legacy guests?
Otherwise
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
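[As a rough illustration of the dw0 composition in kvmppc_setup_partition_table() above — the constants here are stand-ins with hypothetical bit positions; the real SLB_VSID_* definitions live in the powerpc MMU hash headers:]

```c
#include <stdint.h>

/* Stand-in bit definitions (hypothetical positions). */
#define SLB_VSID_L   0x100ULL   /* large-page bit */
#define SLB_VSID_LP  0x030ULL   /* large-page selector */

/* PS field for the VRMA: L shifted down by 1 and LP shifted up by 1,
 * then OR in HTABSIZE/HTABORG exactly as the old SDR1 value laid
 * them out. */
static uint64_t partition_table_dw0(uint64_t vrma_slb_v, uint64_t sdr1)
{
	uint64_t dw0 = ((vrma_slb_v & SLB_VSID_L) >> 1) |
		       ((vrma_slb_v & SLB_VSID_LP) << 1);
	return dw0 | sdr1;
}
```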
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
2016-11-18 14:59 ` Aneesh Kumar K.V
@ 2016-11-19 3:53 ` Paul Mackerras
-1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19 3:53 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev
On Fri, Nov 18, 2016 at 08:17:25PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
>
> > On POWER9, the msgsnd instruction is able to send interrupts to
> > other cores, as well as other threads on the local core. Since
> > msgsnd is generally simpler and faster than sending an IPI via the
> > XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
> >
> > Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> > ---
> > arch/powerpc/kvm/book3s_hv.c | 11 ++++++++++-
> > arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
> > 2 files changed, 18 insertions(+), 3 deletions(-)
> >
[...]
> > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> > index 0c84d6b..37ed045 100644
> > --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> > @@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
> > void kvmhv_rm_send_ipi(int cpu)
> > {
> > unsigned long xics_phys;
> > + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> >
> > - /* On POWER8 for IPIs to threads in the same core, use msgsnd */
> > + /* On POWER9 we can use msgsnd for any destination cpu. */
> > + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > + msg |= get_hard_smp_processor_id(cpu);
> > + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> > + return;
>
> Do we need a "sync" there before msgsnd ?
The comment just above this function says:
/*
* Send an interrupt or message to another CPU.
* This can only be called in real mode.
* The caller needs to include any barrier needed to order writes
* to memory vs. the IPI/message.
*/
so no. In fact all of its callers do smp_mb() before calling it.
(And no we don't want to move the smp_mb() into kvmhv_rm_send_ipi();
see kvmhv_interrupt_vcore() for why.)
Paul.
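[The contract Paul describes — the caller orders its memory writes, then the send routine stays barrier-free — might be sketched in portable C11 like this; `doorbell` and `mailbox` are invented names standing in for the msgsnd target and the memory the IPI handler reads:]

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t doorbell;   /* stands in for the msgsnd target */
static uint64_t mailbox;            /* plain memory the IPI handler reads */

/* Analogue of kvmhv_rm_send_ipi(): deliberately no barrier here. */
static void send_ipi(int cpu)
{
	atomic_store_explicit(&doorbell, (uint64_t)cpu, memory_order_relaxed);
}

/* The caller supplies the ordering, like the smp_mb() before the
 * real kvmhv_rm_send_ipi() call. */
static void post_message_and_kick(uint64_t msg, int cpu)
{
	mailbox = msg;
	atomic_thread_fence(memory_order_seq_cst);   /* the smp_mb() analogue */
	send_ipi(cpu);
}
```

Keeping the fence in the caller rather than in send_ipi() matters for paths like kvmhv_interrupt_vcore(), where one barrier can cover a batch of sends.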
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
2016-11-18 14:47 ` Aneesh Kumar K.V
@ 2016-11-19 4:02 ` Paul Mackerras
-1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19 4:02 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev
On Fri, Nov 18, 2016 at 08:05:47PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
>
> > Some special-purpose registers that were present and accessible
> > by guests on POWER8 no longer exist on POWER9, so this adds
> > feature sections to ensure that we don't try to context-switch
> > them when going into or out of a guest on POWER9. These are
> > all relatively obscure, rarely-used registers, but we had to
> > context-switch them on POWER8 to avoid creating a covert channel.
> > They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
>
> We don't need to context-switch them even when running a power8 compat
> guest ?
They physically don't exist on the P9 chip, so how could we
context-switch them? They certainly can't be used as a covert
channel.
Accesses to them will be a no-op for the guest in privileged
(supervisor) mode (i.e., mfspr won't modify the destination
register), which could be confusing for the guest if it was expecting
to use them. SPMC1/2 and MMCRS are part of the "supervisor" PMU,
which we have never used. I think CSIGR, TACR and TCSCR are part of a
facility that was never completely implemented or usable on P8, so
nothing uses them. ACOP is used in arch/powerpc/mm/icswx.c in
conjunction with accelerators. There might be a problem there, but in
any case, with no physical ACOP register present there's no way to
save/restore it.
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
2016-11-18 14:53 ` Aneesh Kumar K.V
@ 2016-11-19 4:13 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19 4:13 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev
On Fri, Nov 18, 2016 at 08:11:34PM +0530, Aneesh Kumar K.V wrote:
> > @@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> > kvm->arch.lpcr = lpcr;
> >
> > /*
> > + * Work out how many sets the TLB has, for the use of
> > + * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > + */
> > + if (cpu_has_feature(CPU_FTR_ARCH_300))
> > + kvm->arch.tlb_sets = 256; /* POWER9 */
> > + else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > + kvm->arch.tlb_sets = 512; /* POWER8 */
> > + else
> > + kvm->arch.tlb_sets = 128; /* POWER7 */
> > +
>
> We have
>
> #define POWER7_TLB_SETS 128 /* # sets in POWER7 TLB */
> #define POWER8_TLB_SETS 512 /* # sets in POWER8 TLB */
> #define POWER9_TLB_SETS_HASH 256 /* # sets in POWER9 TLB Hash mode */
> #define POWER9_TLB_SETS_RADIX 128 /* # sets in POWER9 TLB Radix mode */
>
> Maybe use that instead of open-coding?
Doing that would make it easier to check that we're using the same
values everywhere but harder to see what actual numbers we're
getting. I guess I could use the symbols and put the values in the
comments. In any case, in future these values are just going to be
default values if we can't find a suitable device-tree property.
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
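[Editor's sketch: the compromise Paul suggests (use the named constants, but keep the literal values visible in comments) might look like the following. The cpu_has_feature() stub and the feature-bit values are stand-ins for illustration only; in the kernel they come from the CPU feature machinery, and the TLB-set constants live in asm/reg.h.]

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's CPU feature bits. */
#define CPU_FTR_ARCH_300   (1u << 0)	/* ISA v3.00 (POWER9) */
#define CPU_FTR_ARCH_207S  (1u << 1)	/* ISA v2.07S (POWER8) */

/* Constants as in asm/reg.h, values kept visible. */
#define POWER7_TLB_SETS       128	/* # sets in POWER7 TLB */
#define POWER8_TLB_SETS       512	/* # sets in POWER8 TLB */
#define POWER9_TLB_SETS_HASH  256	/* # sets in POWER9 TLB, hash mode */

static unsigned int cpu_features;	/* stub feature map for the sketch */

static int cpu_has_feature(unsigned int ftr)
{
	return (cpu_features & ftr) != 0;
}

/* Mirrors the tlb_sets selection in kvmppc_core_init_vm_hv(). */
static int kvm_tlb_sets(void)
{
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		return POWER9_TLB_SETS_HASH;	/* 256: POWER9, HPT guest */
	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
		return POWER8_TLB_SETS;		/* 512: POWER8 */
	return POWER7_TLB_SETS;			/* 128: POWER7 */
}
```

As the thread notes, these would only be defaults once a device-tree property exists.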
* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
2016-11-18 21:57 ` Benjamin Herrenschmidt
@ 2016-11-19 4:14 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19 4:14 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Aneesh Kumar K.V, kvm, kvm-ppc, linuxppc-dev
On Sat, Nov 19, 2016 at 08:57:28AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2016-11-18 at 20:11 +0530, Aneesh Kumar K.V wrote:
> > > + * Work out how many sets the TLB has, for the use of
> > > + * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > > + */
> > > + if (cpu_has_feature(CPU_FTR_ARCH_300))
> > > + kvm->arch.tlb_sets = 256; /* POWER9 */
> > > + else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > > + kvm->arch.tlb_sets = 512; /* POWER8 */
> > > + else
> > > + kvm->arch.tlb_sets = 128; /* POWER7 */
> > > +
> >
> > We have
> >
> > #define POWER7_TLB_SETS 128 /* # sets in POWER7 TLB */
> > #define POWER8_TLB_SETS 512 /* # sets in POWER8 TLB */
> > #define POWER9_TLB_SETS_HASH 256 /* # sets in POWER9 TLB Hash mode */
> > #define POWER9_TLB_SETS_RADIX 128 /* # sets in POWER9 TLB Radix mode */
> >
> > Maybe use that instead of open-coding?
>
> Both are bad and are going to kill us for future backward
> compatibility.
>
> These should be a device-tree property. We can fall back to hard-wired
> values if it doesn't exist, but we should at least look for one.
Tell me what the property is called and I'll add code to use it. :)
That's the whole reason why I moved this to C code.
> Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
> in the CPU node, so let's create a new one instead, with 2 entries
> (hash vs. radix) or 2 new ones, one for hash and one for radix (when
> available).
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
2016-11-18 14:39 ` Aneesh Kumar K.V
@ 2016-11-19 4:19 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19 4:19 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev
On Fri, Nov 18, 2016 at 07:57:30PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
> +
> > + /* Global flush of TLBs and partition table caches for this lpid */
> > + asm volatile("ptesync");
> > + asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> > + asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> > +}
>
>
> It would be nice to convert that 0x800 to a documented IS value or better use
> radix__flush_tlb_pid() ?
Well, not radix__flush_tlb_pid - this isn't radix and it isn't a PID
flush. I could use TLBIEL_INVAL_SET_LPID except the name implies it's
for tlbiel and this is a tlbie.
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
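[Editor's sketch: the 0x800 Aneesh asks about is just the IS field of the tlbie RB operand. IS = 2 ("invalidate all entries matching the LPID") sits with its low-order bit at IBM bit 53, matching the rb = 0x2 << PPC_BITLSHIFT(53) /* IS = 2 */ line in the radix__flush_tlb_lpid() code Aneesh posts further down the thread, so 2 << (63 - 53) = 0x800. A quick check of the arithmetic:]

```c
#include <assert.h>

/* PPC_BITLSHIFT() converts an IBM (big-endian, MSB = bit 0) bit number
 * into an ordinary left-shift amount on a 64-bit value. */
#define PPC_BITLSHIFT(be)  (63 - (be))

/* Build the RB value for a tlbie with the given IS field. */
static unsigned long tlbie_rb_is(unsigned long is)
{
	return is << PPC_BITLSHIFT(53);
}
```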
* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
2016-11-19 0:45 ` Balbir Singh
@ 2016-11-19 4:23 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19 4:23 UTC (permalink / raw)
To: Balbir Singh; +Cc: kvm, kvm-ppc, linuxppc-dev
On Sat, Nov 19, 2016 at 11:45:52AM +1100, Balbir Singh wrote:
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +void mmu_partition_table_init(void)
> > +{
> > + unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
> > +
> > + BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
>
> This should be 36 (12 + 24)
True, though for P9, PATB_SIZE_SHIFT has to be 16.
The BUILD_BUG_ON_MSG is probably not really necessary - I just moved
this code from elsewhere.
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
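[Editor's sketch: Balbir's point is that the architectural maximum for the partition table is 1 << (12 + 24) bytes, i.e. a shift of 36, while P9 wants PATB_SIZE_SHIFT = 16 (a 64 kB table). A relaxed compile-time check along the lines discussed could look like this; the literal 16 here is illustrative, the real PATB_SIZE_SHIFT comes from the mmu headers:]

```c
#include <assert.h>

#define PATB_SIZE_SHIFT  16	/* 64 kB partition table, as P9 requires */

/* Architectural bounds: size is 2^(12 + PATS) bytes with PATS <= 24,
 * so the shift must lie in [12, 36], not [_, 24] as the old check had. */
_Static_assert(PATB_SIZE_SHIFT >= 12, "Partition table size too small.");
_Static_assert(PATB_SIZE_SHIFT <= 36, "Partition table size too large.");

static unsigned long patb_size_bytes(void)
{
	return 1UL << PATB_SIZE_SHIFT;
}
```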
* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
2016-11-19 4:14 ` Paul Mackerras
@ 2016-11-19 4:41 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 64+ messages in thread
From: Benjamin Herrenschmidt @ 2016-11-19 4:41 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Aneesh Kumar K.V, kvm, kvm-ppc, linuxppc-dev
On Sat, 2016-11-19 at 15:14 +1100, Paul Mackerras wrote:
>
> > These should be a device-tree property. We can fall back to hard-wired
> > values if it doesn't exist, but we should at least look for one.
>
> Tell me what the property is called and I'll add code to use it. :)
> That's the whole reason why I moved this to C code.
>
> >
> > Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
> > in the CPU node, so let's create a new one instead, with 2 entries
> > (hash vs. radix) or 2 new ones, one for hash and one for radix (when
> > available).
Well, as I said above, there's a defined one but it has bogus values
on almost all P8 firmwares. So I think we need the core code to export
values for use by both the core mm and KVM which can then be picked up
from the DT with "quirks" to fixup the DT values.
(A bit like I did for the never-applied cache geometry patches)
That or we make up new names.
The question remains whether we need a separate property for radix
vs. hash, though; we probably should, as the "radix is half of hash"
assumption might not be true on future chips.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 64+ messages in thread
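[Editor's sketch: the lookup-with-fallback Ben describes might look like the following. The property name "ibm,tlb-sets-hash" is made up for illustration (the thread explicitly leaves the naming open), and the flat dt_read_u32() helper is a hypothetical stand-in for of_property_read_u32() on the CPU node; a quirk for the bogus P8 "tlb-sets" property would slot in before the fallback.]

```c
#include <assert.h>
#include <string.h>

#define POWER9_TLB_SETS_HASH  256	/* hard-wired fallback value */

/* Hypothetical stand-in for of_property_read_u32(): returns 0 on
 * success, -1 when the property is absent. */
static int dt_read_u32(const char *prop, unsigned int *val);

static unsigned int kvm_tlb_sets_from_dt(void)
{
	unsigned int sets;

	/* Property name is a placeholder; none is defined yet. */
	if (dt_read_u32("ibm,tlb-sets-hash", &sets) == 0)
		return sets;
	return POWER9_TLB_SETS_HASH;	/* fall back to hard-wired value */
}

/* Toy one-property "device tree" so the sketch is self-contained. */
static const char *dt_prop_name;
static unsigned int dt_prop_value;

static int dt_read_u32(const char *prop, unsigned int *val)
{
	if (dt_prop_name && strcmp(prop, dt_prop_name) == 0) {
		*val = dt_prop_value;
		return 0;
	}
	return -1;
}
```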
* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
2016-11-19 4:19 ` Paul Mackerras
@ 2016-11-19 6:35 ` Aneesh Kumar K.V
0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-19 6:35 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev
Paul Mackerras <paulus@ozlabs.org> writes:
> On Fri, Nov 18, 2016 at 07:57:30PM +0530, Aneesh Kumar K.V wrote:
>> Paul Mackerras <paulus@ozlabs.org> writes:
>> +
>> > + /* Global flush of TLBs and partition table caches for this lpid */
>> > + asm volatile("ptesync");
>> > + asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
>> > + asm volatile("eieio; tlbsync; ptesync" : : : "memory");
>> > +}
>>
>>
>> It would be nice to convert that 0x800 to a documented IS value or better use
>> radix__flush_tlb_pid() ?
>
> Well, not radix__flush_tlb_pid - this isn't radix and it isn't a PID
> flush. I could use TLBIEL_INVAL_SET_LPID except the name implies it's
> for tlbiel and this is a tlbie.
>
I wrote that wrong: we really don't have tlb_pid(); what we have is tlb_lpid().
void radix__flush_tlb_lpid(unsigned long lpid)
{
unsigned long rb,rs,prs,r;
unsigned long ric = RIC_FLUSH_ALL;
rb = 0x2 << PPC_BITLSHIFT(53); /* IS = 2 */
rs = lpid & ((1UL << 32) - 1);
prs = 0; /* partition scoped */
r = 1; /* radix format */
asm volatile("ptesync": : :"memory");
asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
: : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
2016-11-19 0:38 ` Balbir Singh
@ 2016-11-21 2:02 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-21 2:02 UTC (permalink / raw)
To: Balbir Singh; +Cc: kvm, kvm-ppc, linuxppc-dev
On Sat, Nov 19, 2016 at 11:38:40AM +1100, Balbir Singh wrote:
>
>
> On 18/11/16 18:28, Paul Mackerras wrote:
> > This adapts the KVM-HV hashed page table (HPT) code to read and write
> > HPT entries in the new format defined in Power ISA v3.00 on POWER9
> > machines. The new format moves the B (segment size) field from the
> > first doubleword to the second, and trims some bits from the AVA
> > (abbreviated virtual address) and ARPN (abbreviated real page number)
> > fields. As far as possible, the conversion is done when reading or
> > writing the HPT entries, and the rest of the code continues to use
> > the old format.
[snip]
> > @@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> > {
> > struct kvm *kvm = vcpu->kvm;
> > unsigned long hpte[3], r;
> > + unsigned long hnow_v, hnow_r;
> > __be64 *hptep;
> > unsigned long mmu_seq, psize, pte_size;
> > unsigned long gpa_base, gfn_base;
> > @@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> > unlock_hpte(hptep, hpte[0]);
> > preempt_enable();
> >
> > + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > + hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
> > + hpte[1] = hpte_new_to_old_r(hpte[1]);
> > + }
>
> I think we can avoid this, if we avoid the conversion in kvmppc_hpte_hv_fault().
> If we decide not to do this, then gpa will need to use a new mask to extract
> the correct gpa.
Yes, we could store vcpu->arch.pgfault[] in native format, i.e. new
format on P9. That might make the code a bit simpler indeed.
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
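[Editor's sketch: the doubleword shuffle being discussed (the B / segment-size field moving from HPTE dword 0 in ISA v2.07 to dword 1 in ISA v3.00) can be illustrated with a pair of conversion helpers. This is a simplified sketch, not the kernel's actual hpte_new_to_old_v()/hpte_old_to_new_r(): the shift values are assumptions for illustration, and the AVA/ARPN trimming the commit message mentions is omitted.]

```c
#include <assert.h>

/* Illustrative bit positions: B occupies the top two bits of the
 * old-format first doubleword, and a two-bit slot near the top of the
 * new-format second doubleword. */
#define HPTE_V_SSIZE_SHIFT	62
#define HPTE_R_3_0_SSIZE_SHIFT	58
#define HPTE_R_3_0_SSIZE_MASK	(3UL << HPTE_R_3_0_SSIZE_SHIFT)
#define HPTE_V_COMMON_BITS	(~(3UL << HPTE_V_SSIZE_SHIFT))

/* Move B from new-format dword 1 back into old-format dword 0. */
static unsigned long sketch_new_to_old_v(unsigned long v, unsigned long r)
{
	return (v & HPTE_V_COMMON_BITS) |
	       ((r & HPTE_R_3_0_SSIZE_MASK) <<
		(HPTE_V_SSIZE_SHIFT - HPTE_R_3_0_SSIZE_SHIFT));
}

/* Move B from old-format dword 0 into new-format dword 1. */
static unsigned long sketch_old_to_new_r(unsigned long v, unsigned long r)
{
	return (r & ~HPTE_R_3_0_SSIZE_MASK) |
	       ((v >> HPTE_V_SSIZE_SHIFT) << HPTE_R_3_0_SSIZE_SHIFT);
}
```

The round trip (old to new and back) should be lossless for the B field, which is what lets the rest of the HPT code keep working in the old format.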
* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
2016-11-19 6:47 ` Aneesh Kumar K.V
@ 2016-11-21 2:14 ` Paul Mackerras
0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-21 2:14 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev
On Sat, Nov 19, 2016 at 12:05:21PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
>
> > On Fri, Nov 18, 2016 at 07:57:30PM +0530, Aneesh Kumar K.V wrote:
> >> Paul Mackerras <paulus@ozlabs.org> writes:
> >> +
> >> > + /* Global flush of TLBs and partition table caches for this lpid */
> >> > + asm volatile("ptesync");
> >> > + asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> >> > + asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> >> > +}
> >>
> >>
> >> It would be nice to convert that 0x800 to a documented IS value or better use
> >> radix__flush_tlb_pid() ?
> >
> > Well, not radix__flush_tlb_pid - this isn't radix and it isn't a PID
> > flush. I could use TLBIEL_INVAL_SET_LPID except the name implies it's
> > for tlbiel and this is a tlbie.
> >
>
> I wrote that wrong: we really don't have tlb_pid(); what we have is tlb_lpid().
>
> void radix__flush_tlb_lpid(unsigned long lpid)
> {
> unsigned long rb,rs,prs,r;
> unsigned long ric = RIC_FLUSH_ALL;
>
> rb = 0x2 << PPC_BITLSHIFT(53); /* IS = 2 */
> rs = lpid & ((1UL << 32) - 1);
> prs = 0; /* partition scoped */
> r = 1; /* radix format */
>
> asm volatile("ptesync": : :"memory");
> asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
> : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
> asm volatile("eieio; tlbsync; ptesync": : :"memory");
> }
That has R=1; I'm using R=0.
Paul.
^ permalink raw reply [flat|nested] 64+ messages in thread
end of thread, other threads:[~2016-11-21 2:14 UTC | newest]
Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-18 7:28 [PATCH 00/13] KVM: PPC: Support POWER9 guests Paul Mackerras
2016-11-18 7:28 ` [PATCH 01/13] powerpc/64: Add some more SPRs and SPR bits for POWER9 Paul Mackerras
2016-11-18 7:28 ` [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table Paul Mackerras
2016-11-18 14:27 ` Aneesh Kumar K.V
2016-11-19 4:19 ` Paul Mackerras
2016-11-19 6:35 ` Aneesh Kumar K.V
2016-11-21 2:14 ` Paul Mackerras
2016-11-19 0:45 ` Balbir Singh
2016-11-19 4:23 ` Paul Mackerras
2016-11-18 7:28 ` [PATCH 03/13] powerpc/powernv: Define real-mode versions of OPAL XICS accessors Paul Mackerras
2016-11-18 7:28 ` [PATCH 04/13] KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT Paul Mackerras
2016-11-18 7:28 ` [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9 Paul Mackerras
2016-11-19 0:38 ` Balbir Singh
2016-11-21 2:02 ` Paul Mackerras
2016-11-18 7:28 ` [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 " Paul Mackerras
2016-11-19 1:01 ` Balbir Singh
2016-11-18 7:28 ` [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9 Paul Mackerras
2016-11-18 14:35 ` Aneesh Kumar K.V
2016-11-19 4:02 ` Paul Mackerras
2016-11-18 7:28 ` [PATCH 08/13] KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs Paul Mackerras
2016-11-18 7:28 ` [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 Paul Mackerras
2016-11-18 14:41 ` Aneesh Kumar K.V
2016-11-18 21:57 ` Benjamin Herrenschmidt
2016-11-19 4:14 ` Paul Mackerras
2016-11-19 4:41 ` Benjamin Herrenschmidt
2016-11-19 4:13 ` Paul Mackerras
2016-11-18 7:28 ` [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores " Paul Mackerras
2016-11-18 14:47 ` Aneesh Kumar K.V
2016-11-19 3:53 ` Paul Mackerras
2016-11-18 7:28 ` [PATCH 11/13] KVM: PPC: Book3S HV: Use OPAL XICS emulation " Paul Mackerras
2016-11-18 7:28 ` [PATCH 12/13] KVM: PPC: Book3S HV: Use stop instruction rather than nap " Paul Mackerras
2016-11-18 7:28 ` [PATCH 13/13] KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores Paul Mackerras