* [PATCH 00/13] KVM: PPC: Support POWER9 guests
@ 2016-11-18  7:28 ` Paul Mackerras
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

This series of patches adds support to HV KVM for running KVM guests
on POWER9 systems.  This allows us to run KVM guests that use HPT
(hashed page table) address translation and know about the POWER9
processor.  With this, Suraj Jitindar Singh's recent patch series
"powerpc: add support for ISA v2.07 compat level" and suitable changes
to the user-mode driver will allow us to run guests on POWER9 in
POWER8 (or POWER7) compatibility mode.

For now we require the host to be in HPT mode (not radix).

This series of patches is based on v4.9-rc4 plus my patch "powerpc/64:
Simplify adaptation to new ISA v3.00 HPTE format" and Yongji Xie's
two-patch series "KVM: PPC: Book3S HV: Optimize for MMIO emulation".

Paul.
---
 Documentation/virtual/kvm/api.txt              |   2 +
 arch/powerpc/include/asm/kvm_host.h            |   3 +
 arch/powerpc/include/asm/kvm_ppc.h             |   7 +-
 arch/powerpc/include/asm/mmu.h                 |   5 +
 arch/powerpc/include/asm/opal.h                |   3 +
 arch/powerpc/include/asm/reg.h                 |   5 +
 arch/powerpc/include/uapi/asm/kvm.h            |   4 +
 arch/powerpc/kernel/asm-offsets.c              |   3 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c            |  39 +++++--
 arch/powerpc/kvm/book3s_hv.c                   | 140 ++++++++++++++++++++++---
 arch/powerpc/kvm/book3s_hv_builtin.c           |  69 +++++++++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c            | 113 ++++++++++++++------
 arch/powerpc/kvm/book3s_hv_rm_xics.c           |  23 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S        | 132 ++++++++++++++++-------
 arch/powerpc/kvm/powerpc.c                     |  11 +-
 arch/powerpc/mm/hash_utils_64.c                |  28 +----
 arch/powerpc/mm/pgtable-radix.c                |  18 ++--
 arch/powerpc/mm/pgtable_64.c                   |  33 ++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
 arch/powerpc/platforms/powernv/opal.c          |   2 +
 20 files changed, 483 insertions(+), 160 deletions(-)


* [PATCH 01/13] powerpc/64: Add some more SPRs and SPR bits for POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

These definitions will be needed by KVM.
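
As a rough illustration (not part of this patch; the helper name is
hypothetical), KVM can use the PSSCR_GUEST_VIS mask to expose only the
guest-visible PSSCR fields when the guest reads the register:

	/* Hypothetical helper: return the guest's view of PSSCR. */
	static inline u64 psscr_guest_view(u64 host_psscr)
	{
		return host_psscr & PSSCR_GUEST_VIS;
	}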

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/reg.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 9cd4e8c..df81411 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -153,6 +153,8 @@
 #define PSSCR_EC		0x00100000 /* Exit Criterion */
 #define PSSCR_ESL		0x00200000 /* Enable State Loss */
 #define PSSCR_SD		0x00400000 /* Status Disable */
+#define PSSCR_PLS	0xf000000000000000 /* Power-saving Level Status */
+#define PSSCR_GUEST_VIS	0xf0000000000003ff /* Guest-visible PSSCR fields */
 
 /* Floating Point Status and Control Register (FPSCR) Fields */
 #define FPSCR_FX	0x80000000	/* FPU exception summary */
@@ -236,6 +238,7 @@
 #define SPRN_TEXASRU	0x83	/* ''	   ''	   ''	 Upper 32  */
 #define   TEXASR_FS	__MASK(63-36) /* TEXASR Failure Summary */
 #define SPRN_TFHAR	0x80	/* Transaction Failure Handler Addr */
+#define SPRN_TIDR	144	/* Thread ID register */
 #define SPRN_CTRLF	0x088
 #define SPRN_CTRLT	0x098
 #define   CTRL_CT	0xc0000000	/* current thread */
@@ -294,6 +297,7 @@
 #define SPRN_HSRR1	0x13B	/* Hypervisor Save/Restore 1 */
 #define SPRN_LMRR	0x32D	/* Load Monitor Region Register */
 #define SPRN_LMSER	0x32E	/* Load Monitor Section Enable Register */
+#define SPRN_ASDR	0x330	/* Access segment descriptor register */
 #define SPRN_IC		0x350	/* Virtual Instruction Count */
 #define SPRN_VTB	0x351	/* Virtual Time Base */
 #define SPRN_LDBAR	0x352	/* LD Base Address Register */
@@ -357,6 +361,7 @@
 #define     LPCR_PECE2		ASM_CONST(0x0000000000001000)	/* machine check etc can cause exit */
 #define   LPCR_MER		ASM_CONST(0x0000000000000800)	/* Mediated External Exception */
 #define   LPCR_MER_SH		11
+#define	  LPCR_GTSE		ASM_CONST(0x0000000000000400)  	/* Guest Translation Shootdown Enable */
 #define   LPCR_TC		ASM_CONST(0x0000000000000200)	/* Translation control */
 #define   LPCR_LPES		0x0000000c
 #define   LPCR_LPES0		ASM_CONST(0x0000000000000008)      /* LPAR Env selector 0 */
-- 
2.7.4


* [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

POWER9 requires the host to set up a partition table, an in-memory
table indexed by logical partition ID (LPID) that holds the pointers
to the page tables and process tables for the host and for each
guest.

This factors out the initialization of the partition table into
a single function.  This code was previously duplicated between
hash_utils_64.c and pgtable-radix.c.

This provides a function for setting a partition table entry,
which is used in early MMU initialization, and will be used by
KVM whenever a guest is created.  This function includes a tlbie
instruction which will flush all TLB entries for the LPID and
all caches of the partition table entry for the LPID, across the
system.
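
As a rough sketch of how that will look (only
mmu_partition_table_set_entry() below is real; the other names are
illustrative), KVM can install a hash-mode entry for a new guest,
leaving dword 1 zero while there is no process table:

	unsigned long dw0;

	/* HTABORG | HTABSIZE encoding, as for LPID 0 below */
	dw0 = guest_hpt_pa | (__ilog2(guest_hpt_size) - 18);
	mmu_partition_table_set_entry(guest_lpid, dw0, 0);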

This also moves a call to memblock_set_current_limit() out of
radix_init_partition_table(), since that call has nothing to do
with the partition table.  By analogy with the corresponding hash
code, the call is moved to near the end of radix__early_init_mmu().
It is now also made when running as a guest, whereas previously it
was only made when the kernel was running as the host.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/mmu.h  |  5 +++++
 arch/powerpc/mm/hash_utils_64.c | 28 ++++------------------------
 arch/powerpc/mm/pgtable-radix.c | 18 ++++++------------
 arch/powerpc/mm/pgtable_64.c    | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e883683..060b40b 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -208,6 +208,11 @@ extern u64 ppc64_rma_size;
 /* Cleanup function used by kexec */
 extern void mmu_cleanup_all(void);
 extern void radix__mmu_cleanup_all(void);
+
+/* Functions for creating and updating partition table on POWER9 */
+extern void mmu_partition_table_init(void);
+extern void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+					  unsigned long dw1);
 #endif /* CONFIG_PPC64 */
 
 struct mm_struct;
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 44d3c3a..b9a062f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -792,37 +792,17 @@ static void update_hid_for_hash(void)
 static void __init hash_init_partition_table(phys_addr_t hash_table,
 					     unsigned long htab_size)
 {
-	unsigned long ps_field;
-	unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
+	mmu_partition_table_init();
 
 	/*
-	 * slb llp encoding for the page size used in VPM real mode.
-	 * We can ignore that for lpid 0
+	 * PS field (VRMA page size) is not used for LPID 0, hence set to 0.
+	 * For now, UPRT is 0 and we have no segment table.
 	 */
-	ps_field = 0;
 	htab_size =  __ilog2(htab_size) - 18;
-
-	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
-	partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
-						MEMBLOCK_ALLOC_ANYWHERE));
-
-	/* Initialize the Partition Table with no entries */
-	memset((void *)partition_tb, 0, patb_size);
-	partition_tb->patb0 = cpu_to_be64(ps_field | hash_table | htab_size);
-	/*
-	 * FIXME!! This should be done via update_partition table
-	 * For now UPRT is 0 for us.
-	 */
-	partition_tb->patb1 = 0;
+	mmu_partition_table_set_entry(0, hash_table | htab_size, 0);
 	pr_info("Partition table %p\n", partition_tb);
 	if (cpu_has_feature(CPU_FTR_POWER9_DD1))
 		update_hid_for_hash();
-	/*
-	 * update partition table control register,
-	 * 64 K size.
-	 */
-	mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
-
 }
 
 static void __init htab_initialize(void)
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index ed7bddc..186f1ad 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -177,23 +177,15 @@ static void __init radix_init_pgtable(void)
 
 static void __init radix_init_partition_table(void)
 {
-	unsigned long rts_field;
+	unsigned long rts_field, dw0;
 
+	mmu_partition_table_init();
 	rts_field = radix__get_tree_size();
+	dw0 = rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE | PATB_HR;
+	mmu_partition_table_set_entry(0, dw0, 0);
 
-	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
-	partition_tb = early_alloc_pgtable(1UL << PATB_SIZE_SHIFT);
-	partition_tb->patb0 = cpu_to_be64(rts_field | __pa(init_mm.pgd) |
-					  RADIX_PGD_INDEX_SIZE | PATB_HR);
 	pr_info("Initializing Radix MMU\n");
 	pr_info("Partition table %p\n", partition_tb);
-
-	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
-	/*
-	 * update partition table control register,
-	 * 64 K size.
-	 */
-	mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
 }
 
 void __init radix_init_native(void)
@@ -378,6 +370,8 @@ void __init radix__early_init_mmu(void)
 		radix_init_partition_table();
 	}
 
+	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
+
 	radix_init_pgtable();
 }
 
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index f5e8d4e..fef0890 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -431,3 +431,36 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
 	}
 }
 #endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+void mmu_partition_table_init(void)
+{
+	unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
+
+	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
+	partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
+						MEMBLOCK_ALLOC_ANYWHERE));
+
+	/* Initialize the Partition Table with no entries */
+	memset((void *)partition_tb, 0, patb_size);
+
+	/*
+	 * update partition table control register,
+	 * 64 K size.
+	 */
+	mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+}
+
+void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+				   unsigned long dw1)
+{
+	partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+	partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+
+	/* Global flush of TLBs and partition table caches for this lpid */
+	asm volatile("ptesync");
+	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
+	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
+#endif /* CONFIG_PPC_BOOK3S_64 */
-- 
2.7.4


* [PATCH 03/13] powerpc/powernv: Define real-mode versions of OPAL XICS accessors
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi()
and opal_int_set_mfrr(), for use by KVM real-mode code.

It also exports opal_int_set_mfrr() so that the modular part of KVM
can use it to send IPIs.
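
For instance (illustrative only: "hwcpu" stands for the target's
hardware CPU number, and IPI_PRIORITY is the existing XICS IPI
priority value), real-mode KVM code can raise an IPI with:

	opal_rm_int_set_mfrr(hwcpu, IPI_PRIORITY);

while the modular host-side code does the same through the exported
opal_int_set_mfrr().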

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/opal.h                | 3 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +++
 arch/powerpc/platforms/powernv/opal.c          | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e958b70..5c7db0f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -220,9 +220,12 @@ int64_t opal_pci_set_power_state(uint64_t async_token, uint64_t id,
 int64_t opal_pci_poll2(uint64_t id, uint64_t data);
 
 int64_t opal_int_get_xirr(uint32_t *out_xirr, bool just_poll);
+int64_t opal_rm_int_get_xirr(__be32 *out_xirr, bool just_poll);
 int64_t opal_int_set_cppr(uint8_t cppr);
 int64_t opal_int_eoi(uint32_t xirr);
+int64_t opal_rm_int_eoi(uint32_t xirr);
 int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
+int64_t opal_rm_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
 int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
 			  uint32_t pe_num, uint32_t tce_size,
 			  uint64_t dma_addr, uint32_t npages);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 44d2d84..3aa40f1 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -304,8 +304,11 @@ OPAL_CALL(opal_pci_get_presence_state,		OPAL_PCI_GET_PRESENCE_STATE);
 OPAL_CALL(opal_pci_get_power_state,		OPAL_PCI_GET_POWER_STATE);
 OPAL_CALL(opal_pci_set_power_state,		OPAL_PCI_SET_POWER_STATE);
 OPAL_CALL(opal_int_get_xirr,			OPAL_INT_GET_XIRR);
+OPAL_CALL_REAL(opal_rm_int_get_xirr,		OPAL_INT_GET_XIRR);
 OPAL_CALL(opal_int_set_cppr,			OPAL_INT_SET_CPPR);
 OPAL_CALL(opal_int_eoi,				OPAL_INT_EOI);
+OPAL_CALL_REAL(opal_rm_int_eoi,			OPAL_INT_EOI);
 OPAL_CALL(opal_int_set_mfrr,			OPAL_INT_SET_MFRR);
+OPAL_CALL_REAL(opal_rm_int_set_mfrr,		OPAL_INT_SET_MFRR);
 OPAL_CALL(opal_pci_tce_kill,			OPAL_PCI_TCE_KILL);
 OPAL_CALL_REAL(opal_rm_pci_tce_kill,		OPAL_PCI_TCE_KILL);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 6c9a65b..b3b8930 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -896,3 +896,5 @@ EXPORT_SYMBOL_GPL(opal_leds_get_ind);
 EXPORT_SYMBOL_GPL(opal_leds_set_ind);
 /* Export this symbol for PowerNV Operator Panel class driver */
 EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
+/* Export this for KVM */
+EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
-- 
2.7.4


* [PATCH 04/13] KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

The hashed page table MMU in POWER processors can update the R
(reference) and C (change) bits in a HPTE at any time until the
HPTE has been invalidated and the TLB invalidation sequence has
completed.  In kvmppc_h_protect, which implements the H_PROTECT
hypercall, we read the HPTE, modify the second doubleword,
invalidate the HPTE in memory, do the TLB invalidation sequence,
and then write the modified value of the second doubleword back
to memory.  In doing so we could overwrite an R/C bit update done
by hardware between when we read the HPTE and when the TLB
invalidation completed.  To fix this we re-read the second
doubleword after the TLB invalidation and OR in the (possibly)
new values of R and C.  We can use an OR since hardware only ever
sets R and C, never clears them.

This race was found by code inspection.  In principle this bug could
cause occasional guest memory corruption under host memory pressure.
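
In simplified form, the corrected update sequence is (see the diff
below for the surrounding context):

	/* invalidate the old translation */
	hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) | HPTE_V_ABSENT);
	do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
	/* Don't lose R/C bit updates done by hardware */
	r |= be64_to_cpu(hpte[1]) & (HPTE_R_R | HPTE_R_C);
	hpte[1] = cpu_to_be64(r);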

Fixes: a8606e20e41a ("KVM: PPC: Handle some PAPR hcalls in the kernel", 2011-06-29)
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 752451f3..02786b3 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -670,6 +670,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 					      HPTE_V_ABSENT);
 			do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
 				  true);
+			/* Don't lose R/C bit updates done by hardware */
+			r |= be64_to_cpu(hpte[1]) & (HPTE_R_R | HPTE_R_C);
 			hpte[1] = cpu_to_be64(r);
 		}
 	}
-- 
2.7.4


* [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

This adapts the KVM-HV hashed page table (HPT) code to read and write
HPT entries in the new format defined in Power ISA v3.00 on POWER9
machines.  The new format moves the B (segment size) field from the
first doubleword to the second, and trims some bits from the AVA
(abbreviated virtual address) and ARPN (abbreviated real page number)
fields.  As far as possible, the conversion is done when reading or
writing the HPT entries, and the rest of the code continues to use
the old format.
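
The conversion pattern used throughout is to translate on read and on
write, roughly as follows (the hpte_new_to_old_*() helpers come from
the prerequisite "Simplify adaptation to new ISA v3.00 HPTE format"
patch; variable names here are illustrative):

	v = be64_to_cpu(hptep[0]);
	r = be64_to_cpu(hptep[1]);
	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
		v = hpte_new_to_old_v(v, r);	/* B comes from dword 1 */
		r = hpte_new_to_old_r(r);
	}
	/* ... operate on v and r in the old (ISA v2.07) layout ... */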

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  39 ++++++++++----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 101 +++++++++++++++++++++++++-----------
 2 files changed, 100 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7755bd0..20a8e8e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -314,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	struct kvmppc_slb *slbe;
 	unsigned long slb_v;
 	unsigned long pp, key;
-	unsigned long v, gr;
+	unsigned long v, orig_v, gr;
 	__be64 *hptep;
 	int index;
 	int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
@@ -339,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 		return -ENOENT;
 	}
 	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
-	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+	v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	unlock_hpte(hptep, v);
+	unlock_hpte(hptep, orig_v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 {
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long hpte[3], r;
+	unsigned long hnow_v, hnow_r;
 	__be64 *hptep;
 	unsigned long mmu_seq, psize, pte_size;
 	unsigned long gpa_base, gfn_base;
@@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
+		hpte[1] = hpte_new_to_old_r(hpte[1]);
+	}
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
 	    hpte[1] != vcpu->arch.pgfault_hpte[1])
 		return RESUME_GUEST;
@@ -599,9 +606,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	preempt_disable();
 	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
 		cpu_relax();
-	if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
-		be64_to_cpu(hptep[1]) != hpte[1] ||
-		rev->guest_rpte != hpte[2])
+	hnow_v = be64_to_cpu(hptep[0]);
+	hnow_r = be64_to_cpu(hptep[1]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
+		hnow_r = hpte_new_to_old_r(hnow_r);
+	}
+	if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
+	    rev->guest_rpte != hpte[2])
 		/* HPTE has been changed under us; let the guest retry */
 		goto out_unlock;
 	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
@@ -632,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 	}
 
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		r = hpte_old_to_new_r(hpte[0], r);
+		hpte[0] = hpte_old_to_new_v(hpte[0]);
+	}
 	hptep[1] = cpu_to_be64(r);
 	eieio();
 	__unlock_hpte(hptep, hpte[0]);
@@ -1183,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			unsigned long *hpte, struct revmap_entry *revp,
 			int want_valid, int first_pass)
 {
-	unsigned long v, r;
+	unsigned long v, r, hr;
 	unsigned long rcbits_unset;
 	int ok = 1;
 	int valid, dirty;
@@ -1210,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
 			cpu_relax();
 		v = be64_to_cpu(hptp[0]);
+		hr = be64_to_cpu(hptp[1]);
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			v = hpte_new_to_old_v(v, hr);
+			hr = hpte_new_to_old_r(hr);
+		}
 
 		/* re-evaluate valid and dirty from synchronized HPTE value */
 		valid = !!(v & HPTE_V_VALID);
@@ -1217,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 
 		/* Harvest R and C into guest view if necessary */
 		rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
-		if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
-			revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
+		if (valid && (rcbits_unset & hr)) {
+			revp->guest_rpte |= (hr &
 				(HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
 			dirty = 1;
 		}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 02786b3..1179e40 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -364,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		}
 	}
 
+	/* Convert to new format on P9 */
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		ptel = hpte_old_to_new_r(pteh, ptel);
+		pteh = hpte_old_to_new_v(pteh);
+	}
 	hpte[1] = cpu_to_be64(ptel);
 
 	/* Write the first HPTE dword, unlocking the HPTE and making it valid */
@@ -445,27 +450,31 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	__be64 *hpte;
 	unsigned long v, r, rb;
 	struct revmap_entry *rev;
-	u64 pte;
+	u64 pte, orig_pte, pte_r;
 
 	if (pte_index >= kvm->arch.hpt_npte)
 		return H_PARAMETER;
 	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
-	pte = be64_to_cpu(hpte[0]);
+	pte = orig_pte = be64_to_cpu(hpte[0]);
+	pte_r = be64_to_cpu(hpte[1]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		pte = hpte_new_to_old_v(pte, pte_r);
+		pte_r = hpte_new_to_old_r(pte_r);
+	}
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		__unlock_hpte(hpte, pte);
+		__unlock_hpte(hpte, orig_pte);
 		return H_NOT_FOUND;
 	}
 
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
-	pte = be64_to_cpu(hpte[1]);
 	if (v & HPTE_V_VALID) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
+		rb = compute_tlbie_rb(v, pte_r, pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/*
 		 * The reference (R) and change (C) bits in a HPT
@@ -483,7 +492,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
-	if (is_mmio_hpte(v, pte))
+	if (is_mmio_hpte(v, pte_r))
 		atomic64_inc(&kvm->arch.mmio_update);
 
 	if (v & HPTE_V_ABSENT)
@@ -546,6 +555,10 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			found = 0;
 			hp0 = be64_to_cpu(hp[0]);
 			hp1 = be64_to_cpu(hp[1]);
+			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+				hp0 = hpte_new_to_old_v(hp0, hp1);
+				hp1 = hpte_new_to_old_r(hp1);
+			}
 			if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
 				switch (flags & 3) {
 				case 0:		/* absolute */
@@ -583,8 +596,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
 			/* leave it locked */
 			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
-			tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
-				be64_to_cpu(hp[1]), pte_index);
+			tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
 			indexes[n] = j;
 			hptes[n] = hp;
 			revs[n] = rev;
@@ -622,7 +634,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	__be64 *hpte;
 	struct revmap_entry *rev;
 	unsigned long v, r, rb, mask, bits;
-	u64 pte;
+	u64 pte_v, pte_r;
 
 	if (pte_index >= kvm->arch.hpt_npte)
 		return H_PARAMETER;
@@ -630,15 +642,16 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
-	pte = be64_to_cpu(hpte[0]);
-	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
-	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-		__unlock_hpte(hpte, pte);
+	v = pte_v = be64_to_cpu(hpte[0]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
+	if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	    ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
+		__unlock_hpte(hpte, pte_v);
 		return H_NOT_FOUND;
 	}
 
-	v = pte;
-	pte = be64_to_cpu(hpte[1]);
+	pte_r = be64_to_cpu(hpte[1]);
 	bits = (flags << 55) & HPTE_R_PP0;
 	bits |= (flags << 48) & HPTE_R_KEY_HI;
 	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
@@ -660,13 +673,13 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 		 * readonly to writable.  If it should be writable, we'll
 		 * take a trap and let the page fault code sort it out.
 		 */
-		r = (pte & ~mask) | bits;
-		if (hpte_is_writable(r) && !hpte_is_writable(pte))
+		r = (pte_r & ~mask) | bits;
+		if (hpte_is_writable(r) && !hpte_is_writable(pte_r))
 			r = hpte_make_readonly(r);
 		/* If the PTE is changing, invalidate it first */
-		if (r != pte) {
+		if (r != pte_r) {
 			rb = compute_tlbie_rb(v, r, pte_index);
-			hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) |
+			hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
 					      HPTE_V_ABSENT);
 			do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
 				  true);
@@ -675,9 +688,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 			hpte[1] = cpu_to_be64(r);
 		}
 	}
-	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+	unlock_hpte(hpte, pte_v & ~HPTE_V_HVLOCK);
 	asm volatile("ptesync" : : : "memory");
-	if (is_mmio_hpte(v, pte))
+	if (is_mmio_hpte(v, pte_r))
 		atomic64_inc(&kvm->arch.mmio_update);
 
 	return H_SUCCESS;
@@ -703,6 +716,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			v = hpte_new_to_old_v(v, r);
+			r = hpte_new_to_old_r(r);
+		}
 		if (v & HPTE_V_ABSENT) {
 			v &= ~HPTE_V_ABSENT;
 			v |= HPTE_V_VALID;
@@ -820,10 +837,16 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index)
 {
 	unsigned long rb;
+	u64 hp0, hp1;
 
 	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
-	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
-			      pte_index);
+	hp0 = be64_to_cpu(hptep[0]);
+	hp1 = be64_to_cpu(hptep[1]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		hp0 = hpte_new_to_old_v(hp0, hp1);
+		hp1 = hpte_new_to_old_r(hp1);
+	}
+	rb = compute_tlbie_rb(hp0, hp1, pte_index);
 	do_tlbies(kvm, &rb, 1, 1, true);
 }
 EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
@@ -833,9 +856,15 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
 {
 	unsigned long rb;
 	unsigned char rbyte;
+	u64 hp0, hp1;
 
-	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
-			      pte_index);
+	hp0 = be64_to_cpu(hptep[0]);
+	hp1 = be64_to_cpu(hptep[1]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		hp0 = hpte_new_to_old_v(hp0, hp1);
+		hp1 = hpte_new_to_old_r(hp1);
+	}
+	rb = compute_tlbie_rb(hp0, hp1, pte_index);
 	rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
 	/* modify only the second-last byte, which contains the ref bit */
 	*((char *)hptep + 14) = rbyte;
@@ -895,7 +924,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 	unsigned long avpn;
 	__be64 *hpte;
 	unsigned long mask, val;
-	unsigned long v, r;
+	unsigned long v, r, orig_v;
 
 	/* Get page shift, work out hash and AVPN etc. */
 	mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
@@ -930,6 +959,8 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		for (i = 0; i < 16; i += 2) {
 			/* Read the PTE racily */
 			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
+			if (cpu_has_feature(CPU_FTR_ARCH_300))
+				v = hpte_new_to_old_v(v, be64_to_cpu(hpte[i+1]));
 
 			/* Check valid/absent, hash, segment size and AVPN */
 			if (!(v & valid) || (v & mask) != val)
@@ -938,8 +969,12 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			/* Lock the PTE and read it under the lock */
 			while (!try_lock_hpte(&hpte[i], HPTE_V_HVLOCK))
 				cpu_relax();
-			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
+			v = orig_v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
 			r = be64_to_cpu(hpte[i+1]);
+			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+				v = hpte_new_to_old_v(v, r);
+				r = hpte_new_to_old_r(r);
+			}
 
 			/*
 			 * Check the HPTE again, including base page size
@@ -949,7 +984,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
 
-			__unlock_hpte(&hpte[i], v);
+			__unlock_hpte(&hpte[i], orig_v);
 		}
 
 		if (val & HPTE_V_SECONDARY)
@@ -977,7 +1012,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 {
 	struct kvm *kvm = vcpu->kvm;
 	long int index;
-	unsigned long v, r, gr;
+	unsigned long v, r, gr, orig_v;
 	__be64 *hpte;
 	unsigned long valid;
 	struct revmap_entry *rev;
@@ -1005,12 +1040,16 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			return 0;	/* for prot fault, HPTE disappeared */
 		}
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
-		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
+		v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			v = hpte_new_to_old_v(v, r);
+			r = hpte_new_to_old_r(r);
+		}
 		rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
 		gr = rev->guest_rpte;
 
-		unlock_hpte(hpte, v);
+		unlock_hpte(hpte, orig_v);
 	}
 
 	/* For not found, if the HPTE is valid by now, retry the instruction */
-- 
2.7.4


+			}
 
 			/*
 			 * Check the HPTE again, including base page size
@@ -949,7 +984,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
 
-			__unlock_hpte(&hpte[i], v);
+			__unlock_hpte(&hpte[i], orig_v);
 		}
 
 		if (val & HPTE_V_SECONDARY)
@@ -977,7 +1012,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 {
 	struct kvm *kvm = vcpu->kvm;
 	long int index;
-	unsigned long v, r, gr;
+	unsigned long v, r, gr, orig_v;
 	__be64 *hpte;
 	unsigned long valid;
 	struct revmap_entry *rev;
@@ -1005,12 +1040,16 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			return 0;	/* for prot fault, HPTE disappeared */
 		}
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
-		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
+		v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			v = hpte_new_to_old_v(v, r);
+			r = hpte_new_to_old_r(r);
+		}
 		rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
 		gr = rev->guest_rpte;
 
-		unlock_hpte(hpte, v);
+		unlock_hpte(hpte, orig_v);
 	}
 
 	/* For not found, if the HPTE is valid by now, retry the instruction */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

On POWER9, the SDR1 register (hashed page table base address) is no
longer used, and instead the hardware reads the HPT base address
and size from the partition table.  The partition table entry also
contains the bits that specify the page size for the VRMA mapping,
which were previously in the LPCR.  The VPM0 bit of the LPCR is
now reserved; the processor now always uses the VRMA (virtual
real-mode area) mechanism for guest real-mode accesses in HPT mode,
and the RMO (real-mode offset) mechanism has been dropped.

When entering or exiting the guest, we now only have to set the
LPIDR (logical partition ID register), not the SDR1 register.
There is also no requirement now to transition via a reserved
LPID value.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv.c            | 36 +++++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 ++++++---
 2 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 40b2b6d..5cbe3c3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -54,6 +54,7 @@
 #include <asm/dbell.h>
 #include <asm/hmi.h>
 #include <asm/pnv-pci.h>
+#include <asm/mmu.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -3024,6 +3025,22 @@ static void kvmppc_mmu_destroy_hv(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static void kvmppc_setup_partition_table(struct kvm *kvm)
+{
+	unsigned long dw0, dw1;
+
+	/* PS field - page size for VRMA */
+	dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
+		((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
+	/* HTABSIZE and HTABORG fields */
+	dw0 |= kvm->arch.sdr1;
+
+	/* Second dword has GR=0; other fields are unused since UPRT=0 */
+	dw1 = 0;
+
+	mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1);
+}
+
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 {
 	int err = 0;
@@ -3075,17 +3092,20 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	      psize == 0x1000000))
 		goto out_srcu;
 
-	/* Update VRMASD field in the LPCR */
 	senc = slb_pgsize_encoding(psize);
 	kvm->arch.vrma_slb_v = senc | SLB_VSID_B_1T |
 		(VRMA_VSID << SLB_VSID_SHIFT_1T);
-	/* the -4 is to account for senc values starting at 0x10 */
-	lpcr = senc << (LPCR_VRMASD_SH - 4);
-
 	/* Create HPTEs in the hash page table for the VRMA */
 	kvmppc_map_vrma(vcpu, memslot, porder);
 
-	kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
+	/* Update VRMASD field in the LPCR */
+	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+		/* the -4 is to account for senc values starting at 0x10 */
+		lpcr = senc << (LPCR_VRMASD_SH - 4);
+		kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
+	} else {
+		kvmppc_setup_partition_table(kvm);
+	}
 
 	/* Order updates to kvm->arch.lpcr etc. vs. hpte_setup_done */
 	smp_wmb();
@@ -3235,7 +3255,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
 	       sizeof(kvm->arch.enabled_hcalls));
 
-	kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
+	if (!cpu_has_feature(CPU_FTR_ARCH_300))
+		kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
 
 	/* Init LPCR for virtual RMA mode */
 	kvm->arch.host_lpid = mfspr(SPRN_LPID);
@@ -3248,6 +3269,9 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	/* On POWER8 turn on online bit to enable PURR/SPURR */
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		lpcr |= LPCR_ONL;
+	/* On POWER9, VPM0 bit is reserved (VPM0=1 behaviour is assumed) */
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		lpcr &= ~LPCR_VPM0;
 	kvm->arch.lpcr = lpcr;
 
 	/*
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3c1d1b..dc25467 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -581,12 +581,14 @@ kvmppc_hv_entry:
 	ld	r9,VCORE_KVM(r5)	/* pointer to struct kvm */
 	cmpwi	r6,0
 	bne	10f
-	ld	r6,KVM_SDR1(r9)
 	lwz	r7,KVM_LPID(r9)
+BEGIN_FTR_SECTION
+	ld	r6,KVM_SDR1(r9)
 	li	r0,LPID_RSVD		/* switch to reserved LPID */
 	mtspr	SPRN_LPID,r0
 	ptesync
 	mtspr	SPRN_SDR1,r6		/* switch to partition page table */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	mtspr	SPRN_LPID,r7
 	isync
 
@@ -1552,12 +1554,14 @@ kvmhv_switch_to_host:
 	beq	19f
 
 	/* Primary thread switches back to host partition */
-	ld	r6,KVM_HOST_SDR1(r4)
 	lwz	r7,KVM_HOST_LPID(r4)
+BEGIN_FTR_SECTION
+	ld	r6,KVM_HOST_SDR1(r4)
 	li	r8,LPID_RSVD		/* switch to reserved LPID */
 	mtspr	SPRN_LPID,r8
 	ptesync
-	mtspr	SPRN_SDR1,r6		/* switch to partition page table */
+	mtspr	SPRN_SDR1,r6		/* switch to host page table */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	mtspr	SPRN_LPID,r7
 	isync
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

Some special-purpose registers that were present and accessible
by guests on POWER8 no longer exist on POWER9, so this adds
feature sections to ensure that we don't try to context-switch
them when going into or out of a guest on POWER9.  These are
all relatively obscure, rarely-used registers, but we had to
context-switch them on POWER8 to avoid creating a covert channel.
They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 ++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dc25467..d422014 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -752,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
 BEGIN_FTR_SECTION
 	ld	r5, VCPU_MMCR + 24(r4)
 	ld	r6, VCPU_SIER(r4)
+	mtspr	SPRN_MMCR2, r5
+	mtspr	SPRN_SIER, r6
+BEGIN_FTR_SECTION_NESTED(96)
 	lwz	r7, VCPU_PMC + 24(r4)
 	lwz	r8, VCPU_PMC + 28(r4)
 	ld	r9, VCPU_MMCR + 32(r4)
-	mtspr	SPRN_MMCR2, r5
-	mtspr	SPRN_SIER, r6
 	mtspr	SPRN_SPMC1, r7
 	mtspr	SPRN_SPMC2, r8
 	mtspr	SPRN_MMCRS, r9
+END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_MMCR0, r3
 	isync
@@ -815,20 +817,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_EBBHR, r8
 	ld	r5, VCPU_EBBRR(r4)
 	ld	r6, VCPU_BESCR(r4)
-	ld	r7, VCPU_CSIGR(r4)
-	ld	r8, VCPU_TACR(r4)
+	lwz	r7, VCPU_GUEST_PID(r4)
+	ld	r8, VCPU_WORT(r4)
 	mtspr	SPRN_EBBRR, r5
 	mtspr	SPRN_BESCR, r6
-	mtspr	SPRN_CSIGR, r7
-	mtspr	SPRN_TACR, r8
+	mtspr	SPRN_PID, r7
+	mtspr	SPRN_WORT, r8
+BEGIN_FTR_SECTION
 	ld	r5, VCPU_TCSCR(r4)
 	ld	r6, VCPU_ACOP(r4)
-	lwz	r7, VCPU_GUEST_PID(r4)
-	ld	r8, VCPU_WORT(r4)
+	ld	r7, VCPU_CSIGR(r4)
+	ld	r8, VCPU_TACR(r4)
 	mtspr	SPRN_TCSCR, r5
 	mtspr	SPRN_ACOP, r6
-	mtspr	SPRN_PID, r7
-	mtspr	SPRN_WORT, r8
+	mtspr	SPRN_CSIGR, r7
+	mtspr	SPRN_TACR, r8
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 8:
 
 	/*
@@ -1343,20 +1347,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	std	r8, VCPU_EBBHR(r9)
 	mfspr	r5, SPRN_EBBRR
 	mfspr	r6, SPRN_BESCR
-	mfspr	r7, SPRN_CSIGR
-	mfspr	r8, SPRN_TACR
+	mfspr	r7, SPRN_PID
+	mfspr	r8, SPRN_WORT
 	std	r5, VCPU_EBBRR(r9)
 	std	r6, VCPU_BESCR(r9)
-	std	r7, VCPU_CSIGR(r9)
-	std	r8, VCPU_TACR(r9)
+	stw	r7, VCPU_GUEST_PID(r9)
+	std	r8, VCPU_WORT(r9)
+BEGIN_FTR_SECTION
 	mfspr	r5, SPRN_TCSCR
 	mfspr	r6, SPRN_ACOP
-	mfspr	r7, SPRN_PID
-	mfspr	r8, SPRN_WORT
+	mfspr	r7, SPRN_CSIGR
+	mfspr	r8, SPRN_TACR
 	std	r5, VCPU_TCSCR(r9)
 	std	r6, VCPU_ACOP(r9)
-	stw	r7, VCPU_GUEST_PID(r9)
-	std	r8, VCPU_WORT(r9)
+	std	r7, VCPU_CSIGR(r9)
+	std	r8, VCPU_TACR(r9)
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	/*
 	 * Restore various registers to 0, where non-zero values
 	 * set by the guest could disrupt the host.
@@ -1365,12 +1371,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_IAMR, r0
 	mtspr	SPRN_CIABR, r0
 	mtspr	SPRN_DAWRX, r0
-	mtspr	SPRN_TCSCR, r0
 	mtspr	SPRN_WORT, r0
+BEGIN_FTR_SECTION
+	mtspr	SPRN_TCSCR, r0
 	/* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
 	li	r0, 1
 	sldi	r0, r0, 31
 	mtspr	SPRN_MMCRS, r0
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 8:
 
 	/* Save and reset AMR and UAMOR before turning on the MMU */
@@ -1504,15 +1512,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	stw	r8, VCPU_PMC + 20(r9)
 BEGIN_FTR_SECTION
 	mfspr	r5, SPRN_SIER
+	std	r5, VCPU_SIER(r9)
+BEGIN_FTR_SECTION_NESTED(96)
 	mfspr	r6, SPRN_SPMC1
 	mfspr	r7, SPRN_SPMC2
 	mfspr	r8, SPRN_MMCRS
-	std	r5, VCPU_SIER(r9)
 	stw	r6, VCPU_PMC + 24(r9)
 	stw	r7, VCPU_PMC + 28(r9)
 	std	r8, VCPU_MMCR + 32(r9)
 	lis	r4, 0x8000
 	mtspr	SPRN_MMCRS, r4
+END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 22:
 	/* Clear out SLB */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 08/13] KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

This adds code to handle two new guest-accessible special-purpose
registers on POWER9: TIDR (thread ID register) and PSSCR (processor
stop status and control register).  They are context-switched
between host and guest, and the guest values can be read and set
via the one_reg interface.

The PSSCR contains some fields which are guest-accessible and some
which are only accessible in hypervisor mode.  We only allow the
guest-accessible fields to be read or set by userspace.
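
For illustration only, here is a minimal userspace sketch of driving the
new register through the one_reg interface.  It assumes vcpu_fd is an
already-open vCPU file descriptor and uses the KVM_REG_PPC_PSSCR id added
to the uapi header by this patch; it is not part of the patch itself.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Set the guest-visible PSSCR fields of one vCPU via KVM_SET_ONE_REG. */
static int set_guest_psscr(int vcpu_fd, uint64_t psscr)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_PPC_PSSCR,
		.addr = (uintptr_t)&psscr,
	};

	/* KVM masks the value with PSSCR_GUEST_VIS, so HV-only bits are dropped. */
	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}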

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 Documentation/virtual/kvm/api.txt       |  2 ++
 arch/powerpc/include/asm/kvm_host.h     |  2 ++
 arch/powerpc/include/uapi/asm/kvm.h     |  4 ++++
 arch/powerpc/kernel/asm-offsets.c       |  2 ++
 arch/powerpc/kvm/book3s_hv.c            | 12 ++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 39 +++++++++++++++++++++++++++++++--
 6 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 739db9a..40b2bfc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2023,6 +2023,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_WORT              | 64
   PPC	| KVM_REG_PPC_SPRG9             | 64
   PPC	| KVM_REG_PPC_DBSR              | 32
+  PPC   | KVM_REG_PPC_TIDR              | 64
+  PPC   | KVM_REG_PPC_PSSCR             | 64
   PPC   | KVM_REG_PPC_TM_GPR0           | 64
           ...
   PPC   | KVM_REG_PPC_TM_GPR31          | 64
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 20ef27d..0d94608 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -517,6 +517,8 @@ struct kvm_vcpu_arch {
 	ulong tcscr;
 	ulong acop;
 	ulong wort;
+	ulong tid;
+	ulong psscr;
 	ulong shadow_srr1;
 #endif
 	u32 vrsave; /* also USPRG0 */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index c93cf35..f0bae66 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -573,6 +573,10 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_SPRG9	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba)
 #define KVM_REG_PPC_DBSR	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb)
 
+/* POWER9 registers */
+#define KVM_REG_PPC_TIDR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc)
+#define KVM_REG_PPC_PSSCR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd)
+
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
  */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index caec7bf..494241b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -548,6 +548,8 @@ int main(void)
 	DEFINE(VCPU_TCSCR, offsetof(struct kvm_vcpu, arch.tcscr));
 	DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
 	DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
+	DEFINE(VCPU_TID, offsetof(struct kvm_vcpu, arch.tid));
+	DEFINE(VCPU_PSSCR, offsetof(struct kvm_vcpu, arch.psscr));
 	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
 	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
 	DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5cbe3c3..59e18dfb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1230,6 +1230,12 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_WORT:
 		*val = get_reg_val(id, vcpu->arch.wort);
 		break;
+	case KVM_REG_PPC_TIDR:
+		*val = get_reg_val(id, vcpu->arch.tid);
+		break;
+	case KVM_REG_PPC_PSSCR:
+		*val = get_reg_val(id, vcpu->arch.psscr);
+		break;
 	case KVM_REG_PPC_VPA_ADDR:
 		spin_lock(&vcpu->arch.vpa_update_lock);
 		*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
@@ -1428,6 +1434,12 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_WORT:
 		vcpu->arch.wort = set_reg_val(id, *val);
 		break;
+	case KVM_REG_PPC_TIDR:
+		vcpu->arch.tid = set_reg_val(id, *val);
+		break;
+	case KVM_REG_PPC_PSSCR:
+		vcpu->arch.psscr = set_reg_val(id, *val) & PSSCR_GUEST_VIS;
+		break;
 	case KVM_REG_PPC_VPA_ADDR:
 		addr = set_reg_val(id, *val);
 		r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d422014..219a04f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -523,6 +523,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
  *                                                                            *
  *****************************************************************************/
 
+/* Stack frame offsets */
+#define STACK_SLOT_TID		(112-16)
+#define STACK_SLOT_PSSCR	(112-24)
+
 .global kvmppc_hv_entry
 kvmppc_hv_entry:
 
@@ -700,6 +704,14 @@ kvmppc_got_guest:
 	mtspr	SPRN_PURR,r7
 	mtspr	SPRN_SPURR,r8
 
+	/* Save host values of some registers */
+BEGIN_FTR_SECTION
+	mfspr	r5, SPRN_TIDR
+	mfspr	r6, SPRN_PSSCR
+	std	r5, STACK_SLOT_TID(r1)
+	std	r6, STACK_SLOT_PSSCR(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
 BEGIN_FTR_SECTION
 	/* Set partition DABR */
 	/* Do this before re-enabling PMU to avoid P7 DABR corruption bug */
@@ -824,6 +836,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_PID, r7
 	mtspr	SPRN_WORT, r8
 BEGIN_FTR_SECTION
+	/* POWER8-only registers */
 	ld	r5, VCPU_TCSCR(r4)
 	ld	r6, VCPU_ACOP(r4)
 	ld	r7, VCPU_CSIGR(r4)
@@ -832,7 +845,14 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_ACOP, r6
 	mtspr	SPRN_CSIGR, r7
 	mtspr	SPRN_TACR, r8
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+FTR_SECTION_ELSE
+	/* POWER9-only registers */
+	ld	r5, VCPU_TID(r4)
+	ld	r6, VCPU_PSSCR(r4)
+	oris	r6, r6, PSSCR_EC@h	/* This makes stop trap to HV */
+	mtspr	SPRN_TIDR, r5
+	mtspr	SPRN_PSSCR, r6
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 8:
 
 	/*
@@ -1362,7 +1382,14 @@ BEGIN_FTR_SECTION
 	std	r6, VCPU_ACOP(r9)
 	std	r7, VCPU_CSIGR(r9)
 	std	r8, VCPU_TACR(r9)
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+FTR_SECTION_ELSE
+	mfspr	r5, SPRN_TIDR
+	mfspr	r6, SPRN_PSSCR
+	std	r5, VCPU_TID(r9)
+	rldicl	r6, r6, 4, 50		/* r6 &= PSSCR_GUEST_VIS */
+	rotldi	r6, r6, 60
+	std	r6, VCPU_PSSCR(r9)
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	/*
 	 * Restore various registers to 0, where non-zero values
 	 * set by the guest could disrupt the host.
@@ -1531,6 +1558,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	slbia
 	ptesync
 
+	/* Restore host values of some registers */
+BEGIN_FTR_SECTION
+	ld	r5, STACK_SLOT_TID(r1)
+	ld	r6, STACK_SLOT_PSSCR(r1)
+	mtspr	SPRN_TIDR, r5
+	mtspr	SPRN_PSSCR, r6
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
 	/*
 	 * POWER7/POWER8 guest -> host partition switch code.
 	 * We don't have to lock against tlbies but we do
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
and tlbiel (local tlbie) instructions.  Both instructions get a
set of new parameters (RIC, PRS and R) which appear as bits in the
instruction word.  The tlbiel instruction now has a second register
operand, which contains a PID and/or LPID value if needed, and
should otherwise contain 0.

This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
as well as older processors.  Since we only handle HPT guests so
far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
word as on previous processors, so we don't need to conditionally
execute different instructions depending on the processor.

The local flush on first entry to a guest in book3s_hv_rmhandlers.S
is a loop which depends on the number of TLB sets.  Rather than
using feature sections to set the number of iterations based on
which CPU we're on, we now work out this number at VM creation time
and store it in the kvm_arch struct.  That will make it possible to
get the number from the device tree in future, which will help with
compatibility with future processors.

Since mmu_partition_table_set_entry() does a global flush of the
whole LPID, we don't need to do the TLB flush on first entry to the
guest on each processor.  Therefore we don't set all bits in the
tlb_need_flush bitmap on VM startup on POWER9.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  1 +
 arch/powerpc/kernel/asm-offsets.c       |  1 +
 arch/powerpc/kvm/book3s_hv.c            | 17 ++++++++++++++++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     | 10 ++++++++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++------
 5 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0d94608..ea78864 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -244,6 +244,7 @@ struct kvm_arch_memory_slot {
 struct kvm_arch {
 	unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	unsigned int tlb_sets;
 	unsigned long hpt_virt;
 	struct revmap_entry *revmap;
 	atomic64_t mmio_update;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 494241b..b9c8386 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -487,6 +487,7 @@ int main(void)
 
 	/* book3s */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
 	DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
 	DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
 	DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 59e18dfb..8395a7f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3260,8 +3260,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	 * Since we don't flush the TLB when tearing down a VM,
 	 * and this lpid might have previously been used,
 	 * make sure we flush on each core before running the new VM.
+	 * On POWER9, the tlbie in mmu_partition_table_set_entry()
+	 * does this flush for us.
 	 */
-	cpumask_setall(&kvm->arch.need_tlb_flush);
+	if (!cpu_has_feature(CPU_FTR_ARCH_300))
+		cpumask_setall(&kvm->arch.need_tlb_flush);
 
 	/* Start out with the default set of hcalls enabled */
 	memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
@@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	kvm->arch.lpcr = lpcr;
 
 	/*
+	 * Work out how many sets the TLB has, for the use of
+	 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
+	 */
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		kvm->arch.tlb_sets = 256;	/* POWER9 */
+	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		kvm->arch.tlb_sets = 512;	/* POWER8 */
+	else
+		kvm->arch.tlb_sets = 128;	/* POWER7 */
+
+	/*
 	 * Track that we now have a HV mode VM active. This blocks secondary
 	 * CPU threads from coming online.
 	 */
@@ -3728,3 +3742,4 @@ module_exit(kvmppc_book3s_exit_hv);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_MISCDEV(KVM_MINOR);
 MODULE_ALIAS("devname:kvm");
+
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1179e40..9ef3c4b 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -424,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 {
 	long i;
 
+	/*
+	 * We use the POWER9 5-operand versions of tlbie and tlbiel here.
+	 * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
+	 * the RS field, this is backwards-compatible with P7 and P8.
+	 */
 	if (global) {
 		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
 			cpu_relax();
 		if (need_sync)
 			asm volatile("ptesync" : : : "memory");
 		for (i = 0; i < npages; ++i)
-			asm volatile(PPC_TLBIE(%1,%0) : :
+			asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
 				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
 		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 		kvm->arch.tlbie_lock = 0;
@@ -438,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 		if (need_sync)
 			asm volatile("ptesync" : : : "memory");
 		for (i = 0; i < npages; ++i)
-			asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
+			asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
+				     "r" (rbvalues[i]), "r" (0));
 		asm volatile("ptesync" : : : "memory");
 	}
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 219a04f..acae5c3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -613,12 +613,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	stdcx.	r7,0,r6
 	bne	23b
 	/* Flush the TLB of any entries for this LPID */
-	/* use arch 2.07S as a proxy for POWER8 */
-BEGIN_FTR_SECTION
-	li	r6,512			/* POWER8 has 512 sets */
-FTR_SECTION_ELSE
-	li	r6,128			/* POWER7 has 128 sets */
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
+	lwz	r6,KVM_TLB_SETS(r9)
+	li	r0,0			/* RS for P9 version of tlbiel */
 	mtctr	r6
 	li	r7,0x800		/* IS field = 0b10 */
 	ptesync
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

On POWER9, the msgsnd instruction is able to send interrupts to
other cores, as well as other threads on the local core.  Since
msgsnd is generally simpler and faster than sending an IPI via the
XICS, we use msgsnd for all IPIs sent by KVM on POWER9.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv.c         | 11 ++++++++++-
 arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8395a7f..ace89df 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -147,12 +147,21 @@ static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
 
 static bool kvmppc_ipi_thread(int cpu)
 {
+	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+
+	/* On POWER9 we can use msgsnd to IPI any cpu */
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		msg |= get_hard_smp_processor_id(cpu);
+		smp_mb();
+		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+		return true;
+	}
+
 	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
 	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
 		preempt_disable();
 		if (cpu_first_thread_sibling(cpu) ==
 		    cpu_first_thread_sibling(smp_processor_id())) {
-			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 			msg |= cpu_thread_in_core(cpu);
 			smp_mb();
 			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 0c84d6b..37ed045 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
 void kvmhv_rm_send_ipi(int cpu)
 {
 	unsigned long xics_phys;
+	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 
-	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
+	/* On POWER9 we can use msgsnd for any destination cpu. */
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		msg |= get_hard_smp_processor_id(cpu);
+		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+		return;
+	}
+	/* On POWER8 for IPIs to threads in the same core, use msgsnd. */
 	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
 	    cpu_first_thread_sibling(cpu) ==
 	    cpu_first_thread_sibling(raw_smp_processor_id())) {
-		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 		msg |= cpu_thread_in_core(cpu);
 		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
 		return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 11/13] KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

POWER9 includes a new interrupt controller, called XIVE, which is
quite different from the XICS interrupt controller on POWER7 and
POWER8 machines.  KVM-HV accesses the XICS directly in several places
in order to send and clear IPIs and handle interrupts from PCI
devices being passed through to the guest.

In order to make the transition to XIVE easier, OPAL firmware will
include an emulation of XICS on top of XIVE.  Access to the emulated
XICS is via OPAL calls.  The one complication is that the EOI
(end-of-interrupt) function can now return a value indicating that
another interrupt is pending; in this case, the XIVE will not signal
an interrupt in hardware to the CPU, and software is supposed to
acknowledge the new interrupt without waiting for another interrupt
to be delivered in hardware.
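
In code terms, the OPAL EOI path in the changes below handles that return
value roughly as follows (condensed sketch):

	rc = opal_rm_int_eoi(be32_to_cpu(xirr));
	/* rc > 0: the XIVE has already latched another interrupt and will not
	 * signal it to the CPU, so the caller must go and fetch it itself */
	*again = rc > 0;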

This adapts KVM-HV to use the OPAL calls on machines where there is
no XICS hardware.  When there is no XICS, we look for a device-tree
node with "ibm,opal-intc" in its compatible property, which is how
OPAL indicates that it provides XICS emulation.

In order to handle the EOI return value, kvmppc_read_intr() has
become kvmppc_read_one_intr(), with a boolean variable passed by
reference which can be set by the EOI functions to indicate that
another interrupt is pending.  The new kvmppc_read_intr() keeps
calling kvmppc_read_one_intr() until there are no more interrupts
to process.  The return value from kvmppc_read_intr() is the
largest non-zero value of the returns from kvmppc_read_one_intr().
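
That rule means a positive return (1, a host interrupt that still needs the
host's attention) takes precedence over the negative returns (-1 for a guest
wakeup IPI, -2 for a handled passthrough interrupt).  The new outer loop,
shown here with explanatory comments added:

	long kvmppc_read_intr(void)
	{
		long ret = 0, rc;
		bool again;

		do {
			again = false;	/* set by the OPAL EOI path when more is pending */
			rc = kvmppc_read_one_intr(&again);
			if (rc && (ret == 0 || rc > ret))
				ret = rc;	/* keep the largest non-zero return */
		} while (again);
		return ret;
	}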

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/kvm_ppc.h   |  7 +++--
 arch/powerpc/kvm/book3s_hv.c         | 28 +++++++++++++++--
 arch/powerpc/kvm/book3s_hv_builtin.c | 59 ++++++++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 23 ++++++++++----
 4 files changed, 96 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index f6e4964..a5b94be 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -483,9 +483,10 @@ extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 				   unsigned long host_irq);
 extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 				   unsigned long host_irq);
-extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
-				 struct kvmppc_irq_map *irq_map,
-				 struct kvmppc_passthru_irqmap *pimap);
+extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, __be32 xirr,
+					struct kvmppc_irq_map *irq_map,
+					struct kvmppc_passthru_irqmap *pimap,
+					bool *again);
 extern int h_ipi_redirect;
 #else
 static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ace89df..a1d2b5f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -55,6 +55,8 @@
 #include <asm/hmi.h>
 #include <asm/pnv-pci.h>
 #include <asm/mmu.h>
+#include <asm/opal.h>
+#include <asm/xics.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -63,6 +65,7 @@
 #include <linux/irqbypass.h>
 #include <linux/module.h>
 #include <linux/compiler.h>
+#include <linux/of.h>
 
 #include "book3s.h"
 
@@ -172,8 +175,12 @@ static bool kvmppc_ipi_thread(int cpu)
 	}
 
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
-		xics_wake_cpu(cpu);
+	if (cpu >= 0 && cpu < nr_cpu_ids) {
+		if (paca[cpu].kvm_hstate.xics_phys) {
+			xics_wake_cpu(cpu);
+			return true;
+		}
+		opal_int_set_mfrr(get_hard_smp_processor_id(cpu), IPI_PRIORITY);
 		return true;
 	}
 #endif
@@ -3729,6 +3736,23 @@ static int kvmppc_book3s_init_hv(void)
 	if (r)
 		return r;
 
+	/*
+	 * We need a way of accessing the XICS interrupt controller,
+	 * either directly, via paca[cpu].kvm_hstate.xics_phys, or
+	 * indirectly, via OPAL.
+	 */
+#ifdef CONFIG_SMP
+	if (!get_paca()->kvm_hstate.xics_phys) {
+		struct device_node *np;
+
+		np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
+		if (!np) {
+			pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
+			return -ENODEV;
+		}
+	}
+#endif
+
 	kvm_ops_hv.owner = THIS_MODULE;
 	kvmppc_hv_ops = &kvm_ops_hv;
 
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 37ed045..a09c917 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -26,6 +26,7 @@
 #include <asm/dbell.h>
 #include <asm/cputhreads.h>
 #include <asm/io.h>
+#include <asm/opal.h>
 
 #define KVM_CMA_CHUNK_ORDER	18
 
@@ -224,7 +225,11 @@ void kvmhv_rm_send_ipi(int cpu)
 
 	/* Else poke the target with an IPI */
 	xics_phys = paca[cpu].kvm_hstate.xics_phys;
-	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+	if (xics_phys)
+		rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+	else
+		opal_rm_int_set_mfrr(get_hard_smp_processor_id(cpu),
+				     IPI_PRIORITY);
 }
 
 /*
@@ -335,7 +340,7 @@ static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
  * saved a copy of the XIRR in the PACA, it will be picked up by
  * the host ICP driver.
  */
-static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
+static int kvmppc_check_passthru(u32 xisr, __be32 xirr, bool *again)
 {
 	struct kvmppc_passthru_irqmap *pimap;
 	struct kvmppc_irq_map *irq_map;
@@ -354,7 +359,7 @@ static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
 	/* We're handling this interrupt, generic code doesn't need to */
 	local_paca->kvm_hstate.saved_xirr = 0;
 
-	return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
+	return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap, again);
 }
 
 #else
@@ -373,14 +378,31 @@ static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
  *	-1 if there was a guest wakeup IPI (which has now been cleared)
  *	-2 if there is PCI passthrough external interrupt that was handled
  */
+static long kvmppc_read_one_intr(bool *again);
 
 long kvmppc_read_intr(void)
 {
+	long ret = 0;
+	long rc;
+	bool again;
+
+	do {
+		again = false;
+		rc = kvmppc_read_one_intr(&again);
+		if (rc && (ret == 0 || rc > ret))
+			ret = rc;
+	} while (again);
+	return ret;
+}
+
+static long kvmppc_read_one_intr(bool *again)
+{
 	unsigned long xics_phys;
 	u32 h_xirr;
 	__be32 xirr;
 	u32 xisr;
 	u8 host_ipi;
+	int64_t rc;
 
 	/* see if a host IPI is pending */
 	host_ipi = local_paca->kvm_hstate.host_ipi;
@@ -389,8 +411,14 @@ long kvmppc_read_intr(void)
 
 	/* Now read the interrupt from the ICP */
 	xics_phys = local_paca->kvm_hstate.xics_phys;
-	if (unlikely(!xics_phys))
-		return 1;
+	if (!xics_phys) {
+		/* Use OPAL to read the XIRR */
+		rc = opal_rm_int_get_xirr(&xirr, false);
+		if (rc < 0)
+			return 1;
+	} else {
+		xirr = _lwzcix(xics_phys + XICS_XIRR);
+	}
 
 	/*
 	 * Save XIRR for later. Since we get control in reverse endian
@@ -398,7 +426,6 @@ long kvmppc_read_intr(void)
 	 * host endian. Note that xirr is the value read from the
 	 * XIRR register, while h_xirr is the host endian version.
 	 */
-	xirr = _lwzcix(xics_phys + XICS_XIRR);
 	h_xirr = be32_to_cpu(xirr);
 	local_paca->kvm_hstate.saved_xirr = h_xirr;
 	xisr = h_xirr & 0xffffff;
@@ -417,8 +444,16 @@ long kvmppc_read_intr(void)
 	 * If it is an IPI, clear the MFRR and EOI it.
 	 */
 	if (xisr == XICS_IPI) {
-		_stbcix(xics_phys + XICS_MFRR, 0xff);
-		_stwcix(xics_phys + XICS_XIRR, xirr);
+		if (xics_phys) {
+			_stbcix(xics_phys + XICS_MFRR, 0xff);
+			_stwcix(xics_phys + XICS_XIRR, xirr);
+		} else {
+			opal_rm_int_set_mfrr(hard_smp_processor_id(), 0xff);
+			rc = opal_rm_int_eoi(h_xirr);
+			/* If rc > 0, there is another interrupt pending */
+			*again = rc > 0;
+		}
+
 		/*
 		 * Need to ensure side effects of above stores
 		 * complete before proceeding.
@@ -435,7 +470,11 @@ long kvmppc_read_intr(void)
 			/* We raced with the host,
 			 * we need to resend that IPI, bummer
 			 */
-			_stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+			if (xics_phys)
+				_stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+			else
+				opal_rm_int_set_mfrr(hard_smp_processor_id(),
+						     IPI_PRIORITY);
 			/* Let side effects complete */
 			smp_mb();
 			return 1;
@@ -446,5 +485,5 @@ long kvmppc_read_intr(void)
 		return -1;
 	}
 
-	return kvmppc_check_passthru(xisr, xirr);
+	return kvmppc_check_passthru(xisr, xirr, again);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index a0ea63a..06edc43 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -70,7 +70,11 @@ static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
 	hcpu = hcore << threads_shift;
 	kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
 	smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
-	icp_native_cause_ipi_rm(hcpu);
+	if (paca[hcpu].kvm_hstate.xics_phys)
+		icp_native_cause_ipi_rm(hcpu);
+	else
+		opal_rm_int_set_mfrr(get_hard_smp_processor_id(hcpu),
+				     IPI_PRIORITY);
 }
 #else
 static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
@@ -737,7 +741,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 unsigned long eoi_rc;
 
-static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
+static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
 {
 	unsigned long xics_phys;
 	int64_t rc;
@@ -751,7 +755,12 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
 
 	/* EOI it */
 	xics_phys = local_paca->kvm_hstate.xics_phys;
-	_stwcix(xics_phys + XICS_XIRR, xirr);
+	if (xics_phys) {
+		_stwcix(xics_phys + XICS_XIRR, xirr);
+	} else {
+		rc = opal_rm_int_eoi(be32_to_cpu(xirr));
+		*again = rc > 0;
+	}
 }
 
 static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu)
@@ -809,9 +818,10 @@ static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
 }
 
 long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
-				 u32 xirr,
+				 __be32 xirr,
 				 struct kvmppc_irq_map *irq_map,
-				 struct kvmppc_passthru_irqmap *pimap)
+				 struct kvmppc_passthru_irqmap *pimap,
+				 bool *again)
 {
 	struct kvmppc_xics *xics;
 	struct kvmppc_icp *icp;
@@ -825,7 +835,8 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
 	icp_rm_deliver_irq(xics, icp, irq);
 
 	/* EOI the interrupt */
-	icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
+	icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
+		again);
 
 	if (check_too_hard(xics, icp) == H_TOO_HARD)
 		return 2;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 11/13] KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9
@ 2016-11-18  7:28   ` Paul Mackerras
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

POWER9 includes a new interrupt controller, called XIVE, which is
quite different from the XICS interrupt controller on POWER7 and
POWER8 machines.  KVM-HV accesses the XICS directly in several places
in order to send and clear IPIs and handle interrupts from PCI
devices being passed through to the guest.

In order to make the transition to XIVE easier, OPAL firmware will
include an emulation of XICS on top of XIVE.  Access to the emulated
XICS is via OPAL calls.  The one complication is that the EOI
(end-of-interrupt) function can now return a value indicating that
another interrupt is pending; in this case, the XIVE will not signal
an interrupt in hardware to the CPU, and software is supposed to
acknowledge the new interrupt without waiting for another interrupt
to be delivered in hardware.

This adapts KVM-HV to use the OPAL calls on machines where there is
no XICS hardware.  When there is no XICS, we look for a device-tree
node with "ibm,opal-intc" in its compatible property, which is how
OPAL indicates that it provides XICS emulation.

In order to handle the EOI return value, kvmppc_read_intr() has
become kvmppc_read_one_intr(), with a boolean variable passed by
reference which can be set by the EOI functions to indicate that
another interrupt is pending.  The new kvmppc_read_intr() keeps
calling kvmppc_read_one_intr() until there are no more interrupts
to process.  The return value from kvmppc_read_intr() is the
largest non-zero value of the returns from kvmppc_read_one_intr().

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/kvm_ppc.h   |  7 +++--
 arch/powerpc/kvm/book3s_hv.c         | 28 +++++++++++++++--
 arch/powerpc/kvm/book3s_hv_builtin.c | 59 ++++++++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 23 ++++++++++----
 4 files changed, 96 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index f6e4964..a5b94be 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -483,9 +483,10 @@ extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 				   unsigned long host_irq);
 extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 				   unsigned long host_irq);
-extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
-				 struct kvmppc_irq_map *irq_map,
-				 struct kvmppc_passthru_irqmap *pimap);
+extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, __be32 xirr,
+					struct kvmppc_irq_map *irq_map,
+					struct kvmppc_passthru_irqmap *pimap,
+					bool *again);
 extern int h_ipi_redirect;
 #else
 static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ace89df..a1d2b5f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -55,6 +55,8 @@
 #include <asm/hmi.h>
 #include <asm/pnv-pci.h>
 #include <asm/mmu.h>
+#include <asm/opal.h>
+#include <asm/xics.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -63,6 +65,7 @@
 #include <linux/irqbypass.h>
 #include <linux/module.h>
 #include <linux/compiler.h>
+#include <linux/of.h>
 
 #include "book3s.h"
 
@@ -172,8 +175,12 @@ static bool kvmppc_ipi_thread(int cpu)
 	}
 
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
-		xics_wake_cpu(cpu);
+	if (cpu >= 0 && cpu < nr_cpu_ids) {
+		if (paca[cpu].kvm_hstate.xics_phys) {
+			xics_wake_cpu(cpu);
+			return true;
+		}
+		opal_int_set_mfrr(get_hard_smp_processor_id(cpu), IPI_PRIORITY);
 		return true;
 	}
 #endif
@@ -3729,6 +3736,23 @@ static int kvmppc_book3s_init_hv(void)
 	if (r)
 		return r;
 
+	/*
+	 * We need a way of accessing the XICS interrupt controller,
+	 * either directly, via paca[cpu].kvm_hstate.xics_phys, or
+	 * indirectly, via OPAL.
+	 */
+#ifdef CONFIG_SMP
+	if (!get_paca()->kvm_hstate.xics_phys) {
+		struct device_node *np;
+
+		np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
+		if (!np) {
+			pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
+			return -ENODEV;
+		}
+	}
+#endif
+
 	kvm_ops_hv.owner = THIS_MODULE;
 	kvmppc_hv_ops = &kvm_ops_hv;
 
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 37ed045..a09c917 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -26,6 +26,7 @@
 #include <asm/dbell.h>
 #include <asm/cputhreads.h>
 #include <asm/io.h>
+#include <asm/opal.h>
 
 #define KVM_CMA_CHUNK_ORDER	18
 
@@ -224,7 +225,11 @@ void kvmhv_rm_send_ipi(int cpu)
 
 	/* Else poke the target with an IPI */
 	xics_phys = paca[cpu].kvm_hstate.xics_phys;
-	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+	if (xics_phys)
+		rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+	else
+		opal_rm_int_set_mfrr(get_hard_smp_processor_id(cpu),
+				     IPI_PRIORITY);
 }
 
 /*
@@ -335,7 +340,7 @@ static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
  * saved a copy of the XIRR in the PACA, it will be picked up by
  * the host ICP driver.
  */
-static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
+static int kvmppc_check_passthru(u32 xisr, __be32 xirr, bool *again)
 {
 	struct kvmppc_passthru_irqmap *pimap;
 	struct kvmppc_irq_map *irq_map;
@@ -354,7 +359,7 @@ static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
 	/* We're handling this interrupt, generic code doesn't need to */
 	local_paca->kvm_hstate.saved_xirr = 0;
 
-	return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
+	return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap, again);
 }
 
 #else
@@ -373,14 +378,31 @@ static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
  *	-1 if there was a guest wakeup IPI (which has now been cleared)
  *	-2 if there is PCI passthrough external interrupt that was handled
  */
+static long kvmppc_read_one_intr(bool *again);
 
 long kvmppc_read_intr(void)
 {
+	long ret = 0;
+	long rc;
+	bool again;
+
+	do {
+		again = false;
+		rc = kvmppc_read_one_intr(&again);
+		if (rc && (ret == 0 || rc > ret))
+			ret = rc;
+	} while (again);
+	return ret;
+}
+
+static long kvmppc_read_one_intr(bool *again)
+{
 	unsigned long xics_phys;
 	u32 h_xirr;
 	__be32 xirr;
 	u32 xisr;
 	u8 host_ipi;
+	int64_t rc;
 
 	/* see if a host IPI is pending */
 	host_ipi = local_paca->kvm_hstate.host_ipi;
@@ -389,8 +411,14 @@ long kvmppc_read_intr(void)
 
 	/* Now read the interrupt from the ICP */
 	xics_phys = local_paca->kvm_hstate.xics_phys;
-	if (unlikely(!xics_phys))
-		return 1;
+	if (!xics_phys) {
+		/* Use OPAL to read the XIRR */
+		rc = opal_rm_int_get_xirr(&xirr, false);
+		if (rc < 0)
+			return 1;
+	} else {
+		xirr = _lwzcix(xics_phys + XICS_XIRR);
+	}
 
 	/*
 	 * Save XIRR for later. Since we get control in reverse endian
@@ -398,7 +426,6 @@ long kvmppc_read_intr(void)
 	 * host endian. Note that xirr is the value read from the
 	 * XIRR register, while h_xirr is the host endian version.
 	 */
-	xirr = _lwzcix(xics_phys + XICS_XIRR);
 	h_xirr = be32_to_cpu(xirr);
 	local_paca->kvm_hstate.saved_xirr = h_xirr;
 	xisr = h_xirr & 0xffffff;
@@ -417,8 +444,16 @@ long kvmppc_read_intr(void)
 	 * If it is an IPI, clear the MFRR and EOI it.
 	 */
 	if (xisr == XICS_IPI) {
-		_stbcix(xics_phys + XICS_MFRR, 0xff);
-		_stwcix(xics_phys + XICS_XIRR, xirr);
+		if (xics_phys) {
+			_stbcix(xics_phys + XICS_MFRR, 0xff);
+			_stwcix(xics_phys + XICS_XIRR, xirr);
+		} else {
+			opal_rm_int_set_mfrr(hard_smp_processor_id(), 0xff);
+			rc = opal_rm_int_eoi(h_xirr);
+			/* If rc > 0, there is another interrupt pending */
+			*again = rc > 0;
+		}
+
 		/*
 		 * Need to ensure side effects of above stores
 		 * complete before proceeding.
@@ -435,7 +470,11 @@ long kvmppc_read_intr(void)
 			/* We raced with the host,
 			 * we need to resend that IPI, bummer
 			 */
-			_stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+			if (xics_phys)
+				_stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+			else
+				opal_rm_int_set_mfrr(hard_smp_processor_id(),
+						     IPI_PRIORITY);
 			/* Let side effects complete */
 			smp_mb();
 			return 1;
@@ -446,5 +485,5 @@ long kvmppc_read_intr(void)
 		return -1;
 	}
 
-	return kvmppc_check_passthru(xisr, xirr);
+	return kvmppc_check_passthru(xisr, xirr, again);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index a0ea63a..06edc43 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -70,7 +70,11 @@ static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
 	hcpu = hcore << threads_shift;
 	kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
 	smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
-	icp_native_cause_ipi_rm(hcpu);
+	if (paca[hcpu].kvm_hstate.xics_phys)
+		icp_native_cause_ipi_rm(hcpu);
+	else
+		opal_rm_int_set_mfrr(get_hard_smp_processor_id(hcpu),
+				     IPI_PRIORITY);
 }
 #else
 static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
@@ -737,7 +741,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 unsigned long eoi_rc;
 
-static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
+static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
 {
 	unsigned long xics_phys;
 	int64_t rc;
@@ -751,7 +755,12 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
 
 	/* EOI it */
 	xics_phys = local_paca->kvm_hstate.xics_phys;
-	_stwcix(xics_phys + XICS_XIRR, xirr);
+	if (xics_phys) {
+		_stwcix(xics_phys + XICS_XIRR, xirr);
+	} else {
+		rc = opal_rm_int_eoi(be32_to_cpu(xirr));
+		*again = rc > 0;
+	}
 }
 
 static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu)
@@ -809,9 +818,10 @@ static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
 }
 
 long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
-				 u32 xirr,
+				 __be32 xirr,
 				 struct kvmppc_irq_map *irq_map,
-				 struct kvmppc_passthru_irqmap *pimap)
+				 struct kvmppc_passthru_irqmap *pimap,
+				 bool *again)
 {
 	struct kvmppc_xics *xics;
 	struct kvmppc_icp *icp;
@@ -825,7 +835,8 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
 	icp_rm_deliver_irq(xics, icp, irq);
 
 	/* EOI the interrupt */
-	icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
+	icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
+		again);
 
 	if (check_too_hard(xics, icp) == H_TOO_HARD)
 		return 2;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 12/13] KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

POWER9 replaces the various power-saving mode instructions on POWER8
(doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus
a register, PSSCR, which controls the depth of the power-saving mode.
This replaces the use of the nap instruction when threads are idle
during guest execution with the stop instruction, and adds code to
set PSSCR to a value which will allow an SMT mode switch while the
thread is idle (given that the core as a whole won't be idle in these
cases).
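
As a rough C-level view of the PSSCR value the new code below loads (bit
names as used in the kernel's reg.h; illustration only, the real change is
the feature section in the patch):

	/*
	 * EC  = 1: exit criterion -- wake only for events enabled in LPCR,
	 *          delivered as a system reset
	 * ESL = 1: enable state loss, so the core's SMT mode can change
	 *          while this thread is stopped
	 * RL  = 0: requested level 0 -- just stop dispatching instructions
	 */
	unsigned long psscr_val = PSSCR_EC | PSSCR_ESL;

	mtspr(SPRN_PSSCR, psscr_val);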

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index acae5c3..e9eaff4 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -501,17 +501,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	cmpwi	r0, 0
 	beq	57f
 	li	r3, (LPCR_PECEDH | LPCR_PECE0) >> 4
-	mfspr	r4, SPRN_LPCR
-	rlwimi	r4, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
-	mtspr	SPRN_LPCR, r4
-	isync
-	std	r0, HSTATE_SCRATCH0(r13)
-	ptesync
-	ld	r0, HSTATE_SCRATCH0(r13)
-1:	cmpd	r0, r0
-	bne	1b
-	nap
-	b	.
+	mfspr	r5, SPRN_LPCR
+	rlwimi	r5, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
+	b	kvm_nap_sequence
 
 57:	li	r0, 0
 	stbx	r0, r3, r4
@@ -2256,6 +2248,17 @@ BEGIN_FTR_SECTION
 	ori	r5, r5, LPCR_PECEDH
 	rlwimi	r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
+kvm_nap_sequence:		/* desired LPCR value in r5 */
+BEGIN_FTR_SECTION
+	/*
+	 * PSSCR bits:	exit criterion = 1 (wakeup based on LPCR at sreset)
+	 *		enable state loss = 1 (allow SMT mode switch)
+	 *		requested level = 0 (just stop dispatching)
+	 */
+	lis	r3, (PSSCR_EC | PSSCR_ESL)@h
+	mtspr	SPRN_PSSCR, r3
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	mtspr	SPRN_LPCR,r5
 	isync
 	li	r0, 0
@@ -2264,7 +2267,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	ld	r0, HSTATE_SCRATCH0(r13)
 1:	cmpd	r0, r0
 	bne	1b
+BEGIN_FTR_SECTION
 	nap
+FTR_SECTION_ELSE
+	PPC_STOP
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	b	.
 
 33:	mr	r4, r3
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 12/13] KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9
@ 2016-11-18  7:28   ` Paul Mackerras
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

POWER9 replaces the various power-saving mode instructions on POWER8
(doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus
a register, PSSCR, which controls the depth of the power-saving mode.
This replaces the use of the nap instruction when threads are idle
during guest execution with the stop instruction, and adds code to
set PSSCR to a value which will allow an SMT mode switch while the
thread is idle (given that the core as a whole won't be idle in these
cases).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index acae5c3..e9eaff4 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -501,17 +501,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	cmpwi	r0, 0
 	beq	57f
 	li	r3, (LPCR_PECEDH | LPCR_PECE0) >> 4
-	mfspr	r4, SPRN_LPCR
-	rlwimi	r4, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
-	mtspr	SPRN_LPCR, r4
-	isync
-	std	r0, HSTATE_SCRATCH0(r13)
-	ptesync
-	ld	r0, HSTATE_SCRATCH0(r13)
-1:	cmpd	r0, r0
-	bne	1b
-	nap
-	b	.
+	mfspr	r5, SPRN_LPCR
+	rlwimi	r5, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
+	b	kvm_nap_sequence
 
 57:	li	r0, 0
 	stbx	r0, r3, r4
@@ -2256,6 +2248,17 @@ BEGIN_FTR_SECTION
 	ori	r5, r5, LPCR_PECEDH
 	rlwimi	r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
+kvm_nap_sequence:		/* desired LPCR value in r5 */
+BEGIN_FTR_SECTION
+	/*
+	 * PSSCR bits:	exit criterion = 1 (wakeup based on LPCR at sreset)
+	 *		enable state loss = 1 (allow SMT mode switch)
+	 *		requested level = 0 (just stop dispatching)
+	 */
+	lis	r3, (PSSCR_EC | PSSCR_ESL)@h
+	mtspr	SPRN_PSSCR, r3
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	mtspr	SPRN_LPCR,r5
 	isync
 	li	r0, 0
@@ -2264,7 +2267,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	ld	r0, HSTATE_SCRATCH0(r13)
 1:	cmpd	r0, r0
 	bne	1b
+BEGIN_FTR_SECTION
 	nap
+FTR_SECTION_ELSE
+	PPC_STOP
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	b	.
 
 33:	mr	r4, r3
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 13/13] KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores
  2016-11-18  7:28 ` Paul Mackerras
@ 2016-11-18  7:28   ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

With POWER9, each CPU thread has its own MMU context and can be
in the host or a guest independently of the other threads; there is
still however a restriction that all threads must use the same type
of address translation, either radix tree or hashed page table (HPT).

Since we only support HPT guests on a HPT host at this point, we
can treat the threads as being independent, and avoid all of the
work of coordinating the CPU threads.  To make this simpler, we
introduce a new threads_per_vcore() function that returns 1 on
POWER9 and threads_per_subcore on POWER7/8, and use that instead
of threads_per_subcore or threads_per_core in various places.

This also changes the value of the KVM_CAP_PPC_SMT capability on
POWER9 systems from 4 to 1, so that userspace will not try to
create VMs with multiple vcpus per vcore.  (If userspace did create
a VM that thought it was in an SMT mode, the VM might try to use
the msgsndp instruction, which will not work as expected.  In
future it may be possible to trap and emulate msgsndp in order to
allow VMs to think they are in an SMT mode, if only for the purpose
of allowing migration from POWER8 systems.)

With all this, we can now run guests on POWER9 as long as the host
is running with HPT translation.  Since userspace currently has no
way to request radix tree translation for the guest, the guest has
no choice but to use HPT translation.
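
From the user-mode driver's side the capability change is visible through the
usual check-extension interface; a hypothetical probe (not part of this patch,
helper name chosen for illustration) could look like:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Returns 1 on a POWER9 host after this change, and
	 * threads_per_subcore (e.g. 8) on POWER7/POWER8 hosts. */
	static int ppc_smt_threads(int vm_fd)
	{
		return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT);
	}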

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv.c | 36 +++++++++++++++++++++++++++++-------
 arch/powerpc/kvm/powerpc.c   | 11 +++++++----
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a1d2b5f..591ac84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1569,6 +1569,20 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+/*
+ * On POWER9, threads are independent and can be in different partitions.
+ * Therefore we consider each thread to be a subcore.
+ * There is a restriction that all threads have to be in the same
+ * MMU mode (radix or HPT), unfortunately, but since we only support
+ * HPT guests on a HPT host so far, that isn't an impediment yet.
+ */
+static int threads_per_vcore(void)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		return 1;
+	return threads_per_subcore;
+}
+
 static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 {
 	struct kvmppc_vcore *vcore;
@@ -1583,7 +1597,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 	init_swait_queue_head(&vcore->wq);
 	vcore->preempt_tb = TB_NIL;
 	vcore->lpcr = kvm->arch.lpcr;
-	vcore->first_vcpuid = core * threads_per_subcore;
+	vcore->first_vcpuid = core * threads_per_vcore();
 	vcore->kvm = kvm;
 	INIT_LIST_HEAD(&vcore->preempt_list);
 
@@ -1746,7 +1760,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	int core;
 	struct kvmppc_vcore *vcore;
 
-	core = id / threads_per_subcore;
+	core = id / threads_per_vcore();
 	if (core >= KVM_MAX_VCORES)
 		goto out;
 
@@ -2336,6 +2350,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	unsigned long cmd_bit, stat_bit;
 	int pcpu, thr;
 	int target_threads;
+	int controlled_threads;
 
 	/*
 	 * Remove from the list any threads that have a signal pending
@@ -2354,11 +2369,18 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->preempt_tb = TB_NIL;
 
 	/*
+	 * Number of threads that we will be controlling: the same as
+	 * the number of threads per subcore, except on POWER9,
+	 * where it's 1 because the threads are (mostly) independent.
+	 */
+	controlled_threads = threads_per_vcore();
+
+	/*
 	 * Make sure we are running on primary threads, and that secondary
 	 * threads are offline.  Also check if the number of threads in this
 	 * guest are greater than the current system threads per guest.
 	 */
-	if ((threads_per_core > 1) &&
+	if ((controlled_threads > 1) &&
 	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
 		for_each_runnable_thread(i, vcpu, vc) {
 			vcpu->arch.ret = -EBUSY;
@@ -2374,7 +2396,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	init_core_info(&core_info, vc);
 	pcpu = smp_processor_id();
-	target_threads = threads_per_subcore;
+	target_threads = controlled_threads;
 	if (target_smt_mode && target_smt_mode < target_threads)
 		target_threads = target_smt_mode;
 	if (vc->num_threads < target_threads)
@@ -2410,7 +2432,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 		smp_wmb();
 	}
 	pcpu = smp_processor_id();
-	for (thr = 0; thr < threads_per_subcore; ++thr)
+	for (thr = 0; thr < controlled_threads; ++thr)
 		paca[pcpu + thr].kvm_hstate.kvm_split_mode = sip;
 
 	/* Initiate micro-threading (split-core) if required */
@@ -3380,9 +3402,9 @@ static int kvmppc_core_check_processor_compat_hv(void)
 	    !cpu_has_feature(CPU_FTR_ARCH_206))
 		return -EIO;
 	/*
-	 * Disable KVM for Power9, untill the required bits merged.
+	 * Disable KVM for Power9 in radix mode.
 	 */
-	if (cpu_has_feature(CPU_FTR_ARCH_300))
+	if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
 		return -EIO;
 
 	return 0;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 70963c8..b5e4705 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -548,10 +548,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #endif /* CONFIG_PPC_BOOK3S_64 */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_SMT:
-		if (hv_enabled)
-			r = threads_per_subcore;
-		else
-			r = 0;
+		r = 0;
+		if (hv_enabled) {
+			if (cpu_has_feature(CPU_FTR_ARCH_300))
+				r = 1;
+			else
+				r = threads_per_subcore;
+		}
 		break;
 	case KVM_CAP_PPC_RMA:
 		r = 0;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 13/13] KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores
@ 2016-11-18  7:28   ` Paul Mackerras
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-18  7:28 UTC (permalink / raw)
  To: kvm, kvm-ppc, linuxppc-dev

With POWER9, each CPU thread has its own MMU context and can be
in the host or a guest independently of the other threads; there is
still however a restriction that all threads must use the same type
of address translation, either radix tree or hashed page table (HPT).

Since we only support HPT guests on a HPT host at this point, we
can treat the threads as being independent, and avoid all of the
work of coordinating the CPU threads.  To make this simpler, we
introduce a new threads_per_vcore() function that returns 1 on
POWER9 and threads_per_subcore on POWER7/8, and use that instead
of threads_per_subcore or threads_per_core in various places.

This also changes the value of the KVM_CAP_PPC_SMT capability on
POWER9 systems from 4 to 1, so that userspace will not try to
create VMs with multiple vcpus per vcore.  (If userspace did create
a VM that thought it was in an SMT mode, the VM might try to use
the msgsndp instruction, which will not work as expected.  In
future it may be possible to trap and emulate msgsndp in order to
allow VMs to think they are in an SMT mode, if only for the purpose
of allowing migration from POWER8 systems.)

With all this, we can now run guests on POWER9 as long as the host
is running with HPT translation.  Since userspace currently has no
way to request radix tree translation for the guest, the guest has
no choice but to use HPT translation.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv.c | 36 +++++++++++++++++++++++++++++-------
 arch/powerpc/kvm/powerpc.c   | 11 +++++++----
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a1d2b5f..591ac84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1569,6 +1569,20 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+/*
+ * On POWER9, threads are independent and can be in different partitions.
+ * Therefore we consider each thread to be a subcore.
+ * There is a restriction that all threads have to be in the same
+ * MMU mode (radix or HPT), unfortunately, but since we only support
+ * HPT guests on a HPT host so far, that isn't an impediment yet.
+ */
+static int threads_per_vcore(void)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		return 1;
+	return threads_per_subcore;
+}
+
 static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 {
 	struct kvmppc_vcore *vcore;
@@ -1583,7 +1597,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 	init_swait_queue_head(&vcore->wq);
 	vcore->preempt_tb = TB_NIL;
 	vcore->lpcr = kvm->arch.lpcr;
-	vcore->first_vcpuid = core * threads_per_subcore;
+	vcore->first_vcpuid = core * threads_per_vcore();
 	vcore->kvm = kvm;
 	INIT_LIST_HEAD(&vcore->preempt_list);
 
@@ -1746,7 +1760,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	int core;
 	struct kvmppc_vcore *vcore;
 
-	core = id / threads_per_subcore;
+	core = id / threads_per_vcore();
 	if (core >= KVM_MAX_VCORES)
 		goto out;
 
@@ -2336,6 +2350,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	unsigned long cmd_bit, stat_bit;
 	int pcpu, thr;
 	int target_threads;
+	int controlled_threads;
 
 	/*
 	 * Remove from the list any threads that have a signal pending
@@ -2354,11 +2369,18 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->preempt_tb = TB_NIL;
 
 	/*
+	 * Number of threads that we will be controlling: the same as
+	 * the number of threads per subcore, except on POWER9,
+	 * where it's 1 because the threads are (mostly) independent.
+	 */
+	controlled_threads = threads_per_vcore();
+
+	/*
 	 * Make sure we are running on primary threads, and that secondary
 	 * threads are offline.  Also check if the number of threads in this
 	 * guest are greater than the current system threads per guest.
 	 */
-	if ((threads_per_core > 1) &&
+	if ((controlled_threads > 1) &&
 	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
 		for_each_runnable_thread(i, vcpu, vc) {
 			vcpu->arch.ret = -EBUSY;
@@ -2374,7 +2396,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	init_core_info(&core_info, vc);
 	pcpu = smp_processor_id();
-	target_threads = threads_per_subcore;
+	target_threads = controlled_threads;
 	if (target_smt_mode && target_smt_mode < target_threads)
 		target_threads = target_smt_mode;
 	if (vc->num_threads < target_threads)
@@ -2410,7 +2432,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 		smp_wmb();
 	}
 	pcpu = smp_processor_id();
-	for (thr = 0; thr < threads_per_subcore; ++thr)
+	for (thr = 0; thr < controlled_threads; ++thr)
 		paca[pcpu + thr].kvm_hstate.kvm_split_mode = sip;
 
 	/* Initiate micro-threading (split-core) if required */
@@ -3380,9 +3402,9 @@ static int kvmppc_core_check_processor_compat_hv(void)
 	    !cpu_has_feature(CPU_FTR_ARCH_206))
 		return -EIO;
 	/*
-	 * Disable KVM for Power9, untill the required bits merged.
+	 * Disable KVM for Power9 in radix mode.
 	 */
-	if (cpu_has_feature(CPU_FTR_ARCH_300))
+	if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
 		return -EIO;
 
 	return 0;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 70963c8..b5e4705 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -548,10 +548,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #endif /* CONFIG_PPC_BOOK3S_64 */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_SMT:
-		if (hv_enabled)
-			r = threads_per_subcore;
-		else
-			r = 0;
+		r = 0;
+		if (hv_enabled) {
+			if (cpu_has_feature(CPU_FTR_ARCH_300))
+				r = 1;
+			else
+				r = threads_per_subcore;
+		}
 		break;
 	case KVM_CAP_PPC_RMA:
 		r = 0;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-18 14:39     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:27 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:
 +
> +	/* Global flush of TLBs and partition table caches for this lpid */
> +	asm volatile("ptesync");
> +	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> +	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> +}


It would be nice to convert that 0x800 to a documented IS value, or better,
use radix__flush_tlb_pid()?

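For reference, 0x800 is the RB value with the IS field set to 0b10, the same
encoding the rmhandlers.S loop in this series comments as "IS field = 0b10"
(a flush scoped to the given LPID).  A named constant would presumably read
something like the sketch below; the constant name here is made up purely for
illustration:

	/* Hypothetical name, for illustration only */
	#define TLBIE_RB_IS_LPID	0x800	/* IS = 0b10: flush entries for this LPID */

	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
		     "r" (TLBIE_RB_IS_LPID), "r" (lpid));
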
-aneesh


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-18 14:47     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:35 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> Some special-purpose registers that were present and accessible
> by guests on POWER8 no longer exist on POWER9, so this adds
> feature sections to ensure that we don't try to context-switch
> them when going into or out of a guest on POWER9.  These are
> all relatively obscure, rarely-used registers, but we had to
> context-switch them on POWER8 to avoid creating a covert channel.
> They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.

We don't need to context-switch them even when running a POWER8 compat
guest?

>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 ++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index dc25467..d422014 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -752,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
>  BEGIN_FTR_SECTION
>  	ld	r5, VCPU_MMCR + 24(r4)
>  	ld	r6, VCPU_SIER(r4)
> +	mtspr	SPRN_MMCR2, r5
> +	mtspr	SPRN_SIER, r6
> +BEGIN_FTR_SECTION_NESTED(96)
>  	lwz	r7, VCPU_PMC + 24(r4)
>  	lwz	r8, VCPU_PMC + 28(r4)
>  	ld	r9, VCPU_MMCR + 32(r4)
> -	mtspr	SPRN_MMCR2, r5
> -	mtspr	SPRN_SIER, r6
>  	mtspr	SPRN_SPMC1, r7
>  	mtspr	SPRN_SPMC2, r8
>  	mtspr	SPRN_MMCRS, r9
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_MMCR0, r3
>  	isync
> @@ -815,20 +817,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_EBBHR, r8
>  	ld	r5, VCPU_EBBRR(r4)
>  	ld	r6, VCPU_BESCR(r4)
> -	ld	r7, VCPU_CSIGR(r4)
> -	ld	r8, VCPU_TACR(r4)
> +	lwz	r7, VCPU_GUEST_PID(r4)
> +	ld	r8, VCPU_WORT(r4)
>  	mtspr	SPRN_EBBRR, r5
>  	mtspr	SPRN_BESCR, r6
> -	mtspr	SPRN_CSIGR, r7
> -	mtspr	SPRN_TACR, r8
> +	mtspr	SPRN_PID, r7
> +	mtspr	SPRN_WORT, r8
> +BEGIN_FTR_SECTION
>  	ld	r5, VCPU_TCSCR(r4)
>  	ld	r6, VCPU_ACOP(r4)
> -	lwz	r7, VCPU_GUEST_PID(r4)
> -	ld	r8, VCPU_WORT(r4)
> +	ld	r7, VCPU_CSIGR(r4)
> +	ld	r8, VCPU_TACR(r4)
>  	mtspr	SPRN_TCSCR, r5
>  	mtspr	SPRN_ACOP, r6
> -	mtspr	SPRN_PID, r7
> -	mtspr	SPRN_WORT, r8
> +	mtspr	SPRN_CSIGR, r7
> +	mtspr	SPRN_TACR, r8
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  8:
>  
>  	/*
> @@ -1343,20 +1347,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	std	r8, VCPU_EBBHR(r9)
>  	mfspr	r5, SPRN_EBBRR
>  	mfspr	r6, SPRN_BESCR
> -	mfspr	r7, SPRN_CSIGR
> -	mfspr	r8, SPRN_TACR
> +	mfspr	r7, SPRN_PID
> +	mfspr	r8, SPRN_WORT
>  	std	r5, VCPU_EBBRR(r9)
>  	std	r6, VCPU_BESCR(r9)
> -	std	r7, VCPU_CSIGR(r9)
> -	std	r8, VCPU_TACR(r9)
> +	stw	r7, VCPU_GUEST_PID(r9)
> +	std	r8, VCPU_WORT(r9)
> +BEGIN_FTR_SECTION
>  	mfspr	r5, SPRN_TCSCR
>  	mfspr	r6, SPRN_ACOP
> -	mfspr	r7, SPRN_PID
> -	mfspr	r8, SPRN_WORT
> +	mfspr	r7, SPRN_CSIGR
> +	mfspr	r8, SPRN_TACR
>  	std	r5, VCPU_TCSCR(r9)
>  	std	r6, VCPU_ACOP(r9)
> -	stw	r7, VCPU_GUEST_PID(r9)
> -	std	r8, VCPU_WORT(r9)
> +	std	r7, VCPU_CSIGR(r9)
> +	std	r8, VCPU_TACR(r9)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  	/*
>  	 * Restore various registers to 0, where non-zero values
>  	 * set by the guest could disrupt the host.
> @@ -1365,12 +1371,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_IAMR, r0
>  	mtspr	SPRN_CIABR, r0
>  	mtspr	SPRN_DAWRX, r0
> -	mtspr	SPRN_TCSCR, r0
>  	mtspr	SPRN_WORT, r0
> +BEGIN_FTR_SECTION
> +	mtspr	SPRN_TCSCR, r0
>  	/* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
>  	li	r0, 1
>  	sldi	r0, r0, 31
>  	mtspr	SPRN_MMCRS, r0
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  8:
>  
>  	/* Save and reset AMR and UAMOR before turning on the MMU */
> @@ -1504,15 +1512,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	stw	r8, VCPU_PMC + 20(r9)
>  BEGIN_FTR_SECTION
>  	mfspr	r5, SPRN_SIER
> +	std	r5, VCPU_SIER(r9)
> +BEGIN_FTR_SECTION_NESTED(96)
>  	mfspr	r6, SPRN_SPMC1
>  	mfspr	r7, SPRN_SPMC2
>  	mfspr	r8, SPRN_MMCRS
> -	std	r5, VCPU_SIER(r9)
>  	stw	r6, VCPU_PMC + 24(r9)
>  	stw	r7, VCPU_PMC + 28(r9)
>  	std	r8, VCPU_MMCR + 32(r9)
>  	lis	r4, 0x8000
>  	mtspr	SPRN_MMCRS, r4
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  22:
>  	/* Clear out SLB */
> -- 
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
@ 2016-11-18 14:39     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:39 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:
 +
> +	/* Global flush of TLBs and partition table caches for this lpid */
> +	asm volatile("ptesync");
> +	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> +	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> +}


It would be nice to convert that 0x800 to a documented IS value, or better,
use radix__flush_tlb_pid()?

-aneesh


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-18 14:53     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:41 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
> and tlbiel (local tlbie) instructions.  Both instructions get a
> set of new parameters (RIC, PRS and R) which appear as bits in the
> instruction word.  The tlbiel instruction now has a second register
> operand, which contains a PID and/or LPID value if needed, and
> should otherwise contain 0.
>
> This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
> as well as older processors.  Since we only handle HPT guests so
> far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
> word as on previous processors, so we don't need to conditionally
> execute different instructions depending on the processor.
>
> The local flush on first entry to a guest in book3s_hv_rmhandlers.S
> is a loop which depends on the number of TLB sets.  Rather than
> using feature sections to set the number of iterations based on
> which CPU we're on, we now work out this number at VM creation time
> and store it in the kvm_arch struct.  That will make it possible to
> get the number from the device tree in future, which will help with
> compatibility with future processors.
>
> Since mmu_partition_table_set_entry() does a global flush of the
> whole LPID, we don't need to do the TLB flush on first entry to the
> guest on each processor.  Therefore we don't set all bits in the
> tlb_need_flush bitmap on VM startup on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/include/asm/kvm_host.h     |  1 +
>  arch/powerpc/kernel/asm-offsets.c       |  1 +
>  arch/powerpc/kvm/book3s_hv.c            | 17 ++++++++++++++++-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c     | 10 ++++++++--
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++------
>  5 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 0d94608..ea78864 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -244,6 +244,7 @@ struct kvm_arch_memory_slot {
>  struct kvm_arch {
>  	unsigned int lpid;
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	unsigned int tlb_sets;
>  	unsigned long hpt_virt;
>  	struct revmap_entry *revmap;
>  	atomic64_t mmio_update;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 494241b..b9c8386 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -487,6 +487,7 @@ int main(void)
>  
>  	/* book3s */
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
>  	DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
>  	DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
>  	DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 59e18dfb..8395a7f 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3260,8 +3260,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  	 * Since we don't flush the TLB when tearing down a VM,
>  	 * and this lpid might have previously been used,
>  	 * make sure we flush on each core before running the new VM.
> +	 * On POWER9, the tlbie in mmu_partition_table_set_entry()
> +	 * does this flush for us.
>  	 */
> -	cpumask_setall(&kvm->arch.need_tlb_flush);
> +	if (!cpu_has_feature(CPU_FTR_ARCH_300))
> +		cpumask_setall(&kvm->arch.need_tlb_flush);
>  
>  	/* Start out with the default set of hcalls enabled */
>  	memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
> @@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  	kvm->arch.lpcr = lpcr;
>  
>  	/*
> +	 * Work out how many sets the TLB has, for the use of
> +	 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> +	 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +		kvm->arch.tlb_sets = 256;	/* POWER9 */
> +	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> +		kvm->arch.tlb_sets = 512;	/* POWER8 */
> +	else
> +		kvm->arch.tlb_sets = 128;	/* POWER7 */
> +

We have 

#define POWER7_TLB_SETS		128	/* # sets in POWER7 TLB */
#define POWER8_TLB_SETS		512	/* # sets in POWER8 TLB */
#define POWER9_TLB_SETS_HASH	256	/* # sets in POWER9 TLB Hash mode */
#define POWER9_TLB_SETS_RADIX	128	/* # sets in POWER9 TLB Radix mode */

Maybe use that instead of open-coding?

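i.e. something along these lines (a sketch of the suggestion, using the
constants quoted above):

	if (cpu_has_feature(CPU_FTR_ARCH_300))
		kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;	/* 256 */
	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
		kvm->arch.tlb_sets = POWER8_TLB_SETS;		/* 512 */
	else
		kvm->arch.tlb_sets = POWER7_TLB_SETS;		/* 128 */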

> +	/*
>  	 * Track that we now have a HV mode VM active. This blocks secondary
>  	 * CPU threads from coming online.
>  	 */
> @@ -3728,3 +3742,4 @@ module_exit(kvmppc_book3s_exit_hv);
>  MODULE_LICENSE("GPL");
>  MODULE_ALIAS_MISCDEV(KVM_MINOR);
>  MODULE_ALIAS("devname:kvm");
> +
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 1179e40..9ef3c4b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -424,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
>  {
>  	long i;
>  
> +	/*
> +	 * We use the POWER9 5-operand versions of tlbie and tlbiel here.
> +	 * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
> +	 * the RS field, this is backwards-compatible with P7 and P8.
> +	 */
>  	if (global) {
>  		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
>  			cpu_relax();
>  		if (need_sync)
>  			asm volatile("ptesync" : : : "memory");
>  		for (i = 0; i < npages; ++i)
> -			asm volatile(PPC_TLBIE(%1,%0) : :
> +			asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
>  				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
>  		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
>  		kvm->arch.tlbie_lock = 0;
> @@ -438,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
>  		if (need_sync)
>  			asm volatile("ptesync" : : : "memory");
>  		for (i = 0; i < npages; ++i)
> -			asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
> +			asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
> +				     "r" (rbvalues[i]), "r" (0));
>  		asm volatile("ptesync" : : : "memory");
>  	}
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 219a04f..acae5c3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -613,12 +613,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  	stdcx.	r7,0,r6
>  	bne	23b
>  	/* Flush the TLB of any entries for this LPID */
> -	/* use arch 2.07S as a proxy for POWER8 */
> -BEGIN_FTR_SECTION
> -	li	r6,512			/* POWER8 has 512 sets */
> -FTR_SECTION_ELSE
> -	li	r6,128			/* POWER7 has 128 sets */
> -ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
> +	lwz	r6,KVM_TLB_SETS(r9)
> +	li	r0,0			/* RS for P9 version of tlbiel */
>  	mtctr	r6
>  	li	r7,0x800		/* IS field = 0b10 */
>  	ptesync
> -- 
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-18 14:59     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:47 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> On POWER9, the msgsnd instruction is able to send interrupts to
> other cores, as well as other threads on the local core.  Since
> msgsnd is generally simpler and faster than sending an IPI via the
> XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c         | 11 ++++++++++-
>  arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 8395a7f..ace89df 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -147,12 +147,21 @@ static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
>  
>  static bool kvmppc_ipi_thread(int cpu)
>  {
> +	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> +
> +	/* On POWER9 we can use msgsnd to IPI any cpu */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		msg |= get_hard_smp_processor_id(cpu);
> +		smp_mb();
> +		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> +		return true;
> +	}
> +
>  	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
>  	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
>  		preempt_disable();
>  		if (cpu_first_thread_sibling(cpu) ==
>  		    cpu_first_thread_sibling(smp_processor_id())) {
> -			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>  			msg |= cpu_thread_in_core(cpu);
>  			smp_mb();
>  			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 0c84d6b..37ed045 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
>  void kvmhv_rm_send_ipi(int cpu)
>  {
>  	unsigned long xics_phys;
> +	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>  
> -	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> +	/* On POWER9 we can use msgsnd for any destination cpu. */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		msg |= get_hard_smp_processor_id(cpu);
> +		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> +		return;

Do we need a "sync" there before msgsnd?

> +	}
> +	/* On POWER8 for IPIs to threads in the same core, use msgsnd. */
>  	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
>  	    cpu_first_thread_sibling(cpu) ==
>  	    cpu_first_thread_sibling(raw_smp_processor_id())) {
> -		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>  		msg |= cpu_thread_in_core(cpu);
>  		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
>  		return;
> -- 
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
@ 2016-11-18 14:47     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:47 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> Some special-purpose registers that were present and accessible
> by guests on POWER8 no longer exist on POWER9, so this adds
> feature sections to ensure that we don't try to context-switch
> them when going into or out of a guest on POWER9.  These are
> all relatively obscure, rarely-used registers, but we had to
> context-switch them on POWER8 to avoid creating a covert channel.
> They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.

So we don't need to context-switch them even when running a POWER8 compat
guest?

>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 ++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index dc25467..d422014 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -752,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
>  BEGIN_FTR_SECTION
>  	ld	r5, VCPU_MMCR + 24(r4)
>  	ld	r6, VCPU_SIER(r4)
> +	mtspr	SPRN_MMCR2, r5
> +	mtspr	SPRN_SIER, r6
> +BEGIN_FTR_SECTION_NESTED(96)
>  	lwz	r7, VCPU_PMC + 24(r4)
>  	lwz	r8, VCPU_PMC + 28(r4)
>  	ld	r9, VCPU_MMCR + 32(r4)
> -	mtspr	SPRN_MMCR2, r5
> -	mtspr	SPRN_SIER, r6
>  	mtspr	SPRN_SPMC1, r7
>  	mtspr	SPRN_SPMC2, r8
>  	mtspr	SPRN_MMCRS, r9
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_MMCR0, r3
>  	isync
> @@ -815,20 +817,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_EBBHR, r8
>  	ld	r5, VCPU_EBBRR(r4)
>  	ld	r6, VCPU_BESCR(r4)
> -	ld	r7, VCPU_CSIGR(r4)
> -	ld	r8, VCPU_TACR(r4)
> +	lwz	r7, VCPU_GUEST_PID(r4)
> +	ld	r8, VCPU_WORT(r4)
>  	mtspr	SPRN_EBBRR, r5
>  	mtspr	SPRN_BESCR, r6
> -	mtspr	SPRN_CSIGR, r7
> -	mtspr	SPRN_TACR, r8
> +	mtspr	SPRN_PID, r7
> +	mtspr	SPRN_WORT, r8
> +BEGIN_FTR_SECTION
>  	ld	r5, VCPU_TCSCR(r4)
>  	ld	r6, VCPU_ACOP(r4)
> -	lwz	r7, VCPU_GUEST_PID(r4)
> -	ld	r8, VCPU_WORT(r4)
> +	ld	r7, VCPU_CSIGR(r4)
> +	ld	r8, VCPU_TACR(r4)
>  	mtspr	SPRN_TCSCR, r5
>  	mtspr	SPRN_ACOP, r6
> -	mtspr	SPRN_PID, r7
> -	mtspr	SPRN_WORT, r8
> +	mtspr	SPRN_CSIGR, r7
> +	mtspr	SPRN_TACR, r8
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  8:
>  
>  	/*
> @@ -1343,20 +1347,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	std	r8, VCPU_EBBHR(r9)
>  	mfspr	r5, SPRN_EBBRR
>  	mfspr	r6, SPRN_BESCR
> -	mfspr	r7, SPRN_CSIGR
> -	mfspr	r8, SPRN_TACR
> +	mfspr	r7, SPRN_PID
> +	mfspr	r8, SPRN_WORT
>  	std	r5, VCPU_EBBRR(r9)
>  	std	r6, VCPU_BESCR(r9)
> -	std	r7, VCPU_CSIGR(r9)
> -	std	r8, VCPU_TACR(r9)
> +	stw	r7, VCPU_GUEST_PID(r9)
> +	std	r8, VCPU_WORT(r9)
> +BEGIN_FTR_SECTION
>  	mfspr	r5, SPRN_TCSCR
>  	mfspr	r6, SPRN_ACOP
> -	mfspr	r7, SPRN_PID
> -	mfspr	r8, SPRN_WORT
> +	mfspr	r7, SPRN_CSIGR
> +	mfspr	r8, SPRN_TACR
>  	std	r5, VCPU_TCSCR(r9)
>  	std	r6, VCPU_ACOP(r9)
> -	stw	r7, VCPU_GUEST_PID(r9)
> -	std	r8, VCPU_WORT(r9)
> +	std	r7, VCPU_CSIGR(r9)
> +	std	r8, VCPU_TACR(r9)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  	/*
>  	 * Restore various registers to 0, where non-zero values
>  	 * set by the guest could disrupt the host.
> @@ -1365,12 +1371,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_IAMR, r0
>  	mtspr	SPRN_CIABR, r0
>  	mtspr	SPRN_DAWRX, r0
> -	mtspr	SPRN_TCSCR, r0
>  	mtspr	SPRN_WORT, r0
> +BEGIN_FTR_SECTION
> +	mtspr	SPRN_TCSCR, r0
>  	/* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
>  	li	r0, 1
>  	sldi	r0, r0, 31
>  	mtspr	SPRN_MMCRS, r0
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  8:
>  
>  	/* Save and reset AMR and UAMOR before turning on the MMU */
> @@ -1504,15 +1512,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	stw	r8, VCPU_PMC + 20(r9)
>  BEGIN_FTR_SECTION
>  	mfspr	r5, SPRN_SIER
> +	std	r5, VCPU_SIER(r9)
> +BEGIN_FTR_SECTION_NESTED(96)
>  	mfspr	r6, SPRN_SPMC1
>  	mfspr	r7, SPRN_SPMC2
>  	mfspr	r8, SPRN_MMCRS
> -	std	r5, VCPU_SIER(r9)
>  	stw	r6, VCPU_PMC + 24(r9)
>  	stw	r7, VCPU_PMC + 28(r9)
>  	std	r8, VCPU_MMCR + 32(r9)
>  	lis	r4, 0x8000
>  	mtspr	SPRN_MMCRS, r4
> +END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  22:
>  	/* Clear out SLB */
> -- 
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
@ 2016-11-18 14:53     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:53 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
> and tlbiel (local tlbie) instructions.  Both instructions get a
> set of new parameters (RIC, PRS and R) which appear as bits in the
> instruction word.  The tlbiel instruction now has a second register
> operand, which contains a PID and/or LPID value if needed, and
> should otherwise contain 0.
>
> This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
> as well as older processors.  Since we only handle HPT guests so
> far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
> word as on previous processors, so we don't need to conditionally
> execute different instructions depending on the processor.
>
> The local flush on first entry to a guest in book3s_hv_rmhandlers.S
> is a loop which depends on the number of TLB sets.  Rather than
> using feature sections to set the number of iterations based on
> which CPU we're on, we now work out this number at VM creation time
> and store it in the kvm_arch struct.  That will make it possible to
> get the number from the device tree in future, which will help with
> compatibility with future processors.
>
> Since mmu_partition_table_set_entry() does a global flush of the
> whole LPID, we don't need to do the TLB flush on first entry to the
> guest on each processor.  Therefore we don't set all bits in the
> tlb_need_flush bitmap on VM startup on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/include/asm/kvm_host.h     |  1 +
>  arch/powerpc/kernel/asm-offsets.c       |  1 +
>  arch/powerpc/kvm/book3s_hv.c            | 17 ++++++++++++++++-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c     | 10 ++++++++--
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++------
>  5 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 0d94608..ea78864 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -244,6 +244,7 @@ struct kvm_arch_memory_slot {
>  struct kvm_arch {
>  	unsigned int lpid;
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	unsigned int tlb_sets;
>  	unsigned long hpt_virt;
>  	struct revmap_entry *revmap;
>  	atomic64_t mmio_update;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 494241b..b9c8386 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -487,6 +487,7 @@ int main(void)
>  
>  	/* book3s */
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
>  	DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
>  	DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
>  	DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 59e18dfb..8395a7f 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3260,8 +3260,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  	 * Since we don't flush the TLB when tearing down a VM,
>  	 * and this lpid might have previously been used,
>  	 * make sure we flush on each core before running the new VM.
> +	 * On POWER9, the tlbie in mmu_partition_table_set_entry()
> +	 * does this flush for us.
>  	 */
> -	cpumask_setall(&kvm->arch.need_tlb_flush);
> +	if (!cpu_has_feature(CPU_FTR_ARCH_300))
> +		cpumask_setall(&kvm->arch.need_tlb_flush);
>  
>  	/* Start out with the default set of hcalls enabled */
>  	memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
> @@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  	kvm->arch.lpcr = lpcr;
>  
>  	/*
> +	 * Work out how many sets the TLB has, for the use of
> +	 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> +	 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +		kvm->arch.tlb_sets = 256;	/* POWER9 */
> +	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> +		kvm->arch.tlb_sets = 512;	/* POWER8 */
> +	else
> +		kvm->arch.tlb_sets = 128;	/* POWER7 */
> +

We have 

#define POWER7_TLB_SETS		128	/* # sets in POWER7 TLB */
#define POWER8_TLB_SETS		512	/* # sets in POWER8 TLB */
#define POWER9_TLB_SETS_HASH	256	/* # sets in POWER9 TLB Hash mode */
#define POWER9_TLB_SETS_RADIX	128	/* # sets in POWER9 TLB Radix mode */

Maybe use those instead of open-coding the numbers?
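
Something along these lines, perhaps (untested sketch; it assumes those
defines are reachable from here, and picks the hash-mode value for POWER9
since we only handle HPT guests so far):

	if (cpu_has_feature(CPU_FTR_ARCH_300))
		kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;	/* 256 */
	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
		kvm->arch.tlb_sets = POWER8_TLB_SETS;		/* 512 */
	else
		kvm->arch.tlb_sets = POWER7_TLB_SETS;		/* 128 */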


> +	/*
>  	 * Track that we now have a HV mode VM active. This blocks secondary
>  	 * CPU threads from coming online.
>  	 */
> @@ -3728,3 +3742,4 @@ module_exit(kvmppc_book3s_exit_hv);
>  MODULE_LICENSE("GPL");
>  MODULE_ALIAS_MISCDEV(KVM_MINOR);
>  MODULE_ALIAS("devname:kvm");
> +
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 1179e40..9ef3c4b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -424,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
>  {
>  	long i;
>  
> +	/*
> +	 * We use the POWER9 5-operand versions of tlbie and tlbiel here.
> +	 * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
> +	 * the RS field, this is backwards-compatible with P7 and P8.
> +	 */
>  	if (global) {
>  		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
>  			cpu_relax();
>  		if (need_sync)
>  			asm volatile("ptesync" : : : "memory");
>  		for (i = 0; i < npages; ++i)
> -			asm volatile(PPC_TLBIE(%1,%0) : :
> +			asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
>  				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
>  		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
>  		kvm->arch.tlbie_lock = 0;
> @@ -438,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
>  		if (need_sync)
>  			asm volatile("ptesync" : : : "memory");
>  		for (i = 0; i < npages; ++i)
> -			asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
> +			asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
> +				     "r" (rbvalues[i]), "r" (0));
>  		asm volatile("ptesync" : : : "memory");
>  	}
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 219a04f..acae5c3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -613,12 +613,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  	stdcx.	r7,0,r6
>  	bne	23b
>  	/* Flush the TLB of any entries for this LPID */
> -	/* use arch 2.07S as a proxy for POWER8 */
> -BEGIN_FTR_SECTION
> -	li	r6,512			/* POWER8 has 512 sets */
> -FTR_SECTION_ELSE
> -	li	r6,128			/* POWER7 has 128 sets */
> -ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
> +	lwz	r6,KVM_TLB_SETS(r9)
> +	li	r0,0			/* RS for P9 version of tlbiel */
>  	mtctr	r6
>  	li	r7,0x800		/* IS field = 0b10 */
>  	ptesync
> -- 
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
@ 2016-11-18 14:59     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-18 14:59 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> On POWER9, the msgsnd instruction is able to send interrupts to
> other cores, as well as other threads on the local core.  Since
> msgsnd is generally simpler and faster than sending an IPI via the
> XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c         | 11 ++++++++++-
>  arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 8395a7f..ace89df 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -147,12 +147,21 @@ static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
>  
>  static bool kvmppc_ipi_thread(int cpu)
>  {
> +	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> +
> +	/* On POWER9 we can use msgsnd to IPI any cpu */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		msg |= get_hard_smp_processor_id(cpu);
> +		smp_mb();
> +		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> +		return true;
> +	}
> +
>  	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
>  	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
>  		preempt_disable();
>  		if (cpu_first_thread_sibling(cpu) ==
>  		    cpu_first_thread_sibling(smp_processor_id())) {
> -			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>  			msg |= cpu_thread_in_core(cpu);
>  			smp_mb();
>  			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 0c84d6b..37ed045 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
>  void kvmhv_rm_send_ipi(int cpu)
>  {
>  	unsigned long xics_phys;
> +	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>  
> -	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> +	/* On POWER9 we can use msgsnd for any destination cpu. */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		msg |= get_hard_smp_processor_id(cpu);
> +		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> +		return;

Do we need a "sync" there  before msgsnd ?

> +	}
> +	/* On POWER8 for IPIs to threads in the same core, use msgsnd. */
>  	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
>  	    cpu_first_thread_sibling(cpu) ==
>  	    cpu_first_thread_sibling(raw_smp_processor_id())) {
> -		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
>  		msg |= cpu_thread_in_core(cpu);
>  		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
>  		return;
> -- 
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
  2016-11-18 14:53     ` Aneesh Kumar K.V
@ 2016-11-18 21:57       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 64+ messages in thread
From: Benjamin Herrenschmidt @ 2016-11-18 21:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

On Fri, 2016-11-18 at 20:11 +0530, Aneesh Kumar K.V wrote:
> > +      * Work out how many sets the TLB has, for the use of
> > +      * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > +      */
> > +     if (cpu_has_feature(CPU_FTR_ARCH_300))
> > +             kvm->arch.tlb_sets = 256;       /* POWER9 */
> > +     else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > +             kvm->arch.tlb_sets = 512;       /* POWER8 */
> > +     else
> > +             kvm->arch.tlb_sets = 128;       /* POWER7 */
> > +
> 
> We have 
> 
> #define POWER7_TLB_SETS         128     /* # sets in POWER7 TLB */
> #define POWER8_TLB_SETS         512     /* # sets in POWER8 TLB */
> #define POWER9_TLB_SETS_HASH    256     /* # sets in POWER9 TLB Hash mode */
> #define POWER9_TLB_SETS_RADIX   128     /* # sets in POWER9 TLB Radix mode */
> 
> Maybe use those instead of open-coding the numbers?

Both are bad and are going to kill us for future backward
compatibility.

These should come from a device-tree property. We can fall back to hard-wired
values if it doesn't exist, but we should at least look for one.

Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
in the CPU node, so let's create a new one instead, with 2 entries
(hash vs. radix) or 2 new ones, one for hash and one for radix (when
available).
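
For example (rough sketch only; the property name below is made up, since
nothing suitable exists yet, and the fallback values just mirror the
hard-wired ones from the patch):

	struct device_node *cpu_dn = of_get_cpu_node(0, NULL);
	u32 sets;

	/* "ibm,tlb-sets-hash" is a hypothetical new property */
	if (cpu_dn && !of_property_read_u32(cpu_dn, "ibm,tlb-sets-hash", &sets))
		kvm->arch.tlb_sets = sets;
	else if (cpu_has_feature(CPU_FTR_ARCH_300))
		kvm->arch.tlb_sets = 256;	/* POWER9 hash */
	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
		kvm->arch.tlb_sets = 512;	/* POWER8 */
	else
		kvm->arch.tlb_sets = 128;	/* POWER7 */
	of_node_put(cpu_dn);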

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
@ 2016-11-18 21:57       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 64+ messages in thread
From: Benjamin Herrenschmidt @ 2016-11-18 21:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

On Fri, 2016-11-18 at 20:11 +0530, Aneesh Kumar K.V wrote:
> > +      * Work out how many sets the TLB has, for the use of
> > +      * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > +      */
> > +     if (cpu_has_feature(CPU_FTR_ARCH_300))
> > +             kvm->arch.tlb_sets = 256;       /* POWER9 */
> > +     else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > +             kvm->arch.tlb_sets = 512;       /* POWER8 */
> > +     else
> > +             kvm->arch.tlb_sets = 128;       /* POWER7 */
> > +
> 
> We have 
> 
> #define POWER7_TLB_SETS         128     /* # sets in POWER7 TLB */
> #define POWER8_TLB_SETS         512     /* # sets in POWER8 TLB */
> #define POWER9_TLB_SETS_HASH    256     /* # sets in POWER9 TLB Hash mode */
> #define POWER9_TLB_SETS_RADIX   128     /* # sets in POWER9 TLB Radix mode */
> 
> Maybe use those instead of open-coding the numbers?

Both are bad and are going to kill us for future backward
compatibility.

These should come from a device-tree property. We can fall back to hard-wired
values if it doesn't exist, but we should at least look for one.

Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
in the CPU node, so let's create a new one instead, with 2 entries
(hash vs. radix) or 2 new ones, one for hash and one for radix (when
available).

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-19  0:38     ` Balbir Singh
  -1 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19  0:38 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev



On 18/11/16 18:28, Paul Mackerras wrote:
> This adapts the KVM-HV hashed page table (HPT) code to read and write
> HPT entries in the new format defined in Power ISA v3.00 on POWER9
> machines.  The new format moves the B (segment size) field from the
> first doubleword to the second, and trims some bits from the AVA
> (abbreviated virtual address) and ARPN (abbreviated real page number)
> fields.  As far as possible, the conversion is done when reading or
> writing the HPT entries, and the rest of the code continues to use
> the old format.
> 

I had a version that did this, but it assumed we supported both PTE formats (old
and new) and that KVM would be aware of the format supported (the one you reviewed).
This is much nicer now that we know that we support *only* the older format 
for KVM guests.


> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c |  39 ++++++++++----
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 101 +++++++++++++++++++++++++-----------
>  2 files changed, 100 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 7755bd0..20a8e8e 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -314,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>  	struct kvmppc_slb *slbe;
>  	unsigned long slb_v;
>  	unsigned long pp, key;
> -	unsigned long v, gr;
> +	unsigned long v, orig_v, gr;
>  	__be64 *hptep;
>  	int index;
>  	int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
> @@ -339,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>  		return -ENOENT;
>  	}
>  	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> -	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> +	v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +		v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
>  	gr = kvm->arch.revmap[index].guest_rpte;
>  
> -	unlock_hpte(hptep, v);
> +	unlock_hpte(hptep, orig_v);
>  	preempt_enable();
>  
>  	gpte->eaddr = eaddr;
> @@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	unsigned long hpte[3], r;
> +	unsigned long hnow_v, hnow_r;
>  	__be64 *hptep;
>  	unsigned long mmu_seq, psize, pte_size;
>  	unsigned long gpa_base, gfn_base;
> @@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	unlock_hpte(hptep, hpte[0]);
>  	preempt_enable();
>  
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
> +		hpte[1] = hpte_new_to_old_r(hpte[1]);
> +	}

I think we can avoid this, if we avoid the conversion in kvmppc_hpte_hv_fault().
If we decide not to do this, then gpa will need to use a new mask to extract
the correct gpa.

>  	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
>  	    hpte[1] != vcpu->arch.pgfault_hpte[1])
>  		return RESUME_GUEST;
> @@ -599,9 +606,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	preempt_disable();
>  	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>  		cpu_relax();
> -	if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
> -		be64_to_cpu(hptep[1]) != hpte[1] ||
> -		rev->guest_rpte != hpte[2])
> +	hnow_v = be64_to_cpu(hptep[0]);
> +	hnow_r = be64_to_cpu(hptep[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
> +		hnow_r = hpte_new_to_old_r(hnow_r);
> +	}
> +	if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
> +	    rev->guest_rpte != hpte[2])

These changes can be avoided as well (based on the comment above)

>  		/* HPTE has been changed under us; let the guest retry */
>  		goto out_unlock;
>  	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
> @@ -632,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
>  	}
>  
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		r = hpte_old_to_new_r(hpte[0], r);
> +		hpte[0] = hpte_old_to_new_v(hpte[0]);
> +	}
>  	hptep[1] = cpu_to_be64(r);
>  	eieio();
>  	__unlock_hpte(hptep, hpte[0]);
> @@ -1183,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>  			unsigned long *hpte, struct revmap_entry *revp,
>  			int want_valid, int first_pass)
>  {
> -	unsigned long v, r;
> +	unsigned long v, r, hr;
>  	unsigned long rcbits_unset;
>  	int ok = 1;
>  	int valid, dirty;
> @@ -1210,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>  		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
>  			cpu_relax();
>  		v = be64_to_cpu(hptp[0]);
> +		hr = be64_to_cpu(hptp[1]);
> +		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +			v = hpte_new_to_old_v(v, hr);
> +			hr = hpte_new_to_old_r(hr);
> +		}
>  
>  		/* re-evaluate valid and dirty from synchronized HPTE value */
>  		valid = !!(v & HPTE_V_VALID);
> @@ -1217,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>  
>  		/* Harvest R and C into guest view if necessary */
>  		rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
> -		if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
> -			revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
> +		if (valid && (rcbits_unset & hr)) {
> +			revp->guest_rpte |= (hr &
>  				(HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
>  			dirty = 1;
>  		}
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 02786b3..1179e40 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -364,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
>  		}
>  	}
>  
> +	/* Convert to new format on P9 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		ptel = hpte_old_to_new_r(pteh, ptel);
> +		pteh = hpte_old_to_new_v(pteh);
> +	}

So much nicer when we support just one format, otherwise my patches did a whole
bunch of unnecessary changes.

>  	hpte[1] = cpu_to_be64(ptel);
>  
>  	/* Write the first HPTE dword, unlocking the HPTE and making it valid */
> @@ -445,27 +450,31 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
>  	__be64 *hpte;
>  	unsigned long v, r, rb;
>  	struct revmap_entry *rev;
> -	u64 pte;
> +	u64 pte, orig_pte, pte_r;
>  
>  	if (pte_index >= kvm->arch.hpt_npte)
>  		return H_PARAMETER;
>  	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
>  	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
>  		cpu_relax();
> -	pte = be64_to_cpu(hpte[0]);
> +	pte = orig_pte = be64_to_cpu(hpte[0]);
> +	pte_r = be64_to_cpu(hpte[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		pte = hpte_new_to_old_v(pte, pte_r);
> +		pte_r = hpte_new_to_old_r(pte_r);
> +	}
>  	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
>  	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
>  	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
> -		__unlock_hpte(hpte, pte);
> +		__unlock_hpte(hpte, orig_pte);
>  		return H_NOT_FOUND;
>  	}
>  
>  	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
>  	v = pte & ~HPTE_V_HVLOCK;
> -	pte = be64_to_cpu(hpte[1]);
>  	if (v & HPTE_V_VALID) {
>  		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
> -		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
> +		rb = compute_tlbie_rb(v, pte_r, pte_index);

This is good, I think it makes sense to retain the old format for compute_tlbie_rb()

>  		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
>  		/*
>  		 * The reference (R) and change (C) bits in a HPT
> @@ -483,7 +492,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
>  	note_hpte_modification(kvm, rev);
>  	unlock_hpte(hpte, 0);
>  
> -	if (is_mmio_hpte(v, pte))
> +	if (is_mmio_hpte(v, pte_r))
>  		atomic64_inc(&kvm->arch.mmio_update);
>  
>  	if (v & HPTE_V_ABSENT)
> @@ -546,6 +555,10 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>  			found = 0;
>  			hp0 = be64_to_cpu(hp[0]);
>  			hp1 = be64_to_cpu(hp[1]);
> +			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +				hp0 = hpte_new_to_old_v(hp0, hp1);
> +				hp1 = hpte_new_to_old_r(hp1);
> +			}
>  			if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
>  				switch (flags & 3) {
>  				case 0:		/* absolute */
> @@ -583,8 +596,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>  
>  			/* leave it locked */
>  			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
> -			tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
> -				be64_to_cpu(hp[1]), pte_index);
> +			tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
>  			indexes[n] = j;
>  			hptes[n] = hp;
>  			revs[n] = rev;
> @@ -622,7 +634,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  	__be64 *hpte;
>  	struct revmap_entry *rev;
>  	unsigned long v, r, rb, mask, bits;
> -	u64 pte;
> +	u64 pte_v, pte_r;
>  
>  	if (pte_index >= kvm->arch.hpt_npte)
>  		return H_PARAMETER;
> @@ -630,15 +642,16 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
>  	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
>  		cpu_relax();
> -	pte = be64_to_cpu(hpte[0]);
> -	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> -	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
> -		__unlock_hpte(hpte, pte);
> +	v = pte_v = be64_to_cpu(hpte[0]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +		v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
> +	if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> +	    ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
> +		__unlock_hpte(hpte, pte_v);
>  		return H_NOT_FOUND;
>  	}
>  
> -	v = pte;
> -	pte = be64_to_cpu(hpte[1]);
> +	pte_r = be64_to_cpu(hpte[1]);
>  	bits = (flags << 55) & HPTE_R_PP0;
>  	bits |= (flags << 48) & HPTE_R_KEY_HI;
>  	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
> @@ -660,13 +673,13 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  		 * readonly to writable.  If it should be writable, we'll
>  		 * take a trap and let the page fault code sort it out.
>  		 */
> -		r = (pte & ~mask) | bits;
> -		if (hpte_is_writable(r) && !hpte_is_writable(pte))
> +		r = (pte_r & ~mask) | bits;
> +		if (hpte_is_writable(r) && !hpte_is_writable(pte_r))
>  			r = hpte_make_readonly(r);
>  		/* If the PTE is changing, invalidate it first */
> -		if (r != pte) {
> +		if (r != pte_r) {
>  			rb = compute_tlbie_rb(v, r, pte_index);
> -			hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) |
> +			hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
>  					      HPTE_V_ABSENT);
>  			do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
>  				  true);
> @@ -675,9 +688,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  			hpte[1] = cpu_to_be64(r);
>  		}
>  	}
> -	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
> +	unlock_hpte(hpte, pte_v & ~HPTE_V_HVLOCK);
>  	asm volatile("ptesync" : : : "memory");
> -	if (is_mmio_hpte(v, pte))
> +	if (is_mmio_hpte(v, pte_r))
>  		atomic64_inc(&kvm->arch.mmio_update);
>  
>  	return H_SUCCESS;
> @@ -703,6 +716,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
>  		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
>  		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
>  		r = be64_to_cpu(hpte[1]);
> +		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +			v = hpte_new_to_old_v(v, r);
> +			r = hpte_new_to_old_r(r);
> +		}
>  		if (v & HPTE_V_ABSENT) {
>  			v &= ~HPTE_V_ABSENT;
>  			v |= HPTE_V_VALID;
> @@ -820,10 +837,16 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
>  			unsigned long pte_index)
>  {
>  	unsigned long rb;
> +	u64 hp0, hp1;
>  
>  	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
> -	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
> -			      pte_index);
> +	hp0 = be64_to_cpu(hptep[0]);
> +	hp1 = be64_to_cpu(hptep[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hp0 = hpte_new_to_old_v(hp0, hp1);
> +		hp1 = hpte_new_to_old_r(hp1);
> +	}
> +	rb = compute_tlbie_rb(hp0, hp1, pte_index);
>  	do_tlbies(kvm, &rb, 1, 1, true);
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
> @@ -833,9 +856,15 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
>  {
>  	unsigned long rb;
>  	unsigned char rbyte;
> +	u64 hp0, hp1;
>  
> -	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
> -			      pte_index);
> +	hp0 = be64_to_cpu(hptep[0]);
> +	hp1 = be64_to_cpu(hptep[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hp0 = hpte_new_to_old_v(hp0, hp1);
> +		hp1 = hpte_new_to_old_r(hp1);
> +	}
> +	rb = compute_tlbie_rb(hp0, hp1, pte_index);
>  	rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
>  	/* modify only the second-last byte, which contains the ref bit */
>  	*((char *)hptep + 14) = rbyte;
> @@ -895,7 +924,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  	unsigned long avpn;
>  	__be64 *hpte;
>  	unsigned long mask, val;
> -	unsigned long v, r;
> +	unsigned long v, r, orig_v;
>  
>  	/* Get page shift, work out hash and AVPN etc. */
>  	mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
> @@ -930,6 +959,8 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  		for (i = 0; i < 16; i += 2) {
>  			/* Read the PTE racily */
>  			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> +			if (cpu_has_feature(CPU_FTR_ARCH_300))
> +				v = hpte_new_to_old_v(v, be64_to_cpu(hpte[i+1]));
>  
>  			/* Check valid/absent, hash, segment size and AVPN */
>  			if (!(v & valid) || (v & mask) != val)
> @@ -938,8 +969,12 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  			/* Lock the PTE and read it under the lock */
>  			while (!try_lock_hpte(&hpte[i], HPTE_V_HVLOCK))
>  				cpu_relax();
> -			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> +			v = orig_v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
>  			r = be64_to_cpu(hpte[i+1]);
> +			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +				v = hpte_new_to_old_v(v, r);
> +				r = hpte_new_to_old_r(r);
> +			}
>  
>  			/*
>  			 * Check the HPTE again, including base page size
> @@ -949,7 +984,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  				/* Return with the HPTE still locked */
>  				return (hash << 3) + (i >> 1);
>  
> -			__unlock_hpte(&hpte[i], v);
> +			__unlock_hpte(&hpte[i], orig_v);
>  		}
>  
>  		if (val & HPTE_V_SECONDARY)
> @@ -977,7 +1012,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	long int index;
> -	unsigned long v, r, gr;
> +	unsigned long v, r, gr, orig_v;
>  	__be64 *hpte;
>  	unsigned long valid;
>  	struct revmap_entry *rev;
> @@ -1005,12 +1040,16 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>  			return 0;	/* for prot fault, HPTE disappeared */
>  		}
>  		hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> -		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
> +		v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
>  		r = be64_to_cpu(hpte[1]);
> +		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +			v = hpte_new_to_old_v(v, r);
> +			r = hpte_new_to_old_r(r);
> +		}
>  		rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
>  		gr = rev->guest_rpte;
>  
> -		unlock_hpte(hpte, v);
> +		unlock_hpte(hpte, orig_v);
>  	}
>  
>  	/* For not found, if the HPTE is valid by now, retry the instruction */
> 


Reviewed-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
@ 2016-11-19  0:38     ` Balbir Singh
  0 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19  0:38 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev



On 18/11/16 18:28, Paul Mackerras wrote:
> This adapts the KVM-HV hashed page table (HPT) code to read and write
> HPT entries in the new format defined in Power ISA v3.00 on POWER9
> machines.  The new format moves the B (segment size) field from the
> first doubleword to the second, and trims some bits from the AVA
> (abbreviated virtual address) and ARPN (abbreviated real page number)
> fields.  As far as possible, the conversion is done when reading or
> writing the HPT entries, and the rest of the code continues to use
> the old format.
> 

I had a version that did this, but it assumed we supported both PTE formats (old
and new) and that KVM would be aware of the format supported (the one you reviewed).
This is much nicer now that we know that we support *only* the older format 
for KVM guests.


> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c |  39 ++++++++++----
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 101 +++++++++++++++++++++++++-----------
>  2 files changed, 100 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 7755bd0..20a8e8e 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -314,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>  	struct kvmppc_slb *slbe;
>  	unsigned long slb_v;
>  	unsigned long pp, key;
> -	unsigned long v, gr;
> +	unsigned long v, orig_v, gr;
>  	__be64 *hptep;
>  	int index;
>  	int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
> @@ -339,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>  		return -ENOENT;
>  	}
>  	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> -	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> +	v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +		v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
>  	gr = kvm->arch.revmap[index].guest_rpte;
>  
> -	unlock_hpte(hptep, v);
> +	unlock_hpte(hptep, orig_v);
>  	preempt_enable();
>  
>  	gpte->eaddr = eaddr;
> @@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	unsigned long hpte[3], r;
> +	unsigned long hnow_v, hnow_r;
>  	__be64 *hptep;
>  	unsigned long mmu_seq, psize, pte_size;
>  	unsigned long gpa_base, gfn_base;
> @@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	unlock_hpte(hptep, hpte[0]);
>  	preempt_enable();
>  
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
> +		hpte[1] = hpte_new_to_old_r(hpte[1]);
> +	}

I think we can avoid this, if we avoid the conversion in kvmppc_hpte_hv_fault().
If we decide not to do this, then gpa will need to use a new mask to extract
the correct gpa.

>  	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
>  	    hpte[1] != vcpu->arch.pgfault_hpte[1])
>  		return RESUME_GUEST;
> @@ -599,9 +606,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	preempt_disable();
>  	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>  		cpu_relax();
> -	if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
> -		be64_to_cpu(hptep[1]) != hpte[1] ||
> -		rev->guest_rpte != hpte[2])
> +	hnow_v = be64_to_cpu(hptep[0]);
> +	hnow_r = be64_to_cpu(hptep[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
> +		hnow_r = hpte_new_to_old_r(hnow_r);
> +	}
> +	if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
> +	    rev->guest_rpte != hpte[2])

These changes can be avoided as well (based on the comment above)

>  		/* HPTE has been changed under us; let the guest retry */
>  		goto out_unlock;
>  	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
> @@ -632,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
>  	}
>  
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		r = hpte_old_to_new_r(hpte[0], r);
> +		hpte[0] = hpte_old_to_new_v(hpte[0]);
> +	}
>  	hptep[1] = cpu_to_be64(r);
>  	eieio();
>  	__unlock_hpte(hptep, hpte[0]);
> @@ -1183,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>  			unsigned long *hpte, struct revmap_entry *revp,
>  			int want_valid, int first_pass)
>  {
> -	unsigned long v, r;
> +	unsigned long v, r, hr;
>  	unsigned long rcbits_unset;
>  	int ok = 1;
>  	int valid, dirty;
> @@ -1210,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>  		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
>  			cpu_relax();
>  		v = be64_to_cpu(hptp[0]);
> +		hr = be64_to_cpu(hptp[1]);
> +		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +			v = hpte_new_to_old_v(v, hr);
> +			hr = hpte_new_to_old_r(hr);
> +		}
>  
>  		/* re-evaluate valid and dirty from synchronized HPTE value */
>  		valid = !!(v & HPTE_V_VALID);
> @@ -1217,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
>  
>  		/* Harvest R and C into guest view if necessary */
>  		rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
> -		if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
> -			revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
> +		if (valid && (rcbits_unset & hr)) {
> +			revp->guest_rpte |= (hr &
>  				(HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
>  			dirty = 1;
>  		}
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 02786b3..1179e40 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -364,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
>  		}
>  	}
>  
> +	/* Convert to new format on P9 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		ptel = hpte_old_to_new_r(pteh, ptel);
> +		pteh = hpte_old_to_new_v(pteh);
> +	}

So much nicer when we support just one format, otherwise my patches did a whole
bunch of unnecessary changes.

>  	hpte[1] = cpu_to_be64(ptel);
>  
>  	/* Write the first HPTE dword, unlocking the HPTE and making it valid */
> @@ -445,27 +450,31 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
>  	__be64 *hpte;
>  	unsigned long v, r, rb;
>  	struct revmap_entry *rev;
> -	u64 pte;
> +	u64 pte, orig_pte, pte_r;
>  
>  	if (pte_index >= kvm->arch.hpt_npte)
>  		return H_PARAMETER;
>  	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
>  	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
>  		cpu_relax();
> -	pte = be64_to_cpu(hpte[0]);
> +	pte = orig_pte = be64_to_cpu(hpte[0]);
> +	pte_r = be64_to_cpu(hpte[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		pte = hpte_new_to_old_v(pte, pte_r);
> +		pte_r = hpte_new_to_old_r(pte_r);
> +	}
>  	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
>  	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
>  	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
> -		__unlock_hpte(hpte, pte);
> +		__unlock_hpte(hpte, orig_pte);
>  		return H_NOT_FOUND;
>  	}
>  
>  	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
>  	v = pte & ~HPTE_V_HVLOCK;
> -	pte = be64_to_cpu(hpte[1]);
>  	if (v & HPTE_V_VALID) {
>  		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
> -		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
> +		rb = compute_tlbie_rb(v, pte_r, pte_index);

This is good, I think it makes sense to retain the old format for compute_tlbie_rb()

>  		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
>  		/*
>  		 * The reference (R) and change (C) bits in a HPT
> @@ -483,7 +492,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
>  	note_hpte_modification(kvm, rev);
>  	unlock_hpte(hpte, 0);
>  
> -	if (is_mmio_hpte(v, pte))
> +	if (is_mmio_hpte(v, pte_r))
>  		atomic64_inc(&kvm->arch.mmio_update);
>  
>  	if (v & HPTE_V_ABSENT)
> @@ -546,6 +555,10 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>  			found = 0;
>  			hp0 = be64_to_cpu(hp[0]);
>  			hp1 = be64_to_cpu(hp[1]);
> +			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +				hp0 = hpte_new_to_old_v(hp0, hp1);
> +				hp1 = hpte_new_to_old_r(hp1);
> +			}
>  			if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
>  				switch (flags & 3) {
>  				case 0:		/* absolute */
> @@ -583,8 +596,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>  
>  			/* leave it locked */
>  			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
> -			tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
> -				be64_to_cpu(hp[1]), pte_index);
> +			tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
>  			indexes[n] = j;
>  			hptes[n] = hp;
>  			revs[n] = rev;
> @@ -622,7 +634,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  	__be64 *hpte;
>  	struct revmap_entry *rev;
>  	unsigned long v, r, rb, mask, bits;
> -	u64 pte;
> +	u64 pte_v, pte_r;
>  
>  	if (pte_index >= kvm->arch.hpt_npte)
>  		return H_PARAMETER;
> @@ -630,15 +642,16 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
>  	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
>  		cpu_relax();
> -	pte = be64_to_cpu(hpte[0]);
> -	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> -	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
> -		__unlock_hpte(hpte, pte);
> +	v = pte_v = be64_to_cpu(hpte[0]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +		v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
> +	if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
> +	    ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
> +		__unlock_hpte(hpte, pte_v);
>  		return H_NOT_FOUND;
>  	}
>  
> -	v = pte;
> -	pte = be64_to_cpu(hpte[1]);
> +	pte_r = be64_to_cpu(hpte[1]);
>  	bits = (flags << 55) & HPTE_R_PP0;
>  	bits |= (flags << 48) & HPTE_R_KEY_HI;
>  	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
> @@ -660,13 +673,13 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  		 * readonly to writable.  If it should be writable, we'll
>  		 * take a trap and let the page fault code sort it out.
>  		 */
> -		r = (pte & ~mask) | bits;
> -		if (hpte_is_writable(r) && !hpte_is_writable(pte))
> +		r = (pte_r & ~mask) | bits;
> +		if (hpte_is_writable(r) && !hpte_is_writable(pte_r))
>  			r = hpte_make_readonly(r);
>  		/* If the PTE is changing, invalidate it first */
> -		if (r != pte) {
> +		if (r != pte_r) {
>  			rb = compute_tlbie_rb(v, r, pte_index);
> -			hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) |
> +			hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
>  					      HPTE_V_ABSENT);
>  			do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
>  				  true);
> @@ -675,9 +688,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>  			hpte[1] = cpu_to_be64(r);
>  		}
>  	}
> -	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
> +	unlock_hpte(hpte, pte_v & ~HPTE_V_HVLOCK);
>  	asm volatile("ptesync" : : : "memory");
> -	if (is_mmio_hpte(v, pte))
> +	if (is_mmio_hpte(v, pte_r))
>  		atomic64_inc(&kvm->arch.mmio_update);
>  
>  	return H_SUCCESS;
> @@ -703,6 +716,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
>  		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
>  		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
>  		r = be64_to_cpu(hpte[1]);
> +		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +			v = hpte_new_to_old_v(v, r);
> +			r = hpte_new_to_old_r(r);
> +		}
>  		if (v & HPTE_V_ABSENT) {
>  			v &= ~HPTE_V_ABSENT;
>  			v |= HPTE_V_VALID;
> @@ -820,10 +837,16 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
>  			unsigned long pte_index)
>  {
>  	unsigned long rb;
> +	u64 hp0, hp1;
>  
>  	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
> -	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
> -			      pte_index);
> +	hp0 = be64_to_cpu(hptep[0]);
> +	hp1 = be64_to_cpu(hptep[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hp0 = hpte_new_to_old_v(hp0, hp1);
> +		hp1 = hpte_new_to_old_r(hp1);
> +	}
> +	rb = compute_tlbie_rb(hp0, hp1, pte_index);
>  	do_tlbies(kvm, &rb, 1, 1, true);
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
> @@ -833,9 +856,15 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
>  {
>  	unsigned long rb;
>  	unsigned char rbyte;
> +	u64 hp0, hp1;
>  
> -	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
> -			      pte_index);
> +	hp0 = be64_to_cpu(hptep[0]);
> +	hp1 = be64_to_cpu(hptep[1]);
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		hp0 = hpte_new_to_old_v(hp0, hp1);
> +		hp1 = hpte_new_to_old_r(hp1);
> +	}
> +	rb = compute_tlbie_rb(hp0, hp1, pte_index);
>  	rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
>  	/* modify only the second-last byte, which contains the ref bit */
>  	*((char *)hptep + 14) = rbyte;
> @@ -895,7 +924,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  	unsigned long avpn;
>  	__be64 *hpte;
>  	unsigned long mask, val;
> -	unsigned long v, r;
> +	unsigned long v, r, orig_v;
>  
>  	/* Get page shift, work out hash and AVPN etc. */
>  	mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
> @@ -930,6 +959,8 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  		for (i = 0; i < 16; i += 2) {
>  			/* Read the PTE racily */
>  			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> +			if (cpu_has_feature(CPU_FTR_ARCH_300))
> +				v = hpte_new_to_old_v(v, be64_to_cpu(hpte[i+1]));
>  
>  			/* Check valid/absent, hash, segment size and AVPN */
>  			if (!(v & valid) || (v & mask) != val)
> @@ -938,8 +969,12 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  			/* Lock the PTE and read it under the lock */
>  			while (!try_lock_hpte(&hpte[i], HPTE_V_HVLOCK))
>  				cpu_relax();
> -			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
> +			v = orig_v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
>  			r = be64_to_cpu(hpte[i+1]);
> +			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +				v = hpte_new_to_old_v(v, r);
> +				r = hpte_new_to_old_r(r);
> +			}
>  
>  			/*
>  			 * Check the HPTE again, including base page size
> @@ -949,7 +984,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
>  				/* Return with the HPTE still locked */
>  				return (hash << 3) + (i >> 1);
>  
> -			__unlock_hpte(&hpte[i], v);
> +			__unlock_hpte(&hpte[i], orig_v);
>  		}
>  
>  		if (val & HPTE_V_SECONDARY)
> @@ -977,7 +1012,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	long int index;
> -	unsigned long v, r, gr;
> +	unsigned long v, r, gr, orig_v;
>  	__be64 *hpte;
>  	unsigned long valid;
>  	struct revmap_entry *rev;
> @@ -1005,12 +1040,16 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>  			return 0;	/* for prot fault, HPTE disappeared */
>  		}
>  		hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
> -		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
> +		v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
>  		r = be64_to_cpu(hpte[1]);
> +		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +			v = hpte_new_to_old_v(v, r);
> +			r = hpte_new_to_old_r(r);
> +		}
>  		rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
>  		gr = rev->guest_rpte;
>  
> -		unlock_hpte(hpte, v);
> +		unlock_hpte(hpte, orig_v);
>  	}
>  
>  	/* For not found, if the HPTE is valid by now, retry the instruction */
> 


Reviewed-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-19  0:45     ` Balbir Singh
  -1 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19  0:45 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

> +#ifdef CONFIG_PPC_BOOK3S_64
> +void mmu_partition_table_init(void)
> +{
> +	unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
> +
> +	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");

This should be 36 (12 + 24)
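
i.e. if the limit really is 12 + 24, something like:

	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 36), "Partition table size too large.");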

> +	partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
> +						MEMBLOCK_ALLOC_ANYWHERE));
> +

Balbir

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
@ 2016-11-19  0:45     ` Balbir Singh
  0 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19  0:45 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev

> +#ifdef CONFIG_PPC_BOOK3S_64
> +void mmu_partition_table_init(void)
> +{
> +	unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
> +
> +	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");

This should be 36 (12 + 24)

> +	partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
> +						MEMBLOCK_ALLOC_ANYWHERE));
> +

Balbir

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9
  2016-11-18  7:28   ` Paul Mackerras
@ 2016-11-19  1:01     ` Balbir Singh
  -1 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19  1:01 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev



On 18/11/16 18:28, Paul Mackerras wrote:
> On POWER9, the SDR1 register (hashed page table base address) is no
> longer used, and instead the hardware reads the HPT base address
> and size from the partition table.  The partition table entry also
> contains the bits that specify the page size for the VRMA mapping,
> which were previously in the LPCR.  The VPM0 bit of the LPCR is
> now reserved; the processor now always uses the VRMA (virtual
> real-mode area) mechanism for guest real-mode accesses in HPT mode,
> and the RMO (real-mode offset) mechanism has been dropped.
> 
> When entering or exiting the guest, we now only have to set the
> LPIDR (logical partition ID register), not the SDR1 register.
> There is also no requirement now to transition via a reserved
> LPID value.
> 

I had similar changes, but they did not have the VPM and host SDR switching
bits.


> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c            | 36 +++++++++++++++++++++++++++------
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 ++++++---
>  2 files changed, 37 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 40b2b6d..5cbe3c3 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -54,6 +54,7 @@
>  #include <asm/dbell.h>
>  #include <asm/hmi.h>
>  #include <asm/pnv-pci.h>
> +#include <asm/mmu.h>
>  #include <linux/gfp.h>
>  #include <linux/vmalloc.h>
>  #include <linux/highmem.h>
> @@ -3024,6 +3025,22 @@ static void kvmppc_mmu_destroy_hv(struct kvm_vcpu *vcpu)
>  	return;
>  }
>  
> +static void kvmppc_setup_partition_table(struct kvm *kvm)
> +{
> +	unsigned long dw0, dw1;
> +
> +	/* PS field - page size for VRMA */
> +	dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
> +		((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
> +	/* HTABSIZE and HTABORG fields */
> +	dw0 |= kvm->arch.sdr1;
> +
> +	/* Second dword has GR=0; other fields are unused since UPRT=0 */
> +	dw1 = 0;

Don't we need to set LPCR_GTSE for legacy guests?

Otherwise

Reviewed-by: Balbir Singh <bsingharora@gmail.com>



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9
@ 2016-11-19  1:01     ` Balbir Singh
  0 siblings, 0 replies; 64+ messages in thread
From: Balbir Singh @ 2016-11-19  1:01 UTC (permalink / raw)
  To: Paul Mackerras, kvm, kvm-ppc, linuxppc-dev



On 18/11/16 18:28, Paul Mackerras wrote:
> On POWER9, the SDR1 register (hashed page table base address) is no
> longer used, and instead the hardware reads the HPT base address
> and size from the partition table.  The partition table entry also
> contains the bits that specify the page size for the VRMA mapping,
> which were previously in the LPCR.  The VPM0 bit of the LPCR is
> now reserved; the processor now always uses the VRMA (virtual
> real-mode area) mechanism for guest real-mode accesses in HPT mode,
> and the RMO (real-mode offset) mechanism has been dropped.
> 
> When entering or exiting the guest, we now only have to set the
> LPIDR (logical partition ID register), not the SDR1 register.
> There is also no requirement now to transition via a reserved
> LPID value.
> 

I had similar changes, but they did not have the VPM and host SDR switching
bits.


> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c            | 36 +++++++++++++++++++++++++++------
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 ++++++---
>  2 files changed, 37 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 40b2b6d..5cbe3c3 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -54,6 +54,7 @@
>  #include <asm/dbell.h>
>  #include <asm/hmi.h>
>  #include <asm/pnv-pci.h>
> +#include <asm/mmu.h>
>  #include <linux/gfp.h>
>  #include <linux/vmalloc.h>
>  #include <linux/highmem.h>
> @@ -3024,6 +3025,22 @@ static void kvmppc_mmu_destroy_hv(struct kvm_vcpu *vcpu)
>  	return;
>  }
>  
> +static void kvmppc_setup_partition_table(struct kvm *kvm)
> +{
> +	unsigned long dw0, dw1;
> +
> +	/* PS field - page size for VRMA */
> +	dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
> +		((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
> +	/* HTABSIZE and HTABORG fields */
> +	dw0 |= kvm->arch.sdr1;
> +
> +	/* Second dword has GR=0; other fields are unused since UPRT=0 */
> +	dw1 = 0;

Don't we need to set LPCR_GTSE for legacy guests?

Otherwise

Reviewed-by: Balbir Singh <bsingharora@gmail.com>



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
  2016-11-18 14:59     ` Aneesh Kumar K.V
@ 2016-11-19  3:53       ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19  3:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev

On Fri, Nov 18, 2016 at 08:17:25PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
> 
> > On POWER9, the msgsnd instruction is able to send interrupts to
> > other cores, as well as other threads on the local core.  Since
> > msgsnd is generally simpler and faster than sending an IPI via the
> > XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
> >
> > Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> > ---
> >  arch/powerpc/kvm/book3s_hv.c         | 11 ++++++++++-
> >  arch/powerpc/kvm/book3s_hv_builtin.c | 10 ++++++++--
> >  2 files changed, 18 insertions(+), 3 deletions(-)
> >
[...]
> > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> > index 0c84d6b..37ed045 100644
> > --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> > @@ -205,12 +205,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
> >  void kvmhv_rm_send_ipi(int cpu)
> >  {
> >  	unsigned long xics_phys;
> > +	unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> >  
> > -	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> > +	/* On POWER9 we can use msgsnd for any destination cpu. */
> > +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > +		msg |= get_hard_smp_processor_id(cpu);
> > +		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> > +		return;
> 
> Do we need a "sync" there before msgsnd?

The comment just above this function says:

/*
 * Send an interrupt or message to another CPU.
 * This can only be called in real mode.
 * The caller needs to include any barrier needed to order writes
 * to memory vs. the IPI/message.
 */

so no.  In fact all of its callers do smp_mb() before calling it.
(And no, we don't want to move the smp_mb() into kvmhv_rm_send_ipi();
see kvmhv_interrupt_vcore() for why.)

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread
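
A minimal sketch of the caller-side pattern Paul describes, with the
barrier paired with the memory update rather than hidden inside
kvmhv_rm_send_ipi(); the wrapper name and the use of the prodded flag
are illustrative only:

	/* Sketch only: make the wakeup reason visible before the
	 * doorbell/IPI, as the comment quoted above requires.
	 */
	static void notify_cpu(struct kvm_vcpu *vcpu, int cpu)
	{
		vcpu->arch.prodded = 1;	/* assumed wakeup flag, for illustration */
		smp_mb();		/* order the store vs. msgsnd/IPI */
		kvmhv_rm_send_ipi(cpu);
	}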

* Re: [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9
  2016-11-18 14:47     ` Aneesh Kumar K.V
@ 2016-11-19  4:02       ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19  4:02 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev

On Fri, Nov 18, 2016 at 08:05:47PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
> 
> > Some special-purpose registers that were present and accessible
> > by guests on POWER8 no longer exist on POWER9, so this adds
> > feature sections to ensure that we don't try to context-switch
> > them when going into or out of a guest on POWER9.  These are
> > all relatively obscure, rarely-used registers, but we had to
> > context-switch them on POWER8 to avoid creating a covert channel.
> > They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
> 
> We don't need to context-switch them even when running a power8 compat
> guest?

They physically don't exist on the P9 chip, so how could we
context-switch them?  They certainly can't be used as a covert
channel.

Accesses to them will be a no-op for the guest in privileged
(supervisor) mode (i.e., mfspr won't modify the destination
register), which could be confusing for the guest if it was expecting
to use them.  SPMC1/2 and MMCRS are part of the "supervisor" PMU,
which we have never used.  I think CSIGR, TACR and TCSCR are part of a
facility that was never completely implemented or usable on P8, so
nothing uses them.  ACOP is used in arch/powerpc/mm/icswx.c in
conjunction with accelerators.  There might be a problem there, but in
any case, with no physical ACOP register present there's no way to
save/restore it.

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread
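
The effect of those feature sections, expressed as a C-level sketch for
two of the SPRs listed above; the real patch does this with feature
sections in book3s_hv_rmhandlers.S, and the vcpu field names here are
assumed rather than taken from the patch:

	/* Sketch only: skip POWER8-only SPRs on ISA v3.00 (POWER9),
	 * where they no longer exist.
	 */
	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
		vcpu->arch.tcscr = mfspr(SPRN_TCSCR);
		vcpu->arch.acop = mfspr(SPRN_ACOP);
	}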

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
  2016-11-18 14:53     ` Aneesh Kumar K.V
@ 2016-11-19  4:13       ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19  4:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev

On Fri, Nov 18, 2016 at 08:11:34PM +0530, Aneesh Kumar K.V wrote:
> > @@ -3287,6 +3290,17 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> >  	kvm->arch.lpcr = lpcr;
> >  
> >  	/*
> > +	 * Work out how many sets the TLB has, for the use of
> > +	 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > +	 */
> > +	if (cpu_has_feature(CPU_FTR_ARCH_300))
> > +		kvm->arch.tlb_sets = 256;	/* POWER9 */
> > +	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > +		kvm->arch.tlb_sets = 512;	/* POWER8 */
> > +	else
> > +		kvm->arch.tlb_sets = 128;	/* POWER7 */
> > +
> 
> We have 
> 
> #define POWER7_TLB_SETS		128	/* # sets in POWER7 TLB */
> #define POWER8_TLB_SETS		512	/* # sets in POWER8 TLB */
> #define POWER9_TLB_SETS_HASH	256	/* # sets in POWER9 TLB Hash mode */
> #define POWER9_TLB_SETS_RADIX	128	/* # sets in POWER9 TLB Radix mode */
> 
> Maybe use that instead of open-coding?

Doing that would make it easier to check that we're using the same
values everywhere but harder to see what actual numbers we're
getting.  I guess I could use the symbols and put the values in the
comments.  In any case, in future these values are just going to be
default values if we can't find a suitable device-tree property.

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread
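
For reference, the variant Paul sketches (symbolic constants with the
raw numbers kept in comments) would look roughly like this, reusing the
defines Aneesh quotes:

	/* Sketch: same logic as the patch hunk above, with named constants. */
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;	/* 256 */
	else if (cpu_has_feature(CPU_FTR_ARCH_207S))
		kvm->arch.tlb_sets = POWER8_TLB_SETS;		/* 512 */
	else
		kvm->arch.tlb_sets = POWER7_TLB_SETS;		/* 128 */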

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
  2016-11-18 21:57       ` Benjamin Herrenschmidt
@ 2016-11-19  4:14         ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19  4:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Aneesh Kumar K.V, kvm, kvm-ppc, linuxppc-dev

On Sat, Nov 19, 2016 at 08:57:28AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2016-11-18 at 20:11 +0530, Aneesh Kumar K.V wrote:
> > > +      * Work out how many sets the TLB has, for the use of
> > > +      * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> > > +      */
> > > +     if (cpu_has_feature(CPU_FTR_ARCH_300))
> > > +             kvm->arch.tlb_sets = 256;       /* POWER9 */
> > > +     else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > > +             kvm->arch.tlb_sets = 512;       /* POWER8 */
> > > +     else
> > > +             kvm->arch.tlb_sets = 128;       /* POWER7 */
> > > +
> > 
> > We have 
> > 
> > #define POWER7_TLB_SETS         128     /* # sets in POWER7 TLB */
> > #define POWER8_TLB_SETS         512     /* # sets in POWER8 TLB */
> > #define POWER9_TLB_SETS_HASH    256     /* # sets in POWER9 TLB Hash mode */
> > #define POWER9_TLB_SETS_RADIX   128     /* # sets in POWER9 TLB Radix mode */
> > 
> > Maybe use that instead of open-coding?
> 
> Both are bad and are going to kill us for future backward
> compatibility.
> 
> These should be a device-tree property. We can fallback to hard wired
> values if it doesn't exist but we should at least look for one.

Tell me what the property is called and I'll add code to use it. :)
That's the whole reason why I moved this to C code.

> Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
> in the CPU node, so let's create a new one instead, with 2 entries
> (hash vs. radix) or 2 new ones, one for hash and one for radix (when
> available).

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-18 14:39     ` Aneesh Kumar K.V
@ 2016-11-19  4:19       ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19  4:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev

On Fri, Nov 18, 2016 at 07:57:30PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
>  +
> > +	/* Global flush of TLBs and partition table caches for this lpid */
> > +	asm volatile("ptesync");
> > +	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> > +	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> > +}
> 
> 
> It would be nice to convert that 0x800 to a documented IS value or better use
> radix__flush_tlb_pid() ?

Well, not radix__flush_tlb_pid - this isn't radix and it isn't a PID
flush.  I could use TLBIEL_INVAL_SET_LPID except the name implies it's
for tlbiel and this is a tlbie.

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread
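
To illustrate what a documented IS value could look like here, a sketch
with the 0x800 spelled out; the macro name is invented, only the
encoding (IS = 2 in RB bits 52:53) comes from the code under review:

	/* Sketch: IS = 2 means "invalidate all entries for this LPID";
	 * placing it at RB[52:53] gives the 0x800 used in the patch.
	 */
	#define TLBIE_RB_IS_ALL_LPID	(0x2UL << PPC_BITLSHIFT(53))	/* 0x800 */

	asm volatile("ptesync");
	asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 0)
		     : : "r" (TLBIE_RB_IS_ALL_LPID), "r" (lpid));
	asm volatile("eieio; tlbsync; ptesync" : : : "memory");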

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-19  0:45     ` Balbir Singh
@ 2016-11-19  4:23       ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-19  4:23 UTC (permalink / raw)
  To: Balbir Singh; +Cc: kvm, kvm-ppc, linuxppc-dev

On Sat, Nov 19, 2016 at 11:45:52AM +1100, Balbir Singh wrote:
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +void mmu_partition_table_init(void)
> > +{
> > +	unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
> > +
> > +	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
> 
> This should be 36 (12 + 24)

True, though for P9, PATB_SIZE_SHIFT has to be 16.

The BUILD_BUG_ON_MSG is probably not really necessary - I just moved
this code from elsewhere.

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread
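
For concreteness, the corrected bound Balbir points out (12 + 24 = 36,
presumably the 4 KB base unit plus the size field) would read, as a
sketch:

	/* Sketch: allow up to the architectural maximum of 2^36 bytes,
	 * even though POWER9 itself only needs PATB_SIZE_SHIFT == 16 (64 KB).
	 */
	BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 36), "Partition table size too large.");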

* Re: [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
  2016-11-19  4:14         ` Paul Mackerras
@ 2016-11-19  4:41           ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 64+ messages in thread
From: Benjamin Herrenschmidt @ 2016-11-19  4:41 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Aneesh Kumar K.V, kvm, kvm-ppc, linuxppc-dev

On Sat, 2016-11-19 at 15:14 +1100, Paul Mackerras wrote:
> 
> > These should be a device-tree property. We can fallback to hard wired
> > values if it doesn't exist but we should at least look for one.
> 
> Tell me what the property is called and I'll add code to use it. :)
> That's the whole reason why I moved this to C code.
> 
> > 
> > Note: P8 firmwares all have a bug creating a bogus "tlb-sets" property
> > in the CPU node, so let's create a new one instead, with 2 entries
> > (hash vs. radix) or 2 new ones, one for hash and one for radix (when
> > available).

Well, as I said above, there's a defined one but it has bogus values
on almost all P8 firmwares. So I think we need the core code to export
values for use by both the core mm and KVM, which can then be picked up
from the DT with "quirks" to fix up the DT values.

(A bit like I did for the never-applied cache geometry patches)

That or we make up new names.

The question remains whether we need a separate property for radix
vs. hash; we probably should, as the "radix is half of hash"
relationship might not hold on future chips.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 64+ messages in thread
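
A rough sketch of the lookup-with-fallback Ben is asking for; the
property name "ibm,tlb-sets-hash" is invented purely for illustration
(no binding is named in this thread), of_property_read_u32() is the
usual helper, and the fallback values are the hard-wired ones from the
patch:

	/* Sketch only: prefer a (hypothetical) device-tree property and
	 * fall back to the per-processor defaults.
	 */
	static unsigned int kvm_tlb_sets(struct device_node *cpu_node)
	{
		u32 sets;

		if (!of_property_read_u32(cpu_node, "ibm,tlb-sets-hash", &sets))
			return sets;
		if (cpu_has_feature(CPU_FTR_ARCH_300))
			return 256;	/* POWER9, hash mode */
		if (cpu_has_feature(CPU_FTR_ARCH_207S))
			return 512;	/* POWER8 */
		return 128;		/* POWER7 */
	}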

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-19  4:19       ` Paul Mackerras
@ 2016-11-19  6:47         ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 64+ messages in thread
From: Aneesh Kumar K.V @ 2016-11-19  6:35 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> On Fri, Nov 18, 2016 at 07:57:30PM +0530, Aneesh Kumar K.V wrote:
>> Paul Mackerras <paulus@ozlabs.org> writes:
>>  +
>> > +	/* Global flush of TLBs and partition table caches for this lpid */
>> > +	asm volatile("ptesync");
>> > +	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
>> > +	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
>> > +}
>> 
>> 
>> It would be nice to convert that 0x800 to a documented IS value or better use
>> radix__flush_tlb_pid() ?
>
> Well, not radix__flush_tlb_pid - this isn't radix and it isn't a PID
> flush.  I could use TLBIEL_INVAL_SET_LPID except the name implies it's
> for tlbiel and this is a tlbie.
>

I wrote that wrong: we really don't have tlb_pid(); what we have is tlb_lpid().

void radix__flush_tlb_lpid(unsigned long lpid)
{
	unsigned long rb,rs,prs,r;
	unsigned long ric = RIC_FLUSH_ALL;

	rb = 0x2 << PPC_BITLSHIFT(53); /* IS = 2 */
	rs = lpid & ((1UL << 32) - 1);
	prs = 0; /* partition scoped */
	r = 1;   /* radix format */

	asm volatile("ptesync": : :"memory");
	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
	asm volatile("eieio; tlbsync; ptesync": : :"memory");
}


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
  2016-11-19  0:38     ` Balbir Singh
@ 2016-11-21  2:02       ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-21  2:02 UTC (permalink / raw)
  To: Balbir Singh; +Cc: kvm, kvm-ppc, linuxppc-dev

On Sat, Nov 19, 2016 at 11:38:40AM +1100, Balbir Singh wrote:
> 
> 
> On 18/11/16 18:28, Paul Mackerras wrote:
> > This adapts the KVM-HV hashed page table (HPT) code to read and write
> > HPT entries in the new format defined in Power ISA v3.00 on POWER9
> > machines.  The new format moves the B (segment size) field from the
> > first doubleword to the second, and trims some bits from the AVA
> > (abbreviated virtual address) and ARPN (abbreviated real page number)
> > fields.  As far as possible, the conversion is done when reading or
> > writing the HPT entries, and the rest of the code continues to use
> > the old format.
[snip]
> > @@ -440,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> >  {
> >  	struct kvm *kvm = vcpu->kvm;
> >  	unsigned long hpte[3], r;
> > +	unsigned long hnow_v, hnow_r;
> >  	__be64 *hptep;
> >  	unsigned long mmu_seq, psize, pte_size;
> >  	unsigned long gpa_base, gfn_base;
> > @@ -488,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> >  	unlock_hpte(hptep, hpte[0]);
> >  	preempt_enable();
> >  
> > +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > +		hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
> > +		hpte[1] = hpte_new_to_old_r(hpte[1]);
> > +	}
> 
> I think we can avoid this if we avoid the conversion in kvmppc_hpte_hv_fault().
> If we decide not to do this, then gpa will need to use a new mask to extract
> the correct gpa.

Yes, we could store vcpu->arch.pgfault[] in native format, i.e. new
format on P9.  That might make the code a bit simpler indeed.

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table
  2016-11-19  6:47         ` Aneesh Kumar K.V
@ 2016-11-21  2:14           ` Paul Mackerras
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul Mackerras @ 2016-11-21  2:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: kvm, kvm-ppc, linuxppc-dev

On Sat, Nov 19, 2016 at 12:05:21PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras <paulus@ozlabs.org> writes:
> 
> > On Fri, Nov 18, 2016 at 07:57:30PM +0530, Aneesh Kumar K.V wrote:
> >> Paul Mackerras <paulus@ozlabs.org> writes:
> >>  +
> >> > +	/* Global flush of TLBs and partition table caches for this lpid */
> >> > +	asm volatile("ptesync");
> >> > +	asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : : "r"(0x800), "r" (lpid));
> >> > +	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> >> > +}
> >> 
> >> 
> >> It would be nice to convert that 0x800 to a documented IS value or better use
> >> radix__flush_tlb_pid() ?
> >
> > Well, not radix__flush_tlb_pid - this isn't radix and it isn't a PID
> > flush.  I could use TLBIEL_INVAL_SET_LPID except the name implies it's
> > for tlbiel and this is a tlbie.
> >
> 
> I wrote that wrong, we really don't have tlb_pid() what we have is tlb_lpid().
> 
> void radix__flush_tlb_lpid(unsigned long lpid)
> {
> 	unsigned long rb,rs,prs,r;
> 	unsigned long ric = RIC_FLUSH_ALL;
> 
> 	rb = 0x2 << PPC_BITLSHIFT(53); /* IS = 2 */
> 	rs = lpid & ((1UL << 32) - 1);
> 	prs = 0; /* partition scoped */
> 	r = 1;   /* radix format */
> 
> 	asm volatile("ptesync": : :"memory");
> 	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
> 		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
> 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
> }

That has R=1, I'm using R=0.

Paul.

^ permalink raw reply	[flat|nested] 64+ messages in thread
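
For comparison, a hash-side (R = 0) counterpart to the radix helper
quoted above, following the same structure; the function name is
invented and this only illustrates the R/prs encoding being discussed,
not proposed kernel code:

	/* Sketch only: same tlbie sequence as radix__flush_tlb_lpid(),
	 * but with R = 0 (HPT translation) as used in the patch.
	 */
	static void hash_flush_tlb_lpid_sketch(unsigned long lpid)
	{
		unsigned long rb = 0x2UL << PPC_BITLSHIFT(53);	/* IS = 2 */
		unsigned long rs = lpid & ((1UL << 32) - 1);
		unsigned long prs = 0;	/* partition scoped */
		unsigned long r = 0;	/* HPT (hash) format */
		unsigned long ric = RIC_FLUSH_ALL;

		asm volatile("ptesync" : : : "memory");
		asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
			     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs)
			     : "memory");
		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
	}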

end of thread, newest message: 2016-11-21  2:14 UTC

Thread overview: 64+ messages
2016-11-18  7:28 [PATCH 00/13] KVM: PPC: Support POWER9 guests Paul Mackerras
2016-11-18  7:28 ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 01/13] powerpc/64: Add some more SPRs and SPR bits for POWER9 Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 02/13] powerpc/64: Provide functions for accessing POWER9 partition table Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18 14:27   ` Aneesh Kumar K.V
2016-11-18 14:39     ` Aneesh Kumar K.V
2016-11-19  4:19     ` Paul Mackerras
2016-11-19  4:19       ` Paul Mackerras
2016-11-19  6:35       ` Aneesh Kumar K.V
2016-11-19  6:47         ` Aneesh Kumar K.V
2016-11-21  2:14         ` Paul Mackerras
2016-11-21  2:14           ` Paul Mackerras
2016-11-19  0:45   ` Balbir Singh
2016-11-19  0:45     ` Balbir Singh
2016-11-19  4:23     ` Paul Mackerras
2016-11-19  4:23       ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 03/13] powerpc/powernv: Define real-mode versions of OPAL XICS accessors Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 04/13] KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 05/13] KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9 Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-19  0:38   ` Balbir Singh
2016-11-19  0:38     ` Balbir Singh
2016-11-21  2:02     ` Paul Mackerras
2016-11-21  2:02       ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 06/13] KVM: PPC: Book3S HV: Set partition table rather than SDR1 " Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-19  1:01   ` Balbir Singh
2016-11-19  1:01     ` Balbir Singh
2016-11-18  7:28 ` [PATCH 07/13] KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9 Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18 14:35   ` Aneesh Kumar K.V
2016-11-18 14:47     ` Aneesh Kumar K.V
2016-11-19  4:02     ` Paul Mackerras
2016-11-19  4:02       ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 08/13] KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 09/13] KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18 14:41   ` Aneesh Kumar K.V
2016-11-18 14:53     ` Aneesh Kumar K.V
2016-11-18 21:57     ` Benjamin Herrenschmidt
2016-11-18 21:57       ` Benjamin Herrenschmidt
2016-11-19  4:14       ` Paul Mackerras
2016-11-19  4:14         ` Paul Mackerras
2016-11-19  4:41         ` Benjamin Herrenschmidt
2016-11-19  4:41           ` Benjamin Herrenschmidt
2016-11-19  4:13     ` Paul Mackerras
2016-11-19  4:13       ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 10/13] KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores " Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18 14:47   ` Aneesh Kumar K.V
2016-11-18 14:59     ` Aneesh Kumar K.V
2016-11-19  3:53     ` Paul Mackerras
2016-11-19  3:53       ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 11/13] KVM: PPC: Book3S HV: Use OPAL XICS emulation " Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 12/13] KVM: PPC: Book3S HV: Use stop instruction rather than nap " Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
2016-11-18  7:28 ` [PATCH 13/13] KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores Paul Mackerras
2016-11-18  7:28   ` Paul Mackerras
