* [PATCH 01/18] powerpc/64: Don't try to use radix MMU under a hypervisor
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 16:48 ` Balbir Singh
2017-01-12 9:07 ` [PATCH 02/18] powerpc/64: Fixes for the ibm, client-architecture-support options Paul Mackerras
` (16 subsequent siblings)
17 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it will try to use the radix MMU even though it doesn't
have the necessary code to use radix under a hypervisor (it doesn't
negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL
hcall). The result is that the guest kernel will crash when it tries
to turn on the MMU.
This fixes it by looking for the /chosen/ibm,architecture-vec-5
property and, if it exists, clearing the radix MMU feature bit
before we decide whether to initialize for radix or HPT. This
property is created by the hypervisor as a result of the guest
calling the ibm,client-architecture-support method to indicate
its capabilities, so it will indicate whether the hypervisor
agreed to us using radix.
Systems without a hypervisor may also have this property (for
example, skiboot creates it), so we check the HV bit in the MSR
to see whether we are running as a guest or not. If we are in
hypervisor mode, then we can do whatever we like, including
using the radix MMU.
The reason for using this property is that in future, when we
have support for using radix under a hypervisor, we will need
to check this property to see whether the hypervisor agreed to
us using radix.
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/mm/init_64.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 93abf8a..4d9481e 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -42,6 +42,8 @@
#include <linux/memblock.h>
#include <linux/hugetlb.h>
#include <linux/slab.h>
+#include <linux/of_fdt.h>
+#include <linux/libfdt.h>
#include <asm/pgalloc.h>
#include <asm/page.h>
@@ -344,12 +346,43 @@ static int __init parse_disable_radix(char *p)
}
early_param("disable_radix", parse_disable_radix);
+/*
+ * If we're running under a hypervisor, we currently can't do radix
+ * since we don't have the code to do the H_REGISTER_PROC_TBL hcall.
+ * We tell that we're running under a hypervisor by looking for the
+ * /chosen/ibm,architecture-vec-5 property.
+ */
+static void early_check_vec5(void)
+{
+ unsigned long root, chosen;
+ int size;
+ const u8 *vec5;
+
+ root = of_get_flat_dt_root();
+ chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
+ if (chosen == -FDT_ERR_NOTFOUND)
+ return;
+ vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
+ if (!vec5)
+ return;
+ cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+}
+
void __init mmu_early_init_devtree(void)
{
/* Disable radix mode based on kernel command line. */
if (disable_radix)
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+ /*
+ * Check /chosen/ibm,architecture-vec-5 if running as a guest.
+ * When running bare-metal, we can use radix if we like
+ * even though the ibm,architecture-vec-5 property created by
+ * skiboot doesn't have the necessary bits set.
+ */
+ if (early_radix_enabled() && !(mfmsr() & MSR_HV))
+ early_check_vec5();
+
if (early_radix_enabled())
radix__early_init_devtree();
else
--
2.7.4
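As a rough illustration of the decision patch 01 adds to mmu_early_init_devtree(), the sketch below models it as a pure function. The function name and its boolean parameters are hypothetical stand-ins for MMU_FTR_TYPE_RADIX, the disable_radix command-line flag, MSR_HV and the presence of /chosen/ibm,architecture-vec-5; it is not kernel code.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the radix/HPT choice made by
 * mmu_early_init_devtree() after patch 01 (not kernel code).
 *   cpu_has_radix - MMU_FTR_TYPE_RADIX set in cur_cpu_spec
 *   disable_radix - "disable_radix" given on the command line
 *   msr_hv        - MSR_HV set, i.e. running in hypervisor mode
 *   vec5_present  - /chosen/ibm,architecture-vec-5 exists
 * Returns true if radix would be used, false for HPT.
 */
static bool model_use_radix(bool cpu_has_radix, bool disable_radix,
			    bool msr_hv, bool vec5_present)
{
	bool radix = cpu_has_radix;

	/* The command-line override always wins. */
	if (disable_radix)
		radix = false;

	/*
	 * Only a guest consults vec5: skiboot also creates the property
	 * on bare metal, so its presence alone does not imply a hypervisor.
	 */
	if (radix && !msr_hv && vec5_present)
		radix = false;

	return radix;
}
```

Note that a guest without the property keeps radix in this model, exactly as in the patch: the property's presence is what is taken to mean "a hypervisor negotiated with us".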
* Re: [PATCH 01/18] powerpc/64: Don't try to use radix MMU under a hypervisor
2017-01-12 9:07 ` [PATCH 01/18] powerpc/64: Don't try to use radix MMU under a hypervisor Paul Mackerras
@ 2017-01-12 16:48 ` Balbir Singh
0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2017-01-12 16:48 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, kvm, kvm-ppc
On Thu, Jan 12, 2017 at 08:07:09PM +1100, Paul Mackerras wrote:
> Currently, if the kernel is running on a POWER9 processor under a
> hypervisor, it will try to use the radix MMU even though it doesn't
> have the necessary code to use radix under a hypervisor (it doesn't
> negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL
Suggest H_REGISTER_PROCESS_TABLE here, for consistency.
> hcall). The result is that the guest kernel will crash when it tries
> to turn on the MMU.
>
> This fixes it by looking for the /chosen/ibm,architecture-vec-5
> property, and if it exists, clears the radix MMU feature bit,
> before we decide whether to initialize for radix or HPT. This
> property is created by the hypervisor as a result of the guest
> calling the ibm,client-architecture-support method to indicate
> its capabilities, so it will indicate whether the hypervisor
> agreed to us using radix.
>
> Systems without a hypervisor may have this property also (for
> example, skiboot creates it), so we check the HV bit in the MSR
> to see whether we are running as a guest or not. If we are in
> hypervisor mode, then we can do whatever we like including
> using the radix MMU.
>
> The reason for using this property is that in future, when we
> have support for using radix under a hypervisor, we will need
> to check this property to see whether the hypervisor agreed to
> us using radix.
>
> Cc: stable@vger.kernel.org # v4.8+
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/mm/init_64.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 93abf8a..4d9481e 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -42,6 +42,8 @@
> #include <linux/memblock.h>
> #include <linux/hugetlb.h>
> #include <linux/slab.h>
> +#include <linux/of_fdt.h>
> +#include <linux/libfdt.h>
>
> #include <asm/pgalloc.h>
> #include <asm/page.h>
> @@ -344,12 +346,43 @@ static int __init parse_disable_radix(char *p)
> }
> early_param("disable_radix", parse_disable_radix);
>
> +/*
> + * If we're running under a hypervisor, we currently can't do radix
> + * since we don't have the code to do the H_REGISTER_PROC_TBL hcall.
Same here: H_REGISTER_PROCESS_TABLE.
> + * We tell that we're running under a hypervisor by looking for the
> + * /chosen/ibm,architecture-vec-5 property.
> + */
> +static void early_check_vec5(void)
> +{
The function is called early_check, but it also disables MMU_FTR_TYPE_RADIX.
Should the disabling be moved out to the caller, since the check itself
has nothing to do with disabling the feature?
> + unsigned long root, chosen;
> + int size;
> + const u8 *vec5;
> +
> + root = of_get_flat_dt_root();
> + chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
> + if (chosen == -FDT_ERR_NOTFOUND)
> + return;
> + vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
> + if (!vec5)
> + return;
> + cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> +}
> +
> void __init mmu_early_init_devtree(void)
> {
> /* Disable radix mode based on kernel command line. */
> if (disable_radix)
> cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
>
> + /*
> + * Check /chosen/ibm,architecture-vec-5 if running as a guest.
> + * When running bare-metal, we can use radix if we like
> + * even though the ibm,architecture-vec-5 property created by
> + * skiboot doesn't have the necessary bits set.
> + */
> + if (early_radix_enabled() && !(mfmsr() & MSR_HV))
> + early_check_vec5();
Balbir Singh.
* [PATCH 02/18] powerpc/64: Fixes for the ibm, client-architecture-support options
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
2017-01-12 9:07 ` [PATCH 01/18] powerpc/64: Don't try to use radix MMU under a hypervisor Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 03/18] powerpc/64: Always enable radix support for 64-bit Book 3S kernels Paul Mackerras
` (15 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This fixes the values of some of the option vector 5 bits in
the ibm,client-architecture-support vector. The "platform
facilities options" bits are in byte 17, not byte 14, so the
upper 8 bits of their definitions need to be 0x11, not 0x0E.
The "sub processor support" option is in byte 21, not byte 15.
When checking whether option bits are set, we should check that
the offset of the byte being checked is less than the vector
length that we got from the hypervisor.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/prom.h | 8 ++++----
arch/powerpc/platforms/pseries/firmware.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 5e57705..e6d83d0 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -151,10 +151,10 @@ struct of_drconf_cell {
#define OV5_XCMO 0x0440 /* Page Coalescing */
#define OV5_TYPE1_AFFINITY 0x0580 /* Type 1 NUMA affinity */
#define OV5_PRRN 0x0540 /* Platform Resource Reassignment */
-#define OV5_PFO_HW_RNG 0x0E80 /* PFO Random Number Generator */
-#define OV5_PFO_HW_842 0x0E40 /* PFO Compression Accelerator */
-#define OV5_PFO_HW_ENCR 0x0E20 /* PFO Encryption Accelerator */
-#define OV5_SUB_PROCESSORS 0x0F01 /* 1,2,or 4 Sub-Processors supported */
+#define OV5_PFO_HW_RNG 0x1180 /* PFO Random Number Generator */
+#define OV5_PFO_HW_842 0x1140 /* PFO Compression Accelerator */
+#define OV5_PFO_HW_ENCR 0x1120 /* PFO Encryption Accelerator */
+#define OV5_SUB_PROCESSORS 0x1501 /* 1,2,or 4 Sub-Processors supported */
/* Option Vector 6: IBM PAPR hints */
#define OV6_LINUX 0x02 /* Linux is our OS */
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index ea7f09b..7d67623 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -126,7 +126,7 @@ static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
index = OV5_INDX(vec5_fw_features_table[i].feature);
feat = OV5_FEAT(vec5_fw_features_table[i].feature);
- if (vec5[index] & feat)
+ if (index < len && (vec5[index] & feat))
powerpc_firmware_features |=
vec5_fw_features_table[i].val;
}
--
2.7.4
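Each OV5_* constant packs a byte index (upper 8 bits) and a bit mask (lower 8 bits) into one value, which is why moving the PFO bits from byte 14 to byte 17 turns 0x0E.. into 0x11... The standalone sketch below assumes helper macros that behave like the kernel's OV5_INDX()/OV5_FEAT() (index = value >> 8, mask = value & 0xff) and shows the bounds-checked lookup that this patch adds to fw_vec5_feature_init(). The helper ov5_has_feature() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to behave like the kernel's OV5_INDX()/OV5_FEAT() helpers. */
#define OV5_INDX(x)	((x) >> 8)	/* byte offset within vector 5 */
#define OV5_FEAT(x)	((x) & 0xff)	/* bit mask within that byte */

/* Corrected values from this patch. */
#define OV5_PFO_HW_RNG		0x1180	/* byte 17 (0x11), bit 0x80 */
#define OV5_SUB_PROCESSORS	0x1501	/* byte 21 (0x15), bit 0x01 */

/*
 * Hypothetical helper: report whether 'feature' is set in a vector-5
 * property of 'len' bytes. Offsets at or beyond the length returned
 * by the hypervisor are treated as "not set" instead of being read.
 */
static bool ov5_has_feature(const uint8_t *vec5, size_t len,
			    unsigned int feature)
{
	size_t index = OV5_INDX(feature);

	return index < len && (vec5[index] & OV5_FEAT(feature));
}
```

For example, with an 18-byte vector, a query for OV5_SUB_PROCESSORS (byte 21) returns false without touching memory past the end of the buffer.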
* [PATCH 03/18] powerpc/64: Always enable radix support for 64-bit Book 3S kernels
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
2017-01-12 9:07 ` [PATCH 01/18] powerpc/64: Don't try to use radix MMU under a hypervisor Paul Mackerras
2017-01-12 9:07 ` [PATCH 02/18] powerpc/64: Fixes for the ibm, client-architecture-support options Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 04/18] powerpc/64: Enable use of radix MMU under hypervisor on POWER9 Paul Mackerras
` (14 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This removes the ability for the user to choose whether or not to
include support for the radix MMU in kernels built to run on 64-bit
Book 3S machines. Excluding radix support saves only about 25kiB
of text and 13kiB of data, a total of a little over half a 64kiB page.
Having the option expands the space of option combinations that
need to be tested, which is an ongoing burden on developers.
Given that the space savings are small, let's remove the option.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/platforms/Kconfig.cputype | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 6e89e5a..de88156 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -334,13 +334,8 @@ config PPC_STD_MMU_64
depends on PPC_STD_MMU && PPC64
config PPC_RADIX_MMU
- bool "Radix MMU Support"
+ def_bool y
depends on PPC_BOOK3S_64
- default y
- help
- Enable support for the Power ISA 3.0 Radix style MMU. Currently this
- is only implemented by IBM Power9 CPUs, if you don't have one of them
- you can probably disable this.
config PPC_MMU_NOHASH
def_bool y
--
2.7.4
* [PATCH 04/18] powerpc/64: Enable use of radix MMU under hypervisor on POWER9
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (2 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 03/18] powerpc/64: Always enable radix support for 64-bit Book 3S kernels Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-22 2:17 ` kbuild test robot
2017-01-12 9:07 ` [PATCH 05/18] powerpc/64: More definitions for POWER9 Paul Mackerras
` (13 subsequent siblings)
17 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture-support call that we support POWER9 and
architecture v3.00, that we can do either radix or hash, and
that we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).
Then we need to check whether the hypervisor agreed to us using
radix. We need to do this very early on in the kernel boot process
before any of the MMU initialization is done. If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.
Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/book3s/64/mmu.h | 2 ++
arch/powerpc/include/asm/hvcall.h | 11 +++++++++++
arch/powerpc/include/asm/prom.h | 9 +++++++++
arch/powerpc/kernel/prom_init.c | 18 +++++++++++++++++-
arch/powerpc/mm/init_64.c | 12 +++++++-----
arch/powerpc/mm/pgtable-radix.c | 2 ++
arch/powerpc/platforms/pseries/lpar.c | 29 +++++++++++++++++++++++++++++
7 files changed, 77 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 8afb0e0..e8cbdc0 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -138,5 +138,7 @@ static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
extern int (*register_process_table)(unsigned long base, unsigned long page_size,
unsigned long tbl_size);
+extern void radix_init_pseries(void);
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 77ff1ba..54d11b3 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -276,6 +276,7 @@
#define H_GET_MPP_X 0x314
#define H_SET_MODE 0x31C
#define H_CLEAR_HPT 0x358
+#define H_REGISTER_PROC_TBL 0x37C
#define H_SIGNAL_SYS_RESET 0x380
#define MAX_HCALL_OPCODE H_SIGNAL_SYS_RESET
@@ -313,6 +314,16 @@
#define H_SIGNAL_SYS_RESET_ALL_OTHERS -2
/* >= 0 values are CPU number */
+/* Flag values used in H_REGISTER_PROC_TBL hcall */
+#define PROC_TABLE_OP_MASK 0x18
+#define PROC_TABLE_DEREG 0x10
+#define PROC_TABLE_NEW 0x18
+#define PROC_TABLE_TYPE_MASK 0x06
+#define PROC_TABLE_HPT_SLB 0x00
+#define PROC_TABLE_HPT_PT 0x02
+#define PROC_TABLE_RADIX 0x04
+#define PROC_TABLE_GTSE 0x01
+
#ifndef __ASSEMBLY__
/**
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index e6d83d0..8af2546 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -121,6 +121,8 @@ struct of_drconf_cell {
#define OV1_PPC_2_06 0x02 /* set if we support PowerPC 2.06 */
#define OV1_PPC_2_07 0x01 /* set if we support PowerPC 2.07 */
+#define OV1_PPC_3_00 0x80 /* set if we support PowerPC 3.00 */
+
/* Option vector 2: Open Firmware options supported */
#define OV2_REAL_MODE 0x20 /* set if we want OF in real mode */
@@ -155,6 +157,13 @@ struct of_drconf_cell {
#define OV5_PFO_HW_842 0x1140 /* PFO Compression Accelerator */
#define OV5_PFO_HW_ENCR 0x1120 /* PFO Encryption Accelerator */
#define OV5_SUB_PROCESSORS 0x1501 /* 1,2,or 4 Sub-Processors supported */
+#define OV5_XIVE_EXPLOIT 0x1701 /* XIVE exploitation supported */
+#define OV5_MMU_RADIX_300 0x1880 /* ISA v3.00 radix MMU supported */
+#define OV5_MMU_HASH_300 0x1840 /* ISA v3.00 hash MMU supported */
+#define OV5_MMU_SEGM_RADIX 0x1820 /* radix mode (no segmentation) */
+#define OV5_MMU_PROC_TBL 0x1810 /* hcall selects SLB or proc table */
+#define OV5_MMU_SLB 0x1800 /* always use SLB */
+#define OV5_MMU_GTSE 0x1808 /* Guest translation shootdown */
/* Option Vector 6: IBM PAPR hints */
#define OV6_LINUX 0x02 /* Linux is our OS */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index ec47a93..358d43f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -649,6 +649,7 @@ static void __init early_cmdline_parse(void)
struct option_vector1 {
u8 byte1;
u8 arch_versions;
+ u8 arch_versions3;
} __packed;
struct option_vector2 {
@@ -691,6 +692,9 @@ struct option_vector5 {
u8 reserved2;
__be16 reserved3;
u8 subprocessors;
+ u8 byte22;
+ u8 intarch;
+ u8 mmu;
} __packed;
struct option_vector6 {
@@ -700,7 +704,7 @@ struct option_vector6 {
} __packed;
struct ibm_arch_vec {
- struct { u32 mask, val; } pvrs[10];
+ struct { u32 mask, val; } pvrs[12];
u8 num_vectors;
@@ -750,6 +754,14 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = {
.val = cpu_to_be32(0x004d0000),
},
{
+ .mask = cpu_to_be32(0xffff0000), /* POWER9 */
+ .val = cpu_to_be32(0x004e0000),
+ },
+ {
+ .mask = cpu_to_be32(0xffffffff), /* all 3.00-compliant */
+ .val = cpu_to_be32(0x0f000005),
+ },
+ {
.mask = cpu_to_be32(0xffffffff), /* all 2.07-compliant */
.val = cpu_to_be32(0x0f000004),
},
@@ -774,6 +786,7 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = {
.byte1 = 0,
.arch_versions = OV1_PPC_2_00 | OV1_PPC_2_01 | OV1_PPC_2_02 | OV1_PPC_2_03 |
OV1_PPC_2_04 | OV1_PPC_2_05 | OV1_PPC_2_06 | OV1_PPC_2_07,
+ .arch_versions3 = OV1_PPC_3_00,
},
.vec2_len = VECTOR_LENGTH(sizeof(struct option_vector2)),
@@ -836,6 +849,9 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = {
.reserved2 = 0,
.reserved3 = 0,
.subprocessors = 1,
+ .intarch = 0,
+ .mmu = OV5_FEAT(OV5_MMU_RADIX_300) | OV5_FEAT(OV5_MMU_HASH_300) |
+ OV5_FEAT(OV5_MMU_PROC_TBL) | OV5_FEAT(OV5_MMU_GTSE),
},
/* option vector 6: IBM PAPR hints */
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 4d9481e..10c9a54 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -347,10 +347,9 @@ static int __init parse_disable_radix(char *p)
early_param("disable_radix", parse_disable_radix);
/*
- * If we're running under a hypervisor, we currently can't do radix
- * since we don't have the code to do the H_REGISTER_PROC_TBL hcall.
- * We tell that we're running under a hypervisor by looking for the
- * /chosen/ibm,architecture-vec-5 property.
+ * If we're running under a hypervisor, we need to check the contents of
+ * /chosen/ibm,architecture-vec-5 to see if the hypervisor is willing to do
+ * radix. If not, we clear the radix feature bit so we fall back to hash.
*/
static void early_check_vec5(void)
{
@@ -365,7 +364,10 @@ static void early_check_vec5(void)
vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
if (!vec5)
return;
- cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+ if (size <= OV5_INDX(OV5_MMU_RADIX_300) ||
+ !(vec5[OV5_INDX(OV5_MMU_RADIX_300)] & OV5_FEAT(OV5_MMU_RADIX_300)))
+ /* Hypervisor doesn't support radix */
+ cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
}
void __init mmu_early_init_devtree(void)
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index cfa53cc..94323c4 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -401,6 +401,8 @@ void __init radix__early_init_mmu(void)
mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
radix_init_partition_table();
radix_init_amor();
+ } else {
+ radix_init_pseries();
}
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 5dc1c3c..364429c 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -609,6 +609,29 @@ static int __init disable_bulk_remove(char *str)
__setup("bulk_remove=", disable_bulk_remove);
+/* Actually only used for radix, so far */
+static int pSeries_lpar_register_process_table(unsigned long base,
+ unsigned long page_size, unsigned long table_size)
+{
+ long rc;
+ unsigned long flags = PROC_TABLE_NEW;
+
+ if (radix_enabled())
+ flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
+ for (;;) {
+ rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
+ page_size, table_size);
+ if (!H_IS_LONG_BUSY(rc))
+ break;
+ mdelay(get_longbusy_msecs(rc));
+ }
+ if (rc != H_SUCCESS) {
+ pr_err("Failed to register process table (rc=%ld)\n", rc);
+ BUG();
+ }
+ return rc;
+}
+
void __init hpte_init_pseries(void)
{
mmu_hash_ops.hpte_invalidate = pSeries_lpar_hpte_invalidate;
@@ -622,6 +645,12 @@ void __init hpte_init_pseries(void)
mmu_hash_ops.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
}
+void radix_init_pseries(void)
+{
+ pr_info("Using radix MMU under hypervisor\n");
+ register_process_table = pSeries_lpar_register_process_table;
+}
+
#ifdef CONFIG_PPC_SMLPAR
#define CMO_FREE_HINT_DEFAULT 1
static int cmo_free_hint_flag = CMO_FREE_HINT_DEFAULT;
--
2.7.4
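To make the bit layouts in patch 04 concrete, the sketch below recomputes two values from the new definitions: the MMU byte (byte 24, 0x18) that prom_init.c advertises in option vector 5, and the flags word that pSeries_lpar_register_process_table() passes to H_REGISTER_PROC_TBL. The constants are copied from the patch, the OV5_INDX()/OV5_FEAT() macros are assumed to split index and mask as in asm/prom.h, and the helper names are hypothetical.

```c
#include <stdint.h>

#define OV5_INDX(x)	((x) >> 8)	/* assumed, as in asm/prom.h */
#define OV5_FEAT(x)	((x) & 0xff)

/* Option vector 5, byte 24: MMU support (values from the patch) */
#define OV5_MMU_RADIX_300	0x1880
#define OV5_MMU_HASH_300	0x1840
#define OV5_MMU_PROC_TBL	0x1810
#define OV5_MMU_GTSE		0x1808

/* H_REGISTER_PROC_TBL flags (values from the patch) */
#define PROC_TABLE_NEW		0x18
#define PROC_TABLE_RADIX	0x04
#define PROC_TABLE_GTSE		0x01

/* Hypothetical: the MMU byte the guest sends to the hypervisor. */
static uint8_t guest_mmu_byte(void)
{
	return OV5_FEAT(OV5_MMU_RADIX_300) | OV5_FEAT(OV5_MMU_HASH_300) |
	       OV5_FEAT(OV5_MMU_PROC_TBL) | OV5_FEAT(OV5_MMU_GTSE);
}

/* Hypothetical: flags used when registering a new process table. */
static unsigned long proc_tbl_flags(int radix)
{
	unsigned long flags = PROC_TABLE_NEW;

	if (radix)
		flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
	return flags;
}
```

Because early_check_vec5() clears the radix feature when size <= OV5_INDX(OV5_MMU_RADIX_300), a hypervisor that agrees to radix must return a vector-5 property of at least 25 bytes with bit 0x80 set in byte 24.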
* [PATCH 05/18] powerpc/64: More definitions for POWER9
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (3 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 04/18] powerpc/64: Enable use of radix MMU under hypervisor on POWER9 Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 06/18] powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM Paul Mackerras
` (12 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/book3s/64/mmu.h | 12 +++++++++++-
arch/powerpc/include/asm/reg.h | 4 ++++
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index e8cbdc0..d827825 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -44,10 +44,20 @@ struct patb_entry {
};
extern struct patb_entry *partition_tb;
+/* Bits in patb0 field */
#define PATB_HR (1UL << 63)
-#define PATB_GR (1UL << 63)
#define RPDB_MASK 0x0ffffffffffff00fUL
#define RPDB_SHIFT (1UL << 8)
+#define RTS1_SHIFT 61 /* top 2 bits of radix tree size */
+#define RTS1_MASK (3UL << RTS1_SHIFT)
+#define RTS2_SHIFT 5 /* bottom 3 bits of radix tree size */
+#define RTS2_MASK (7UL << RTS2_SHIFT)
+#define RPDS_MASK 0x1f /* root page dir. size field */
+
+/* Bits in patb1 field */
+#define PATB_GR (1UL << 63) /* guest uses radix; must match HR */
+#define PRTS_MASK 0x1f /* process table size field */
+
/*
* Limit process table to PAGE_SIZE table. This
* also limit the max pid we can support.
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0d4531a..aa44a83 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -274,10 +274,14 @@
#define SPRN_DSISR 0x012 /* Data Storage Interrupt Status Register */
#define DSISR_NOHPTE 0x40000000 /* no translation found */
#define DSISR_PROTFAULT 0x08000000 /* protection fault */
+#define DSISR_BADACCESS 0x04000000 /* bad access to CI or G */
#define DSISR_ISSTORE 0x02000000 /* access was a store */
#define DSISR_DABRMATCH 0x00400000 /* hit data breakpoint */
#define DSISR_NOSEGMENT 0x00200000 /* SLB miss */
#define DSISR_KEYFAULT 0x00200000 /* Key fault */
+#define DSISR_UNSUPP_MMU 0x00080000 /* Unsupported MMU config */
+#define DSISR_SET_RC 0x00040000 /* Failed setting of R/C bits */
+#define DSISR_PGDIRFAULT 0x00020000 /* Fault on page directory */
#define SPRN_TBRL 0x10C /* Time Base Read Lower Register (user, R/O) */
#define SPRN_TBRU 0x10D /* Time Base Read Upper Register (user, R/O) */
#define SPRN_CIR 0x11B /* Chip Information Register (hyper, R/0) */
--
2.7.4
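The radix tree size (RTS) field in patb0 is split in two: its top 2 bits live at bit 61 (RTS1) and its bottom 3 bits at bit 5 (RTS2); for example, 21 = 0b10101 splits into 0b10 and 0b101. The sketch below packs and unpacks a 5-bit RTS value using the shifts from this patch; the masks are written with UINT64_C so it also builds on 32-bit hosts, and the function names are hypothetical.

```c
#include <stdint.h>

/* Field positions from this patch (bits in patb0) */
#define RTS1_SHIFT	61				/* top 2 bits of RTS */
#define RTS1_MASK	(UINT64_C(3) << RTS1_SHIFT)
#define RTS2_SHIFT	5				/* bottom 3 bits of RTS */
#define RTS2_MASK	(UINT64_C(7) << RTS2_SHIFT)

/* Hypothetical: place a 5-bit RTS value into its two patb0 fields. */
static uint64_t rts_encode(unsigned int rts)
{
	rts &= 0x1f;	/* RTS is a 5-bit quantity */
	return ((uint64_t)(rts >> 3) << RTS1_SHIFT) |
	       ((uint64_t)(rts & 7) << RTS2_SHIFT);
}

/* Hypothetical: reassemble the 5-bit RTS value from patb0. */
static unsigned int rts_decode(uint64_t patb0)
{
	unsigned int hi = (unsigned int)((patb0 & RTS1_MASK) >> RTS1_SHIFT);
	unsigned int lo = (unsigned int)((patb0 & RTS2_MASK) >> RTS2_SHIFT);

	return (hi << 3) | lo;
}
```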
* [PATCH 06/18] powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (4 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 05/18] powerpc/64: More definitions for POWER9 Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 07/18] powerpc/64: Make type of partition table flush depend on partition type Paul Mackerras
` (11 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This exports the pgtable_cache array and the pgtable_cache_add
function so that HV KVM can use them for allocating radix page
tables for guests.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/mm/init-common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index a175cd8..2be5dc2 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -41,6 +41,7 @@ static void pmd_ctor(void *addr)
}
struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
+EXPORT_SYMBOL_GPL(pgtable_cache); /* used by kvm_hv module */
/*
* Create a kmem_cache() for pagetables. This is not used for PTE
@@ -82,7 +83,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
pgtable_cache[shift - 1] = new;
pr_debug("Allocated pgtable cache for order %d\n", shift);
}
-
+EXPORT_SYMBOL_GPL(pgtable_cache_add); /* used by kvm_hv module */
void pgtable_cache_init(void)
{
--
2.7.4
* [PATCH 07/18] powerpc/64: Make type of partition table flush depend on partition type
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (5 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 06/18] powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 08/18] KVM: PPC: Book3S HV: Don't try to signal cpu -1 Paul Mackerras
` (10 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
When changing a partition table entry on POWER9, we do a particular
form of the tlbie instruction which flushes all TLBs and caches of
the partition table for a given logical partition ID (LPID).
This instruction has a field in the instruction word, labelled R
(radix), which should be 1 if the partition was previously a radix
partition and 0 if it was an HPT partition. This implements that
logic.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/mm/pgtable_64.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 8bca7f5..d6b5e5c 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -454,13 +454,23 @@ void __init mmu_partition_table_init(void)
void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
unsigned long dw1)
{
+ unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
+
partition_tb[lpid].patb0 = cpu_to_be64(dw0);
partition_tb[lpid].patb1 = cpu_to_be64(dw1);
- /* Global flush of TLBs and partition table caches for this lpid */
+ /*
+ * Global flush of TLBs and partition table caches for this lpid.
+ * The type of flush (hash or radix) depends on what the previous
+ * use of this partition ID was, not the new use.
+ */
asm volatile("ptesync" : : : "memory");
- asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
- "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+ if (old & PATB_HR)
+ asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
+ "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+ else
+ asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
+ "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
}
EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
--
2.7.4
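The key point of patch 07 is that the tlbie R field is chosen from the entry's previous contents, so the old patb0 must be sampled before it is overwritten. The host-side sketch below models only that ordering with plain integers in CPU byte order; the struct and function names are hypothetical, and the ptesync/tlbie/tlbsync sequence itself is omitted since it only exists on POWER hardware.

```c
#include <stdint.h>

#define PATB_HR	(UINT64_C(1) << 63)	/* HR bit in patb0 */

/* Hypothetical stand-in for struct patb_entry (CPU byte order). */
struct model_patb_entry {
	uint64_t patb0;
	uint64_t patb1;
};

/*
 * Hypothetical model of mmu_partition_table_set_entry(): update the
 * entry and return the R value the flush would use (1 = radix, 0 = HPT).
 * R describes how the LPID was used before this update, so the old
 * patb0 is read first.
 */
static int model_set_entry(struct model_patb_entry *e,
			   uint64_t dw0, uint64_t dw1)
{
	uint64_t old = e->patb0;

	e->patb0 = dw0;
	e->patb1 = dw1;
	return (old & PATB_HR) ? 1 : 0;
}
```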
* [PATCH 08/18] KVM: PPC: Book3S HV: Don't try to signal cpu -1
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (6 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 07/18] powerpc/64: Make type of partition table flush depend on partition type Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 09/18] KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU Paul Mackerras
` (9 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
If the target vcpu for kvmppc_fast_vcpu_kick_hv() is not running on
any CPU, then we will have vcpu->arch.thread_cpu == -1, and as it
happens, kvmppc_fast_vcpu_kick_hv will call kvmppc_ipi_thread with
-1 as the cpu argument. Although this is not meaningful, in the past,
before commit 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs
to other cores on POWER9", 2016-11-18), it was harmless because CPU
-1 is not in the same core as any real CPU thread. On a POWER9,
however, we don't do the "same core" check, so we were trying to
do a msgsnd to thread -1, which is invalid. To avoid this, we add
a check to see that vcpu->arch.thread_cpu is >= 0 before calling
kvmppc_ipi_thread() with it. Since vcpu->arch.thread_cpu can change
asynchronously, we use READ_ONCE to ensure that the value we check is
the same value that we use as the argument to kvmppc_ipi_thread().
Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ec34e39..8d9cc07 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -182,7 +182,8 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
- if (kvmppc_ipi_thread(vcpu->arch.thread_cpu))
+ cpu = READ_ONCE(vcpu->arch.thread_cpu);
+ if (cpu >= 0 && kvmppc_ipi_thread(cpu))
return;
/* CPU points to the first thread of the core */
--
2.7.4
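The fix in patch 08 relies on reading vcpu->arch.thread_cpu exactly once, so the value that passes the >= 0 check is the same value used as the IPI target. A rough userspace analogue with C11 atomics, where a relaxed atomic_load_explicit() stands in for READ_ONCE(), is sketched below; the struct, the record_ipi() stub and the function names are hypothetical, not KVM's.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical vCPU model: thread_cpu is -1 when not running anywhere. */
struct model_vcpu {
	_Atomic int thread_cpu;
};

/* Hypothetical IPI stub that just records its target. */
static int last_ipi_cpu = -1;

static bool record_ipi(int cpu)
{
	last_ipi_cpu = cpu;
	return true;
}

/*
 * Hypothetical analogue of the fixed kick path: the CPU number is
 * loaded once into a local, so a concurrent change to -1 after the
 * check cannot turn into an IPI aimed at CPU -1.
 */
static bool model_kick(struct model_vcpu *vcpu, bool (*send_ipi)(int))
{
	int cpu = atomic_load_explicit(&vcpu->thread_cpu,
				       memory_order_relaxed);

	return cpu >= 0 && send_ipi(cpu);
}
```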
* [PATCH 09/18] KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (7 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 08/18] KVM: PPC: Book3S HV: Don't try to signal cpu -1 Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 Paul Mackerras
` (8 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
Documentation/virtual/kvm/api.txt | 83 +++++++++++++++++++++++++++++++++++++
arch/powerpc/include/asm/kvm_ppc.h | 2 +
arch/powerpc/include/uapi/asm/kvm.h | 20 +++++++++
arch/powerpc/kvm/book3s_hv.c | 13 ++++++
arch/powerpc/kvm/powerpc.c | 32 ++++++++++++++
include/uapi/linux/kvm.h | 6 +++
6 files changed, 156 insertions(+)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 03145b7..4470671 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3201,6 +3201,71 @@ struct kvm_reinject_control {
pit_reinject = 0 (!reinject mode) is recommended, unless running an old
operating system that uses the PIT for timing (e.g. Linux 2.4.x).
+4.99 KVM_PPC_CONFIGURE_V3_MMU
+
+Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_ppc_mmuv3_cfg (in)
+Returns: 0 on success,
+ -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
+ -EINVAL if the configuration is invalid
+
+This ioctl controls whether the guest will use radix or HPT (hashed
+page table) translation, and sets the pointer to the process table for
+the guest.
+
+struct kvm_ppc_mmuv3_cfg {
+ __u64 flags;
+ __u64 process_table;
+};
+
+There are two bits that can be set in flags: KVM_PPC_MMUV3_RADIX and
+KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
+to use radix tree translation, and if clear, to use HPT translation.
+KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
+to be able to use the global TLB and SLB invalidation instructions;
+if clear, the guest may not use these instructions.
+
+The process_table field specifies the address and size of the guest
+process table, which is in the guest's space. This field is formatted
+as the second doubleword of the partition table entry, as defined in
+the Power ISA V3.00, Book III section 5.7.6.1.
+
+4.100 KVM_PPC_GET_RMMU_INFO
+
+Capability: KVM_CAP_PPC_RADIX_MMU
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_ppc_rmmu_info (out)
+Returns: 0 on success,
+ -EFAULT if struct kvm_ppc_rmmu_info cannot be written,
+ -EINVAL if no useful information can be returned
+
+This ioctl returns a structure containing two things: (a) a list
+containing supported radix tree geometries, and (b) a list that maps
+page sizes to the values for the "AP" (actual page size) field of the tlbie
+(TLB invalidate entry) instruction.
+
+struct kvm_ppc_rmmu_info {
+ struct kvm_ppc_radix_geom {
+ __u8 page_shift;
+ __u8 level_bits[4];
+ __u8 pad[3];
+ } geometries[8];
+ __u32 ap_encodings[8];
+};
+
+The geometries[] field gives up to 8 supported geometries for the
+radix page table, in terms of the log base 2 of the smallest page
+size, and the number of bits indexed at each level of the tree, from
+the PTE level up to the PGD level in that order. Any unused entries
+will have 0 in the page_shift field.
+
+The ap_encodings[] field gives the supported page sizes and their AP field
+encodings, encoded with the AP value in the top 3 bits and the log
+base 2 of the page size in the bottom 6 bits.
+
5. The kvm_run structure
------------------------
@@ -3942,3 +4007,21 @@ In order to use SynIC, it has to be activated by setting this
capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
will disable the use of APIC hardware virtualization even if supported
by the CPU, as it's incompatible with SynIC auto-EOI behavior.
+
+8.3 KVM_CAP_PPC_RADIX_MMU
+
+Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel can support guests using the
+radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
+processor).
+
+8.4 KVM_CAP_PPC_HASH_MMU_V3
+
+Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel can support guests using the
+hashed page table MMU defined in Power ISA V3.00 (as implemented in
+the POWER9 processor), including in-memory segment tables.
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2da67bf..48c760f 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -291,6 +291,8 @@ struct kvmppc_ops {
struct irq_bypass_producer *);
void (*irq_bypass_del_producer)(struct irq_bypass_consumer *,
struct irq_bypass_producer *);
+ int (*configure_mmu)(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg);
+ int (*get_rmmu_info)(struct kvm *kvm, struct kvm_ppc_rmmu_info *info);
};
extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 3603b6f..cc0908b 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -413,6 +413,26 @@ struct kvm_get_htab_header {
__u16 n_invalid;
};
+/* For KVM_PPC_CONFIGURE_V3_MMU */
+struct kvm_ppc_mmuv3_cfg {
+ __u64 flags;
+ __u64 process_table; /* second doubleword of partition table entry */
+};
+
+/* Flag values for KVM_PPC_CONFIGURE_V3_MMU */
+#define KVM_PPC_MMUV3_RADIX 1 /* 1 = radix mode, 0 = HPT */
+#define KVM_PPC_MMUV3_GTSE 2 /* global translation shootdown enb. */
+
+/* For KVM_PPC_GET_RMMU_INFO */
+struct kvm_ppc_rmmu_info {
+ struct kvm_ppc_radix_geom {
+ __u8 page_shift;
+ __u8 level_bits[4];
+ __u8 pad[3];
+ } geometries[8];
+ __u32 ap_encodings[8];
+};
+
/* Per-vcpu XICS interrupt controller state */
#define KVM_REG_PPC_ICP_STATE (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8c)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8d9cc07..1736f87 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3658,6 +3658,17 @@ static void init_default_hcalls(void)
}
}
+/* dummy implementations for now */
+static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
+{
+ return -EINVAL;
+}
+
+static int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info)
+{
+ return -EINVAL;
+}
+
static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -3695,6 +3706,8 @@ static struct kvmppc_ops kvm_ops_hv = {
.irq_bypass_add_producer = kvmppc_irq_bypass_add_producer_hv,
.irq_bypass_del_producer = kvmppc_irq_bypass_del_producer_hv,
#endif
+ .configure_mmu = kvmhv_configure_mmu,
+ .get_rmmu_info = kvmhv_get_rmmu_info,
};
static int kvm_init_subcore_bitmap(void)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index cd892de..38c0d15 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -565,6 +565,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PPC_HWRNG:
r = kvmppc_hwrng_present();
break;
+ case KVM_CAP_PPC_MMU_RADIX:
+ r = !!(0 && hv_enabled && radix_enabled());
+ break;
+ case KVM_CAP_PPC_MMU_HASH_V3:
+ r = !!(0 && hv_enabled && !radix_enabled() &&
+ cpu_has_feature(CPU_FTR_ARCH_300));
+ break;
#endif
case KVM_CAP_SYNC_MMU:
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -1468,6 +1475,31 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_rtas_define_token(kvm, argp);
break;
}
+ case KVM_PPC_CONFIGURE_V3_MMU: {
+ struct kvm *kvm = filp->private_data;
+ struct kvm_ppc_mmuv3_cfg cfg;
+
+ r = -EINVAL;
+ if (!kvm->arch.kvm_ops->configure_mmu)
+ goto out;
+ r = -EFAULT;
+ if (copy_from_user(&cfg, argp, sizeof(cfg)))
+ goto out;
+ r = kvm->arch.kvm_ops->configure_mmu(kvm, &cfg);
+ break;
+ }
+ case KVM_PPC_GET_RMMU_INFO: {
+ struct kvm *kvm = filp->private_data;
+ struct kvm_ppc_rmmu_info info;
+
+ r = -EINVAL;
+ if (!kvm->arch.kvm_ops->get_rmmu_info)
+ goto out;
+ r = kvm->arch.kvm_ops->get_rmmu_info(kvm, &info);
+ if (r >= 0 && copy_to_user(argp, &info, sizeof(info)))
+ r = -EFAULT;
+ break;
+ }
default: {
struct kvm *kvm = filp->private_data;
r = kvm->arch.kvm_ops->arch_vm_ioctl(filp, ioctl, arg);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cac48ed..e003580 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -871,6 +871,8 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_S390_USER_INSTR0 130
#define KVM_CAP_MSI_DEVID 131
#define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_PPC_MMU_RADIX 134
+#define KVM_CAP_PPC_MMU_HASH_V3 135
#ifdef KVM_CAP_IRQ_ROUTING
@@ -1187,6 +1189,10 @@ struct kvm_s390_ucas_mapping {
#define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
/* Available with KVM_CAP_PPC_RTAS */
#define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
+/* Available with KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3 */
+#define KVM_PPC_CONFIGURE_V3_MMU _IOW(KVMIO, 0xaf, struct kvm_ppc_mmuv3_cfg)
+/* Available with KVM_CAP_PPC_RADIX_MMU */
+#define KVM_PPC_GET_RMMU_INFO _IOW(KVMIO, 0xb0, struct kvm_ppc_rmmu_info)
/* ioctl for vm fd */
#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
--
2.7.4
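As an illustration of the KVM_PPC_GET_RMMU_INFO layout documented in this patch, the helpers below decode an ap_encodings[] entry (AP value in the top 3 bits, log base 2 of the page size in the bottom 6 bits) and add up the effective-address bits a geometry covers. All names are hypothetical, struct radix_geom_sketch only mirrors the documented struct kvm_ppc_radix_geom, and the AP values in the usage note are illustrative, not values KVM is known to report.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the documented struct kvm_ppc_radix_geom layout. */
struct radix_geom_sketch {
	uint8_t page_shift;	/* 0 marks an unused entry */
	uint8_t level_bits[4];	/* PTE level first, PGD level last */
	uint8_t pad[3];
};

/* AP value: the top 3 bits of a 32-bit encoding. */
static unsigned int ap_field(uint32_t enc)
{
	return enc >> 29;
}

/* log2 of the page size: the bottom 6 bits. */
static unsigned int ap_page_shift(uint32_t enc)
{
	return enc & 0x3f;
}

/* Build an encoding from an AP value and a page shift. */
static uint32_t ap_encode(unsigned int ap, unsigned int shift)
{
	assert(ap < 8 && shift < 64);
	return ((uint32_t)ap << 29) | shift;
}

/* Effective-address bits translated by one geometry. */
static unsigned int geom_ea_bits(const struct radix_geom_sketch *g)
{
	unsigned int total = g->page_shift;

	for (int i = 0; i < 4; i++)
		total += g->level_bits[i];
	return total;
}

/* AP value for a page shift, or -1 if that size is not listed. */
static int ap_for_shift(const uint32_t enc[8], unsigned int shift)
{
	for (int i = 0; i < 8; i++)
		if (ap_page_shift(enc[i]) == shift)
			return (int)ap_field(enc[i]);
	return -1;
}
```

For example, a geometry with page_shift 12 and level_bits {9, 9, 9, 13} covers 12 + 40 = 52 bits, as does page_shift 16 with {5, 9, 9, 13}; these match the p9_supported_radix_bits table added in patch 11.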
* [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (8 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 09/18] KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-23 2:39 ` Suraj Jitindar Singh
2017-01-12 9:07 ` [PATCH 11/18] KVM: PPC: Book3S HV: Add basic infrastructure for radix guests Paul Mackerras
` (7 subsequent siblings)
17 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
for HPT guests on POWER9. With this, we can return 1 for the
KVM_CAP_PPC_MMU_HASH_V3 capability.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 35 +++++++++++++++++++++++++++++++----
arch/powerpc/kvm/powerpc.c | 2 +-
3 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e59b172..944532d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -264,6 +264,7 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+ u64 process_table;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1736f87..6bd0f4a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3092,8 +3092,8 @@ static void kvmppc_setup_partition_table(struct kvm *kvm)
/* HTABSIZE and HTABORG fields */
dw0 |= kvm->arch.sdr1;
- /* Second dword has GR=0; other fields are unused since UPRT=0 */
- dw1 = 0;
+ /* Second dword as set by userspace */
+ dw1 = kvm->arch.process_table;
mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1);
}
@@ -3658,10 +3658,37 @@ static void init_default_hcalls(void)
}
}
-/* dummy implementations for now */
static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
{
- return -EINVAL;
+ unsigned long lpcr;
+
+ /* If not on a POWER9, reject it */
+ if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ return -ENODEV;
+
+ /* If any unknown flags set, reject it */
+ if (cfg->flags & ~(KVM_PPC_MMUV3_RADIX | KVM_PPC_MMUV3_GTSE))
+ return -EINVAL;
+
+ /* We can't do radix yet */
+ if (cfg->flags & KVM_PPC_MMUV3_RADIX)
+ return -EINVAL;
+
+ /* GR (guest radix) bit in process_table field must match */
+ if (cfg->process_table & PATB_GR)
+ return -EINVAL;
+
+ /* Process table size field must be reasonable, i.e. <= 24 */
+ if ((cfg->process_table & PRTS_MASK) > 24)
+ return -EINVAL;
+
+ kvm->arch.process_table = cfg->process_table;
+ kvmppc_setup_partition_table(kvm);
+
+ lpcr = (cfg->flags & KVM_PPC_MMUV3_GTSE) ? LPCR_GTSE : 0;
+ kvmppc_update_lpcr(kvm, lpcr, LPCR_GTSE);
+
+ return 0;
}
static int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 38c0d15..1476a48 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -569,7 +569,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !!(0 && hv_enabled && radix_enabled());
break;
case KVM_CAP_PPC_MMU_HASH_V3:
- r = !!(0 && hv_enabled && !radix_enabled() &&
+ r = !!(hv_enabled && !radix_enabled() &&
cpu_has_feature(CPU_FTR_ARCH_300));
break;
#endif
--
2.7.4
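The checks that kvmhv_configure_mmu() applies in this patch can be mirrored in userspace when building a struct kvm_ppc_mmuv3_cfg for an HPT guest. This is a sketch under stated assumptions: the SK_* constants are assumed to match PATB_GR, PRTB_MASK and PRTS_MASK from the powerpc headers (GR as the most significant bit, PRTS as the low 5 bits encoding a table size of 2^(PRTS+12) bytes, the same formula the radix translate code in patch 11 uses), the helper names are hypothetical, and the POWER9 (CPU_FTR_ARCH_300) check is omitted because it depends on kernel CPU feature bits.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the kernel/uapi definitions; not taken from headers. */
#define SK_PATB_GR	(1ULL << 63)		/* guest radix bit */
#define SK_PRTB_MASK	0x0ffffffffffff000ULL	/* process table base */
#define SK_PRTS_MASK	0x1fULL			/* process table size */
#define SK_MMUV3_RADIX	1ULL			/* KVM_PPC_MMUV3_RADIX */
#define SK_MMUV3_GTSE	2ULL			/* KVM_PPC_MMUV3_GTSE */

/* Mirror of struct kvm_ppc_mmuv3_cfg. */
struct mmuv3_cfg_sketch {
	uint64_t flags;
	uint64_t process_table;
};

/* Encode the second partition-table doubleword for a guest. */
static uint64_t make_process_table_dw(uint64_t base, unsigned int size_log2,
				      int radix)
{
	assert(size_log2 >= 12 && size_log2 - 12 <= SK_PRTS_MASK);
	return (radix ? SK_PATB_GR : 0) | (base & SK_PRTB_MASK) |
	       (size_log2 - 12);
}

/* Same order of checks as kvmhv_configure_mmu() in this patch. */
static int check_cfg_hpt_sketch(const struct mmuv3_cfg_sketch *cfg)
{
	if (cfg->flags & ~(SK_MMUV3_RADIX | SK_MMUV3_GTSE))
		return -EINVAL;		/* unknown flag bits */
	if (cfg->flags & SK_MMUV3_RADIX)
		return -EINVAL;		/* radix not supported yet */
	if (cfg->process_table & SK_PATB_GR)
		return -EINVAL;		/* GR must be 0 for an HPT guest */
	if ((cfg->process_table & SK_PRTS_MASK) > 24)
		return -EINVAL;		/* size field out of range */
	return 0;
}
```

For instance, a 64 KiB process table at guest address 0x100000 gives a PRTS value of 4, so the doubleword is 0x100004; setting GTSE in flags is accepted, while setting KVM_PPC_MMUV3_RADIX is still rejected at this point in the series.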
* Re: [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
2017-01-12 9:07 ` [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 Paul Mackerras
@ 2017-01-23 2:39 ` Suraj Jitindar Singh
2017-01-23 4:37 ` Paul Mackerras
0 siblings, 1 reply; 27+ messages in thread
From: Suraj Jitindar Singh @ 2017-01-23 2:39 UTC
To: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote:
> This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
> for HPT guests on POWER9. With this, we can return 1 for the
> KVM_CAP_PPC_MMU_HASH_V3 capability.
> [...]
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 38c0d15..1476a48 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -569,7 +569,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = !!(0 && hv_enabled && radix_enabled());
> break;
> case KVM_CAP_PPC_MMU_HASH_V3:
> - r = !!(0 && hv_enabled && !radix_enabled() &&
> + r = !!(hv_enabled && !radix_enabled() &&
Just because we have radix enabled, is it correct to preclude a hash
guest from running? Isn't it the case that we may have support for
radix but a guest chooses to run in hash mode (for whatever reason)?
> cpu_has_feature(CPU_FTR_ARCH_300));
> break;
> #endif
* Re: [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
2017-01-23 2:39 ` Suraj Jitindar Singh
@ 2017-01-23 4:37 ` Paul Mackerras
0 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-23 4:37 UTC
To: Suraj Jitindar Singh; +Cc: linuxppc-dev, kvm, kvm-ppc
On Mon, Jan 23, 2017 at 01:39:27PM +1100, Suraj Jitindar Singh wrote:
> On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote:
> > This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
> > for HPT guests on POWER9. With this, we can return 1 for the
> > KVM_CAP_PPC_MMU_HASH_V3 capability.
> > [...]
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 38c0d15..1476a48 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -569,7 +569,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > r = !!(0 && hv_enabled && radix_enabled());
> > break;
> > case KVM_CAP_PPC_MMU_HASH_V3:
> > - r = !!(0 && hv_enabled && !radix_enabled() &&
> > + r = !!(hv_enabled && !radix_enabled() &&
> Just because we have radix enabled, is it correct to preclude a hash
> guest from running? Isn't it the case that we may have support for
> radix but a guest chooses to run in hash mode (for whatever reason)?
At the moment it is correct, because we don't (yet) have the code to
switch to hash mode when entering a guest and switch back to radix
mode on exit.
Paul.
* [PATCH 11/18] KVM: PPC: Book3S HV: Add basic infrastructure for radix guests
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (9 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 12/18] KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle " Paul Mackerras
` (6 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
This adds a field in struct kvm_arch and an inline helper to
indicate whether a guest is a radix guest or not, plus a new file
to contain the radix MMU code, which currently contains just a
translate function which knows how to traverse the guest page
tables to translate an address.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 3 +
arch/powerpc/include/asm/kvm_book3s_64.h | 6 ++
arch/powerpc/include/asm/kvm_host.h | 2 +
arch/powerpc/kvm/Makefile | 3 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 ++-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 139 +++++++++++++++++++++++++++++++
6 files changed, 160 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/kvm/book3s_64_mmu_radix.c
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5cf306a..7adfcc0 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -182,6 +182,9 @@ extern void kvmppc_mmu_hpte_sysexit(void);
extern int kvmppc_mmu_hv_init(void);
extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc);
+extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+ struct kvmppc_pte *gpte, bool data, bool iswrite);
+
/* XXX remove this export when load_last_inst() is generic */
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8482921..0db010c 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -36,6 +36,12 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
#endif
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+
+static inline bool kvm_is_radix(struct kvm *kvm)
+{
+ return kvm->arch.radix;
+}
+
#define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */
#endif
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 944532d..fb73518 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -264,6 +264,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+ u8 radix;
+ pgd_t *pgtable;
u64 process_table;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 7dd89b7..b87ccde 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -70,7 +70,8 @@ endif
kvm-hv-y += \
book3s_hv.o \
book3s_hv_interrupts.o \
- book3s_64_mmu_hv.o
+ book3s_64_mmu_hv.o \
+ book3s_64_mmu_radix.o
kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
book3s_hv_rm_xics.o
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b795dd1..c208bf3 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -119,6 +119,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
long err = -EBUSY;
long order;
+ if (kvm_is_radix(kvm))
+ return -EINVAL;
+
mutex_lock(&kvm->lock);
if (kvm->arch.hpte_setup_done) {
kvm->arch.hpte_setup_done = 0;
@@ -157,7 +160,7 @@ void kvmppc_free_hpt(struct kvm *kvm)
if (kvm->arch.hpt_cma_alloc)
kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
1 << (kvm->arch.hpt_order - PAGE_SHIFT));
- else
+ else if (kvm->arch.hpt_virt)
free_pages(kvm->arch.hpt_virt,
kvm->arch.hpt_order - PAGE_SHIFT);
}
@@ -1675,7 +1678,10 @@ void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
vcpu->arch.slb_nr = 32; /* POWER7/POWER8 */
- mmu->xlate = kvmppc_mmu_book3s_64_hv_xlate;
+ if (kvm_is_radix(vcpu->kvm))
+ mmu->xlate = kvmppc_mmu_radix_xlate;
+ else
+ mmu->xlate = kvmppc_mmu_book3s_64_hv_xlate;
mmu->reset_msr = kvmppc_mmu_book3s_64_hv_reset_msr;
vcpu->arch.hflags |= BOOK3S_HFLAG_SLB;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
new file mode 100644
index 0000000..9091407
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -0,0 +1,139 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright 2016 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/page.h>
+#include <asm/mmu.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+
+/*
+ * Supported radix tree geometry.
+ * Like p9, we support either 5 or 9 bits at the first (lowest) level,
+ * for a page size of 64k or 4k.
+ */
+static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 };
+
+int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+ struct kvmppc_pte *gpte, bool data, bool iswrite)
+{
+ struct kvm *kvm = vcpu->kvm;
+ u32 pid;
+ int ret, level, ps;
+ __be64 prte, rpte;
+ unsigned long root, pte, index;
+ unsigned long rts, bits, offset;
+ unsigned long gpa;
+ unsigned long proc_tbl_size;
+
+ /* Work out effective PID */
+ switch (eaddr >> 62) {
+ case 0:
+ pid = vcpu->arch.pid;
+ break;
+ case 3:
+ pid = 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+ proc_tbl_size = 1 << ((kvm->arch.process_table & PRTS_MASK) + 12);
+ if (pid * 16 >= proc_tbl_size)
+ return -EINVAL;
+
+	/* Read process table to find root of tree for effective PID */
+ ret = kvm_read_guest(kvm, kvm->arch.process_table + pid * 16,
+ &prte, sizeof(prte));
+ if (ret)
+ return ret;
+
+ root = be64_to_cpu(prte);
+ rts = ((root & RTS1_MASK) >> (RTS1_SHIFT - 3)) |
+ ((root & RTS2_MASK) >> RTS2_SHIFT);
+ bits = root & RPDS_MASK;
+ root = root & RPDB_MASK;
+
+ /* P9 DD1 interprets RTS (radix tree size) differently */
+ offset = rts + 31;
+ if (cpu_has_feature(CPU_FTR_POWER9_DD1))
+ offset -= 3;
+
+ /* current implementations only support 52-bit space */
+ if (offset != 52)
+ return -EINVAL;
+
+ for (level = 3; level >= 0; --level) {
+ if (level && bits != p9_supported_radix_bits[level])
+ return -EINVAL;
+ if (level == 0 && !(bits == 5 || bits == 9))
+ return -EINVAL;
+ offset -= bits;
+ index = (eaddr >> offset) & ((1UL << bits) - 1);
+ /* check that low bits of page table base are zero */
+ if (root & ((1UL << (bits + 3)) - 1))
+ return -EINVAL;
+ ret = kvm_read_guest(kvm, root + index * 8,
+ &rpte, sizeof(rpte));
+ if (ret)
+ return ret;
+ pte = __be64_to_cpu(rpte);
+ if (!(pte & _PAGE_PRESENT))
+ return -ENOENT;
+ if (pte & _PAGE_PTE)
+ break;
+ bits = pte & 0x1f;
+ root = pte & 0x0fffffffffffff00ul;
+ }
+ /* need a leaf at lowest level; 512GB pages not supported */
+ if (level < 0 || level == 3)
+ return -EINVAL;
+
+ /* offset is now log base 2 of the page size */
+ gpa = pte & 0x01fffffffffff000ul;
+ if (gpa & ((1ul << offset) - 1))
+ return -EINVAL;
+ gpa += eaddr & ((1ul << offset) - 1);
+ for (ps = MMU_PAGE_4K; ps < MMU_PAGE_COUNT; ++ps)
+ if (offset == mmu_psize_defs[ps].shift)
+ break;
+ gpte->page_size = ps;
+
+ gpte->eaddr = eaddr;
+ gpte->raddr = gpa;
+
+ /* Work out permissions */
+ gpte->may_read = !!(pte & _PAGE_READ);
+ gpte->may_write = !!(pte & _PAGE_WRITE);
+ gpte->may_execute = !!(pte & _PAGE_EXEC);
+ if (kvmppc_get_msr(vcpu) & MSR_PR) {
+ if (pte & _PAGE_PRIVILEGED) {
+ gpte->may_read = 0;
+ gpte->may_write = 0;
+ gpte->may_execute = 0;
+ }
+ } else {
+ if (!(pte & _PAGE_PRIVILEGED)) {
+ /* Check AMR/IAMR to see if strict mode is in force */
+ if (vcpu->arch.amr & (1ul << 62))
+ gpte->may_read = 0;
+ if (vcpu->arch.amr & (1ul << 63))
+ gpte->may_write = 0;
+ if (vcpu->arch.iamr & (1ul << 62))
+ gpte->may_execute = 0;
+ }
+ }
+
+ return 0;
+}
+
--
2.7.4
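The address arithmetic in kvmppc_mmu_radix_xlate() can be isolated into a few pure functions. This sketch assumes a 52-bit address space (RTS + 31 = 52) and the level sizes from p9_supported_radix_bits; it only computes the effective PID, the process-table entry offset and the per-level table indices, without reading guest memory or stopping early at a large-page leaf. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Level sizes in bits, index 0 = lowest (PTE) level, as in the patch. */
static const unsigned int sk_radix_bits[4] = { 5, 9, 9, 13 };

/* Effective PID from the top two EA bits; -1 for quadrants 1 and 2. */
static int64_t effective_pid(uint64_t eaddr, uint32_t vcpu_pid)
{
	switch (eaddr >> 62) {
	case 0:
		return vcpu_pid;
	case 3:
		return 0;
	default:
		return -1;
	}
}

/*
 * Byte offset of a PID's 16-byte entry in a process table whose size
 * is 2^(prts + 12) bytes, or -1 if the PID is beyond the table.
 */
static int64_t prte_offset(uint32_t pid, unsigned int prts)
{
	uint64_t size = 1ULL << (prts + 12);
	uint64_t off = (uint64_t)pid * 16;

	return off < size ? (int64_t)off : -1;
}

/*
 * Table index used at each level, walking from level 3 (PGD) down to
 * level 0. lowest_bits is 9 for 4K pages or 5 for 64K pages. Returns
 * the remaining offset, i.e. log2 of the page size.
 */
static unsigned int radix_indices(uint64_t eaddr, unsigned int lowest_bits,
				  uint64_t idx[4])
{
	unsigned int offset = 52;

	assert(lowest_bits == 5 || lowest_bits == 9);
	for (int level = 3; level >= 0; --level) {
		unsigned int bits = level ? sk_radix_bits[level] : lowest_bits;

		offset -= bits;
		idx[level] = (eaddr >> offset) & ((1ULL << bits) - 1);
	}
	return offset;
}
```

For a 4K-page tree (9 bits at level 0) the loop leaves offset at 52 - 13 - 9 - 9 - 9 = 12, and for a 64K-page tree (5 bits) at 16, which is why the translate function can treat the final offset as the log base 2 of the page size.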
* [PATCH 12/18] KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (10 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 11/18] KVM: PPC: Book3S HV: Add basic infrastructure for radix guests Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for " Paul Mackerras
` (5 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
This adds code to branch around the parts that radix guests don't
need: clearing and loading the SLB with the guest SLB contents,
flushing the TLB on first entry on each physical CPU, and saving
the guest SLB contents on exit.
Since the host is now using radix, we need to save and restore the
host value for the PID register.
On hypervisor data/instruction storage interrupts, we don't do the
guest HPT lookup on radix, but just save the guest physical address
for the fault (from the ASDR register) in the vcpu struct.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 2 ++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 58 ++++++++++++++++++++++++++-------
3 files changed, 50 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index fb73518..da1421a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -606,6 +606,7 @@ struct kvm_vcpu_arch {
ulong fault_dar;
u32 fault_dsisr;
unsigned long intr_msr;
+ ulong fault_gpa; /* guest real address of page fault (POWER9) */
#endif
#ifdef CONFIG_BOOKE
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0601e6a..3afa0ad 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -498,6 +498,7 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+ DEFINE(KVM_RADIX, offsetof(struct kvm, arch.radix));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
@@ -537,6 +538,7 @@ int main(void)
DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
DEFINE(VCPU_FAULT_DAR, offsetof(struct kvm_vcpu, arch.fault_dar));
+ DEFINE(VCPU_FAULT_GPA, offsetof(struct kvm_vcpu, arch.fault_gpa));
DEFINE(VCPU_INTR_MSR, offsetof(struct kvm_vcpu, arch.intr_msr));
DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9338a81..f638f3e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -518,6 +518,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/* Stack frame offsets */
#define STACK_SLOT_TID (112-16)
#define STACK_SLOT_PSSCR (112-24)
+#define STACK_SLOT_PID (112-32)
.global kvmppc_hv_entry
kvmppc_hv_entry:
@@ -530,6 +531,7 @@ kvmppc_hv_entry:
* R1 = host R1
* R2 = TOC
* all other volatile GPRS = free
+ * Does not preserve non-volatile GPRs or CR fields
*/
mflr r0
std r0, PPC_LR_STKOFF(r1)
@@ -549,32 +551,38 @@ kvmppc_hv_entry:
bl kvmhv_start_timing
1:
#endif
- /* Clear out SLB */
+
+ /* Use cr7 as an indication of radix mode */
+ ld r5, HSTATE_KVM_VCORE(r13)
+ ld r9, VCORE_KVM(r5) /* pointer to struct kvm */
+ lbz r0, KVM_RADIX(r9)
+ cmpwi cr7, r0, 0
+
+ /* Clear out SLB if hash */
+ bne cr7, 2f
li r6,0
slbmte r6,r6
slbia
ptesync
-
+2:
/*
* POWER7/POWER8 host -> guest partition switch code.
* We don't have to lock against concurrent tlbies,
* but we do have to coordinate across hardware threads.
*/
/* Set bit in entry map iff exit map is zero. */
- ld r5, HSTATE_KVM_VCORE(r13)
li r7, 1
lbz r6, HSTATE_PTID(r13)
sld r7, r7, r6
- addi r9, r5, VCORE_ENTRY_EXIT
-21: lwarx r3, 0, r9
+ addi r8, r5, VCORE_ENTRY_EXIT
+21: lwarx r3, 0, r8
cmpwi r3, 0x100 /* any threads starting to exit? */
bge secondary_too_late /* if so we're too late to the party */
or r3, r3, r7
- stwcx. r3, 0, r9
+ stwcx. r3, 0, r8
bne 21b
/* Primary thread switches to guest partition. */
- ld r9,VCORE_KVM(r5) /* pointer to struct kvm */
cmpwi r6,0
bne 10f
lwz r7,KVM_LPID(r9)
@@ -589,6 +597,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
isync
/* See if we need to flush the TLB */
+ bne cr7, 22f /* skip this for radix */
lhz r6,PACAPACAINDEX(r13) /* test_bit(cpu, need_tlb_flush) */
clrldi r7,r6,64-6 /* extract bit number (6 bits) */
srdi r6,r6,6 /* doubleword number */
@@ -658,7 +667,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
beq kvmppc_primary_no_guest
kvmppc_got_guest:
- /* Load up guest SLB entries */
+ /* Load up guest SLB entries (N.B. slb_max will be 0 for radix) */
lwz r5,VCPU_SLB_MAX(r4)
cmpwi r5,0
beq 9f
@@ -696,8 +705,10 @@ kvmppc_got_guest:
BEGIN_FTR_SECTION
mfspr r5, SPRN_TIDR
mfspr r6, SPRN_PSSCR
+ mfspr r7, SPRN_PID
std r5, STACK_SLOT_TID(r1)
std r6, STACK_SLOT_PSSCR(r1)
+ std r7, STACK_SLOT_PID(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
BEGIN_FTR_SECTION
@@ -1285,11 +1296,15 @@ mc_cont:
mtspr SPRN_CTRLT,r6
4:
/* Read the guest SLB and save it away */
+ ld r5, VCPU_KVM(r9)
+ lbz r0, KVM_RADIX(r5)
+ cmpwi r0, 0
+ li r5, 0
+ bne 3f /* for radix, save 0 entries */
lwz r0,VCPU_SLB_NR(r9) /* number of entries in SLB */
mtctr r0
li r6,0
addi r7,r9,VCPU_SLB
- li r5,0
1: slbmfee r8,r6
andis. r0,r8,SLB_ESID_V@h
beq 2f
@@ -1301,7 +1316,7 @@ mc_cont:
addi r5,r5,1
2: addi r6,r6,1
bdnz 1b
- stw r5,VCPU_SLB_MAX(r9)
+3: stw r5,VCPU_SLB_MAX(r9)
/*
* Save the guest PURR/SPURR
@@ -1550,8 +1565,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
ld r5, STACK_SLOT_TID(r1)
ld r6, STACK_SLOT_PSSCR(r1)
+ ld r7, STACK_SLOT_PID(r1)
mtspr SPRN_TIDR, r5
mtspr SPRN_PSSCR, r6
+ mtspr SPRN_PID, r7
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
/*
@@ -1663,6 +1680,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
isync
/* load host SLB entries */
+BEGIN_MMU_FTR_SECTION
+ b 0f
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
ld r8,PACA_SLBSHADOWPTR(r13)
.rept SLB_NUM_BOLTED
@@ -1675,7 +1695,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
slbmte r6,r5
1: addi r8,r8,16
.endr
-
+0:
#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
/* Finish timing, if we have a vcpu */
ld r4, HSTATE_KVM_VCPU(r13)
@@ -1702,8 +1722,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
* reflect the HDSI to the guest as a DSI.
*/
kvmppc_hdsi:
+ ld r3, VCPU_KVM(r9)
+ lbz r0, KVM_RADIX(r3)
+ cmpwi r0, 0
mfspr r4, SPRN_HDAR
mfspr r6, SPRN_HDSISR
+ bne .Lradix_hdsi /* on radix, just save DAR/DSISR/ASDR */
/* HPTE not found fault or protection fault? */
andis. r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
beq 1f /* if not, send it to the guest */
@@ -1776,11 +1800,23 @@ fast_interrupt_c_return:
stb r0, HSTATE_IN_GUEST(r13)
b guest_exit_cont
+.Lradix_hdsi:
+ std r4, VCPU_FAULT_DAR(r9)
+ stw r6, VCPU_FAULT_DSISR(r9)
+.Lradix_hisi:
+ mfspr r5, SPRN_ASDR
+ std r5, VCPU_FAULT_GPA(r9)
+ b guest_exit_cont
+
/*
* Similarly for an HISI, reflect it to the guest as an ISI unless
* it is an HPTE not found fault for a page that we have paged out.
*/
kvmppc_hisi:
+ ld r3, VCPU_KVM(r9)
+ lbz r0, KVM_RADIX(r3)
+ cmpwi r0, 0
+ bne .Lradix_hisi /* for radix, just save ASDR */
andis. r0, r11, SRR1_ISI_NOPT@h
beq 1f
andi. r0, r11, MSR_IR /* instruction relocation enabled? */
--
2.7.4
* [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (11 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 12/18] KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle " Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-23 3:17 ` Suraj Jitindar Singh
2017-01-12 9:07 ` [PATCH 14/18] KVM: PPC: Book3S HV: MMU notifier callbacks " Paul Mackerras
` (4 subsequent siblings)
17 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU. Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.
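The on-demand construction follows an allocate-then-recheck pattern in kvmppc_create_pte(). New table levels are allocated before kvm->mmu_lock is taken (a GFP_KERNEL allocation may sleep, which is not allowed under a spinlock). The tree is then walked again under the lock, and any allocation that turned out to be unneeded is freed. The sketch below shows that pattern on a hypothetical two-level table in userspace C. The lock is reduced to comments, and `sketch_retry()` only approximates mmu_notifier_retry() with a sequence number and an in-progress count.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DIR_ENTRIES  16
#define SKETCH_LEAF_ENTRIES 16

/* Hypothetical two-level table standing in for the PGD/PUD/PMD/PTE tree. */
struct sketch_tree {
	uint64_t *dir[SKETCH_DIR_ENTRIES];   /* NULL means "none" at this level */
	unsigned long notifier_seq;          /* bumped by invalidations */
	int notifier_count;                  /* invalidations in progress */
};

/* Approximates mmu_notifier_retry(): did an invalidation race with us? */
static int sketch_retry(const struct sketch_tree *t, unsigned long seq)
{
	return t->notifier_count != 0 || t->notifier_seq != seq;
}

/* Install @val at @idx, allocating the leaf level on demand. */
static int sketch_create_entry(struct sketch_tree *t, unsigned int idx,
			       uint64_t val, unsigned long seq)
{
	unsigned int d = idx / SKETCH_LEAF_ENTRIES;
	unsigned int l = idx % SKETCH_LEAF_ENTRIES;
	uint64_t *new_leaf = NULL;
	int ret;

	if (d >= SKETCH_DIR_ENTRIES)
		return -EINVAL;
	/* Allocate outside the lock, only if the level looks missing. */
	if (!t->dir[d])
		new_leaf = calloc(SKETCH_LEAF_ENTRIES, sizeof(uint64_t));

	/* spin_lock(&kvm->mmu_lock) would go here */
	ret = -EAGAIN;
	if (sketch_retry(t, seq))
		goto out_unlock;       /* let the guest fault again */
	ret = -ENOMEM;
	if (!t->dir[d]) {
		if (!new_leaf)
			goto out_unlock;
		t->dir[d] = new_leaf;
		new_leaf = NULL;       /* ownership passed to the tree */
	}
	t->dir[d][l] = val;
	ret = 0;
out_unlock:
	/* spin_unlock(&kvm->mmu_lock) would go here */
	free(new_leaf);            /* no-op if consumed or never allocated */
	return ret;
}
```

Returning -EAGAIN rather than waiting mirrors the patch: the page fault handler treats it like success and resumes the guest, which simply faults again if the mapping is still missing.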
Besides hypervisor page faults for missing pages, we also get faults
when reference/change (RC) bits need to be set, and for various other
error conditions. For now, we only set the R or C bit in the guest
page table if the same bit is already set in the host PTE for the
backing page.
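The R/C rule can be sketched as a small predicate. This is a hypothetical userspace model: the `SKETCH_PAGE_*` bit values are made up, and only the decision logic follows the DSISR_SET_RC branch of kvmppc_book3s_radix_page_fault(). When the predicate returns false, the real code falls back to get_user_pages_fast(), which sets the bits in the host (Linux) PTE first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values standing in for _PAGE_PRESENT/_PAGE_ACCESSED/_PAGE_DIRTY. */
#define SKETCH_PAGE_PRESENT  (1ULL << 0)
#define SKETCH_PAGE_ACCESSED (1ULL << 1)   /* R (reference) */
#define SKETCH_PAGE_DIRTY    (1ULL << 2)   /* C (change) */

/*
 * Fast path for a "set R/C" fault: succeed only if the host PTE is present
 * and already has every bit we want to set in the guest's 2nd-level PTE.
 * Returns false to mean "take the slow get_user_pages_fast() path".
 */
static bool sketch_set_rc_fast(uint64_t host_pte, uint64_t *guest_pte, bool writing)
{
	uint64_t want = SKETCH_PAGE_ACCESSED;

	if (writing)
		want |= SKETCH_PAGE_DIRTY;
	if (!(host_pte & SKETCH_PAGE_PRESENT) || (host_pte & want) != want)
		return false;
	if (!(*guest_pte & SKETCH_PAGE_PRESENT))
		return false;   /* guest PTE has gone away; take the slow path */
	*guest_pte |= want;
	return true;
}
```

A store to a page whose host PTE is referenced but not yet dirty therefore takes the slow path, so C is never set in the guest tree ahead of the host.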
This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables. There is no support for 1GB huge pages
yet.
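The pfn arithmetic for 2MB entries is easiest to see with concrete numbers. The sketch assumes 64K base pages, so a 2MB page covers 32 base pages (with 4K base pages the mask would cover 512). It mirrors the `pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1)` rounding used for a level-1 entry and the `pfn |= gfn & ((PMD_SIZE >> PAGE_SHIFT) - 1)` fix-up applied when kvmppc_create_pte() returns -EBUSY and the code falls back to a normal page. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 64K base pages and 2MB PMD-level pages. */
#define SKETCH_PAGE_SHIFT    16
#define SKETCH_PMD_SIZE      (2ULL << 20)
#define SKETCH_PAGES_PER_PMD (SKETCH_PMD_SIZE >> SKETCH_PAGE_SHIFT)   /* 32 */

/* Level 1 (2MB) entry: round the host pfn down to the start of its 2MB page. */
static uint64_t sketch_large_pfn(uint64_t pfn)
{
	return pfn & ~(SKETCH_PAGES_PER_PMD - 1);
}

/*
 * Fallback to a level 0 entry: put back the guest frame's offset
 * within its 2MB region.
 */
static uint64_t sketch_small_pfn(uint64_t large_pfn, uint64_t gfn)
{
	return large_pfn | (gfn & (SKETCH_PAGES_PER_PMD - 1));
}
```

For example, host pfn 0x1234 rounds down to 0x1220 for a 2MB entry, and a fallback for gfn 0x54 (offset 0x14 within its 2MB region) restores 0x1234. The round trip only yields the right host page when the guest and host addresses have the same offset within their 2MB regions, which a 2MB mapping requires anyway.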
---
arch/powerpc/include/asm/kvm_book3s.h | 8 +
arch/powerpc/kvm/book3s.c | 1 +
arch/powerpc/kvm/book3s_64_mmu_hv.c | 7 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 385 +++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 17 +-
5 files changed, 415 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 7adfcc0..ff5cd5c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -170,6 +170,8 @@ extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
unsigned long status);
extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
unsigned long slb_v, unsigned long valid);
+extern int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned long gpa, gva_t ea, int is_store);
extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);
@@ -182,8 +184,14 @@ extern void kvmppc_mmu_hpte_sysexit(void);
extern int kvmppc_mmu_hv_init(void);
extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc);
+extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
+ struct kvm_vcpu *vcpu,
+ unsigned long ea, unsigned long dsisr);
extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data, bool iswrite);
+extern void kvmppc_free_radix(struct kvm *kvm);
+extern int kvmppc_radix_init(void);
+extern void kvmppc_radix_exit(void);
/* XXX remove this export when load_last_inst() is generic */
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 019f008..b6b5c18 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -239,6 +239,7 @@ void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
kvmppc_set_dsisr(vcpu, flags);
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE);
}
+EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage); /* used by kvm_hv */
void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags)
{
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index c208bf3..57690c2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -395,8 +395,8 @@ static int instruction_is_store(unsigned int instr)
return (instr & mask) != 0;
}
-static int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
- unsigned long gpa, gva_t ea, int is_store)
+int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned long gpa, gva_t ea, int is_store)
{
u32 last_inst;
@@ -461,6 +461,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned long rcbits;
long mmio_update;
+ if (kvm_is_radix(kvm))
+ return kvmppc_book3s_radix_page_fault(run, vcpu, ea, dsisr);
+
/*
* Real-mode code has already searched the HPT and found the
* entry we're interested in. Lock the entry and check that
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 9091407..865ea9b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -137,3 +137,388 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
return 0;
}
+#ifdef CONFIG_PPC_64K_PAGES
+#define MMU_BASE_PSIZE MMU_PAGE_64K
+#else
+#define MMU_BASE_PSIZE MMU_PAGE_4K
+#endif
+
+static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
+ unsigned int pshift)
+{
+ int psize = MMU_BASE_PSIZE;
+
+ if (pshift >= PMD_SHIFT)
+ psize = MMU_PAGE_2M;
+ addr &= ~0xfffUL;
+ addr |= mmu_psize_defs[psize].ap << 5;
+ asm volatile("ptesync": : :"memory");
+ asm volatile(PPC_TLBIE_5(%0, %1, 0, 0, 1)
+ : : "r" (addr), "r" (kvm->arch.lpid) : "memory");
+ asm volatile("ptesync": : :"memory");
+}
+
+void kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep, unsigned long clr,
+ unsigned long set, unsigned long addr,
+ unsigned int shift)
+{
+ if (!(clr & _PAGE_PRESENT) && cpu_has_feature(CPU_FTR_POWER9_DD1) &&
+ pte_present(*ptep)) {
+ /* have to invalidate it first */
+ __radix_pte_update(ptep, _PAGE_PRESENT, 0);
+ kvmppc_radix_tlbie_page(kvm, addr, shift);
+ set |= _PAGE_PRESENT;
+ }
+ __radix_pte_update(ptep, clr, set);
+}
+
+void kvmppc_radix_set_pte_at(struct kvm *kvm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ radix__set_pte_at(kvm->mm, addr, ptep, pte, 0);
+}
+
+static struct kmem_cache *kvm_pte_cache;
+
+static pte_t *kvmppc_pte_alloc(void)
+{
+ return kmem_cache_alloc(kvm_pte_cache, GFP_KERNEL);
+}
+
+static void kvmppc_pte_free(pte_t *ptep)
+{
+ kmem_cache_free(kvm_pte_cache, ptep);
+}
+
+static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
+ unsigned int level, unsigned long mmu_seq)
+{
+ pgd_t *pgd;
+ pud_t *pud, *new_pud = NULL;
+ pmd_t *pmd, *new_pmd = NULL;
+ pte_t *ptep, *new_ptep = NULL;
+ int ret;
+
+ /* Traverse the guest's 2nd-level tree, allocate new levels needed */
+ pgd = kvm->arch.pgtable + pgd_index(gpa);
+ pud = NULL;
+ if (pgd_present(*pgd))
+ pud = pud_offset(pgd, gpa);
+ else
+ new_pud = pud_alloc_one(kvm->mm, gpa);
+
+ pmd = NULL;
+ if (pud && pud_present(*pud))
+ pmd = pmd_offset(pud, gpa);
+ else
+ new_pmd = pmd_alloc_one(kvm->mm, gpa);
+
+ if (level == 0 && !(pmd && pmd_present(*pmd)))
+ new_ptep = kvmppc_pte_alloc();
+
+ /* Check if we might have been invalidated; let the guest retry if so */
+ spin_lock(&kvm->mmu_lock);
+ ret = -EAGAIN;
+ if (mmu_notifier_retry(kvm, mmu_seq))
+ goto out_unlock;
+
+ /* Now traverse again under the lock and change the tree */
+ ret = -ENOMEM;
+ if (pgd_none(*pgd)) {
+ if (!new_pud)
+ goto out_unlock;
+ pgd_populate(kvm->mm, pgd, new_pud);
+ new_pud = NULL;
+ }
+ pud = pud_offset(pgd, gpa);
+ if (pud_none(*pud)) {
+ if (!new_pmd)
+ goto out_unlock;
+ pud_populate(kvm->mm, pud, new_pmd);
+ new_pmd = NULL;
+ }
+ pmd = pmd_offset(pud, gpa);
+ if (pmd_large(*pmd)) {
+ /* Someone else has instantiated a large page here; retry */
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+ if (level == 1 && !pmd_none(*pmd)) {
+ /*
+ * There's a page table page here, but we wanted
+ * to install a large page. Tell the caller and let
+ * it try installing a normal page if it wants.
+ */
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ if (level == 0) {
+ if (pmd_none(*pmd)) {
+ if (!new_ptep)
+ goto out_unlock;
+ pmd_populate(kvm->mm, pmd, new_ptep);
+ new_ptep = NULL;
+ }
+ ptep = pte_offset_kernel(pmd, gpa);
+ if (pte_present(*ptep)) {
+ /* PTE was previously valid, so invalidate it */
+ kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
+ 0, gpa, 0);
+ kvmppc_radix_tlbie_page(kvm, gpa, 0);
+ }
+ kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
+ } else {
+ kvmppc_radix_set_pte_at(kvm, gpa, pmdp_ptep(pmd), pte);
+ }
+ ret = 0;
+
+ out_unlock:
+ spin_unlock(&kvm->mmu_lock);
+ if (new_pud)
+ pud_free(kvm->mm, new_pud);
+ if (new_pmd)
+ pmd_free(kvm->mm, new_pmd);
+ if (new_ptep)
+ kvmppc_pte_free(new_ptep);
+ return ret;
+}
+
+int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned long ea, unsigned long dsisr)
+{
+ struct kvm *kvm = vcpu->kvm;
+ unsigned long mmu_seq, pte_size;
+ unsigned long gpa, gfn, hva, pfn;
+ struct kvm_memory_slot *memslot;
+ struct page *page = NULL, *pages[1];
+ long ret, npages, ok;
+ unsigned int writing;
+ struct vm_area_struct *vma;
+ unsigned long flags;
+ pte_t pte, *ptep;
+ unsigned long pgflags;
+ unsigned int shift, level;
+
+ /* Check for unusual errors */
+ if (dsisr & DSISR_UNSUPP_MMU) {
+ pr_err("KVM: Got unsupported MMU fault\n");
+ return -EFAULT;
+ }
+ if (dsisr & DSISR_BADACCESS) {
+ /* Reflect to the guest as DSI */
+ pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
+ kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+ return RESUME_GUEST;
+ }
+
+ /* Translate the logical address and get the page */
+ gpa = vcpu->arch.fault_gpa & ~0xfffUL;
+ gpa &= ~0xF000000000000000ul;
+ gfn = gpa >> PAGE_SHIFT;
+ if (!(dsisr & DSISR_PGDIRFAULT))
+ gpa |= ea & 0xfff;
+ memslot = gfn_to_memslot(kvm, gfn);
+
+ /* No memslot means it's an emulated MMIO region */
+ if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
+ if (dsisr & (DSISR_PGDIRFAULT | DSISR_BADACCESS |
+ DSISR_SET_RC)) {
+ /*
+ * Bad address in guest page table tree, or other
+ * unusual error - reflect it to the guest as DSI.
+ */
+ kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+ return RESUME_GUEST;
+ }
+ return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
+ dsisr & DSISR_ISSTORE);
+ }
+
+ /* used to check for invalidations in progress */
+ mmu_seq = kvm->mmu_notifier_seq;
+ smp_rmb();
+
+ writing = (dsisr & DSISR_ISSTORE) != 0;
+ hva = gfn_to_hva_memslot(memslot, gfn);
+ if (dsisr & DSISR_SET_RC) {
+ /*
+ * Need to set an R or C bit in the 2nd-level tables;
+ * if the relevant bits aren't already set in the linux
+ * page tables, fall through to do the gup_fast to
+ * set them in the linux page tables too.
+ */
+ ok = 0;
+ pgflags = _PAGE_ACCESSED;
+ if (writing)
+ pgflags |= _PAGE_DIRTY;
+ local_irq_save(flags);
+ ptep = __find_linux_pte_or_hugepte(current->mm->pgd, hva,
+ NULL, NULL);
+ if (ptep) {
+ pte = READ_ONCE(*ptep);
+ if (pte_present(pte) &&
+ (pte_val(pte) & pgflags) == pgflags)
+ ok = 1;
+ }
+ local_irq_restore(flags);
+ if (ok) {
+ spin_lock(&kvm->mmu_lock);
+ if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
+ spin_unlock(&kvm->mmu_lock);
+ return RESUME_GUEST;
+ }
+ ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable,
+ gpa, NULL, &shift);
+ if (ptep && pte_present(*ptep)) {
+ kvmppc_radix_update_pte(kvm, ptep, 0, pgflags,
+ gpa, shift);
+ spin_unlock(&kvm->mmu_lock);
+ return RESUME_GUEST;
+ }
+ spin_unlock(&kvm->mmu_lock);
+ }
+ }
+
+ ret = -EFAULT;
+ pfn = 0;
+ pte_size = PAGE_SIZE;
+ pgflags = _PAGE_READ | _PAGE_EXEC;
+ level = 0;
+ npages = get_user_pages_fast(hva, 1, writing, pages);
+ if (npages < 1) {
+ /* Check if it's an I/O mapping */
+ down_read(&current->mm->mmap_sem);
+ vma = find_vma(current->mm, hva);
+ if (vma && vma->vm_start <= hva && hva < vma->vm_end &&
+ (vma->vm_flags & VM_PFNMAP)) {
+ pfn = vma->vm_pgoff +
+ ((hva - vma->vm_start) >> PAGE_SHIFT);
+ pgflags = pgprot_val(vma->vm_page_prot);
+ }
+ up_read(&current->mm->mmap_sem);
+ if (!pfn)
+ return -EFAULT;
+ } else {
+ page = pages[0];
+ pfn = page_to_pfn(page);
+ if (PageHuge(page)) {
+ page = compound_head(page);
+ pte_size <<= compound_order(page);
+ /* See if we can insert a 2MB large-page PTE here */
+ if (pte_size >= PMD_SIZE &&
+ (gpa & PMD_MASK & PAGE_MASK) ==
+ (hva & PMD_MASK & PAGE_MASK)) {
+ level = 1;
+ pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
+ }
+ }
+ /* See if we can provide write access */
+ if (writing) {
+ /*
+ * We assume gup_fast has set dirty on the host PTE.
+ */
+ pgflags |= _PAGE_WRITE;
+ } else {
+ local_irq_save(flags);
+ ptep = __find_linux_pte_or_hugepte(current->mm->pgd,
+ hva, NULL, NULL);
+ if (ptep && pte_write(*ptep) && pte_dirty(*ptep))
+ pgflags |= _PAGE_WRITE;
+ local_irq_restore(flags);
+ }
+ }
+
+ /*
+ * Compute the PTE value that we need to insert.
+ */
+ pgflags |= _PAGE_PRESENT | _PAGE_PTE | _PAGE_ACCESSED;
+ if (pgflags & _PAGE_WRITE)
+ pgflags |= _PAGE_DIRTY;
+ pte = pfn_pte(pfn, __pgprot(pgflags));
+
+ /* Allocate space in the tree and write the PTE */
+ ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
+ if (ret == -EBUSY) {
+ /*
+ * There's already a PMD where we wanted to install a large page;
+ * for now, fall back to installing a small page.
+ */
+ level = 0;
+ pfn |= gfn & ((PMD_SIZE >> PAGE_SHIFT) - 1);
+ pte = pfn_pte(pfn, __pgprot(pgflags));
+ ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
+ }
+ if (ret == 0 || ret == -EAGAIN)
+ ret = RESUME_GUEST;
+
+ if (page) {
+ /*
+ * We drop pages[0] here, not page, because page might
+ * have been set to the head page of a compound, but
+ * we have to drop the reference on the correct tail
+ * page to match the get inside gup()
+ */
+ put_page(pages[0]);
+ }
+ return ret;
+}
+
+void kvmppc_free_radix(struct kvm *kvm)
+{
+ unsigned long ig, iu, im;
+ pte_t *pte;
+ pmd_t *pmd;
+ pud_t *pud;
+ pgd_t *pgd;
+
+ if (!kvm->arch.pgtable)
+ return;
+ pgd = kvm->arch.pgtable;
+ for (ig = 0; ig < PTRS_PER_PGD; ++ig, ++pgd) {
+ if (!pgd_present(*pgd))
+ continue;
+ pud = pud_offset(pgd, 0);
+ for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++pud) {
+ if (!pud_present(*pud))
+ continue;
+ pmd = pmd_offset(pud, 0);
+ for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) {
+ if (pmd_huge(*pmd)) {
+ pmd_clear(pmd);
+ continue;
+ }
+ if (!pmd_present(*pmd))
+ continue;
+ pte = pte_offset_map(pmd, 0);
+ memset(pte, 0, sizeof(long) << PTE_INDEX_SIZE);
+ kvmppc_pte_free(pte);
+ pmd_clear(pmd);
+ }
+ pmd_free(kvm->mm, pmd_offset(pud, 0));
+ pud_clear(pud);
+ }
+ pud_free(kvm->mm, pud_offset(pgd, 0));
+ pgd_clear(pgd);
+ }
+ pgd_free(kvm->mm, kvm->arch.pgtable);
+}
+
+static void pte_ctor(void *addr)
+{
+ memset(addr, 0, PTE_TABLE_SIZE);
+}
+
+int kvmppc_radix_init(void)
+{
+ unsigned long size = sizeof(void *) << PTE_INDEX_SIZE;
+
+ kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
+ if (!kvm_pte_cache)
+ return -ENOMEM;
+ return 0;
+}
+
+void kvmppc_radix_exit(void)
+{
+ kmem_cache_destroy(kvm_pte_cache);
+}
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6bd0f4a..4c2d054 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3357,7 +3357,10 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
kvmppc_free_vcores(kvm);
- kvmppc_free_hpt(kvm);
+ if (kvm->arch.radix)
+ kvmppc_free_radix(kvm);
+ else
+ kvmppc_free_hpt(kvm);
kvmppc_free_pimap(kvm);
}
@@ -3769,6 +3772,11 @@ static int kvm_init_subcore_bitmap(void)
return 0;
}
+static int kvmppc_radix_possible(void)
+{
+ return cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled();
+}
+
static int kvmppc_book3s_init_hv(void)
{
int r;
@@ -3808,12 +3816,19 @@ static int kvmppc_book3s_init_hv(void)
init_vcore_lists();
r = kvmppc_mmu_hv_init();
+ if (r)
+ return r;
+
+ if (kvmppc_radix_possible())
+ r = kvmppc_radix_init();
return r;
}
static void kvmppc_book3s_exit_hv(void)
{
kvmppc_free_host_rm_ops();
+ if (kvmppc_radix_possible())
+ kvmppc_radix_exit();
kvmppc_hv_ops = NULL;
}
--
2.7.4
* Re: [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
2017-01-12 9:07 ` [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for " Paul Mackerras
@ 2017-01-23 3:17 ` Suraj Jitindar Singh
2017-01-23 4:38 ` Paul Mackerras
0 siblings, 1 reply; 27+ messages in thread
From: Suraj Jitindar Singh @ 2017-01-23 3:17 UTC
To: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote:
> This adds the code to construct the second-level ("partition-scoped" in
> architecturese) page tables for guests using the radix MMU. Apart from
> the PGD level, which is allocated when the guest is created, the rest
> of the tree is all constructed in response to hypervisor page faults.
>
> As well as hypervisor page faults for missing pages, we also get faults
> for reference/change (RC) bits needing to be set, as well as various
> other error conditions. For now, we only set the R or C bit in the
> guest page table if the same bit is set in the host PTE for the
> backing page.
>
> This code can take advantage of the guest being backed with either
> transparent or ordinary 2MB huge pages, and insert 2MB page entries
> into the guest page tables. There is no support for 1GB huge pages
> yet.
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 8 +
> arch/powerpc/kvm/book3s.c | 1 +
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 7 +-
> arch/powerpc/kvm/book3s_64_mmu_radix.c | 385 +++++++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c | 17 +-
> 5 files changed, 415 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 7adfcc0..ff5cd5c 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -170,6 +170,8 @@ extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
> unsigned long status);
> extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
> unsigned long slb_v, unsigned long valid);
> +extern int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
> + unsigned long gpa, gva_t ea, int is_store);
>
> extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
> extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu);
> @@ -182,8 +184,14 @@ extern void kvmppc_mmu_hpte_sysexit(void);
> extern int kvmppc_mmu_hv_init(void);
> extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc);
>
> +extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
> + struct kvm_vcpu *vcpu,
> + unsigned long ea, unsigned long dsisr);
> extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> struct kvmppc_pte *gpte, bool data, bool iswrite);
> +extern void kvmppc_free_radix(struct kvm *kvm);
> +extern int kvmppc_radix_init(void);
> +extern void kvmppc_radix_exit(void);
>
> /* XXX remove this export when load_last_inst() is generic */
> extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 019f008..b6b5c18 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -239,6 +239,7 @@ void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
> kvmppc_set_dsisr(vcpu, flags);
> kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE);
> }
> +EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage); /* used by kvm_hv */
>
> void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags)
> {
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index c208bf3..57690c2 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -395,8 +395,8 @@ static int instruction_is_store(unsigned int instr)
> return (instr & mask) != 0;
> }
>
> -static int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
> - unsigned long gpa, gva_t ea, int is_store)
> +int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
> + unsigned long gpa, gva_t ea, int is_store)
> {
> u32 last_inst;
>
> @@ -461,6 +461,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> unsigned long rcbits;
> long mmio_update;
>
> + if (kvm_is_radix(kvm))
> + return kvmppc_book3s_radix_page_fault(run, vcpu, ea, dsisr);
> +
> /*
> * Real-mode code has already searched the HPT and found the
> * entry we're interested in. Lock the entry and check that
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 9091407..865ea9b 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -137,3 +137,388 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> return 0;
> }
>
> +#ifdef CONFIG_PPC_64K_PAGES
> +#define MMU_BASE_PSIZE MMU_PAGE_64K
> +#else
> +#define MMU_BASE_PSIZE MMU_PAGE_4K
> +#endif
> +
> +static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
> + unsigned int pshift)
> +{
> + int psize = MMU_BASE_PSIZE;
> +
> + if (pshift >= PMD_SHIFT)
> + psize = MMU_PAGE_2M;
> + addr &= ~0xfffUL;
> + addr |= mmu_psize_defs[psize].ap << 5;
> + asm volatile("ptesync": : :"memory");
> + asm volatile(PPC_TLBIE_5(%0, %1, 0, 0, 1)
> + : : "r" (addr), "r" (kvm->arch.lpid) : "memory");
> + asm volatile("ptesync": : :"memory");
> +}
> +
> +void kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep, unsigned long clr,
> + unsigned long set, unsigned long addr,
> + unsigned int shift)
> +{
> + if (!(clr & _PAGE_PRESENT) && cpu_has_feature(CPU_FTR_POWER9_DD1) &&
> + pte_present(*ptep)) {
> + /* have to invalidate it first */
> + __radix_pte_update(ptep, _PAGE_PRESENT, 0);
> + kvmppc_radix_tlbie_page(kvm, addr, shift);
> + set |= _PAGE_PRESENT;
> + }
> + __radix_pte_update(ptep, clr, set);
> +}
> +
> +void kvmppc_radix_set_pte_at(struct kvm *kvm, unsigned long addr,
> + pte_t *ptep, pte_t pte)
> +{
> + radix__set_pte_at(kvm->mm, addr, ptep, pte, 0);
> +}
> +
> +static struct kmem_cache *kvm_pte_cache;
> +
> +static pte_t *kvmppc_pte_alloc(void)
> +{
> + return kmem_cache_alloc(kvm_pte_cache, GFP_KERNEL);
> +}
> +
> +static void kvmppc_pte_free(pte_t *ptep)
> +{
> + kmem_cache_free(kvm_pte_cache, ptep);
> +}
> +
> +static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
> + unsigned int level, unsigned long mmu_seq)
> +{
> + pgd_t *pgd;
> + pud_t *pud, *new_pud = NULL;
> + pmd_t *pmd, *new_pmd = NULL;
> + pte_t *ptep, *new_ptep = NULL;
> + int ret;
> +
> + /* Traverse the guest's 2nd-level tree, allocate new levels needed */
> + pgd = kvm->arch.pgtable + pgd_index(gpa);
> + pud = NULL;
> + if (pgd_present(*pgd))
> + pud = pud_offset(pgd, gpa);
> + else
> + new_pud = pud_alloc_one(kvm->mm, gpa);
> +
> + pmd = NULL;
> + if (pud && pud_present(*pud))
> + pmd = pmd_offset(pud, gpa);
> + else
> + new_pmd = pmd_alloc_one(kvm->mm, gpa);
> +
> + if (level == 0 && !(pmd && pmd_present(*pmd)))
> + new_ptep = kvmppc_pte_alloc();
> +
> + /* Check if we might have been invalidated; let the guest retry if so */
> + spin_lock(&kvm->mmu_lock);
> + ret = -EAGAIN;
> + if (mmu_notifier_retry(kvm, mmu_seq))
> + goto out_unlock;
> +
> + /* Now traverse again under the lock and change the tree */
> + ret = -ENOMEM;
> + if (pgd_none(*pgd)) {
> + if (!new_pud)
> + goto out_unlock;
> + pgd_populate(kvm->mm, pgd, new_pud);
> + new_pud = NULL;
> + }
> + pud = pud_offset(pgd, gpa);
> + if (pud_none(*pud)) {
> + if (!new_pmd)
> + goto out_unlock;
> + pud_populate(kvm->mm, pud, new_pmd);
> + new_pmd = NULL;
> + }
> + pmd = pmd_offset(pud, gpa);
> + if (pmd_large(*pmd)) {
> + /* Someone else has instantiated a large page here; retry */
> + ret = -EAGAIN;
> + goto out_unlock;
> + }
> + if (level == 1 && !pmd_none(*pmd)) {
> + /*
> + * There's a page table page here, but we wanted
> + * to install a large page. Tell the caller and let
> + * it try installing a normal page if it wants.
> + */
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> + if (level == 0) {
> + if (pmd_none(*pmd)) {
> + if (!new_ptep)
> + goto out_unlock;
> + pmd_populate(kvm->mm, pmd, new_ptep);
> + new_ptep = NULL;
> + }
> + ptep = pte_offset_kernel(pmd, gpa);
> + if (pte_present(*ptep)) {
> + /* PTE was previously valid, so invalidate it */
> + kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
> + 0, gpa, 0);
> + kvmppc_radix_tlbie_page(kvm, gpa, 0);
> + }
> + kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
> + } else {
> + kvmppc_radix_set_pte_at(kvm, gpa, pmdp_ptep(pmd), pte);
> + }
> + ret = 0;
> +
> + out_unlock:
> + spin_unlock(&kvm->mmu_lock);
> + if (new_pud)
> + pud_free(kvm->mm, new_pud);
> + if (new_pmd)
> + pmd_free(kvm->mm, new_pmd);
> + if (new_ptep)
> + kvmppc_pte_free(new_ptep);
> + return ret;
> +}
> +
> +int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> + unsigned long ea, unsigned long dsisr)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + unsigned long mmu_seq, pte_size;
> + unsigned long gpa, gfn, hva, pfn;
> + struct kvm_memory_slot *memslot;
> + struct page *page = NULL, *pages[1];
> + long ret, npages, ok;
> + unsigned int writing;
> + struct vm_area_struct *vma;
> + unsigned long flags;
> + pte_t pte, *ptep;
> + unsigned long pgflags;
> + unsigned int shift, level;
> +
> + /* Check for unusual errors */
> + if (dsisr & DSISR_UNSUPP_MMU) {
> + pr_err("KVM: Got unsupported MMU fault\n");
> + return -EFAULT;
> + }
> + if (dsisr & DSISR_BADACCESS) {
> + /* Reflect to the guest as DSI */
> + pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
> + kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
> + return RESUME_GUEST;
> + }
> +
> + /* Translate the logical address and get the page */
> + gpa = vcpu->arch.fault_gpa & ~0xfffUL;
> + gpa &= ~0xF000000000000000ul;
> + gfn = gpa >> PAGE_SHIFT;
> + if (!(dsisr & DSISR_PGDIRFAULT))
> + gpa |= ea & 0xfff;
> + memslot = gfn_to_memslot(kvm, gfn);
> +
> + /* No memslot means it's an emulated MMIO region */
> + if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
> + if (dsisr & (DSISR_PGDIRFAULT | DSISR_BADACCESS |
> + DSISR_SET_RC)) {
> + /*
> + * Bad address in guest page table tree, or other
> + * unusual error - reflect it to the guest as DSI.
> + */
> + kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
> + return RESUME_GUEST;
> + }
> + return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
> + dsisr & DSISR_ISSTORE);
> + }
> +
> + /* used to check for invalidations in progress */
> + mmu_seq = kvm->mmu_notifier_seq;
> + smp_rmb();
> +
> + writing = (dsisr & DSISR_ISSTORE) != 0;
> + hva = gfn_to_hva_memslot(memslot, gfn);
> + if (dsisr & DSISR_SET_RC) {
> + /*
> + * Need to set an R or C bit in the 2nd-level tables;
> + * if the relevant bits aren't already set in the linux
> + * page tables, fall through to do the gup_fast to
> + * set them in the linux page tables too.
> + */
> + ok = 0;
> + pgflags = _PAGE_ACCESSED;
> + if (writing)
> + pgflags |= _PAGE_DIRTY;
> + local_irq_save(flags);
> + ptep = __find_linux_pte_or_hugepte(current->mm->pgd, hva,
> + NULL, NULL);
> + if (ptep) {
> + pte = READ_ONCE(*ptep);
> + if (pte_present(pte) &&
> + (pte_val(pte) & pgflags) == pgflags)
> + ok = 1;
> + }
> + local_irq_restore(flags);
> + if (ok) {
> + spin_lock(&kvm->mmu_lock);
> + if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
> + spin_unlock(&kvm->mmu_lock);
> + return RESUME_GUEST;
> + }
> + ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable,
> + gpa, NULL, &shift);
> + if (ptep && pte_present(*ptep)) {
> + kvmppc_radix_update_pte(kvm, ptep, 0, pgflags,
> + gpa, shift);
> + spin_unlock(&kvm->mmu_lock);
> + return RESUME_GUEST;
> + }
> + spin_unlock(&kvm->mmu_lock);
> + }
> + }
> +
> + ret = -EFAULT;
> + pfn = 0;
> + pte_size = PAGE_SIZE;
> + pgflags = _PAGE_READ | _PAGE_EXEC;
> + level = 0;
> + npages = get_user_pages_fast(hva, 1, writing, pages);
> + if (npages < 1) {
> + /* Check if it's an I/O mapping */
> + down_read(&current->mm->mmap_sem);
> + vma = find_vma(current->mm, hva);
> + if (vma && vma->vm_start <= hva && hva < vma->vm_end &&
> + (vma->vm_flags & VM_PFNMAP)) {
> + pfn = vma->vm_pgoff +
> + ((hva - vma->vm_start) >> PAGE_SHIFT);
> + pgflags = pgprot_val(vma->vm_page_prot);
> + }
> + up_read(&current->mm->mmap_sem);
> + if (!pfn)
> + return -EFAULT;
> + } else {
> + page = pages[0];
> + pfn = page_to_pfn(page);
> + if (PageHuge(page)) {
> + page = compound_head(page);
> + pte_size <<= compound_order(page);
> + /* See if we can insert a 2MB large-page PTE here */
> + if (pte_size >= PMD_SIZE &&
> + (gpa & PMD_MASK & PAGE_MASK) ==
> + (hva & PMD_MASK & PAGE_MASK)) {
> + level = 1;
> + pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
> + }
> + }
> + /* See if we can provide write access */
> + if (writing) {
> + /*
> + * We assume gup_fast has set dirty on the host PTE.
> + */
> + pgflags |= _PAGE_WRITE;
> + } else {
> + local_irq_save(flags);
> + ptep = __find_linux_pte_or_hugepte(current->mm->pgd,
> + hva, NULL, NULL);
> + if (ptep && pte_write(*ptep) && pte_dirty(*ptep))
> + pgflags |= _PAGE_WRITE;
> + local_irq_restore(flags);
> + }
> + }
> +
> + /*
> + * Compute the PTE value that we need to insert.
> + */
> + pgflags |= _PAGE_PRESENT | _PAGE_PTE | _PAGE_ACCESSED;
> + if (pgflags & _PAGE_WRITE)
> + pgflags |= _PAGE_DIRTY;
> + pte = pfn_pte(pfn, __pgprot(pgflags));
> +
> + /* Allocate space in the tree and write the PTE */
> + ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
> + if (ret == -EBUSY) {
> + /*
> + * There's already a PMD where wanted to install a large page;
> + * for now, fall back to installing a small page.
> + */
> + level = 0;
> + pfn |= gfn & ((PMD_SIZE >> PAGE_SHIFT) - 1);
> + pte = pfn_pte(pfn, __pgprot(pgflags));
> + ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
> + }
> + if (ret == 0 || ret == -EAGAIN)
> + ret = RESUME_GUEST;
> +
> + if (page) {
> + /*
> + * We drop pages[0] here, not page because page might
> + * have been set to the head page of a compound, but
> + * we have to drop the reference on the correct tail
> + * page to match the get inside gup()
> + */
> + put_page(pages[0]);
> + }
> + return ret;
> +}
> +
> +void kvmppc_free_radix(struct kvm *kvm)
> +{
> + unsigned long ig, iu, im;
> + pte_t *pte;
> + pmd_t *pmd;
> + pud_t *pud;
> + pgd_t *pgd;
> +
> + if (!kvm->arch.pgtable)
> + return;
> + pgd = kvm->arch.pgtable;
> + for (ig = 0; ig < PTRS_PER_PGD; ++ig, ++pgd) {
> + if (!pgd_present(*pgd))
> + continue;
> + pud = pud_offset(pgd, 0);
> + for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++pud) {
> + if (!pud_present(*pud))
> + continue;
> + pmd = pmd_offset(pud, 0);
> + for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) {
> + if (pmd_huge(*pmd)) {
> + pmd_clear(pmd);
> + continue;
> + }
> + if (!pmd_present(*pmd))
> + continue;
> + pte = pte_offset_map(pmd, 0);
> + memset(pte, 0, sizeof(long) << PTE_INDEX_SIZE);
> + kvmppc_pte_free(pte);
> + pmd_clear(pmd);
> + }
> + pmd_free(kvm->mm, pmd_offset(pud, 0));
> + pud_clear(pud);
> + }
> + pud_free(kvm->mm, pud_offset(pgd, 0));
> + pgd_clear(pgd);
> + }
> + pgd_free(kvm->mm, kvm->arch.pgtable);
> +}
> +
> +static void pte_ctor(void *addr)
> +{
> + memset(addr, 0, PTE_TABLE_SIZE);
> +}
> +
> +int kvmppc_radix_init(void)
> +{
> + unsigned long size = sizeof(void *) << PTE_INDEX_SIZE;
> +
> + kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> + if (!kvm_pte_cache)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +void kvmppc_radix_exit(void)
> +{
> + kmem_cache_destroy(kvm_pte_cache);
> +}
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6bd0f4a..4c2d054 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3357,7 +3357,10 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
>
> kvmppc_free_vcores(kvm);
>
> - kvmppc_free_hpt(kvm);
> + if (kvm->arch.radix)
kvm_is_radix() for consistency?
> + kvmppc_free_radix(kvm);
> + else
> + kvmppc_free_hpt(kvm);
>
> kvmppc_free_pimap(kvm);
> }
> @@ -3769,6 +3772,11 @@ static int kvm_init_subcore_bitmap(void)
> return 0;
> }
>
> +static int kvmppc_radix_possible(void)
> +{
> + return cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled();
> +}
> +
> static int kvmppc_book3s_init_hv(void)
> {
> int r;
> @@ -3808,12 +3816,19 @@ static int kvmppc_book3s_init_hv(void)
> init_vcore_lists();
>
> r = kvmppc_mmu_hv_init();
> + if (r)
> + return r;
> +
> + if (kvmppc_radix_possible())
> + r = kvmppc_radix_init();
> return r;
> }
>
> static void kvmppc_book3s_exit_hv(void)
> {
> kvmppc_free_host_rm_ops();
> + if (kvmppc_radix_possible())
> + kvmppc_radix_exit();
> kvmppc_hv_ops = NULL;
> }
>
^ permalink raw reply [flat|nested] 27+ messages in thread
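The quoted fault handler installs a 2MB partition-scoped PTE only when the backing host page is at least PMD_SIZE and the guest physical address and host virtual address line up within a 2MB region, so that one large PTE maps the whole region consistently. A minimal sketch of that eligibility test follows, assuming 64K base pages and 2MB PMD-level mappings; the `SKETCH_*` constants and the `can_map_2mb()` / `pmd_aligned_pfn()` helpers are hypothetical names for illustration, not kernel APIs. The sketch compares the page-aligned offset within the 2MB region by masking with `PMD_SIZE - PAGE_SIZE`. Note that masking both addresses with `PMD_MASK & PAGE_MASK`, as the quoted hunk does, keeps the bits at and above the 2MB boundary rather than the offset within it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes (assumption): 64K base pages, 2MB PMD-level mappings. */
#define SKETCH_PAGE_SHIFT 16
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PMD_SIZE   ((uint64_t)1 << SKETCH_PMD_SHIFT)

/*
 * Hypothetical helper: can a fault at gpa, backed by host virtual address
 * hva on a host page of pte_size bytes, be mapped with a single 2MB PTE?
 */
static bool can_map_2mb(uint64_t gpa, uint64_t hva, uint64_t pte_size)
{
	/* Selects the page-aligned offset of an address within its 2MB region. */
	const uint64_t offmask = SKETCH_PMD_SIZE - SKETCH_PAGE_SIZE;

	return pte_size >= SKETCH_PMD_SIZE &&
	       (gpa & offmask) == (hva & offmask);
}

/* Round a host pfn down to the first pfn of its 2MB block. */
static uint64_t pmd_aligned_pfn(uint64_t pfn)
{
	return pfn & ~((SKETCH_PMD_SIZE >> SKETCH_PAGE_SHIFT) - 1);
}
```

For example, gpa 0x230000 and hva 0x7f0000430000 share the in-region offset 0x30000, so a 16MB huge page backing that hva would qualify, while hva 0x7f0000410000 would not.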
* Re: [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
2017-01-23 3:17 ` Suraj Jitindar Singh
@ 2017-01-23 4:38 ` Paul Mackerras
0 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-23 4:38 UTC (permalink / raw)
To: Suraj Jitindar Singh; +Cc: linuxppc-dev, kvm, kvm-ppc
On Mon, Jan 23, 2017 at 02:17:20PM +1100, Suraj Jitindar Singh wrote:
> On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote:
> > This adds the code to construct the second-level ("partition-scoped"
> > in architecturese) page tables for guests using the radix MMU. Apart
> > from the PGD level, which is allocated when the guest is created, the rest
> > of the tree is all constructed in response to hypervisor page faults.
> >
> > As well as hypervisor page faults for missing pages, we also get
> > faults for reference/change (RC) bits needing to be set, as well as various
> > other error conditions. For now, we only set the R or C bit in the
> > guest page table if the same bit is set in the host PTE for the
> > backing page.
> >
> > This code can take advantage of the guest being backed with either
> > transparent or ordinary 2MB huge pages, and insert 2MB page entries
> > into the guest page tables. There is no support for 1GB huge pages
> > yet.
[snip]
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 6bd0f4a..4c2d054 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3357,7 +3357,10 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
> >
> > kvmppc_free_vcores(kvm);
> >
> > - kvmppc_free_hpt(kvm);
> > + if (kvm->arch.radix)
> kvm_is_radix() for consistency?
Sure, and in the other places you noted.
Thanks,
Paul.
^ permalink raw reply [flat|nested] 27+ messages in thread
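The quoted commit message says the R or C bit is set in the guest's partition-scoped PTE only when the same bit is already set in the host PTE for the backing page; otherwise the DSISR_SET_RC path falls through to get_user_pages_fast() so the host page tables are updated first. The test in that path reduces to the predicate sketched below. The bit values and the helper name `host_pte_covers_rc()` are illustrative assumptions, not the real ppc64 PTE layout or a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values (assumption); not the real ppc64 PTE layout. */
#define RC_PRESENT  ((uint64_t)1 << 0)
#define RC_ACCESSED ((uint64_t)1 << 1)
#define RC_DIRTY    ((uint64_t)1 << 2)

/*
 * Hypothetical helper: true if the host PTE is present and already has the
 * reference bit (plus the change bit for a store), so the same bits may be
 * set in the guest's partition-scoped PTE without faulting the page in.
 */
static bool host_pte_covers_rc(uint64_t host_pte, bool writing)
{
	uint64_t need = RC_ACCESSED | (writing ? RC_DIRTY : 0);

	return (host_pte & RC_PRESENT) && (host_pte & need) == need;
}
```

A store to a page whose host PTE is referenced but not yet dirty therefore takes the slow path, which lets the host mark its own PTE dirty before the guest's C bit is set.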
* [PATCH 14/18] KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (12 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for " Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-23 3:21 ` Suraj Jitindar Singh
2017-01-12 9:07 ` [PATCH 15/18] KVM: PPC: Book3S HV: Implement dirty page logging " Paul Mackerras
` (3 subsequent siblings)
17 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix. These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 6 ++++
arch/powerpc/kvm/book3s_64_mmu_hv.c | 64 +++++++++++++++++++++++-----------
arch/powerpc/kvm/book3s_64_mmu_radix.c | 54 ++++++++++++++++++++++++++++
3 files changed, 103 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index ff5cd5c..952cc4b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -192,6 +192,12 @@ extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
extern void kvmppc_free_radix(struct kvm *kvm);
extern int kvmppc_radix_init(void);
extern void kvmppc_radix_exit(void);
+extern int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn);
+extern int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn);
+extern int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn);
/* XXX remove this export when load_last_inst() is generic */
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 57690c2..fbb3de4 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -701,12 +701,13 @@ static void kvmppc_rmap_reset(struct kvm *kvm)
srcu_read_unlock(&kvm->srcu, srcu_idx);
}
+typedef int (*hva_handler_fn)(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn);
+
static int kvm_handle_hva_range(struct kvm *kvm,
unsigned long start,
unsigned long end,
- int (*handler)(struct kvm *kvm,
- unsigned long *rmapp,
- unsigned long gfn))
+ hva_handler_fn handler)
{
int ret;
int retval = 0;
@@ -731,9 +732,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
for (; gfn < gfn_end; ++gfn) {
- gfn_t gfn_offset = gfn - memslot->base_gfn;
-
- ret = handler(kvm, &memslot->arch.rmap[gfn_offset], gfn);
+ ret = handler(kvm, memslot, gfn);
retval |= ret;
}
}
@@ -742,20 +741,21 @@ static int kvm_handle_hva_range(struct kvm *kvm,
}
static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
- int (*handler)(struct kvm *kvm, unsigned long *rmapp,
- unsigned long gfn))
+ hva_handler_fn handler)
{
return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
}
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn)
{
struct revmap_entry *rev = kvm->arch.revmap;
unsigned long h, i, j;
__be64 *hptep;
unsigned long ptel, psize, rcbits;
+ unsigned long *rmapp;
+ rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
for (;;) {
lock_rmap(rmapp);
if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
@@ -816,26 +816,36 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
{
- kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
+ hva_handler_fn handler;
+
+ handler = kvm->arch.radix ? kvm_unmap_radix : kvm_unmap_rmapp;
+ kvm_handle_hva(kvm, hva, handler);
return 0;
}
int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
{
- kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
+ hva_handler_fn handler;
+
+ handler = kvm->arch.radix ? kvm_unmap_radix : kvm_unmap_rmapp;
+ kvm_handle_hva_range(kvm, start, end, handler);
return 0;
}
void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- unsigned long *rmapp;
unsigned long gfn;
unsigned long n;
+ unsigned long *rmapp;
- rmapp = memslot->arch.rmap;
gfn = memslot->base_gfn;
- for (n = memslot->npages; n; --n) {
+ rmapp = memslot->arch.rmap;
+ for (n = memslot->npages; n; --n, ++gfn) {
+ if (kvm->arch.radix) {
+ kvm_unmap_radix(kvm, memslot, gfn);
+ continue;
+ }
/*
* Testing the present bit without locking is OK because
* the memslot has been marked invalid already, and hence
@@ -843,20 +853,21 @@ void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
* thus the present bit can't go from 0 to 1.
*/
if (*rmapp & KVMPPC_RMAP_PRESENT)
- kvm_unmap_rmapp(kvm, rmapp, gfn);
+ kvm_unmap_rmapp(kvm, memslot, gfn);
++rmapp;
- ++gfn;
}
}
-static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_age_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn)
{
struct revmap_entry *rev = kvm->arch.revmap;
unsigned long head, i, j;
__be64 *hptep;
int ret = 0;
+ unsigned long *rmapp;
+ rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
retry:
lock_rmap(rmapp);
if (*rmapp & KVMPPC_RMAP_REFERENCED) {
@@ -904,17 +915,22 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
int kvm_age_hva_hv(struct kvm *kvm, unsigned long start, unsigned long end)
{
- return kvm_handle_hva_range(kvm, start, end, kvm_age_rmapp);
+ hva_handler_fn handler;
+
+ handler = kvm->arch.radix ? kvm_age_radix : kvm_age_rmapp;
+ return kvm_handle_hva_range(kvm, start, end, handler);
}
-static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn)
{
struct revmap_entry *rev = kvm->arch.revmap;
unsigned long head, i, j;
unsigned long *hp;
int ret = 1;
+ unsigned long *rmapp;
+ rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
if (*rmapp & KVMPPC_RMAP_REFERENCED)
return 1;
@@ -940,12 +956,18 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
int kvm_test_age_hva_hv(struct kvm *kvm, unsigned long hva)
{
- return kvm_handle_hva(kvm, hva, kvm_test_age_rmapp);
+ hva_handler_fn handler;
+
+ handler = kvm->arch.radix ? kvm_test_age_radix : kvm_test_age_rmapp;
+ return kvm_handle_hva(kvm, hva, handler);
}
void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
{
- kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
+ hva_handler_fn handler;
+
+ handler = kvm->arch.radix ? kvm_unmap_radix : kvm_unmap_rmapp;
+ kvm_handle_hva(kvm, hva, handler);
}
static int vcpus_running(struct kvm *kvm)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 865ea9b..69cabad 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -463,6 +463,60 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
return ret;
}
+/* Called with kvm->lock held */
+int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn)
+{
+ pte_t *ptep;
+ unsigned long gpa = gfn << PAGE_SHIFT;
+ unsigned int shift;
+
+ ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
+ NULL, &shift);
+ if (ptep && pte_present(*ptep)) {
+ kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT, 0,
+ gpa, shift);
+ kvmppc_radix_tlbie_page(kvm, gpa, shift);
+ }
+ return 0;
+}
+
+/* Called with kvm->lock held */
+int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn)
+{
+ pte_t *ptep;
+ unsigned long gpa = gfn << PAGE_SHIFT;
+ unsigned int shift;
+ int ref = 0;
+
+ ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
+ NULL, &shift);
+ if (ptep && pte_present(*ptep) && pte_young(*ptep)) {
+ kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0,
+ gpa, shift);
+ /* XXX need to flush tlb here? */
+ ref = 1;
+ }
+ return ref;
+}
+
+/* Called with kvm->lock held */
+int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn)
+{
+ pte_t *ptep;
+ unsigned long gpa = gfn << PAGE_SHIFT;
+ unsigned int shift;
+ int ref = 0;
+
+ ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
+ NULL, &shift);
+ if (ptep && pte_present(*ptep) && pte_young(*ptep))
+ ref = 1;
+ return ref;
+}
+
void kvmppc_free_radix(struct kvm *kvm)
{
unsigned long ig, iu, im;
--
2.7.4
^ permalink raw reply related [flat|nested] 27+ messages in thread
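The patch above changes every per-gfn handler to take `(kvm, memslot, gfn)` so that the HPT and radix implementations share one `hva_handler_fn` type, and each MMU notifier entry point picks an implementation with a single test of the MMU mode before walking the gfn range. A simplified user-space model of that dispatch follows; `struct vm`, `struct memslot`, and the `young_*` handlers are hypothetical stand-ins for `struct kvm`, `struct kvm_memory_slot`, and the real `kvm_test_age_*` functions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct kvm and struct kvm_memory_slot. */
struct vm {
	bool radix;
	unsigned char young[64];	/* radix model: "accessed" flag per gfn */
};

struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
	unsigned char rmap[16];		/* HPT model: "referenced" flag per page */
};

/* Same shape as the patch's hva_handler_fn: (vm, memslot, gfn). */
typedef int (*gfn_handler_fn)(struct vm *vm, struct memslot *slot,
			      unsigned long gfn);

/* Radix-style handler: one PTE per gfn, no rmap chain to walk. */
static int young_radix(struct vm *vm, struct memslot *slot, unsigned long gfn)
{
	(void)slot;
	return vm->young[gfn] != 0;
}

/* HPT-style handler: derive the rmap entry from the memslot itself. */
static int young_hpt(struct vm *vm, struct memslot *slot, unsigned long gfn)
{
	(void)vm;
	return slot->rmap[gfn - slot->base_gfn] != 0;
}

/* Apply a handler to every gfn in [start, end) and OR the results. */
static int handle_gfn_range(struct vm *vm, struct memslot *slot,
			    unsigned long start, unsigned long end,
			    gfn_handler_fn handler)
{
	int retval = 0;

	for (unsigned long gfn = start; gfn < end; ++gfn)
		retval |= handler(vm, slot, gfn);
	return retval;
}

/* Entry point: choose the implementation once, based on the MMU mode. */
static int test_young_range(struct vm *vm, struct memslot *slot,
			    unsigned long start, unsigned long end)
{
	gfn_handler_fn handler = vm->radix ? young_radix : young_hpt;

	return handle_gfn_range(vm, slot, start, end, handler);
}
```

Passing the memslot instead of a precomputed rmap pointer is what lets the radix handlers ignore the rmap array entirely while the range-walking code stays shared.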
* Re: [PATCH 14/18] KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
2017-01-12 9:07 ` [PATCH 14/18] KVM: PPC: Book3S HV: MMU notifier callbacks " Paul Mackerras
@ 2017-01-23 3:21 ` Suraj Jitindar Singh
0 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2017-01-23 3:21 UTC (permalink / raw)
To: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote:
> This adapts our implementations of the MMU notifier callbacks
> (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
> to call radix functions when the guest is using radix. These
> implementations are much simpler than for HPT guests because we
> have only one PTE to deal with, so we don't need to traverse
> rmap chains.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 6 ++++
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 64 +++++++++++++++++++++++-----------
> arch/powerpc/kvm/book3s_64_mmu_radix.c | 54 ++++++++++++++++++++++++++++
> 3 files changed, 103 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index ff5cd5c..952cc4b 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -192,6 +192,12 @@ extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> extern void kvmppc_free_radix(struct kvm *kvm);
> extern int kvmppc_radix_init(void);
> extern void kvmppc_radix_exit(void);
> +extern int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn);
> +extern int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn);
> +extern int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn);
>
> /* XXX remove this export when load_last_inst() is generic */
> extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 57690c2..fbb3de4 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -701,12 +701,13 @@ static void kvmppc_rmap_reset(struct kvm *kvm)
> srcu_read_unlock(&kvm->srcu, srcu_idx);
> }
>
> +typedef int (*hva_handler_fn)(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn);
> +
> static int kvm_handle_hva_range(struct kvm *kvm,
> unsigned long start,
> unsigned long end,
> - int (*handler)(struct kvm *kvm,
> - unsigned long *rmapp,
> - unsigned long gfn))
> + hva_handler_fn handler)
> {
> int ret;
> int retval = 0;
> @@ -731,9 +732,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
> gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
>
> for (; gfn < gfn_end; ++gfn) {
> - gfn_t gfn_offset = gfn - memslot->base_gfn;
> -
> - ret = handler(kvm, &memslot->arch.rmap[gfn_offset], gfn);
> + ret = handler(kvm, memslot, gfn);
> retval |= ret;
> }
> }
> @@ -742,20 +741,21 @@ static int kvm_handle_hva_range(struct kvm *kvm,
> }
>
> static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> - int (*handler)(struct kvm *kvm, unsigned long *rmapp,
> - unsigned long gfn))
> + hva_handler_fn handler)
> {
> return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
> }
>
> -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
> +static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
> unsigned long gfn)
> {
> struct revmap_entry *rev = kvm->arch.revmap;
> unsigned long h, i, j;
> __be64 *hptep;
> unsigned long ptel, psize, rcbits;
> + unsigned long *rmapp;
>
> + rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
> for (;;) {
> lock_rmap(rmapp);
> if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
> @@ -816,26 +816,36 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>
> int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
> {
> - kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> + hva_handler_fn handler;
> +
> + handler = kvm->arch.radix ? kvm_unmap_radix : kvm_unmap_rmapp;
kvm_is_radix() for consistency?
> + kvm_handle_hva(kvm, hva, handler);
> return 0;
> }
>
> int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
> {
> - kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
> + hva_handler_fn handler;
> +
> + handler = kvm->arch.radix ? kvm_unmap_radix : kvm_unmap_rmapp;
ditto
> + kvm_handle_hva_range(kvm, start, end, handler);
> return 0;
> }
>
> void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
> struct kvm_memory_slot *memslot)
> {
> - unsigned long *rmapp;
> unsigned long gfn;
> unsigned long n;
> + unsigned long *rmapp;
>
> - rmapp = memslot->arch.rmap;
> gfn = memslot->base_gfn;
> - for (n = memslot->npages; n; --n) {
> + rmapp = memslot->arch.rmap;
> + for (n = memslot->npages; n; --n, ++gfn) {
> + if (kvm->arch.radix) {
ditto
> + kvm_unmap_radix(kvm, memslot, gfn);
> + continue;
> + }
> /*
> * Testing the present bit without locking is OK because
> * the memslot has been marked invalid already, and hence
> @@ -843,20 +853,21 @@ void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
> * thus the present bit can't go from 0 to 1.
> */
> if (*rmapp & KVMPPC_RMAP_PRESENT)
> - kvm_unmap_rmapp(kvm, rmapp, gfn);
> + kvm_unmap_rmapp(kvm, memslot, gfn);
> ++rmapp;
> - ++gfn;
> }
> }
>
> -static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
> +static int kvm_age_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
> unsigned long gfn)
> {
> struct revmap_entry *rev = kvm->arch.revmap;
> unsigned long head, i, j;
> __be64 *hptep;
> int ret = 0;
> + unsigned long *rmapp;
>
> + rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
> retry:
> lock_rmap(rmapp);
> if (*rmapp & KVMPPC_RMAP_REFERENCED) {
> @@ -904,17 +915,22 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>
> int kvm_age_hva_hv(struct kvm *kvm, unsigned long start, unsigned long end)
> {
> - return kvm_handle_hva_range(kvm, start, end, kvm_age_rmapp);
> + hva_handler_fn handler;
> +
> + handler = kvm->arch.radix ? kvm_age_radix : kvm_age_rmapp;
ditto
> + return kvm_handle_hva_range(kvm, start, end, handler);
> }
>
> -static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
> +static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
> unsigned long gfn)
> {
> struct revmap_entry *rev = kvm->arch.revmap;
> unsigned long head, i, j;
> unsigned long *hp;
> int ret = 1;
> + unsigned long *rmapp;
>
> + rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
> if (*rmapp & KVMPPC_RMAP_REFERENCED)
> return 1;
>
> @@ -940,12 +956,18 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>
> int kvm_test_age_hva_hv(struct kvm *kvm, unsigned long hva)
> {
> - return kvm_handle_hva(kvm, hva, kvm_test_age_rmapp);
> + hva_handler_fn handler;
> +
> + handler = kvm->arch.radix ? kvm_test_age_radix : kvm_test_age_rmapp;
ditto
> + return kvm_handle_hva(kvm, hva, handler);
> }
>
> void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
> {
> - kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> + hva_handler_fn handler;
> +
> + handler = kvm->arch.radix ? kvm_unmap_radix : kvm_unmap_rmapp;
ditto
> + kvm_handle_hva(kvm, hva, handler);
> }
>
> static int vcpus_running(struct kvm *kvm)
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 865ea9b..69cabad 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -463,6 +463,60 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
> return ret;
> }
>
> +/* Called with kvm->lock held */
> +int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn)
> +{
> + pte_t *ptep;
> + unsigned long gpa = gfn << PAGE_SHIFT;
> + unsigned int shift;
> +
> + ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
> + NULL, &shift);
> + if (ptep && pte_present(*ptep)) {
> + kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT, 0,
> + gpa, shift);
> + kvmppc_radix_tlbie_page(kvm, gpa, shift);
> + }
> + return 0;
> +}
> +
> +/* Called with kvm->lock held */
> +int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn)
> +{
> + pte_t *ptep;
> + unsigned long gpa = gfn << PAGE_SHIFT;
> + unsigned int shift;
> + int ref = 0;
> +
> + ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
> + NULL, &shift);
> + if (ptep && pte_present(*ptep) && pte_young(*ptep)) {
> + kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0,
> + gpa, shift);
> + /* XXX need to flush tlb here? */
> + ref = 1;
> + }
> + return ref;
> +}
> +
> +/* Called with kvm->lock held */
> +int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> + unsigned long gfn)
> +{
> + pte_t *ptep;
> + unsigned long gpa = gfn << PAGE_SHIFT;
> + unsigned int shift;
> + int ref = 0;
> +
> + ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
> + NULL, &shift);
> + if (ptep && pte_present(*ptep) && pte_young(*ptep))
> + ref = 1;
> + return ref;
> +}
> +
> void kvmppc_free_radix(struct kvm *kvm)
> {
> unsigned long ig, iu, im;
^ permalink raw reply [flat|nested] 27+ messages in thread
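In kvmppc_core_flush_memslot_hv() the patch moves `++gfn` from the bottom of the loop body into the `for` statement. That matters because the new radix branch ends with `continue`: an increment at the bottom of the body would be skipped, and every radix iteration would unmap the same gfn. A small model of the corrected loop shape follows; `flush_slot()` and the `visited` array are hypothetical, standing in for the real unmap calls.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the flush loop: record each gfn that a branch
 * would unmap. gfn advances in the for-statement, so the "continue" in
 * the radix branch cannot skip the increment.
 */
static unsigned long flush_slot(unsigned long base_gfn, unsigned long npages,
				bool radix, unsigned long *visited)
{
	unsigned long gfn = base_gfn;
	unsigned long n, count = 0;

	for (n = npages; n; --n, ++gfn) {
		if (radix) {
			visited[count++] = gfn;	/* stands in for kvm_unmap_radix() */
			continue;
		}
		/* The HPT branch also advances its rmap pointer in the real code. */
		visited[count++] = gfn;		/* stands in for kvm_unmap_rmapp() */
	}
	return count;
}
```

With `++gfn` left at the bottom of the body instead, the radix case would record `base_gfn` on every iteration.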
* [PATCH 15/18] KVM: PPC: Book3S HV: Implement dirty page logging for radix guests
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (13 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 14/18] KVM: PPC: Book3S HV: MMU notifier callbacks " Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 16/18] KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode Paul Mackerras
` (2 subsequent siblings)
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC (permalink / raw)
To: linuxppc-dev, kvm, kvm-ppc
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests. We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out). This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 7 ++-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++++-----
arch/powerpc/kvm/book3s_64_mmu_radix.c | 111 ++++++++++++++++++++++++++++++---
arch/powerpc/kvm/book3s_hv.c | 31 +++++++--
4 files changed, 144 insertions(+), 33 deletions(-)
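A simplified, single-threaded model of the dirty-tracking scheme described above: invalidating a PTE returns its previous value, and if that value was dirty the page is recorded in the first half of the bitmap; collecting the dirty log harvests (and clears) the dirty bits of still-present PTEs into the second half, then merges and resets the first half. The `sk_*` names, the bit values, and the one-PTE-per-page layout are assumptions for illustration. The kernel uses its own PTE update routine and little-endian bitmap helpers such as `__set_bit_le()`, where this sketch uses C11 atomics and native bit order.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative PTE bits (assumption); not the real ppc64 radix layout. */
#define SK_PRESENT ((uint64_t)1 << 0)
#define SK_DIRTY   ((uint64_t)1 << 1)

#define BITS_PER_UL  (sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR(n) (((n) + BITS_PER_UL - 1) / BITS_PER_UL)

/* Hypothetical memslot: one PTE per page; dirty_bitmap has two halves. */
struct sk_slot {
	unsigned long npages;
	_Atomic uint64_t *ptes;
	unsigned long *dirty_bitmap;	/* 2 * LONGS_FOR(npages) longs */
};

static void sk_set_bit(unsigned long nr, unsigned long *map)
{
	map[nr / BITS_PER_UL] |= 1UL << (nr % BITS_PER_UL);
}

static int sk_test_bit(unsigned long nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1;
}

/* Atomically clear then set PTE bits; return the previous PTE value. */
static uint64_t sk_pte_update(_Atomic uint64_t *ptep, uint64_t clr,
			      uint64_t set)
{
	uint64_t old = atomic_load(ptep);

	/* On failure, old is refreshed with the current value and we retry. */
	while (!atomic_compare_exchange_weak(ptep, &old, (old & ~clr) | set))
		;
	return old;
}

/* Invalidate a PTE; if it was dirty, remember the page in the first half. */
static void sk_invalidate(struct sk_slot *s, unsigned long i)
{
	uint64_t old = sk_pte_update(&s->ptes[i], SK_PRESENT, 0);

	if (old & SK_DIRTY)
		sk_set_bit(i, s->dirty_bitmap);
}

/*
 * Build the bitmap for userspace in the second half: harvest (and clear)
 * the dirty bits of present PTEs, then fold in and reset the first half.
 */
static unsigned long *sk_get_dirty_log(struct sk_slot *s)
{
	size_t n = LONGS_FOR(s->npages);
	unsigned long *saved = s->dirty_bitmap;
	unsigned long *out = s->dirty_bitmap + n;
	unsigned long i;

	memset(out, 0, n * sizeof(*out));
	for (i = 0; i < s->npages; ++i) {
		if (!(atomic_load(&s->ptes[i]) & SK_PRESENT))
			continue;	/* invalidated pages are in the first half */
		if (sk_pte_update(&s->ptes[i], SK_DIRTY, 0) & SK_DIRTY)
			sk_set_bit(i, out);
	}
	for (i = 0; i < n; ++i) {
		out[i] |= saved[i];
		saved[i] = 0;
	}
	return out;
}
```

Keeping the saved bits separate means a page that was dirtied and then invalidated (for example, paged out) between two dirty-log calls is still reported, even though its PTE no longer carries a dirty bit.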
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 952cc4b..57dc407 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -198,6 +198,8 @@ extern int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn);
extern int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn);
+extern long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
+ struct kvm_memory_slot *memslot, unsigned long *map);
/* XXX remove this export when load_last_inst() is generic */
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
@@ -228,8 +230,11 @@ extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
extern long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
unsigned long pte_index, unsigned long avpn,
unsigned long *hpret);
-extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
+extern long kvmppc_hv_get_dirty_log_hpt(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
+extern void kvmppc_harvest_vpa_dirty(struct kvmppc_vpa *vpa,
+ struct kvm_memory_slot *memslot,
+ unsigned long *map);
extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
unsigned long mask);
extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fbb3de4..7a9afbe 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1068,7 +1068,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
return npages_dirty;
}
-static void harvest_vpa_dirty(struct kvmppc_vpa *vpa,
+void kvmppc_harvest_vpa_dirty(struct kvmppc_vpa *vpa,
struct kvm_memory_slot *memslot,
unsigned long *map)
{
@@ -1086,12 +1086,11 @@ static void harvest_vpa_dirty(struct kvmppc_vpa *vpa,
__set_bit_le(gfn - memslot->base_gfn, map);
}
-long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot,
- unsigned long *map)
+long kvmppc_hv_get_dirty_log_hpt(struct kvm *kvm,
+ struct kvm_memory_slot *memslot, unsigned long *map)
{
unsigned long i, j;
unsigned long *rmapp;
- struct kvm_vcpu *vcpu;
preempt_disable();
rmapp = memslot->arch.rmap;
@@ -1107,15 +1106,6 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot,
__set_bit_le(j, map);
++rmapp;
}
-
- /* Harvest dirty bits from VPA and DTL updates */
- /* Note: we never modify the SLB shadow buffer areas */
- kvm_for_each_vcpu(i, vcpu, kvm) {
- spin_lock(&vcpu->arch.vpa_update_lock);
- harvest_vpa_dirty(&vcpu->arch.vpa, memslot, map);
- harvest_vpa_dirty(&vcpu->arch.dtl, memslot, map);
- spin_unlock(&vcpu->arch.vpa_update_lock);
- }
preempt_enable();
return 0;
}
@@ -1170,10 +1160,14 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa,
srcu_idx = srcu_read_lock(&kvm->srcu);
memslot = gfn_to_memslot(kvm, gfn);
if (memslot) {
- rmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
- lock_rmap(rmap);
- *rmap |= KVMPPC_RMAP_CHANGED;
- unlock_rmap(rmap);
+ if (!kvm_is_radix(kvm)) {
+ rmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
+ lock_rmap(rmap);
+ *rmap |= KVMPPC_RMAP_CHANGED;
+ unlock_rmap(rmap);
+ } else if (memslot->dirty_bitmap) {
+ mark_page_dirty(kvm, gfn);
+ }
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 69cabad..125cc7c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -158,18 +158,21 @@ static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
asm volatile("ptesync": : :"memory");
}
-void kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep, unsigned long clr,
- unsigned long set, unsigned long addr,
- unsigned int shift)
+unsigned long kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep,
+ unsigned long clr, unsigned long set,
+ unsigned long addr, unsigned int shift)
{
+ unsigned long old = 0;
+
if (!(clr & _PAGE_PRESENT) && cpu_has_feature(CPU_FTR_POWER9_DD1) &&
pte_present(*ptep)) {
/* have to invalidate it first */
- __radix_pte_update(ptep, _PAGE_PRESENT, 0);
+ old = __radix_pte_update(ptep, _PAGE_PRESENT, 0);
kvmppc_radix_tlbie_page(kvm, addr, shift);
set |= _PAGE_PRESENT;
+ old &= _PAGE_PRESENT;
}
- __radix_pte_update(ptep, clr, set);
+ return __radix_pte_update(ptep, clr, set) | old;
}
void kvmppc_radix_set_pte_at(struct kvm *kvm, unsigned long addr,
@@ -197,6 +200,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
pud_t *pud, *new_pud = NULL;
pmd_t *pmd, *new_pmd = NULL;
pte_t *ptep, *new_ptep = NULL;
+ unsigned long old;
int ret;
/* Traverse the guest's 2nd-level tree, allocate new levels needed */
@@ -262,9 +266,11 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
ptep = pte_offset_kernel(pmd, gpa);
if (pte_present(*ptep)) {
/* PTE was previously valid, so invalidate it */
- kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
- 0, gpa, 0);
+ old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
+ 0, gpa, 0);
kvmppc_radix_tlbie_page(kvm, gpa, 0);
+ if (old & _PAGE_DIRTY)
+ mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
}
kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
} else {
@@ -463,6 +469,26 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
return ret;
}
+static void mark_pages_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn, unsigned int order)
+{
+ unsigned long i, limit;
+ unsigned long *dp;
+
+ if (!memslot->dirty_bitmap)
+ return;
+ limit = 1ul << order;
+ if (limit < BITS_PER_LONG) {
+ for (i = 0; i < limit; ++i)
+ mark_page_dirty(kvm, gfn + i);
+ return;
+ }
+ dp = memslot->dirty_bitmap + (gfn - memslot->base_gfn);
+ limit /= BITS_PER_LONG;
+ for (i = 0; i < limit; ++i)
+ *dp++ = ~0ul;
+}
+
/* Called with kvm->lock held */
int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn)
@@ -470,13 +496,21 @@ int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
pte_t *ptep;
unsigned long gpa = gfn << PAGE_SHIFT;
unsigned int shift;
+ unsigned long old;
ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
NULL, &shift);
if (ptep && pte_present(*ptep)) {
- kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT, 0,
- gpa, shift);
+ old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT, 0,
+ gpa, shift);
kvmppc_radix_tlbie_page(kvm, gpa, shift);
+ if (old & _PAGE_DIRTY) {
+ if (!shift)
+ mark_page_dirty(kvm, gfn);
+ else
+ mark_pages_dirty(kvm, memslot,
+ gfn, shift - PAGE_SHIFT);
+ }
}
return 0;
}
@@ -517,6 +551,65 @@ int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
return ref;
}
+/* Returns the number of PAGE_SIZE pages that are dirty */
+static int kvm_radix_test_clear_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *memslot, int pagenum)
+{
+ unsigned long gfn = memslot->base_gfn + pagenum;
+ unsigned long gpa = gfn << PAGE_SHIFT;
+ pte_t *ptep;
+ unsigned int shift;
+ int ret = 0;
+
+ ptep = __find_linux_pte_or_hugepte(kvm->arch.pgtable, gpa,
+ NULL, &shift);
+ if (ptep && pte_present(*ptep) && pte_dirty(*ptep)) {
+ ret = 1;
+ if (shift)
+ ret = 1 << (shift - PAGE_SHIFT);
+ kvmppc_radix_update_pte(kvm, ptep, _PAGE_DIRTY, 0,
+ gpa, shift);
+ kvmppc_radix_tlbie_page(kvm, gpa, shift);
+ }
+ return ret;
+}
+
+long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
+ struct kvm_memory_slot *memslot, unsigned long *map)
+{
+ unsigned long i, j;
+ unsigned long n, *p;
+ int npages;
+
+ /*
+ * Radix accumulates dirty bits in the first half of the
+ * memslot's dirty_bitmap area, for when pages are paged
+ * out or modified by the host directly. Pick up these
+ * bits and add them to the map.
+ */
+ n = kvm_dirty_bitmap_bytes(memslot) / sizeof(long);
+ p = memslot->dirty_bitmap;
+ for (i = 0; i < n; ++i)
+ map[i] |= xchg(&p[i], 0);
+
+ for (i = 0; i < memslot->npages; i = j) {
+ npages = kvm_radix_test_clear_dirty(kvm, memslot, i);
+
+ /*
+ * Note that if npages > 0 then i must be a multiple of npages,
+ * since huge pages are only used to back the guest at guest
+ * real addresses that are a multiple of their size.
+ * Since we have at most one PTE covering any given guest
+ * real address, if npages > 1 we can skip to i + npages.
+ */
+ j = i + 1;
+ if (npages)
+ for (j = i; npages; ++j, --npages)
+ __set_bit_le(j, map);
+ }
+ return 0;
+}
+
void kvmppc_free_radix(struct kvm *kvm)
{
unsigned long ig, iu, im;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4c2d054..ab5adcd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2962,8 +2962,10 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
- int r;
+ int i, r;
unsigned long n;
+ unsigned long *buf;
+ struct kvm_vcpu *vcpu;
mutex_lock(&kvm->slots_lock);
@@ -2977,15 +2979,32 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
if (!memslot->dirty_bitmap)
goto out;
+ /*
+ * Use second half of bitmap area because radix accumulates
+ * bits in the first half.
+ */
n = kvm_dirty_bitmap_bytes(memslot);
- memset(memslot->dirty_bitmap, 0, n);
+ buf = memslot->dirty_bitmap + n / sizeof(long);
+ memset(buf, 0, n);
- r = kvmppc_hv_get_dirty_log(kvm, memslot, memslot->dirty_bitmap);
+ if (kvm_is_radix(kvm))
+ r = kvmppc_hv_get_dirty_log_radix(kvm, memslot, buf);
+ else
+ r = kvmppc_hv_get_dirty_log_hpt(kvm, memslot, buf);
if (r)
goto out;
+ /* Harvest dirty bits from VPA and DTL updates */
+ /* Note: we never modify the SLB shadow buffer areas */
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ spin_lock(&vcpu->arch.vpa_update_lock);
+ kvmppc_harvest_vpa_dirty(&vcpu->arch.vpa, memslot, buf);
+ kvmppc_harvest_vpa_dirty(&vcpu->arch.dtl, memslot, buf);
+ spin_unlock(&vcpu->arch.vpa_update_lock);
+ }
+
r = -EFAULT;
- if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+ if (copy_to_user(log->dirty_bitmap, buf, n))
goto out;
r = 0;
@@ -3038,7 +3057,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
if (npages)
atomic64_inc(&kvm->arch.mmio_update);
- if (npages && old->npages) {
+ if (npages && old->npages && !kvm_is_radix(kvm)) {
/*
* If modifying a memslot, reset all the rmap dirty bits.
* If this is a new memslot, we don't need to do anything
@@ -3047,7 +3066,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
*/
slots = kvm_memslots(kvm);
memslot = id_to_memslot(slots, mem->slot);
- kvmppc_hv_get_dirty_log(kvm, memslot, NULL);
+ kvmppc_hv_get_dirty_log_hpt(kvm, memslot, NULL);
}
}
--
2.7.4
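The radix dirty-log scheme above splits each memslot's dirty_bitmap allocation in two. The first half accumulates bits set through mark_page_dirty(), which happens when a dirty PTE is invalidated or unmapped, or when the host modifies guest memory directly. The second half is zeroed and used as the buffer copied out to userspace. The following is a user-space model of that flow, not kernel code, and it rests on several assumptions:

- struct model_pte, collect_dirty() and the model_* helpers are hypothetical.
- A flat per-page array stands in for the guest's radix tree.
- The GCC/Clang builtin __atomic_exchange_n() stands in for the kernel's xchg().
- Bits are numbered natively rather than in the little-endian layout that __set_bit_le() uses.

```c
#include <limits.h>
#include <string.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the PTE found at a guest page:
 * order 0 is a base page, order k covers 2^k base pages. */
struct model_pte {
	int present;
	int dirty;
	unsigned int order;
};

/* Words needed for one half of the bitmap area. */
static unsigned long model_bitmap_longs(unsigned long npages)
{
	return (npages + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;
}

static void model_set_bit(unsigned long nr, unsigned long *map)
{
	map[nr / MODEL_BITS_PER_LONG] |= 1UL << (nr % MODEL_BITS_PER_LONG);
}

/*
 * 'area' holds 2 * model_bitmap_longs(npages) words. ptes[i] describes
 * the mapping that starts at page i; a zeroed entry means none does.
 * Returns a pointer to the second half, which holds the result.
 */
static unsigned long *collect_dirty(unsigned long *area, unsigned long npages,
				    struct model_pte *ptes)
{
	unsigned long n = model_bitmap_longs(npages);
	unsigned long *buf = area + n;	/* second half: result buffer */
	unsigned long i, j, cnt;

	memset(buf, 0, n * sizeof(*buf));

	/* Take and clear the bits accumulated in the first half. */
	for (i = 0; i < n; ++i)
		buf[i] |= __atomic_exchange_n(&area[i], 0UL, __ATOMIC_SEQ_CST);

	for (i = 0; i < npages; i = j) {
		cnt = 0;
		if (ptes[i].present && ptes[i].dirty) {
			cnt = 1UL << ptes[i].order;
			ptes[i].dirty = 0;	/* test-and-clear */
		}
		/* A dirty mapping of cnt pages starts at a multiple of cnt,
		 * so mark every page it covers and resume after it. */
		j = i + 1;
		if (cnt)
			for (j = i; cnt; ++j, --cnt)
				model_set_bit(j, buf);
	}
	return buf;
}
```

Because a dirty huge mapping of 2^k pages is aligned to 2^k pages, the scan can mark all of its pages at once and resume immediately after it. A mapping that is present but clean contributes nothing. A second call with no new activity returns an all-zero result, because the first call cleared both the accumulator and the PTE dirty flags.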
* [PATCH 16/18] KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (14 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 15/18] KVM: PPC: Book3S HV: Implement dirty page logging " Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-12 9:07 ` [PATCH 17/18] KVM: PPC: Book3S HV: Enable radix guest support Paul Mackerras
2017-01-12 9:07 ` [PATCH 18/18] KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9 Paul Mackerras
17 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
If the guest is in radix mode, then it doesn't have a hashed page
table (HPT), so all of the hypercalls that manipulate the HPT can't
work and should return an error. This adds checks to make them
return H_FUNCTION ("function not supported").
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9ef3c4b..6c1ac3d 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -182,6 +182,8 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
unsigned long mmu_seq;
unsigned long rcbits, irq_flags = 0;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
psize = hpte_page_size(pteh, ptel);
if (!psize)
return H_PARAMETER;
@@ -458,6 +460,8 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
struct revmap_entry *rev;
u64 pte, orig_pte, pte_r;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
@@ -529,6 +533,8 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
struct revmap_entry *rev, *revs[4];
u64 hp0, hp1;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
global = global_invalidates(kvm, 0);
for (i = 0; i < 4 && ret == H_SUCCESS; ) {
n = 0;
@@ -642,6 +648,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long v, r, rb, mask, bits;
u64 pte_v, pte_r;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
@@ -711,6 +719,8 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
int i, n = 1;
struct revmap_entry *rev = NULL;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
if (flags & H_READ_4) {
@@ -750,6 +760,8 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long *rmap;
long ret = H_NOT_FOUND;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
@@ -796,6 +808,8 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long *rmap;
long ret = H_NOT_FOUND;
+ if (kvm_is_radix(kvm))
+ return H_FUNCTION;
if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
--
2.7.4
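Every hunk above places the radix test ahead of the existing argument checks. A radix guest therefore gets H_FUNCTION (call unsupported) even when it passes an out-of-range pte_index. It does not get H_PARAMETER from a range check against an HPT it does not have. The following is a minimal model of that ordering. struct model_kvm, model_h_read() and the MODEL_H_* values are hypothetical stand-ins, and the numeric codes are placeholders rather than a restatement of asm/hvcall.h.

```c
#include <stdbool.h>

/* Placeholder return codes for this sketch only. */
#define MODEL_H_SUCCESS    0
#define MODEL_H_FUNCTION  (-2)
#define MODEL_H_PARAMETER (-4)

/* Hypothetical, pared-down VM state. */
struct model_kvm {
	bool radix;              /* guest uses the radix MMU */
	unsigned long hpt_npte;  /* number of HPTEs (HPT guests only) */
};

/* Sketch of an HPT-only hcall such as H_READ: reject the call outright
 * for radix guests before looking at any HPT-related argument. */
static long model_h_read(const struct model_kvm *kvm, unsigned long pte_index)
{
	if (kvm->radix)
		return MODEL_H_FUNCTION;
	if (pte_index >= kvm->hpt_npte)
		return MODEL_H_PARAMETER;
	/* ... the real handler would read the HPTE here ... */
	return MODEL_H_SUCCESS;
}
```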
* [PATCH 17/18] KVM: PPC: Book3S HV: Enable radix guest support
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
` (15 preceding siblings ...)
2017-01-12 9:07 ` [PATCH 16/18] KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
2017-01-23 3:31 ` Suraj Jitindar Singh
2017-01-12 9:07 ` [PATCH 18/18] KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9 Paul Mackerras
17 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
This adds a few last pieces of the support for radix guests:
* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
KVM_PPC_GET_RMMU_INFO ioctls for radix guests
* On POWER9, allow secondary threads to be on/off-lined while guests
are running.
* Set up LPCR and the partition table entry for radix guests.
* Don't allocate the rmap array in the kvm_memory_slot structure
on radix.
* Prevent the AIL field in the LPCR being set for radix guests,
since we can't yet handle getting interrupts from the guest with
the MMU on.
* Don't try to initialize the HPT for radix guests, since they don't
have an HPT.
* Take out the code that prevents the HV KVM module from
initializing on radix hosts.
At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode. Thus a guest cannot switch from one mode to the other,
which enables some simplifications.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 2 +
arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 -
arch/powerpc/kvm/book3s_64_mmu_radix.c | 45 ++++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 93 ++++++++++++++++++++++++----------
arch/powerpc/kvm/powerpc.c | 2 +-
5 files changed, 115 insertions(+), 28 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 57dc407..2bf3501 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -189,6 +189,7 @@ extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
unsigned long ea, unsigned long dsisr);
extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data, bool iswrite);
+extern int kvmppc_init_vm_radix(struct kvm *kvm);
extern void kvmppc_free_radix(struct kvm *kvm);
extern int kvmppc_radix_init(void);
extern void kvmppc_radix_exit(void);
@@ -200,6 +201,7 @@ extern int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long gfn);
extern long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
+extern int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info);
/* XXX remove this export when load_last_inst() is generic */
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7a9afbe..db8de17 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -155,7 +155,6 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
void kvmppc_free_hpt(struct kvm *kvm)
{
- kvmppc_free_lpid(kvm->arch.lpid);
vfree(kvm->arch.revmap);
if (kvm->arch.hpt_cma_alloc)
kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 125cc7c..4344651 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -610,6 +610,51 @@ long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
return 0;
}
+static void add_rmmu_ap_encoding(struct kvm_ppc_rmmu_info *info,
+ int psize, int *indexp)
+{
+ if (!mmu_psize_defs[psize].shift)
+ return;
+ info->ap_encodings[*indexp] = mmu_psize_defs[psize].shift |
+ (mmu_psize_defs[psize].ap << 29);
+ ++(*indexp);
+}
+
+int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info)
+{
+ int i;
+
+ if (!radix_enabled())
+ return -EINVAL;
+ memset(info, 0, sizeof(*info));
+
+ /* 4k page size */
+ info->geometries[0].page_shift = 12;
+ info->geometries[0].level_bits[0] = 9;
+ for (i = 1; i < 4; ++i)
+ info->geometries[0].level_bits[i] = p9_supported_radix_bits[i];
+ /* 64k page size */
+ info->geometries[1].page_shift = 16;
+ for (i = 0; i < 4; ++i)
+ info->geometries[1].level_bits[i] = p9_supported_radix_bits[i];
+
+ i = 0;
+ add_rmmu_ap_encoding(info, MMU_PAGE_4K, &i);
+ add_rmmu_ap_encoding(info, MMU_PAGE_64K, &i);
+ add_rmmu_ap_encoding(info, MMU_PAGE_2M, &i);
+ add_rmmu_ap_encoding(info, MMU_PAGE_1G, &i);
+
+ return 0;
+}
+
+int kvmppc_init_vm_radix(struct kvm *kvm)
+{
+ kvm->arch.pgtable = pgd_alloc(kvm->mm);
+ if (!kvm->arch.pgtable)
+ return -ENOMEM;
+ return 0;
+}
+
void kvmppc_free_radix(struct kvm *kvm)
{
unsigned long ig, iu, im;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ab5adcd..14a9efe 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1136,10 +1136,13 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
/*
* Userspace can only modify DPFD (default prefetch depth),
* ILE (interrupt little-endian) and TC (translation control).
- * On POWER8 userspace can also modify AIL (alt. interrupt loc.)
+ * On POWER8 userspace can also modify AIL (alt. interrupt loc.).
+ * On POWER9 with a radix guest, we can't allow AIL to be set
+ * since we don't yet have KVM handlers in the relocation-on
+ * interrupt vectors.
*/
mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
- if (cpu_has_feature(CPU_FTR_ARCH_207S))
+ if (cpu_has_feature(CPU_FTR_ARCH_207S) && !kvm_is_radix(kvm))
mask |= LPCR_AIL;
/* Broken 32-bit version of LPCR must not clear top bits */
@@ -2878,7 +2881,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
smp_mb();
/* On the first time here, set up HTAB and VRMA */
- if (!vcpu->kvm->arch.hpte_setup_done) {
+ if (!kvm_is_radix(vcpu->kvm) && !vcpu->kvm->arch.hpte_setup_done) {
r = kvmppc_hv_setup_htab_rma(vcpu);
if (r)
goto out;
@@ -2940,6 +2943,13 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm *kvm,
{
struct kvm_ppc_one_seg_page_size *sps;
+ /*
+ * Since we don't yet support HPT guests on a radix host,
+ * return an error if the host uses radix.
+ */
+ if (radix_enabled())
+ return -EINVAL;
+
info->flags = KVM_PPC_PAGE_SIZES_REAL;
if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
info->flags |= KVM_PPC_1T_SEGMENTS;
@@ -3025,6 +3035,15 @@ static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
static int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot,
unsigned long npages)
{
+ /*
+ * For now, if radix_enabled() then we only support radix guests,
+ * and in that case we don't need the rmap array.
+ */
+ if (radix_enabled()) {
+ slot->arch.rmap = NULL;
+ return 0;
+ }
+
slot->arch.rmap = vzalloc(npages * sizeof(*slot->arch.rmap));
if (!slot->arch.rmap)
return -ENOMEM;
@@ -3105,14 +3124,20 @@ static void kvmppc_setup_partition_table(struct kvm *kvm)
{
unsigned long dw0, dw1;
- /* PS field - page size for VRMA */
- dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
- ((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
- /* HTABSIZE and HTABORG fields */
- dw0 |= kvm->arch.sdr1;
+ if (!kvm->arch.radix) {
+ /* PS field - page size for VRMA */
+ dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
+ ((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
+ /* HTABSIZE and HTABORG fields */
+ dw0 |= kvm->arch.sdr1;
- /* Second dword as set by userspace */
- dw1 = kvm->arch.process_table;
+ /* Second dword as set by userspace */
+ dw1 = kvm->arch.process_table;
+ } else {
+ dw0 = PATB_HR | radix__get_tree_size() |
+ __pa(kvm->arch.pgtable) | RADIX_PGD_INDEX_SIZE;
+ dw1 = PATB_GR | kvm->arch.process_table;
+ }
mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1);
}
@@ -3282,6 +3307,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
{
unsigned long lpcr, lpid;
char buf[32];
+ int ret;
/* Allocate the guest's logical partition ID */
@@ -3329,13 +3355,30 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
lpcr |= LPCR_HVICE;
}
+ /*
+ * For now, if the host uses radix, the guest must be radix.
+ */
+ if (radix_enabled()) {
+ kvm->arch.radix = 1;
+ lpcr &= ~LPCR_VPM1;
+ lpcr |= LPCR_UPRT | LPCR_GTSE | LPCR_HR;
+ ret = kvmppc_init_vm_radix(kvm);
+ if (ret) {
+ kvmppc_free_lpid(kvm->arch.lpid);
+ return ret;
+ }
+ kvmppc_setup_partition_table(kvm);
+ }
+
kvm->arch.lpcr = lpcr;
/*
* Work out how many sets the TLB has, for the use of
* the TLB invalidation loop in book3s_hv_rmhandlers.S.
*/
- if (cpu_has_feature(CPU_FTR_ARCH_300))
+ if (kvm_is_radix(kvm))
+ kvm->arch.tlb_sets = POWER9_TLB_SETS_RADIX; /* 128 */
+ else if (cpu_has_feature(CPU_FTR_ARCH_300))
kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH; /* 256 */
else if (cpu_has_feature(CPU_FTR_ARCH_207S))
kvm->arch.tlb_sets = POWER8_TLB_SETS; /* 512 */
@@ -3345,8 +3388,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
/*
* Track that we now have a HV mode VM active. This blocks secondary
* CPU threads from coming online.
+ * On POWER9, we only need to do this for HPT guests on a radix
+ * host, which is not yet supported.
*/
- kvm_hv_vm_activated();
+ if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ kvm_hv_vm_activated();
/*
* Create a debugfs directory for the VM
@@ -3372,10 +3418,13 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
{
debugfs_remove_recursive(kvm->arch.debugfs_dir);
- kvm_hv_vm_deactivated();
+ if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ kvm_hv_vm_deactivated();
kvmppc_free_vcores(kvm);
+ kvmppc_free_lpid(kvm->arch.lpid);
+
if (kvm->arch.radix)
kvmppc_free_radix(kvm);
else
@@ -3408,11 +3457,6 @@ static int kvmppc_core_check_processor_compat_hv(void)
if (!cpu_has_feature(CPU_FTR_HVMODE) ||
!cpu_has_feature(CPU_FTR_ARCH_206))
return -EIO;
- /*
- * Disable KVM for Power9 in radix mode.
- */
- if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
- return -EIO;
return 0;
}
@@ -3683,6 +3727,7 @@ static void init_default_hcalls(void)
static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
{
unsigned long lpcr;
+ int radix;
/* If not on a POWER9, reject it */
if (!cpu_has_feature(CPU_FTR_ARCH_300))
@@ -3692,12 +3737,13 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
if (cfg->flags & ~(KVM_PPC_MMUV3_RADIX | KVM_PPC_MMUV3_GTSE))
return -EINVAL;
- /* We can't do radix yet */
- if (cfg->flags & KVM_PPC_MMUV3_RADIX)
+ /* We can't change a guest to/from radix yet */
+ radix = !!(cfg->flags & KVM_PPC_MMUV3_RADIX);
+ if (radix != kvm_is_radix(kvm))
return -EINVAL;
/* GR (guest radix) bit in process_table field must match */
- if (cfg->process_table & PATB_GR)
+ if (!!(cfg->process_table & PATB_GR) != radix)
return -EINVAL;
/* Process table size field must be reasonable, i.e. <= 24 */
@@ -3713,11 +3759,6 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
return 0;
}
-static int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info)
-{
- return -EINVAL;
-}
-
static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1476a48..40a5b2d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -566,7 +566,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = kvmppc_hwrng_present();
break;
case KVM_CAP_PPC_MMU_RADIX:
- r = !!(0 && hv_enabled && radix_enabled());
+ r = !!(hv_enabled && radix_enabled());
break;
case KVM_CAP_PPC_MMU_HASH_V3:
r = !!(hv_enabled && !radix_enabled() &&
--
2.7.4
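kvmhv_get_rmmu_info() above reports each supported radix page size as one 32-bit ap_encodings[] entry, built as shift | (ap << 29). The page-size shift goes in the low bits and the AP (actual page size) code in the top three bits. add_rmmu_ap_encoding() skips sizes whose shift is zero without leaving a gap in the array. The following sketch reproduces that packing and the matching decode. The model_* names are hypothetical. The AP codes in the test (0 for 4K, 5 for 64K, 1 for 2M, 2 for 1G) are assumed POWER9 values; the kernel takes them from mmu_psize_defs[].

```c
#include <stdint.h>

/* Hypothetical page-size descriptor: shift 0 means "not supported". */
struct model_psize {
	unsigned int shift, ap;
};

/* Pack a shift and an AP code: shift in the low bits, AP in bits 29-31. */
static uint32_t model_ap_encode(unsigned int shift, unsigned int ap)
{
	return (uint32_t)shift | ((uint32_t)ap << 29);
}

/* Recover the two fields from an encoded entry. */
static unsigned int model_ap_shift(uint32_t enc)
{
	return enc & ~(UINT32_C(7) << 29);
}

static unsigned int model_ap_code(uint32_t enc)
{
	return enc >> 29;
}

/* Fill out[] with encodings for supported sizes, in order, and return
 * how many were written; unsupported sizes leave no gap. */
static int model_fill_ap_encodings(const struct model_psize *defs, int ndefs,
				   uint32_t *out)
{
	int i, n = 0;

	for (i = 0; i < ndefs; ++i) {
		if (!defs[i].shift)
			continue;
		out[n++] = model_ap_encode(defs[i].shift, defs[i].ap);
	}
	return n;
}
```

With those assumed codes, the 64K entry packs to 0xA0000010 (16 | 5 << 29).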
* Re: [PATCH 17/18] KVM: PPC: Book3S HV: Enable radix guest support
2017-01-12 9:07 ` [PATCH 17/18] KVM: PPC: Book3S HV: Enable radix guest support Paul Mackerras
@ 2017-01-23 3:31 ` Suraj Jitindar Singh
0 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2017-01-23 3:31 UTC
To: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote:
> This adds a few last pieces of the support for radix guests:
>
> * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
> KVM_PPC_GET_RMMU_INFO ioctls for radix guests
>
> * On POWER9, allow secondary threads to be on/off-lined while guests
> are running.
>
> * Set up LPCR and the partition table entry for radix guests.
>
> * Don't allocate the rmap array in the kvm_memory_slot structure
> on radix.
>
> * Prevent the AIL field in the LPCR being set for radix guests,
> since we can't yet handle getting interrupts from the guest with
> the MMU on.
>
> * Don't try to initialize the HPT for radix guests, since they don't
> have an HPT.
>
> * Take out the code that prevents the HV KVM module from
> initializing on radix hosts.
>
> At this stage, we only support radix guests if the host is running
> in radix mode, and only support HPT guests if the host is running in
> HPT mode. Thus a guest cannot switch from one mode to the other,
> which enables some simplifications.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> arch/powerpc/include/asm/kvm_book3s.h | 2 +
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 -
> arch/powerpc/kvm/book3s_64_mmu_radix.c | 45 ++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c | 93 ++++++++++++++++++++++++----------
> arch/powerpc/kvm/powerpc.c | 2 +-
> 5 files changed, 115 insertions(+), 28 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 57dc407..2bf3501 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -189,6 +189,7 @@ extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
> unsigned long ea, unsigned long dsisr);
> extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> struct kvmppc_pte *gpte, bool data, bool iswrite);
> +extern int kvmppc_init_vm_radix(struct kvm *kvm);
> extern void kvmppc_free_radix(struct kvm *kvm);
> extern int kvmppc_radix_init(void);
> extern void kvmppc_radix_exit(void);
> @@ -200,6 +201,7 @@ extern int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
> unsigned long gfn);
> extern long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
> struct kvm_memory_slot *memslot, unsigned long *map);
> +extern int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info);
>
> /* XXX remove this export when load_last_inst() is generic */
> extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 7a9afbe..db8de17 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -155,7 +155,6 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
>
> void kvmppc_free_hpt(struct kvm *kvm)
> {
> - kvmppc_free_lpid(kvm->arch.lpid);
> vfree(kvm->arch.revmap);
> if (kvm->arch.hpt_cma_alloc)
> kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 125cc7c..4344651 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -610,6 +610,51 @@ long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
> return 0;
> }
>
> +static void add_rmmu_ap_encoding(struct kvm_ppc_rmmu_info *info,
> + int psize, int *indexp)
> +{
> + if (!mmu_psize_defs[psize].shift)
> + return;
> + info->ap_encodings[*indexp] = mmu_psize_defs[psize].shift |
> + (mmu_psize_defs[psize].ap << 29);
> + ++(*indexp);
> +}
> +
> +int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info)
> +{
> + int i;
> +
> + if (!radix_enabled())
> + return -EINVAL;
> + memset(info, 0, sizeof(*info));
> +
> + /* 4k page size */
> + info->geometries[0].page_shift = 12;
> + info->geometries[0].level_bits[0] = 9;
> + for (i = 1; i < 4; ++i)
> + info->geometries[0].level_bits[i] = p9_supported_radix_bits[i];
> + /* 64k page size */
> + info->geometries[1].page_shift = 16;
> + for (i = 0; i < 4; ++i)
> + info->geometries[1].level_bits[i] = p9_supported_radix_bits[i];
> +
> + i = 0;
> + add_rmmu_ap_encoding(info, MMU_PAGE_4K, &i);
> + add_rmmu_ap_encoding(info, MMU_PAGE_64K, &i);
> + add_rmmu_ap_encoding(info, MMU_PAGE_2M, &i);
> + add_rmmu_ap_encoding(info, MMU_PAGE_1G, &i);
> +
> + return 0;
> +}
> +
> +int kvmppc_init_vm_radix(struct kvm *kvm)
> +{
> + kvm->arch.pgtable = pgd_alloc(kvm->mm);
> + if (!kvm->arch.pgtable)
> + return -ENOMEM;
> + return 0;
> +}
> +
> void kvmppc_free_radix(struct kvm *kvm)
> {
> unsigned long ig, iu, im;
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index ab5adcd..14a9efe 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1136,10 +1136,13 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
> /*
> * Userspace can only modify DPFD (default prefetch depth),
> * ILE (interrupt little-endian) and TC (translation control).
> - * On POWER8 userspace can also modify AIL (alt. interrupt loc.)
> + * On POWER8 userspace can also modify AIL (alt. interrupt loc.).
> + * On POWER9 with a radix guest, we can't allow AIL to be set
> + * since we don't yet have KVM handlers in the relocation-on
> + * interrupt vectors.
> */
> mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
> - if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + if (cpu_has_feature(CPU_FTR_ARCH_207S) && !kvm_is_radix(kvm))
> mask |= LPCR_AIL;
>
> /* Broken 32-bit version of LPCR must not clear top bits */
> @@ -2878,7 +2881,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
> smp_mb();
>
> /* On the first time here, set up HTAB and VRMA */
> - if (!vcpu->kvm->arch.hpte_setup_done) {
> + if (!kvm_is_radix(vcpu->kvm) && !vcpu->kvm->arch.hpte_setup_done) {
> r = kvmppc_hv_setup_htab_rma(vcpu);
> if (r)
> goto out;
> @@ -2940,6 +2943,13 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm *kvm,
> {
> struct kvm_ppc_one_seg_page_size *sps;
>
> + /*
> + * Since we don't yet support HPT guests on a radix host,
> + * return an error if the host uses radix.
> + */
> + if (radix_enabled())
> + return -EINVAL;
> +
> info->flags = KVM_PPC_PAGE_SIZES_REAL;
> if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
> info->flags |= KVM_PPC_1T_SEGMENTS;
> @@ -3025,6 +3035,15 @@ static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
> static int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot,
> unsigned long npages)
> {
> + /*
> + * For now, if radix_enabled() then we only support radix guests,
> + * and in that case we don't need the rmap array.
> + */
> + if (radix_enabled()) {
> + slot->arch.rmap = NULL;
> + return 0;
> + }
> +
> slot->arch.rmap = vzalloc(npages * sizeof(*slot->arch.rmap));
> if (!slot->arch.rmap)
> return -ENOMEM;
> @@ -3105,14 +3124,20 @@ static void kvmppc_setup_partition_table(struct kvm *kvm)
> {
> unsigned long dw0, dw1;
>
> - /* PS field - page size for VRMA */
> - dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
> - ((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
> - /* HTABSIZE and HTABORG fields */
> - dw0 |= kvm->arch.sdr1;
> + if (!kvm->arch.radix) {
kvm_is_radix() for consistency?
> + /* PS field - page size for VRMA */
> + dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
> + ((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
> + /* HTABSIZE and HTABORG fields */
> + dw0 |= kvm->arch.sdr1;
>
> - /* Second dword as set by userspace */
> - dw1 = kvm->arch.process_table;
> + /* Second dword as set by userspace */
> + dw1 = kvm->arch.process_table;
> + } else {
> + dw0 = PATB_HR | radix__get_tree_size() |
> + __pa(kvm->arch.pgtable) | RADIX_PGD_INDEX_SIZE;
> + dw1 = PATB_GR | kvm->arch.process_table;
> + }
>
> mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1);
> }
> @@ -3282,6 +3307,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> {
> unsigned long lpcr, lpid;
> char buf[32];
> + int ret;
>
> /* Allocate the guest's logical partition ID */
>
> @@ -3329,13 +3355,30 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> lpcr |= LPCR_HVICE;
> }
>
> + /*
> + * For now, if the host uses radix, the guest must be radix.
> + */
> + if (radix_enabled()) {
> + kvm->arch.radix = 1;
> + lpcr &= ~LPCR_VPM1;
> + lpcr |= LPCR_UPRT | LPCR_GTSE | LPCR_HR;
> + ret = kvmppc_init_vm_radix(kvm);
> + if (ret) {
> + kvmppc_free_lpid(kvm->arch.lpid);
> + return ret;
> + }
> + kvmppc_setup_partition_table(kvm);
> + }
> +
> kvm->arch.lpcr = lpcr;
>
> /*
> * Work out how many sets the TLB has, for the use of
> * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> */
> - if (cpu_has_feature(CPU_FTR_ARCH_300))
> + if (kvm_is_radix(kvm))
> + kvm->arch.tlb_sets = POWER9_TLB_SETS_RADIX; /* 128 */
> + else if (cpu_has_feature(CPU_FTR_ARCH_300))
> kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH; /* 256 */
> else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> kvm->arch.tlb_sets = POWER8_TLB_SETS; /* 512 */
> @@ -3345,8 +3388,11 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> /*
> * Track that we now have a HV mode VM active. This blocks secondary
> * CPU threads from coming online.
> + * On POWER9, we only need to do this for HPT guests on a radix
> + * host, which is not yet supported.
> */
> - kvm_hv_vm_activated();
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + kvm_hv_vm_activated();
>
> /*
> * Create a debugfs directory for the VM
> @@ -3372,10 +3418,13 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
> {
> debugfs_remove_recursive(kvm->arch.debugfs_dir);
>
> - kvm_hv_vm_deactivated();
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + kvm_hv_vm_deactivated();
>
> kvmppc_free_vcores(kvm);
>
> + kvmppc_free_lpid(kvm->arch.lpid);
> +
> if (kvm->arch.radix)
ditto
> kvmppc_free_radix(kvm);
> else
> @@ -3408,11 +3457,6 @@ static int kvmppc_core_check_processor_compat_hv(void)
> if (!cpu_has_feature(CPU_FTR_HVMODE) ||
> !cpu_has_feature(CPU_FTR_ARCH_206))
> return -EIO;
> - /*
> - * Disable KVM for Power9 in radix mode.
> - */
> - if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
> - return -EIO;
>
> return 0;
> }
> @@ -3683,6 +3727,7 @@ static void init_default_hcalls(void)
> static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
> {
> unsigned long lpcr;
> + int radix;
For clarity, this could be a bool.
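A minimal sketch of how the bool suggestion might read, written as a standalone user-space function so it can be compiled on its own. check_mmuv3_cfg() is a hypothetical helper, not kernel code. The values of KVM_PPC_MMUV3_RADIX, KVM_PPC_MMUV3_GTSE and PATB_GR are assumed to mirror the kernel headers and are for illustration only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring the kernel headers; illustrative only. */
#define KVM_PPC_MMUV3_RADIX 1ULL
#define KVM_PPC_MMUV3_GTSE  2ULL
#define PATB_GR             (1ULL << 63)

/*
 * Hypothetical helper modelling the checks in kvmhv_configure_mmu():
 * the requested mode must match the VM's current mode, and the GR bit
 * in process_table must agree with the requested mode.
 */
static int check_mmuv3_cfg(uint64_t flags, uint64_t process_table,
			   bool vm_is_radix)
{
	bool radix = flags & KVM_PPC_MMUV3_RADIX;   /* nonzero -> true */
	bool gr = process_table & PATB_GR;

	if (flags & ~(KVM_PPC_MMUV3_RADIX | KVM_PPC_MMUV3_GTSE))
		return -EINVAL;
	/* We can't change a guest to/from radix yet */
	if (radix != vm_is_radix)
		return -EINVAL;
	/* GR (guest radix) bit in process_table field must match */
	if (gr != radix)
		return -EINVAL;
	return 0;
}
```

With bools, both comparisons are between two truth values, so the !! normalisation in the patch is no longer needed.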
>
> /* If not on a POWER9, reject it */
> if (!cpu_has_feature(CPU_FTR_ARCH_300))
> @@ -3692,12 +3737,13 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
> if (cfg->flags & ~(KVM_PPC_MMUV3_RADIX | KVM_PPC_MMUV3_GTSE))
> return -EINVAL;
>
> - /* We can't do radix yet */
> - if (cfg->flags & KVM_PPC_MMUV3_RADIX)
> + /* We can't change a guest to/from radix yet */
> + radix = !!(cfg->flags & KVM_PPC_MMUV3_RADIX);
> + if (radix != kvm_is_radix(kvm))
> return -EINVAL;
>
> /* GR (guest radix) bit in process_table field must match */
> - if (cfg->process_table & PATB_GR)
> + if (!!(cfg->process_table & PATB_GR) != radix)
> return -EINVAL;
>
> /* Process table size field must be reasonable, i.e. <= 24 */
> @@ -3713,11 +3759,6 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
> return 0;
> }
>
> -static int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info *info)
> -{
> - return -EINVAL;
> -}
> -
> static struct kvmppc_ops kvm_ops_hv = {
> .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
> .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 1476a48..40a5b2d 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -566,7 +566,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = kvmppc_hwrng_present();
> break;
> case KVM_CAP_PPC_MMU_RADIX:
> - r = !!(0 && hv_enabled && radix_enabled());
> + r = !!(hv_enabled && radix_enabled());
> break;
> case KVM_CAP_PPC_MMU_HASH_V3:
> r = !!(hv_enabled && !radix_enabled() &&
* [PATCH 18/18] KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9
2017-01-12 9:07 [PATCH 00/18] Support for radix guest and host on POWER9 Paul Mackerras
2017-01-12 9:07 ` [PATCH 17/18] KVM: PPC: Book3S HV: Enable radix guest support Paul Mackerras
@ 2017-01-12 9:07 ` Paul Mackerras
From: Paul Mackerras @ 2017-01-12 9:07 UTC
To: linuxppc-dev, kvm, kvm-ppc
POWER9 adds a register called ASDR (Access Segment Descriptor
Register), which is set by hypervisor data/instruction storage
interrupts to contain the segment descriptor for the address
being accessed, assuming the guest is using HPT translation.
(For radix guests, it contains the guest real address of the
access.)
Thus, for HPT guests on POWER9, we can use this register rather
than looking up the SLB with the slbfee. instruction.
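The branch this patch adds can be modelled in C as a choice of where the VSID comes from for an HPT guest's HDSI/HISI. This is a sketch only. The enum and hdsi_vsid_source() are hypothetical. The mapping "relocation off -> VRMA" reflects our reading of the existing label-3 path in book3s_hv_rmhandlers.S, which the hunk below does not show, so treat it as an assumption. Radix guests are assumed to be diverted earlier and are not modelled.

```c
#include <stdbool.h>

/* Hypothetical names for the possible VSID sources (HPT guests only). */
enum vsid_source {
	VSID_FROM_ASDR,   /* POWER9 (CPU_FTR_ARCH_300): mfspr SPRN_ASDR */
	VSID_FROM_SLB,    /* older CPUs, relocation on: slbfee. lookup */
	VSID_FROM_VRMA,   /* older CPUs, relocation off: VRMA SLB entry (assumed) */
};

/*
 * cpu_is_arch_300: models the BEGIN_FTR_SECTION ... IFSET(CPU_FTR_ARCH_300)
 * relocation_on:   MSR_DR for data faults, MSR_IR for instruction faults
 */
static enum vsid_source hdsi_vsid_source(bool cpu_is_arch_300,
					 bool relocation_on)
{
	if (cpu_is_arch_300)
		return VSID_FROM_ASDR;   /* "b 4f" skips the relocation test */
	if (!relocation_on)
		return VSID_FROM_VRMA;   /* "beq 3f" */
	return VSID_FROM_SLB;
}
```

The feature section is patched at boot according to the CPU features, so on POWER9 the ASDR read and branch run unconditionally. On older CPUs they are presumably nopped out, leaving the original relocation test and SLB lookup in place.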
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f638f3e..625ba5e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1731,6 +1731,10 @@ kvmppc_hdsi:
/* HPTE not found fault or protection fault? */
andis. r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
beq 1f /* if not, send it to the guest */
+BEGIN_FTR_SECTION
+ mfspr r5, SPRN_ASDR /* on POWER9, use ASDR to get VSID */
+ b 4f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
andi. r0, r11, MSR_DR /* data relocation enabled? */
beq 3f
clrrdi r0, r4, 28
@@ -1819,6 +1823,10 @@ kvmppc_hisi:
bne .Lradix_hisi /* for radix, just save ASDR */
andis. r0, r11, SRR1_ISI_NOPT@h
beq 1f
+BEGIN_FTR_SECTION
+ mfspr r5, SPRN_ASDR /* on POWER9, use ASDR to get VSID */
+ b 4f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
andi. r0, r11, MSR_IR /* instruction relocation enabled? */
beq 3f
clrrdi r0, r10, 28
--
2.7.4