* [RFC PATCH v2 0/6] KVM: PPC: Nested PAPR guests
@ 2023-06-05  6:48 ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

There is existing support for nested guests on powernv hosts; however,
the hcall interface it uses is not supported by other PAPR hosts. A set
of new hcalls will be added to PAPR to facilitate creating and managing
guests by a regular partition in the following way:

  - L1 and L0 negotiate capabilities with
    H_GUEST_{G,S}ET_CAPABILITIES

  - L1 requests the L0 create an L2 with
    H_GUEST_CREATE and receives a handle to use in future hcalls

  - L1 requests the L0 create an L2 vCPU with
    H_GUEST_CREATE_VCPU

  - L1 sets up the L2 using H_GUEST_SET and the
    H_GUEST_VCPU_RUN input buffer

  - L1 requests the L0 run the L2 vCPU using H_GUEST_VCPU_RUN

  - L2 returns to L1 with an exit reason and L1 reads the
    H_GUEST_VCPU_RUN output buffer populated by the L0

  - L1 handles the exit using H_GET_STATE if necessary

  - L1 reruns the L2 vCPU with H_GUEST_VCPU_RUN

  - L1 frees the L2 in the L0 with H_GUEST_DELETE

Further details are available in Documentation/powerpc/kvm-nested.rst.
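
To make the flow above concrete, here is a rough sketch of the L1 side
in kernel C. The hcall names in the comments are the real ones listed
above, but the plpar_guest_*() wrapper names, their signatures and the
guest state buffer handling are simplified, hypothetical stand-ins for
the helpers this series adds in
arch/powerpc/include/asm/plpar_wrappers.h:

  /* Illustrative only: wrapper names and signatures are placeholders. */
  static long l1_run_l2_vcpu_once(void *gsb, unsigned long gsb_len)
  {
          unsigned long guest_id = 0, trap = 0;
          long rc;

          /* H_GUEST_CREATE: the L0 creates an L2 and returns a handle */
          rc = plpar_guest_create(0, &guest_id);
          if (rc != H_SUCCESS)
                  return rc;

          /* H_GUEST_CREATE_VCPU: create vCPU 0 within that L2 */
          rc = plpar_guest_create_vcpu(0, guest_id, 0);
          if (rc != H_SUCCESS)
                  goto out_delete;

          /* H_GUEST_SET: seed initial register state from a guest
           * state buffer built by the L1 */
          rc = plpar_guest_set_state(0, guest_id, 0, __pa(gsb), gsb_len);
          if (rc != H_SUCCESS)
                  goto out_delete;

          /* H_GUEST_VCPU_RUN: enter the L2; on exit the L0 populates
           * the output buffer and returns an exit reason in trap */
          rc = plpar_guest_run_vcpu(0, guest_id, 0, &trap);

  out_delete:
          /* H_GUEST_DELETE: free the L2 in the L0 */
          plpar_guest_delete(0, guest_id);
          return rc;
  }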

This series adds KVM support for using this hcall interface as a regular
PAPR partition, i.e. the L1. It does not add support for running as the
L0.

The new hcalls have been implemented in QEMU's spapr machine model for
testing.

This is available at https://github.com/mikey/qemu/tree/kvm-papr

There are scripts available to assist in setting up an environment for
testing nested guests at https://github.com/mikey/kvm-powervm-test

A tree with this series is available at
https://github.com/iamjpn/linux/tree/features/kvm-papr

Thanks to Amit Machhiwal, Kautuk Consul, Vaibhav Jain, Michael Neuling,
Shivaprasad Bhat, Harsh Prateek Bora, Paul Mackerras and Nicholas
Piggin. 

Overview of changes in v2:
  - Rebase on top of KVM PPC prefixed instruction support
  - Make the documentation an individual patch
  - Move guest state buffer files from arch/powerpc/lib/ to
    arch/powerpc/kvm/
  - Use KUnit for testing the guest state buffer
  - Fix some build errors
  - Change the HEIR element from 4 bytes to 8 bytes

Previous revisions:

  - v1: https://lore.kernel.org/linuxppc-dev/20230508072332.2937883-1-jpn@linux.vnet.ibm.com/

Jordan Niethe (5):
  KVM: PPC: Use getters and setters for vcpu register state
  KVM: PPC: Add fpr getters and setters
  KVM: PPC: Add vr getters and setters
  KVM: PPC: Add helper library for Guest State Buffers
  KVM: PPC: Add support for nested PAPR guests

Michael Neuling (1):
  docs: powerpc: Document nested KVM on POWER

 Documentation/powerpc/index.rst               |   1 +
 Documentation/powerpc/kvm-nested.rst          | 636 +++++++++++
 arch/powerpc/Kconfig.debug                    |  12 +
 arch/powerpc/include/asm/guest-state-buffer.h | 988 ++++++++++++++++++
 arch/powerpc/include/asm/hvcall.h             |  30 +
 arch/powerpc/include/asm/kvm_book3s.h         | 205 +++-
 arch/powerpc/include/asm/kvm_book3s_64.h      |   6 +
 arch/powerpc/include/asm/kvm_booke.h          |  10 +
 arch/powerpc/include/asm/kvm_host.h           |  21 +
 arch/powerpc/include/asm/kvm_ppc.h            |  80 +-
 arch/powerpc/include/asm/plpar_wrappers.h     | 198 ++++
 arch/powerpc/kvm/Makefile                     |   4 +
 arch/powerpc/kvm/book3s.c                     |  38 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   4 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c        |   9 +-
 arch/powerpc/kvm/book3s_64_vio.c              |   4 +-
 arch/powerpc/kvm/book3s_hv.c                  | 336 ++++--
 arch/powerpc/kvm/book3s_hv.h                  |  65 ++
 arch/powerpc/kvm/book3s_hv_builtin.c          |  10 +-
 arch/powerpc/kvm/book3s_hv_nested.c           |  38 +-
 arch/powerpc/kvm/book3s_hv_p9_entry.c         |   4 +-
 arch/powerpc/kvm/book3s_hv_papr.c             | 940 +++++++++++++++++
 arch/powerpc/kvm/book3s_hv_ras.c              |   5 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c           |   8 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c          |   4 +-
 arch/powerpc/kvm/book3s_xive.c                |   9 +-
 arch/powerpc/kvm/emulate_loadstore.c          |   6 +-
 arch/powerpc/kvm/guest-state-buffer.c         | 612 +++++++++++
 arch/powerpc/kvm/powerpc.c                    |  76 +-
 arch/powerpc/kvm/test-guest-state-buffer.c    | 321 ++++++
 30 files changed, 4467 insertions(+), 213 deletions(-)
 create mode 100644 Documentation/powerpc/kvm-nested.rst
 create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_papr.c
 create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
 create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c

-- 
2.31.1



* [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state
  2023-06-05  6:48 ` Jordan Niethe
@ 2023-06-05  6:48   ` Jordan Niethe
  -1 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

There are already some getter and setter functions used for accessing
vcpu register state, e.g. kvmppc_get_pc(). There are also more
complicated examples, e.g. kvmppc_get_sprg0(), which is generated by
the SHARED_SPRNG_WRAPPER() macro.

In the new PAPR API for nested guest partitions, the L1 is required to
communicate with the L0 to read and modify nested guest state.

Prepare to support this by replacing direct accesses to vcpu register
state with wrapper functions. Follow the existing pattern of using
macros to generate individual wrappers. These wrappers will be
augmented to support PAPR nested guests later.
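
For example, with the macros added below, BOOK3S_WRAPPER(tar, 64)
expands (roughly) to the following pair of accessors:

  static inline void kvmppc_set_tar(struct kvm_vcpu *vcpu, u64 val)
  {
          vcpu->arch.tar = val;
  }

  static inline u64 kvmppc_get_tar(struct kvm_vcpu *vcpu)
  {
          return vcpu->arch.tar;
  }

Today these are trivial loads and stores, but routing every access
through one place means the nested PAPR support added later in the
series can change what a "get" or "set" means (e.g. marking state to
be synced with the L0) without touching every call site.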

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s.h  |  68 +++++++-
 arch/powerpc/include/asm/kvm_ppc.h     |  48 ++++--
 arch/powerpc/kvm/book3s.c              |  22 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   4 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   9 +-
 arch/powerpc/kvm/book3s_64_vio.c       |   4 +-
 arch/powerpc/kvm/book3s_hv.c           | 222 +++++++++++++------------
 arch/powerpc/kvm/book3s_hv.h           |  59 +++++++
 arch/powerpc/kvm/book3s_hv_builtin.c   |  10 +-
 arch/powerpc/kvm/book3s_hv_p9_entry.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_ras.c       |   5 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c    |   8 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c   |   4 +-
 arch/powerpc/kvm/book3s_xive.c         |   9 +-
 arch/powerpc/kvm/powerpc.c             |   4 +-
 15 files changed, 322 insertions(+), 158 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index bbf5e2c5fe09..4e91f54a3f9f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -392,6 +392,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 	return vcpu->arch.regs.nip;
 }
 
+static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
+{
+	vcpu->arch.pid = val;
+}
+
+static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pid;
+}
+
 static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
 static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
 {
@@ -403,10 +413,66 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault_dar;
 }
 
+#define BOOK3S_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+									\
+	vcpu->arch.reg = val;						\
+}
+
+#define BOOK3S_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
+{									\
+	return vcpu->arch.reg;						\
+}
+
+#define BOOK3S_WRAPPER(reg, size)					\
+	BOOK3S_WRAPPER_SET(reg, size)					\
+	BOOK3S_WRAPPER_GET(reg, size)					\
+
+BOOK3S_WRAPPER(tar, 64)
+BOOK3S_WRAPPER(ebbhr, 64)
+BOOK3S_WRAPPER(ebbrr, 64)
+BOOK3S_WRAPPER(bescr, 64)
+BOOK3S_WRAPPER(ic, 64)
+BOOK3S_WRAPPER(vrsave, 64)
+
+
+#define VCORE_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	vcpu->arch.vcore->reg = val;					\
+}
+
+#define VCORE_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
+{									\
+	return vcpu->arch.vcore->reg;					\
+}
+
+#define VCORE_WRAPPER(reg, size)					\
+	VCORE_WRAPPER_SET(reg, size)					\
+	VCORE_WRAPPER_GET(reg, size)					\
+
+
+VCORE_WRAPPER(vtb, 64)
+VCORE_WRAPPER(tb_offset, 64)
+VCORE_WRAPPER(lpcr, 64)
+
+static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.dec_expires;
+}
+
+static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
+{
+	vcpu->arch.dec_expires = val;
+}
+
 /* Expiry time of vcpu DEC relative to host TB */
 static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
+	return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset_hv(vcpu);
 }
 
 static inline bool is_kvmppc_resume_guest(int r)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 79a9c0bb8bba..fbac353ac46b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -936,7 +936,7 @@ static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 #define SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
 {									\
-	mtspr(bookehv_spr, val);						\
+	mtspr(bookehv_spr, val);					\
 }									\
 
 #define SHARED_WRAPPER_GET(reg, size)					\
@@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
 }									\
 
+#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
+{									\
+	if (kvmppc_shared_big_endian(vcpu))				\
+	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
+	else								\
+	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
+}									\
+
+#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	if (kvmppc_shared_big_endian(vcpu))				\
+	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
+	else								\
+	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
+}									\
+
 #define SHARED_WRAPPER(reg, size)					\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
+#define SHARED_CACHE_WRAPPER(reg, size)					\
+	SHARED_CACHE_WRAPPER_GET(reg, size)				\
+	SHARED_CACHE_WRAPPER_SET(reg, size)				\
+
 #define SPRNG_WRAPPER(reg, bookehv_spr)					\
 	SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
 	SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
@@ -970,23 +992,29 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SPRNG_WRAPPER(reg, bookehv_spr)					\
 
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
+	SPRNG_WRAPPER(reg, bookehv_spr)					\
+
 #else
 
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SHARED_WRAPPER(reg, size)					\
 
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
+	SHARED_CACHE_WRAPPER(reg, size)					\
+
 #endif
 
 SHARED_WRAPPER(critical, 64)
-SHARED_SPRNG_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_WRAPPER(dar, 64, SPRN_GDEAR)
+SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0)
+SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1)
+SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2)
+SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3)
+SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0)
+SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1)
+SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR)
 SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
-SHARED_WRAPPER_GET(msr, 64)
+SHARED_CACHE_WRAPPER_GET(msr, 64)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
 	if (kvmppc_shared_big_endian(vcpu))
@@ -994,7 +1022,7 @@ static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 	else
 	       vcpu->arch.shared->msr = cpu_to_le64(val);
 }
-SHARED_WRAPPER(dsisr, 32)
+SHARED_CACHE_WRAPPER(dsisr, 32)
 SHARED_WRAPPER(int_pending, 32)
 SHARED_WRAPPER(sprg4, 64)
 SHARED_WRAPPER(sprg5, 64)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 686d8d9eda3e..2fe31b518886 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -565,7 +565,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	regs->msr = kvmppc_get_msr(vcpu);
 	regs->srr0 = kvmppc_get_srr0(vcpu);
 	regs->srr1 = kvmppc_get_srr1(vcpu);
-	regs->pid = vcpu->arch.pid;
+	regs->pid = kvmppc_get_pid(vcpu);
 	regs->sprg0 = kvmppc_get_sprg0(vcpu);
 	regs->sprg1 = kvmppc_get_sprg1(vcpu);
 	regs->sprg2 = kvmppc_get_sprg2(vcpu);
@@ -683,19 +683,19 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			*val = get_reg_val(id, vcpu->arch.fscr);
 			break;
 		case KVM_REG_PPC_TAR:
-			*val = get_reg_val(id, vcpu->arch.tar);
+			*val = get_reg_val(id, kvmppc_get_tar(vcpu));
 			break;
 		case KVM_REG_PPC_EBBHR:
-			*val = get_reg_val(id, vcpu->arch.ebbhr);
+			*val = get_reg_val(id, kvmppc_get_ebbhr(vcpu));
 			break;
 		case KVM_REG_PPC_EBBRR:
-			*val = get_reg_val(id, vcpu->arch.ebbrr);
+			*val = get_reg_val(id, kvmppc_get_ebbrr(vcpu));
 			break;
 		case KVM_REG_PPC_BESCR:
-			*val = get_reg_val(id, vcpu->arch.bescr);
+			*val = get_reg_val(id, kvmppc_get_bescr(vcpu));
 			break;
 		case KVM_REG_PPC_IC:
-			*val = get_reg_val(id, vcpu->arch.ic);
+			*val = get_reg_val(id, kvmppc_get_ic(vcpu));
 			break;
 		default:
 			r = -EINVAL;
@@ -768,19 +768,19 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			vcpu->arch.fscr = set_reg_val(id, *val);
 			break;
 		case KVM_REG_PPC_TAR:
-			vcpu->arch.tar = set_reg_val(id, *val);
+			kvmppc_set_tar(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_EBBHR:
-			vcpu->arch.ebbhr = set_reg_val(id, *val);
+			kvmppc_set_ebbhr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_EBBRR:
-			vcpu->arch.ebbrr = set_reg_val(id, *val);
+			kvmppc_set_ebbrr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_BESCR:
-			vcpu->arch.bescr = set_reg_val(id, *val);
+			kvmppc_set_bescr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_IC:
-			vcpu->arch.ic = set_reg_val(id, *val);
+			kvmppc_set_ic(vcpu, set_reg_val(id, *val));
 			break;
 		default:
 			r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7f765d5ad436..738f2ecbe9b9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -347,7 +347,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	unsigned long v, orig_v, gr;
 	__be64 *hptep;
 	long int index;
-	int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
+	int virtmode = kvmppc_get_msr(vcpu) & (data ? MSR_DR : MSR_IR);
 
 	if (kvm_is_radix(vcpu->kvm))
 		return kvmppc_mmu_radix_xlate(vcpu, eaddr, gpte, data, iswrite);
@@ -385,7 +385,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 
 	/* Get PP bits and key for permission check */
 	pp = gr & (HPTE_R_PP0 | HPTE_R_PP);
-	key = (vcpu->arch.shregs.msr & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
+	key = (kvmppc_get_msr(vcpu) & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
 	key &= slb_v;
 
 	/* Calculate permissions */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 461307b89c3a..e1aa078580a1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -15,6 +15,7 @@
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
+#include "book3s_hv.h"
 #include <asm/page.h>
 #include <asm/mmu.h>
 #include <asm/pgalloc.h>
@@ -96,7 +97,7 @@ static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
 					  void *to, void *from, unsigned long n)
 {
 	int lpid = vcpu->kvm->arch.lpid;
-	int pid = vcpu->arch.pid;
+	int pid = kvmppc_get_pid(vcpu);
 
 	/* This would cause a data segment intr so don't allow the access */
 	if (eaddr & (0x3FFUL << 52))
@@ -270,7 +271,7 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	/* Work out effective PID */
 	switch (eaddr >> 62) {
 	case 0:
-		pid = vcpu->arch.pid;
+		pid = kvmppc_get_pid(vcpu);
 		break;
 	case 3:
 		pid = 0;
@@ -294,9 +295,9 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	} else {
 		if (!(pte & _PAGE_PRIVILEGED)) {
 			/* Check AMR/IAMR to see if strict mode is in force */
-			if (vcpu->arch.amr & (1ul << 62))
+			if (kvmppc_get_amr_hv(vcpu) & (1ul << 62))
 				gpte->may_read = 0;
-			if (vcpu->arch.amr & (1ul << 63))
+			if (kvmppc_get_amr_hv(vcpu) & (1ul << 63))
 				gpte->may_write = 0;
 			if (vcpu->arch.iamr & (1ul << 62))
 				gpte->may_execute = 0;
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 93b695b289e9..4ba048f272f2 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -786,12 +786,12 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	idx = (ioba >> stt->page_shift) - stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	if (!page) {
-		vcpu->arch.regs.gpr[4] = 0;
+		kvmppc_set_gpr(vcpu, 4, 0);
 		return H_SUCCESS;
 	}
 	tbl = (u64 *)page_address(page);
 
-	vcpu->arch.regs.gpr[4] = tbl[idx % TCES_PER_PAGE];
+	kvmppc_set_gpr(vcpu, 4, tbl[idx % TCES_PER_PAGE]);
 
 	return H_SUCCESS;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..521d84621422 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -383,11 +383,6 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
-static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
-{
-	vcpu->arch.pvr = pvr;
-}
-
 /* Dummy value used in computing PCR value below */
 #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
 
@@ -794,7 +789,7 @@ static void kvmppc_update_vpa_dispatch(struct kvm_vcpu *vcpu,
 
 	vpa->enqueue_dispatch_tb = cpu_to_be64(be64_to_cpu(vpa->enqueue_dispatch_tb) + stolen);
 
-	__kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + vc->tb_offset, stolen);
+	__kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + kvmppc_get_tb_offset_hv(vcpu), stolen);
 
 	vcpu->arch.vpa.dirty = true;
 }
@@ -868,7 +863,7 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 		/* Guests can't breakpoint the hypervisor */
 		if ((value1 & CIABR_PRIV) == CIABR_PRIV_HYPER)
 			return H_P3;
-		vcpu->arch.ciabr  = value1;
+		kvmppc_set_ciabr_hv(vcpu, value1);
 		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_SET_DAWR0:
 		if (!kvmppc_power8_compatible(vcpu))
@@ -879,8 +874,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 			return H_UNSUPPORTED_FLAG_START;
 		if (value2 & DABRX_HYP)
 			return H_P4;
-		vcpu->arch.dawr0  = value1;
-		vcpu->arch.dawrx0 = value2;
+		kvmppc_set_dawr0_hv(vcpu, value1);
+		kvmppc_set_dawrx0_hv(vcpu, value2);
 		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_SET_DAWR1:
 		if (!kvmppc_power8_compatible(vcpu))
@@ -895,8 +890,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 			return H_UNSUPPORTED_FLAG_START;
 		if (value2 & DABRX_HYP)
 			return H_P4;
-		vcpu->arch.dawr1  = value1;
-		vcpu->arch.dawrx1 = value2;
+		kvmppc_set_dawr1_hv(vcpu, value1);
+		kvmppc_set_dawrx1_hv(vcpu, value2);
 		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
 		/*
@@ -1268,8 +1263,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		break;
 #endif
 	case H_RANDOM:
-		if (!arch_get_random_seed_longs(&vcpu->arch.regs.gpr[4], 1))
+	{
+		unsigned long rand;
+
+		if (!arch_get_random_seed_longs(&rand, 1))
 			ret = H_HARDWARE;
+		else
+			kvmppc_set_gpr(vcpu, 4, rand);
 		break;
+	}
 	case H_RPT_INVALIDATE:
 		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
@@ -1370,7 +1368,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
  */
 static void kvmppc_cede(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.shregs.msr |= MSR_EE;
+	kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 	vcpu->arch.ceded = 1;
 	smp_mb();
 	if (vcpu->arch.prodded) {
@@ -1544,7 +1542,7 @@ static int kvmppc_pmu_unavailable(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.hfscr_permitted & HFSCR_PM))
 		return EMULATE_FAIL;
 
-	vcpu->arch.hfscr |= HFSCR_PM;
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_PM);
 
 	return RESUME_GUEST;
 }
@@ -1554,7 +1552,7 @@ static int kvmppc_ebb_unavailable(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.hfscr_permitted & HFSCR_EBB))
 		return EMULATE_FAIL;
 
-	vcpu->arch.hfscr |= HFSCR_EBB;
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_EBB);
 
 	return RESUME_GUEST;
 }
@@ -1564,7 +1562,7 @@ static int kvmppc_tm_unavailable(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.hfscr_permitted & HFSCR_TM))
 		return EMULATE_FAIL;
 
-	vcpu->arch.hfscr |= HFSCR_TM;
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_TM);
 
 	return RESUME_GUEST;
 }
@@ -1585,7 +1583,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	 * That can happen due to a bug, or due to a machine check
 	 * occurring at just the wrong time.
 	 */
-	if (vcpu->arch.shregs.msr & MSR_HV) {
+	if (kvmppc_get_msr(vcpu) & MSR_HV) {
 		printk(KERN_EMERG "KVM trap in HV mode!\n");
 		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
 			vcpu->arch.trap, kvmppc_get_pc(vcpu),
@@ -1636,7 +1634,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		 * so that it knows that the machine check occurred.
 		 */
 		if (!vcpu->kvm->arch.fwnmi_enabled) {
-			ulong flags = (vcpu->arch.shregs.msr & 0x083c0000) |
+			ulong flags = (kvmppc_get_msr(vcpu) & 0x083c0000) |
 					(kvmppc_get_msr(vcpu) & SRR1_PREFIXED);
 			kvmppc_core_queue_machine_check(vcpu, flags);
 			r = RESUME_GUEST;
@@ -1666,7 +1664,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		 * as a result of a hypervisor emulation interrupt
 		 * (e40) getting turned into a 700 by BML RTAS.
 		 */
-		flags = (vcpu->arch.shregs.msr & 0x1f0000ull) |
+		flags = (kvmppc_get_msr(vcpu) & 0x1f0000ull) |
 			(kvmppc_get_msr(vcpu) & SRR1_PREFIXED);
 		kvmppc_core_queue_program(vcpu, flags);
 		r = RESUME_GUEST;
@@ -1676,7 +1674,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	{
 		int i;
 
-		if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
+		if (unlikely(kvmppc_get_msr(vcpu) & MSR_PR)) {
 			/*
 			 * Guest userspace executed sc 1. This can only be
 			 * reached by the P9 path because the old path
@@ -1754,7 +1752,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			break;
 		}
 
-		if (!(vcpu->arch.shregs.msr & MSR_DR))
+		if (!(kvmppc_get_msr(vcpu) & MSR_DR))
 			vsid = vcpu->kvm->arch.vrma_slb_v;
 		else
 			vsid = vcpu->arch.fault_gpa;
@@ -1778,7 +1776,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		long err;
 
 		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
-		vcpu->arch.fault_dsisr = vcpu->arch.shregs.msr &
+		vcpu->arch.fault_dsisr = kvmppc_get_msr(vcpu) &
 			DSISR_SRR1_MATCH_64S;
 		if (kvm_is_radix(vcpu->kvm) || !cpu_has_feature(CPU_FTR_ARCH_300)) {
 			/*
@@ -1787,7 +1785,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			 * hash fault handling below is v3 only (it uses ASDR
 			 * via fault_gpa).
 			 */
-			if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE)
+			if (kvmppc_get_msr(vcpu) & HSRR1_HISI_WRITE)
 				vcpu->arch.fault_dsisr |= DSISR_ISSTORE;
 			r = RESUME_PAGE_FAULT;
 			break;
@@ -1801,7 +1799,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			break;
 		}
 
-		if (!(vcpu->arch.shregs.msr & MSR_IR))
+		if (!(kvmppc_get_msr(vcpu) & MSR_IR))
 			vsid = vcpu->kvm->arch.vrma_slb_v;
 		else
 			vsid = vcpu->arch.fault_gpa;
@@ -1863,7 +1861,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	 * Otherwise, we just generate a program interrupt to the guest.
 	 */
 	case BOOK3S_INTERRUPT_H_FAC_UNAVAIL: {
-		u64 cause = vcpu->arch.hfscr >> 56;
+		u64 cause = kvmppc_get_hfscr_hv(vcpu) >> 56;
 
 		r = EMULATE_FAIL;
 		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
@@ -1891,7 +1889,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		kvmppc_dump_regs(vcpu);
 		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
 			vcpu->arch.trap, kvmppc_get_pc(vcpu),
-			vcpu->arch.shregs.msr);
+			kvmppc_get_msr(vcpu));
 		run->hw.hardware_exit_reason = vcpu->arch.trap;
 		r = RESUME_HOST;
 		break;
@@ -1915,11 +1913,11 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
 	 * That can happen due to a bug, or due to a machine check
 	 * occurring at just the wrong time.
 	 */
-	if (vcpu->arch.shregs.msr & MSR_HV) {
+	if (kvmppc_get_msr(vcpu) & MSR_HV) {
 		pr_emerg("KVM trap in HV mode while nested!\n");
 		pr_emerg("trap=0x%x | pc=0x%lx | msr=0x%llx\n",
 			 vcpu->arch.trap, kvmppc_get_pc(vcpu),
-			 vcpu->arch.shregs.msr);
+			 kvmppc_get_msr(vcpu));
 		kvmppc_dump_regs(vcpu);
 		return RESUME_HOST;
 	}
@@ -1976,7 +1974,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
 		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
 		vcpu->arch.fault_dsisr = kvmppc_get_msr(vcpu) &
 					 DSISR_SRR1_MATCH_64S;
-		if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE)
+		if (kvmppc_get_msr(vcpu) & HSRR1_HISI_WRITE)
 			vcpu->arch.fault_dsisr |= DSISR_ISSTORE;
 		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = kvmhv_nested_page_fault(vcpu);
@@ -2182,7 +2180,7 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 		}
 	}
 
-	vc->lpcr = new_lpcr;
+	kvmppc_set_lpcr_hv(vcpu, new_lpcr);
 
 	spin_unlock(&vc->lock);
 }
@@ -2207,64 +2205,64 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.dabrx);
 		break;
 	case KVM_REG_PPC_DSCR:
-		*val = get_reg_val(id, vcpu->arch.dscr);
+		*val = get_reg_val(id, kvmppc_get_dscr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_PURR:
-		*val = get_reg_val(id, vcpu->arch.purr);
+		*val = get_reg_val(id, kvmppc_get_purr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_SPURR:
-		*val = get_reg_val(id, vcpu->arch.spurr);
+		*val = get_reg_val(id, kvmppc_get_spurr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_AMR:
-		*val = get_reg_val(id, vcpu->arch.amr);
+		*val = get_reg_val(id, kvmppc_get_amr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_UAMOR:
-		*val = get_reg_val(id, vcpu->arch.uamor);
+		*val = get_reg_val(id, kvmppc_get_uamor_hv(vcpu));
 		break;
 	case KVM_REG_PPC_MMCR0 ... KVM_REG_PPC_MMCR1:
 		i = id - KVM_REG_PPC_MMCR0;
-		*val = get_reg_val(id, vcpu->arch.mmcr[i]);
+		*val = get_reg_val(id, kvmppc_get_mmcr_hv(vcpu, i));
 		break;
 	case KVM_REG_PPC_MMCR2:
-		*val = get_reg_val(id, vcpu->arch.mmcr[2]);
+		*val = get_reg_val(id, kvmppc_get_mmcr_hv(vcpu, 2));
 		break;
 	case KVM_REG_PPC_MMCRA:
-		*val = get_reg_val(id, vcpu->arch.mmcra);
+		*val = get_reg_val(id, kvmppc_get_mmcra_hv(vcpu));
 		break;
 	case KVM_REG_PPC_MMCRS:
 		*val = get_reg_val(id, vcpu->arch.mmcrs);
 		break;
 	case KVM_REG_PPC_MMCR3:
-		*val = get_reg_val(id, vcpu->arch.mmcr[3]);
+		*val = get_reg_val(id, kvmppc_get_mmcr_hv(vcpu, 3));
 		break;
 	case KVM_REG_PPC_PMC1 ... KVM_REG_PPC_PMC8:
 		i = id - KVM_REG_PPC_PMC1;
-		*val = get_reg_val(id, vcpu->arch.pmc[i]);
+		*val = get_reg_val(id, kvmppc_get_pmc_hv(vcpu, i));
 		break;
 	case KVM_REG_PPC_SPMC1 ... KVM_REG_PPC_SPMC2:
 		i = id - KVM_REG_PPC_SPMC1;
 		*val = get_reg_val(id, vcpu->arch.spmc[i]);
 		break;
 	case KVM_REG_PPC_SIAR:
-		*val = get_reg_val(id, vcpu->arch.siar);
+		*val = get_reg_val(id, kvmppc_get_siar_hv(vcpu));
 		break;
 	case KVM_REG_PPC_SDAR:
-		*val = get_reg_val(id, vcpu->arch.sdar);
+		*val = get_reg_val(id, kvmppc_get_sdar_hv(vcpu));
 		break;
 	case KVM_REG_PPC_SIER:
-		*val = get_reg_val(id, vcpu->arch.sier[0]);
+		*val = get_reg_val(id, kvmppc_get_sier_hv(vcpu, 0));
 		break;
 	case KVM_REG_PPC_SIER2:
-		*val = get_reg_val(id, vcpu->arch.sier[1]);
+		*val = get_reg_val(id, kvmppc_get_sier_hv(vcpu, 1));
 		break;
 	case KVM_REG_PPC_SIER3:
-		*val = get_reg_val(id, vcpu->arch.sier[2]);
+		*val = get_reg_val(id, kvmppc_get_sier_hv(vcpu, 2));
 		break;
 	case KVM_REG_PPC_IAMR:
-		*val = get_reg_val(id, vcpu->arch.iamr);
+		*val = get_reg_val(id, kvmppc_get_iamr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_PSPB:
-		*val = get_reg_val(id, vcpu->arch.pspb);
+		*val = get_reg_val(id, kvmppc_get_pspb_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DPDES:
 		/*
@@ -2279,22 +2277,22 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 			*val = get_reg_val(id, vcpu->arch.vcore->dpdes);
 		break;
 	case KVM_REG_PPC_VTB:
-		*val = get_reg_val(id, vcpu->arch.vcore->vtb);
+		*val = get_reg_val(id, kvmppc_get_vtb_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWR:
-		*val = get_reg_val(id, vcpu->arch.dawr0);
+		*val = get_reg_val(id, kvmppc_get_dawr0_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWRX:
-		*val = get_reg_val(id, vcpu->arch.dawrx0);
+		*val = get_reg_val(id, kvmppc_get_dawrx0_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWR1:
-		*val = get_reg_val(id, vcpu->arch.dawr1);
+		*val = get_reg_val(id, kvmppc_get_dawr1_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWRX1:
-		*val = get_reg_val(id, vcpu->arch.dawrx1);
+		*val = get_reg_val(id, kvmppc_get_dawrx1_hv(vcpu));
 		break;
 	case KVM_REG_PPC_CIABR:
-		*val = get_reg_val(id, vcpu->arch.ciabr);
+		*val = get_reg_val(id, kvmppc_get_ciabr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_CSIGR:
 		*val = get_reg_val(id, vcpu->arch.csigr);
@@ -2306,13 +2304,13 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.tcscr);
 		break;
 	case KVM_REG_PPC_PID:
-		*val = get_reg_val(id, vcpu->arch.pid);
+		*val = get_reg_val(id, kvmppc_get_pid(vcpu));
 		break;
 	case KVM_REG_PPC_ACOP:
 		*val = get_reg_val(id, vcpu->arch.acop);
 		break;
 	case KVM_REG_PPC_WORT:
-		*val = get_reg_val(id, vcpu->arch.wort);
+		*val = get_reg_val(id, kvmppc_get_wort_hv(vcpu));
 		break;
 	case KVM_REG_PPC_TIDR:
 		*val = get_reg_val(id, vcpu->arch.tid);
@@ -2338,14 +2336,14 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		spin_unlock(&vcpu->arch.vpa_update_lock);
 		break;
 	case KVM_REG_PPC_TB_OFFSET:
-		*val = get_reg_val(id, vcpu->arch.vcore->tb_offset);
+		*val = get_reg_val(id, kvmppc_get_tb_offset_hv(vcpu));
 		break;
 	case KVM_REG_PPC_LPCR:
 	case KVM_REG_PPC_LPCR_64:
 		*val = get_reg_val(id, vcpu->arch.vcore->lpcr);
 		break;
 	case KVM_REG_PPC_PPR:
-		*val = get_reg_val(id, vcpu->arch.ppr);
+		*val = get_reg_val(id, kvmppc_get_ppr_hv(vcpu));
 		break;
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	case KVM_REG_PPC_TFHAR:
@@ -2417,7 +2415,7 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.vcore->arch_compat);
 		break;
 	case KVM_REG_PPC_DEC_EXPIRY:
-		*val = get_reg_val(id, vcpu->arch.dec_expires);
+		*val = get_reg_val(id, kvmppc_get_dec_expires(vcpu));
 		break;
 	case KVM_REG_PPC_ONLINE:
 		*val = get_reg_val(id, vcpu->arch.online);
@@ -2425,6 +2423,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_PTCR:
 		*val = get_reg_val(id, vcpu->kvm->arch.l1_ptcr);
 		break;
+	case KVM_REG_PPC_FSCR:
+		*val = get_reg_val(id, kvmppc_get_fscr_hv(vcpu));
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -2453,29 +2454,29 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		vcpu->arch.dabrx = set_reg_val(id, *val) & ~DABRX_HYP;
 		break;
 	case KVM_REG_PPC_DSCR:
-		vcpu->arch.dscr = set_reg_val(id, *val);
+		kvmppc_set_dscr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_PURR:
-		vcpu->arch.purr = set_reg_val(id, *val);
+		kvmppc_set_purr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SPURR:
-		vcpu->arch.spurr = set_reg_val(id, *val);
+		kvmppc_set_spurr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_AMR:
-		vcpu->arch.amr = set_reg_val(id, *val);
+		kvmppc_set_amr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_UAMOR:
-		vcpu->arch.uamor = set_reg_val(id, *val);
+		kvmppc_set_uamor_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCR0 ... KVM_REG_PPC_MMCR1:
 		i = id - KVM_REG_PPC_MMCR0;
-		vcpu->arch.mmcr[i] = set_reg_val(id, *val);
+		kvmppc_set_mmcr_hv(vcpu, i, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCR2:
-		vcpu->arch.mmcr[2] = set_reg_val(id, *val);
+		kvmppc_set_mmcr_hv(vcpu, 2, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCRA:
-		vcpu->arch.mmcra = set_reg_val(id, *val);
+		kvmppc_set_mmcra_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCRS:
 		vcpu->arch.mmcrs = set_reg_val(id, *val);
@@ -2485,32 +2486,32 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		break;
 	case KVM_REG_PPC_PMC1 ... KVM_REG_PPC_PMC8:
 		i = id - KVM_REG_PPC_PMC1;
-		vcpu->arch.pmc[i] = set_reg_val(id, *val);
+		kvmppc_set_pmc_hv(vcpu, i, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SPMC1 ... KVM_REG_PPC_SPMC2:
 		i = id - KVM_REG_PPC_SPMC1;
 		vcpu->arch.spmc[i] = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_SIAR:
-		vcpu->arch.siar = set_reg_val(id, *val);
+		kvmppc_set_siar_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SDAR:
-		vcpu->arch.sdar = set_reg_val(id, *val);
+		kvmppc_set_sdar_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SIER:
-		vcpu->arch.sier[0] = set_reg_val(id, *val);
+		kvmppc_set_sier_hv(vcpu, 0, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SIER2:
-		vcpu->arch.sier[1] = set_reg_val(id, *val);
+		kvmppc_set_sier_hv(vcpu, 1, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SIER3:
-		vcpu->arch.sier[2] = set_reg_val(id, *val);
+		kvmppc_set_sier_hv(vcpu, 2, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_IAMR:
-		vcpu->arch.iamr = set_reg_val(id, *val);
+		kvmppc_set_iamr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_PSPB:
-		vcpu->arch.pspb = set_reg_val(id, *val);
+		kvmppc_set_pspb_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DPDES:
 		if (cpu_has_feature(CPU_FTR_ARCH_300))
@@ -2519,25 +2520,25 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 			vcpu->arch.vcore->dpdes = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_VTB:
-		vcpu->arch.vcore->vtb = set_reg_val(id, *val);
+		kvmppc_set_vtb_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DAWR:
-		vcpu->arch.dawr0 = set_reg_val(id, *val);
+		kvmppc_set_dawr0_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DAWRX:
-		vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP;
+		kvmppc_set_dawrx0_hv(vcpu, set_reg_val(id, *val) & ~DAWRX_HYP);
 		break;
 	case KVM_REG_PPC_DAWR1:
-		vcpu->arch.dawr1 = set_reg_val(id, *val);
+		kvmppc_set_dawr1_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DAWRX1:
-		vcpu->arch.dawrx1 = set_reg_val(id, *val) & ~DAWRX_HYP;
+		kvmppc_set_dawrx1_hv(vcpu, set_reg_val(id, *val) & ~DAWRX_HYP);
 		break;
 	case KVM_REG_PPC_CIABR:
-		vcpu->arch.ciabr = set_reg_val(id, *val);
+		kvmppc_set_ciabr_hv(vcpu, set_reg_val(id, *val));
 		/* Don't allow setting breakpoints in hypervisor code */
-		if ((vcpu->arch.ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
-			vcpu->arch.ciabr &= ~CIABR_PRIV;	/* disable */
+		if ((kvmppc_get_ciabr_hv(vcpu) & CIABR_PRIV) == CIABR_PRIV_HYPER)
+			kvmppc_set_ciabr_hv(vcpu, kvmppc_get_ciabr_hv(vcpu) & ~CIABR_PRIV);	/* disable */
 		break;
 	case KVM_REG_PPC_CSIGR:
 		vcpu->arch.csigr = set_reg_val(id, *val);
@@ -2549,13 +2550,13 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		vcpu->arch.tcscr = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_PID:
-		vcpu->arch.pid = set_reg_val(id, *val);
+		kvmppc_set_pid(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_ACOP:
 		vcpu->arch.acop = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_WORT:
-		vcpu->arch.wort = set_reg_val(id, *val);
+		kvmppc_set_wort_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_TIDR:
 		vcpu->arch.tid = set_reg_val(id, *val);
@@ -2602,10 +2603,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		 * decrementer, which is better than a large one that
 		 * causes a hang.
 		 */
-		if (!vcpu->arch.dec_expires && tb_offset)
-			vcpu->arch.dec_expires = get_tb() + tb_offset;
+		kvmppc_set_tb_offset_hv(vcpu, tb_offset);
+		if (!kvmppc_get_dec_expires(vcpu) && tb_offset)
+			kvmppc_set_dec_expires(vcpu, get_tb() + tb_offset);
 
-		vcpu->arch.vcore->tb_offset = tb_offset;
 		break;
 	}
 	case KVM_REG_PPC_LPCR:
@@ -2615,7 +2617,7 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), false);
 		break;
 	case KVM_REG_PPC_PPR:
-		vcpu->arch.ppr = set_reg_val(id, *val);
+		kvmppc_set_ppr_hv(vcpu, set_reg_val(id, *val));
 		break;
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	case KVM_REG_PPC_TFHAR:
@@ -2686,7 +2688,7 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		r = kvmppc_set_arch_compat(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DEC_EXPIRY:
-		vcpu->arch.dec_expires = set_reg_val(id, *val);
+		kvmppc_set_dec_expires(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_ONLINE:
 		i = set_reg_val(id, *val);
@@ -2699,6 +2701,9 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_PTCR:
 		vcpu->kvm->arch.l1_ptcr = set_reg_val(id, *val);
 		break;
+	case KVM_REG_PPC_FSCR:
+		kvmppc_set_fscr_hv(vcpu, set_reg_val(id, *val));
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -2916,19 +2921,20 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.shared_big_endian = false;
 #endif
 #endif
-	vcpu->arch.mmcr[0] = MMCR0_FC;
+	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
+
 	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-		vcpu->arch.mmcr[0] |= MMCR0_PMCCEXT;
-		vcpu->arch.mmcra = MMCRA_BHRB_DISABLE;
+		kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
+		kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
 	}
 
-	vcpu->arch.ctrl = CTRL_RUNLATCH;
+	kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
 	/* default to host PVR, since we can't spoof it */
 	kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
 	spin_lock_init(&vcpu->arch.vpa_update_lock);
 	spin_lock_init(&vcpu->arch.tbacct_lock);
 	vcpu->arch.busy_preempt = TB_NIL;
-	vcpu->arch.shregs.msr = MSR_ME;
+	kvmppc_set_msr_fast(vcpu, MSR_ME);
 	vcpu->arch.intr_msr = MSR_SF | MSR_ME;
 
 	/*
@@ -2938,29 +2944,30 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	 * don't set the HFSCR_MSGP bit, and that causes those instructions
 	 * to trap and then we emulate them.
 	 */
-	vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
-		HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
+	kvmppc_set_hfscr_hv(vcpu, HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
+			    HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP);
 
 	/* On POWER10 and later, allow prefixed instructions */
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
-		vcpu->arch.hfscr |= HFSCR_PREFIX;
+		kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_PREFIX);
 
 	if (cpu_has_feature(CPU_FTR_HVMODE)) {
-		vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
+		kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) & mfspr(SPRN_HFSCR));
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 		if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-			vcpu->arch.hfscr |= HFSCR_TM;
+			kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_TM);
 #endif
 	}
 	if (cpu_has_feature(CPU_FTR_TM_COMP))
 		vcpu->arch.hfscr |= HFSCR_TM;
 
-	vcpu->arch.hfscr_permitted = vcpu->arch.hfscr;
+	vcpu->arch.hfscr_permitted = kvmppc_get_hfscr_hv(vcpu);
 
 	/*
 	 * PM, EBB, TM are demand-faulted so start with it clear.
 	 */
-	vcpu->arch.hfscr &= ~(HFSCR_PM | HFSCR_EBB | HFSCR_TM);
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) & ~(HFSCR_PM | HFSCR_EBB | HFSCR_TM));
 
 	kvmppc_mmu_book3s_hv_init(vcpu);
 
@@ -4038,7 +4045,6 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
 /* call our hypervisor to load up HV regs and go */
 static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
 {
-	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	unsigned long host_psscr;
 	unsigned long msr;
 	struct hv_guest_state hvregs;
@@ -4118,7 +4124,7 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, uns
 	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
 		dec = (s32) dec;
 	*tb = mftb();
-	vcpu->arch.dec_expires = dec + (*tb + vc->tb_offset);
+	vcpu->arch.dec_expires = dec + (*tb + kvmppc_get_tb_offset_hv(vcpu));
 
 	timer_rearm_host_dec(*tb);
 
@@ -4176,7 +4182,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		__this_cpu_write(cpu_in_guest, NULL);
 
 		if (trap == BOOK3S_INTERRUPT_SYSCALL &&
-		    !(vcpu->arch.shregs.msr & MSR_PR)) {
+		    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
 			unsigned long req = kvmppc_get_gpr(vcpu, 3);
 
 			/*
@@ -4655,7 +4661,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	if (!nested) {
 		kvmppc_core_prepare_to_enter(vcpu);
-		if (vcpu->arch.shregs.msr & MSR_EE) {
+		if (kvmppc_get_msr(vcpu) & MSR_EE) {
 			if (xive_interrupt_pending(vcpu))
 				kvmppc_inject_interrupt_hv(vcpu,
 						BOOK3S_INTERRUPT_EXTERNAL, 0);
@@ -4677,7 +4683,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	tb = mftb();
 
-	kvmppc_update_vpa_dispatch_p9(vcpu, vc, tb + vc->tb_offset);
+	kvmppc_update_vpa_dispatch_p9(vcpu, vc, tb + kvmppc_get_tb_offset_hv(vcpu));
 
 	trace_kvm_guest_enter(vcpu);
 
@@ -4844,7 +4850,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		msr |= MSR_VSX;
 	if ((cpu_has_feature(CPU_FTR_TM) ||
 	    cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) &&
-			(vcpu->arch.hfscr & HFSCR_TM))
+			(kvmppc_get_hfscr_hv(vcpu) & HFSCR_TM))
 		msr |= MSR_TM;
 	msr = msr_check_and_set(msr);
 
@@ -4868,7 +4874,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
 			accumulate_time(vcpu, &vcpu->arch.hcall);
 
-			if (WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_PR)) {
+			if (WARN_ON_ONCE(kvmppc_get_msr(vcpu) & MSR_PR)) {
 				/*
 				 * These should have been caught reflected
 				 * into the guest by now. Final sanity check:
diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 2f2e59d7d433..7a7005189ab1 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -50,3 +50,62 @@ void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
 #define start_timing(vcpu, next) do {} while (0)
 #define end_timing(vcpu) do {} while (0)
 #endif
+
+#define HV_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	vcpu->arch.reg = val;						\
+}
+
+#define HV_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
+{									\
+	return vcpu->arch.reg;						\
+}
+
+#define HV_WRAPPER(reg, size)						\
+	HV_WRAPPER_SET(reg, size)					\
+	HV_WRAPPER_GET(reg, size)					\
+
+#define HV_ARRAY_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, int i, u##size val)	\
+{									\
+	vcpu->arch.reg[i] = val;					\
+}
+
+#define HV_ARRAY_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu, int i)	\
+{									\
+	return vcpu->arch.reg[i];					\
+}
+
+#define HV_ARRAY_WRAPPER(reg, size)					\
+	HV_ARRAY_WRAPPER_SET(reg, size)					\
+	HV_ARRAY_WRAPPER_GET(reg, size)					\
+
+HV_WRAPPER(mmcra, 64)
+HV_WRAPPER(hfscr, 64)
+HV_WRAPPER(fscr, 64)
+HV_WRAPPER(dscr, 64)
+HV_WRAPPER(purr, 64)
+HV_WRAPPER(spurr, 64)
+HV_WRAPPER(amr, 64)
+HV_WRAPPER(uamor, 64)
+HV_WRAPPER(siar, 64)
+HV_WRAPPER(sdar, 64)
+HV_WRAPPER(iamr, 64)
+HV_WRAPPER(dawr0, 64)
+HV_WRAPPER(dawr1, 64)
+HV_WRAPPER(dawrx0, 64)
+HV_WRAPPER(dawrx1, 64)
+HV_WRAPPER(ciabr, 64)
+HV_WRAPPER(wort, 64)
+HV_WRAPPER(ppr, 64)
+HV_WRAPPER(ctrl, 64)
+
+HV_ARRAY_WRAPPER(mmcr, 64)
+HV_ARRAY_WRAPPER(sier, 64)
+HV_ARRAY_WRAPPER(pmc, 32)
+
+HV_WRAPPER(pvr, 32)
+HV_WRAPPER(pspb, 32)
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index da85f046377a..9f9e9aab6015 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -182,9 +182,13 @@ EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
 
 long kvmppc_rm_h_random(struct kvm_vcpu *vcpu)
 {
+	unsigned long rand;
+
 	if (ppc_md.get_random_seed &&
-	    ppc_md.get_random_seed(&vcpu->arch.regs.gpr[4]))
+	    ppc_md.get_random_seed(&rand)) {
+		kvmppc_set_gpr(vcpu, 4, rand);
 		return H_SUCCESS;
+	}
 
 	return H_HARDWARE;
 }
@@ -510,7 +514,7 @@ void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 	 */
 	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
 		msr &= ~MSR_TS_MASK;
-	vcpu->arch.shregs.msr = msr;
+	kvmppc_set_msr_fast(vcpu, msr);
 	kvmppc_end_cede(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvmppc_set_msr_hv);
@@ -548,7 +552,7 @@ static void inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags)
 	kvmppc_set_srr0(vcpu, pc);
 	kvmppc_set_srr1(vcpu, (msr & SRR1_MSR_BITS) | srr1_flags);
 	kvmppc_set_pc(vcpu, new_pc);
-	vcpu->arch.shregs.msr = new_msr;
+	kvmppc_set_msr_fast(vcpu, new_msr);
 }
 
 void kvmppc_inject_interrupt_hv(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags)
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 34f1db212824..34bc0a8a1288 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
 	u32 pid;
 
 	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
-	pid = vcpu->arch.pid;
+	pid = kvmppc_get_pid(vcpu);
 
 	/*
 	 * Prior memory accesses to host PID Q3 must be completed before we
@@ -330,7 +330,7 @@ static void switch_mmu_to_guest_hpt(struct kvm *kvm, struct kvm_vcpu *vcpu, u64
 	int i;
 
 	lpid = kvm->arch.lpid;
-	pid = vcpu->arch.pid;
+	pid = kvmppc_get_pid(vcpu);
 
 	/*
 	 * See switch_mmu_to_guest_radix. ptesync should not be required here
diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
index ccfd96965630..3b43c3d00311 100644
--- a/arch/powerpc/kvm/book3s_hv_ras.c
+++ b/arch/powerpc/kvm/book3s_hv_ras.c
@@ -15,6 +15,7 @@
 #include <asm/cputhreads.h>
 #include <asm/hmi.h>
 #include <asm/kvm_ppc.h>
+#include "book3s_hv.h"
 
 /* SRR1 bits for machine check on POWER7 */
 #define SRR1_MC_LDSTERR		(1ul << (63-42))
@@ -173,14 +174,14 @@ long kvmppc_p9_realmode_hmi_handler(struct kvm_vcpu *vcpu)
 		ppc_md.hmi_exception_early(NULL);
 
 out:
-	if (vc->tb_offset) {
+	if (kvmppc_get_tb_offset_hv(vcpu)) {
 		u64 new_tb = mftb() + vc->tb_offset;
 		mtspr(SPRN_TBU40, new_tb);
 		if ((mftb() & 0xffffff) < (new_tb & 0xffffff)) {
 			new_tb += 0x1000000;
 			mtspr(SPRN_TBU40, new_tb);
 		}
-		vc->tb_offset_applied = vc->tb_offset;
+		vc->tb_offset_applied = kvmppc_get_tb_offset_hv(vcpu);
 	}
 
 	return ret;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9182324dbef9..17cb75a127b0 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -776,8 +776,8 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
 			r &= ~HPTE_GR_RESERVED;
 		}
-		vcpu->arch.regs.gpr[4 + i * 2] = v;
-		vcpu->arch.regs.gpr[5 + i * 2] = r;
+		kvmppc_set_gpr(vcpu, 4 + i * 2, v);
+		kvmppc_set_gpr(vcpu, 5 + i * 2, r);
 	}
 	return H_SUCCESS;
 }
@@ -824,7 +824,7 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 			}
 		}
 	}
-	vcpu->arch.regs.gpr[4] = gr;
+	kvmppc_set_gpr(vcpu, 4, gr);
 	ret = H_SUCCESS;
  out:
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
@@ -872,7 +872,7 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 			kvmppc_set_dirty_from_hpte(kvm, v, gr);
 		}
 	}
-	vcpu->arch.regs.gpr[4] = gr;
+	kvmppc_set_gpr(vcpu, 4, gr);
 	ret = H_SUCCESS;
  out:
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e165bfa842bf..e42984878503 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -481,7 +481,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 
 unsigned long xics_rm_h_xirr_x(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.regs.gpr[5] = get_tb();
+	kvmppc_set_gpr(vcpu, 5, get_tb());
 	return xics_rm_h_xirr(vcpu);
 }
 
@@ -518,7 +518,7 @@ unsigned long xics_rm_h_xirr(struct kvm_vcpu *vcpu)
 	} while (!icp_rm_try_update(icp, old_state, new_state));
 
 	/* Return the result in GPR4 */
-	vcpu->arch.regs.gpr[4] = xirr;
+	kvmppc_set_gpr(vcpu, 4, xirr);
 
 	return check_too_hard(xics, icp);
 }
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index f4115819e738..4adff4f1896d 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -29,6 +29,7 @@
 #include <linux/seq_file.h>
 
 #include "book3s_xive.h"
+#include "book3s_hv.h"
 
 #define __x_eoi_page(xd)	((void __iomem *)((xd)->eoi_mmio))
 #define __x_trig_page(xd)	((void __iomem *)((xd)->trig_mmio))
@@ -328,7 +329,7 @@ static unsigned long xive_vm_h_xirr(struct kvm_vcpu *vcpu)
 	 */
 
 	/* Return interrupt and old CPPR in GPR4 */
-	vcpu->arch.regs.gpr[4] = hirq | (old_cppr << 24);
+	kvmppc_set_gpr(vcpu, 4, hirq | (old_cppr << 24));
 
 	return H_SUCCESS;
 }
@@ -364,7 +365,7 @@ static unsigned long xive_vm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server
 	hirq = xive_vm_scan_interrupts(xc, pending, scan_poll);
 
 	/* Return interrupt and old CPPR in GPR4 */
-	vcpu->arch.regs.gpr[4] = hirq | (xc->cppr << 24);
+	kvmppc_set_gpr(vcpu, 4, hirq | (xc->cppr << 24));
 
 	return H_SUCCESS;
 }
@@ -2779,8 +2780,6 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 
 int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 {
-	struct kvmppc_vcore *vc = vcpu->arch.vcore;
-
 	/* The VM should have configured XICS mode before doing XICS hcalls. */
 	if (!kvmppc_xics_enabled(vcpu))
 		return H_TOO_HARD;
@@ -2799,7 +2798,7 @@ int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 		return xive_vm_h_ipoll(vcpu, kvmppc_get_gpr(vcpu, 4));
 	case H_XIRR_X:
 		xive_vm_h_xirr(vcpu);
-		kvmppc_set_gpr(vcpu, 5, get_tb() + vc->tb_offset);
+		kvmppc_set_gpr(vcpu, 5, get_tb() + kvmppc_get_tb_offset_hv(vcpu));
 		return H_SUCCESS;
 	}
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7197c8256668..ca9793c3d437 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1729,7 +1729,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 			val = get_reg_val(reg->id, vcpu->arch.vr.vscr.u[3]);
 			break;
 		case KVM_REG_PPC_VRSAVE:
-			val = get_reg_val(reg->id, vcpu->arch.vrsave);
+			val = get_reg_val(reg->id, kvmppc_get_vrsave(vcpu));
 			break;
 #endif /* CONFIG_ALTIVEC */
 		default:
@@ -1784,7 +1784,7 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 				r = -ENXIO;
 				break;
 			}
-			vcpu->arch.vrsave = set_reg_val(reg->id, val);
+			kvmppc_set_vrsave(vcpu, set_reg_val(reg->id, val));
 			break;
 #endif /* CONFIG_ALTIVEC */
 		default:
-- 
2.31.1


* [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

There are already some getter and setter functions used for accessing
vcpu register state, e.g. kvmppc_get_pc(). There are also more
complicated accessors, such as kvmppc_get_sprg0(), which are generated
by the SHARED_SPRNG_WRAPPER() macro.

In the new PAPR API for nested guest partitions, the L1 is required to
communicate with the L0 to modify and read nested guest state.

Prepare to support this by replacing direct accesses to vcpu register
state with wrapper functions. Follow the existing pattern of using
macros to generate individual wrappers. These wrappers will later be
augmented to support PAPR nested guests.
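
As a concrete illustration of the conversion (the same pattern repeats
throughout the diff below), a direct field access such as:

	regs->pid = vcpu->arch.pid;

becomes:

	regs->pid = kvmppc_get_pid(vcpu);

For now the wrappers are plain inline accessors, so no functional
change is intended.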

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s.h  |  68 +++++++-
 arch/powerpc/include/asm/kvm_ppc.h     |  48 ++++--
 arch/powerpc/kvm/book3s.c              |  22 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   4 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   9 +-
 arch/powerpc/kvm/book3s_64_vio.c       |   4 +-
 arch/powerpc/kvm/book3s_hv.c           | 222 +++++++++++++------------
 arch/powerpc/kvm/book3s_hv.h           |  59 +++++++
 arch/powerpc/kvm/book3s_hv_builtin.c   |  10 +-
 arch/powerpc/kvm/book3s_hv_p9_entry.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_ras.c       |   5 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c    |   8 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c   |   4 +-
 arch/powerpc/kvm/book3s_xive.c         |   9 +-
 arch/powerpc/kvm/powerpc.c             |   4 +-
 15 files changed, 322 insertions(+), 158 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index bbf5e2c5fe09..4e91f54a3f9f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -392,6 +392,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 	return vcpu->arch.regs.nip;
 }
 
+static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
+{
+	vcpu->arch.pid = val;
+}
+
+static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pid;
+}
+
 static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
 static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
 {
@@ -403,10 +413,66 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault_dar;
 }
 
+#define BOOK3S_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	vcpu->arch.reg = val;						\
+}
+
+#define BOOK3S_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
+{									\
+	return vcpu->arch.reg;						\
+}
+
+#define BOOK3S_WRAPPER(reg, size)					\
+	BOOK3S_WRAPPER_SET(reg, size)					\
+	BOOK3S_WRAPPER_GET(reg, size)					\
+
+BOOK3S_WRAPPER(tar, 64)
+BOOK3S_WRAPPER(ebbhr, 64)
+BOOK3S_WRAPPER(ebbrr, 64)
+BOOK3S_WRAPPER(bescr, 64)
+BOOK3S_WRAPPER(ic, 64)
+BOOK3S_WRAPPER(vrsave, 32)
+
+
+#define VCORE_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	vcpu->arch.vcore->reg = val;					\
+}
+
+#define VCORE_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
+{									\
+	return vcpu->arch.vcore->reg;					\
+}
+
+#define VCORE_WRAPPER(reg, size)					\
+	VCORE_WRAPPER_SET(reg, size)					\
+	VCORE_WRAPPER_GET(reg, size)					\
+
+
+VCORE_WRAPPER(vtb, 64)
+VCORE_WRAPPER(tb_offset, 64)
+VCORE_WRAPPER(lpcr, 64)
+
+static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.dec_expires;
+}
+
+static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
+{
+	vcpu->arch.dec_expires = val;
+}
+
 /* Expiry time of vcpu DEC relative to host TB */
 static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
+	return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset_hv(vcpu);
 }
 
 static inline bool is_kvmppc_resume_guest(int r)
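
The VCORE_WRAPPER() accessors added above simply forward to the shared
vcore. For example, kvmppc_get_tb_offset_hv() expands to (written out
for illustration only):

	static inline u64 kvmppc_get_tb_offset_hv(struct kvm_vcpu *vcpu)
	{
		return vcpu->arch.vcore->tb_offset;
	}

so the new kvmppc_dec_expires_host_tb() is equivalent to the old
open-coded vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset.
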
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 79a9c0bb8bba..fbac353ac46b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -936,7 +936,7 @@ static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 #define SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
 {									\
-	mtspr(bookehv_spr, val);						\
+	mtspr(bookehv_spr, val);					\
 }									\
 
 #define SHARED_WRAPPER_GET(reg, size)					\
@@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
 }									\
 
+#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
+{									\
+	if (kvmppc_shared_big_endian(vcpu))				\
+	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
+	else								\
+	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
+}									\
+
+#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	if (kvmppc_shared_big_endian(vcpu))				\
+	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
+	else								\
+	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
+}									\
+
 #define SHARED_WRAPPER(reg, size)					\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
+#define SHARED_CACHE_WRAPPER(reg, size)					\
+	SHARED_CACHE_WRAPPER_GET(reg, size)				\
+	SHARED_CACHE_WRAPPER_SET(reg, size)				\
+
 #define SPRNG_WRAPPER(reg, bookehv_spr)					\
 	SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
 	SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
@@ -970,23 +992,29 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SPRNG_WRAPPER(reg, bookehv_spr)					\
 
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
+	SPRNG_WRAPPER(reg, bookehv_spr)					\
+
 #else
 
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SHARED_WRAPPER(reg, size)					\
 
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
+	SHARED_CACHE_WRAPPER(reg, size)					\
+
 #endif
 
 SHARED_WRAPPER(critical, 64)
-SHARED_SPRNG_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_WRAPPER(dar, 64, SPRN_GDEAR)
+SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0)
+SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1)
+SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2)
+SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3)
+SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0)
+SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1)
+SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR)
 SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
-SHARED_WRAPPER_GET(msr, 64)
+SHARED_CACHE_WRAPPER_GET(msr, 64)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
 	if (kvmppc_shared_big_endian(vcpu))
@@ -994,7 +1022,7 @@ static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 	else
 	       vcpu->arch.shared->msr = cpu_to_le64(val);
 }
-SHARED_WRAPPER(dsisr, 32)
+SHARED_CACHE_WRAPPER(dsisr, 32)
 SHARED_WRAPPER(int_pending, 32)
 SHARED_WRAPPER(sprg4, 64)
 SHARED_WRAPPER(sprg5, 64)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 686d8d9eda3e..2fe31b518886 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -565,7 +565,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	regs->msr = kvmppc_get_msr(vcpu);
 	regs->srr0 = kvmppc_get_srr0(vcpu);
 	regs->srr1 = kvmppc_get_srr1(vcpu);
-	regs->pid = vcpu->arch.pid;
+	regs->pid = kvmppc_get_pid(vcpu);
 	regs->sprg0 = kvmppc_get_sprg0(vcpu);
 	regs->sprg1 = kvmppc_get_sprg1(vcpu);
 	regs->sprg2 = kvmppc_get_sprg2(vcpu);
@@ -683,19 +683,19 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			*val = get_reg_val(id, vcpu->arch.fscr);
 			break;
 		case KVM_REG_PPC_TAR:
-			*val = get_reg_val(id, vcpu->arch.tar);
+			*val = get_reg_val(id, kvmppc_get_tar(vcpu));
 			break;
 		case KVM_REG_PPC_EBBHR:
-			*val = get_reg_val(id, vcpu->arch.ebbhr);
+			*val = get_reg_val(id, kvmppc_get_ebbhr(vcpu));
 			break;
 		case KVM_REG_PPC_EBBRR:
-			*val = get_reg_val(id, vcpu->arch.ebbrr);
+			*val = get_reg_val(id, kvmppc_get_ebbrr(vcpu));
 			break;
 		case KVM_REG_PPC_BESCR:
-			*val = get_reg_val(id, vcpu->arch.bescr);
+			*val = get_reg_val(id, kvmppc_get_bescr(vcpu));
 			break;
 		case KVM_REG_PPC_IC:
-			*val = get_reg_val(id, vcpu->arch.ic);
+			*val = get_reg_val(id, kvmppc_get_ic(vcpu));
 			break;
 		default:
 			r = -EINVAL;
@@ -768,19 +768,19 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			vcpu->arch.fscr = set_reg_val(id, *val);
 			break;
 		case KVM_REG_PPC_TAR:
-			vcpu->arch.tar = set_reg_val(id, *val);
+			kvmppc_set_tar(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_EBBHR:
-			vcpu->arch.ebbhr = set_reg_val(id, *val);
+			kvmppc_set_ebbhr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_EBBRR:
-			vcpu->arch.ebbrr = set_reg_val(id, *val);
+			kvmppc_set_ebbrr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_BESCR:
-			vcpu->arch.bescr = set_reg_val(id, *val);
+			kvmppc_set_bescr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_IC:
-			vcpu->arch.ic = set_reg_val(id, *val);
+			kvmppc_set_ic(vcpu, set_reg_val(id, *val));
 			break;
 		default:
 			r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7f765d5ad436..738f2ecbe9b9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -347,7 +347,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	unsigned long v, orig_v, gr;
 	__be64 *hptep;
 	long int index;
-	int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
+	int virtmode = kvmppc_get_msr(vcpu) & (data ? MSR_DR : MSR_IR);
 
 	if (kvm_is_radix(vcpu->kvm))
 		return kvmppc_mmu_radix_xlate(vcpu, eaddr, gpte, data, iswrite);
@@ -385,7 +385,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 
 	/* Get PP bits and key for permission check */
 	pp = gr & (HPTE_R_PP0 | HPTE_R_PP);
-	key = (vcpu->arch.shregs.msr & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
+	key = (kvmppc_get_msr(vcpu) & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
 	key &= slb_v;
 
 	/* Calculate permissions */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 461307b89c3a..e1aa078580a1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -15,6 +15,7 @@
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
+#include "book3s_hv.h"
 #include <asm/page.h>
 #include <asm/mmu.h>
 #include <asm/pgalloc.h>
@@ -96,7 +97,7 @@ static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
 					  void *to, void *from, unsigned long n)
 {
 	int lpid = vcpu->kvm->arch.lpid;
-	int pid = vcpu->arch.pid;
+	int pid = kvmppc_get_pid(vcpu);
 
 	/* This would cause a data segment intr so don't allow the access */
 	if (eaddr & (0x3FFUL << 52))
@@ -270,7 +271,7 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	/* Work out effective PID */
 	switch (eaddr >> 62) {
 	case 0:
-		pid = vcpu->arch.pid;
+		pid = kvmppc_get_pid(vcpu);
 		break;
 	case 3:
 		pid = 0;
@@ -294,9 +295,9 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	} else {
 		if (!(pte & _PAGE_PRIVILEGED)) {
 			/* Check AMR/IAMR to see if strict mode is in force */
-			if (vcpu->arch.amr & (1ul << 62))
+			if (kvmppc_get_amr_hv(vcpu) & (1ul << 62))
 				gpte->may_read = 0;
-			if (vcpu->arch.amr & (1ul << 63))
+			if (kvmppc_get_amr_hv(vcpu) & (1ul << 63))
 				gpte->may_write = 0;
 			if (vcpu->arch.iamr & (1ul << 62))
 				gpte->may_execute = 0;
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 93b695b289e9..4ba048f272f2 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -786,12 +786,12 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	idx = (ioba >> stt->page_shift) - stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	if (!page) {
-		vcpu->arch.regs.gpr[4] = 0;
+		kvmppc_set_gpr(vcpu, 4, 0);
 		return H_SUCCESS;
 	}
 	tbl = (u64 *)page_address(page);
 
-	vcpu->arch.regs.gpr[4] = tbl[idx % TCES_PER_PAGE];
+	kvmppc_set_gpr(vcpu, 4, tbl[idx % TCES_PER_PAGE]);
 
 	return H_SUCCESS;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..521d84621422 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -383,11 +383,6 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
-static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
-{
-	vcpu->arch.pvr = pvr;
-}
-
 /* Dummy value used in computing PCR value below */
 #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
 
@@ -794,7 +789,7 @@ static void kvmppc_update_vpa_dispatch(struct kvm_vcpu *vcpu,
 
 	vpa->enqueue_dispatch_tb = cpu_to_be64(be64_to_cpu(vpa->enqueue_dispatch_tb) + stolen);
 
-	__kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + vc->tb_offset, stolen);
+	__kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + kvmppc_get_tb_offset_hv(vcpu), stolen);
 
 	vcpu->arch.vpa.dirty = true;
 }
@@ -868,7 +863,7 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 		/* Guests can't breakpoint the hypervisor */
 		if ((value1 & CIABR_PRIV) == CIABR_PRIV_HYPER)
 			return H_P3;
-		vcpu->arch.ciabr  = value1;
+		kvmppc_set_ciabr_hv(vcpu, value1);
 		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_SET_DAWR0:
 		if (!kvmppc_power8_compatible(vcpu))
@@ -879,8 +874,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 			return H_UNSUPPORTED_FLAG_START;
 		if (value2 & DABRX_HYP)
 			return H_P4;
-		vcpu->arch.dawr0  = value1;
-		vcpu->arch.dawrx0 = value2;
+		kvmppc_set_dawr0_hv(vcpu, value1);
+		kvmppc_set_dawrx0_hv(vcpu, value2);
 		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_SET_DAWR1:
 		if (!kvmppc_power8_compatible(vcpu))
@@ -895,8 +890,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 			return H_UNSUPPORTED_FLAG_START;
 		if (value2 & DABRX_HYP)
 			return H_P4;
-		vcpu->arch.dawr1  = value1;
-		vcpu->arch.dawrx1 = value2;
+		kvmppc_set_dawr1_hv(vcpu, value1);
+		kvmppc_set_dawrx1_hv(vcpu, value2);
 		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
 		/*
@@ -1268,8 +1263,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		break;
 #endif
 	case H_RANDOM:
-		if (!arch_get_random_seed_longs(&vcpu->arch.regs.gpr[4], 1))
+	{
+		unsigned long rand;
+
+		if (!arch_get_random_seed_longs(&rand, 1))
 			ret = H_HARDWARE;
+		else
+			kvmppc_set_gpr(vcpu, 4, rand);
 		break;
+	}
 	case H_RPT_INVALIDATE:
 		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
@@ -1370,7 +1368,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
  */
 static void kvmppc_cede(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.shregs.msr |= MSR_EE;
+	kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 	vcpu->arch.ceded = 1;
 	smp_mb();
 	if (vcpu->arch.prodded) {
@@ -1544,7 +1542,7 @@ static int kvmppc_pmu_unavailable(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.hfscr_permitted & HFSCR_PM))
 		return EMULATE_FAIL;
 
-	vcpu->arch.hfscr |= HFSCR_PM;
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_PM);
 
 	return RESUME_GUEST;
 }
@@ -1554,7 +1552,7 @@ static int kvmppc_ebb_unavailable(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.hfscr_permitted & HFSCR_EBB))
 		return EMULATE_FAIL;
 
-	vcpu->arch.hfscr |= HFSCR_EBB;
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_EBB);
 
 	return RESUME_GUEST;
 }
@@ -1564,7 +1562,7 @@ static int kvmppc_tm_unavailable(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.hfscr_permitted & HFSCR_TM))
 		return EMULATE_FAIL;
 
-	vcpu->arch.hfscr |= HFSCR_TM;
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_TM);
 
 	return RESUME_GUEST;
 }
@@ -1585,7 +1583,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	 * That can happen due to a bug, or due to a machine check
 	 * occurring at just the wrong time.
 	 */
-	if (vcpu->arch.shregs.msr & MSR_HV) {
+	if (kvmppc_get_msr(vcpu) & MSR_HV) {
 		printk(KERN_EMERG "KVM trap in HV mode!\n");
 		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
 			vcpu->arch.trap, kvmppc_get_pc(vcpu),
@@ -1636,7 +1634,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		 * so that it knows that the machine check occurred.
 		 */
 		if (!vcpu->kvm->arch.fwnmi_enabled) {
-			ulong flags = (vcpu->arch.shregs.msr & 0x083c0000) |
+			ulong flags = (kvmppc_get_msr(vcpu) & 0x083c0000) |
 					(kvmppc_get_msr(vcpu) & SRR1_PREFIXED);
 			kvmppc_core_queue_machine_check(vcpu, flags);
 			r = RESUME_GUEST;
@@ -1666,7 +1664,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		 * as a result of a hypervisor emulation interrupt
 		 * (e40) getting turned into a 700 by BML RTAS.
 		 */
-		flags = (vcpu->arch.shregs.msr & 0x1f0000ull) |
+		flags = (kvmppc_get_msr(vcpu) & 0x1f0000ull) |
 			(kvmppc_get_msr(vcpu) & SRR1_PREFIXED);
 		kvmppc_core_queue_program(vcpu, flags);
 		r = RESUME_GUEST;
@@ -1676,7 +1674,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	{
 		int i;
 
-		if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
+		if (unlikely(kvmppc_get_msr(vcpu) & MSR_PR)) {
 			/*
 			 * Guest userspace executed sc 1. This can only be
 			 * reached by the P9 path because the old path
@@ -1754,7 +1752,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			break;
 		}
 
-		if (!(vcpu->arch.shregs.msr & MSR_DR))
+		if (!(kvmppc_get_msr(vcpu) & MSR_DR))
 			vsid = vcpu->kvm->arch.vrma_slb_v;
 		else
 			vsid = vcpu->arch.fault_gpa;
@@ -1778,7 +1776,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		long err;
 
 		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
-		vcpu->arch.fault_dsisr = vcpu->arch.shregs.msr &
+		vcpu->arch.fault_dsisr = kvmppc_get_msr(vcpu) &
 			DSISR_SRR1_MATCH_64S;
 		if (kvm_is_radix(vcpu->kvm) || !cpu_has_feature(CPU_FTR_ARCH_300)) {
 			/*
@@ -1787,7 +1785,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			 * hash fault handling below is v3 only (it uses ASDR
 			 * via fault_gpa).
 			 */
-			if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE)
+			if (kvmppc_get_msr(vcpu) & HSRR1_HISI_WRITE)
 				vcpu->arch.fault_dsisr |= DSISR_ISSTORE;
 			r = RESUME_PAGE_FAULT;
 			break;
@@ -1801,7 +1799,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			break;
 		}
 
-		if (!(vcpu->arch.shregs.msr & MSR_IR))
+		if (!(kvmppc_get_msr(vcpu) & MSR_IR))
 			vsid = vcpu->kvm->arch.vrma_slb_v;
 		else
 			vsid = vcpu->arch.fault_gpa;
@@ -1863,7 +1861,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	 * Otherwise, we just generate a program interrupt to the guest.
 	 */
 	case BOOK3S_INTERRUPT_H_FAC_UNAVAIL: {
-		u64 cause = vcpu->arch.hfscr >> 56;
+		u64 cause = kvmppc_get_hfscr_hv(vcpu) >> 56;
 
 		r = EMULATE_FAIL;
 		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
@@ -1891,7 +1889,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		kvmppc_dump_regs(vcpu);
 		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
 			vcpu->arch.trap, kvmppc_get_pc(vcpu),
-			vcpu->arch.shregs.msr);
+			kvmppc_get_msr(vcpu));
 		run->hw.hardware_exit_reason = vcpu->arch.trap;
 		r = RESUME_HOST;
 		break;
@@ -1915,11 +1913,11 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
 	 * That can happen due to a bug, or due to a machine check
 	 * occurring at just the wrong time.
 	 */
-	if (vcpu->arch.shregs.msr & MSR_HV) {
+	if (kvmppc_get_msr(vcpu) & MSR_HV) {
 		pr_emerg("KVM trap in HV mode while nested!\n");
 		pr_emerg("trap=0x%x | pc=0x%lx | msr=0x%llx\n",
 			 vcpu->arch.trap, kvmppc_get_pc(vcpu),
-			 vcpu->arch.shregs.msr);
+			 kvmppc_get_msr(vcpu));
 		kvmppc_dump_regs(vcpu);
 		return RESUME_HOST;
 	}
@@ -1976,7 +1974,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
 		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
 		vcpu->arch.fault_dsisr = kvmppc_get_msr(vcpu) &
 					 DSISR_SRR1_MATCH_64S;
-		if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE)
+		if (kvmppc_get_msr(vcpu) & HSRR1_HISI_WRITE)
 			vcpu->arch.fault_dsisr |= DSISR_ISSTORE;
 		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = kvmhv_nested_page_fault(vcpu);
@@ -2182,7 +2180,7 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 		}
 	}
 
-	vc->lpcr = new_lpcr;
+	kvmppc_set_lpcr_hv(vcpu, new_lpcr);
 
 	spin_unlock(&vc->lock);
 }
@@ -2207,64 +2205,64 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.dabrx);
 		break;
 	case KVM_REG_PPC_DSCR:
-		*val = get_reg_val(id, vcpu->arch.dscr);
+		*val = get_reg_val(id, kvmppc_get_dscr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_PURR:
-		*val = get_reg_val(id, vcpu->arch.purr);
+		*val = get_reg_val(id, kvmppc_get_purr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_SPURR:
-		*val = get_reg_val(id, vcpu->arch.spurr);
+		*val = get_reg_val(id, kvmppc_get_spurr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_AMR:
-		*val = get_reg_val(id, vcpu->arch.amr);
+		*val = get_reg_val(id, kvmppc_get_amr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_UAMOR:
-		*val = get_reg_val(id, vcpu->arch.uamor);
+		*val = get_reg_val(id, kvmppc_get_uamor_hv(vcpu));
 		break;
 	case KVM_REG_PPC_MMCR0 ... KVM_REG_PPC_MMCR1:
 		i = id - KVM_REG_PPC_MMCR0;
-		*val = get_reg_val(id, vcpu->arch.mmcr[i]);
+		*val = get_reg_val(id, kvmppc_get_mmcr_hv(vcpu, i));
 		break;
 	case KVM_REG_PPC_MMCR2:
-		*val = get_reg_val(id, vcpu->arch.mmcr[2]);
+		*val = get_reg_val(id, kvmppc_get_mmcr_hv(vcpu, 2));
 		break;
 	case KVM_REG_PPC_MMCRA:
-		*val = get_reg_val(id, vcpu->arch.mmcra);
+		*val = get_reg_val(id, kvmppc_get_mmcra_hv(vcpu));
 		break;
 	case KVM_REG_PPC_MMCRS:
 		*val = get_reg_val(id, vcpu->arch.mmcrs);
 		break;
 	case KVM_REG_PPC_MMCR3:
-		*val = get_reg_val(id, vcpu->arch.mmcr[3]);
+		*val = get_reg_val(id, kvmppc_get_mmcr_hv(vcpu, 3));
 		break;
 	case KVM_REG_PPC_PMC1 ... KVM_REG_PPC_PMC8:
 		i = id - KVM_REG_PPC_PMC1;
-		*val = get_reg_val(id, vcpu->arch.pmc[i]);
+		*val = get_reg_val(id, kvmppc_get_pmc_hv(vcpu, i));
 		break;
 	case KVM_REG_PPC_SPMC1 ... KVM_REG_PPC_SPMC2:
 		i = id - KVM_REG_PPC_SPMC1;
 		*val = get_reg_val(id, vcpu->arch.spmc[i]);
 		break;
 	case KVM_REG_PPC_SIAR:
-		*val = get_reg_val(id, vcpu->arch.siar);
+		*val = get_reg_val(id, kvmppc_get_siar_hv(vcpu));
 		break;
 	case KVM_REG_PPC_SDAR:
-		*val = get_reg_val(id, vcpu->arch.sdar);
+		*val = get_reg_val(id, kvmppc_get_sdar_hv(vcpu));
 		break;
 	case KVM_REG_PPC_SIER:
-		*val = get_reg_val(id, vcpu->arch.sier[0]);
+		*val = get_reg_val(id, kvmppc_get_sier_hv(vcpu, 0));
 		break;
 	case KVM_REG_PPC_SIER2:
-		*val = get_reg_val(id, vcpu->arch.sier[1]);
+		*val = get_reg_val(id, kvmppc_get_sier_hv(vcpu, 1));
 		break;
 	case KVM_REG_PPC_SIER3:
-		*val = get_reg_val(id, vcpu->arch.sier[2]);
+		*val = get_reg_val(id, kvmppc_get_sier_hv(vcpu, 2));
 		break;
 	case KVM_REG_PPC_IAMR:
-		*val = get_reg_val(id, vcpu->arch.iamr);
+		*val = get_reg_val(id, kvmppc_get_iamr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_PSPB:
-		*val = get_reg_val(id, vcpu->arch.pspb);
+		*val = get_reg_val(id, kvmppc_get_pspb_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DPDES:
 		/*
@@ -2279,22 +2277,22 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 			*val = get_reg_val(id, vcpu->arch.vcore->dpdes);
 		break;
 	case KVM_REG_PPC_VTB:
-		*val = get_reg_val(id, vcpu->arch.vcore->vtb);
+		*val = get_reg_val(id, kvmppc_get_vtb_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWR:
-		*val = get_reg_val(id, vcpu->arch.dawr0);
+		*val = get_reg_val(id, kvmppc_get_dawr0_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWRX:
-		*val = get_reg_val(id, vcpu->arch.dawrx0);
+		*val = get_reg_val(id, kvmppc_get_dawrx0_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWR1:
-		*val = get_reg_val(id, vcpu->arch.dawr1);
+		*val = get_reg_val(id, kvmppc_get_dawr1_hv(vcpu));
 		break;
 	case KVM_REG_PPC_DAWRX1:
-		*val = get_reg_val(id, vcpu->arch.dawrx1);
+		*val = get_reg_val(id, kvmppc_get_dawrx1_hv(vcpu));
 		break;
 	case KVM_REG_PPC_CIABR:
-		*val = get_reg_val(id, vcpu->arch.ciabr);
+		*val = get_reg_val(id, kvmppc_get_ciabr_hv(vcpu));
 		break;
 	case KVM_REG_PPC_CSIGR:
 		*val = get_reg_val(id, vcpu->arch.csigr);
@@ -2306,13 +2304,13 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.tcscr);
 		break;
 	case KVM_REG_PPC_PID:
-		*val = get_reg_val(id, vcpu->arch.pid);
+		*val = get_reg_val(id, kvmppc_get_pid(vcpu));
 		break;
 	case KVM_REG_PPC_ACOP:
 		*val = get_reg_val(id, vcpu->arch.acop);
 		break;
 	case KVM_REG_PPC_WORT:
-		*val = get_reg_val(id, vcpu->arch.wort);
+		*val = get_reg_val(id, kvmppc_get_wort_hv(vcpu));
 		break;
 	case KVM_REG_PPC_TIDR:
 		*val = get_reg_val(id, vcpu->arch.tid);
@@ -2338,14 +2336,14 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		spin_unlock(&vcpu->arch.vpa_update_lock);
 		break;
 	case KVM_REG_PPC_TB_OFFSET:
-		*val = get_reg_val(id, vcpu->arch.vcore->tb_offset);
+		*val = get_reg_val(id, kvmppc_get_tb_offset_hv(vcpu));
 		break;
 	case KVM_REG_PPC_LPCR:
 	case KVM_REG_PPC_LPCR_64:
 		*val = get_reg_val(id, vcpu->arch.vcore->lpcr);
 		break;
 	case KVM_REG_PPC_PPR:
-		*val = get_reg_val(id, vcpu->arch.ppr);
+		*val = get_reg_val(id, kvmppc_get_ppr_hv(vcpu));
 		break;
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	case KVM_REG_PPC_TFHAR:
@@ -2417,7 +2415,7 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.vcore->arch_compat);
 		break;
 	case KVM_REG_PPC_DEC_EXPIRY:
-		*val = get_reg_val(id, vcpu->arch.dec_expires);
+		*val = get_reg_val(id, kvmppc_get_dec_expires(vcpu));
 		break;
 	case KVM_REG_PPC_ONLINE:
 		*val = get_reg_val(id, vcpu->arch.online);
@@ -2425,6 +2423,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_PTCR:
 		*val = get_reg_val(id, vcpu->kvm->arch.l1_ptcr);
 		break;
+	case KVM_REG_PPC_FSCR:
+		*val = get_reg_val(id, kvmppc_get_fscr_hv(vcpu));
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -2453,29 +2454,29 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		vcpu->arch.dabrx = set_reg_val(id, *val) & ~DABRX_HYP;
 		break;
 	case KVM_REG_PPC_DSCR:
-		vcpu->arch.dscr = set_reg_val(id, *val);
+		kvmppc_set_dscr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_PURR:
-		vcpu->arch.purr = set_reg_val(id, *val);
+		kvmppc_set_purr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SPURR:
-		vcpu->arch.spurr = set_reg_val(id, *val);
+		kvmppc_set_spurr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_AMR:
-		vcpu->arch.amr = set_reg_val(id, *val);
+		kvmppc_set_amr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_UAMOR:
-		vcpu->arch.uamor = set_reg_val(id, *val);
+		kvmppc_set_uamor_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCR0 ... KVM_REG_PPC_MMCR1:
 		i = id - KVM_REG_PPC_MMCR0;
-		vcpu->arch.mmcr[i] = set_reg_val(id, *val);
+		kvmppc_set_mmcr_hv(vcpu, i, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCR2:
-		vcpu->arch.mmcr[2] = set_reg_val(id, *val);
+		kvmppc_set_mmcr_hv(vcpu, 2, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCRA:
-		vcpu->arch.mmcra = set_reg_val(id, *val);
+		kvmppc_set_mmcra_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_MMCRS:
 		vcpu->arch.mmcrs = set_reg_val(id, *val);
@@ -2485,32 +2486,32 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		break;
 	case KVM_REG_PPC_PMC1 ... KVM_REG_PPC_PMC8:
 		i = id - KVM_REG_PPC_PMC1;
-		vcpu->arch.pmc[i] = set_reg_val(id, *val);
+		kvmppc_set_pmc_hv(vcpu, i, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SPMC1 ... KVM_REG_PPC_SPMC2:
 		i = id - KVM_REG_PPC_SPMC1;
 		vcpu->arch.spmc[i] = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_SIAR:
-		vcpu->arch.siar = set_reg_val(id, *val);
+		kvmppc_set_siar_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SDAR:
-		vcpu->arch.sdar = set_reg_val(id, *val);
+		kvmppc_set_sdar_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SIER:
-		vcpu->arch.sier[0] = set_reg_val(id, *val);
+		kvmppc_set_sier_hv(vcpu, 0, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SIER2:
-		vcpu->arch.sier[1] = set_reg_val(id, *val);
+		kvmppc_set_sier_hv(vcpu, 1, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_SIER3:
-		vcpu->arch.sier[2] = set_reg_val(id, *val);
+		kvmppc_set_sier_hv(vcpu, 2, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_IAMR:
-		vcpu->arch.iamr = set_reg_val(id, *val);
+		kvmppc_set_iamr_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_PSPB:
-		vcpu->arch.pspb = set_reg_val(id, *val);
+		kvmppc_set_pspb_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DPDES:
 		if (cpu_has_feature(CPU_FTR_ARCH_300))
@@ -2519,25 +2520,25 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 			vcpu->arch.vcore->dpdes = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_VTB:
-		vcpu->arch.vcore->vtb = set_reg_val(id, *val);
+		kvmppc_set_vtb_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DAWR:
-		vcpu->arch.dawr0 = set_reg_val(id, *val);
+		kvmppc_set_dawr0_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DAWRX:
-		vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP;
+		kvmppc_set_dawrx0_hv(vcpu, set_reg_val(id, *val) & ~DAWRX_HYP);
 		break;
 	case KVM_REG_PPC_DAWR1:
-		vcpu->arch.dawr1 = set_reg_val(id, *val);
+		kvmppc_set_dawr1_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DAWRX1:
-		vcpu->arch.dawrx1 = set_reg_val(id, *val) & ~DAWRX_HYP;
+		kvmppc_set_dawrx1_hv(vcpu, set_reg_val(id, *val) & ~DAWRX_HYP);
 		break;
 	case KVM_REG_PPC_CIABR:
-		vcpu->arch.ciabr = set_reg_val(id, *val);
+		kvmppc_set_ciabr_hv(vcpu, set_reg_val(id, *val));
 		/* Don't allow setting breakpoints in hypervisor code */
-		if ((vcpu->arch.ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
-			vcpu->arch.ciabr &= ~CIABR_PRIV;	/* disable */
+		if ((kvmppc_get_ciabr_hv(vcpu) & CIABR_PRIV) == CIABR_PRIV_HYPER)
+			kvmppc_set_ciabr_hv(vcpu, kvmppc_get_ciabr_hv(vcpu) & ~CIABR_PRIV);	/* disable */
 		break;
 	case KVM_REG_PPC_CSIGR:
 		vcpu->arch.csigr = set_reg_val(id, *val);
@@ -2549,13 +2550,13 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		vcpu->arch.tcscr = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_PID:
-		vcpu->arch.pid = set_reg_val(id, *val);
+		kvmppc_set_pid(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_ACOP:
 		vcpu->arch.acop = set_reg_val(id, *val);
 		break;
 	case KVM_REG_PPC_WORT:
-		vcpu->arch.wort = set_reg_val(id, *val);
+		kvmppc_set_wort_hv(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_TIDR:
 		vcpu->arch.tid = set_reg_val(id, *val);
@@ -2602,10 +2603,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		 * decrementer, which is better than a large one that
 		 * causes a hang.
 		 */
-		if (!vcpu->arch.dec_expires && tb_offset)
-			vcpu->arch.dec_expires = get_tb() + tb_offset;
+		if (!kvmppc_get_dec_expires(vcpu) && tb_offset)
+			kvmppc_set_dec_expires(vcpu, get_tb() + tb_offset);
 
-		vcpu->arch.vcore->tb_offset = tb_offset;
+		kvmppc_set_tb_offset_hv(vcpu, tb_offset);
 		break;
 	}
 	case KVM_REG_PPC_LPCR:
@@ -2615,7 +2617,7 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), false);
 		break;
 	case KVM_REG_PPC_PPR:
-		vcpu->arch.ppr = set_reg_val(id, *val);
+		kvmppc_set_ppr_hv(vcpu, set_reg_val(id, *val));
 		break;
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	case KVM_REG_PPC_TFHAR:
@@ -2686,7 +2688,7 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		r = kvmppc_set_arch_compat(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_DEC_EXPIRY:
-		vcpu->arch.dec_expires = set_reg_val(id, *val);
+		kvmppc_set_dec_expires(vcpu, set_reg_val(id, *val));
 		break;
 	case KVM_REG_PPC_ONLINE:
 		i = set_reg_val(id, *val);
@@ -2699,6 +2701,9 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_PTCR:
 		vcpu->kvm->arch.l1_ptcr = set_reg_val(id, *val);
 		break;
+	case KVM_REG_PPC_FSCR:
+		kvmppc_set_fscr_hv(vcpu, set_reg_val(id, *val));
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -2916,19 +2921,20 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.shared_big_endian = false;
 #endif
 #endif
-	vcpu->arch.mmcr[0] = MMCR0_FC;
+	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
+
 	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-		vcpu->arch.mmcr[0] |= MMCR0_PMCCEXT;
-		vcpu->arch.mmcra = MMCRA_BHRB_DISABLE;
+		kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
+		kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
 	}
 
-	vcpu->arch.ctrl = CTRL_RUNLATCH;
+	kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
 	/* default to host PVR, since we can't spoof it */
 	kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
 	spin_lock_init(&vcpu->arch.vpa_update_lock);
 	spin_lock_init(&vcpu->arch.tbacct_lock);
 	vcpu->arch.busy_preempt = TB_NIL;
-	vcpu->arch.shregs.msr = MSR_ME;
+	kvmppc_set_msr_fast(vcpu, MSR_ME);
 	vcpu->arch.intr_msr = MSR_SF | MSR_ME;
 
 	/*
@@ -2938,29 +2944,30 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	 * don't set the HFSCR_MSGP bit, and that causes those instructions
 	 * to trap and then we emulate them.
 	 */
-	vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
-		HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
+	kvmppc_set_hfscr_hv(vcpu, HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
+			    HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP);
 
 	/* On POWER10 and later, allow prefixed instructions */
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
-		vcpu->arch.hfscr |= HFSCR_PREFIX;
+		kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_PREFIX);
 
 	if (cpu_has_feature(CPU_FTR_HVMODE)) {
-		vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
+		kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) & mfspr(SPRN_HFSCR));
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 		if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-			vcpu->arch.hfscr |= HFSCR_TM;
+			kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_TM);
 #endif
 	}
 	if (cpu_has_feature(CPU_FTR_TM_COMP))
 		vcpu->arch.hfscr |= HFSCR_TM;
 
-	vcpu->arch.hfscr_permitted = vcpu->arch.hfscr;
+	vcpu->arch.hfscr_permitted = kvmppc_get_hfscr_hv(vcpu);
 
 	/*
 	 * PM, EBB, TM are demand-faulted so start with it clear.
 	 */
-	vcpu->arch.hfscr &= ~(HFSCR_PM | HFSCR_EBB | HFSCR_TM);
+	kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) & ~(HFSCR_PM | HFSCR_EBB | HFSCR_TM));
 
 	kvmppc_mmu_book3s_hv_init(vcpu);
 
@@ -4038,7 +4045,6 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
 /* call our hypervisor to load up HV regs and go */
 static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
 {
-	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	unsigned long host_psscr;
 	unsigned long msr;
 	struct hv_guest_state hvregs;
@@ -4118,7 +4124,7 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, uns
 	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
 		dec = (s32) dec;
 	*tb = mftb();
-	vcpu->arch.dec_expires = dec + (*tb + vc->tb_offset);
+	vcpu->arch.dec_expires = dec + (*tb + kvmppc_get_tb_offset_hv(vcpu));
 
 	timer_rearm_host_dec(*tb);
 
@@ -4176,7 +4182,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		__this_cpu_write(cpu_in_guest, NULL);
 
 		if (trap == BOOK3S_INTERRUPT_SYSCALL &&
-		    !(vcpu->arch.shregs.msr & MSR_PR)) {
+		    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
 			unsigned long req = kvmppc_get_gpr(vcpu, 3);
 
 			/*
@@ -4655,7 +4661,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	if (!nested) {
 		kvmppc_core_prepare_to_enter(vcpu);
-		if (vcpu->arch.shregs.msr & MSR_EE) {
+		if (kvmppc_get_msr(vcpu) & MSR_EE) {
 			if (xive_interrupt_pending(vcpu))
 				kvmppc_inject_interrupt_hv(vcpu,
 						BOOK3S_INTERRUPT_EXTERNAL, 0);
@@ -4677,7 +4683,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	tb = mftb();
 
-	kvmppc_update_vpa_dispatch_p9(vcpu, vc, tb + vc->tb_offset);
+	kvmppc_update_vpa_dispatch_p9(vcpu, vc, tb + kvmppc_get_tb_offset_hv(vcpu));
 
 	trace_kvm_guest_enter(vcpu);
 
@@ -4844,7 +4850,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		msr |= MSR_VSX;
 	if ((cpu_has_feature(CPU_FTR_TM) ||
 	    cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) &&
-			(vcpu->arch.hfscr & HFSCR_TM))
+			(kvmppc_get_hfscr_hv(vcpu) & HFSCR_TM))
 		msr |= MSR_TM;
 	msr = msr_check_and_set(msr);
 
@@ -4868,7 +4874,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
 			accumulate_time(vcpu, &vcpu->arch.hcall);
 
-			if (WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_PR)) {
+			if (WARN_ON_ONCE(kvmppc_get_msr(vcpu) & MSR_PR)) {
 				/*
 				 * These should have been caught reflected
 				 * into the guest by now. Final sanity check:
diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 2f2e59d7d433..7a7005189ab1 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -50,3 +50,62 @@ void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
 #define start_timing(vcpu, next) do {} while (0)
 #define end_timing(vcpu) do {} while (0)
 #endif
+
+#define HV_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
+{									\
+	vcpu->arch.reg = val;						\
+}
+
+#define HV_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
+{									\
+	return vcpu->arch.reg;						\
+}
+
+#define HV_WRAPPER(reg, size)						\
+	HV_WRAPPER_SET(reg, size)					\
+	HV_WRAPPER_GET(reg, size)					\
+
+#define HV_ARRAY_WRAPPER_SET(reg, size)					\
+static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, int i, u##size val)	\
+{									\
+	vcpu->arch.reg[i] = val;					\
+}
+
+#define HV_ARRAY_WRAPPER_GET(reg, size)					\
+static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu, int i)	\
+{									\
+	return vcpu->arch.reg[i];					\
+}
+
+#define HV_ARRAY_WRAPPER(reg, size)					\
+	HV_ARRAY_WRAPPER_SET(reg, size)					\
+	HV_ARRAY_WRAPPER_GET(reg, size)					\
+
+HV_WRAPPER(mmcra, 64)
+HV_WRAPPER(hfscr, 64)
+HV_WRAPPER(fscr, 64)
+HV_WRAPPER(dscr, 64)
+HV_WRAPPER(purr, 64)
+HV_WRAPPER(spurr, 64)
+HV_WRAPPER(amr, 64)
+HV_WRAPPER(uamor, 64)
+HV_WRAPPER(siar, 64)
+HV_WRAPPER(sdar, 64)
+HV_WRAPPER(iamr, 64)
+HV_WRAPPER(dawr0, 64)
+HV_WRAPPER(dawr1, 64)
+HV_WRAPPER(dawrx0, 64)
+HV_WRAPPER(dawrx1, 64)
+HV_WRAPPER(ciabr, 64)
+HV_WRAPPER(wort, 64)
+HV_WRAPPER(ppr, 64)
+HV_WRAPPER(ctrl, 64)
+
+HV_ARRAY_WRAPPER(mmcr, 64)
+HV_ARRAY_WRAPPER(sier, 64)
+HV_ARRAY_WRAPPER(pmc, 32)
+
+HV_WRAPPER(pvr, 32)
+HV_WRAPPER(pspb, 32)
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index da85f046377a..9f9e9aab6015 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -182,9 +182,13 @@ EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
 
 long kvmppc_rm_h_random(struct kvm_vcpu *vcpu)
 {
+	unsigned long rand;
+
 	if (ppc_md.get_random_seed &&
-	    ppc_md.get_random_seed(&vcpu->arch.regs.gpr[4]))
+	    ppc_md.get_random_seed(&rand)) {
+		kvmppc_set_gpr(vcpu, 4, rand);
 		return H_SUCCESS;
+	}
 
 	return H_HARDWARE;
 }
@@ -510,7 +514,7 @@ void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 	 */
 	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
 		msr &= ~MSR_TS_MASK;
-	vcpu->arch.shregs.msr = msr;
+	kvmppc_set_msr_fast(vcpu, msr);
 	kvmppc_end_cede(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvmppc_set_msr_hv);
@@ -548,7 +552,7 @@ static void inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags)
 	kvmppc_set_srr0(vcpu, pc);
 	kvmppc_set_srr1(vcpu, (msr & SRR1_MSR_BITS) | srr1_flags);
 	kvmppc_set_pc(vcpu, new_pc);
-	vcpu->arch.shregs.msr = new_msr;
+	kvmppc_set_msr_fast(vcpu, new_msr);
 }
 
 void kvmppc_inject_interrupt_hv(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags)
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 34f1db212824..34bc0a8a1288 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
 	u32 pid;
 
 	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
-	pid = vcpu->arch.pid;
+	pid = kvmppc_get_pid(vcpu);
 
 	/*
 	 * Prior memory accesses to host PID Q3 must be completed before we
@@ -330,7 +330,7 @@ static void switch_mmu_to_guest_hpt(struct kvm *kvm, struct kvm_vcpu *vcpu, u64
 	int i;
 
 	lpid = kvm->arch.lpid;
-	pid = vcpu->arch.pid;
+	pid = kvmppc_get_pid(vcpu);
 
 	/*
 	 * See switch_mmu_to_guest_radix. ptesync should not be required here
diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
index ccfd96965630..3b43c3d00311 100644
--- a/arch/powerpc/kvm/book3s_hv_ras.c
+++ b/arch/powerpc/kvm/book3s_hv_ras.c
@@ -15,6 +15,7 @@
 #include <asm/cputhreads.h>
 #include <asm/hmi.h>
 #include <asm/kvm_ppc.h>
+#include "book3s_hv.h"
 
 /* SRR1 bits for machine check on POWER7 */
 #define SRR1_MC_LDSTERR		(1ul << (63-42))
@@ -173,14 +174,14 @@ long kvmppc_p9_realmode_hmi_handler(struct kvm_vcpu *vcpu)
 		ppc_md.hmi_exception_early(NULL);
 
 out:
-	if (vc->tb_offset) {
+	if (kvmppc_get_tb_offset_hv(vcpu)) {
 		u64 new_tb = mftb() + vc->tb_offset;
 		mtspr(SPRN_TBU40, new_tb);
 		if ((mftb() & 0xffffff) < (new_tb & 0xffffff)) {
 			new_tb += 0x1000000;
 			mtspr(SPRN_TBU40, new_tb);
 		}
-		vc->tb_offset_applied = vc->tb_offset;
+		vc->tb_offset_applied = kvmppc_get_tb_offset_hv(vcpu);
 	}
 
 	return ret;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9182324dbef9..17cb75a127b0 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -776,8 +776,8 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
 			r &= ~HPTE_GR_RESERVED;
 		}
-		vcpu->arch.regs.gpr[4 + i * 2] = v;
-		vcpu->arch.regs.gpr[5 + i * 2] = r;
+		kvmppc_set_gpr(vcpu, 4 + i * 2, v);
+		kvmppc_set_gpr(vcpu, 5 + i * 2, r);
 	}
 	return H_SUCCESS;
 }
@@ -824,7 +824,7 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 			}
 		}
 	}
-	vcpu->arch.regs.gpr[4] = gr;
+	kvmppc_set_gpr(vcpu, 4, gr);
 	ret = H_SUCCESS;
  out:
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
@@ -872,7 +872,7 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 			kvmppc_set_dirty_from_hpte(kvm, v, gr);
 		}
 	}
-	vcpu->arch.regs.gpr[4] = gr;
+	kvmppc_set_gpr(vcpu, 4, gr);
 	ret = H_SUCCESS;
  out:
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e165bfa842bf..e42984878503 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -481,7 +481,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 
 unsigned long xics_rm_h_xirr_x(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.regs.gpr[5] = get_tb();
+	kvmppc_set_gpr(vcpu, 5, get_tb());
 	return xics_rm_h_xirr(vcpu);
 }
 
@@ -518,7 +518,7 @@ unsigned long xics_rm_h_xirr(struct kvm_vcpu *vcpu)
 	} while (!icp_rm_try_update(icp, old_state, new_state));
 
 	/* Return the result in GPR4 */
-	vcpu->arch.regs.gpr[4] = xirr;
+	kvmppc_set_gpr(vcpu, 4, xirr);
 
 	return check_too_hard(xics, icp);
 }
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index f4115819e738..4adff4f1896d 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -29,6 +29,7 @@
 #include <linux/seq_file.h>
 
 #include "book3s_xive.h"
+#include "book3s_hv.h"
 
 #define __x_eoi_page(xd)	((void __iomem *)((xd)->eoi_mmio))
 #define __x_trig_page(xd)	((void __iomem *)((xd)->trig_mmio))
@@ -328,7 +329,7 @@ static unsigned long xive_vm_h_xirr(struct kvm_vcpu *vcpu)
 	 */
 
 	/* Return interrupt and old CPPR in GPR4 */
-	vcpu->arch.regs.gpr[4] = hirq | (old_cppr << 24);
+	kvmppc_set_gpr(vcpu, 4, hirq | (old_cppr << 24));
 
 	return H_SUCCESS;
 }
@@ -364,7 +365,7 @@ static unsigned long xive_vm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server
 	hirq = xive_vm_scan_interrupts(xc, pending, scan_poll);
 
 	/* Return interrupt and old CPPR in GPR4 */
-	vcpu->arch.regs.gpr[4] = hirq | (xc->cppr << 24);
+	kvmppc_set_gpr(vcpu, 4, hirq | (xc->cppr << 24));
 
 	return H_SUCCESS;
 }
@@ -2779,8 +2780,6 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 
 int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 {
-	struct kvmppc_vcore *vc = vcpu->arch.vcore;
-
 	/* The VM should have configured XICS mode before doing XICS hcalls. */
 	if (!kvmppc_xics_enabled(vcpu))
 		return H_TOO_HARD;
@@ -2799,7 +2798,7 @@ int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 		return xive_vm_h_ipoll(vcpu, kvmppc_get_gpr(vcpu, 4));
 	case H_XIRR_X:
 		xive_vm_h_xirr(vcpu);
-		kvmppc_set_gpr(vcpu, 5, get_tb() + vc->tb_offset);
+		kvmppc_set_gpr(vcpu, 5, get_tb() + kvmppc_get_tb_offset_hv(vcpu));
 		return H_SUCCESS;
 	}
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7197c8256668..ca9793c3d437 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1729,7 +1729,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 			val = get_reg_val(reg->id, vcpu->arch.vr.vscr.u[3]);
 			break;
 		case KVM_REG_PPC_VRSAVE:
-			val = get_reg_val(reg->id, vcpu->arch.vrsave);
+			val = get_reg_val(reg->id, kvmppc_get_vrsave(vcpu));
 			break;
 #endif /* CONFIG_ALTIVEC */
 		default:
@@ -1784,7 +1784,7 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 				r = -ENXIO;
 				break;
 			}
-			vcpu->arch.vrsave = set_reg_val(reg->id, val);
+			kvmppc_set_vrsave(vcpu, set_reg_val(reg->id, val));
 			break;
 #endif /* CONFIG_ALTIVEC */
 		default:
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 2/6] KVM: PPC: Add fpr getters and setters
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

Add getter and setter wrappers for the vcpu fpr, fpscr and VSX fpr
state to prepare for supporting PAPR nested guests.

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
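For reference, a rough sketch of how the new helpers are intended to be
used, given a struct kvm_vcpu *vcpu (register index 3 is arbitrary and
for illustration only):

	u64 tmp;

	/* FPRs round-trip through the TS_FPROFFSET doubleword */
	tmp = kvmppc_get_fpr(vcpu, 3);
	kvmppc_set_fpr(vcpu, 3, tmp);

	/* VSX addresses both doublewords of the same storage */
	tmp = kvmppc_get_vsx_fpr(vcpu, 3, 1);
	kvmppc_set_vsx_fpr(vcpu, 3, 1, tmp);

For now these are trivial wrappers around vcpu->arch.fp, but they give
a single place to redirect floating point state accesses for nested
PAPR guests.
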
 arch/powerpc/include/asm/kvm_book3s.h | 31 +++++++++++++++++++++++++++
 arch/powerpc/include/asm/kvm_booke.h  | 10 +++++++++
 arch/powerpc/kvm/book3s.c             | 16 +++++++-------
 arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
 arch/powerpc/kvm/powerpc.c            | 22 +++++++++----------
 5 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 4e91f54a3f9f..a632e79639f0 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -413,6 +413,37 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault_dar;
 }
 
+static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
+{
+	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
+}
+
+static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
+{
+	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+}
+
+static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.fp.fpscr;
+}
+
+static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
+{
+	vcpu->arch.fp.fpscr = val;
+}
+
+
+static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
+{
+	return vcpu->arch.fp.fpr[i][j];
+}
+
+static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 val)
+{
+	vcpu->arch.fp.fpr[i][j] = val;
+}
+
 #define BOOK3S_WRAPPER_SET(reg, size)					\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index 0c3401b2e19e..7c3291aa8922 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -89,6 +89,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 	return vcpu->arch.regs.nip;
 }
 
+static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
+{
+	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+}
+
+static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
+{
+	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
+}
+
 #ifdef CONFIG_BOOKE
 static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2fe31b518886..6cd20ab9e94e 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -636,17 +636,17 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			break;
 		case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
 			i = id - KVM_REG_PPC_FPR0;
-			*val = get_reg_val(id, VCPU_FPR(vcpu, i));
+			*val = get_reg_val(id, kvmppc_get_fpr(vcpu, i));
 			break;
 		case KVM_REG_PPC_FPSCR:
-			*val = get_reg_val(id, vcpu->arch.fp.fpscr);
+			*val = get_reg_val(id, kvmppc_get_fpscr(vcpu));
 			break;
 #ifdef CONFIG_VSX
 		case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
 			if (cpu_has_feature(CPU_FTR_VSX)) {
 				i = id - KVM_REG_PPC_VSR0;
-				val->vsxval[0] = vcpu->arch.fp.fpr[i][0];
-				val->vsxval[1] = vcpu->arch.fp.fpr[i][1];
+				val->vsxval[0] = kvmppc_get_vsx_fpr(vcpu, i, 0);
+				val->vsxval[1] = kvmppc_get_vsx_fpr(vcpu, i, 1);
 			} else {
 				r = -ENXIO;
 			}
@@ -724,7 +724,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			break;
 		case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
 			i = id - KVM_REG_PPC_FPR0;
-			VCPU_FPR(vcpu, i) = set_reg_val(id, *val);
+			kvmppc_set_fpr(vcpu, i, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_FPSCR:
-			vcpu->arch.fp.fpscr = set_reg_val(id, *val);
+			kvmppc_set_fpscr(vcpu, set_reg_val(id, *val));
@@ -733,8 +733,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 		case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
 			if (cpu_has_feature(CPU_FTR_VSX)) {
 				i = id - KVM_REG_PPC_VSR0;
-				vcpu->arch.fp.fpr[i][0] = val->vsxval[0];
-				vcpu->arch.fp.fpr[i][1] = val->vsxval[1];
+				kvmppc_set_vsx_fpr(vcpu, i, 0, val->vsxval[0]);
+				kvmppc_set_vsx_fpr(vcpu, i, 1, val->vsxval[1]);
 			} else {
 				r = -ENXIO;
 			}
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 059c08ae0340..e6e66c3792f8 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -250,7 +250,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 				vcpu->arch.mmio_sp64_extend = 1;
 
 			emulated = kvmppc_handle_store(vcpu,
-					VCPU_FPR(vcpu, op.reg), size, 1);
+					kvmppc_get_fpr(vcpu, op.reg), size, 1);
 
 			if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
 				kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ca9793c3d437..7f913e68342a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -938,7 +938,7 @@ static inline void kvmppc_set_vsr_dword(struct kvm_vcpu *vcpu,
 		val.vsxval[offset] = gpr;
 		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
 	} else {
-		VCPU_VSX_FPR(vcpu, index, offset) = gpr;
+		kvmppc_set_vsx_fpr(vcpu, index, offset, gpr);
 	}
 }
 
@@ -954,8 +954,8 @@ static inline void kvmppc_set_vsr_dword_dump(struct kvm_vcpu *vcpu,
 		val.vsxval[1] = gpr;
 		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
 	} else {
-		VCPU_VSX_FPR(vcpu, index, 0) = gpr;
-		VCPU_VSX_FPR(vcpu, index, 1) = gpr;
+		kvmppc_set_vsx_fpr(vcpu, index, 0, gpr);
+		kvmppc_set_vsx_fpr(vcpu, index, 1, gpr);
 	}
 }
 
@@ -974,8 +974,8 @@ static inline void kvmppc_set_vsr_word_dump(struct kvm_vcpu *vcpu,
 	} else {
 		val.vsx32val[0] = gpr;
 		val.vsx32val[1] = gpr;
-		VCPU_VSX_FPR(vcpu, index, 0) = val.vsxval[0];
-		VCPU_VSX_FPR(vcpu, index, 1) = val.vsxval[0];
+		kvmppc_set_vsx_fpr(vcpu, index, 0, val.vsxval[0]);
+		kvmppc_set_vsx_fpr(vcpu, index, 1, val.vsxval[0]);
 	}
 }
 
@@ -997,9 +997,9 @@ static inline void kvmppc_set_vsr_word(struct kvm_vcpu *vcpu,
 	} else {
 		dword_offset = offset / 2;
 		word_offset = offset % 2;
-		val.vsxval[0] = VCPU_VSX_FPR(vcpu, index, dword_offset);
+		val.vsxval[0] = kvmppc_get_vsx_fpr(vcpu, index, dword_offset);
 		val.vsx32val[word_offset] = gpr32;
-		VCPU_VSX_FPR(vcpu, index, dword_offset) = val.vsxval[0];
+		kvmppc_set_vsx_fpr(vcpu, index, dword_offset, val.vsxval[0]);
 	}
 }
 #endif /* CONFIG_VSX */
@@ -1194,14 +1194,14 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu)
 		if (vcpu->kvm->arch.kvm_ops->giveup_ext)
 			vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, MSR_FP);
 
-		VCPU_FPR(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK) = gpr;
+		kvmppc_set_fpr(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK, gpr);
 		break;
 #ifdef CONFIG_PPC_BOOK3S
 	case KVM_MMIO_REG_QPR:
 		vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_MMIO_REG_MASK] = gpr;
 		break;
 	case KVM_MMIO_REG_FQPR:
-		VCPU_FPR(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK) = gpr;
+		kvmppc_set_fpr(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK, gpr);
 		vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_MMIO_REG_MASK] = gpr;
 		break;
 #endif
@@ -1419,7 +1419,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
 		}
 
 		if (rs < 32) {
-			*val = VCPU_VSX_FPR(vcpu, rs, vsx_offset);
+			*val = kvmppc_get_vsx_fpr(vcpu, rs, vsx_offset);
 		} else {
 			reg.vval = VCPU_VSX_VR(vcpu, rs - 32);
 			*val = reg.vsxval[vsx_offset];
@@ -1438,7 +1438,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
 		if (rs < 32) {
 			dword_offset = vsx_offset / 2;
 			word_offset = vsx_offset % 2;
-			reg.vsxval[0] = VCPU_VSX_FPR(vcpu, rs, dword_offset);
+			reg.vsxval[0] = kvmppc_get_vsx_fpr(vcpu, rs, dword_offset);
 			*val = reg.vsx32val[word_offset];
 		} else {
 			reg.vval = VCPU_VSX_VR(vcpu, rs - 32);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 2/6] KVM: PPC: Add fpr getters and setters
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Jordan Niethe, mikey, kautuk.consul.1980, kvm, npiggin, kvm-ppc,
	sbhat, vaibhav

Add wrappers for fpr registers to prepare for supporting PAPR nested
guests.

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s.h | 31 +++++++++++++++++++++++++++
 arch/powerpc/include/asm/kvm_booke.h  | 10 +++++++++
 arch/powerpc/kvm/book3s.c             | 16 +++++++-------
 arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
 arch/powerpc/kvm/powerpc.c            | 22 +++++++++----------
 5 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 4e91f54a3f9f..a632e79639f0 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -413,6 +413,37 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault_dar;
 }
 
+static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
+{
+	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
+}
+
+static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
+{
+	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+}
+
+static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.fp.fpscr;
+}
+
+static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
+{
+	vcpu->arch.fp.fpscr = val;
+}
+
+
+static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
+{
+	return vcpu->arch.fp.fpr[i][j];
+}
+
+static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 val)
+{
+	vcpu->arch.fp.fpr[i][j] = val;
+}
+
 #define BOOK3S_WRAPPER_SET(reg, size)					\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index 0c3401b2e19e..7c3291aa8922 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -89,6 +89,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 	return vcpu->arch.regs.nip;
 }
 
+static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
+{
+	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+}
+
+static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
+{
+	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
+}
+
 #ifdef CONFIG_BOOKE
 static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2fe31b518886..6cd20ab9e94e 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -636,17 +636,17 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			break;
 		case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
 			i = id - KVM_REG_PPC_FPR0;
-			*val = get_reg_val(id, VCPU_FPR(vcpu, i));
+			*val = get_reg_val(id, kvmppc_get_fpr(vcpu, i));
 			break;
 		case KVM_REG_PPC_FPSCR:
-			*val = get_reg_val(id, vcpu->arch.fp.fpscr);
+			*val = get_reg_val(id, kvmppc_get_fpscr(vcpu));
 			break;
 #ifdef CONFIG_VSX
 		case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
 			if (cpu_has_feature(CPU_FTR_VSX)) {
 				i = id - KVM_REG_PPC_VSR0;
-				val->vsxval[0] = vcpu->arch.fp.fpr[i][0];
-				val->vsxval[1] = vcpu->arch.fp.fpr[i][1];
+				val->vsxval[0] = kvmppc_get_vsx_fpr(vcpu, i, 0);
+				val->vsxval[1] = kvmppc_get_vsx_fpr(vcpu, i, 1);
 			} else {
 				r = -ENXIO;
 			}
@@ -724,7 +724,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			break;
 		case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
 			i = id - KVM_REG_PPC_FPR0;
-			VCPU_FPR(vcpu, i) = set_reg_val(id, *val);
+			kvmppc_set_fpr(vcpu, i, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_FPSCR:
 			vcpu->arch.fp.fpscr = set_reg_val(id, *val);
@@ -733,8 +733,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 		case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
 			if (cpu_has_feature(CPU_FTR_VSX)) {
 				i = id - KVM_REG_PPC_VSR0;
-				vcpu->arch.fp.fpr[i][0] = val->vsxval[0];
-				vcpu->arch.fp.fpr[i][1] = val->vsxval[1];
+				kvmppc_set_vsx_fpr(vcpu, i, 0, val->vsxval[0]);
+				kvmppc_set_vsx_fpr(vcpu, i, 1, val->vsxval[1]);
 			} else {
 				r = -ENXIO;
 			}
@@ -765,7 +765,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 			break;
 #endif /* CONFIG_KVM_XIVE */
 		case KVM_REG_PPC_FSCR:
-			vcpu->arch.fscr = set_reg_val(id, *val);
+			kvmppc_set_fpscr(vcpu, set_reg_val(id, *val));
 			break;
 		case KVM_REG_PPC_TAR:
 			kvmppc_set_tar(vcpu, set_reg_val(id, *val));
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 059c08ae0340..e6e66c3792f8 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -250,7 +250,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 				vcpu->arch.mmio_sp64_extend = 1;
 
 			emulated = kvmppc_handle_store(vcpu,
-					VCPU_FPR(vcpu, op.reg), size, 1);
+					kvmppc_get_fpr(vcpu, op.reg), size, 1);
 
 			if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
 				kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ca9793c3d437..7f913e68342a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -938,7 +938,7 @@ static inline void kvmppc_set_vsr_dword(struct kvm_vcpu *vcpu,
 		val.vsxval[offset] = gpr;
 		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
 	} else {
-		VCPU_VSX_FPR(vcpu, index, offset) = gpr;
+		kvmppc_set_vsx_fpr(vcpu, index, offset, gpr);
 	}
 }
 
@@ -954,8 +954,8 @@ static inline void kvmppc_set_vsr_dword_dump(struct kvm_vcpu *vcpu,
 		val.vsxval[1] = gpr;
 		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
 	} else {
-		VCPU_VSX_FPR(vcpu, index, 0) = gpr;
-		VCPU_VSX_FPR(vcpu, index, 1) = gpr;
+		kvmppc_set_vsx_fpr(vcpu, index, 0, gpr);
+		kvmppc_set_vsx_fpr(vcpu, index, 1,  gpr);
 	}
 }
 
@@ -974,8 +974,8 @@ static inline void kvmppc_set_vsr_word_dump(struct kvm_vcpu *vcpu,
 	} else {
 		val.vsx32val[0] = gpr;
 		val.vsx32val[1] = gpr;
-		VCPU_VSX_FPR(vcpu, index, 0) = val.vsxval[0];
-		VCPU_VSX_FPR(vcpu, index, 1) = val.vsxval[0];
+		kvmppc_set_vsx_fpr(vcpu, index, 0, val.vsxval[0]);
+		kvmppc_set_vsx_fpr(vcpu, index, 1, val.vsxval[0]);
 	}
 }
 
@@ -997,9 +997,9 @@ static inline void kvmppc_set_vsr_word(struct kvm_vcpu *vcpu,
 	} else {
 		dword_offset = offset / 2;
 		word_offset = offset % 2;
-		val.vsxval[0] = VCPU_VSX_FPR(vcpu, index, dword_offset);
+		val.vsxval[0] = kvmppc_get_vsx_fpr(vcpu, index, dword_offset);
 		val.vsx32val[word_offset] = gpr32;
-		VCPU_VSX_FPR(vcpu, index, dword_offset) = val.vsxval[0];
+		kvmppc_set_vsx_fpr(vcpu, index, dword_offset, val.vsxval[0]);
 	}
 }
 #endif /* CONFIG_VSX */
@@ -1194,14 +1194,14 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu)
 		if (vcpu->kvm->arch.kvm_ops->giveup_ext)
 			vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, MSR_FP);
 
-		VCPU_FPR(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK) = gpr;
+		kvmppc_set_fpr(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK, gpr);
 		break;
 #ifdef CONFIG_PPC_BOOK3S
 	case KVM_MMIO_REG_QPR:
 		vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_MMIO_REG_MASK] = gpr;
 		break;
 	case KVM_MMIO_REG_FQPR:
-		VCPU_FPR(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK) = gpr;
+		kvmppc_set_fpr(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK, gpr);
 		vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_MMIO_REG_MASK] = gpr;
 		break;
 #endif
@@ -1419,7 +1419,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
 		}
 
 		if (rs < 32) {
-			*val = VCPU_VSX_FPR(vcpu, rs, vsx_offset);
+			*val = kvmppc_get_vsx_fpr(vcpu, rs, vsx_offset);
 		} else {
 			reg.vval = VCPU_VSX_VR(vcpu, rs - 32);
 			*val = reg.vsxval[vsx_offset];
@@ -1438,7 +1438,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
 		if (rs < 32) {
 			dword_offset = vsx_offset / 2;
 			word_offset = vsx_offset % 2;
-			reg.vsxval[0] = VCPU_VSX_FPR(vcpu, rs, dword_offset);
+			reg.vsxval[0] = kvmppc_get_vsx_fpr(vcpu, rs, dword_offset);
 			*val = reg.vsx32val[word_offset];
 		} else {
 			reg.vval = VCPU_VSX_VR(vcpu, rs - 32);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 3/6] KVM: PPC: Add vr getters and setters
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

Add wrappers for vr registers to prepare for supporting PAPR nested
guests.

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s.h | 20 +++++++++++
 arch/powerpc/kvm/powerpc.c            | 50 +++++++++++++--------------
 2 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index a632e79639f0..77653c5b356b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -444,6 +444,26 @@ static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 v
 	vcpu->arch.fp.fpr[i][j] = val;
 }
 
+static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
+{
+	return vcpu->arch.vr.vr[i];
+}
+
+static inline void kvmppc_set_vsx_vr(struct kvm_vcpu *vcpu, int i, vector128 val)
+{
+	vcpu->arch.vr.vr[i] = val;
+}
+
+static inline u32 kvmppc_get_vscr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.vr.vscr.u[3];
+}
+
+static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
+{
+	vcpu->arch.vr.vscr.u[3] = val;
+}
+
 #define BOOK3S_WRAPPER_SET(reg, size)					\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7f913e68342a..10436213aea2 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -934,9 +934,9 @@ static inline void kvmppc_set_vsr_dword(struct kvm_vcpu *vcpu,
 		return;
 
 	if (index >= 32) {
-		val.vval = VCPU_VSX_VR(vcpu, index - 32);
+		val.vval = kvmppc_get_vsx_vr(vcpu, index - 32);
 		val.vsxval[offset] = gpr;
-		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
+		kvmppc_set_vsx_vr(vcpu, index - 32, val.vval);
 	} else {
 		kvmppc_set_vsx_fpr(vcpu, index, offset, gpr);
 	}
@@ -949,10 +949,10 @@ static inline void kvmppc_set_vsr_dword_dump(struct kvm_vcpu *vcpu,
 	int index = vcpu->arch.io_gpr & KVM_MMIO_REG_MASK;
 
 	if (index >= 32) {
-		val.vval = VCPU_VSX_VR(vcpu, index - 32);
+		val.vval = kvmppc_get_vsx_vr(vcpu, index - 32);
 		val.vsxval[0] = gpr;
 		val.vsxval[1] = gpr;
-		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
+		kvmppc_set_vsx_vr(vcpu, index - 32, val.vval);
 	} else {
 		kvmppc_set_vsx_fpr(vcpu, index, 0, gpr);
 		kvmppc_set_vsx_fpr(vcpu, index, 1,  gpr);
@@ -970,7 +970,7 @@ static inline void kvmppc_set_vsr_word_dump(struct kvm_vcpu *vcpu,
 		val.vsx32val[1] = gpr;
 		val.vsx32val[2] = gpr;
 		val.vsx32val[3] = gpr;
-		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
+		kvmppc_set_vsx_vr(vcpu, index - 32, val.vval);
 	} else {
 		val.vsx32val[0] = gpr;
 		val.vsx32val[1] = gpr;
@@ -991,9 +991,9 @@ static inline void kvmppc_set_vsr_word(struct kvm_vcpu *vcpu,
 		return;
 
 	if (index >= 32) {
-		val.vval = VCPU_VSX_VR(vcpu, index - 32);
+		val.vval = kvmppc_get_vsx_vr(vcpu, index - 32);
 		val.vsx32val[offset] = gpr32;
-		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
+		kvmppc_set_vsx_vr(vcpu, index - 32, val.vval);
 	} else {
 		dword_offset = offset / 2;
 		word_offset = offset % 2;
@@ -1058,9 +1058,9 @@ static inline void kvmppc_set_vmx_dword(struct kvm_vcpu *vcpu,
 	if (offset == -1)
 		return;
 
-	val.vval = VCPU_VSX_VR(vcpu, index);
+	val.vval = kvmppc_get_vsx_vr(vcpu, index);
 	val.vsxval[offset] = gpr;
-	VCPU_VSX_VR(vcpu, index) = val.vval;
+	kvmppc_set_vsx_vr(vcpu, index, val.vval);
 }
 
 static inline void kvmppc_set_vmx_word(struct kvm_vcpu *vcpu,
@@ -1074,9 +1074,9 @@ static inline void kvmppc_set_vmx_word(struct kvm_vcpu *vcpu,
 	if (offset == -1)
 		return;
 
-	val.vval = VCPU_VSX_VR(vcpu, index);
+	val.vval = kvmppc_get_vsx_vr(vcpu, index);
 	val.vsx32val[offset] = gpr32;
-	VCPU_VSX_VR(vcpu, index) = val.vval;
+	kvmppc_set_vsx_vr(vcpu, index, val.vval);
 }
 
 static inline void kvmppc_set_vmx_hword(struct kvm_vcpu *vcpu,
@@ -1090,9 +1090,9 @@ static inline void kvmppc_set_vmx_hword(struct kvm_vcpu *vcpu,
 	if (offset == -1)
 		return;
 
-	val.vval = VCPU_VSX_VR(vcpu, index);
+	val.vval = kvmppc_get_vsx_vr(vcpu, index);
 	val.vsx16val[offset] = gpr16;
-	VCPU_VSX_VR(vcpu, index) = val.vval;
+	kvmppc_set_vsx_vr(vcpu, index, val.vval);
 }
 
 static inline void kvmppc_set_vmx_byte(struct kvm_vcpu *vcpu,
@@ -1106,9 +1106,9 @@ static inline void kvmppc_set_vmx_byte(struct kvm_vcpu *vcpu,
 	if (offset == -1)
 		return;
 
-	val.vval = VCPU_VSX_VR(vcpu, index);
+	val.vval = kvmppc_get_vsx_vr(vcpu, index);
 	val.vsx8val[offset] = gpr8;
-	VCPU_VSX_VR(vcpu, index) = val.vval;
+	kvmppc_set_vsx_vr(vcpu, index, val.vval);
 }
 #endif /* CONFIG_ALTIVEC */
 
@@ -1421,7 +1421,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
 		if (rs < 32) {
 			*val = kvmppc_get_vsx_fpr(vcpu, rs, vsx_offset);
 		} else {
-			reg.vval = VCPU_VSX_VR(vcpu, rs - 32);
+			reg.vval = kvmppc_get_vsx_vr(vcpu, rs - 32);
 			*val = reg.vsxval[vsx_offset];
 		}
 		break;
@@ -1441,7 +1441,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
 			reg.vsxval[0] = kvmppc_get_vsx_fpr(vcpu, rs, dword_offset);
 			*val = reg.vsx32val[word_offset];
 		} else {
-			reg.vval = VCPU_VSX_VR(vcpu, rs - 32);
+			reg.vval = kvmppc_get_vsx_vr(vcpu, rs - 32);
 			*val = reg.vsx32val[vsx_offset];
 		}
 		break;
@@ -1556,7 +1556,7 @@ static int kvmppc_get_vmx_dword(struct kvm_vcpu *vcpu, int index, u64 *val)
 	if (vmx_offset == -1)
 		return -1;
 
-	reg.vval = VCPU_VSX_VR(vcpu, index);
+	reg.vval = kvmppc_get_vsx_vr(vcpu, index);
 	*val = reg.vsxval[vmx_offset];
 
 	return result;
@@ -1574,7 +1574,7 @@ static int kvmppc_get_vmx_word(struct kvm_vcpu *vcpu, int index, u64 *val)
 	if (vmx_offset == -1)
 		return -1;
 
-	reg.vval = VCPU_VSX_VR(vcpu, index);
+	reg.vval = kvmppc_get_vsx_vr(vcpu, index);
 	*val = reg.vsx32val[vmx_offset];
 
 	return result;
@@ -1592,7 +1592,7 @@ static int kvmppc_get_vmx_hword(struct kvm_vcpu *vcpu, int index, u64 *val)
 	if (vmx_offset == -1)
 		return -1;
 
-	reg.vval = VCPU_VSX_VR(vcpu, index);
+	reg.vval = kvmppc_get_vsx_vr(vcpu, index);
 	*val = reg.vsx16val[vmx_offset];
 
 	return result;
@@ -1610,7 +1610,7 @@ static int kvmppc_get_vmx_byte(struct kvm_vcpu *vcpu, int index, u64 *val)
 	if (vmx_offset == -1)
 		return -1;
 
-	reg.vval = VCPU_VSX_VR(vcpu, index);
+	reg.vval = kvmppc_get_vsx_vr(vcpu, index);
 	*val = reg.vsx8val[vmx_offset];
 
 	return result;
@@ -1719,14 +1719,14 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 				r = -ENXIO;
 				break;
 			}
-			val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0];
+			val.vval = kvmppc_get_vsx_vr(vcpu, reg->id - KVM_REG_PPC_VR0);
 			break;
 		case KVM_REG_PPC_VSCR:
 			if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
 				r = -ENXIO;
 				break;
 			}
-			val = get_reg_val(reg->id, vcpu->arch.vr.vscr.u[3]);
+			val = get_reg_val(reg->id, kvmppc_get_vscr(vcpu));
 			break;
 		case KVM_REG_PPC_VRSAVE:
 			val = get_reg_val(reg->id, kvmppc_get_vrsave(vcpu));
@@ -1770,14 +1770,14 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 				r = -ENXIO;
 				break;
 			}
-			vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0] = val.vval;
+			kvmppc_set_vsx_vr(vcpu, reg->id - KVM_REG_PPC_VR0, val.vval);
 			break;
 		case KVM_REG_PPC_VSCR:
 			if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
 				r = -ENXIO;
 				break;
 			}
-			vcpu->arch.vr.vscr.u[3] = set_reg_val(reg->id, val);
+			kvmppc_set_vscr(vcpu, set_reg_val(reg->id, val));
 			break;
 		case KVM_REG_PPC_VRSAVE:
 			if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

The new PAPR nested guest API introduces the concept of a Guest State
Buffer for communication about L2 guests between L1 and L0 hosts.

In the new API, the L0 manages the L2 on behalf of the L1. This means
that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
table...), it must request that the L0 perform the modification.
Likewise, if the L1 needs to read L2 state, the request must go through
the L0.

The Guest State Buffer is a Type-Length-Value style data format defined
in the PAPR which assigns each relevant piece of partition state a
unique identity. Unlike a typical TLV format, the length is redundant,
as the length of each identity is fixed; it is included anyway so that
correctness can be checked.

A guest state buffer consists of an element count followed by a stream
of elements, where elements are composed of an ID number, data length,
then the data:

  Header:

   <---4 bytes--->
  +----------------+-----
  | Element Count  | Elements...
  +----------------+-----

  Element:

   <----2 bytes---> <-2 bytes-> <-Length bytes->
  +----------------+-----------+----------------+
  | Guest State ID |  Length   |      Data      |
  +----------------+-----------+----------------+
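
For illustration, a buffer holding a single doubleword element, say
GPR 0 (ID 0x1000) with the made-up value 0x1234, would serialize to the
following big endian byte stream:

   00 00 00 01               Element Count = 1
   10 00                     Guest State ID = GSID_GPR(0)
   00 08                     Length = 8
   00 00 00 00 00 00 12 34   Data = 0x1234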

Guest State IDs have other attributes defined in the PAPR such as
whether they are per thread or per guest, or read-only.

Introduce a library for using guest state buffers. This includes support
for actions such as creating buffers, adding elements to buffers,
reading the value of elements and parsing buffers. This will be used
later by the PAPR nested guest support.
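
As a rough sketch of the intended usage based on the interfaces
introduced here (the buffer size is arbitrary, error handling is
elided, and guest_id and vcpu_id are placeholder values):

  struct gs_buff *gsb;
  struct gs_elem *gse;
  u64 in = 0x1234, out;
  int i, rem;

  /* Allocate a buffer associated with an L2 guest and one of its vCPUs */
  gsb = gsb_new(SZ_4K, guest_id, vcpu_id, GFP_KERNEL);

  /* Serialize a doubleword register element, e.g. GPR 0 */
  gse_put(gsb, GSID_GPR(0), in);

  /* Deserialize: walk the elements currently in the buffer */
  gsb_for_each_elem(i, gse, gsb, rem) {
          if (gse_iden(gse) == GSID_GPR(0))
                  gse_get(gse, &out);
  }

  gsb_free(gsb);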

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Add missing #ifdef CONFIG_VSX
  - Move files from lib/ to kvm/
  - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
  - Use kunit for guest state buffer tests
  - Add configuration option for the tests
  - Use macros for contiguous id ranges like GPRs
  - Add some missing EXPORTs to functions
  - HEIR element is a double word not a word
---
 arch/powerpc/Kconfig.debug                    |  12 +
 arch/powerpc/include/asm/guest-state-buffer.h | 901 ++++++++++++++++++
 arch/powerpc/include/asm/kvm_book3s.h         |   2 +
 arch/powerpc/kvm/Makefile                     |   3 +
 arch/powerpc/kvm/guest-state-buffer.c         | 563 +++++++++++
 arch/powerpc/kvm/test-guest-state-buffer.c    | 321 +++++++
 6 files changed, 1802 insertions(+)
 create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
 create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
 create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6aaf8dc60610..ed830a714720 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
 	bool "Run self-tests of the MSI bitmap code"
 	depends on DEBUG_KERNEL
 
+config GUEST_STATE_BUFFER_TEST
+	def_tristate n
+	prompt "Enable Guest State Buffer unit tests"
+	depends on KUNIT
+	depends on KVM_BOOK3S_HV_POSSIBLE
+	default KUNIT_ALL_TESTS
+	help
+	  The Guest State Buffer is a data format specified in the PAPR.
+	  It is used by hcalls to communicate the state of L2 guests between
+	  the L1 and L0 hypervisors. Enable unit tests for the library
+	  used to create and use guest state buffers.
+
 config PPC_IRQ_SOFT_MASK_DEBUG
 	bool "Include extra checks for powerpc irq soft masking"
 	depends on PPC64
diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
new file mode 100644
index 000000000000..65a840abf1bb
--- /dev/null
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -0,0 +1,901 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Interface based on include/net/netlink.h
+ */
+#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
+#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
+
+#include <linux/gfp.h>
+#include <linux/bitmap.h>
+#include <asm/plpar_wrappers.h>
+
+/**************************************************************************
+ * Guest State Buffer Constants
+ **************************************************************************/
+#define GSID_BLANK			0x0000
+
+#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
+#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
+#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
+#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
+#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
+#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
+
+#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
+#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
+#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
+
+#define GSID_GPR(x)			(0x1000 + (x))
+#define GSID_HDEC_EXPIRY_TB		0x1020
+#define GSID_NIA			0x1021
+#define GSID_MSR			0x1022
+#define GSID_LR				0x1023
+#define GSID_XER			0x1024
+#define GSID_CTR			0x1025
+#define GSID_CFAR			0x1026
+#define GSID_SRR0			0x1027
+#define GSID_SRR1			0x1028
+#define GSID_DAR			0x1029
+#define GSID_DEC_EXPIRY_TB		0x102A
+#define GSID_VTB			0x102B
+#define GSID_LPCR			0x102C
+#define GSID_HFSCR			0x102D
+#define GSID_FSCR			0x102E
+#define GSID_FPSCR			0x102F
+#define GSID_DAWR0			0x1030
+#define GSID_DAWR1			0x1031
+#define GSID_CIABR			0x1032
+#define GSID_PURR			0x1033
+#define GSID_SPURR			0x1034
+#define GSID_IC				0x1035
+#define GSID_SPRG0			0x1036
+#define GSID_SPRG1			0x1037
+#define GSID_SPRG2			0x1038
+#define GSID_SPRG3			0x1039
+#define GSID_PPR			0x103A
+#define GSID_MMCR(x)			(0x103B + (x))
+#define GSID_MMCRA			0x103F
+#define GSID_SIER(x)			(0x1040 + (x))
+#define GSID_BESCR			0x1043
+#define GSID_EBBHR			0x1044
+#define GSID_EBBRR			0x1045
+#define GSID_AMR			0x1046
+#define GSID_IAMR			0x1047
+#define GSID_AMOR			0x1048
+#define GSID_UAMOR			0x1049
+#define GSID_SDAR			0x104A
+#define GSID_SIAR			0x104B
+#define GSID_DSCR			0x104C
+#define GSID_TAR			0x104D
+#define GSID_DEXCR			0x104E
+#define GSID_HDEXCR			0x104F
+#define GSID_HASHKEYR			0x1050
+#define GSID_HASHPKEYR			0x1051
+#define GSID_CTRL			0x1052
+
+#define GSID_CR				0x2000
+#define GSID_PIDR			0x2001
+#define GSID_DSISR			0x2002
+#define GSID_VSCR			0x2003
+#define GSID_VRSAVE			0x2004
+#define GSID_DAWRX0			0x2005
+#define GSID_DAWRX1			0x2006
+#define GSID_PMC(x)			(0x2007 + (x))
+#define GSID_WORT			0x200D
+#define GSID_PSPB			0x200E
+
+#define GSID_VSRS(x)			(0x3000 + (x))
+
+#define GSID_HDAR			0xF000
+#define GSID_HDSISR			0xF001
+#define GSID_HEIR			0xF002
+#define GSID_ASDR			0xF003
+
+
+#define GSE_GUESTWIDE_START GSID_BLANK
+#define GSE_GUESTWIDE_END GSID_PROCESS_TABLE
+#define GSE_GUESTWIDE_COUNT (GSE_GUESTWIDE_END - GSE_GUESTWIDE_START + 1)
+
+#define GSE_META_START GSID_RUN_INPUT
+#define GSE_META_END GSID_VPA
+#define GSE_META_COUNT (GSE_META_END - GSE_META_START + 1)
+
+#define GSE_DW_REGS_START GSID_GPR(0)
+#define GSE_DW_REGS_END GSID_CTRL
+#define GSE_DW_REGS_COUNT (GSE_DW_REGS_END - GSE_DW_REGS_START + 1)
+
+#define GSE_W_REGS_START GSID_CR
+#define GSE_W_REGS_END GSID_PSPB
+#define GSE_W_REGS_COUNT (GSE_W_REGS_END - GSE_W_REGS_START + 1)
+
+#define GSE_VSRS_START GSID_VSRS(0)
+#define GSE_VSRS_END GSID_VSRS(63)
+#define GSE_VSRS_COUNT (GSE_VSRS_END - GSE_VSRS_START + 1)
+
+#define GSE_INTR_REGS_START GSID_HDAR
+#define GSE_INTR_REGS_END GSID_ASDR
+#define GSE_INTR_REGS_COUNT (GSE_INTR_REGS_END - GSE_INTR_REGS_START + 1)
+
+#define GSE_IDEN_COUNT                                              \
+	(GSE_GUESTWIDE_COUNT + GSE_META_COUNT + GSE_DW_REGS_COUNT + \
+	 GSE_W_REGS_COUNT + GSE_VSRS_COUNT + GSE_INTR_REGS_COUNT)
+
+
+/**
+ * Ranges of guest state buffer elements
+ */
+enum {
+	GS_CLASS_GUESTWIDE = 0x01,
+	GS_CLASS_META = 0x02,
+	GS_CLASS_DWORD_REG = 0x04,
+	GS_CLASS_WORD_REG = 0x08,
+	GS_CLASS_VECTOR = 0x10,
+	GS_CLASS_INTR = 0x20,
+};
+
+/**
+ * Types of guest state buffer elements
+ */
+enum {
+	GSE_BE32,
+	GSE_BE64,
+	GSE_VEC128,
+	GSE_PARTITION_TABLE,
+	GSE_PROCESS_TABLE,
+	GSE_BUFFER,
+	__GSE_TYPE_MAX,
+};
+
+/**
+ * Flags for guest state elements
+ */
+enum {
+	GS_FLAGS_WIDE = 0x01,
+};
+
+/**
+ * struct gs_part_table - deserialized partition table information element
+ * @address: start of the partition table
+ * @ea_bits: number of bits in the effective address
+ * @gpd_size: root page directory size
+ */
+struct gs_part_table {
+	u64 address;
+	u64 ea_bits;
+	u64 gpd_size;
+};
+
+/**
+ * struct gs_proc_table - deserialized process table information element
+ * @address: start of the process table
+ * @gpd_size: process table size
+ */
+struct gs_proc_table {
+	u64 address;
+	u64 gpd_size;
+};
+
+/**
+ * struct gs_buff_info - deserialized meta guest state buffer information
+ * @address: start of the guest state buffer
+ * @size: size of the guest state buffer
+ */
+struct gs_buff_info {
+	u64 address;
+	u64 size;
+};
+
+/**
+ * struct gs_header - serialized guest state buffer header
+ * @nelems: count of guest state elements in the buffer
+ * @data: start of the stream of elements in the buffer
+ */
+struct gs_header {
+	__be32 nelems;
+	char data[];
+} __packed;
+
+/**
+ * struct gs_elem - serialized guest state buffer element
+ * @iden: Guest State ID
+ * @len: length of data
+ * @data: the guest state buffer element's value
+ */
+struct gs_elem {
+	__be16 iden;
+	__be16 len;
+	char data[];
+} __packed;
+
+/**
+ * struct gs_buff - a guest state buffer with metadata.
+ * @capacity: total length of the buffer
+ * @len: current length of the elements and header
+ * @guest_id: guest id associated with the buffer
+ * @vcpu_id: vcpu_id associated with the buffer
+ * @hdr: the serialized guest state buffer
+ */
+struct gs_buff {
+	size_t capacity;
+	size_t len;
+	unsigned long guest_id;
+	unsigned long vcpu_id;
+	struct gs_header *hdr;
+};
+
+/**
+ * struct gs_bitmap - a bitmap for element ids
+ * @bitmap: a bitmap large enough for all Guest State IDs
+ */
+struct gs_bitmap {
+/* private: */
+	DECLARE_BITMAP(bitmap, GSE_IDEN_COUNT);
+};
+
+/**
+ * struct gs_parser - a map of element ids to locations in a buffer
+ * @iterator: bitmap used for iterating
+ * @gses: contains the pointers to elements
+ *
+ * A guest state parser is used for deserializing a guest state buffer.
+ * Given a buffer, it then allows looking up guest state elements using
+ * a guest state id.
+ */
+struct gs_parser {
+/* private: */
+	struct gs_bitmap iterator;
+	struct gs_elem *gses[GSE_IDEN_COUNT];
+};
+
+enum {
+	GSM_GUEST_WIDE = 0x1,
+	GSM_SEND = 0x2,
+	GSM_RECEIVE = 0x4,
+	GSM_GSB_OWNER = 0x8,
+};
+
+struct gs_msg;
+
+/**
+ * struct gs_msg_ops - guest state message behavior
+ * @get_size: maximum size required for the message data
+ * @fill_info: serializes to the guest state buffer format
+ * @refresh_info: deserializes from the guest state buffer format
+ */
+struct gs_msg_ops {
+	size_t (*get_size)(struct gs_msg *gsm);
+	int (*fill_info)(struct gs_buff *gsb, struct gs_msg *gsm);
+	int (*refresh_info)(struct gs_msg *gsm, struct gs_buff *gsb);
+};
+
+/**
+ * struct gs_msg - a guest state message
+ * @bitmap: the guest state ids that should be included
+ * @ops: modify message behavior for reading and writing to buffers
+ * @flags: guest wide or thread wide
+ * @data: location where buffer data will be written to or from.
+ *
+ * A guest state message allows flexibility in sending and receiving data
+ * in the guest state buffer format.
+ */
+struct gs_msg {
+	struct gs_bitmap bitmap;
+	struct gs_msg_ops *ops;
+	unsigned long flags;
+	void *data;
+};
+
+/**************************************************************************
+ * Guest State IDs
+ **************************************************************************/
+
+u16 gsid_size(u16 iden);
+unsigned long gsid_flags(u16 iden);
+u64 gsid_mask(u16 iden);
+
+/**************************************************************************
+ * Guest State Buffers
+ **************************************************************************/
+struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
+			unsigned long vcpu_id, gfp_t flags);
+void gsb_free(struct gs_buff *gsb);
+void *gsb_put(struct gs_buff *gsb, size_t size);
+
+/**
+ * gsb_header() - the header of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns a pointer to the buffer header.
+ */
+static inline struct gs_header *gsb_header(struct gs_buff *gsb)
+{
+	return gsb->hdr;
+}
+
+/**
+ * gsb_data() - the elements of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns a pointer to the first element of the buffer data.
+ */
+static inline struct gs_elem *gsb_data(struct gs_buff *gsb)
+{
+	return (struct gs_elem *)gsb_header(gsb)->data;
+}
+
+/**
+ * gsb_len() - the current length of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the length including the header of a buffer.
+ */
+static inline size_t gsb_len(struct gs_buff *gsb)
+{
+	return gsb->len;
+}
+
+/**
+ * gsb_capacity() - the capacity of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the capacity of a buffer.
+ */
+static inline size_t gsb_capacity(struct gs_buff *gsb)
+{
+	return gsb->capacity;
+}
+
+/**
+ * gsb_paddress() - the physical address of buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the physical address of the buffer.
+ */
+static inline u64 gsb_paddress(struct gs_buff *gsb)
+{
+	return __pa(gsb_header(gsb));
+}
+
+/**
+ * gsb_nelems() - the number of elements in a buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the number of elements in a buffer
+ */
+static inline u32 gsb_nelems(struct gs_buff *gsb)
+{
+	return be32_to_cpu(gsb_header(gsb)->nelems);
+}
+
+/**
+ * gsb_reset() - empty a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Reset the number of elements and length of buffer to empty.
+ */
+static inline void gsb_reset(struct gs_buff *gsb)
+{
+	gsb_header(gsb)->nelems = cpu_to_be32(0);
+	gsb->len = sizeof(struct gs_header);
+}
+
+/**
+ * gsb_data_len() - the length of a buffer excluding the header
+ * @gsb: guest state buffer
+ *
+ * Returns the length of a buffer excluding the header
+ */
+static inline size_t gsb_data_len(struct gs_buff *gsb)
+{
+	return gsb->len - sizeof(struct gs_header);
+}
+
+/**
+ * gsb_data_cap() - the capacity of a buffer excluding the header
+ * @gsb: guest state buffer
+ *
+ * Returns the capacity of a buffer excluding the header
+ */
+static inline size_t gsb_data_cap(struct gs_buff *gsb)
+{
+	return gsb->capacity - sizeof(struct gs_header);
+}
+
+/**
+ * gsb_for_each_elem - iterate over the elements in a buffer
+ * @i: loop counter
+ * @pos: set to current element
+ * @gsb: guest state buffer
+ * @rem: initialized to buffer capacity, holds bytes currently remaining in stream
+ */
+#define gsb_for_each_elem(i, pos, gsb, rem)                       \
+	gse_for_each_elem(i, gsb_nelems(gsb), pos, gsb_data(gsb), \
+			  gsb_data_cap(gsb), rem)
+
+/**************************************************************************
+ * Guest State Elements
+ **************************************************************************/
+
+/**
+ * gse_iden() - guest state ID of element
+ * @gse: guest state element
+ *
+ * Return the guest state ID in host endianness.
+ */
+static inline u16 gse_iden(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->iden);
+}
+
+/**
+ * gse_len() - length of guest state element data
+ * @gse: guest state element
+ *
+ * Returns the length of guest state element data
+ */
+static inline u16 gse_len(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->len);
+}
+
+/**
+ * gse_total_len() - total length of guest state element
+ * @gse: guest state element
+ *
+ * Returns the length of the data plus the ID and size header.
+ */
+static inline u16 gse_total_len(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->len) + sizeof(*gse);
+}
+
+/**
+ * gse_total_size() - space needed for a given data length
+ * @size: data length
+ *
+ * Returns size plus the space needed for the ID and size header.
+ */
+static inline u16 gse_total_size(u16 size)
+{
+	return sizeof(struct gs_elem) + size;
+}
+
+/**
+ * gse_data() - pointer to data of a guest state element
+ * @gse: guest state element
+ *
+ * Returns a pointer to the beginning of guest state element data.
+ */
+static inline void *gse_data(const struct gs_elem *gse)
+{
+	return (void *)gse->data;
+}
+
+/**
+ * gse_ok() - checks space exists for guest state element
+ * @gse: guest state element
+ * @remaining: bytes of space remaining
+ *
+ * Returns true if the guest state element can fit in remaining space.
+ */
+static inline bool gse_ok(const struct gs_elem *gse, int remaining)
+{
+	return remaining >= gse_total_len(gse);
+}
+
+/**
+ * gse_next() - iterate to the next guest state element in a stream
+ * @gse: stream of guest state elements
+ * @remaining: length of the guest element stream
+ *
+ * Returns the next guest state element in a stream of elements. The length of
+ * the stream is updated in remaining.
+ */
+static inline struct gs_elem *gse_next(const struct gs_elem *gse,
+				       int *remaining)
+{
+	int len = sizeof(*gse) + gse_len(gse);
+
+	*remaining -= len;
+	return (struct gs_elem *)(gse->data + gse_len(gse));
+}
+
+/**
+ * gse_for_each_elem - iterate over a stream of guest state elements
+ * @i: loop counter
+ * @max: number of elements
+ * @pos: set to current element
+ * @head: head of elements
+ * @len: length of the stream
+ * @rem: initialized to len, holds the bytes currently remaining in the stream
+ */
+#define gse_for_each_elem(i, max, pos, head, len, rem)                  \
+	for (i = 0, pos = head, rem = len; gse_ok(pos, rem) && i < max; \
+	     pos = gse_next(pos, &(rem)), i++)
+
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data);
+int gse_parse(struct gs_parser *gsp, struct gs_buff *gsb);
+
+/**
+ * gse_put_be32() - add a be32 guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: big endian value
+ */
+static inline int gse_put_be32(struct gs_buff *gsb, u16 iden, __be32 val)
+{
+	__be32 tmp;
+
+	tmp = val;
+	return __gse_put(gsb, iden, sizeof(__be32), &tmp);
+}
+
+/**
+ * gse_put_u32() - add a host endian 32bit int guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ */
+static inline int gse_put_u32(struct gs_buff *gsb, u16 iden, u32 val)
+{
+	__be32 tmp;
+
+	tmp = cpu_to_be32(val);
+	return gse_put_be32(gsb, iden, tmp);
+}
+
+/**
+ * gse_put_be64() - add a be64 guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: big endian value
+ */
+static inline int gse_put_be64(struct gs_buff *gsb, u16 iden, __be64 val)
+{
+	__be64 tmp;
+
+	tmp = val;
+	return __gse_put(gsb, iden, sizeof(__be64), &tmp);
+}
+
+/**
+ * gse_put_u64() - add a host endian 64bit guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ */
+static inline int gse_put_u64(struct gs_buff *gsb, u16 iden, u64 val)
+{
+	__be64 tmp;
+
+	tmp = cpu_to_be64(val);
+	return gse_put_be64(gsb, iden, tmp);
+}
+
+/**
+ * __gse_put_reg() - add a register type guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ *
+ * Adds a register type guest state element. Uses the guest state ID for
+ * determining the length of the guest element. If the guest state ID has
+ * bits that can not be set they will be cleared.
+ */
+static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
+{
+	val &= gsid_mask(iden);
+	if (gsid_size(iden) == sizeof(u64))
+		return gse_put_u64(gsb, iden, val);
+
+	if (gsid_size(iden) == sizeof(u32)) {
+		u32 tmp;
+
+		tmp = (u32)val;
+		if (tmp != val)
+			return -EINVAL;
+
+		return gse_put_u32(gsb, iden, tmp);
+	}
+	return -EINVAL;
+}
+
+/**
+ * gse_put_vector128() - add a vector guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: 16 byte vector value
+ */
+static inline int gse_put_vector128(struct gs_buff *gsb, u16 iden,
+				    vector128 val)
+{
+	__be64 tmp[2] = { 0 };
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u;
+
+	u.v = val;
+	tmp[0] = cpu_to_be64(u.dw[TS_FPROFFSET]);
+#ifdef CONFIG_VSX
+	tmp[1] = cpu_to_be64(u.dw[TS_VSRLOWOFFSET]);
+#endif
+	return __gse_put(gsb, iden, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_part_table() - add a partition table guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: partition table value
+ */
+static inline int gse_put_part_table(struct gs_buff *gsb, u16 iden,
+				     struct gs_part_table val)
+{
+	__be64 tmp[3];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.ea_bits);
+	tmp[2] = cpu_to_be64(val.gpd_size);
+	return __gse_put(gsb, GSID_PARTITION_TABLE, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_proc_table() - add a process table guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: process table value
+ */
+static inline int gse_put_proc_table(struct gs_buff *gsb, u16 iden,
+				     struct gs_proc_table val)
+{
+	__be64 tmp[2];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.gpd_size);
+	return __gse_put(gsb, GSID_PROCESS_TABLE, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_buff_info() - adds a GSB description guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: guest state buffer description value
+ */
+static inline int gse_put_buff_info(struct gs_buff *gsb, u16 iden,
+				    struct gs_buff_info val)
+{
+	__be64 tmp[2];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.size);
+	return __gse_put(gsb, iden, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_get_be32() - return the data of a be32 element
+ * @gse: guest state element
+ */
+static inline __be32 gse_get_be32(const struct gs_elem *gse)
+{
+	return *(__be32 *)gse_data(gse);
+}
+
+/**
+ * gse_get_u32() - return the data of a be32 element in host endianness
+ * @gse: guest state element
+ */
+static inline u32 gse_get_u32(const struct gs_elem *gse)
+{
+	return be32_to_cpu(gse_get_be32(gse));
+}
+
+/**
+ * gse_get_be64() - return the data of a be64 element
+ * @gse: guest state element
+ */
+static inline __be64 gse_get_be64(const struct gs_elem *gse)
+{
+	return *(__be64 *)gse_data(gse);
+}
+
+/**
+ * gse_get_u64() - return the data of a be64 element in host endianness
+ * @gse: guest state element
+ */
+static inline u64 gse_get_u64(const struct gs_elem *gse)
+{
+	return be64_to_cpu(gse_get_be64(gse));
+}
+
+/**
+ * __gse_get_reg() - return the data of a register type guest state element
+ * @gse: guest state element
+ *
+ * Determine the element data size from its guest state ID and return the
+ * correctly sized value.
+ */
+static inline u64 __gse_get_reg(const struct gs_elem *gse)
+{
+	if (gse_len(gse) == sizeof(u64))
+		return gse_get_u64(gse);
+
+	if (gse_len(gse) == sizeof(u32)) {
+		u32 tmp;
+
+		tmp = gse_get_u32(gse);
+		return (u64)tmp;
+	}
+	return 0;
+}
+
+/**
+ * gse_get_vector128() - return the data of a vector element
+ * @gse: guest state element
+ */
+static inline vector128 gse_get_vector128(const struct gs_elem *gse)
+{
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u = { 0 };
+	__be64 *src;
+
+	src = (__be64 *)gse_data(gse);
+	u.dw[TS_FPROFFSET] = be64_to_cpu(src[0]);
+#ifdef CONFIG_VSX
+	u.dw[TS_VSRLOWOFFSET] = be64_to_cpu(src[1]);
+#endif
+	return u.v;
+}
+
+/**
+ * gse_put - add a guest state element to a buffer
+ * @gsb: guest state buffer to add to
+ * @iden: guest state identity
+ * @v: generic value
+ */
+#define gse_put(gsb, iden, v)					\
+	(_Generic((v),						\
+		  u64 : __gse_put_reg,				\
+		  long unsigned int : __gse_put_reg,		\
+		  u32 : __gse_put_reg,				\
+		  struct gs_buff_info : gse_put_buff_info,	\
+		  struct gs_proc_table : gse_put_proc_table,	\
+		  struct gs_part_table : gse_put_part_table,	\
+		  vector128 : gse_put_vector128)(gsb, iden, v))
+
+/**
+ * gse_get - return the data of a guest state element
+ * @gse: guest state element to read from
+ * @v: generic value pointer to return in
+ */
+#define gse_get(gse, v)						\
+	(*v = (_Generic((v),					\
+			u64 * : __gse_get_reg,			\
+			unsigned long * : __gse_get_reg,	\
+			u32 * : __gse_get_reg,			\
+			vector128 * : gse_get_vector128)(gse)))
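+
+/*
+ * A minimal usage sketch (illustrative only, error handling elided):
+ * serialize a register with gse_put(), then parse the buffer and read
+ * the value back with gse_get(). Here gsb is a valid gs_buff and gsp a
+ * zeroed gs_parser.
+ *
+ *	u64 val = 0xcafef00d;
+ *
+ *	gse_put(gsb, GSID_GPR(1), val);
+ *	gse_parse(&gsp, gsb);
+ *	gse_get(gsp_lookup(&gsp, GSID_GPR(1)), &val);
+ */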
+
+/**************************************************************************
+ * Guest State Bitmap
+ **************************************************************************/
+
+bool gsbm_test(struct gs_bitmap *gsbm, u16 iden);
+void gsbm_set(struct gs_bitmap *gsbm, u16 iden);
+void gsbm_clear(struct gs_bitmap *gsbm, u16 iden);
+u16 gsbm_next(struct gs_bitmap *gsbm, u16 prev);
+
+/**
+ * gsbm_zero - zero the entire bitmap
+ * @gsbm: guest state buffer bitmap
+ */
+static inline void gsbm_zero(struct gs_bitmap *gsbm)
+{
+	bitmap_zero(gsbm->bitmap, GSE_IDEN_COUNT);
+}
+
+/**
+ * gsbm_fill - fill the entire bitmap
+ * @gsbm: guest state buffer bitmap
+ */
+static inline void gsbm_fill(struct gs_bitmap *gsbm)
+{
+	bitmap_fill(gsbm->bitmap, GSE_IDEN_COUNT);
+	clear_bit(0, gsbm->bitmap);
+}
+
+/**
+ * gsbm_for_each - iterate the present guest state IDs
+ * @gsbm: guest state buffer bitmap
+ * @iden: current guest state ID
+ */
+#define gsbm_for_each(gsbm, iden) \
+	for (iden = gsbm_next(gsbm, 0); iden != 0; iden = gsbm_next(gsbm, iden))
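+
+/*
+ * For example (illustrative only), visiting every ID set in a bitmap:
+ *
+ *	u16 iden;
+ *
+ *	gsbm_for_each(&gsbm, iden)
+ *		pr_info("id 0x%x is set\n", iden);
+ */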
+
+
+/**************************************************************************
+ * Guest State Parser
+ **************************************************************************/
+
+void gsp_insert(struct gs_parser *gsp, u16 iden, struct gs_elem *gse);
+struct gs_elem *gsp_lookup(struct gs_parser *gsp, u16 iden);
+
+/**
+ * gsp_for_each - iterate the <guest state IDs, guest state element> pairs
+ * @gsp: guest state parser
+ * @iden: current guest state ID
+ * @gse: guest state element
+ */
+#define gsp_for_each(gsp, iden, gse)                              \
+	for (iden = gsbm_next(&(gsp)->iterator, 0),               \
+	    gse = gsp_lookup((gsp), iden);                        \
+	     iden != 0; iden = gsbm_next(&(gsp)->iterator, iden), \
+	    gse = gsp_lookup((gsp), iden))
+
+/**************************************************************************
+ * Guest State Message
+ **************************************************************************/
+
+/**
+ * gsm_for_each - iterate the guest state IDs included in a guest state message
+ * @gsm: guest state message
+ * @iden: current guest state ID
+ */
+#define gsm_for_each(gsm, iden)                            \
+	for (iden = gsbm_next(&gsm->bitmap, 0); iden != 0; \
+	     iden = gsbm_next(&gsm->bitmap, iden))
+
+int gsm_init(struct gs_msg *gsm, struct gs_msg_ops *ops, void *data,
+	     unsigned long flags);
+
+struct gs_msg *gsm_new(struct gs_msg_ops *ops, void *data, unsigned long flags,
+		       gfp_t gfp_flags);
+void gsm_free(struct gs_msg *gsm);
+size_t gsm_size(struct gs_msg *gsm);
+int gsm_fill_info(struct gs_msg *gsm, struct gs_buff *gsb);
+int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb);
+
+/**
+ * gsm_include - indicate a guest state ID should be included when serializing
+ * @gsm: guest state message
+ * @iden: guest state ID
+ */
+static inline void gsm_include(struct gs_msg *gsm, u16 iden)
+{
+	gsbm_set(&gsm->bitmap, iden);
+}
+
+/**
+ * gsm_includes - check if a guest state ID will be included when serializing
+ * @gsm: guest state message
+ * @iden: guest state ID
+ */
+static inline bool gsm_includes(struct gs_msg *gsm, u16 iden)
+{
+	return gsbm_test(&gsm->bitmap, iden);
+}
+
+/**
+ * gsm_include_all - indicate all guest state IDs should be included when serializing
+ * @gsm: guest state message
+ */
+static inline void gsm_include_all(struct gs_msg *gsm)
+{
+	gsbm_fill(&gsm->bitmap);
+}
+
+/**
+ * gsm_reset - clear the guest state IDs that should be included when serializing
+ * @gsm: guest state message
+ */
+static inline void gsm_reset(struct gs_msg *gsm)
+{
+	gsbm_zero(&gsm->bitmap);
+}
+
+#endif /* _ASM_POWERPC_GUEST_STATE_BUFFER_H */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 77653c5b356b..0ca2d8b37b42 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -444,6 +444,7 @@ static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 v
 	vcpu->arch.fp.fpr[i][j] = val;
 }
 
+#ifdef CONFIG_VSX
 static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
 {
 	return vcpu->arch.vr.vr[i];
@@ -463,6 +464,7 @@ static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.vr.vscr.u[3] = val;
 }
+#endif
 
 #define BOOK3S_WRAPPER_SET(reg, size)					\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 5319d889b184..eb8445e71c14 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -87,8 +87,11 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
 	book3s_hv_p9_perf.o \
+	guest-state-buffer.o \
 	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
+
+obj-$(CONFIG_GUEST_STATE_BUFFER_TEST) += test-guest-state-buffer.o
 endif
 
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
diff --git a/arch/powerpc/kvm/guest-state-buffer.c b/arch/powerpc/kvm/guest-state-buffer.c
new file mode 100644
index 000000000000..db4a79bfcaf1
--- /dev/null
+++ b/arch/powerpc/kvm/guest-state-buffer.c
@@ -0,0 +1,563 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "asm/hvcall.h"
+#include <linux/log2.h>
+#include <asm/pgalloc.h>
+#include <asm/guest-state-buffer.h>
+
+static const u16 gse_iden_len[__GSE_TYPE_MAX] = {
+	[GSE_BE32] = sizeof(__be32),
+	[GSE_BE64] = sizeof(__be64),
+	[GSE_VEC128] = sizeof(vector128),
+	[GSE_PARTITION_TABLE] = sizeof(struct gs_part_table),
+	[GSE_PROCESS_TABLE] = sizeof(struct gs_proc_table),
+	[GSE_BUFFER] = sizeof(struct gs_buff_info),
+};
+
+/**
+ * gsb_new() - create a new guest state buffer
+ * @size: total size of the guest state buffer (includes header)
+ * @guest_id: id of the guest the buffer is associated with
+ * @vcpu_id: id of the vcpu the buffer is associated with
+ * @flags: GFP flags
+ *
+ * Returns a guest state buffer.
+ */
+struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
+			unsigned long vcpu_id, gfp_t flags)
+{
+	struct gs_buff *gsb;
+
+	gsb = kzalloc(sizeof(*gsb), flags);
+	if (!gsb)
+		return NULL;
+
+	size = roundup_pow_of_two(size);
+	gsb->hdr = kzalloc(size, flags);
+	if (!gsb->hdr)
+		goto free;
+
+	gsb->capacity = size;
+	gsb->len = sizeof(struct gs_header);
+	gsb->vcpu_id = vcpu_id;
+	gsb->guest_id = guest_id;
+
+	gsb->hdr->nelems = cpu_to_be32(0);
+
+	return gsb;
+
+free:
+	kfree(gsb);
+	return NULL;
+}
+EXPORT_SYMBOL(gsb_new);
+
+/**
+ * gsb_free() - free a guest state buffer
+ * @gsb: guest state buffer
+ */
+void gsb_free(struct gs_buff *gsb)
+{
+	kfree(gsb->hdr);
+	kfree(gsb);
+}
+EXPORT_SYMBOL(gsb_free);
+
+/**
+ * gsb_put() - allocate space in a guest state buffer
+ * @gsb: buffer to allocate in
+ * @size: amount of space to allocate
+ *
+ * Returns a pointer to the amount of space requested within the buffer and
+ * increments the count of elements in the buffer.
+ *
+ * Does not check if there is enough space in the buffer.
+ */
+void *gsb_put(struct gs_buff *gsb, size_t size)
+{
+	u32 nelems = gsb_nelems(gsb);
+	void *p;
+
+	p = (void *)gsb_header(gsb) + gsb_len(gsb);
+	gsb->len += size;
+
+	gsb_header(gsb)->nelems = cpu_to_be32(nelems + 1);
+	return p;
+}
+EXPORT_SYMBOL(gsb_put);
+
+static int gsid_class(u16 iden)
+{
+	if ((iden >= GSE_GUESTWIDE_START) && (iden <= GSE_GUESTWIDE_END))
+		return GS_CLASS_GUESTWIDE;
+
+	if ((iden >= GSE_META_START) && (iden <= GSE_META_END))
+		return GS_CLASS_META;
+
+	if ((iden >= GSE_DW_REGS_START) && (iden <= GSE_DW_REGS_END))
+		return GS_CLASS_DWORD_REG;
+
+	if ((iden >= GSE_W_REGS_START) && (iden <= GSE_W_REGS_END))
+		return GS_CLASS_WORD_REG;
+
+	if ((iden >= GSE_VSRS_START) && (iden <= GSE_VSRS_END))
+		return GS_CLASS_VECTOR;
+
+	if ((iden >= GSE_INTR_REGS_START) && (iden <= GSE_INTR_REGS_END))
+		return GS_CLASS_INTR;
+
+	return -1;
+}
+
+static int gsid_type(u16 iden)
+{
+	int type = -1;
+
+	switch (gsid_class(iden)) {
+	case GS_CLASS_GUESTWIDE:
+		switch (iden) {
+		case GSID_HOST_STATE_SIZE:
+		case GSID_RUN_OUTPUT_MIN_SIZE:
+		case GSID_TB_OFFSET:
+			type = GSE_BE64;
+			break;
+		case GSID_PARTITION_TABLE:
+			type = GSE_PARTITION_TABLE;
+			break;
+		case GSID_PROCESS_TABLE:
+			type = GSE_PROCESS_TABLE;
+			break;
+		case GSID_LOGICAL_PVR:
+			type = GSE_BE32;
+			break;
+		}
+		break;
+	case GS_CLASS_META:
+		switch (iden) {
+		case GSID_RUN_INPUT:
+		case GSID_RUN_OUTPUT:
+			type = GSE_BUFFER;
+			break;
+		case GSID_VPA:
+			type = GSE_BE64;
+			break;
+		}
+		break;
+	case GS_CLASS_DWORD_REG:
+		type = GSE_BE64;
+		break;
+	case GS_CLASS_WORD_REG:
+		type = GSE_BE32;
+		break;
+	case GS_CLASS_VECTOR:
+		type = GSE_VEC128;
+		break;
+	case GS_CLASS_INTR:
+		switch (iden) {
+		case GSID_HDAR:
+		case GSID_ASDR:
+		case GSID_HEIR:
+			type = GSE_BE64;
+			break;
+		case GSID_HDSISR:
+			type = GSE_BE32;
+			break;
+		}
+		break;
+	}
+
+	return type;
+}
+
+/**
+ * gsid_flags() - the flags for a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns any flags for the guest state ID.
+ */
+unsigned long gsid_flags(u16 iden)
+{
+	unsigned long flags = 0;
+
+	switch (gsid_class(iden)) {
+	case GS_CLASS_GUESTWIDE:
+		flags = GS_FLAGS_WIDE;
+		break;
+	case GS_CLASS_META:
+	case GS_CLASS_DWORD_REG:
+	case GS_CLASS_WORD_REG:
+	case GS_CLASS_VECTOR:
+	case GS_CLASS_INTR:
+		break;
+	}
+
+	return flags;
+}
+EXPORT_SYMBOL(gsid_flags);
+
+/**
+ * gsid_size() - the size of a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns the size of guest state ID.
+ */
+u16 gsid_size(u16 iden)
+{
+	int type;
+
+	type = gsid_type(iden);
+	if (type == -1)
+		return 0;
+
+	if (type >= __GSE_TYPE_MAX)
+		return 0;
+
+	return gse_iden_len[type];
+}
+EXPORT_SYMBOL(gsid_size);
+
+/**
+ * gsid_mask() - the settable bits of a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns a mask of settable bits for a guest state ID.
+ */
+u64 gsid_mask(u16 iden)
+{
+	u64 mask = ~0ull;
+
+	switch (iden) {
+	case GSID_LPCR:
+		mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER | LPCR_GTSE;
+		break;
+	case GSID_MSR:
+		mask = ~(MSR_HV | MSR_S | MSR_ME);
+		break;
+	}
+
+	return mask;
+}
+EXPORT_SYMBOL(gsid_mask);
+
+/**
+ * __gse_put() - add a guest state element to a buffer
+ * @gsb: buffer to add the element to
+ * @iden: guest state ID
+ * @size: length of data
+ * @data: pointer to data
+ */
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data)
+{
+	struct gs_elem *gse;
+	u16 total_size;
+
+	total_size = sizeof(*gse) + size;
+	if (total_size + gsb_len(gsb) > gsb_capacity(gsb))
+		return -ENOMEM;
+
+	if (gsid_size(iden) != size)
+		return -EINVAL;
+
+	gse = gsb_put(gsb, total_size);
+	gse->iden = cpu_to_be16(iden);
+	gse->len = cpu_to_be16(size);
+	memcpy(gse->data, data, size);
+
+	return 0;
+}
+EXPORT_SYMBOL(__gse_put);
+
+/**
+ * gse_parse() - create a parse map from a guest state buffer
+ * @gsp: guest state parser
+ * @gsb: guest state buffer
+ */
+int gse_parse(struct gs_parser *gsp, struct gs_buff *gsb)
+{
+	struct gs_elem *curr;
+	int rem, i;
+
+	gsb_for_each_elem(i, curr, gsb, rem) {
+		if (gse_len(curr) != gsid_size(gse_iden(curr)))
+			return -EINVAL;
+		gsp_insert(gsp, gse_iden(curr), curr);
+	}
+
+	if (gsb_nelems(gsb) != i)
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL(gse_parse);
+
+static inline int gse_flatten_iden(u16 iden)
+{
+	int bit = 0;
+	int class;
+
+	class = gsid_class(iden);
+
+	if (class == GS_CLASS_GUESTWIDE) {
+		bit += iden - GSE_GUESTWIDE_START;
+		return bit;
+	}
+
+	bit += GSE_GUESTWIDE_COUNT;
+
+	if (class == GS_CLASS_META) {
+		bit += iden - GSE_META_START;
+		return bit;
+	}
+
+	bit += GSE_META_COUNT;
+
+	if (class == GS_CLASS_DWORD_REG) {
+		bit += iden - GSE_DW_REGS_START;
+		return bit;
+	}
+
+	bit += GSE_DW_REGS_COUNT;
+
+	if (class == GS_CLASS_WORD_REG) {
+		bit += iden - GSE_W_REGS_START;
+		return bit;
+	}
+
+	bit += GSE_W_REGS_COUNT;
+
+	if (class == GS_CLASS_VECTOR) {
+		bit += iden - GSE_VSRS_START;
+		return bit;
+	}
+
+	bit += GSE_VSRS_COUNT;
+
+	if (class == GS_CLASS_INTR) {
+		bit += iden - GSE_INTR_REGS_START;
+		return bit;
+	}
+
+	return 0;
+}
+
+static inline u16 gse_unflatten_iden(int bit)
+{
+	u16 iden;
+
+	if (bit < GSE_GUESTWIDE_COUNT) {
+		iden = GSE_GUESTWIDE_START + bit;
+		return iden;
+	}
+	bit -= GSE_GUESTWIDE_COUNT;
+
+	if (bit < GSE_META_COUNT) {
+		iden = GSE_META_START + bit;
+		return iden;
+	}
+	bit -= GSE_META_COUNT;
+
+	if (bit < GSE_DW_REGS_COUNT) {
+		iden = GSE_DW_REGS_START + bit;
+		return iden;
+	}
+	bit -= GSE_DW_REGS_COUNT;
+
+	if (bit < GSE_W_REGS_COUNT) {
+		iden = GSE_W_REGS_START + bit;
+		return iden;
+	}
+	bit -= GSE_W_REGS_COUNT;
+
+	if (bit < GSE_VSRS_COUNT) {
+		iden = GSE_VSRS_START + bit;
+		return iden;
+	}
+	bit -= GSE_VSRS_COUNT;
+
+	if (bit < GSE_IDEN_COUNT) {
+		iden = GSE_INTR_REGS_START + bit;
+		return iden;
+	}
+
+	return 0;
+}
+
+/**
+ * gsp_insert() - add a mapping from a guest state ID to an element
+ * @gsp: guest state parser
+ * @iden: guest state id (key)
+ * @gse: guest state element (value)
+ */
+void gsp_insert(struct gs_parser *gsp, u16 iden, struct gs_elem *gse)
+{
+	int i;
+
+	i = gse_flatten_iden(iden);
+	gsbm_set(&gsp->iterator, iden);
+	gsp->gses[i] = gse;
+}
+EXPORT_SYMBOL(gsp_insert);
+
+/**
+ * gsp_lookup() - lookup an element from a guest state ID
+ * @gsp: guest state parser
+ * @iden: guest state ID (key)
+ *
+ * Returns the guest state element if present.
+ */
+struct gs_elem *gsp_lookup(struct gs_parser *gsp, u16 iden)
+{
+	int i;
+
+	i = gse_flatten_iden(iden);
+	return gsp->gses[i];
+}
+EXPORT_SYMBOL(gsp_lookup);
+
+/**
+ * gsbm_set() - set the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+void gsbm_set(struct gs_bitmap *gsbm, u16 iden)
+{
+	set_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_set);
+
+/**
+ * gsbm_clear() - clear the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+void gsbm_clear(struct gs_bitmap *gsbm, u16 iden)
+{
+	clear_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_clear);
+
+/**
+ * gsbm_test() - test the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+bool gsbm_test(struct gs_bitmap *gsbm, u16 iden)
+{
+	return test_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_test);
+
+/**
+ * gsbm_next() - return the next set guest state ID
+ * @gsbm: guest state bitmap
+ * @prev: last guest state ID
+ */
+u16 gsbm_next(struct gs_bitmap *gsbm, u16 prev)
+{
+	int bit, pbit;
+
+	pbit = prev ? gse_flatten_iden(prev) + 1 : 0;
+	bit = find_next_bit(gsbm->bitmap, GSE_IDEN_COUNT, pbit);
+
+	if (bit < GSE_IDEN_COUNT)
+		return gse_unflatten_iden(bit);
+	return 0;
+}
+EXPORT_SYMBOL(gsbm_next);
+
+/**
+ * gsm_init() - initialize a guest state message
+ * @gsm: guest state message
+ * @ops: callbacks
+ * @data: private data
+ * @flags: guest wide or thread wide
+ */
+int gsm_init(struct gs_msg *gsm, struct gs_msg_ops *ops, void *data,
+	     unsigned long flags)
+{
+	memset(gsm, 0, sizeof(*gsm));
+	gsm->ops = ops;
+	gsm->data = data;
+	gsm->flags = flags;
+
+	return 0;
+}
+EXPORT_SYMBOL(gsm_init);
+
+/**
+ * gsm_new() - create a new guest state message
+ * @ops: callbacks
+ * @data: private data
+ * @flags: guest wide or thread wide
+ * @gfp_flags: GFP allocation flags
+ *
+ * Returns an initialized guest state message.
+ */
+struct gs_msg *gsm_new(struct gs_msg_ops *ops, void *data, unsigned long flags,
+		       gfp_t gfp_flags)
+{
+	struct gs_msg *gsm;
+
+	gsm = kzalloc(sizeof(*gsm), gfp_flags);
+	if (!gsm)
+		return NULL;
+
+	gsm_init(gsm, ops, data, flags);
+
+	return gsm;
+}
+EXPORT_SYMBOL(gsm_new);
+
+/**
+ * gsm_size() - the size required for the message
+ * @gsm: self
+ *
+ * Returns the size required for the message.
+ */
+size_t gsm_size(struct gs_msg *gsm)
+{
+	if (gsm->ops->get_size)
+		return gsm->ops->get_size(gsm);
+	return 0;
+}
+EXPORT_SYMBOL(gsm_size);
+
+/**
+ * gsm_free() - free a guest state message
+ * @gsm: guest state message
+ */
+void gsm_free(struct gs_msg *gsm)
+{
+	kfree(gsm);
+}
+EXPORT_SYMBOL(gsm_free);
+
+/**
+ * gsm_fill_info() - serialises message to guest state buffer format
+ * @gsm: self
+ * @gsb: buffer to serialise into
+ */
+int gsm_fill_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	if (!gsm->ops->fill_info)
+		return -EINVAL;
+
+	gsb_reset(gsb);
+	return gsm->ops->fill_info(gsb, gsm);
+}
+EXPORT_SYMBOL(gsm_fill_info);
+
+/**
+ * gsm_refresh_info() - deserialises from guest state buffer
+ * @gsm: self
+ * @gsb: buffer to deserialise from
+ */
+int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	if (!gsm->ops->refresh_info)
+		return -EINVAL;
+
+	return gsm->ops->refresh_info(gsm, gsb);
+}
+EXPORT_SYMBOL(gsm_refresh_info);
diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c b/arch/powerpc/kvm/test-guest-state-buffer.c
new file mode 100644
index 000000000000..d038051b61f8
--- /dev/null
+++ b/arch/powerpc/kvm/test-guest-state-buffer.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/init.h>
+#include <linux/log2.h>
+#include <kunit/test.h>
+
+#include <asm/guest-state-buffer.h>
+
+static void test_creating_buffer(struct kunit *test)
+{
+	struct gs_buff *gsb;
+	size_t size = 0x100;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb->hdr);
+
+	KUNIT_EXPECT_EQ(test, gsb->capacity, roundup_pow_of_two(size));
+	KUNIT_EXPECT_EQ(test, gsb->len, sizeof(__be32));
+
+	gsb_free(gsb);
+}
+
+static void test_adding_element(struct kunit *test)
+{
+	const struct gs_elem *head, *curr;
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u;
+	int rem;
+	struct gs_buff *gsb;
+	size_t size = 0x1000;
+	int i, rc;
+	u64 data;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	/* Single elements, direct use of __gse_put() */
+	data = 0xdeadbeef;
+	rc = __gse_put(gsb, GSID_GPR(0), 8, &data);
+	KUNIT_EXPECT_GE(test, rc, 0);
+
+	head = gsb_data(gsb);
+	KUNIT_EXPECT_EQ(test, gse_iden(head), GSID_GPR(0));
+	KUNIT_EXPECT_EQ(test, gse_len(head), 8);
+	data = 0;
+	memcpy(&data, gse_data(head), 8);
+	KUNIT_EXPECT_EQ(test, data, 0xdeadbeef);
+
+	/* Multiple elements, simple wrapper */
+	rc = gse_put_u64(gsb, GSID_GPR(1), 0xcafef00d);
+	KUNIT_EXPECT_GE(test, rc, 0);
+
+	u.dw[0] = 0x1;
+	u.dw[1] = 0x2;
+	rc = gse_put_vector128(gsb, GSID_VSRS(0), u.v);
+	KUNIT_EXPECT_GE(test, rc, 0);
+	u.dw[0] = 0x0;
+	u.dw[1] = 0x0;
+
+	gsb_for_each_elem(i, curr, gsb, rem) {
+		switch (i) {
+		case 0:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_GPR(0));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 8);
+			KUNIT_EXPECT_EQ(test, gse_get_be64(curr), 0xdeadbeef);
+			break;
+		case 1:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_GPR(1));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 8);
+			KUNIT_EXPECT_EQ(test, gse_get_u64(curr), 0xcafef00d);
+			break;
+		case 2:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_VSRS(0));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 16);
+			u.v = gse_get_vector128(curr);
+			KUNIT_EXPECT_EQ(test, u.dw[0], 0x1);
+			KUNIT_EXPECT_EQ(test, u.dw[1], 0x2);
+			break;
+		}
+	}
+	KUNIT_EXPECT_EQ(test, i, 3);
+
+	gsb_reset(gsb);
+	KUNIT_EXPECT_EQ(test, gsb_nelems(gsb), 0);
+	KUNIT_EXPECT_EQ(test, gsb_len(gsb), sizeof(struct gs_header));
+
+	gsb_free(gsb);
+}
+
+static void test_gs_parsing(struct kunit *test)
+{
+	struct gs_elem *gse;
+	struct gs_parser gsp = { 0 };
+	struct gs_buff *gsb;
+	size_t size = 0x1000;
+	u64 tmp1, tmp2;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	tmp1 = 0xdeadbeefull;
+	gse_put(gsb, GSID_GPR(0), tmp1);
+
+	KUNIT_EXPECT_GE(test, gse_parse(&gsp, gsb), 0);
+
+	gse = gsp_lookup(&gsp, GSID_GPR(0));
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gse);
+
+	gse_get(gse, &tmp2);
+	KUNIT_EXPECT_EQ(test, tmp2, 0xdeadbeefull);
+
+	gsb_free(gsb);
+}
+
+static void test_gs_bitmap(struct kunit *test)
+{
+	struct gs_bitmap gsbm = { 0 };
+	struct gs_bitmap gsbm1 = { 0 };
+	struct gs_bitmap gsbm2 = { 0 };
+	u16 iden;
+	int i, j;
+
+	i = 0;
+	for (u16 iden = GSID_HOST_STATE_SIZE;
+	     iden <= GSID_PROCESS_TABLE; iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_RUN_INPUT; iden <= GSID_VPA;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_GPR(0); iden <= GSID_CTRL;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_CR; iden <= GSID_PSPB; iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_VSRS(0); iden <= GSID_VSRS(63);
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_HDAR; iden <= GSID_ASDR;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	j = 0;
+	gsbm_for_each(&gsbm1, iden)
+	{
+		gsbm_set(&gsbm2, iden);
+		j++;
+	}
+	KUNIT_EXPECT_EQ(test, i, j);
+	KUNIT_EXPECT_MEMEQ(test, &gsbm1, &gsbm2, sizeof(gsbm1));
+}
+
+struct gs_msg_test1_data {
+	u64 a;
+	u32 b;
+	struct gs_part_table c;
+	struct gs_proc_table d;
+	struct gs_buff_info e;
+};
+
+static size_t test1_get_size(struct gs_msg *gsm)
+{
+	size_t size = 0;
+	u16 ids[] = {
+		GSID_PARTITION_TABLE,
+		GSID_PROCESS_TABLE,
+		GSID_RUN_INPUT,
+		GSID_GPR(0),
+		GSID_CR,
+	};
+
+	for (int i = 0; i < ARRAY_SIZE(ids); i++)
+		size += gse_total_size(gsid_size(ids[i]));
+	return size;
+}
+
+static int test1_fill_info(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	struct gs_msg_test1_data *data = gsm->data;
+
+	if (gsm_includes(gsm, GSID_GPR(0)))
+		gse_put(gsb, GSID_GPR(0), data->a);
+
+	if (gsm_includes(gsm, GSID_CR))
+		gse_put(gsb, GSID_CR, data->b);
+
+	if (gsm_includes(gsm, GSID_PARTITION_TABLE))
+		gse_put(gsb, GSID_PARTITION_TABLE, data->c);
+
+	if (gsm_includes(gsm, GSID_PROCESS_TABLE))
+		gse_put(gsb, GSID_PROCESS_TABLE, data->d);
+
+	if (gsm_includes(gsm, GSID_RUN_INPUT))
+		gse_put(gsb, GSID_RUN_INPUT, data->e);
+
+	return 0;
+}
+
+static int test1_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	struct gs_parser gsp = { 0 };
+	struct gs_msg_test1_data *data = gsm->data;
+	struct gs_elem *gse;
+	int rc;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	gse = gsp_lookup(&gsp, GSID_GPR(0));
+	if (gse)
+		gse_get(gse, &data->a);
+
+	gse = gsp_lookup(&gsp, GSID_CR);
+	if (gse)
+		gse_get(gse, &data->b);
+
+	return 0;
+}
+
+static struct gs_msg_ops gs_msg_test1_ops = {
+	.get_size = test1_get_size,
+	.fill_info = test1_fill_info,
+	.refresh_info = test1_refresh_info,
+};
+
+static void test_gs_msg(struct kunit *test)
+{
+	struct gs_msg_test1_data test1_data = {
+		.a = 0xdeadbeef,
+		.b = 0x1,
+	};
+	struct gs_msg *gsm;
+	struct gs_buff *gsb;
+
+	gsm = gsm_new(&gs_msg_test1_ops, &test1_data, GSM_SEND, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsm);
+
+	gsb = gsb_new(gsm_size(gsm), 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	gsm_include(gsm, GSID_PARTITION_TABLE);
+	gsm_include(gsm, GSID_PROCESS_TABLE);
+	gsm_include(gsm, GSID_RUN_INPUT);
+	gsm_include(gsm, GSID_GPR(0));
+	gsm_include(gsm, GSID_CR);
+
+	gsm_fill_info(gsm, gsb);
+
+	memset(&test1_data, 0, sizeof(test1_data));
+
+	gsm_refresh_info(gsm, gsb);
+	KUNIT_EXPECT_EQ(test, test1_data.a, 0xdeadbeef);
+	KUNIT_EXPECT_EQ(test, test1_data.b, 0x1);
+
+	gsm_free(gsm);
+}
+
+static struct kunit_case guest_state_buffer_testcases[] = {
+	KUNIT_CASE(test_creating_buffer),
+	KUNIT_CASE(test_adding_element),
+	KUNIT_CASE(test_gs_bitmap),
+	KUNIT_CASE(test_gs_parsing),
+	KUNIT_CASE(test_gs_msg),
+	{}
+};
+
+static struct kunit_suite guest_state_buffer_test_suite = {
+	.name = "guest_state_buffer_test",
+	.test_cases = guest_state_buffer_testcases,
+};
+
+kunit_test_suites(&guest_state_buffer_test_suite);
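+
+/*
+ * These tests build when CONFIG_GUEST_STATE_BUFFER_TEST is enabled and
+ * run under KUnit. As a sketch (assuming a suitable KUnit and
+ * cross-compile setup), something like:
+ *
+ *	./tools/testing/kunit/kunit.py run --arch=powerpc \
+ *		guest_state_buffer_test
+ */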
+
+MODULE_LICENSE("GPL");
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Jordan Niethe, mikey, kautuk.consul.1980, kvm, npiggin, kvm-ppc,
	sbhat, vaibhav

The new PAPR nested guest API introduces the concept of a Guest State
Buffer for communication about L2 guests between L1 and L0 hosts.

In the new API, the L0 manages the L2 on behalf of the L1. This means
that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
table...), it must request that the L0 perform the modification.
Likewise, if the L1 needs to read L2 state, the request must go
through the L0.

The Guest State Buffer is a Type-Length-Value style data format defined
in the PAPR which assigns all relevant partition state a unique
identity. Unlike a typical TLV format, the length is redundant, as the
length of each identity is fixed, but it is included for checking
correctness.

A guest state buffer consists of an element count followed by a stream
of elements, where elements are composed of an ID number, data length,
then the data:

  Header:

   <---4 bytes--->
  +----------------+-----
  | Element Count  | Elements...
  +----------------+-----

  Element:

   <----2 bytes---> <-2 bytes-> <-Length bytes->
  +----------------+-----------+----------------+
  | Guest State ID |  Length   |      Data      |
  +----------------+-----------+----------------+
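
For example, an 8 byte register element such as a GPR occupies 12 bytes
in the stream (2 byte ID, 2 byte length, 8 bytes of data), so a buffer
holding n such elements is 4 + 12 * n bytes long in total.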

Guest State IDs have other attributes defined in the PAPR such as
whether they are per thread or per guest, or read-only.

Introduce a library for using guest state buffers. This includes support
for actions such as creating buffers, adding elements to buffers,
reading the value of elements and parsing buffers. This will be used
later by the PAPR nested guest support.
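
As a rough sketch of the intended flow (illustrative only; error
handling is omitted, the hcall plumbing is elided, and guest_id and
vcpu_id stand for previously obtained handles), an L1 could build and
read back a buffer like so:

  struct gs_parser gsp = { 0 };
  struct gs_buff *gsb;
  u64 val = 0xdeadbeef;

  gsb = gsb_new(sizeof(struct gs_header) +
                gse_total_size(gsid_size(GSID_GPR(0))),
                guest_id, vcpu_id, GFP_KERNEL);
  gse_put(gsb, GSID_GPR(0), val);
  /* pass gsb_paddress(gsb) to the L0 via the new hcalls ... */
  gse_parse(&gsp, gsb);
  gse_get(gsp_lookup(&gsp, GSID_GPR(0)), &val);
  gsb_free(gsb);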

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Add missing #ifdef CONFIG_VSX guards
  - Move files from lib/ to kvm/
  - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
  - Use kunit for guest state buffer tests
  - Add configuration option for the tests
  - Use macros for contiguous id ranges like GPRs
  - Add some missing EXPORTs to functions
  - HEIR element is a double word not a word
---
 arch/powerpc/Kconfig.debug                    |  12 +
 arch/powerpc/include/asm/guest-state-buffer.h | 901 ++++++++++++++++++
 arch/powerpc/include/asm/kvm_book3s.h         |   2 +
 arch/powerpc/kvm/Makefile                     |   3 +
 arch/powerpc/kvm/guest-state-buffer.c         | 563 +++++++++++
 arch/powerpc/kvm/test-guest-state-buffer.c    | 321 +++++++
 6 files changed, 1802 insertions(+)
 create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
 create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
 create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6aaf8dc60610..ed830a714720 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
 	bool "Run self-tests of the MSI bitmap code"
 	depends on DEBUG_KERNEL
 
+config GUEST_STATE_BUFFER_TEST
+	def_tristate n
+	prompt "Enable Guest State Buffer unit tests"
+	depends on KUNIT
+	depends on KVM_BOOK3S_HV_POSSIBLE
+	default KUNIT_ALL_TESTS
+	help
+	  The Guest State Buffer is a data format specified in the PAPR.
	  It is used by hcalls to communicate the state of L2 guests between
+	  the L1 and L0 hypervisors. Enable unit tests for the library
+	  used to create and use guest state buffers.
+
 config PPC_IRQ_SOFT_MASK_DEBUG
 	bool "Include extra checks for powerpc irq soft masking"
 	depends on PPC64
diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
new file mode 100644
index 000000000000..65a840abf1bb
--- /dev/null
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -0,0 +1,901 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Interface based on include/net/netlink.h
+ */
+#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
+#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
+
+#include <linux/gfp.h>
+#include <linux/bitmap.h>
+#include <asm/plpar_wrappers.h>
+
+/**************************************************************************
+ * Guest State Buffer Constants
+ **************************************************************************/
+#define GSID_BLANK			0x0000
+
+#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
+#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
+#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
+#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
+#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
+#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
+
+#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
+#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
+#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
+
+#define GSID_GPR(x)			(0x1000 + (x))
+#define GSID_HDEC_EXPIRY_TB		0x1020
+#define GSID_NIA			0x1021
+#define GSID_MSR			0x1022
+#define GSID_LR				0x1023
+#define GSID_XER			0x1024
+#define GSID_CTR			0x1025
+#define GSID_CFAR			0x1026
+#define GSID_SRR0			0x1027
+#define GSID_SRR1			0x1028
+#define GSID_DAR			0x1029
+#define GSID_DEC_EXPIRY_TB		0x102A
+#define GSID_VTB			0x102B
+#define GSID_LPCR			0x102C
+#define GSID_HFSCR			0x102D
+#define GSID_FSCR			0x102E
+#define GSID_FPSCR			0x102F
+#define GSID_DAWR0			0x1030
+#define GSID_DAWR1			0x1031
+#define GSID_CIABR			0x1032
+#define GSID_PURR			0x1033
+#define GSID_SPURR			0x1034
+#define GSID_IC				0x1035
+#define GSID_SPRG0			0x1036
+#define GSID_SPRG1			0x1037
+#define GSID_SPRG2			0x1038
+#define GSID_SPRG3			0x1039
+#define GSID_PPR			0x103A
+#define GSID_MMCR(x)			(0x103B + (x))
+#define GSID_MMCRA			0x103F
+#define GSID_SIER(x)			(0x1040 + (x))
+#define GSID_BESCR			0x1043
+#define GSID_EBBHR			0x1044
+#define GSID_EBBRR			0x1045
+#define GSID_AMR			0x1046
+#define GSID_IAMR			0x1047
+#define GSID_AMOR			0x1048
+#define GSID_UAMOR			0x1049
+#define GSID_SDAR			0x104A
+#define GSID_SIAR			0x104B
+#define GSID_DSCR			0x104C
+#define GSID_TAR			0x104D
+#define GSID_DEXCR			0x104E
+#define GSID_HDEXCR			0x104F
+#define GSID_HASHKEYR			0x1050
+#define GSID_HASHPKEYR			0x1051
+#define GSID_CTRL			0x1052
+
+#define GSID_CR				0x2000
+#define GSID_PIDR			0x2001
+#define GSID_DSISR			0x2002
+#define GSID_VSCR			0x2003
+#define GSID_VRSAVE			0x2004
+#define GSID_DAWRX0			0x2005
+#define GSID_DAWRX1			0x2006
+#define GSID_PMC(x)			(0x2007 + (x))
+#define GSID_WORT			0x200D
+#define GSID_PSPB			0x200E
+
+#define GSID_VSRS(x)			(0x3000 + (x))
+
+#define GSID_HDAR			0xF000
+#define GSID_HDSISR			0xF001
+#define GSID_HEIR			0xF002
+#define GSID_ASDR			0xF003
+
+
+#define GSE_GUESTWIDE_START GSID_BLANK
+#define GSE_GUESTWIDE_END GSID_PROCESS_TABLE
+#define GSE_GUESTWIDE_COUNT (GSE_GUESTWIDE_END - GSE_GUESTWIDE_START + 1)
+
+#define GSE_META_START GSID_RUN_INPUT
+#define GSE_META_END GSID_VPA
+#define GSE_META_COUNT (GSE_META_END - GSE_META_START + 1)
+
+#define GSE_DW_REGS_START GSID_GPR(0)
+#define GSE_DW_REGS_END GSID_CTRL
+#define GSE_DW_REGS_COUNT (GSE_DW_REGS_END - GSE_DW_REGS_START + 1)
+
+#define GSE_W_REGS_START GSID_CR
+#define GSE_W_REGS_END GSID_PSPB
+#define GSE_W_REGS_COUNT (GSE_W_REGS_END - GSE_W_REGS_START + 1)
+
+#define GSE_VSRS_START GSID_VSRS(0)
+#define GSE_VSRS_END GSID_VSRS(63)
+#define GSE_VSRS_COUNT (GSE_VSRS_END - GSE_VSRS_START + 1)
+
+#define GSE_INTR_REGS_START GSID_HDAR
+#define GSE_INTR_REGS_END GSID_ASDR
+#define GSE_INTR_REGS_COUNT (GSE_INTR_REGS_END - GSE_INTR_REGS_START + 1)
+
+#define GSE_IDEN_COUNT                                              \
+	(GSE_GUESTWIDE_COUNT + GSE_META_COUNT + GSE_DW_REGS_COUNT + \
+	 GSE_W_REGS_COUNT + GSE_VSRS_COUNT + GSE_INTR_REGS_COUNT)
+
+
+/**
+ * Ranges of guest state buffer elements
+ */
+enum {
+	GS_CLASS_GUESTWIDE = 0x01,
+	GS_CLASS_META = 0x02,
+	GS_CLASS_DWORD_REG = 0x04,
+	GS_CLASS_WORD_REG = 0x08,
+	GS_CLASS_VECTOR = 0x10,
+	GS_CLASS_INTR = 0x20,
+};
+
+/**
+ * Types of guest state buffer elements
+ */
+enum {
+	GSE_BE32,
+	GSE_BE64,
+	GSE_VEC128,
+	GSE_PARTITION_TABLE,
+	GSE_PROCESS_TABLE,
+	GSE_BUFFER,
+	__GSE_TYPE_MAX,
+};
+
+/**
+ * Flags for guest state elements
+ */
+enum {
+	GS_FLAGS_WIDE = 0x01,
+};
+
+/**
+ * struct gs_part_table - deserialized partition table information element
+ * @address: start of the partition table
+ * @ea_bits: number of bits in the effective address
+ * @gpd_size: root page directory size
+ */
+struct gs_part_table {
+	u64 address;
+	u64 ea_bits;
+	u64 gpd_size;
+};
+
+/**
+ * struct gs_proc_table - deserialized process table information element
+ * @address: start of the process table
+ * @gpd_size: process table size
+ */
+struct gs_proc_table {
+	u64 address;
+	u64 gpd_size;
+};
+
+/**
+ * struct gs_buff_info - deserialized meta guest state buffer information
+ * @address: start of the guest state buffer
+ * @size: size of the guest state buffer
+ */
+struct gs_buff_info {
+	u64 address;
+	u64 size;
+};
+
+/**
+ * struct gs_header - serialized guest state buffer header
+ * @nelems: count of guest state elements in the buffer
+ * @data: start of the stream of elements in the buffer
+ */
+struct gs_header {
+	__be32 nelems;
+	char data[];
+} __packed;
+
+/**
+ * struct gs_elem - serialized guest state buffer element
+ * @iden: Guest State ID
+ * @len: length of data
+ * @data: the guest state buffer element's value
+ */
+struct gs_elem {
+	__be16 iden;
+	__be16 len;
+	char data[];
+} __packed;
+
+/**
+ * struct gs_buff - a guest state buffer with metadata.
+ * @capacity: total length of the buffer
+ * @len: current length of the elements and header
+ * @guest_id: guest id associated with the buffer
+ * @vcpu_id: vcpu_id associated with the buffer
+ * @hdr: the serialised guest state buffer
+ */
+struct gs_buff {
+	size_t capacity;
+	size_t len;
+	unsigned long guest_id;
+	unsigned long vcpu_id;
+	struct gs_header *hdr;
+};
+
+/**
+ * struct gs_bitmap - a bitmap for element ids
+ * @bitmap: a bitmap large enough for all Guest State IDs
+ */
+struct gs_bitmap {
+/* private: */
+	DECLARE_BITMAP(bitmap, GSE_IDEN_COUNT);
+};
+
+/**
+ * struct gs_parser - a map of element ids to locations in a buffer
+ * @iterator: bitmap used for iterating
+ * @gses: contains the pointers to elements
+ *
+ * A guest state parser is used for deserialising a guest state buffer.
+ * Given a buffer, it allows looking up guest state elements using a
+ * guest state ID.
+ */
+struct gs_parser {
+/* private: */
+	struct gs_bitmap iterator;
+	struct gs_elem *gses[GSE_IDEN_COUNT];
+};
+
+enum {
+	GSM_GUEST_WIDE = 0x1,
+	GSM_SEND = 0x2,
+	GSM_RECEIVE = 0x4,
+	GSM_GSB_OWNER = 0x8,
+};
+
+struct gs_msg;
+
+/**
+ * struct gs_msg_ops - guest state message behavior
+ * @get_size: maximum size required for the message data
+ * @fill_info: serializes to the guest state buffer format
+ * @refresh_info: deserializes from the guest state buffer format
+ */
+struct gs_msg_ops {
+	size_t (*get_size)(struct gs_msg *gsm);
+	int (*fill_info)(struct gs_buff *gsb, struct gs_msg *gsm);
+	int (*refresh_info)(struct gs_msg *gsm, struct gs_buff *gsb);
+};
+
+/**
+ * struct gs_msg - a guest state message
+ * @bitmap: the guest state ids that should be included
+ * @ops: modify message behavior for reading and writing to buffers
+ * @flags: guest wide or thread wide
+ * @data: private data that elements are serialized from or deserialized into
+ *
+ * A guest state message allows flexibility in sending and receiving data
+ * in the guest state buffer format.
+ */
+struct gs_msg {
+	struct gs_bitmap bitmap;
+	struct gs_msg_ops *ops;
+	unsigned long flags;
+	void *data;
+};
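+
+/*
+ * A minimal sketch of using a message (illustrative only): my_ops and
+ * my_data are hypothetical, with my_ops providing the gs_msg_ops
+ * callbacks above. See test-guest-state-buffer.c for a worked example.
+ *
+ *	struct gs_msg *gsm;
+ *
+ *	gsm = gsm_new(&my_ops, &my_data, GSM_SEND, GFP_KERNEL);
+ *	gsm_include(gsm, GSID_GPR(0));
+ *	gsm_fill_info(gsm, gsb);
+ *	gsm_free(gsm);
+ */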
+
+/**************************************************************************
+ * Guest State IDs
+ **************************************************************************/
+
+u16 gsid_size(u16 iden);
+unsigned long gsid_flags(u16 iden);
+u64 gsid_mask(u16 iden);
+
+/**************************************************************************
+ * Guest State Buffers
+ **************************************************************************/
+struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
+			unsigned long vcpu_id, gfp_t flags);
+void gsb_free(struct gs_buff *gsb);
+void *gsb_put(struct gs_buff *gsb, size_t size);
+
+/**
+ * gsb_header() - the header of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns a pointer to the buffer header.
+ */
+static inline struct gs_header *gsb_header(struct gs_buff *gsb)
+{
+	return gsb->hdr;
+}
+
+/**
+ * gsb_data() - the elements of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns a pointer to the first element of the buffer data.
+ */
+static inline struct gs_elem *gsb_data(struct gs_buff *gsb)
+{
+	return (struct gs_elem *)gsb_header(gsb)->data;
+}
+
+/**
+ * gsb_len() - the current length of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the length including the header of a buffer.
+ */
+static inline size_t gsb_len(struct gs_buff *gsb)
+{
+	return gsb->len;
+}
+
+/**
+ * gsb_capacity() - the capacity of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the capacity of a buffer.
+ */
+static inline size_t gsb_capacity(struct gs_buff *gsb)
+{
+	return gsb->capacity;
+}
+
+/**
+ * gsb_paddress() - the physical address of buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the physical address of the buffer.
+ */
+static inline u64 gsb_paddress(struct gs_buff *gsb)
+{
+	return __pa(gsb_header(gsb));
+}
+
+/**
+ * gsb_nelems() - the number of elements in a buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the number of elements in a buffer
+ */
+static inline u32 gsb_nelems(struct gs_buff *gsb)
+{
+	return be32_to_cpu(gsb_header(gsb)->nelems);
+}
+
+/**
+ * gsb_reset() - empty a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Reset the number of elements and length of buffer to empty.
+ */
+static inline void gsb_reset(struct gs_buff *gsb)
+{
+	gsb_header(gsb)->nelems = cpu_to_be32(0);
+	gsb->len = sizeof(struct gs_header);
+}
+
+/**
+ * gsb_data_len() - the length of a buffer excluding the header
+ * @gsb: guest state buffer
+ *
+ * Returns the length of a buffer excluding the header
+ */
+static inline size_t gsb_data_len(struct gs_buff *gsb)
+{
+	return gsb->len - sizeof(struct gs_header);
+}
+
+/**
+ * gsb_data_cap() - the capacity of a buffer excluding the header
+ * @gsb: guest state buffer
+ *
+ * Returns the capacity of a buffer excluding the header
+ */
+static inline size_t gsb_data_cap(struct gs_buff *gsb)
+{
+	return gsb->capacity - sizeof(struct gs_header);
+}
+
+/**
+ * gsb_for_each_elem - iterate over the elements in a buffer
+ * @i: loop counter
+ * @pos: set to current element
+ * @gsb: guest state buffer
+ * @rem: initialized to buffer capacity, holds bytes currently remaining in stream
+ */
+#define gsb_for_each_elem(i, pos, gsb, rem)                       \
+	gse_for_each_elem(i, gsb_nelems(gsb), pos, gsb_data(gsb), \
+			  gsb_data_cap(gsb), rem)
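+
+/*
+ * For example (illustrative only), walking every element in a buffer:
+ *
+ *	struct gs_elem *curr;
+ *	int i, rem;
+ *
+ *	gsb_for_each_elem(i, curr, gsb, rem)
+ *		pr_info("id 0x%x len %u\n", gse_iden(curr), gse_len(curr));
+ */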
+
+/**************************************************************************
+ * Guest State Elements
+ **************************************************************************/
+
+/**
+ * gse_iden() - guest state ID of element
+ * @gse: guest state element
+ *
+ * Return the guest state ID in host endianness.
+ */
+static inline u16 gse_iden(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->iden);
+}
+
+/**
+ * gse_len() - length of guest state element data
+ * @gse: guest state element
+ *
+ * Returns the length of guest state element data
+ */
+static inline u16 gse_len(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->len);
+}
+
+/**
+ * gse_total_len() - total length of guest state element
+ * @gse: guest state element
+ *
+ * Returns the length of the data plus the ID and size header.
+ */
+static inline u16 gse_total_len(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->len) + sizeof(*gse);
+}
+
+/**
+ * gse_total_size() - space needed for a given data length
+ * @size: data length
+ *
+ * Returns size plus the space needed for the ID and size header.
+ */
+static inline u16 gse_total_size(u16 size)
+{
+	return sizeof(struct gs_elem) + size;
+}
+
+/**
+ * gse_data() - pointer to data of a guest state element
+ * @gse: guest state element
+ *
+ * Returns a pointer to the beginning of guest state element data.
+ */
+static inline void *gse_data(const struct gs_elem *gse)
+{
+	return (void *)gse->data;
+}
+
+/**
+ * gse_ok() - checks space exists for guest state element
+ * @gse: guest state element
+ * @remaining: bytes of space remaining
+ *
+ * Returns true if the guest state element can fit in remaining space.
+ */
+static inline bool gse_ok(const struct gs_elem *gse, int remaining)
+{
+	return remaining >= gse_total_len(gse);
+}
+
+/**
+ * gse_next() - iterate to the next guest state element in a stream
+ * @gse: stream of guest state elements
+ * @remaining: length of the guest element stream
+ *
+ * Returns the next guest state element in a stream of elements. The length of
+ * the stream is updated in remaining.
+ */
+static inline struct gs_elem *gse_next(const struct gs_elem *gse,
+				       int *remaining)
+{
+	int len = sizeof(*gse) + gse_len(gse);
+
+	*remaining -= len;
+	return (struct gs_elem *)(gse->data + gse_len(gse));
+}
+
+/**
+ * gse_for_each_elem - iterate over a stream of guest state elements
+ * @i: loop counter
+ * @max: number of elements
+ * @pos: set to current element
+ * @head: head of elements
+ * @len: length of the stream
+ * @rem: initialized to len, holds bytes currently remaining in the stream
+ */
+#define gse_for_each_elem(i, max, pos, head, len, rem)                  \
+	for (i = 0, pos = head, rem = len; gse_ok(pos, rem) && i < max; \
+	     pos = gse_next(pos, &(rem)), i++)
+
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data);
+int gse_parse(struct gs_parser *gsp, struct gs_buff *gsb);
+
+/**
+ * gse_put_be32() - add a be32 guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: big endian value
+ */
+static inline int gse_put_be32(struct gs_buff *gsb, u16 iden, __be32 val)
+{
+	__be32 tmp;
+
+	tmp = val;
+	return __gse_put(gsb, iden, sizeof(__be32), &tmp);
+}
+
+/**
+ * gse_put_u32() - add a host endian 32bit int guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ */
+static inline int gse_put_u32(struct gs_buff *gsb, u16 iden, u32 val)
+{
+	__be32 tmp;
+
+	tmp = cpu_to_be32(val);
+	return gse_put_be32(gsb, iden, tmp);
+}
+
+/**
+ * gse_put_be64() - add a be64 guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: big endian value
+ */
+static inline int gse_put_be64(struct gs_buff *gsb, u16 iden, __be64 val)
+{
+	__be64 tmp;
+
+	tmp = val;
+	return __gse_put(gsb, iden, sizeof(__be64), &tmp);
+}
+
+/**
+ * gse_put_u64() - add a host endian 64bit guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ */
+static inline int gse_put_u64(struct gs_buff *gsb, u16 iden, u64 val)
+{
+	__be64 tmp;
+
+	tmp = cpu_to_be64(val);
+	return gse_put_be64(gsb, iden, tmp);
+}
+
+/**
+ * __gse_put_reg() - add a register type guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ *
+ * Adds a register type guest state element. Uses the guest state ID for
+ * determining the length of the guest element. Bits that cannot be set
+ * for the guest state ID are cleared.
+ */
+static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
+{
+	val &= gsid_mask(iden);
+	if (gsid_size(iden) == sizeof(u64))
+		return gse_put_u64(gsb, iden, val);
+
+	if (gsid_size(iden) == sizeof(u32)) {
+		u32 tmp;
+
+		tmp = (u32)val;
+		if (tmp != val)
+			return -EINVAL;
+
+		return gse_put_u32(gsb, iden, tmp);
+	}
+	return -EINVAL;
+}
+
+/**
+ * gse_put_vector128() - add a vector guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: 16 byte vector value
+ */
+static inline int gse_put_vector128(struct gs_buff *gsb, u16 iden,
+				    vector128 val)
+{
+	__be64 tmp[2] = { 0 };
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u;
+
+	u.v = val;
+	tmp[0] = cpu_to_be64(u.dw[TS_FPROFFSET]);
+#ifdef CONFIG_VSX
+	tmp[1] = cpu_to_be64(u.dw[TS_VSRLOWOFFSET]);
+#endif
+	return __gse_put(gsb, iden, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_part_table() - add a partition table guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: partition table value
+ */
+static inline int gse_put_part_table(struct gs_buff *gsb, u16 iden,
+				     struct gs_part_table val)
+{
+	__be64 tmp[3];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.ea_bits);
+	tmp[2] = cpu_to_be64(val.gpd_size);
+	return __gse_put(gsb, GSID_PARTITION_TABLE, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_proc_table() - add a process table guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: process table value
+ */
+static inline int gse_put_proc_table(struct gs_buff *gsb, u16 iden,
+				     struct gs_proc_table val)
+{
+	__be64 tmp[2];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.gpd_size);
+	return __gse_put(gsb, GSID_PROCESS_TABLE, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_buff_info() - adds a GSB description guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: guest state buffer description value
+ */
+static inline int gse_put_buff_info(struct gs_buff *gsb, u16 iden,
+				    struct gs_buff_info val)
+{
+	__be64 tmp[2];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.size);
+	return __gse_put(gsb, iden, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_get_be32() - return the data of a be32 element
+ * @gse: guest state element
+ */
+static inline __be32 gse_get_be32(const struct gs_elem *gse)
+{
+	return *(__be32 *)gse_data(gse);
+}
+
+/**
+ * gse_get_u32() - return the data of a be32 element in host endianness
+ * @gse: guest state element
+ */
+static inline u32 gse_get_u32(const struct gs_elem *gse)
+{
+	return be32_to_cpu(gse_get_be32(gse));
+}
+
+/**
+ * gse_get_be64() - return the data of a be64 element
+ * @gse: guest state element
+ */
+static inline __be64 gse_get_be64(const struct gs_elem *gse)
+{
+	return *(__be64 *)gse_data(gse);
+}
+
+/**
+ * gse_get_u64() - return the data of a be64 element in host endianness
+ * @gse: guest state element
+ */
+static inline u64 gse_get_u64(const struct gs_elem *gse)
+{
+	return be64_to_cpu(gse_get_be64(gse));
+}
+
+/**
+ * __gse_get_reg() - return the data of a register type guest state element
+ * @gse: guest state element
+ *
+ * Determine the element data size from its guest state ID and return the
+ * correctly sized value.
+ */
+static inline u64 __gse_get_reg(const struct gs_elem *gse)
+{
+	if (gse_len(gse) == sizeof(u64))
+		return gse_get_u64(gse);
+
+	if (gse_len(gse) == sizeof(u32)) {
+		u32 tmp;
+
+		tmp = gse_get_u32(gse);
+		return (u64)tmp;
+	}
+	return 0;
+}
+
+/**
+ * gse_get_vector128() - return the data of a vector element
+ * @gse: guest state element
+ */
+static inline vector128 gse_get_vector128(const struct gs_elem *gse)
+{
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u = { 0 };
+	__be64 *src;
+
+	src = (__be64 *)gse_data(gse);
+	u.dw[TS_FPROFFSET] = be64_to_cpu(src[0]);
+#ifdef CONFIG_VSX
+	u.dw[TS_VSRLOWOFFSET] = be64_to_cpu(src[1]);
+#endif
+	return u.v;
+}
+
+/**
+ * gse_put - add a guest state element to a buffer
+ * @gsb: guest state buffer to add to
+ * @iden: guest state identity
+ * @v: generic value
+ */
+#define gse_put(gsb, iden, v)					\
+	(_Generic((v),						\
+		  u64 : __gse_put_reg,				\
+		  long unsigned int : __gse_put_reg,		\
+		  u32 : __gse_put_reg,				\
+		  struct gs_buff_info : gse_put_buff_info,	\
+		  struct gs_proc_table : gse_put_proc_table,	\
+		  struct gs_part_table : gse_put_part_table,	\
+		  vector128 : gse_put_vector128)(gsb, iden, v))
+
+/**
+ * gse_get - return the data of a guest state element
+ * @gse: guest state element to read from
+ * @v: generic value pointer to return in
+ */
+#define gse_get(gse, v)						\
+	(*v = (_Generic((v),					\
+			u64 * : __gse_get_reg,			\
+			unsigned long * : __gse_get_reg,	\
+			u32 * : __gse_get_reg,			\
+			vector128 * : gse_get_vector128)(gse)))
+
+/**************************************************************************
+ * Guest State Bitmap
+ **************************************************************************/
+
+bool gsbm_test(struct gs_bitmap *gsbm, u16 iden);
+void gsbm_set(struct gs_bitmap *gsbm, u16 iden);
+void gsbm_clear(struct gs_bitmap *gsbm, u16 iden);
+u16 gsbm_next(struct gs_bitmap *gsbm, u16 prev);
+
+/**
+ * gsbm_zero - zero the entire bitmap
+ * @gsbm: guest state buffer bitmap
+ */
+static inline void gsbm_zero(struct gs_bitmap *gsbm)
+{
+	bitmap_zero(gsbm->bitmap, GSE_IDEN_COUNT);
+}
+
+/**
+ * gsbm_fill - fill the entire bitmap
+ * @gsbm: guest state buffer bitmap
+ */
+static inline void gsbm_fill(struct gs_bitmap *gsbm)
+{
+	bitmap_fill(gsbm->bitmap, GSE_IDEN_COUNT);
+	clear_bit(0, gsbm->bitmap);
+}
+
+/**
+ * gsbm_for_each - iterate the present guest state IDs
+ * @gsbm: guest state buffer bitmap
+ * @iden: current guest state ID
+ */
+#define gsbm_for_each(gsbm, iden) \
+	for (iden = gsbm_next(gsbm, 0); iden != 0; iden = gsbm_next(gsbm, iden))
+
+
+/**************************************************************************
+ * Guest State Parser
+ **************************************************************************/
+
+void gsp_insert(struct gs_parser *gsp, u16 iden, struct gs_elem *gse);
+struct gs_elem *gsp_lookup(struct gs_parser *gsp, u16 iden);
+
+/**
+ * gsp_for_each - iterate the <guest state IDs, guest state element> pairs
+ * @gsp: guest state parser
+ * @iden: current guest state ID
+ * @gse: guest state element
+ */
+#define gsp_for_each(gsp, iden, gse)                              \
+	for (iden = gsbm_next(&(gsp)->iterator, 0),               \
+	    gse = gsp_lookup((gsp), iden);                        \
+	     iden != 0; iden = gsbm_next(&(gsp)->iterator, iden), \
+	    gse = gsp_lookup((gsp), iden))
+
+/**************************************************************************
+ * Guest State Message
+ **************************************************************************/
+
+/**
+ * gsm_for_each - iterate the guest state IDs included in a guest state message
+ * @gsm: guest state message
+ * @iden: current guest state ID
+ */
+#define gsm_for_each(gsm, iden)                            \
+	for (iden = gsbm_next(&gsm->bitmap, 0); iden != 0; \
+	     iden = gsbm_next(&gsm->bitmap, iden))
+
+int gsm_init(struct gs_msg *gsm, struct gs_msg_ops *ops, void *data,
+	     unsigned long flags);
+
+struct gs_msg *gsm_new(struct gs_msg_ops *ops, void *data, unsigned long flags,
+		       gfp_t gfp_flags);
+void gsm_free(struct gs_msg *gsm);
+size_t gsm_size(struct gs_msg *gsm);
+int gsm_fill_info(struct gs_msg *gsm, struct gs_buff *gsb);
+int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb);
+
+/**
+ * gsm_include - indicate a guest state ID should be included when serializing
+ * @gsm: guest state message
+ * @iden: guest state ID
+ */
+static inline void gsm_include(struct gs_msg *gsm, u16 iden)
+{
+	gsbm_set(&gsm->bitmap, iden);
+}
+
+/**
+ * gsm_includes - check if a guest state ID will be included when serializing
+ * @gsm: guest state message
+ * @iden: guest state ID
+ */
+static inline bool gsm_includes(struct gs_msg *gsm, u16 iden)
+{
+	return gsbm_test(&gsm->bitmap, iden);
+}
+
+/**
+ * gsm_include_all - indicate all guest state IDs should be included when serializing
+ * @gsm: guest state message
+ */
+static inline void gsm_include_all(struct gs_msg *gsm)
+{
+	gsbm_fill(&gsm->bitmap);
+}
+
+/**
+ * gsm_reset - clear the guest state IDs that should be included when serializing
+ * @gsm: guest state message
+ */
+static inline void gsm_reset(struct gs_msg *gsm)
+{
+	gsbm_zero(&gsm->bitmap);
+}
+
+#endif /* _ASM_POWERPC_GUEST_STATE_BUFFER_H */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 77653c5b356b..0ca2d8b37b42 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -444,6 +444,7 @@ static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 v
 	vcpu->arch.fp.fpr[i][j] = val;
 }
 
+#ifdef CONFIG_VSX
 static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
 {
 	return vcpu->arch.vr.vr[i];
@@ -463,6 +464,7 @@ static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.vr.vscr.u[3] = val;
 }
+#endif
 
 #define BOOK3S_WRAPPER_SET(reg, size)					\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 5319d889b184..eb8445e71c14 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -87,8 +87,11 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
 	book3s_hv_p9_perf.o \
+	guest-state-buffer.o \
 	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
+
+obj-$(CONFIG_GUEST_STATE_BUFFER_TEST) += test-guest-state-buffer.o
 endif
 
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
diff --git a/arch/powerpc/kvm/guest-state-buffer.c b/arch/powerpc/kvm/guest-state-buffer.c
new file mode 100644
index 000000000000..db4a79bfcaf1
--- /dev/null
+++ b/arch/powerpc/kvm/guest-state-buffer.c
@@ -0,0 +1,563 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "asm/hvcall.h"
+#include <linux/log2.h>
+#include <asm/pgalloc.h>
+#include <asm/guest-state-buffer.h>
+
+static const u16 gse_iden_len[__GSE_TYPE_MAX] = {
+	[GSE_BE32] = sizeof(__be32),
+	[GSE_BE64] = sizeof(__be64),
+	[GSE_VEC128] = sizeof(vector128),
+	[GSE_PARTITION_TABLE] = sizeof(struct gs_part_table),
+	[GSE_PROCESS_TABLE] = sizeof(struct gs_proc_table),
+	[GSE_BUFFER] = sizeof(struct gs_buff_info),
+};
+
+/**
+ * gsb_new() - create a new guest state buffer
+ * @size: total size of the guest state buffer (includes header)
+ * @guest_id: guest ID to associate with the buffer
+ * @vcpu_id: vCPU ID to associate with the buffer
+ * @flags: GFP flags
+ *
+ * Returns a guest state buffer.
+ */
+struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
+			unsigned long vcpu_id, gfp_t flags)
+{
+	struct gs_buff *gsb;
+
+	gsb = kzalloc(sizeof(*gsb), flags);
+	if (!gsb)
+		return NULL;
+
+	size = roundup_pow_of_two(size);
+	gsb->hdr = kzalloc(size, GFP_KERNEL);
+	if (!gsb->hdr)
+		goto free;
+
+	gsb->capacity = size;
+	gsb->len = sizeof(struct gs_header);
+	gsb->vcpu_id = vcpu_id;
+	gsb->guest_id = guest_id;
+
+	gsb->hdr->nelems = cpu_to_be32(0);
+
+	return gsb;
+
+free:
+	kfree(gsb);
+	return NULL;
+}
+EXPORT_SYMBOL(gsb_new);
+
+/**
+ * gsb_free() - free a guest state buffer
+ * @gsb: guest state buffer
+ */
+void gsb_free(struct gs_buff *gsb)
+{
+	kfree(gsb->hdr);
+	kfree(gsb);
+}
+EXPORT_SYMBOL(gsb_free);
+
+/**
+ * gsb_put() - allocate space in a guest state buffer
+ * @gsb: buffer to allocate in
+ * @size: amount of space to allocate
+ *
+ * Returns a pointer to the requested amount of space within the buffer and
+ * increments the count of elements in the buffer.
+ *
+ * Does not check if there is enough space in the buffer.
+ */
+void *gsb_put(struct gs_buff *gsb, size_t size)
+{
+	u32 nelems = gsb_nelems(gsb);
+	void *p;
+
+	p = (void *)gsb_header(gsb) + gsb_len(gsb);
+	gsb->len += size;
+
+	gsb_header(gsb)->nelems = cpu_to_be32(nelems + 1);
+	return p;
+}
+EXPORT_SYMBOL(gsb_put);
+
+static int gsid_class(u16 iden)
+{
+	if ((iden >= GSE_GUESTWIDE_START) && (iden <= GSE_GUESTWIDE_END))
+		return GS_CLASS_GUESTWIDE;
+
+	if ((iden >= GSE_META_START) && (iden <= GSE_META_END))
+		return GS_CLASS_META;
+
+	if ((iden >= GSE_DW_REGS_START) && (iden <= GSE_DW_REGS_END))
+		return GS_CLASS_DWORD_REG;
+
+	if ((iden >= GSE_W_REGS_START) && (iden <= GSE_W_REGS_END))
+		return GS_CLASS_WORD_REG;
+
+	if ((iden >= GSE_VSRS_START) && (iden <= GSE_VSRS_END))
+		return GS_CLASS_VECTOR;
+
+	if ((iden >= GSE_INTR_REGS_START) && (iden <= GSE_INTR_REGS_END))
+		return GS_CLASS_INTR;
+
+	return -1;
+}
+
+static int gsid_type(u16 iden)
+{
+	int type = -1;
+
+	switch (gsid_class(iden)) {
+	case GS_CLASS_GUESTWIDE:
+		switch (iden) {
+		case GSID_HOST_STATE_SIZE:
+		case GSID_RUN_OUTPUT_MIN_SIZE:
+		case GSID_TB_OFFSET:
+			type = GSE_BE64;
+			break;
+		case GSID_PARTITION_TABLE:
+			type = GSE_PARTITION_TABLE;
+			break;
+		case GSID_PROCESS_TABLE:
+			type = GSE_PROCESS_TABLE;
+			break;
+		case GSID_LOGICAL_PVR:
+			type = GSE_BE32;
+			break;
+		}
+		break;
+	case GS_CLASS_META:
+		switch (iden) {
+		case GSID_RUN_INPUT:
+		case GSID_RUN_OUTPUT:
+			type = GSE_BUFFER;
+			break;
+		case GSID_VPA:
+			type = GSE_BE64;
+			break;
+		}
+		break;
+	case GS_CLASS_DWORD_REG:
+		type = GSE_BE64;
+		break;
+	case GS_CLASS_WORD_REG:
+		type = GSE_BE32;
+		break;
+	case GS_CLASS_VECTOR:
+		type = GSE_VEC128;
+		break;
+	case GS_CLASS_INTR:
+		switch (iden) {
+		case GSID_HDAR:
+		case GSID_ASDR:
+		case GSID_HEIR:
+			type = GSE_BE64;
+			break;
+		case GSID_HDSISR:
+			type = GSE_BE32;
+			break;
+		}
+		break;
+	}
+
+	return type;
+}
+
+/**
+ * gsid_flags() - the flags for a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns any flags for the guest state ID.
+ */
+unsigned long gsid_flags(u16 iden)
+{
+	unsigned long flags = 0;
+
+	switch (gsid_class(iden)) {
+	case GS_CLASS_GUESTWIDE:
+		flags = GS_FLAGS_WIDE;
+		break;
+	case GS_CLASS_META:
+	case GS_CLASS_DWORD_REG:
+	case GS_CLASS_WORD_REG:
+	case GS_CLASS_VECTOR:
+	case GS_CLASS_INTR:
+		break;
+	}
+
+	return flags;
+}
+EXPORT_SYMBOL(gsid_flags);
+
+/**
+ * gsid_size() - the size of a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns the size of the data for the guest state ID.
+ */
+u16 gsid_size(u16 iden)
+{
+	int type;
+
+	type = gsid_type(iden);
+	if (type == -1)
+		return 0;
+
+	if (type >= __GSE_TYPE_MAX)
+		return 0;
+
+	return gse_iden_len[type];
+}
+EXPORT_SYMBOL(gsid_size);
+
+/**
+ * gsid_mask() - the settable bits of a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns a mask of settable bits for a guest state ID.
+ */
+u64 gsid_mask(u16 iden)
+{
+	u64 mask = ~0ull;
+
+	switch (iden) {
+	case GSID_LPCR:
+		mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER | LPCR_GTSE;
+		break;
+	case GSID_MSR:
+		mask = ~(MSR_HV | MSR_S | MSR_ME);
+		break;
+	}
+
+	return mask;
+}
+EXPORT_SYMBOL(gsid_mask);
+
+/**
+ * __gse_put() - add a guest state element to a buffer
+ * @gsb: buffer to add the element to
+ * @iden: guest state ID
+ * @size: length of data
+ * @data: pointer to data
+ */
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data)
+{
+	struct gs_elem *gse;
+	u16 total_size;
+
+	total_size = sizeof(*gse) + size;
+	if (total_size + gsb_len(gsb) > gsb_capacity(gsb))
+		return -ENOMEM;
+
+	if (gsid_size(iden) != size)
+		return -EINVAL;
+
+	gse = gsb_put(gsb, total_size);
+	gse->iden = cpu_to_be16(iden);
+	gse->len = cpu_to_be16(size);
+	memcpy(gse->data, data, size);
+
+	return 0;
+}
+EXPORT_SYMBOL(__gse_put);
+
+/**
+ * gse_parse() - create a parse map from a guest state buffer
+ * @gsp: guest state parser
+ * @gsb: guest state buffer
+ */
+int gse_parse(struct gs_parser *gsp, struct gs_buff *gsb)
+{
+	struct gs_elem *curr;
+	int rem, i;
+
+	gsb_for_each_elem(i, curr, gsb, rem) {
+		if (gse_len(curr) != gsid_size(gse_iden(curr)))
+			return -EINVAL;
+		gsp_insert(gsp, gse_iden(curr), curr);
+	}
+
+	if (gsb_nelems(gsb) != i)
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL(gse_parse);
+
+static inline int gse_flatten_iden(u16 iden)
+{
+	int bit = 0;
+	int class;
+
+	class = gsid_class(iden);
+
+	if (class == GS_CLASS_GUESTWIDE) {
+		bit += iden - GSE_GUESTWIDE_START;
+		return bit;
+	}
+
+	bit += GSE_GUESTWIDE_COUNT;
+
+	if (class == GS_CLASS_META) {
+		bit += iden - GSE_META_START;
+		return bit;
+	}
+
+	bit += GSE_META_COUNT;
+
+	if (class == GS_CLASS_DWORD_REG) {
+		bit += iden - GSE_DW_REGS_START;
+		return bit;
+	}
+
+	bit += GSE_DW_REGS_COUNT;
+
+	if (class == GS_CLASS_WORD_REG) {
+		bit += iden - GSE_W_REGS_START;
+		return bit;
+	}
+
+	bit += GSE_W_REGS_COUNT;
+
+	if (class == GS_CLASS_VECTOR) {
+		bit += iden - GSE_VSRS_START;
+		return bit;
+	}
+
+	bit += GSE_VSRS_COUNT;
+
+	if (class == GS_CLASS_INTR) {
+		bit += iden - GSE_INTR_REGS_START;
+		return bit;
+	}
+
+	return 0;
+}
+
+static inline u16 gse_unflatten_iden(int bit)
+{
+	u16 iden;
+
+	if (bit < GSE_GUESTWIDE_COUNT) {
+		iden = GSE_GUESTWIDE_START + bit;
+		return iden;
+	}
+	bit -= GSE_GUESTWIDE_COUNT;
+
+	if (bit < GSE_META_COUNT) {
+		iden = GSE_META_START + bit;
+		return iden;
+	}
+	bit -= GSE_META_COUNT;
+
+	if (bit < GSE_DW_REGS_COUNT) {
+		iden = GSE_DW_REGS_START + bit;
+		return iden;
+	}
+	bit -= GSE_DW_REGS_COUNT;
+
+	if (bit < GSE_W_REGS_COUNT) {
+		iden = GSE_W_REGS_START + bit;
+		return iden;
+	}
+	bit -= GSE_W_REGS_COUNT;
+
+	if (bit < GSE_VSRS_COUNT) {
+		iden = GSE_VSRS_START + bit;
+		return iden;
+	}
+	bit -= GSE_VSRS_COUNT;
+
+	if (bit < GSE_IDEN_COUNT) {
+		iden = GSE_INTR_REGS_START + bit;
+		return iden;
+	}
+
+	return 0;
+}
+
+/**
+ * gsp_insert() - add a mapping from a guest state ID to an element
+ * @gsp: guest state parser
+ * @iden: guest state id (key)
+ * @gse: guest state element (value)
+ */
+void gsp_insert(struct gs_parser *gsp, u16 iden, struct gs_elem *gse)
+{
+	int i;
+
+	i = gse_flatten_iden(iden);
+	gsbm_set(&gsp->iterator, iden);
+	gsp->gses[i] = gse;
+}
+EXPORT_SYMBOL(gsp_insert);
+
+/**
+ * gsp_lookup() - lookup an element from a guest state ID
+ * @gsp: guest state parser
+ * @iden: guest state ID (key)
+ *
+ * Returns the guest state element if present.
+ */
+struct gs_elem *gsp_lookup(struct gs_parser *gsp, u16 iden)
+{
+	int i;
+
+	i = gse_flatten_iden(iden);
+	return gsp->gses[i];
+}
+EXPORT_SYMBOL(gsp_lookup);
+
+/**
+ * gsbm_set() - set the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+void gsbm_set(struct gs_bitmap *gsbm, u16 iden)
+{
+	set_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_set);
+
+/**
+ * gsbm_clear() - clear the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+void gsbm_clear(struct gs_bitmap *gsbm, u16 iden)
+{
+	clear_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_clear);
+
+/**
+ * gsbm_test() - test the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+bool gsbm_test(struct gs_bitmap *gsbm, u16 iden)
+{
+	return test_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_test);
+
+/**
+ * gsbm_next() - return the next set guest state ID
+ * @gsbm: guest state bitmap
+ * @prev: last guest state ID
+ */
+u16 gsbm_next(struct gs_bitmap *gsbm, u16 prev)
+{
+	int bit, pbit;
+
+	pbit = prev ? gse_flatten_iden(prev) + 1 : 0;
+	bit = find_next_bit(gsbm->bitmap, GSE_IDEN_COUNT, pbit);
+
+	if (bit < GSE_IDEN_COUNT)
+		return gse_unflatten_iden(bit);
+	return 0;
+}
+EXPORT_SYMBOL(gsbm_next);
+
+/**
+ * gsm_init() - initialize a guest state message
+ * @gsm: guest state message
+ * @ops: callbacks
+ * @data: private data
+ * @flags: guest wide or thread wide
+ */
+int gsm_init(struct gs_msg *gsm, struct gs_msg_ops *ops, void *data,
+	     unsigned long flags)
+{
+	memset(gsm, 0, sizeof(*gsm));
+	gsm->ops = ops;
+	gsm->data = data;
+	gsm->flags = flags;
+
+	return 0;
+}
+EXPORT_SYMBOL(gsm_init);
+
+/**
+ * gsm_new() - create a new guest state message
+ * @ops: callbacks
+ * @data: private data
+ * @flags: guest wide or thread wide
+ * @gfp_flags: GFP allocation flags
+ *
+ * Returns an initialized guest state message.
+ */
+struct gs_msg *gsm_new(struct gs_msg_ops *ops, void *data, unsigned long flags,
+		       gfp_t gfp_flags)
+{
+	struct gs_msg *gsm;
+
+	gsm = kzalloc(sizeof(*gsm), gfp_flags);
+	if (!gsm)
+		return NULL;
+
+	gsm_init(gsm, ops, data, flags);
+
+	return gsm;
+}
+EXPORT_SYMBOL(gsm_new);
+
+/**
+ * gsm_size() - get the size required for the message
+ * @gsm: self
+ *
+ * Returns the size required for the message.
+ */
+size_t gsm_size(struct gs_msg *gsm)
+{
+	if (gsm->ops->get_size)
+		return gsm->ops->get_size(gsm);
+	return 0;
+}
+EXPORT_SYMBOL(gsm_size);
+
+/**
+ * gsm_free() - free a guest state message
+ * @gsm: guest state message
+ */
+void gsm_free(struct gs_msg *gsm)
+{
+	kfree(gsm);
+}
+EXPORT_SYMBOL(gsm_free);
+
+/**
+ * gsm_fill_info() - serialises message to guest state buffer format
+ * @gsm: self
+ * @gsb: buffer to serialise into
+ */
+int gsm_fill_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	if (!gsm->ops->fill_info)
+		return -EINVAL;
+
+	gsb_reset(gsb);
+	return gsm->ops->fill_info(gsb, gsm);
+}
+EXPORT_SYMBOL(gsm_fill_info);
+
+/**
+ * gsm_refresh_info() - deserialises from guest state buffer
+ * @gsm: self
+ * @gsb: buffer to deserialise from
+ */
+int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	if (!gsm->ops->refresh_info)
+		return -EINVAL;
+
+	return gsm->ops->refresh_info(gsm, gsb);
+}
+EXPORT_SYMBOL(gsm_refresh_info);
diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c b/arch/powerpc/kvm/test-guest-state-buffer.c
new file mode 100644
index 000000000000..d038051b61f8
--- /dev/null
+++ b/arch/powerpc/kvm/test-guest-state-buffer.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/init.h>
+#include <linux/log2.h>
+#include <kunit/test.h>
+
+
+#include <asm/guest-state-buffer.h>
+
+static void test_creating_buffer(struct kunit *test)
+{
+	struct gs_buff *gsb;
+	size_t size = 0x100;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb->hdr);
+
+	KUNIT_EXPECT_EQ(test, gsb->capacity, roundup_pow_of_two(size));
+	KUNIT_EXPECT_EQ(test, gsb->len, sizeof(__be32));
+
+	gsb_free(gsb);
+}
+
+static void test_adding_element(struct kunit *test)
+{
+	const struct gs_elem *head, *curr;
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u;
+	int rem;
+	struct gs_buff *gsb;
+	size_t size = 0x1000;
+	int i, rc;
+	u64 data;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	/* Single elements, direct use of __gse_put() */
+	data = 0xdeadbeef;
+	rc = __gse_put(gsb, GSID_GPR(0), 8, &data);
+	KUNIT_EXPECT_GE(test, rc, 0);
+
+	head = gsb_data(gsb);
+	KUNIT_EXPECT_EQ(test, gse_iden(head), GSID_GPR(0));
+	KUNIT_EXPECT_EQ(test, gse_len(head), 8);
+	data = 0;
+	memcpy(&data, gse_data(head), 8);
+	KUNIT_EXPECT_EQ(test, data, 0xdeadbeef);
+
+	/* Multiple elements, simple wrapper */
+	rc = gse_put_u64(gsb, GSID_GPR(1), 0xcafef00d);
+	KUNIT_EXPECT_GE(test, rc, 0);
+
+	u.dw[0] = 0x1;
+	u.dw[1] = 0x2;
+	rc = gse_put_vector128(gsb, GSID_VSRS(0), u.v);
+	KUNIT_EXPECT_GE(test, rc, 0);
+	u.dw[0] = 0x0;
+	u.dw[1] = 0x0;
+
+	gsb_for_each_elem(i, curr, gsb, rem) {
+		switch (i) {
+		case 0:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_GPR(0));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 8);
+			KUNIT_EXPECT_EQ(test, gse_get_be64(curr), 0xdeadbeef);
+			break;
+		case 1:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_GPR(1));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 8);
+			KUNIT_EXPECT_EQ(test, gse_get_u64(curr), 0xcafef00d);
+			break;
+		case 2:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_VSRS(0));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 16);
+			u.v = gse_get_vector128(curr);
+			KUNIT_EXPECT_EQ(test, u.dw[0], 0x1);
+			KUNIT_EXPECT_EQ(test, u.dw[1], 0x2);
+			break;
+		}
+	}
+	KUNIT_EXPECT_EQ(test, i, 3);
+
+	gsb_reset(gsb);
+	KUNIT_EXPECT_EQ(test, gsb_nelems(gsb), 0);
+	KUNIT_EXPECT_EQ(test, gsb_len(gsb), sizeof(struct gs_header));
+
+	gsb_free(gsb);
+}
+
+static void test_gs_parsing(struct kunit *test)
+{
+	struct gs_elem *gse;
+	struct gs_parser gsp = { 0 };
+	struct gs_buff *gsb;
+	size_t size = 0x1000;
+	u64 tmp1, tmp2;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	tmp1 = 0xdeadbeefull;
+	gse_put(gsb, GSID_GPR(0), tmp1);
+
+	KUNIT_EXPECT_GE(test, gse_parse(&gsp, gsb), 0);
+
+	gse = gsp_lookup(&gsp, GSID_GPR(0));
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gse);
+
+	gse_get(gse, &tmp2);
+	KUNIT_EXPECT_EQ(test, tmp2, 0xdeadbeefull);
+
+	gsb_free(gsb);
+}
+
+static void test_gs_bitmap(struct kunit *test)
+{
+	struct gs_bitmap gsbm = { 0 };
+	struct gs_bitmap gsbm1 = { 0 };
+	struct gs_bitmap gsbm2 = { 0 };
+	u16 iden;
+	int i, j;
+
+	i = 0;
+	for (u16 iden = GSID_HOST_STATE_SIZE;
+	     iden <= GSID_PROCESS_TABLE; iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_RUN_INPUT; iden <= GSID_VPA;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_GPR(0); iden <= GSID_CTRL;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_CR; iden <= GSID_PSPB; iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_VSRS(0); iden <= GSID_VSRS(63);
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_HDAR; iden <= GSID_ASDR;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	j = 0;
+	gsbm_for_each(&gsbm1, iden)
+	{
+		gsbm_set(&gsbm2, iden);
+		j++;
+	}
+	KUNIT_EXPECT_EQ(test, i, j);
+	KUNIT_EXPECT_MEMEQ(test, &gsbm1, &gsbm2, sizeof(gsbm1));
+}
+
+struct gs_msg_test1_data {
+	u64 a;
+	u32 b;
+	struct gs_part_table c;
+	struct gs_proc_table d;
+	struct gs_buff_info e;
+};
+
+static size_t test1_get_size(struct gs_msg *gsm)
+{
+	size_t size = 0;
+	u16 ids[] = {
+		GSID_PARTITION_TABLE,
+		GSID_PROCESS_TABLE,
+		GSID_RUN_INPUT,
+		GSID_GPR(0),
+		GSID_CR,
+	};
+
+	for (int i = 0; i < ARRAY_SIZE(ids); i++)
+		size += gse_total_size(gsid_size(ids[i]));
+	return size;
+}
+
+static int test1_fill_info(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	struct gs_msg_test1_data *data = gsm->data;
+
+	if (gsm_includes(gsm, GSID_GPR(0)))
+		gse_put(gsb, GSID_GPR(0), data->a);
+
+	if (gsm_includes(gsm, GSID_CR))
+		gse_put(gsb, GSID_CR, data->b);
+
+	if (gsm_includes(gsm, GSID_PARTITION_TABLE))
+		gse_put(gsb, GSID_PARTITION_TABLE, data->c);
+
+	if (gsm_includes(gsm, GSID_PROCESS_TABLE))
+		gse_put(gsb, GSID_PROCESS_TABLE, data->d);
+
+	if (gsm_includes(gsm, GSID_RUN_INPUT))
+		gse_put(gsb, GSID_RUN_INPUT, data->e);
+
+	return 0;
+}
+
+static int test1_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	struct gs_parser gsp = { 0 };
+	struct gs_msg_test1_data *data = gsm->data;
+	struct gs_elem *gse;
+	int rc;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	gse = gsp_lookup(&gsp, GSID_GPR(0));
+	if (gse)
+		gse_get(gse, &data->a);
+
+	gse = gsp_lookup(&gsp, GSID_CR);
+	if (gse)
+		gse_get(gse, &data->b);
+
+	return 0;
+}
+
+static struct gs_msg_ops gs_msg_test1_ops = {
+	.get_size = test1_get_size,
+	.fill_info = test1_fill_info,
+	.refresh_info = test1_refresh_info,
+};
+
+static void test_gs_msg(struct kunit *test)
+{
+	struct gs_msg_test1_data test1_data = {
+		.a = 0xdeadbeef,
+		.b = 0x1,
+	};
+	struct gs_msg *gsm;
+	struct gs_buff *gsb;
+
+	gsm = gsm_new(&gs_msg_test1_ops, &test1_data, GSM_SEND, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsm);
+
+	gsb = gsb_new(gsm_size(gsm), 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	gsm_include(gsm, GSID_PARTITION_TABLE);
+	gsm_include(gsm, GSID_PROCESS_TABLE);
+	gsm_include(gsm, GSID_RUN_INPUT);
+	gsm_include(gsm, GSID_GPR(0));
+	gsm_include(gsm, GSID_CR);
+
+	gsm_fill_info(gsm, gsb);
+
+	memset(&test1_data, 0, sizeof(test1_data));
+
+	gsm_refresh_info(gsm, gsb);
+	KUNIT_EXPECT_EQ(test, test1_data.a, 0xdeadbeef);
+	KUNIT_EXPECT_EQ(test, test1_data.b, 0x1);
+
+	gsm_free(gsm);
+}
+
+
+static struct kunit_case guest_state_buffer_testcases[] = {
+	KUNIT_CASE(test_creating_buffer),
+	KUNIT_CASE(test_adding_element),
+	KUNIT_CASE(test_gs_bitmap),
+	KUNIT_CASE(test_gs_parsing),
+	KUNIT_CASE(test_gs_msg),
+	{}
+};
+
+static struct kunit_suite guest_state_buffer_test_suite = {
+	.name = "guest_state_buffer_test",
+	.test_cases = guest_state_buffer_testcases,
+};
+
+kunit_test_suites(&guest_state_buffer_test_suite);
+
+MODULE_LICENSE("GPL");
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

The new PAPR nested guest API introduces the concept of a Guest State
Buffer for communication about L2 guests between L1 and L0 hosts.

In the new API, the L0 manages the L2 on behalf of the L1. This means
that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
table...), it must request that the L0 perform the modification.
Likewise, if the L1 needs to read L2 state, the request must go
through the L0.

The Guest State Buffer is a Type-Length-Value style data format defined
in the PAPR which assigns all relevant partition state a unique
identity. Unlike a typical TLV format, the length is redundant here, as
the length of each identity is fixed; it is included anyway so that
correctness can be checked.

A guest state buffer consists of an element count followed by a stream
of elements, where elements are composed of an ID number, data length,
then the data:

  Header:

   <---4 bytes--->
  +----------------+-----
  | Element Count  | Elements...
  +----------------+-----

  Element:

   <----2 bytes---> <-2 bytes-> <-Length bytes->
  +----------------+-----------+----------------+
  | Guest State ID |  Length   |      Data      |
  +----------------+-----------+----------------+
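
For example (illustrative), a buffer holding a single element for
GPR0 (ID 0x1000) with the value 0xdeadbeef would contain, in hex:

  00 00 00 01  10 00  00 08  00 00 00 00 de ad be ef
   (count=1)   (ID)   (len)  (8 bytes of big endian data)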

Guest State IDs have other attributes defined in the PAPR such as
whether they are per thread or per guest, or read-only.

Introduce a library for using guest state buffers. This includes support
for actions such as creating buffers, adding elements to buffers,
reading the value of elements and parsing buffers. This will be used
later by the PAPR nested guest support.
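
As an illustrative sketch (mirroring the included kunit tests; error
handling omitted), serialising a register value into a buffer and
reading it back looks like:

  struct gs_parser gsp = { 0 };
  struct gs_elem *gse;
  struct gs_buff *gsb;
  u64 in = 0xdeadbeef, out;

  gsb = gsb_new(0x1000, 0 /* guest_id */, 0 /* vcpu_id */, GFP_KERNEL);
  gse_put(gsb, GSID_GPR(0), in);       /* append a TLV element     */
  gse_parse(&gsp, gsb);                /* build the id -> elem map */
  gse = gsp_lookup(&gsp, GSID_GPR(0));
  gse_get(gse, &out);                  /* out == 0xdeadbeef        */
  gsb_free(gsb);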

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Add missing CONFIG_VSX #ifdefs
  - Move files from lib/ to kvm/
  - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
  - Use kunit for guest state buffer tests
  - Add configuration option for the tests
  - Use macros for contiguous id ranges like GPRs
  - Add some missing EXPORTs to functions
  - HEIR element is a double word not a word
---
 arch/powerpc/Kconfig.debug                    |  12 +
 arch/powerpc/include/asm/guest-state-buffer.h | 901 ++++++++++++++++++
 arch/powerpc/include/asm/kvm_book3s.h         |   2 +
 arch/powerpc/kvm/Makefile                     |   3 +
 arch/powerpc/kvm/guest-state-buffer.c         | 563 +++++++++++
 arch/powerpc/kvm/test-guest-state-buffer.c    | 321 +++++++
 6 files changed, 1802 insertions(+)
 create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
 create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
 create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6aaf8dc60610..ed830a714720 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
 	bool "Run self-tests of the MSI bitmap code"
 	depends on DEBUG_KERNEL
 
+config GUEST_STATE_BUFFER_TEST
+	def_tristate n
+	prompt "Enable Guest State Buffer unit tests"
+	depends on KUNIT
+	depends on KVM_BOOK3S_HV_POSSIBLE
+	default KUNIT_ALL_TESTS
+	help
+	  The Guest State Buffer is a data format specified in the PAPR.
	  It is used by hcalls to communicate the state of L2 guests between
+	  the L1 and L0 hypervisors. Enable unit tests for the library
+	  used to create and use guest state buffers.
+
 config PPC_IRQ_SOFT_MASK_DEBUG
 	bool "Include extra checks for powerpc irq soft masking"
 	depends on PPC64
diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
new file mode 100644
index 000000000000..65a840abf1bb
--- /dev/null
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -0,0 +1,901 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Interface based on include/net/netlink.h
+ */
+#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
+#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
+
+#include <linux/gfp.h>
+#include <linux/bitmap.h>
+#include <asm/plpar_wrappers.h>
+
+/**************************************************************************
+ * Guest State Buffer Constants
+ **************************************************************************/
+#define GSID_BLANK			0x0000
+
+#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
+#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
+#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
+#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
+#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
+#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
+
+#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
+#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
+#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
+
+#define GSID_GPR(x)			(0x1000 + (x))
+#define GSID_HDEC_EXPIRY_TB		0x1020
+#define GSID_NIA			0x1021
+#define GSID_MSR			0x1022
+#define GSID_LR				0x1023
+#define GSID_XER			0x1024
+#define GSID_CTR			0x1025
+#define GSID_CFAR			0x1026
+#define GSID_SRR0			0x1027
+#define GSID_SRR1			0x1028
+#define GSID_DAR			0x1029
+#define GSID_DEC_EXPIRY_TB		0x102A
+#define GSID_VTB			0x102B
+#define GSID_LPCR			0x102C
+#define GSID_HFSCR			0x102D
+#define GSID_FSCR			0x102E
+#define GSID_FPSCR			0x102F
+#define GSID_DAWR0			0x1030
+#define GSID_DAWR1			0x1031
+#define GSID_CIABR			0x1032
+#define GSID_PURR			0x1033
+#define GSID_SPURR			0x1034
+#define GSID_IC				0x1035
+#define GSID_SPRG0			0x1036
+#define GSID_SPRG1			0x1037
+#define GSID_SPRG2			0x1038
+#define GSID_SPRG3			0x1039
+#define GSID_PPR			0x103A
+#define GSID_MMCR(x)			(0x103B + (x))
+#define GSID_MMCRA			0x103F
+#define GSID_SIER(x)			(0x1040 + (x))
+#define GSID_BESCR			0x1043
+#define GSID_EBBHR			0x1044
+#define GSID_EBBRR			0x1045
+#define GSID_AMR			0x1046
+#define GSID_IAMR			0x1047
+#define GSID_AMOR			0x1048
+#define GSID_UAMOR			0x1049
+#define GSID_SDAR			0x104A
+#define GSID_SIAR			0x104B
+#define GSID_DSCR			0x104C
+#define GSID_TAR			0x104D
+#define GSID_DEXCR			0x104E
+#define GSID_HDEXCR			0x104F
+#define GSID_HASHKEYR			0x1050
+#define GSID_HASHPKEYR			0x1051
+#define GSID_CTRL			0x1052
+
+#define GSID_CR				0x2000
+#define GSID_PIDR			0x2001
+#define GSID_DSISR			0x2002
+#define GSID_VSCR			0x2003
+#define GSID_VRSAVE			0x2004
+#define GSID_DAWRX0			0x2005
+#define GSID_DAWRX1			0x2006
+#define GSID_PMC(x)			(0x2007 + (x))
+#define GSID_WORT			0x200D
+#define GSID_PSPB			0x200E
+
+#define GSID_VSRS(x)			(0x3000 + (x))
+
+#define GSID_HDAR			0xF000
+#define GSID_HDSISR			0xF001
+#define GSID_HEIR			0xF002
+#define GSID_ASDR			0xF003
+
+
+#define GSE_GUESTWIDE_START GSID_BLANK
+#define GSE_GUESTWIDE_END GSID_PROCESS_TABLE
+#define GSE_GUESTWIDE_COUNT (GSE_GUESTWIDE_END - GSE_GUESTWIDE_START + 1)
+
+#define GSE_META_START GSID_RUN_INPUT
+#define GSE_META_END GSID_VPA
+#define GSE_META_COUNT (GSE_META_END - GSE_META_START + 1)
+
+#define GSE_DW_REGS_START GSID_GPR(0)
+#define GSE_DW_REGS_END GSID_CTRL
+#define GSE_DW_REGS_COUNT (GSE_DW_REGS_END - GSE_DW_REGS_START + 1)
+
+#define GSE_W_REGS_START GSID_CR
+#define GSE_W_REGS_END GSID_PSPB
+#define GSE_W_REGS_COUNT (GSE_W_REGS_END - GSE_W_REGS_START + 1)
+
+#define GSE_VSRS_START GSID_VSRS(0)
+#define GSE_VSRS_END GSID_VSRS(63)
+#define GSE_VSRS_COUNT (GSE_VSRS_END - GSE_VSRS_START + 1)
+
+#define GSE_INTR_REGS_START GSID_HDAR
+#define GSE_INTR_REGS_END GSID_ASDR
+#define GSE_INTR_REGS_COUNT (GSE_INTR_REGS_END - GSE_INTR_REGS_START + 1)
+
+#define GSE_IDEN_COUNT                                              \
+	(GSE_GUESTWIDE_COUNT + GSE_META_COUNT + GSE_DW_REGS_COUNT + \
+	 GSE_W_REGS_COUNT + GSE_VSRS_COUNT + GSE_INTR_REGS_COUNT)
+
+
+/**
+ * Ranges of guest state buffer elements
+ */
+enum {
+	GS_CLASS_GUESTWIDE = 0x01,
+	GS_CLASS_META = 0x02,
+	GS_CLASS_DWORD_REG = 0x04,
+	GS_CLASS_WORD_REG = 0x08,
+	GS_CLASS_VECTOR = 0x10,
+	GS_CLASS_INTR = 0x20,
+};
+
+/**
+ * Types of guest state buffer elements
+ */
+enum {
+	GSE_BE32,
+	GSE_BE64,
+	GSE_VEC128,
+	GSE_PARTITION_TABLE,
+	GSE_PROCESS_TABLE,
+	GSE_BUFFER,
+	__GSE_TYPE_MAX,
+};
+
+/**
+ * Flags for guest state elements
+ */
+enum {
+	GS_FLAGS_WIDE = 0x01,
+};
+
+/**
+ * struct gs_part_table - deserialized partition table information element
+ * @address: start of the partition table
+ * @ea_bits: number of bits in the effective address
+ * @gpd_size: root page directory size
+ */
+struct gs_part_table {
+	u64 address;
+	u64 ea_bits;
+	u64 gpd_size;
+};
+
+/**
+ * struct gs_proc_table - deserialized process table information element
+ * @address: start of the process table
+ * @gpd_size: process table size
+ */
+struct gs_proc_table {
+	u64 address;
+	u64 gpd_size;
+};
+
+/**
+ * struct gs_buff_info - deserialized meta guest state buffer information
+ * @address: start of the guest state buffer
+ * @size: size of the guest state buffer
+ */
+struct gs_buff_info {
+	u64 address;
+	u64 size;
+};
+
+/**
+ * struct gs_header - serialized guest state buffer header
+ * @nelems: count of guest state elements in the buffer
+ * @data: start of the stream of elements in the buffer
+ */
+struct gs_header {
+	__be32 nelems;
+	char data[];
+} __packed;
+
+/**
+ * struct gs_elem - serialized guest state buffer element
+ * @iden: Guest State ID
+ * @len: length of data
+ * @data: the guest state buffer element's value
+ */
+struct gs_elem {
+	__be16 iden;
+	__be16 len;
+	char data[];
+} __packed;
+
+/**
+ * struct gs_buff - a guest state buffer with metadata.
+ * @capacity: total length of the buffer
+ * @len: current length of the elements and header
+ * @guest_id: guest id associated with the buffer
+ * @vcpu_id: vcpu_id associated with the buffer
+ * @hdr: the serialised guest state buffer
+ */
+struct gs_buff {
+	size_t capacity;
+	size_t len;
+	unsigned long guest_id;
+	unsigned long vcpu_id;
+	struct gs_header *hdr;
+};
+
+/**
+ * struct gs_bitmap - a bitmap for element ids
+ * @bitmap: a bitmap large enough for all Guest State IDs
+ */
+struct gs_bitmap {
+/* private: */
+	DECLARE_BITMAP(bitmap, GSE_IDEN_COUNT);
+};
+
+/**
+ * struct gs_parser - a map of element ids to locations in a buffer
+ * @iterator: bitmap used for iterating
+ * @gses: contains the pointers to elements
+ *
+ * A guest state parser is used for deserialising a guest state buffer.
+ * Given a buffer, it then allows looking up guest state elements using
+ * a guest state id.
+ */
+struct gs_parser {
+/* private: */
+	struct gs_bitmap iterator;
+	struct gs_elem *gses[GSE_IDEN_COUNT];
+};
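+
+/*
+ * Example (illustrative only, after the kunit tests):
+ *
+ *	struct gs_parser gsp = { 0 };
+ *	struct gs_elem *gse;
+ *	u64 val;
+ *
+ *	gse_parse(&gsp, gsb);
+ *	gse = gsp_lookup(&gsp, GSID_GPR(0));
+ *	if (gse)
+ *		gse_get(gse, &val);
+ */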
+
+enum {
+	GSM_GUEST_WIDE = 0x1,
+	GSM_SEND = 0x2,
+	GSM_RECEIVE = 0x4,
+	GSM_GSB_OWNER = 0x8,
+};
+
+struct gs_msg;
+
+/**
+ * struct gs_msg_ops - guest state message behavior
+ * @get_size: maximum size required for the message data
+ * @fill_info: serializes to the guest state buffer format
+ * @refresh_info: deserializes from the guest state buffer format
+ */
+struct gs_msg_ops {
+	size_t (*get_size)(struct gs_msg *gsm);
+	int (*fill_info)(struct gs_buff *gsb, struct gs_msg *gsm);
+	int (*refresh_info)(struct gs_msg *gsm, struct gs_buff *gsb);
+};
+
+/**
+ * struct gs_msg - a guest state message
+ * @bitmap: the guest state ids that should be included
+ * @ops: modify message behavior for reading and writing to buffers
+ * @flags: guest wide or thread wide
+ * @data: location that buffer data will be written to or read from.
+ *
+ * A guest state message allows flexibility in sending and receiving data
+ * in a guest state buffer format.
+ */
+struct gs_msg {
+	struct gs_bitmap bitmap;
+	struct gs_msg_ops *ops;
+	unsigned long flags;
+	void *data;
+};
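+
+/*
+ * Example (illustrative only): a message supplies ops that know how to
+ * serialize and deserialize its private data and is then driven through
+ * a buffer. my_ops and my_data are placeholders; see
+ * test-guest-state-buffer.c for a complete gs_msg_ops implementation.
+ *
+ *	gsm = gsm_new(&my_ops, &my_data, GSM_SEND, GFP_KERNEL);
+ *	gsb = gsb_new(gsm_size(gsm), 0, 0, GFP_KERNEL);
+ *	gsm_include(gsm, GSID_GPR(0));
+ *	gsm_fill_info(gsm, gsb);	(calls ->fill_info())
+ *	gsm_refresh_info(gsm, gsb);	(calls ->refresh_info())
+ */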
+
+/**************************************************************************
+ * Guest State IDs
+ **************************************************************************/
+
+u16 gsid_size(u16 iden);
+unsigned long gsid_flags(u16 iden);
+u64 gsid_mask(u16 iden);
+
+/**************************************************************************
+ * Guest State Buffers
+ **************************************************************************/
+struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
+			unsigned long vcpu_id, gfp_t flags);
+void gsb_free(struct gs_buff *gsb);
+void *gsb_put(struct gs_buff *gsb, size_t size);
+
+/**
+ * gsb_header() - the header of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns a pointer to the buffer header.
+ */
+static inline struct gs_header *gsb_header(struct gs_buff *gsb)
+{
+	return gsb->hdr;
+}
+
+/**
+ * gsb_data() - the elements of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns a pointer to the first element of the buffer data.
+ */
+static inline struct gs_elem *gsb_data(struct gs_buff *gsb)
+{
+	return (struct gs_elem *)gsb_header(gsb)->data;
+}
+
+/**
+ * gsb_len() - the current length of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the length of the buffer, including the header.
+ */
+static inline size_t gsb_len(struct gs_buff *gsb)
+{
+	return gsb->len;
+}
+
+/**
+ * gsb_capacity() - the capacity of a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the capacity of a buffer.
+ */
+static inline size_t gsb_capacity(struct gs_buff *gsb)
+{
+	return gsb->capacity;
+}
+
+/**
+ * gsb_paddress() - the physical address of buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the physical address of the buffer.
+ */
+static inline u64 gsb_paddress(struct gs_buff *gsb)
+{
+	return __pa(gsb_header(gsb));
+}
+
+/**
+ * gsb_nelems() - the number of elements in a buffer
+ * @gsb: guest state buffer
+ *
+ * Returns the number of elements in a buffer
+ */
+static inline u32 gsb_nelems(struct gs_buff *gsb)
+{
+	return be32_to_cpu(gsb_header(gsb)->nelems);
+}
+
+/**
+ * gsb_reset() - empty a guest state buffer
+ * @gsb: guest state buffer
+ *
+ * Reset the number of elements and length of buffer to empty.
+ */
+static inline void gsb_reset(struct gs_buff *gsb)
+{
+	gsb_header(gsb)->nelems = cpu_to_be32(0);
+	gsb->len = sizeof(struct gs_header);
+}
+
+/**
+ * gsb_data_len() - the length of a buffer excluding the header
+ * @gsb: guest state buffer
+ *
+ * Returns the length of a buffer excluding the header
+ */
+static inline size_t gsb_data_len(struct gs_buff *gsb)
+{
+	return gsb->len - sizeof(struct gs_header);
+}
+
+/**
+ * gsb_data_cap() - the capacity of a buffer excluding the header
+ * @gsb: guest state buffer
+ *
+ * Returns the capacity of a buffer excluding the header
+ */
+static inline size_t gsb_data_cap(struct gs_buff *gsb)
+{
+	return gsb->capacity - sizeof(struct gs_header);
+}
+
+/**
+ * gsb_for_each_elem - iterate over the elements in a buffer
+ * @i: loop counter
+ * @pos: set to current element
+ * @gsb: guest state buffer
+ * @rem: initialized to buffer capacity, holds bytes currently remaining in stream
+ */
+#define gsb_for_each_elem(i, pos, gsb, rem)                       \
+	gse_for_each_elem(i, gsb_nelems(gsb), pos, gsb_data(gsb), \
+			  gsb_data_cap(gsb), rem)
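+
+/*
+ * Example (illustrative only): walking every element in a buffer:
+ *
+ *	struct gs_elem *pos;
+ *	int i, rem;
+ *
+ *	gsb_for_each_elem(i, pos, gsb, rem)
+ *		pr_info("id 0x%x len %u\n", gse_iden(pos), gse_len(pos));
+ */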
+
+/**************************************************************************
+ * Guest State Elements
+ **************************************************************************/
+
+/**
+ * gse_iden() - guest state ID of element
+ * @gse: guest state element
+ *
+ * Return the guest state ID in host endianness.
+ */
+static inline u16 gse_iden(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->iden);
+}
+
+/**
+ * gse_len() - length of guest state element data
+ * @gse: guest state element
+ *
+ * Returns the length of guest state element data
+ */
+static inline u16 gse_len(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->len);
+}
+
+/**
+ * gse_total_len() - total length of guest state element
+ * @gse: guest state element
+ *
+ * Returns the length of the data plus the ID and size header.
+ */
+static inline u16 gse_total_len(const struct gs_elem *gse)
+{
+	return be16_to_cpu(gse->len) + sizeof(*gse);
+}
+
+/**
+ * gse_total_size() - space needed for a given data length
+ * @size: data length
+ *
+ * Returns size plus the space needed for the ID and size header.
+ */
+static inline u16 gse_total_size(u16 size)
+{
+	return sizeof(struct gs_elem) + size;
+}
+
+/**
+ * gse_data() - pointer to data of a guest state element
+ * @gse: guest state element
+ *
+ * Returns a pointer to the beginning of guest state element data.
+ */
+static inline void *gse_data(const struct gs_elem *gse)
+{
+	return (void *)gse->data;
+}
+
+/**
+ * gse_ok() - checks space exists for guest state element
+ * @gse: guest state element
+ * @remaining: bytes of space remaining
+ *
+ * Returns true if the guest state element can fit in remaining space.
+ */
+static inline bool gse_ok(const struct gs_elem *gse, int remaining)
+{
+	return remaining >= gse_total_len(gse);
+}
+
+/**
+ * gse_next() - iterate to the next guest state element in a stream
+ * @gse: stream of guest state elements
+ * @remaining: length of the guest element stream
+ *
+ * Returns the next guest state element in a stream of elements. The length of
+ * the stream is updated in remaining.
+ */
+static inline struct gs_elem *gse_next(const struct gs_elem *gse,
+				       int *remaining)
+{
+	int len = sizeof(*gse) + gse_len(gse);
+
+	*remaining -= len;
+	return (struct gs_elem *)(gse->data + gse_len(gse));
+}
+
+/**
+ * gse_for_each_elem - iterate over a stream of guest state elements
+ * @i: loop counter
+ * @max: number of elements
+ * @pos: set to current element
+ * @head: head of elements
+ * @len: length of the stream
+ * @rem: initialized to len, holds the bytes remaining in the stream
+ */
+#define gse_for_each_elem(i, max, pos, head, len, rem)                  \
+	for (i = 0, pos = head, rem = len; gse_ok(pos, rem) && i < max; \
+	     pos = gse_next(pos, &(rem)), i++)
+
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data);
+int gse_parse(struct gs_parser *gsp, struct gs_buff *gsb);
+
+/**
+ * gse_put_be32() - add a be32 guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: big endian value
+ */
+static inline int gse_put_be32(struct gs_buff *gsb, u16 iden, __be32 val)
+{
+	__be32 tmp;
+
+	tmp = val;
+	return __gse_put(gsb, iden, sizeof(__be32), &tmp);
+}
+
+/**
+ * gse_put_u32() - add a host endian 32bit int guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ */
+static inline int gse_put_u32(struct gs_buff *gsb, u16 iden, u32 val)
+{
+	__be32 tmp;
+
+	tmp = cpu_to_be32(val);
+	return gse_put_be32(gsb, iden, tmp);
+}
+
+/**
+ * gse_put_be64() - add a be64 guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: big endian value
+ */
+static inline int gse_put_be64(struct gs_buff *gsb, u16 iden, __be64 val)
+{
+	__be64 tmp;
+
+	tmp = val;
+	return __gse_put(gsb, iden, sizeof(__be64), &tmp);
+}
+
+/**
+ * gse_put_u64() - add a host endian 64bit guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ */
+static inline int gse_put_u64(struct gs_buff *gsb, u16 iden, u64 val)
+{
+	__be64 tmp;
+
+	tmp = cpu_to_be64(val);
+	return gse_put_be64(gsb, iden, tmp);
+}
+
+/**
+ * __gse_put_reg() - add a register type guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: host endian value
+ *
+ * Adds a register type guest state element. Uses the guest state ID for
+ * determining the length of the guest element. If the guest state ID has
+ * bits that cannot be set, they will be cleared.
+ */
+static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
+{
+	val &= gsid_mask(iden);
+	if (gsid_size(iden) == sizeof(u64))
+		return gse_put_u64(gsb, iden, val);
+
+	if (gsid_size(iden) == sizeof(u32)) {
+		u32 tmp;
+
+		tmp = (u32)val;
+		if (tmp != val)
+			return -EINVAL;
+
+		return gse_put_u32(gsb, iden, tmp);
+	}
+	return -EINVAL;
+}
+
+/**
+ * gse_put_vector128() - add a vector guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: 16 byte vector value
+ */
+static inline int gse_put_vector128(struct gs_buff *gsb, u16 iden,
+				    vector128 val)
+{
+	__be64 tmp[2] = { 0 };
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u;
+
+	u.v = val;
+	tmp[0] = cpu_to_be64(u.dw[TS_FPROFFSET]);
+#ifdef CONFIG_VSX
+	tmp[1] = cpu_to_be64(u.dw[TS_VSRLOWOFFSET]);
+#endif
+	return __gse_put(gsb, iden, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_part_table() - add a partition table guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: partition table value
+ */
+static inline int gse_put_part_table(struct gs_buff *gsb, u16 iden,
+				     struct gs_part_table val)
+{
+	__be64 tmp[3];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.ea_bits);
+	tmp[2] = cpu_to_be64(val.gpd_size);
+	return __gse_put(gsb, GSID_PARTITION_TABLE, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_proc_table() - add a process table guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: process table value
+ */
+static inline int gse_put_proc_table(struct gs_buff *gsb, u16 iden,
+				     struct gs_proc_table val)
+{
+	__be64 tmp[2];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.gpd_size);
+	return __gse_put(gsb, GSID_PROCESS_TABLE, sizeof(tmp), &tmp);
+}
+
+/**
+ * gse_put_buff_info() - adds a GSB description guest state element to a buffer
+ * @gsb: guest state buffer to add element to
+ * @iden: guest state ID
+ * @val: guest state buffer description value
+ */
+static inline int gse_put_buff_info(struct gs_buff *gsb, u16 iden,
+				    struct gs_buff_info val)
+{
+	__be64 tmp[2];
+
+	tmp[0] = cpu_to_be64(val.address);
+	tmp[1] = cpu_to_be64(val.size);
+	return __gse_put(gsb, iden, sizeof(tmp), &tmp);
+}
+
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data);
+
+/**
+ * gse_get_be32() - return the data of a be32 element
+ * @gse: guest state element
+ */
+static inline __be32 gse_get_be32(const struct gs_elem *gse)
+{
+	return *(__be32 *)gse_data(gse);
+}
+
+/**
+ * gse_get_u32() - return the data of a be32 element in host endianness
+ * @gse: guest state element
+ */
+static inline u32 gse_get_u32(const struct gs_elem *gse)
+{
+	return be32_to_cpu(gse_get_be32(gse));
+}
+
+/**
+ * gse_get_be64() - return the data of a be64 element
+ * @gse: guest state element
+ */
+static inline __be64 gse_get_be64(const struct gs_elem *gse)
+{
+	return *(__be64 *)gse_data(gse);
+}
+
+/**
+ * gse_get_u64() - return the data of a be64 element in host endianness
+ * @gse: guest state element
+ */
+static inline u64 gse_get_u64(const struct gs_elem *gse)
+{
+	return be64_to_cpu(gse_get_be64(gse));
+}
+
+/**
+ * __gse_get_reg() - return the data of a register type guest state element
+ * @gse: guest state element
+ *
+ * Determine the element data size from its guest state ID and return the
+ * correctly sized value.
+ */
+static inline u64 __gse_get_reg(const struct gs_elem *gse)
+{
+	if (gse_len(gse) == sizeof(u64))
+		return gse_get_u64(gse);
+
+	if (gse_len(gse) == sizeof(u32)) {
+		u32 tmp;
+
+		tmp = gse_get_u32(gse);
+		return (u64)tmp;
+	}
+	return 0;
+}
+
+/**
+ * gse_get_vector128() - return the data of a vector element
+ * @gse: guest state element
+ */
+static inline vector128 gse_get_vector128(const struct gs_elem *gse)
+{
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u = { 0 };
+	__be64 *src;
+
+	src = (__be64 *)gse_data(gse);
+	u.dw[TS_FPROFFSET] = be64_to_cpu(src[0]);
+#ifdef CONFIG_VSX
+	u.dw[TS_VSRLOWOFFSET] = be64_to_cpu(src[1]);
+#endif
+	return u.v;
+}
+
+/**
+ * gse_put - add a guest state element to a buffer
+ * @gsb: guest state buffer to add to
+ * @iden: guest state identity
+ * @v: generic value
+ */
+#define gse_put(gsb, iden, v)					\
+	(_Generic((v),						\
+		  u64 : __gse_put_reg,				\
+		  long unsigned int : __gse_put_reg,		\
+		  u32 : __gse_put_reg,				\
+		  struct gs_buff_info : gse_put_buff_info,	\
+		  struct gs_proc_table : gse_put_proc_table,	\
+		  struct gs_part_table : gse_put_part_table,	\
+		  vector128 : gse_put_vector128)(gsb, iden, v))
+
+/**
+ * gse_get - return the data of a guest state element
+ * @gse: guest state element to read from
+ * @v: generic value pointer to return the data in
+ */
+#define gse_get(gse, v)						\
+	(*v = (_Generic((v),					\
+			u64 * : __gse_get_reg,			\
+			unsigned long * : __gse_get_reg,	\
+			u32 * : __gse_get_reg,			\
+			vector128 * : gse_get_vector128)(gse)))
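+
+/*
+ * Example (illustrative only): the _Generic() dispatch selects the
+ * helper from the type of the value, so callers do not name it:
+ *
+ *	u64 in = 0xcafef00d, out;
+ *
+ *	gse_put(gsb, GSID_GPR(1), in);	(expands to __gse_put_reg())
+ *	gse_get(gse, &out);		(expands to __gse_get_reg())
+ */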
+
+/**************************************************************************
+ * Guest State Bitmap
+ **************************************************************************/
+
+bool gsbm_test(struct gs_bitmap *gsbm, u16 iden);
+void gsbm_set(struct gs_bitmap *gsbm, u16 iden);
+void gsbm_clear(struct gs_bitmap *gsbm, u16 iden);
+u16 gsbm_next(struct gs_bitmap *gsbm, u16 prev);
+
+/**
+ * gsbm_zero - zero the entire bitmap
+ * @gsbm: guest state buffer bitmap
+ */
+static inline void gsbm_zero(struct gs_bitmap *gsbm)
+{
+	bitmap_zero(gsbm->bitmap, GSE_IDEN_COUNT);
+}
+
+/**
+ * gsbm_fill - fill the entire bitmap
+ * @gsbm: guest state buffer bitmap
+ */
+static inline void gsbm_fill(struct gs_bitmap *gsbm)
+{
+	bitmap_fill(gsbm->bitmap, GSE_IDEN_COUNT);
+	clear_bit(0, gsbm->bitmap);
+}
+
+/**
+ * gsbm_for_each - iterate the present guest state IDs
+ * @gsbm: guest state buffer bitmap
+ * @iden: current guest state ID
+ */
+#define gsbm_for_each(gsbm, iden) \
+	for (iden = gsbm_next(gsbm, 0); iden != 0; iden = gsbm_next(gsbm, iden))
+
+
+/**************************************************************************
+ * Guest State Parser
+ **************************************************************************/
+
+void gsp_insert(struct gs_parser *gsp, u16 iden, struct gs_elem *gse);
+struct gs_elem *gsp_lookup(struct gs_parser *gsp, u16 iden);
+
+/**
+ * gsp_for_each - iterate the <guest state IDs, guest state element> pairs
+ * @gsp: guest state parser
+ * @iden: current guest state ID
+ * @gse: guest state element
+ */
+#define gsp_for_each(gsp, iden, gse)                              \
+	for (iden = gsbm_next(&(gsp)->iterator, 0),               \
+	    gse = gsp_lookup((gsp), iden);                        \
+	     iden != 0; iden = gsbm_next(&(gsp)->iterator, iden), \
+	    gse = gsp_lookup((gsp), iden))
+
+/**************************************************************************
+ * Guest State Message
+ **************************************************************************/
+
+/**
+ * gsm_for_each - iterate the guest state IDs included in a guest state message
+ * @gsm: guest state message
+ * @iden: current guest state ID
+ */
+#define gsm_for_each(gsm, iden)                            \
+	for (iden = gsbm_next(&gsm->bitmap, 0); iden != 0; \
+	     iden = gsbm_next(&gsm->bitmap, iden))
+
+int gsm_init(struct gs_msg *gsm, struct gs_msg_ops *ops, void *data,
+	     unsigned long flags);
+
+struct gs_msg *gsm_new(struct gs_msg_ops *ops, void *data, unsigned long flags,
+		       gfp_t gfp_flags);
+void gsm_free(struct gs_msg *gsm);
+size_t gsm_size(struct gs_msg *gsm);
+int gsm_fill_info(struct gs_msg *gsm, struct gs_buff *gsb);
+int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb);
+
+/**
+ * gsm_include - indicate a guest state ID should be included when serializing
+ * @gsm: guest state message
+ * @iden: guest state ID
+ */
+static inline void gsm_include(struct gs_msg *gsm, u16 iden)
+{
+	gsbm_set(&gsm->bitmap, iden);
+}
+
+/**
+ * gsm_includes - check if a guest state ID will be included when serializing
+ * @gsm: guest state message
+ * @iden: guest state ID
+ */
+static inline bool gsm_includes(struct gs_msg *gsm, u16 iden)
+{
+	return gsbm_test(&gsm->bitmap, iden);
+}
+
+/**
+ * gsm_include_all - indicate all guest state IDs should be included when serializing
+ * @gsm: guest state message
+ */
+static inline void gsm_include_all(struct gs_msg *gsm)
+{
+	gsbm_fill(&gsm->bitmap);
+}
+
+/**
+ * gsm_reset - clear the guest state IDs that should be included when serializing
+ * @gsm: guest state message
+ */
+static inline void gsm_reset(struct gs_msg *gsm)
+{
+	gsbm_zero(&gsm->bitmap);
+}
+
+#endif /* _ASM_POWERPC_GUEST_STATE_BUFFER_H */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 77653c5b356b..0ca2d8b37b42 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -444,6 +444,7 @@ static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 v
 	vcpu->arch.fp.fpr[i][j] = val;
 }
 
+#ifdef CONFIG_VSX
 static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
 {
 	return vcpu->arch.vr.vr[i];
@@ -463,6 +464,7 @@ static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.vr.vscr.u[3] = val;
 }
+#endif
 
 #define BOOK3S_WRAPPER_SET(reg, size)					\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 5319d889b184..eb8445e71c14 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -87,8 +87,11 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
 	book3s_hv_p9_perf.o \
+	guest-state-buffer.o \
 	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
+
+obj-$(CONFIG_GUEST_STATE_BUFFER_TEST) += test-guest-state-buffer.o
 endif
 
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
diff --git a/arch/powerpc/kvm/guest-state-buffer.c b/arch/powerpc/kvm/guest-state-buffer.c
new file mode 100644
index 000000000000..db4a79bfcaf1
--- /dev/null
+++ b/arch/powerpc/kvm/guest-state-buffer.c
@@ -0,0 +1,563 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "asm/hvcall.h"
+#include <linux/log2.h>
+#include <asm/pgalloc.h>
+#include <asm/guest-state-buffer.h>
+
+static const u16 gse_iden_len[__GSE_TYPE_MAX] = {
+	[GSE_BE32] = sizeof(__be32),
+	[GSE_BE64] = sizeof(__be64),
+	[GSE_VEC128] = sizeof(vector128),
+	[GSE_PARTITION_TABLE] = sizeof(struct gs_part_table),
+	[GSE_PROCESS_TABLE] = sizeof(struct gs_proc_table),
+	[GSE_BUFFER] = sizeof(struct gs_buff_info),
+};
+
+/**
+ * gsb_new() - create a new guest state buffer
+ * @size: total size of the guest state buffer (includes header)
+ * @guest_id: guest ID to associate with the buffer
+ * @vcpu_id: vCPU ID to associate with the buffer
+ * @flags: GFP flags
+ *
+ * Returns a guest state buffer.
+ */
+struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
+			unsigned long vcpu_id, gfp_t flags)
+{
+	struct gs_buff *gsb;
+
+	gsb = kzalloc(sizeof(*gsb), flags);
+	if (!gsb)
+		return NULL;
+
+	size = roundup_pow_of_two(size);
+	gsb->hdr = kzalloc(size, GFP_KERNEL);
+	if (!gsb->hdr)
+		goto free;
+
+	gsb->capacity = size;
+	gsb->len = sizeof(struct gs_header);
+	gsb->vcpu_id = vcpu_id;
+	gsb->guest_id = guest_id;
+
+	gsb->hdr->nelems = cpu_to_be32(0);
+
+	return gsb;
+
+free:
+	kfree(gsb);
+	return NULL;
+}
+EXPORT_SYMBOL(gsb_new);
+
+/**
+ * gsb_free() - free a guest state buffer
+ * @gsb: guest state buffer
+ */
+void gsb_free(struct gs_buff *gsb)
+{
+	kfree(gsb->hdr);
+	kfree(gsb);
+}
+EXPORT_SYMBOL(gsb_free);
+
+/**
+ * gsb_put() - allocate space in a guest state buffer
+ * @gsb: buffer to allocate in
+ * @size: amount of space to allocate
+ *
+ * Returns a pointer to the requested amount of space within the buffer and
+ * increments the count of elements in the buffer.
+ *
+ * Does not check if there is enough space in the buffer.
+ */
+void *gsb_put(struct gs_buff *gsb, size_t size)
+{
+	u32 nelems = gsb_nelems(gsb);
+	void *p;
+
+	p = (void *)gsb_header(gsb) + gsb_len(gsb);
+	gsb->len += size;
+
+	gsb_header(gsb)->nelems = cpu_to_be32(nelems + 1);
+	return p;
+}
+EXPORT_SYMBOL(gsb_put);
+
+static int gsid_class(u16 iden)
+{
+	if ((iden >= GSE_GUESTWIDE_START) && (iden <= GSE_GUESTWIDE_END))
+		return GS_CLASS_GUESTWIDE;
+
+	if ((iden >= GSE_META_START) && (iden <= GSE_META_END))
+		return GS_CLASS_META;
+
+	if ((iden >= GSE_DW_REGS_START) && (iden <= GSE_DW_REGS_END))
+		return GS_CLASS_DWORD_REG;
+
+	if ((iden >= GSE_W_REGS_START) && (iden <= GSE_W_REGS_END))
+		return GS_CLASS_WORD_REG;
+
+	if ((iden >= GSE_VSRS_START) && (iden <= GSE_VSRS_END))
+		return GS_CLASS_VECTOR;
+
+	if ((iden >= GSE_INTR_REGS_START) && (iden <= GSE_INTR_REGS_END))
+		return GS_CLASS_INTR;
+
+	return -1;
+}
+
+static int gsid_type(u16 iden)
+{
+	int type = -1;
+
+	switch (gsid_class(iden)) {
+	case GS_CLASS_GUESTWIDE:
+		switch (iden) {
+		case GSID_HOST_STATE_SIZE:
+		case GSID_RUN_OUTPUT_MIN_SIZE:
+		case GSID_TB_OFFSET:
+			type = GSE_BE64;
+			break;
+		case GSID_PARTITION_TABLE:
+			type = GSE_PARTITION_TABLE;
+			break;
+		case GSID_PROCESS_TABLE:
+			type = GSE_PROCESS_TABLE;
+			break;
+		case GSID_LOGICAL_PVR:
+			type = GSE_BE32;
+			break;
+		}
+		break;
+	case GS_CLASS_META:
+		switch (iden) {
+		case GSID_RUN_INPUT:
+		case GSID_RUN_OUTPUT:
+			type = GSE_BUFFER;
+			break;
+		case GSID_VPA:
+			type = GSE_BE64;
+			break;
+		}
+		break;
+	case GS_CLASS_DWORD_REG:
+		type = GSE_BE64;
+		break;
+	case GS_CLASS_WORD_REG:
+		type = GSE_BE32;
+		break;
+	case GS_CLASS_VECTOR:
+		type = GSE_VEC128;
+		break;
+	case GS_CLASS_INTR:
+		switch (iden) {
+		case GSID_HDAR:
+		case GSID_ASDR:
+		case GSID_HEIR:
+			type = GSE_BE64;
+			break;
+		case GSID_HDSISR:
+			type = GSE_BE32;
+			break;
+		}
+		break;
+	}
+
+	return type;
+}
+
+/**
+ * gsid_flags() - the flags for a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns any flags for the guest state ID.
+ */
+unsigned long gsid_flags(u16 iden)
+{
+	unsigned long flags = 0;
+
+	switch (gsid_class(iden)) {
+	case GS_CLASS_GUESTWIDE:
+		flags = GS_FLAGS_WIDE;
+		break;
+	case GS_CLASS_META:
+	case GS_CLASS_DWORD_REG:
+	case GS_CLASS_WORD_REG:
+	case GS_CLASS_VECTOR:
+	case GS_CLASS_INTR:
+		break;
+	}
+
+	return flags;
+}
+EXPORT_SYMBOL(gsid_flags);
+
+/**
+ * gsid_size() - the size of a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns the size of the data for the guest state ID.
+ */
+u16 gsid_size(u16 iden)
+{
+	int type;
+
+	type = gsid_type(iden);
+	if (type == -1)
+		return 0;
+
+	if (type >= __GSE_TYPE_MAX)
+		return 0;
+
+	return gse_iden_len[type];
+}
+EXPORT_SYMBOL(gsid_size);
+
+/**
+ * gsid_mask() - the settable bits of a guest state ID
+ * @iden: guest state ID
+ *
+ * Returns a mask of settable bits for a guest state ID.
+ */
+u64 gsid_mask(u16 iden)
+{
+	u64 mask = ~0ull;
+
+	switch (iden) {
+	case GSID_LPCR:
+		mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER | LPCR_GTSE;
+		break;
+	case GSID_MSR:
+		mask = ~(MSR_HV | MSR_S | MSR_ME);
+		break;
+	}
+
+	return mask;
+}
+EXPORT_SYMBOL(gsid_mask);
+
+/**
+ * __gse_put() - add a guest state element to a buffer
+ * @gsb: buffer to add the element to
+ * @iden: guest state ID
+ * @size: length of data
+ * @data: pointer to data
+ */
+int __gse_put(struct gs_buff *gsb, u16 iden, u16 size, const void *data)
+{
+	struct gs_elem *gse;
+	u16 total_size;
+
+	total_size = sizeof(*gse) + size;
+	if (total_size + gsb_len(gsb) > gsb_capacity(gsb))
+		return -ENOMEM;
+
+	if (gsid_size(iden) != size)
+		return -EINVAL;
+
+	gse = gsb_put(gsb, total_size);
+	gse->iden = cpu_to_be16(iden);
+	gse->len = cpu_to_be16(size);
+	memcpy(gse->data, data, size);
+
+	return 0;
+}
+EXPORT_SYMBOL(__gse_put);
+
+/**
+ * gse_parse() - create a parse map from a guest state buffer
+ * @gsp: guest state parser
+ * @gsb: guest state buffer
+ */
+int gse_parse(struct gs_parser *gsp, struct gs_buff *gsb)
+{
+	struct gs_elem *curr;
+	int rem, i;
+
+	gsb_for_each_elem(i, curr, gsb, rem) {
+		if (gse_len(curr) != gsid_size(gse_iden(curr)))
+			return -EINVAL;
+		gsp_insert(gsp, gse_iden(curr), curr);
+	}
+
+	if (gsb_nelems(gsb) != i)
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL(gse_parse);
+
+static inline int gse_flatten_iden(u16 iden)
+{
+	int bit = 0;
+	int class;
+
+	class = gsid_class(iden);
+
+	if (class == GS_CLASS_GUESTWIDE) {
+		bit += iden - GSE_GUESTWIDE_START;
+		return bit;
+	}
+
+	bit += GSE_GUESTWIDE_COUNT;
+
+	if (class == GS_CLASS_META) {
+		bit += iden - GSE_META_START;
+		return bit;
+	}
+
+	bit += GSE_META_COUNT;
+
+	if (class == GS_CLASS_DWORD_REG) {
+		bit += iden - GSE_DW_REGS_START;
+		return bit;
+	}
+
+	bit += GSE_DW_REGS_COUNT;
+
+	if (class == GS_CLASS_WORD_REG) {
+		bit += iden - GSE_W_REGS_START;
+		return bit;
+	}
+
+	bit += GSE_W_REGS_COUNT;
+
+	if (class == GS_CLASS_VECTOR) {
+		bit += iden - GSE_VSRS_START;
+		return bit;
+	}
+
+	bit += GSE_VSRS_COUNT;
+
+	if (class == GS_CLASS_INTR) {
+		bit += iden - GSE_INTR_REGS_START;
+		return bit;
+	}
+
+	return 0;
+}
+
+static inline u16 gse_unflatten_iden(int bit)
+{
+	u16 iden;
+
+	if (bit < GSE_GUESTWIDE_COUNT) {
+		iden = GSE_GUESTWIDE_START + bit;
+		return iden;
+	}
+	bit -= GSE_GUESTWIDE_COUNT;
+
+	if (bit < GSE_META_COUNT) {
+		iden = GSE_META_START + bit;
+		return iden;
+	}
+	bit -= GSE_META_COUNT;
+
+	if (bit < GSE_DW_REGS_COUNT) {
+		iden = GSE_DW_REGS_START + bit;
+		return iden;
+	}
+	bit -= GSE_DW_REGS_COUNT;
+
+	if (bit < GSE_W_REGS_COUNT) {
+		iden = GSE_W_REGS_START + bit;
+		return iden;
+	}
+	bit -= GSE_W_REGS_COUNT;
+
+	if (bit < GSE_VSRS_COUNT) {
+		iden = GSE_VSRS_START + bit;
+		return iden;
+	}
+	bit -= GSE_VSRS_COUNT;
+
+	if (bit < GSE_IDEN_COUNT) {
+		iden = GSE_INTR_REGS_START + bit;
+		return iden;
+	}
+
+	return 0;
+}
+
+/**
+ * gsp_insert() - add a mapping from a guest state ID to an element
+ * @gsp: guest state parser
+ * @iden: guest state id (key)
+ * @gse: guest state element (value)
+ */
+void gsp_insert(struct gs_parser *gsp, u16 iden, struct gs_elem *gse)
+{
+	int i;
+
+	i = gse_flatten_iden(iden);
+	gsbm_set(&gsp->iterator, iden);
+	gsp->gses[i] = gse;
+}
+EXPORT_SYMBOL(gsp_insert);
+
+/**
+ * gsp_lookup() - lookup an element from a guest state ID
+ * @gsp: guest state parser
+ * @iden: guest state ID (key)
+ *
+ * Returns the guest state element if present.
+ */
+struct gs_elem *gsp_lookup(struct gs_parser *gsp, u16 iden)
+{
+	int i;
+
+	i = gse_flatten_iden(iden);
+	return gsp->gses[i];
+}
+EXPORT_SYMBOL(gsp_lookup);
+
+/**
+ * gsbm_set() - set the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+void gsbm_set(struct gs_bitmap *gsbm, u16 iden)
+{
+	set_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_set);
+
+/**
+ * gsbm_clear() - clear the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+void gsbm_clear(struct gs_bitmap *gsbm, u16 iden)
+{
+	clear_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_clear);
+
+/**
+ * gsbm_test() - test the guest state ID
+ * @gsbm: guest state bitmap
+ * @iden: guest state ID
+ */
+bool gsbm_test(struct gs_bitmap *gsbm, u16 iden)
+{
+	return test_bit(gse_flatten_iden(iden), gsbm->bitmap);
+}
+EXPORT_SYMBOL(gsbm_test);
+
+/**
+ * gsbm_next() - return the next set guest state ID
+ * @gsbm: guest state bitmap
+ * @prev: last guest state ID
+ */
+u16 gsbm_next(struct gs_bitmap *gsbm, u16 prev)
+{
+	int bit, pbit;
+
+	pbit = prev ? gse_flatten_iden(prev) + 1 : 0;
+	bit = find_next_bit(gsbm->bitmap, GSE_IDEN_COUNT, pbit);
+
+	if (bit < GSE_IDEN_COUNT)
+		return gse_unflatten_iden(bit);
+	return 0;
+}
+EXPORT_SYMBOL(gsbm_next);
+
+/**
+ * gsm_init() - initialize a guest state message
+ * @gsm: guest state message
+ * @ops: callbacks
+ * @data: private data
+ * @flags: guest wide or thread wide
+ */
+int gsm_init(struct gs_msg *gsm, struct gs_msg_ops *ops, void *data,
+	     unsigned long flags)
+{
+	memset(gsm, 0, sizeof(*gsm));
+	gsm->ops = ops;
+	gsm->data = data;
+	gsm->flags = flags;
+
+	return 0;
+}
+EXPORT_SYMBOL(gsm_init);
+
+/**
+ * gsm_new() - create a new guest state message
+ * @ops: callbacks
+ * @data: private data
+ * @flags: guest wide or thread wide
+ * @gfp_flags: GFP allocation flags
+ *
+ * Returns an initialized guest state message.
+ */
+struct gs_msg *gsm_new(struct gs_msg_ops *ops, void *data, unsigned long flags,
+		       gfp_t gfp_flags)
+{
+	struct gs_msg *gsm;
+
+	gsm = kzalloc(sizeof(*gsm), gfp_flags);
+	if (!gsm)
+		return NULL;
+
+	gsm_init(gsm, ops, data, flags);
+
+	return gsm;
+}
+EXPORT_SYMBOL(gsm_new);
+
+/**
+ * gsm_size() - get the size required for the guest state message
+ * @gsm: self
+ *
+ * Returns the size required for the message.
+ */
+size_t gsm_size(struct gs_msg *gsm)
+{
+	if (gsm->ops->get_size)
+		return gsm->ops->get_size(gsm);
+	return 0;
+}
+EXPORT_SYMBOL(gsm_size);
+
+/**
+ * gsm_free() - free guest state message
+ * @gsm: guest state message
+ */
+void gsm_free(struct gs_msg *gsm)
+{
+	kfree(gsm);
+}
+EXPORT_SYMBOL(gsm_free);
+
+/**
+ * gsm_fill_info() - serialises message to guest state buffer format
+ * @gsm: self
+ * @gsb: buffer to serialise into
+ */
+int gsm_fill_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	if (!gsm->ops->fill_info)
+		return -EINVAL;
+
+	gsb_reset(gsb);
+	return gsm->ops->fill_info(gsb, gsm);
+}
+EXPORT_SYMBOL(gsm_fill_info);
+
+/**
+ * gsm_refresh_info() - deserialises from guest state buffer
+ * @gsm: self
+ * @gsb: buffer to deserialise from
+ */
+int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	if (!gsm->ops->refresh_info)
+		return -EINVAL;
+
+	return gsm->ops->refresh_info(gsm, gsb);
+}
+EXPORT_SYMBOL(gsm_refresh_info);
diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c b/arch/powerpc/kvm/test-guest-state-buffer.c
new file mode 100644
index 000000000000..d038051b61f8
--- /dev/null
+++ b/arch/powerpc/kvm/test-guest-state-buffer.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/init.h>
+#include <linux/log2.h>
+#include <kunit/test.h>
+
+#include <asm/guest-state-buffer.h>
+
+static void test_creating_buffer(struct kunit *test)
+{
+	struct gs_buff *gsb;
+	size_t size = 0x100;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb->hdr);
+
+	KUNIT_EXPECT_EQ(test, gsb->capacity, roundup_pow_of_two(size));
+	KUNIT_EXPECT_EQ(test, gsb->len, sizeof(__be32));
+
+	gsb_free(gsb);
+}
+
+static void test_adding_element(struct kunit *test)
+{
+	const struct gs_elem *head, *curr;
+	union {
+		__vector128 v;
+		u64 dw[2];
+	} u;
+	int rem;
+	struct gs_buff *gsb;
+	size_t size = 0x1000;
+	int i, rc;
+	u64 data;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	/* Single elements, direct use of __gse_put() */
+	data = 0xdeadbeef;
+	rc = __gse_put(gsb, GSID_GPR(0), 8, &data);
+	KUNIT_EXPECT_GE(test, rc, 0);
+
+	head = gsb_data(gsb);
+	KUNIT_EXPECT_EQ(test, gse_iden(head), GSID_GPR(0));
+	KUNIT_EXPECT_EQ(test, gse_len(head), 8);
+	data = 0;
+	memcpy(&data, gse_data(head), 8);
+	KUNIT_EXPECT_EQ(test, data, 0xdeadbeef);
+
+	/* Multiple elements, simple wrapper */
+	rc = gse_put_u64(gsb, GSID_GPR(1), 0xcafef00d);
+	KUNIT_EXPECT_GE(test, rc, 0);
+
+	u.dw[0] = 0x1;
+	u.dw[1] = 0x2;
+	rc = gse_put_vector128(gsb, GSID_VSRS(0), u.v);
+	KUNIT_EXPECT_GE(test, rc, 0);
+	u.dw[0] = 0x0;
+	u.dw[1] = 0x0;
+
+	gsb_for_each_elem(i, curr, gsb, rem) {
+		switch (i) {
+		case 0:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_GPR(0));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 8);
+			KUNIT_EXPECT_EQ(test, gse_get_be64(curr), 0xdeadbeef);
+			break;
+		case 1:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_GPR(1));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 8);
+			KUNIT_EXPECT_EQ(test, gse_get_u64(curr), 0xcafef00d);
+			break;
+		case 2:
+			KUNIT_EXPECT_EQ(test, gse_iden(curr), GSID_VSRS(0));
+			KUNIT_EXPECT_EQ(test, gse_len(curr), 16);
+			u.v = gse_get_vector128(curr);
+			KUNIT_EXPECT_EQ(test, u.dw[0], 0x1);
+			KUNIT_EXPECT_EQ(test, u.dw[1], 0x2);
+			break;
+		}
+	}
+	KUNIT_EXPECT_EQ(test, i, 3);
+
+	gsb_reset(gsb);
+	KUNIT_EXPECT_EQ(test, gsb_nelems(gsb), 0);
+	KUNIT_EXPECT_EQ(test, gsb_len(gsb), sizeof(struct gs_header));
+
+	gsb_free(gsb);
+}
+
+static void test_gs_parsing(struct kunit *test)
+{
+	struct gs_elem *gse;
+	struct gs_parser gsp = { 0 };
+	struct gs_buff *gsb;
+	size_t size = 0x1000;
+	u64 tmp1, tmp2;
+
+	gsb = gsb_new(size, 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	tmp1 = 0xdeadbeefull;
+	gse_put(gsb, GSID_GPR(0), tmp1);
+
+	KUNIT_EXPECT_GE(test, gse_parse(&gsp, gsb), 0);
+
+	gse = gsp_lookup(&gsp, GSID_GPR(0));
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gse);
+
+	gse_get(gse, &tmp2);
+	KUNIT_EXPECT_EQ(test, tmp2, 0xdeadbeefull);
+
+	gsb_free(gsb);
+}
+
+static void test_gs_bitmap(struct kunit *test)
+{
+	struct gs_bitmap gsbm = { 0 };
+	struct gs_bitmap gsbm1 = { 0 };
+	struct gs_bitmap gsbm2 = { 0 };
+	u16 iden;
+	int i, j;
+
+	i = 0;
+	for (u16 iden = GSID_HOST_STATE_SIZE;
+	     iden <= GSID_PROCESS_TABLE; iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_RUN_INPUT; iden <= GSID_VPA;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_GPR(0); iden <= GSID_CTRL;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_CR; iden <= GSID_PSPB; iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_VSRS(0); iden <= GSID_VSRS(63);
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	for (u16 iden = GSID_HDAR; iden <= GSID_ASDR;
+	     iden++) {
+		gsbm_set(&gsbm, iden);
+		gsbm_set(&gsbm1, iden);
+		KUNIT_EXPECT_TRUE(test, gsbm_test(&gsbm, iden));
+		gsbm_clear(&gsbm, iden);
+		KUNIT_EXPECT_FALSE(test, gsbm_test(&gsbm, iden));
+		i++;
+	}
+
+	j = 0;
+	gsbm_for_each(&gsbm1, iden) {
+		gsbm_set(&gsbm2, iden);
+		j++;
+	}
+	KUNIT_EXPECT_EQ(test, i, j);
+	KUNIT_EXPECT_MEMEQ(test, &gsbm1, &gsbm2, sizeof(gsbm1));
+}
+
+struct gs_msg_test1_data {
+	u64 a;
+	u32 b;
+	struct gs_part_table c;
+	struct gs_proc_table d;
+	struct gs_buff_info e;
+};
+
+static size_t test1_get_size(struct gs_msg *gsm)
+{
+	size_t size = 0;
+	u16 ids[] = {
+		GSID_PARTITION_TABLE,
+		GSID_PROCESS_TABLE,
+		GSID_RUN_INPUT,
+		GSID_GPR(0),
+		GSID_CR,
+	};
+
+	for (int i = 0; i < ARRAY_SIZE(ids); i++)
+		size += gse_total_size(gsid_size(ids[i]));
+	return size;
+}
+
+static int test1_fill_info(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	struct gs_msg_test1_data *data = gsm->data;
+
+	if (gsm_includes(gsm, GSID_GPR(0)))
+		gse_put(gsb, GSID_GPR(0), data->a);
+
+	if (gsm_includes(gsm, GSID_CR))
+		gse_put(gsb, GSID_CR, data->b);
+
+	if (gsm_includes(gsm, GSID_PARTITION_TABLE))
+		gse_put(gsb, GSID_PARTITION_TABLE, data->c);
+
+	if (gsm_includes(gsm, GSID_PROCESS_TABLE))
+		gse_put(gsb, GSID_PROCESS_TABLE, data->d);
+
+	if (gsm_includes(gsm, GSID_RUN_INPUT))
+		gse_put(gsb, GSID_RUN_INPUT, data->e);
+
+	return 0;
+}
+
+static int test1_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	struct gs_parser gsp = { 0 };
+	struct gs_msg_test1_data *data = gsm->data;
+	struct gs_elem *gse;
+	int rc;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	gse = gsp_lookup(&gsp, GSID_GPR(0));
+	if (gse)
+		gse_get(gse, &data->a);
+
+	gse = gsp_lookup(&gsp, GSID_CR);
+	if (gse)
+		gse_get(gse, &data->b);
+
+	return 0;
+}
+
+static struct gs_msg_ops gs_msg_test1_ops = {
+	.get_size = test1_get_size,
+	.fill_info = test1_fill_info,
+	.refresh_info = test1_refresh_info,
+};
+
+static void test_gs_msg(struct kunit *test)
+{
+	struct gs_msg_test1_data test1_data = {
+		.a = 0xdeadbeef,
+		.b = 0x1,
+	};
+	struct gs_msg *gsm;
+	struct gs_buff *gsb;
+
+	gsm = gsm_new(&gs_msg_test1_ops, &test1_data, GSM_SEND, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsm);
+
+	gsb = gsb_new(gsm_size(gsm), 0, 0, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gsb);
+
+	gsm_include(gsm, GSID_PARTITION_TABLE);
+	gsm_include(gsm, GSID_PROCESS_TABLE);
+	gsm_include(gsm, GSID_RUN_INPUT);
+	gsm_include(gsm, GSID_GPR(0));
+	gsm_include(gsm, GSID_CR);
+
+	gsm_fill_info(gsm, gsb);
+
+	memset(&test1_data, 0, sizeof(test1_data));
+
+	gsm_refresh_info(gsm, gsb);
+	KUNIT_EXPECT_EQ(test, test1_data.a, 0xdeadbeef);
+	KUNIT_EXPECT_EQ(test, test1_data.b, 0x1);
+
+	gsm_free(gsm);
+}
+
+static struct kunit_case guest_state_buffer_testcases[] = {
+	KUNIT_CASE(test_creating_buffer),
+	KUNIT_CASE(test_adding_element),
+	KUNIT_CASE(test_gs_bitmap),
+	KUNIT_CASE(test_gs_parsing),
+	KUNIT_CASE(test_gs_msg),
+	{}
+};
+
+static struct kunit_suite guest_state_buffer_test_suite = {
+	.name = "guest_state_buffer_test",
+	.test_cases = guest_state_buffer_testcases,
+};
+
+kunit_test_suites(&guest_state_buffer_test_suite);
+
+MODULE_LICENSE("GPL");
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
  2023-06-05  6:48 ` Jordan Niethe
  (?)
@ 2023-06-05  6:48   ` Jordan Niethe
  -1 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

A series of hcalls has been added to the PAPR which allows a regular
guest partition to create and manage guest partitions of its own. Add
support to KVM to use these hcalls to enable running nested guests.

Overview of the new hcall usage (a condensed run-step sketch follows
the list):

- L1 and L0 negotiate capabilities with
  H_GUEST_{G,S}ET_CAPABILITIES()

- L1 requests the L0 create a L2 with
  H_GUEST_CREATE() and receives a handle to use in future hcalls

- L1 requests the L0 create a L2 vCPU with
  H_GUEST_CREATE_VCPU()

- L1 sets up the L2 using H_GUEST_SET and the
  H_GUEST_VCPU_RUN input buffer

- L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN()

- L2 returns to L1 with an exit reason and L1 reads the
  H_GUEST_VCPU_RUN output buffer populated by the L0

- L1 handles the exit using H_GET_STATE if necessary

- L1 reruns L2 vCPU with H_GUEST_VCPU_RUN

- L1 frees the L2 in the L0 with H_GUEST_DELETE()
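
Condensed from kvmhv_vcpu_entry_papr() introduced in this patch, the
run step of that flow reduces to the following sketch (error handling
and timing trimmed):

  kvmhv_papr_flush_vcpu(vcpu, time_limit);  /* push dirty state to the L0 */
  rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
                            &trap, &i);     /* H_GUEST_RUN_VCPU */
  if (rc == H_SUCCESS)
          kvmhv_papr_parse_output(vcpu);    /* read the run output buffer */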

Support for the new API is determined by trying
H_GUEST_GET_CAPABILITIES. On a successful return, the new API will then
be used.
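
The probe is a capability intersection; condensed from the
kvmhv_nested_init() change in this patch:

  rc = plpar_guest_get_capabilities(0, &host_capabilities);
  if (rc == H_SUCCESS) {
          unsigned long capabilities = 0;

          if (cpu_has_feature(CPU_FTR_ARCH_31))
                  capabilities |= H_GUEST_CAP_POWER10;
          if (cpu_has_feature(CPU_FTR_ARCH_300))
                  capabilities |= H_GUEST_CAP_POWER9;

          nested_capabilities = capabilities & host_capabilities;
          rc = plpar_guest_set_capabilities(0, nested_capabilities);
          if (rc == H_SUCCESS)
                  __kvmhv_on_papr = true;
  }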

Use the vcpu register state setters for tracking modified guest state
elements and copy the thread wide values into the H_GUEST_VCPU_RUN input
buffer immediately before running a L2. The guest wide
elements cannot be added to the input buffer, so send them with a
separate H_GUEST_SET call if necessary.
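
For example, the GPR setter becomes (taken verbatim from the
kvm_book3s.h hunk below):

  static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
  {
          vcpu->arch.regs.gpr[num] = val;
          kvmhv_papr_mark_dirty(vcpu, GSID_GPR(num));
  }

Guest wide elements are flagged GS_FLAGS_WIDE by gsid_flags(), which is
how the serialisation code tells them apart from thread wide state.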

Make the vcpu register getter load the corresponding value from the real
host with H_GUEST_GET. To avoid unnecessarily calling H_GUEST_GET, track
which values have already been loaded between H_GUEST_VCPU_RUN calls. If
an element is present in the H_GUEST_VCPU_RUN output buffer it also does
not need to be loaded again.
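
A minimal sketch of that reload path; the real
__kvmhv_papr_cached_reload() lives in book3s_hv_papr.c and is only
partly quoted below, so treat this body as illustrative rather than
exact:

  int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
  {
          struct kvmhv_papr_host *ph = &vcpu->arch.papr_host;
          int rc;

          if (gsbm_test(&ph->valids, iden))
                  return 0;       /* still current since the last run */

          /* fetch just this element from the L0 with H_GUEST_GET */
          rc = gsb_receive_datum(ph->vcpu_run_input, ph->vcpu_message, iden);
          if (rc < 0)
                  return rc;

          gsbm_set(&ph->valids, iden);
          return 0;
  }

The valids bitmap is zeroed after every H_GUEST_VCPU_RUN (see
kvmhv_vcpu_entry_papr()), which forces a fresh H_GUEST_GET the first
time a value is read after an exit.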

KVM already supports running nested guests on powernv hosts; however,
the interface used for this is not supported by other PAPR hosts. That
existing API remains supported.

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Declare op structs as static
  - Use expressions in switch case with local variables
  - Do not use the PVR for the LOGICAL PVR ID
  - Handle emul_inst as now a double word
  - Use new GPR(), etc macros
  - Determine PAPR nested capabilities from cpu features
---
 arch/powerpc/include/asm/guest-state-buffer.h | 105 +-
 arch/powerpc/include/asm/hvcall.h             |  30 +
 arch/powerpc/include/asm/kvm_book3s.h         | 122 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h      |   6 +
 arch/powerpc/include/asm/kvm_host.h           |  21 +
 arch/powerpc/include/asm/kvm_ppc.h            |  64 +-
 arch/powerpc/include/asm/plpar_wrappers.h     | 198 ++++
 arch/powerpc/kvm/Makefile                     |   1 +
 arch/powerpc/kvm/book3s_hv.c                  | 126 ++-
 arch/powerpc/kvm/book3s_hv.h                  |  74 +-
 arch/powerpc/kvm/book3s_hv_nested.c           |  38 +-
 arch/powerpc/kvm/book3s_hv_papr.c             | 940 ++++++++++++++++++
 arch/powerpc/kvm/emulate_loadstore.c          |   4 +-
 arch/powerpc/kvm/guest-state-buffer.c         |  49 +
 14 files changed, 1684 insertions(+), 94 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_papr.c

diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
index 65a840abf1bb..116126edd8e2 100644
--- a/arch/powerpc/include/asm/guest-state-buffer.h
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -5,6 +5,7 @@
 #ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
 #define _ASM_POWERPC_GUEST_STATE_BUFFER_H
 
+#include "asm/hvcall.h"
 #include <linux/gfp.h>
 #include <linux/bitmap.h>
 #include <asm/plpar_wrappers.h>
@@ -14,16 +15,16 @@
  **************************************************************************/
 #define GSID_BLANK			0x0000
 
-#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
-#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
-#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
-#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
-#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
-#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
+#define GSID_HOST_STATE_SIZE		0x0001
+#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002
+#define GSID_LOGICAL_PVR		0x0003
+#define GSID_TB_OFFSET			0x0004
+#define GSID_PARTITION_TABLE		0x0005
+#define GSID_PROCESS_TABLE		0x0006
 
-#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
-#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
-#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
+#define GSID_RUN_INPUT			0x0C00
+#define GSID_RUN_OUTPUT			0x0C01
+#define GSID_VPA			0x0C02
 
 #define GSID_GPR(x)			(0x1000 + (x))
 #define GSID_HDEC_EXPIRY_TB		0x1020
@@ -300,6 +301,8 @@ struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
 			unsigned long vcpu_id, gfp_t flags);
 void gsb_free(struct gs_buff *gsb);
 void *gsb_put(struct gs_buff *gsb, size_t size);
+int gsb_send(struct gs_buff *gsb, unsigned long flags);
+int gsb_recv(struct gs_buff *gsb, unsigned long flags);
 
 /**
  * gsb_header() - the header of a guest state buffer
@@ -898,4 +901,88 @@ static inline void gsm_reset(struct gs_msg *gsm)
 	gsbm_zero(&gsm->bitmap);
 }
 
+/**
+ * gsb_receive_data - flexibly update values from a guest state buffer
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ *
+ * Requests updated values for the guest state values included in the guest
+ * state message. The guest state message will then deserialize the guest state
+ * buffer.
+ */
+static inline int gsb_receive_data(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	int rc;
+
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+
+	rc = gsb_recv(gsb, gsm->flags);
+	if (rc < 0)
+		return rc;
+
+	rc = gsm_refresh_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+
+/**
+ * gsb_receive_datum - receive a single guest state ID
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ * @iden: guest state identity
+ */
+static inline int gsb_receive_datum(struct gs_buff *gsb, struct gs_msg *gsm,
+				    u16 iden)
+{
+	int rc;
+
+	gsm_include(gsm, iden);
+	rc = gsb_receive_data(gsb, gsm);
+	if (rc < 0)
+		return rc;
+	gsm_reset(gsm);
+	return 0;
+}
+
+/**
+ * gsb_send_data - flexibly send values from a guest state buffer
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ *
+ * Sends the guest state values included in the guest state message.
+ */
+static inline int gsb_send_data(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	int rc;
+
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+	rc = gsb_send(gsb, gsm->flags);
+
+	return rc;
+}
+
+/**
+ * gsb_send_datum - send a single guest state ID
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ * @iden: guest state identity
+ */
+static inline int gsb_send_datum(struct gs_buff *gsb, struct gs_msg *gsm,
+				 u16 iden)
+{
+	int rc;
+
+	gsm_include(gsm, iden);
+	rc = gsb_send_data(gsb, gsm);
+	if (rc < 0)
+		return rc;
+	gsm_reset(gsm);
+	return 0;
+}
+
 #endif /* _ASM_POWERPC_GUEST_STATE_BUFFER_H */
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index c099780385dd..ddb99e982917 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -100,6 +100,18 @@
 #define H_COP_HW	-74
 #define H_STATE		-75
 #define H_IN_USE	-77
+
+#define H_INVALID_ELEMENT_ID			-79
+#define H_INVALID_ELEMENT_SIZE			-80
+#define H_INVALID_ELEMENT_VALUE			-81
+#define H_INPUT_BUFFER_NOT_DEFINED		-82
+#define H_INPUT_BUFFER_TOO_SMALL		-83
+#define H_OUTPUT_BUFFER_NOT_DEFINED		-84
+#define H_OUTPUT_BUFFER_TOO_SMALL		-85
+#define H_PARTITION_PAGE_TABLE_NOT_DEFINED	-86
+#define H_GUEST_VCPU_STATE_NOT_HV_OWNED		-87
+
 #define H_UNSUPPORTED_FLAG_START	-256
 #define H_UNSUPPORTED_FLAG_END		-511
 #define H_MULTI_THREADS_ACTIVE	-9005
@@ -381,6 +393,15 @@
 #define H_ENTER_NESTED		0xF804
 #define H_TLB_INVALIDATE	0xF808
 #define H_COPY_TOFROM_GUEST	0xF80C
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE		0x470
+#define H_GUEST_CREATE_VCPU	0x474
+#define H_GUEST_GET_STATE	0x478
+#define H_GUEST_SET_STATE	0x47C
+#define H_GUEST_RUN_VCPU	0x480
+#define H_GUEST_COPY_MEMORY	0x484
+#define H_GUEST_DELETE		0x488
 
 /* Flags for H_SVM_PAGE_IN */
 #define H_PAGE_IN_SHARED        0x1
@@ -467,6 +488,15 @@
 #define H_RPTI_PAGE_1G	0x08
 #define H_RPTI_PAGE_ALL (-1UL)
 
+/* Flags for H_GUEST_{S,G}_STATE */
+#define H_GUEST_FLAGS_WIDE     (1UL<<(63-0))
+
+/* Flag values used for H_GUEST_{S,G}ET_CAPABILITIES */
+#define H_GUEST_CAP_COPY_MEM	(1UL<<(63-0))
+#define H_GUEST_CAP_POWER9	(1UL<<(63-1))
+#define H_GUEST_CAP_POWER10	(1UL<<(63-2))
+#define H_GUEST_CAP_BITMAP2	(1UL<<(63-63))
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 0ca2d8b37b42..c5c57552b447 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_book3s_asm.h>
+#include <asm/guest-state-buffer.h>
 
 struct kvmppc_bat {
 	u64 raw;
@@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
 
 void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+
+extern bool __kvmhv_on_papr;
+
+static inline bool kvmhv_on_papr(void)
+{
+	return __kvmhv_on_papr;
+}
+
+#else
+
+static inline bool kvmhv_on_papr(void)
+{
+	return false;
+}
+
+#endif
+
+int __kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
+int __kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
+int __kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden);
+int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden);
+
+static inline int kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_reload_ptregs(vcpu, regs);
+	return 0;
+}
+static inline int kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_mark_dirty_ptregs(vcpu, regs);
+	return 0;
+}
+
+static inline int kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_mark_dirty(vcpu, iden);
+	return 0;
+}
+
+static inline int kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_cached_reload(vcpu, iden);
+	return 0;
+}
+
 extern int kvm_irq_bypass;
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
@@ -335,70 +387,84 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
 	vcpu->arch.regs.gpr[num] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_GPR(num));
 }
 
 static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_GPR(num));
 	return vcpu->arch.regs.gpr[num];
 }
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.regs.ccr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_CR);
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_CR);
 	return vcpu->arch.regs.ccr;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.xer = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_XER);
 }
 
 static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_XER);
 	return vcpu->arch.regs.xer;
 }
 
 static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.ctr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_CTR);
 }
 
 static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_CTR);
 	return vcpu->arch.regs.ctr;
 }
 
 static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.link = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_LR);
 }
 
 static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_LR);
 	return vcpu->arch.regs.link;
 }
 
 static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.nip = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
 }
 
 static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_NIA);
 	return vcpu->arch.regs.nip;
 }
 
 static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.pid = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_PIDR);
 }
 
 static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_PIDR);
 	return vcpu->arch.pid;
 }
 
@@ -415,111 +481,129 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 
 static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(i));
 	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
 }
 
 static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
 {
 	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(i));
 }
 
 static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_FPSCR);
 	return vcpu->arch.fp.fpscr;
 }
 
 static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
 {
 	vcpu->arch.fp.fpscr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_FPSCR);
 }
 
 
 static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(i));
 	return vcpu->arch.fp.fpr[i][j];
 }
 
 static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 val)
 {
 	vcpu->arch.fp.fpr[i][j] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(i));
 }
 
 #ifdef CONFIG_VSX
 static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(32 + i));
 	return vcpu->arch.vr.vr[i];
 }
 
 static inline void kvmppc_set_vsx_vr(struct kvm_vcpu *vcpu, int i, vector128 val)
 {
 	vcpu->arch.vr.vr[i] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(32 + i));
 }
 
 static inline u32 kvmppc_get_vscr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSCR);
 	return vcpu->arch.vr.vscr.u[3];
 }
 
 static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.vr.vscr.u[3] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSCR);
 }
 #endif
 
-#define BOOK3S_WRAPPER_SET(reg, size)					\
+#define BOOK3S_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 									\
 	vcpu->arch.reg = val;						\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define BOOK3S_WRAPPER_GET(reg, size)					\
+#define BOOK3S_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.reg;						\
 }
 
-#define BOOK3S_WRAPPER(reg, size)					\
-	BOOK3S_WRAPPER_SET(reg, size)					\
-	BOOK3S_WRAPPER_GET(reg, size)					\
+#define BOOK3S_WRAPPER(reg, size, iden)					\
+	BOOK3S_WRAPPER_SET(reg, size, iden)				\
+	BOOK3S_WRAPPER_GET(reg, size, iden)				\
 
-BOOK3S_WRAPPER(tar, 64)
-BOOK3S_WRAPPER(ebbhr, 64)
-BOOK3S_WRAPPER(ebbrr, 64)
-BOOK3S_WRAPPER(bescr, 64)
-BOOK3S_WRAPPER(ic, 64)
-BOOK3S_WRAPPER(vrsave, 64)
+BOOK3S_WRAPPER(tar, 64, GSID_TAR)
+BOOK3S_WRAPPER(ebbhr, 64, GSID_EBBHR)
+BOOK3S_WRAPPER(ebbrr, 64, GSID_EBBRR)
+BOOK3S_WRAPPER(bescr, 64, GSID_BESCR)
+BOOK3S_WRAPPER(ic, 64, GSID_IC)
+BOOK3S_WRAPPER(vrsave, 64, GSID_VRSAVE)
 
 
-#define VCORE_WRAPPER_SET(reg, size)					\
+#define VCORE_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	vcpu->arch.vcore->reg = val;					\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define VCORE_WRAPPER_GET(reg, size)					\
+#define VCORE_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.vcore->reg;					\
 }
 
-#define VCORE_WRAPPER(reg, size)					\
-	VCORE_WRAPPER_SET(reg, size)					\
-	VCORE_WRAPPER_GET(reg, size)					\
+#define VCORE_WRAPPER(reg, size, iden)					\
+	VCORE_WRAPPER_SET(reg, size, iden)				\
+	VCORE_WRAPPER_GET(reg, size, iden)				\
 
 
-VCORE_WRAPPER(vtb, 64)
-VCORE_WRAPPER(tb_offset, 64)
-VCORE_WRAPPER(lpcr, 64)
+VCORE_WRAPPER(vtb, 64, GSID_VTB)
+VCORE_WRAPPER(tb_offset, 64, GSID_TB_OFFSET)
+VCORE_WRAPPER(lpcr, 64, GSID_LPCR)
 
 static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_TB_OFFSET);
+	kvmhv_papr_cached_reload(vcpu, GSID_DEC_EXPIRY_TB);
 	return vcpu->arch.dec_expires;
 }
 
 static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
 {
 	vcpu->arch.dec_expires = val;
+	kvmhv_papr_cached_reload(vcpu, GSID_TB_OFFSET);
+	kvmhv_papr_mark_dirty(vcpu, GSID_DEC_EXPIRY_TB);
 }
 
 /* Expiry time of vcpu DEC relative to host TB */
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index d49065af08e9..689e14284127 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -677,6 +677,12 @@ static inline pte_t *find_kvm_host_pte(struct kvm *kvm, unsigned long mmu_seq,
 extern pte_t *find_kvm_nested_guest_pte(struct kvm *kvm, unsigned long lpid,
 					unsigned long ea, unsigned *hshift);
 
+int kvmhv_papr_vcpu_create(struct kvm_vcpu *vcpu, struct kvmhv_papr_host *nested_state);
+void kvmhv_papr_vcpu_free(struct kvm_vcpu *vcpu, struct kvmhv_papr_host *nested_state);
+int kvmhv_papr_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit);
+int kvmhv_papr_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
+int kvmhv_papr_parse_output(struct kvm_vcpu *vcpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..21e8bf9e530a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
 #include <asm/mce.h>
+#include <asm/guest-state-buffer.h>
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
@@ -509,6 +510,23 @@ union xive_tma_w01 {
 	__be64 w01;
 };
 
+/* Nested PAPR host H_GUEST_RUN_VCPU configuration */
+struct kvmhv_papr_config {
+	struct gs_buff_info vcpu_run_output_cfg;
+	struct gs_buff_info vcpu_run_input_cfg;
+	u64 vcpu_run_output_size;
+};
+
+/* Nested PAPR host state */
+struct kvmhv_papr_host {
+	struct kvmhv_papr_config cfg;
+	struct gs_buff *vcpu_run_output;
+	struct gs_buff *vcpu_run_input;
+	struct gs_msg *vcpu_message;
+	struct gs_msg *vcore_message;
+	struct gs_bitmap valids;
+};
+
 struct kvm_vcpu_arch {
 	ulong host_stack;
 	u32 host_pid;
@@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
 	ulong dscr;
 	ulong amr;
 	ulong uamor;
+	ulong amor;
 	ulong iamr;
 	u32 ctrl;
 	u32 dabrx;
@@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
 	u64 nested_hfscr;	/* HFSCR that the L1 requested for the nested guest */
 	u32 nested_vcpu_id;
 	gpa_t nested_io_gpr;
+	/* For nested APIv2 guests */
+	struct kvmhv_papr_host papr_host;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fbac353ac46b..4d43bb29ba7c 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -615,6 +615,35 @@ static inline bool kvmhv_on_pseries(void)
 {
 	return false;
 }
+
+#endif
+
+#ifndef CONFIG_PPC_BOOK3S
+
+static inline bool kvmhv_on_papr(void)
+{
+	return false;
+}
+
+static inline int kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	return 0;
+}
+static inline int kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	return 0;
+}
+
+static inline int kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	return 0;
+}
+
+static inline int kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	return 0;
+}
+
 #endif
 
 #ifdef CONFIG_KVM_XICS
@@ -957,31 +986,33 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
 }									\
 
-#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
+#define SHARED_CACHE_WRAPPER_GET(reg, size, iden)			\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
 	else								\
 	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
 }									\
 
-#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
+#define SHARED_CACHE_WRAPPER_SET(reg, size, iden)			\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
 	else								\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }									\
 
 #define SHARED_WRAPPER(reg, size)					\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
-#define SHARED_CACHE_WRAPPER(reg, size)					\
-	SHARED_CACHE_WRAPPER_GET(reg, size)				\
-	SHARED_CACHE_WRAPPER_SET(reg, size)				\
+#define SHARED_CACHE_WRAPPER(reg, size, iden)				\
+	SHARED_CACHE_WRAPPER_GET(reg, size, iden)			\
+	SHARED_CACHE_WRAPPER_SET(reg, size, iden)			\
 
 #define SPRNG_WRAPPER(reg, bookehv_spr)					\
 	SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
@@ -1000,29 +1031,30 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SHARED_WRAPPER(reg, size)					\
 
-#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
-	SHARED_CACHE_WRAPPER(reg, size)					\
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr, iden)	\
+	SHARED_CACHE_WRAPPER(reg, size, iden)				\
 
 #endif
 
 SHARED_WRAPPER(critical, 64)
-SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR)
+SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0, GSID_SPRG0)
+SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1, GSID_SPRG1)
+SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2, GSID_SPRG2)
+SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3, GSID_SPRG3)
+SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0, GSID_SRR0)
+SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1, GSID_SRR1)
+SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR, GSID_DAR)
 SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
-SHARED_CACHE_WRAPPER_GET(msr, 64)
+SHARED_CACHE_WRAPPER_GET(msr, 64, GSID_MSR)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
 	if (kvmppc_shared_big_endian(vcpu))
 	       vcpu->arch.shared->msr = cpu_to_be64(val);
 	else
 	       vcpu->arch.shared->msr = cpu_to_le64(val);
+	kvmhv_papr_mark_dirty(vcpu, GSID_MSR);
 }
-SHARED_CACHE_WRAPPER(dsisr, 32)
+SHARED_CACHE_WRAPPER(dsisr, 32, GSID_DSISR)
 SHARED_WRAPPER(int_pending, 32)
 SHARED_WRAPPER(sprg4, 64)
 SHARED_WRAPPER(sprg5, 64)
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 8239c0af5eb2..b48f90884522 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -6,6 +6,7 @@
 
 #include <linux/string.h>
 #include <linux/irqflags.h>
+#include <linux/delay.h>
 
 #include <asm/hvcall.h>
 #include <asm/paca.h>
@@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
 	return rc;
 }
 
+static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	unsigned long token;
+	long rc;
+
+	token = -1UL;
+	while (true) {
+		rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
+		if (rc == H_SUCCESS) {
+			*guest_id = retbuf[0];
+			break;
+		}
+
+		if (rc == H_BUSY) {
+			token = retbuf[0];
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			token = retbuf[0];
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_create_vcpu(unsigned long flags,
+					   unsigned long guest_id,
+					   unsigned long vcpu_id)
+{
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_GUEST_CREATE_VCPU, 0, guest_id, vcpu_id);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_set_state(unsigned long flags,
+					 unsigned long guest_id,
+					 unsigned long vcpu_id,
+					 unsigned long data_buffer,
+					 unsigned long data_size,
+					 unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall(H_GUEST_SET_STATE, retbuf, flags, guest_id,
+				 vcpu_id, data_buffer, data_size);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		if (rc == H_INVALID_ELEMENT_ID ||
+		    rc == H_INVALID_ELEMENT_SIZE ||
+		    rc == H_INVALID_ELEMENT_VALUE)
+			*failed_index = retbuf[0];
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_get_state(unsigned long flags,
+					 unsigned long guest_id,
+					 unsigned long vcpu_id,
+					 unsigned long data_buffer,
+					 unsigned long data_size,
+					 unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall(H_GUEST_GET_STATE, retbuf, flags, guest_id,
+				 vcpu_id, data_buffer, data_size);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		if (rc == H_INVALID_ELEMENT_ID ||
+		    rc == H_INVALID_ELEMENT_SIZE ||
+		    rc == H_INVALID_ELEMENT_VALUE)
+			*failed_index = retbuf[0];
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_run_vcpu(unsigned long flags, unsigned long guest_id,
+					unsigned long vcpu_id, int *trap,
+					unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_RUN_VCPU, retbuf, flags, guest_id, vcpu_id);
+	if (rc == H_SUCCESS)
+		*trap = retbuf[0];
+	else if (rc == H_INVALID_ELEMENT_ID ||
+		 rc == H_INVALID_ELEMENT_SIZE ||
+		 rc == H_INVALID_ELEMENT_VALUE)
+		*failed_index = retbuf[0];
+
+	return rc;
+}
+
+static inline long plpar_guest_delete(unsigned long flags, u64 guest_id)
+{
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_GUEST_DELETE, flags, guest_id);
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_set_capabilities(unsigned long flags,
+						unsigned long capabilities)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_SET_CAPABILITIES, retbuf, flags, capabilities);
+
+	return rc;
+}
+
+static inline long plpar_guest_get_capabilities(unsigned long flags,
+						unsigned long *capabilities)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_GET_CAPABILITIES, retbuf, flags);
+	if (rc == H_SUCCESS)
+		*capabilities = retbuf[0];
+
+	return rc;
+}
+
 /*
  * Wrapper to H_RPT_INVALIDATE hcall that handles return values appropriately
  *
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index eb8445e71c14..9bb0876521ee 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -87,6 +87,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
 	book3s_hv_p9_perf.o \
+	book3s_hv_papr.o \
 	guest-state-buffer.o \
 	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 521d84621422..f22ee582e209 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+}
+
 /* Dummy value used in computing PCR value below */
 #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
 
@@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 			return RESUME_HOST;
 		break;
 #endif
-	case H_RANDOM:
+	case H_RANDOM: {
 		unsigned long rand;
 
 		if (!arch_get_random_seed_longs(&rand, 1))
 			ret = H_HARDWARE;
 		kvmppc_set_gpr(vcpu, 4, rand);
 		break;
+	}
 	case H_RPT_INVALIDATE:
 		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
 					      kvmppc_get_gpr(vcpu, 5),
@@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.shared_big_endian = false;
 #endif
 #endif
-	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
 
+	if (kvmhv_on_papr()) {
+		err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
+		if (err < 0)
+			return err;
+	}
+
+	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
 	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
 		kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
 		kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
 	}
 
 	kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
+	kvmppc_set_amor_hv(vcpu, ~0);
 	/* default to host PVR, since we can't spoof it */
 	kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
 	spin_lock_init(&vcpu->arch.vpa_update_lock);
@@ -3006,6 +3019,8 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 			kvm->arch.vcores[core] = vcore;
 			kvm->arch.online_vcores++;
 			mutex_unlock(&kvm->arch.mmu_setup_lock);
+			if (kvmhv_on_papr())
+				kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
 		}
 	}
 	mutex_unlock(&kvm->lock);
@@ -3078,6 +3093,8 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 	unpin_vpa(vcpu->kvm, &vcpu->arch.slb_shadow);
 	unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
+	if (kvmhv_on_papr())
+		kvmhv_papr_vcpu_free(vcpu, &vcpu->arch.papr_host);
 }
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
@@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
 	}
 }
 
+static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
+{
+	struct kvmhv_papr_host *ph;
+	unsigned long msr, i;
+	int trap;
+	long rc;
+
+	ph = &vcpu->arch.papr_host;
+
+	msr = mfmsr();
+	kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
+	if (lazy_irq_pending())
+		return 0;
+
+	kvmhv_papr_flush_vcpu(vcpu, time_limit);
+
+	accumulate_time(vcpu, &vcpu->arch.in_guest);
+	rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
+				  &trap, &i);
+
+	if (rc != H_SUCCESS) {
+		pr_err("KVM Guest Run VCPU hcall failed\n");
+		if (rc == H_INVALID_ELEMENT_ID)
+			pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
+		else if (rc == H_INVALID_ELEMENT_SIZE)
+			pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
+		else if (rc == H_INVALID_ELEMENT_VALUE)
+			pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
+		return 0;
+	}
+	accumulate_time(vcpu, &vcpu->arch.guest_exit);
+
+	*tb = mftb();
+	gsm_reset(ph->vcpu_message);
+	gsm_reset(ph->vcore_message);
+	gsbm_zero(&ph->valids);
+
+	kvmhv_papr_parse_output(vcpu);
+
+	timer_rearm_host_dec(*tb);
+
+	return trap;
+}
+
 /* call our hypervisor to load up HV regs and go */
 static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
 {
@@ -4159,7 +4220,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	vcpu_vpa_increment_dispatch(vcpu);
 
 	if (kvmhv_on_pseries()) {
-		trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
+		if (!kvmhv_on_papr())
+			trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
+		else
+			trap = kvmhv_vcpu_entry_papr(vcpu, time_limit, lpcr, tb);
 
 		/* H_CEDE has to be handled now, not later */
 		if (trap == BOOK3S_INTERRUPT_SYSCALL && !nested &&
@@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
  */
 void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 {
+	struct kvm_vcpu *vcpu;
 	long int i;
 	u32 cores_done = 0;
 
@@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 		if (++cores_done >= kvm->arch.online_vcores)
 			break;
 	}
+
+	if (kvmhv_on_papr()) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
+		}
+	}
 }
 
 void kvmppc_setup_partition_table(struct kvm *kvm)
@@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	/* Allocate the guest's logical partition ID */
 
-	lpid = kvmppc_alloc_lpid();
-	if ((long)lpid < 0)
-		return -ENOMEM;
-	kvm->arch.lpid = lpid;
+	if (!kvmhv_on_papr()) {
+		lpid = kvmppc_alloc_lpid();
+		if ((long)lpid < 0)
+			return -ENOMEM;
+		kvm->arch.lpid = lpid;
+	}
 
 	kvmppc_alloc_host_rm_ops();
 
 	kvmhv_vm_nested_init(kvm);
 
+	if (kvmhv_on_papr()) {
+		long rc;
+		unsigned long guest_id;
+
+		rc = plpar_guest_create(0, &guest_id);
+
+		if (rc != H_SUCCESS)
+			pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
+
+		switch (rc) {
+		case H_PARAMETER:
+		case H_FUNCTION:
+		case H_STATE:
+			return -EINVAL;
+		case H_NOT_ENOUGH_RESOURCES:
+		case H_ABORTED:
+			return -ENOMEM;
+		case H_AUTHORITY:
+			return -EPERM;
+		case H_NOT_AVAILABLE:
+			return -EBUSY;
+		}
+		kvm->arch.lpid = guest_id;
+	}
+
 	/*
 	 * Since we don't flush the TLB when tearing down a VM,
 	 * and this lpid might have previously been used,
@@ -5483,7 +5582,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 			lpcr |= LPCR_HAIL;
 		ret = kvmppc_init_vm_radix(kvm);
 		if (ret) {
-			kvmppc_free_lpid(kvm->arch.lpid);
+			if (kvmhv_on_papr())
+				plpar_guest_delete(0, kvm->arch.lpid);
+			else
+				kvmppc_free_lpid(kvm->arch.lpid);
 			return ret;
 		}
 		kvmppc_setup_partition_table(kvm);
@@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 		kvm->arch.process_table = 0;
 		if (kvm->arch.secure_guest)
 			uv_svm_terminate(kvm->arch.lpid);
-		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
+		if (!kvmhv_on_papr())
+			kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
 	}
 
-	kvmppc_free_lpid(kvm->arch.lpid);
+	if (kvmhv_on_papr())
+		plpar_guest_delete(0, kvm->arch.lpid);
+	else
+		kvmppc_free_lpid(kvm->arch.lpid);
 
 	kvmppc_free_pimap(kvm);
 }
diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 7a7005189ab1..61d2c2b8d084 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -3,6 +3,8 @@
 /*
  * Privileged (non-hypervisor) host registers to save.
  */
+#include "asm/guest-state-buffer.h"
+
 struct p9_host_os_sprs {
 	unsigned long iamr;
 	unsigned long amr;
@@ -51,61 +53,65 @@ void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
 #define end_timing(vcpu) do {} while (0)
 #endif
 
-#define HV_WRAPPER_SET(reg, size)					\
+#define HV_WRAPPER_SET(reg, size, iden)					\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	vcpu->arch.reg = val;						\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define HV_WRAPPER_GET(reg, size)					\
+#define HV_WRAPPER_GET(reg, size, iden)					\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.reg;						\
 }
 
-#define HV_WRAPPER(reg, size)						\
-	HV_WRAPPER_SET(reg, size)					\
-	HV_WRAPPER_GET(reg, size)					\
+#define HV_WRAPPER(reg, size, iden)					\
+	HV_WRAPPER_SET(reg, size, iden)					\
+	HV_WRAPPER_GET(reg, size, iden)					\
 
-#define HV_ARRAY_WRAPPER_SET(reg, size)					\
+#define HV_ARRAY_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, int i, u##size val)	\
 {									\
 	vcpu->arch.reg[i] = val;					\
+	kvmhv_papr_mark_dirty(vcpu, iden(i));				\
 }
 
-#define HV_ARRAY_WRAPPER_GET(reg, size)					\
+#define HV_ARRAY_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu, int i)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden(i));			\
 	return vcpu->arch.reg[i];					\
 }
 
-#define HV_ARRAY_WRAPPER(reg, size)					\
-	HV_ARRAY_WRAPPER_SET(reg, size)					\
-	HV_ARRAY_WRAPPER_GET(reg, size)					\
+#define HV_ARRAY_WRAPPER(reg, size, iden)				\
+	HV_ARRAY_WRAPPER_SET(reg, size, iden)				\
+	HV_ARRAY_WRAPPER_GET(reg, size, iden)				\
 
-HV_WRAPPER(mmcra, 64)
-HV_WRAPPER(hfscr, 64)
-HV_WRAPPER(fscr, 64)
-HV_WRAPPER(dscr, 64)
-HV_WRAPPER(purr, 64)
-HV_WRAPPER(spurr, 64)
-HV_WRAPPER(amr, 64)
-HV_WRAPPER(uamor, 64)
-HV_WRAPPER(siar, 64)
-HV_WRAPPER(sdar, 64)
-HV_WRAPPER(iamr, 64)
-HV_WRAPPER(dawr0, 64)
-HV_WRAPPER(dawr1, 64)
-HV_WRAPPER(dawrx0, 64)
-HV_WRAPPER(dawrx1, 64)
-HV_WRAPPER(ciabr, 64)
-HV_WRAPPER(wort, 64)
-HV_WRAPPER(ppr, 64)
-HV_WRAPPER(ctrl, 64)
+HV_WRAPPER(mmcra, 64, GSID_MMCRA)
+HV_WRAPPER(hfscr, 64, GSID_HFSCR)
+HV_WRAPPER(fscr, 64, GSID_FSCR)
+HV_WRAPPER(dscr, 64, GSID_DSCR)
+HV_WRAPPER(purr, 64, GSID_PURR)
+HV_WRAPPER(spurr, 64, GSID_SPURR)
+HV_WRAPPER(amr, 64, GSID_AMR)
+HV_WRAPPER(uamor, 64, GSID_UAMOR)
+HV_WRAPPER(siar, 64, GSID_SIAR)
+HV_WRAPPER(sdar, 64, GSID_SDAR)
+HV_WRAPPER(iamr, 64, GSID_IAMR)
+HV_WRAPPER(dawr0, 64, GSID_DAWR0)
+HV_WRAPPER(dawr1, 64, GSID_DAWR1)
+HV_WRAPPER(dawrx0, 64, GSID_DAWRX0)
+HV_WRAPPER(dawrx1, 64, GSID_DAWRX1)
+HV_WRAPPER(ciabr, 64, GSID_CIABR)
+HV_WRAPPER(wort, 64, GSID_WORT)
+HV_WRAPPER(ppr, 64, GSID_PPR)
+HV_WRAPPER(ctrl, 64, GSID_CTRL)
+HV_WRAPPER(amor, 64, GSID_AMOR)
 
-HV_ARRAY_WRAPPER(mmcr, 64)
-HV_ARRAY_WRAPPER(sier, 64)
-HV_ARRAY_WRAPPER(pmc, 32)
+HV_ARRAY_WRAPPER(mmcr, 64, GSID_MMCR)
+HV_ARRAY_WRAPPER(sier, 64, GSID_SIER)
+HV_ARRAY_WRAPPER(pmc, 32, GSID_PMC)
 
-HV_WRAPPER(pvr, 32)
-HV_WRAPPER(pspb, 32)
+HV_WRAPPER(pspb, 32, GSID_PSPB)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 377d0b4a05ee..62e011d1e912 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -428,10 +428,12 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	return vcpu->arch.trap;
 }
 
+static unsigned long nested_capabilities;
+
 long kvmhv_nested_init(void)
 {
 	long int ptb_order;
-	unsigned long ptcr;
+	unsigned long ptcr, host_capabilities;
 	long rc;
 
 	if (!kvmhv_on_pseries())
@@ -439,6 +441,27 @@ long kvmhv_nested_init(void)
 	if (!radix_enabled())
 		return -ENODEV;
 
+	rc = plpar_guest_get_capabilities(0, &host_capabilities);
+	if (rc == H_SUCCESS) {
+		unsigned long capabilities = 0;
+
+		if (cpu_has_feature(CPU_FTR_ARCH_31))
+			capabilities |= H_GUEST_CAP_POWER10;
+		if (cpu_has_feature(CPU_FTR_ARCH_300))
+			capabilities |= H_GUEST_CAP_POWER9;
+
+		nested_capabilities = capabilities & host_capabilities;
+		rc = plpar_guest_set_capabilities(0, nested_capabilities);
+		if (rc != H_SUCCESS) {
+			pr_err("kvm-hv: Could not configure parent hypervisor capabilities (rc=%ld)",
+			       rc);
+			return -ENODEV;
+		}
+
+		__kvmhv_on_papr = true;
+		return 0;
+	}
+
 	/* Partition table entry is 1<<4 bytes in size, hence the 4. */
 	ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4;
 	/* Minimum partition table size is 1<<12 bytes */
@@ -507,10 +530,15 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1)
 		return;
 	}
 
-	pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
-	pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
-	/* L0 will do the necessary barriers */
-	kvmhv_flush_lpid(lpid);
+	if (!kvmhv_on_papr()) {
+		pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+		pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+		/* L0 will do the necessary barriers */
+		kvmhv_flush_lpid(lpid);
+	} else {
+		kvmhv_papr_set_ptbl_entry(lpid, dw0, dw1);
+	}
 }
 
 static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp)
diff --git a/arch/powerpc/kvm/book3s_hv_papr.c b/arch/powerpc/kvm/book3s_hv_papr.c
new file mode 100644
index 000000000000..05d8e735e2a9
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_papr.c
@@ -0,0 +1,940 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2023 Jordan Niethe, IBM Corp. <jniethe5@gmail.com>
+ *
+ * Authors:
+ *    Jordan Niethe <jniethe5@gmail.com>
+ *
+ * Description: KVM functions specific to running on Book 3S
+ * processors as a PAPR guest.
+ *
+ */
+
+#include "linux/blk-mq.h"
+#include "linux/console.h"
+#include "linux/gfp_types.h"
+#include "linux/signal.h"
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/pgtable.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/hvcall.h>
+#include <asm/pgalloc.h>
+#include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
+#include <asm/guest-state-buffer.h>
+#include "trace_hv.h"
+
+bool __kvmhv_on_papr __read_mostly;
+EXPORT_SYMBOL_GPL(__kvmhv_on_papr);
+
+static size_t gs_msg_ops_kvmhv_papr_config_get_size(struct gs_msg *gsm)
+{
+	u16 ids[] = {
+		GSID_RUN_OUTPUT_MIN_SIZE,
+		GSID_RUN_INPUT,
+		GSID_RUN_OUTPUT,
+	};
+	size_t size = 0;
+
+	for (int i = 0; i < ARRAY_SIZE(ids); i++)
+		size += gse_total_size(gsid_size(ids[i]));
+	return size;
+}
+
+static int gs_msg_ops_kvmhv_papr_config_fill_info(struct gs_buff *gsb,
+						  struct gs_msg *gsm)
+{
+	struct kvmhv_papr_config *cfg;
+	int rc;
+
+	cfg = gsm->data;
+
+	if (gsm_includes(gsm, GSID_RUN_OUTPUT_MIN_SIZE)) {
+		rc = gse_put(gsb, GSID_RUN_OUTPUT_MIN_SIZE,
+			     cfg->vcpu_run_output_size);
+		if (rc < 0)
+			return rc;
+	}
+
+	if (gsm_includes(gsm, GSID_RUN_INPUT)) {
+		rc = gse_put(gsb, GSID_RUN_INPUT, cfg->vcpu_run_input_cfg);
+		if (rc < 0)
+			return rc;
+	}
+
+	if (gsm_includes(gsm, GSID_RUN_OUTPUT)) {
+		rc = gse_put(gsb, GSID_RUN_OUTPUT, cfg->vcpu_run_output_cfg);
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int gs_msg_ops_kvmhv_papr_config_refresh_info(struct gs_msg *gsm,
+						     struct gs_buff *gsb)
+{
+	struct kvmhv_papr_config *cfg;
+	struct gs_parser gsp = { 0 };
+	struct gs_elem *gse;
+	int rc;
+
+	cfg = gsm->data;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	gse = gsp_lookup(&gsp, GSID_RUN_OUTPUT_MIN_SIZE);
+	if (gse)
+		gse_get(gse, &cfg->vcpu_run_output_size);
+	return 0;
+}
+
+static struct gs_msg_ops config_msg_ops = {
+	.get_size = gs_msg_ops_kvmhv_papr_config_get_size,
+	.fill_info = gs_msg_ops_kvmhv_papr_config_fill_info,
+	.refresh_info = gs_msg_ops_kvmhv_papr_config_refresh_info,
+};
+
+static size_t gs_msg_ops_vcpu_get_size(struct gs_msg *gsm)
+{
+	struct gs_bitmap gsbm = { 0 };
+	size_t size = 0;
+	u16 iden;
+
+	gsbm_fill(&gsbm);
+	gsbm_for_each(&gsbm, iden) {
+		switch (iden) {
+		case GSID_HOST_STATE_SIZE:
+		case GSID_RUN_OUTPUT_MIN_SIZE:
+		case GSID_PARTITION_TABLE:
+		case GSID_PROCESS_TABLE:
+		case GSID_RUN_INPUT:
+		case GSID_RUN_OUTPUT:
+			break;
+		default:
+			size += gse_total_size(gsid_size(iden));
+		}
+	}
+	return size;
+}
+
+static int gs_msg_ops_vcpu_fill_info(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	struct kvm_vcpu *vcpu;
+	vector128 v;
+	int rc, i;
+	u16 iden;
+
+	vcpu = gsm->data;
+
+	gsm_for_each(gsm, iden) {
+		rc = 0;
+
+		if ((gsm->flags & GS_FLAGS_WIDE) !=
+		    (gsid_flags(iden) & GS_FLAGS_WIDE))
+			continue;
+
+		switch (iden) {
+		case GSID_DSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.dscr);
+			break;
+		case GSID_MMCRA:
+			rc = gse_put(gsb, iden, vcpu->arch.mmcra);
+			break;
+		case GSID_HFSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.hfscr);
+			break;
+		case GSID_PURR:
+			rc = gse_put(gsb, iden, vcpu->arch.purr);
+			break;
+		case GSID_SPURR:
+			rc = gse_put(gsb, iden, vcpu->arch.spurr);
+			break;
+		case GSID_AMR:
+			rc = gse_put(gsb, iden, vcpu->arch.amr);
+			break;
+		case GSID_UAMOR:
+			rc = gse_put(gsb, iden, vcpu->arch.uamor);
+			break;
+		case GSID_SIAR:
+			rc = gse_put(gsb, iden, vcpu->arch.siar);
+			break;
+		case GSID_SDAR:
+			rc = gse_put(gsb, iden, vcpu->arch.sdar);
+			break;
+		case GSID_IAMR:
+			rc = gse_put(gsb, iden, vcpu->arch.iamr);
+			break;
+		case GSID_DAWR0:
+			rc = gse_put(gsb, iden, vcpu->arch.dawr0);
+			break;
+		case GSID_DAWR1:
+			rc = gse_put(gsb, iden, vcpu->arch.dawr1);
+			break;
+		case GSID_DAWRX0:
+			rc = gse_put(gsb, iden, vcpu->arch.dawrx0);
+			break;
+		case GSID_DAWRX1:
+			rc = gse_put(gsb, iden, vcpu->arch.dawrx1);
+			break;
+		case GSID_CIABR:
+			rc = gse_put(gsb, iden, vcpu->arch.ciabr);
+			break;
+		case GSID_WORT:
+			rc = gse_put(gsb, iden, vcpu->arch.wort);
+			break;
+		case GSID_PPR:
+			rc = gse_put(gsb, iden, vcpu->arch.ppr);
+			break;
+		case GSID_PSPB:
+			rc = gse_put(gsb, iden, vcpu->arch.pspb);
+			break;
+		case GSID_TAR:
+			rc = gse_put(gsb, iden, vcpu->arch.tar);
+			break;
+		case GSID_FSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.fscr);
+			break;
+		case GSID_EBBHR:
+			rc = gse_put(gsb, iden, vcpu->arch.ebbhr);
+			break;
+		case GSID_EBBRR:
+			rc = gse_put(gsb, iden, vcpu->arch.ebbrr);
+			break;
+		case GSID_BESCR:
+			rc = gse_put(gsb, iden, vcpu->arch.bescr);
+			break;
+		case GSID_IC:
+			rc = gse_put(gsb, iden, vcpu->arch.ic);
+			break;
+		case GSID_CTRL:
+			rc = gse_put(gsb, iden, vcpu->arch.ctrl);
+			break;
+		case GSID_PIDR:
+			rc = gse_put(gsb, iden, vcpu->arch.pid);
+			break;
+		case GSID_AMOR:
+			rc = gse_put(gsb, iden, vcpu->arch.amor);
+			break;
+		case GSID_VRSAVE:
+			rc = gse_put(gsb, iden, vcpu->arch.vrsave);
+			break;
+		case GSID_MMCR(0) ... GSID_MMCR(3):
+			i = iden - GSID_MMCR(0);
+			rc = gse_put(gsb, iden, vcpu->arch.mmcr[i]);
+			break;
+		case GSID_SIER(0) ... GSID_SIER(2):
+			i = iden - GSID_SIER(0);
+			rc = gse_put(gsb, iden, vcpu->arch.sier[i]);
+			break;
+		case GSID_PMC(0) ... GSID_PMC(5):
+			i = iden - GSID_PMC(0);
+			rc = gse_put(gsb, iden, vcpu->arch.pmc[i]);
+			break;
+		case GSID_GPR(0) ... GSID_GPR(31):
+			i = iden - GSID_GPR(0);
+			rc = gse_put(gsb, iden, vcpu->arch.regs.gpr[i]);
+			break;
+		case GSID_CR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.ccr);
+			break;
+		case GSID_XER:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.xer);
+			break;
+		case GSID_CTR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.ctr);
+			break;
+		case GSID_LR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.link);
+			break;
+		case GSID_NIA:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.nip);
+			break;
+		case GSID_SRR0:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.srr0);
+			break;
+		case GSID_SRR1:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.srr1);
+			break;
+		case GSID_SPRG0:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg0);
+			break;
+		case GSID_SPRG1:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg1);
+			break;
+		case GSID_SPRG2:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg2);
+			break;
+		case GSID_SPRG3:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg3);
+			break;
+		case GSID_DAR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.dar);
+			break;
+		case GSID_DSISR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.dsisr);
+			break;
+		case GSID_MSR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.msr);
+			break;
+		case GSID_VTB:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->vtb);
+			break;
+		case GSID_LPCR:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->lpcr);
+			break;
+		case GSID_TB_OFFSET:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->tb_offset);
+			break;
+		case GSID_FPSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.fp.fpscr);
+			break;
+		case GSID_VSRS(0) ... GSID_VSRS(31):
+			i = iden - GSID_VSRS(0);
+			memcpy(&v, &vcpu->arch.fp.fpr[i],
+			       sizeof(vcpu->arch.fp.fpr[i]));
+			rc = gse_put(gsb, iden, v);
+			break;
+#ifdef CONFIG_VSX
+		case GSID_VSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.vr.vscr.u[3]);
+			break;
+		case GSID_VSRS(32) ... GSID_VSRS(63):
+			i = iden - GSID_VSRS(32);
+			rc = gse_put(gsb, iden, vcpu->arch.vr.vr[i]);
+			break;
+#endif
+		case GSID_DEC_EXPIRY_TB: {
+			u64 dw;
+
+			dw = vcpu->arch.dec_expires -
+			     vcpu->arch.vcore->tb_offset;
+			rc = gse_put(gsb, iden, dw);
+			break;
+		}
+		}
+
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int gs_msg_ops_vcpu_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	struct gs_parser gsp = { 0 };
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct kvm_vcpu *vcpu;
+	struct gs_elem *gse;
+	vector128 v;
+	int rc, i;
+	u16 iden;
+
+	vcpu = gsm->data;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+
+	gsp_for_each(&gsp, iden, gse) {
+		switch (iden) {
+		case GSID_DSCR:
+			gse_get(gse, &vcpu->arch.dscr);
+			break;
+		case GSID_MMCRA:
+			gse_get(gse, &vcpu->arch.mmcra);
+			break;
+		case GSID_HFSCR:
+			gse_get(gse, &vcpu->arch.hfscr);
+			break;
+		case GSID_PURR:
+			gse_get(gse, &vcpu->arch.purr);
+			break;
+		case GSID_SPURR:
+			gse_get(gse, &vcpu->arch.spurr);
+			break;
+		case GSID_AMR:
+			gse_get(gse, &vcpu->arch.amr);
+			break;
+		case GSID_UAMOR:
+			gse_get(gse, &vcpu->arch.uamor);
+			break;
+		case GSID_SIAR:
+			gse_get(gse, &vcpu->arch.siar);
+			break;
+		case GSID_SDAR:
+			gse_get(gse, &vcpu->arch.sdar);
+			break;
+		case GSID_IAMR:
+			gse_get(gse, &vcpu->arch.iamr);
+			break;
+		case GSID_DAWR0:
+			gse_get(gse, &vcpu->arch.dawr0);
+			break;
+		case GSID_DAWR1:
+			gse_get(gse, &vcpu->arch.dawr1);
+			break;
+		case GSID_DAWRX0:
+			gse_get(gse, &vcpu->arch.dawrx0);
+			break;
+		case GSID_DAWRX1:
+			gse_get(gse, &vcpu->arch.dawrx1);
+			break;
+		case GSID_CIABR:
+			gse_get(gse, &vcpu->arch.ciabr);
+			break;
+		case GSID_WORT:
+			gse_get(gse, &vcpu->arch.wort);
+			break;
+		case GSID_PPR:
+			gse_get(gse, &vcpu->arch.ppr);
+			break;
+		case GSID_PSPB:
+			gse_get(gse, &vcpu->arch.pspb);
+			break;
+		case GSID_TAR:
+			gse_get(gse, &vcpu->arch.tar);
+			break;
+		case GSID_FSCR:
+			gse_get(gse, &vcpu->arch.fscr);
+			break;
+		case GSID_EBBHR:
+			gse_get(gse, &vcpu->arch.ebbhr);
+			break;
+		case GSID_EBBRR:
+			gse_get(gse, &vcpu->arch.ebbrr);
+			break;
+		case GSID_BESCR:
+			gse_get(gse, &vcpu->arch.bescr);
+			break;
+		case GSID_IC:
+			gse_get(gse, &vcpu->arch.ic);
+			break;
+		case GSID_CTRL:
+			gse_get(gse, &vcpu->arch.ctrl);
+			break;
+		case GSID_PIDR:
+			gse_get(gse, &vcpu->arch.pid);
+			break;
+		case GSID_AMOR:
+			gse_get(gse, &vcpu->arch.amor);
+			break;
+		case GSID_VRSAVE:
+			gse_get(gse, &vcpu->arch.vrsave);
+			break;
+		case GSID_MMCR(0) ... GSID_MMCR(3):
+			i = iden - GSID_MMCR(0);
+			gse_get(gse, &vcpu->arch.mmcr[i]);
+			break;
+		case GSID_SIER(0) ... GSID_SIER(2):
+			i = iden - GSID_SIER(0);
+			gse_get(gse, &vcpu->arch.sier[i]);
+			break;
+		case GSID_PMC(0) ... GSID_PMC(5):
+			i = iden - GSID_PMC(0);
+			gse_get(gse, &vcpu->arch.pmc[i]);
+			break;
+		case GSID_GPR(0) ... GSID_GPR(31):
+			i = iden - GSID_GPR(0);
+			gse_get(gse, &vcpu->arch.regs.gpr[i]);
+			break;
+		case GSID_CR:
+			gse_get(gse, &vcpu->arch.regs.ccr);
+			break;
+		case GSID_XER:
+			gse_get(gse, &vcpu->arch.regs.xer);
+			break;
+		case GSID_CTR:
+			gse_get(gse, &vcpu->arch.regs.ctr);
+			break;
+		case GSID_LR:
+			gse_get(gse, &vcpu->arch.regs.link);
+			break;
+		case GSID_NIA:
+			gse_get(gse, &vcpu->arch.regs.nip);
+			break;
+		case GSID_SRR0:
+			gse_get(gse, &vcpu->arch.shregs.srr0);
+			break;
+		case GSID_SRR1:
+			gse_get(gse, &vcpu->arch.shregs.srr1);
+			break;
+		case GSID_SPRG0:
+			gse_get(gse, &vcpu->arch.shregs.sprg0);
+			break;
+		case GSID_SPRG1:
+			gse_get(gse, &vcpu->arch.shregs.sprg1);
+			break;
+		case GSID_SPRG2:
+			gse_get(gse, &vcpu->arch.shregs.sprg2);
+			break;
+		case GSID_SPRG3:
+			gse_get(gse, &vcpu->arch.shregs.sprg3);
+			break;
+		case GSID_DAR:
+			gse_get(gse, &vcpu->arch.shregs.dar);
+			break;
+		case GSID_DSISR:
+			gse_get(gse, &vcpu->arch.shregs.dsisr);
+			break;
+		case GSID_MSR:
+			gse_get(gse, &vcpu->arch.shregs.msr);
+			break;
+		case GSID_VTB:
+			gse_get(gse, &vcpu->arch.vcore->vtb);
+			break;
+		case GSID_LPCR:
+			gse_get(gse, &vcpu->arch.vcore->lpcr);
+			break;
+		case GSID_TB_OFFSET:
+			gse_get(gse, &vcpu->arch.vcore->tb_offset);
+			break;
+		case GSID_FPSCR:
+			gse_get(gse, &vcpu->arch.fp.fpscr);
+			break;
+		case GSID_VSRS(0) ... GSID_VSRS(31):
+			gse_get(gse, &v);
+			i = iden - GSID_VSRS(0);
+			memcpy(&vcpu->arch.fp.fpr[i], &v,
+			       sizeof(vcpu->arch.fp.fpr[i]));
+			break;
+#ifdef CONFIG_VSX
+		case GSID_VSCR:
+			gse_get(gse, &vcpu->arch.vr.vscr.u[3]);
+			break;
+		case GSID_VSRS(32) ... GSID_VSRS(63):
+			i = iden - GSID_VSRS(32);
+			gse_get(gse, &vcpu->arch.vr.vr[i]);
+			break;
+#endif
+		case GSID_HDAR:
+			gse_get(gse, &vcpu->arch.fault_dar);
+			break;
+		case GSID_HDSISR:
+			gse_get(gse, &vcpu->arch.fault_dsisr);
+			break;
+		case GSID_ASDR:
+			gse_get(gse, &vcpu->arch.fault_gpa);
+			break;
+		case GSID_HEIR:
+			gse_get(gse, &vcpu->arch.emul_inst);
+			break;
+		case GSID_DEC_EXPIRY_TB: {
+			u64 dw;
+
+			gse_get(gse, &dw);
+			vcpu->arch.dec_expires =
+				dw + vcpu->arch.vcore->tb_offset;
+			break;
+		}
+		default:
+			continue;
+		}
+		gsbm_set(valids, iden);
+	}
+
+	return 0;
+}
+
+static struct gs_msg_ops vcpu_message_ops = {
+	.get_size = gs_msg_ops_vcpu_get_size,
+	.fill_info = gs_msg_ops_vcpu_fill_info,
+	.refresh_info = gs_msg_ops_vcpu_refresh_info,
+};
+
+static int kvmhv_papr_host_create(struct kvm_vcpu *vcpu,
+				  struct kvmhv_papr_host *ph)
+{
+	struct kvmhv_papr_config *cfg;
+	struct gs_buff *gsb, *vcpu_run_output, *vcpu_run_input;
+	unsigned long guest_id, vcpu_id;
+	struct gs_msg *gsm, *vcpu_message, *vcore_message;
+	int rc;
+
+	cfg = &ph->cfg;
+	guest_id = vcpu->kvm->arch.lpid;
+	vcpu_id = vcpu->vcpu_id;
+
+	gsm = gsm_new(&config_msg_ops, cfg, GS_FLAGS_WIDE, GFP_KERNEL);
+	if (!gsm) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	gsb = gsb_new(gsm_size(gsm), guest_id, vcpu_id, GFP_KERNEL);
+	if (!gsb) {
+		rc = -ENOMEM;
+		goto free_gsm;
+	}
+
+	rc = gsb_receive_datum(gsb, gsm, GSID_RUN_OUTPUT_MIN_SIZE);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't get vcpu run output buffer minimum size\n");
+		goto free_gsb;
+	}
+
+	vcpu_run_output = gsb_new(cfg->vcpu_run_output_size, guest_id, vcpu_id, GFP_KERNEL);
+	if (!vcpu_run_output) {
+		rc = -ENOMEM;
+		goto free_gsb;
+	}
+
+	cfg->vcpu_run_output_cfg.address = gsb_paddress(vcpu_run_output);
+	cfg->vcpu_run_output_cfg.size = gsb_capacity(vcpu_run_output);
+	ph->vcpu_run_output = vcpu_run_output;
+
+	gsm->flags = 0;
+	rc = gsb_send_datum(gsb, gsm, GSID_RUN_OUTPUT);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set vcpu run output buffer\n");
+		goto free_gs_out;
+	}
+
+	vcpu_message = gsm_new(&vcpu_message_ops, vcpu, 0, GFP_KERNEL);
+	if (!vcpu_message) {
+		rc = -ENOMEM;
+		goto free_gs_out;
+	}
+	gsm_include_all(vcpu_message);
+
+	ph->vcpu_message = vcpu_message;
+
+	vcpu_run_input = gsb_new(gsm_size(vcpu_message), guest_id, vcpu_id, GFP_KERNEL);
+	if (!vcpu_run_input) {
+		rc = -ENOMEM;
+		goto free_vcpu_message;
+	}
+
+	ph->vcpu_run_input = vcpu_run_input;
+	cfg->vcpu_run_input_cfg.address = gsb_paddress(vcpu_run_input);
+	cfg->vcpu_run_input_cfg.size = gsb_capacity(vcpu_run_input);
+	rc = gsb_send_datum(gsb, gsm, GSID_RUN_INPUT);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set vcpu run input buffer\n");
+		goto free_vcpu_run_input;
+	}
+
+	vcore_message =
+		gsm_new(&vcpu_message_ops, vcpu, GS_FLAGS_WIDE, GFP_KERNEL);
+	if (!vcore_message) {
+		rc = -ENOMEM;
+		goto free_vcpu_run_input;
+	}
+
+	gsm_include_all(vcore_message);
+	ph->vcore_message = vcore_message;
+
+	gsbm_fill(&ph->valids);
+	gsm_free(gsm);
+	gsb_free(gsb);
+	return 0;
+
+free_vcpu_run_input:
+	gsb_free(vcpu_run_input);
+free_vcpu_message:
+	gsm_free(vcpu_message);
+free_gs_out:
+	gsb_free(vcpu_run_output);
+free_gsb:
+	gsb_free(gsb);
+free_gsm:
+	gsm_free(gsm);
+err:
+	return rc;
+}
+
+/**
+ * __kvmhv_papr_mark_dirty() - mark a Guest State ID to be sent to the host
+ * @vcpu: vcpu
+ * @iden: guest state ID
+ *
+ * Mark a guest state ID as having been changed by the L1 host and thus
+ * the new value must be sent to the L0 hypervisor. See kvmhv_papr_flush_vcpu()
+ */
+int __kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct gs_msg *gsm;
+
+	if (!iden)
+		return 0;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+	gsm = ph->vcpu_message;
+	gsm_include(gsm, iden);
+	gsm = ph->vcore_message;
+	gsm_include(gsm, iden);
+	gsbm_set(valids, iden);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_mark_dirty);
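+
+/*
+ * Illustrative usage (see the kvm_book3s.h accessors added below): the
+ * register setters pair each state update with a dirty mark, e.g.
+ *
+ *	vcpu->arch.regs.gpr[num] = val;
+ *	kvmhv_papr_mark_dirty(vcpu, GSID_GPR(num));
+ *
+ * so that only modified IDs are serialized before the next
+ * H_GUEST_VCPU_RUN.
+ */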
+
+/**
+ * __kvmhv_papr_cached_reload() - reload a Guest State ID from the host
+ * @vcpu: vcpu
+ * @iden: guest state ID
+ *
+ * Reload the value for the guest state ID from the L0 host into the L1 host.
+ * This is cached so that going out to the L0 host only happens if necessary.
+ */
+int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct gs_buff *gsb;
+	struct gs_msg gsm;
+	int rc;
+
+	if (!iden)
+		return 0;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+	if (gsbm_test(valids, iden))
+		return 0;
+
+	gsb = ph->vcpu_run_input;
+	gsm_init(&gsm, &vcpu_message_ops, vcpu, gsid_flags(iden));
+	rc = gsb_receive_datum(gsb, &gsm, iden);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't get GSID: 0x%x\n", iden);
+		return rc;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_cached_reload);
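+
+/*
+ * Illustrative usage (see the kvm_book3s.h accessors added below): the
+ * register getters reload before reading the shadowed value, e.g.
+ *
+ *	kvmhv_papr_cached_reload(vcpu, GSID_CR);
+ *	return vcpu->arch.regs.ccr;
+ *
+ * The valids bitmap makes the reload a no-op until the next vcpu run
+ * invalidates it.
+ */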
+
+/**
+ * kvmhv_papr_flush_vcpu() - send modified Guest State IDs to the host
+ * @vcpu: vcpu
+ * @time_limit: hdec expiry tb
+ *
+ * Send the values marked by __kvmhv_papr_mark_dirty() to the L0 host.
+ * Thread-wide values are copied to the H_GUEST_RUN_VCPU input buffer.
+ * Guest-wide values need to be sent with H_GUEST_SET first.
+ *
+ * The hdec expiry tb is always sent to the L0 host.
+ */
+int kvmhv_papr_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_buff *gsb;
+	struct gs_msg *gsm;
+	int rc;
+
+	ph = &vcpu->arch.papr_host;
+	gsb = ph->vcpu_run_input;
+	gsm = ph->vcore_message;
+	rc = gsb_send_data(gsb, gsm);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set guest wide elements\n");
+		return rc;
+	}
+
+	gsm = ph->vcpu_message;
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't fill vcpu run input buffer\n");
+		return rc;
+	}
+
+	rc = gse_put(gsb, GSID_HDEC_EXPIRY_TB, time_limit);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_flush_vcpu);
+
+/**
+ * kvmhv_papr_set_ptbl_entry() - send partition and process table state to L0 host
+ * @lpid: guest id
+ * @dw0: partition table double word
+ * @dw1: process table double word
+ */
+int kvmhv_papr_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1)
+{
+	struct gs_part_table patbl;
+	struct gs_proc_table prtbl;
+	struct gs_buff *gsb;
+	size_t size;
+	int rc;
+
+	size = gse_total_size(gsid_size(GSID_PARTITION_TABLE)) +
+	       gse_total_size(gsid_size(GSID_PROCESS_TABLE)) +
+	       sizeof(struct gs_header);
+	gsb = gsb_new(size, lpid, 0, GFP_KERNEL);
+	if (!gsb)
+		return -ENOMEM;
+
+	patbl.address = dw0 & RPDB_MASK;
+	patbl.ea_bits = ((((dw0 & RTS1_MASK) >> (RTS1_SHIFT - 3)) |
+			  ((dw0 & RTS2_MASK) >> RTS2_SHIFT)) +
+			 31);
+	patbl.gpd_size = 1ul << ((dw0 & RPDS_MASK) + 3);
+	rc = gse_put(gsb, GSID_PARTITION_TABLE, patbl);
+	if (rc < 0)
+		goto free_gsb;
+
+	prtbl.address = dw1 & PRTB_MASK;
+	prtbl.gpd_size = 1ul << ((dw1 & PRTS_MASK) + 12);
+	rc = gse_put(gsb, GSID_PROCESS_TABLE, prtbl);
+	if (rc < 0)
+		goto free_gsb;
+
+	rc = gsb_send(gsb, GS_FLAGS_WIDE);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set the PATE\n");
+		goto free_gsb;
+	}
+
+	gsb_free(gsb);
+	return 0;
+
+free_gsb:
+	gsb_free(gsb);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_set_ptbl_entry);
+
+/**
+ * kvmhv_papr_parse_output() - receive values from H_GUEST_RUN_VCPU output
+ * @vcpu: vcpu
+ *
+ * Parse the output buffer from H_GUEST_RUN_VCPU to update vcpu.
+ */
+int kvmhv_papr_parse_output(struct kvm_vcpu *vcpu)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_buff *gsb;
+	struct gs_msg gsm;
+
+	ph = &vcpu->arch.papr_host;
+	gsb = ph->vcpu_run_output;
+
+	vcpu->arch.fault_dar = 0;
+	vcpu->arch.fault_dsisr = 0;
+	vcpu->arch.fault_gpa = 0;
+	vcpu->arch.emul_inst = KVM_INST_FETCH_FAILED;
+
+	gsm_init(&gsm, &vcpu_message_ops, vcpu, 0);
+	gsm_refresh_info(&gsm, gsb);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_parse_output);
+
+static void kvmhv_papr_host_free(struct kvm_vcpu *vcpu,
+				 struct kvmhv_papr_host *ph)
+{
+	gsm_free(ph->vcpu_message);
+	gsm_free(ph->vcore_message);
+	gsb_free(ph->vcpu_run_input);
+	gsb_free(ph->vcpu_run_output);
+}
+
+int __kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	int rc;
+
+	for (int i = 0; i < 32; i++) {
+		rc = kvmhv_papr_cached_reload(vcpu, GSID_GPR(i));
+		if (rc < 0)
+			return rc;
+	}
+
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_CR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_XER);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_CTR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_LR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_NIA);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_reload_ptregs);
+
+int __kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	for (int i = 0; i < 32; i++)
+		kvmhv_papr_mark_dirty(vcpu, GSID_GPR(i));
+
+	kvmhv_papr_mark_dirty(vcpu, GSID_CR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_XER);
+	kvmhv_papr_mark_dirty(vcpu, GSID_CTR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_LR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_mark_dirty_ptregs);
+
+/**
+ * kvmhv_papr_vcpu_create() - create nested vcpu for the PAPR API
+ * @vcpu: vcpu
+ * @ph: PAPR nested host state
+ *
+ * Create a nested vcpu on the L0 host with H_GUEST_CREATE_VCPU and set
+ * up the host state buffers used to communicate with it.
+ */
+int kvmhv_papr_vcpu_create(struct kvm_vcpu *vcpu,
+			   struct kvmhv_papr_host *ph)
+{
+	long rc;
+
+	rc = plpar_guest_create_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id);
+
+	if (rc != H_SUCCESS) {
+		pr_err("KVM: Create Guest vcpu hcall failed, rc=%ld\n", rc);
+		switch (rc) {
+		case H_NOT_ENOUGH_RESOURCES:
+		case H_ABORTED:
+			return -ENOMEM;
+		case H_AUTHORITY:
+			return -EPERM;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	rc = kvmhv_papr_host_create(vcpu, ph);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_vcpu_create);
+
+/**
+ * kvmhv_papr_vcpu_free() - free the PAPR host state
+ * @vcpu: vcpu
+ * @ph: PAPR nested host state
+ */
+void kvmhv_papr_vcpu_free(struct kvm_vcpu *vcpu,
+			  struct kvmhv_papr_host *ph)
+{
+	kvmhv_papr_host_free(vcpu, ph);
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_vcpu_free);
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index e6e66c3792f8..663403fa86d4 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -92,7 +92,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmio_host_swabbed = 0;
 
 	emulated = EMULATE_FAIL;
-	vcpu->arch.regs.msr = vcpu->arch.shared->msr;
+	vcpu->arch.regs.msr = kvmppc_get_msr(vcpu);
+	kvmhv_papr_reload_ptregs(vcpu, &vcpu->arch.regs);
 	if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) {
 		int type = op.type & INSTR_TYPE_MASK;
 		int size = GETSIZE(op.type);
@@ -357,6 +358,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 	}
 
 	trace_kvm_ppc_instr(ppc_inst_val(inst), kvmppc_get_pc(vcpu), emulated);
+	kvmhv_papr_mark_dirty_ptregs(vcpu, &vcpu->arch.regs);
 
 	/* Advance past emulated instruction. */
 	if (emulated != EMULATE_FAIL)
diff --git a/arch/powerpc/kvm/guest-state-buffer.c b/arch/powerpc/kvm/guest-state-buffer.c
index db4a79bfcaf1..cc3a7a416867 100644
--- a/arch/powerpc/kvm/guest-state-buffer.c
+++ b/arch/powerpc/kvm/guest-state-buffer.c
@@ -561,3 +561,52 @@ int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
 	return gsm->ops->refresh_info(gsm, gsb);
 }
 EXPORT_SYMBOL(gsm_refresh_info);
+
+/**
+ * gsb_send - send all elements in the buffer to the hypervisor.
+ * @gsb: guest state buffer
+ * @flags: guest wide or thread wide
+ *
+ * Performs the H_GUEST_SET_STATE hcall for the guest state buffer.
+ */
+int gsb_send(struct gs_buff *gsb, unsigned long flags)
+{
+	unsigned long hflags = 0;
+	unsigned long i;
+	int rc;
+
+	if (gsb_nelems(gsb) == 0)
+		return 0;
+
+	if (flags & GS_FLAGS_WIDE)
+		hflags |= H_GUEST_FLAGS_WIDE;
+
+	rc = plpar_guest_set_state(hflags, gsb->guest_id, gsb->vcpu_id,
+				   __pa(gsb->hdr), gsb->capacity, &i);
+	return rc;
+}
+EXPORT_SYMBOL(gsb_send);
+
+/**
+ * gsb_recv - request all elements in the buffer have their value updated.
+ * @gsb: guest state buffer
+ * @flags: guest wide or thread wide
+ *
+ * Performs the H_GUEST_GET_STATE hcall for the guest state buffer.
+ * After returning from the hcall the guest state elements that were
+ * present in the buffer will have updated values from the hypervisor.
+ */
+int gsb_recv(struct gs_buff *gsb, unsigned long flags)
+{
+	unsigned long hflags = 0;
+	unsigned long i;
+	int rc;
+
+	if (flags & GS_FLAGS_WIDE)
+		hflags |= H_GUEST_FLAGS_WIDE;
+
+	rc = plpar_guest_get_state(hflags, gsb->guest_id, gsb->vcpu_id,
+				   __pa(gsb->hdr), gsb->capacity, &i);
+	return rc;
+}
+EXPORT_SYMBOL(gsb_recv);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Jordan Niethe, mikey, kautuk.consul.1980, kvm, npiggin, kvm-ppc,
	sbhat, vaibhav

A series of hcalls has been added to PAPR which allows a regular
guest partition to create and manage guest partitions of its own. Add
support to KVM to use these hcalls to enable running nested guests.

Overview of the new hcall usage:

- L1 and L0 negotiate capabilities with
  H_GUEST_{G,S}ET_CAPABILITIES()

- L1 requests the L0 create a L2 with
  H_GUEST_CREATE() and receives a handle to use in future hcalls

- L1 requests the L0 create a L2 vCPU with
  H_GUEST_CREATE_VCPU()

- L1 sets up the L2 using H_GUEST_SET and the
  H_GUEST_VCPU_RUN input buffer

- L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN()

- L2 returns to L1 with an exit reason and L1 reads the
  H_GUEST_VCPU_RUN output buffer populated by the L0

- L1 handles the exit using H_GET_STATE if necessary

- L1 reruns L2 vCPU with H_GUEST_VCPU_RUN

- L1 frees the L2 in the L0 with H_GUEST_DELETE()

Support for the new API is determined by trying
H_GUEST_GET_CAPABILITIES. On a successful return, the new API will then
be used.
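
In outline the probe looks like this (illustrative sketch only; names
match the wrappers added by this patch, error handling elided):

	unsigned long host_caps, caps = 0;

	if (plpar_guest_get_capabilities(0, &host_caps) != H_SUCCESS)
		return false;	/* fall back to the existing API */
	if (cpu_has_feature(CPU_FTR_ARCH_31))
		caps |= H_GUEST_CAP_POWER10;
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		caps |= H_GUEST_CAP_POWER9;
	if (plpar_guest_set_capabilities(0, caps & host_caps) != H_SUCCESS)
		return false;
	return true;	/* use the new API from now on */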

Use the vcpu register state setters for tracking modified guest state
elements and copy the thread-wide values into the H_GUEST_VCPU_RUN input
buffer immediately before running an L2. The guest-wide elements cannot
be added to the input buffer, so send them with a separate H_GUEST_SET
call if necessary.

Make the vcpu register getters load the corresponding value from the L0
host with H_GUEST_GET. To avoid unnecessary H_GUEST_GET calls, track
which values have already been loaded between H_GUEST_VCPU_RUN calls. If
an element is present in the H_GUEST_VCPU_RUN output buffer it also does
not need to be loaded again.
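
Per run the resulting flow is roughly (illustrative sketch; see
kvmhv_vcpu_entry_papr() below for the real sequence):

	kvmhv_papr_flush_vcpu(vcpu, time_limit);  /* dirty IDs -> input buffer */
	rc = plpar_guest_run_vcpu(0, lpid, vcpu_id, &trap, &fail);
	gsbm_zero(&ph->valids);                   /* cached state is now stale */
	kvmhv_papr_parse_output(vcpu);            /* output buffer -> vcpu */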

KVM already supports running nested guests on powernv hosts. However,
the interface used for this is not supported by other PAPR hosts. That
existing API remains supported.

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Declare op structs as static
  - Use expressions in switch case with local variables
  - Do not use the PVR for the LOGICAL PVR ID
  - Handle emul_inst as now a double word
  - Use new GPR(), etc macros
  - Determine PAPR nested capabilities from cpu features
---
 arch/powerpc/include/asm/guest-state-buffer.h | 105 +-
 arch/powerpc/include/asm/hvcall.h             |  30 +
 arch/powerpc/include/asm/kvm_book3s.h         | 122 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h      |   6 +
 arch/powerpc/include/asm/kvm_host.h           |  21 +
 arch/powerpc/include/asm/kvm_ppc.h            |  64 +-
 arch/powerpc/include/asm/plpar_wrappers.h     | 198 ++++
 arch/powerpc/kvm/Makefile                     |   1 +
 arch/powerpc/kvm/book3s_hv.c                  | 126 ++-
 arch/powerpc/kvm/book3s_hv.h                  |  74 +-
 arch/powerpc/kvm/book3s_hv_nested.c           |  38 +-
 arch/powerpc/kvm/book3s_hv_papr.c             | 940 ++++++++++++++++++
 arch/powerpc/kvm/emulate_loadstore.c          |   4 +-
 arch/powerpc/kvm/guest-state-buffer.c         |  49 +
 14 files changed, 1684 insertions(+), 94 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_papr.c

diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
index 65a840abf1bb..116126edd8e2 100644
--- a/arch/powerpc/include/asm/guest-state-buffer.h
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -5,6 +5,7 @@
 #ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
 #define _ASM_POWERPC_GUEST_STATE_BUFFER_H
 
+#include "asm/hvcall.h"
 #include <linux/gfp.h>
 #include <linux/bitmap.h>
 #include <asm/plpar_wrappers.h>
@@ -14,16 +15,16 @@
  **************************************************************************/
 #define GSID_BLANK			0x0000
 
-#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
-#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
-#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
-#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
-#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
-#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
+#define GSID_HOST_STATE_SIZE		0x0001
+#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002
+#define GSID_LOGICAL_PVR		0x0003
+#define GSID_TB_OFFSET			0x0004
+#define GSID_PARTITION_TABLE		0x0005
+#define GSID_PROCESS_TABLE		0x0006
 
-#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
-#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
-#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
+#define GSID_RUN_INPUT			0x0C00
+#define GSID_RUN_OUTPUT			0x0C01
+#define GSID_VPA			0x0C02
 
 #define GSID_GPR(x)			(0x1000 + (x))
 #define GSID_HDEC_EXPIRY_TB		0x1020
@@ -300,6 +301,8 @@ struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
 			unsigned long vcpu_id, gfp_t flags);
 void gsb_free(struct gs_buff *gsb);
 void *gsb_put(struct gs_buff *gsb, size_t size);
+int gsb_send(struct gs_buff *gsb, unsigned long flags);
+int gsb_recv(struct gs_buff *gsb, unsigned long flags);
 
 /**
  * gsb_header() - the header of a guest state buffer
@@ -898,4 +901,88 @@ static inline void gsm_reset(struct gs_msg *gsm)
 	gsbm_zero(&gsm->bitmap);
 }
 
+/**
+ * gsb_receive_data - flexibly update values from a guest state buffer
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ *
+ * Requests updated values for the guest state values included in the guest
+ * state message. The guest state message will then deserialize the guest state
+ * buffer.
+ */
+static inline int gsb_receive_data(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	int rc;
+
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+
+	rc = gsb_recv(gsb, gsm->flags);
+	if (rc < 0)
+		return rc;
+
+	rc = gsm_refresh_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+
+/**
+ * gsb_receive_datum - receive a single guest state ID
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ * @iden: guest state identity
+ */
+static inline int gsb_receive_datum(struct gs_buff *gsb, struct gs_msg *gsm,
+				    u16 iden)
+{
+	int rc;
+
+	gsm_include(gsm, iden);
+	rc = gsb_receive_data(gsb, gsm);
+	if (rc < 0)
+		return rc;
+	gsm_reset(gsm);
+	return 0;
+}
+
+/**
+ * gsb_send_data - flexibly send values from a guest state buffer
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ *
+ * Sends the guest state values included in the guest state message.
+ */
+static inline int gsb_send_data(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	int rc;
+
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+	rc = gsb_send(gsb, gsm->flags);
+
+	return rc;
+}
+
+/**
+ * gsb_send_datum - send a single guest state ID
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ * @iden: guest state identity
+ */
+static inline int gsb_send_datum(struct gs_buff *gsb, struct gs_msg *gsm,
+				 u16 iden)
+{
+	int rc;
+
+	gsm_include(gsm, iden);
+	rc = gsb_send_data(gsb, gsm);
+	if (rc < 0)
+		return rc;
+	gsm_reset(gsm);
+	return 0;
+}
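+
+/*
+ * Illustrative usage: reading one value from the L0 host takes a buffer
+ * and a message bound to the same vcpu, e.g.
+ *
+ *	gsm_init(&gsm, &vcpu_message_ops, vcpu, gsid_flags(iden));
+ *	rc = gsb_receive_datum(gsb, &gsm, iden);
+ *
+ * where vcpu_message_ops is the gs_msg_ops instance the caller defines.
+ */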
+
 #endif /* _ASM_POWERPC_GUEST_STATE_BUFFER_H */
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index c099780385dd..ddb99e982917 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -100,6 +100,18 @@
 #define H_COP_HW	-74
 #define H_STATE		-75
 #define H_IN_USE	-77
+
+#define H_INVALID_ELEMENT_ID			-79
+#define H_INVALID_ELEMENT_SIZE			-80
+#define H_INVALID_ELEMENT_VALUE			-81
+#define H_INPUT_BUFFER_NOT_DEFINED		-82
+#define H_INPUT_BUFFER_TOO_SMALL		-83
+#define H_OUTPUT_BUFFER_NOT_DEFINED		-84
+#define H_OUTPUT_BUFFER_TOO_SMALL		-85
+#define H_PARTITION_PAGE_TABLE_NOT_DEFINED	-86
+#define H_GUEST_VCPU_STATE_NOT_HV_OWNED		-87
+
 #define H_UNSUPPORTED_FLAG_START	-256
 #define H_UNSUPPORTED_FLAG_END		-511
 #define H_MULTI_THREADS_ACTIVE	-9005
@@ -381,6 +393,15 @@
 #define H_ENTER_NESTED		0xF804
 #define H_TLB_INVALIDATE	0xF808
 #define H_COPY_TOFROM_GUEST	0xF80C
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE		0x470
+#define H_GUEST_CREATE_VCPU	0x474
+#define H_GUEST_GET_STATE	0x478
+#define H_GUEST_SET_STATE	0x47C
+#define H_GUEST_RUN_VCPU	0x480
+#define H_GUEST_COPY_MEMORY	0x484
+#define H_GUEST_DELETE		0x488
 
 /* Flags for H_SVM_PAGE_IN */
 #define H_PAGE_IN_SHARED        0x1
@@ -467,6 +488,15 @@
 #define H_RPTI_PAGE_1G	0x08
 #define H_RPTI_PAGE_ALL (-1UL)
 
+/* Flags for H_GUEST_{S,G}ET_STATE */
+#define H_GUEST_FLAGS_WIDE     (1UL<<(63-0))
+
+/* Flag values used for H_GUEST_{S,G}ET_CAPABILITIES */
+#define H_GUEST_CAP_COPY_MEM	(1UL<<(63-0))
+#define H_GUEST_CAP_POWER9	(1UL<<(63-1))
+#define H_GUEST_CAP_POWER10	(1UL<<(63-2))
+#define H_GUEST_CAP_BITMAP2	(1UL<<(63-63))
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 0ca2d8b37b42..c5c57552b447 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_book3s_asm.h>
+#include <asm/guest-state-buffer.h>
 
 struct kvmppc_bat {
 	u64 raw;
@@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
 
 void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
 
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+
+extern bool __kvmhv_on_papr;
+
+static inline bool kvmhv_on_papr(void)
+{
+	return __kvmhv_on_papr;
+}
+
+#else
+
+static inline bool kvmhv_on_papr(void)
+{
+	return false;
+}
+
+#endif
+
+int __kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
+int __kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
+int __kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden);
+int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden);
+
+static inline int kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_reload_ptregs(vcpu, regs);
+	return 0;
+}
+static inline int kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_mark_dirty_ptregs(vcpu, regs);
+	return 0;
+}
+
+static inline int kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_mark_dirty(vcpu, iden);
+	return 0;
+}
+
+static inline int kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_cached_reload(vcpu, iden);
+	return 0;
+}
+
 extern int kvm_irq_bypass;
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
@@ -335,70 +387,84 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
 	vcpu->arch.regs.gpr[num] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_GPR(num));
 }
 
 static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_GPR(num));
 	return vcpu->arch.regs.gpr[num];
 }
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.regs.ccr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_CR);
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_CR);
 	return vcpu->arch.regs.ccr;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.xer = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_XER);
 }
 
 static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_XER);
 	return vcpu->arch.regs.xer;
 }
 
 static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.ctr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_CTR);
 }
 
 static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_CTR);
 	return vcpu->arch.regs.ctr;
 }
 
 static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.link = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_LR);
 }
 
 static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_LR);
 	return vcpu->arch.regs.link;
 }
 
 static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.nip = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
 }
 
 static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_NIA);
 	return vcpu->arch.regs.nip;
 }
 
 static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.pid = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_PIDR);
 }
 
 static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_PIDR);
 	return vcpu->arch.pid;
 }
 
@@ -415,111 +481,129 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 
 static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(i));
 	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
 }
 
 static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
 {
 	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(i));
 }
 
 static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_FPSCR);
 	return vcpu->arch.fp.fpscr;
 }
 
 static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
 {
 	vcpu->arch.fp.fpscr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_FPSCR);
 }
 
 
 static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(i));
 	return vcpu->arch.fp.fpr[i][j];
 }
 
 static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 val)
 {
 	vcpu->arch.fp.fpr[i][j] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(i));
 }
 
 #ifdef CONFIG_VSX
 static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(32 + i));
 	return vcpu->arch.vr.vr[i];
 }
 
 static inline void kvmppc_set_vsx_vr(struct kvm_vcpu *vcpu, int i, vector128 val)
 {
 	vcpu->arch.vr.vr[i] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(32 + i));
 }
 
 static inline u32 kvmppc_get_vscr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSCR);
 	return vcpu->arch.vr.vscr.u[3];
 }
 
 static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.vr.vscr.u[3] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSCR);
 }
 #endif
 
-#define BOOK3S_WRAPPER_SET(reg, size)					\
+#define BOOK3S_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 									\
 	vcpu->arch.reg = val;						\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define BOOK3S_WRAPPER_GET(reg, size)					\
+#define BOOK3S_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.reg;						\
 }
 
-#define BOOK3S_WRAPPER(reg, size)					\
-	BOOK3S_WRAPPER_SET(reg, size)					\
-	BOOK3S_WRAPPER_GET(reg, size)					\
+#define BOOK3S_WRAPPER(reg, size, iden)					\
+	BOOK3S_WRAPPER_SET(reg, size, iden)				\
+	BOOK3S_WRAPPER_GET(reg, size, iden)				\
 
-BOOK3S_WRAPPER(tar, 64)
-BOOK3S_WRAPPER(ebbhr, 64)
-BOOK3S_WRAPPER(ebbrr, 64)
-BOOK3S_WRAPPER(bescr, 64)
-BOOK3S_WRAPPER(ic, 64)
-BOOK3S_WRAPPER(vrsave, 64)
+BOOK3S_WRAPPER(tar, 64, GSID_TAR)
+BOOK3S_WRAPPER(ebbhr, 64, GSID_EBBHR)
+BOOK3S_WRAPPER(ebbrr, 64, GSID_EBBRR)
+BOOK3S_WRAPPER(bescr, 64, GSID_BESCR)
+BOOK3S_WRAPPER(ic, 64, GSID_IC)
+BOOK3S_WRAPPER(vrsave, 64, GSID_VRSAVE)
 
 
-#define VCORE_WRAPPER_SET(reg, size)					\
+#define VCORE_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	vcpu->arch.vcore->reg = val;					\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define VCORE_WRAPPER_GET(reg, size)					\
+#define VCORE_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.vcore->reg;					\
 }
 
-#define VCORE_WRAPPER(reg, size)					\
-	VCORE_WRAPPER_SET(reg, size)					\
-	VCORE_WRAPPER_GET(reg, size)					\
+#define VCORE_WRAPPER(reg, size, iden)					\
+	VCORE_WRAPPER_SET(reg, size, iden)				\
+	VCORE_WRAPPER_GET(reg, size, iden)				\
 
 
-VCORE_WRAPPER(vtb, 64)
-VCORE_WRAPPER(tb_offset, 64)
-VCORE_WRAPPER(lpcr, 64)
+VCORE_WRAPPER(vtb, 64, GSID_VTB)
+VCORE_WRAPPER(tb_offset, 64, GSID_TB_OFFSET)
+VCORE_WRAPPER(lpcr, 64, GSID_LPCR)
 
 static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_TB_OFFSET);
+	kvmhv_papr_cached_reload(vcpu, GSID_DEC_EXPIRY_TB);
 	return vcpu->arch.dec_expires;
 }
 
 static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
 {
 	vcpu->arch.dec_expires = val;
+	kvmhv_papr_cached_reload(vcpu, GSID_TB_OFFSET);
+	kvmhv_papr_mark_dirty(vcpu, GSID_DEC_EXPIRY_TB);
 }
 
 /* Expiry time of vcpu DEC relative to host TB */
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index d49065af08e9..689e14284127 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -677,6 +677,12 @@ static inline pte_t *find_kvm_host_pte(struct kvm *kvm, unsigned long mmu_seq,
 extern pte_t *find_kvm_nested_guest_pte(struct kvm *kvm, unsigned long lpid,
 					unsigned long ea, unsigned *hshift);
 
+int kvmhv_papr_vcpu_create(struct kvm_vcpu *vcpu, struct kvmhv_papr_host *nested_state);
+void kvmhv_papr_vcpu_free(struct kvm_vcpu *vcpu, struct kvmhv_papr_host *nested_state);
+int kvmhv_papr_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit);
+int kvmhv_papr_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
+int kvmhv_papr_parse_output(struct kvm_vcpu *vcpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..21e8bf9e530a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
 #include <asm/mce.h>
+#include <asm/guest-state-buffer.h>
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
@@ -509,6 +510,23 @@ union xive_tma_w01 {
 	__be64 w01;
 };
 
+/* Nested PAPR host H_GUEST_RUN_VCPU configuration */
+struct kvmhv_papr_config {
+	struct gs_buff_info vcpu_run_output_cfg;
+	struct gs_buff_info vcpu_run_input_cfg;
+	u64 vcpu_run_output_size;
+};
+
+/* Nested PAPR host state */
+struct kvmhv_papr_host {
+	struct kvmhv_papr_config cfg;
+	struct gs_buff *vcpu_run_output;
+	struct gs_buff *vcpu_run_input;
+	struct gs_msg *vcpu_message;
+	struct gs_msg *vcore_message;
+	struct gs_bitmap valids;
+};
+
 struct kvm_vcpu_arch {
 	ulong host_stack;
 	u32 host_pid;
@@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
 	ulong dscr;
 	ulong amr;
 	ulong uamor;
+	ulong amor;
 	ulong iamr;
 	u32 ctrl;
 	u32 dabrx;
@@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
 	u64 nested_hfscr;	/* HFSCR that the L1 requested for the nested guest */
 	u32 nested_vcpu_id;
 	gpa_t nested_io_gpr;
+	/* For nested APIv2 guests */
+	struct kvmhv_papr_host papr_host;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fbac353ac46b..4d43bb29ba7c 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -615,6 +615,35 @@ static inline bool kvmhv_on_pseries(void)
 {
 	return false;
 }
+
+#endif
+
+#ifndef CONFIG_PPC_BOOK3S
+
+static inline bool kvmhv_on_papr(void)
+{
+	return false;
+}
+
+static inline int kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	return 0;
+}
+static inline int kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	return 0;
+}
+
+static inline int kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	return 0;
+}
+
+static inline int kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	return 0;
+}
+
 #endif
 
 #ifdef CONFIG_KVM_XICS
@@ -957,31 +986,33 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
 }									\
 
-#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
+#define SHARED_CACHE_WRAPPER_GET(reg, size, iden)			\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
 	else								\
 	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
 }									\
 
-#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
+#define SHARED_CACHE_WRAPPER_SET(reg, size, iden)			\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
 	else								\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }									\
 
 #define SHARED_WRAPPER(reg, size)					\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
-#define SHARED_CACHE_WRAPPER(reg, size)					\
-	SHARED_CACHE_WRAPPER_GET(reg, size)				\
-	SHARED_CACHE_WRAPPER_SET(reg, size)				\
+#define SHARED_CACHE_WRAPPER(reg, size, iden)				\
+	SHARED_CACHE_WRAPPER_GET(reg, size, iden)			\
+	SHARED_CACHE_WRAPPER_SET(reg, size, iden)			\
 
 #define SPRNG_WRAPPER(reg, bookehv_spr)					\
 	SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
@@ -1000,29 +1031,30 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SHARED_WRAPPER(reg, size)					\
 
-#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
-	SHARED_CACHE_WRAPPER(reg, size)					\
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr, iden)	\
+	SHARED_CACHE_WRAPPER(reg, size, iden)				\
 
 #endif
 
 SHARED_WRAPPER(critical, 64)
-SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR)
+SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0, GSID_SPRG0)
+SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1, GSID_SPRG1)
+SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2, GSID_SPRG2)
+SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3, GSID_SPRG3)
+SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0, GSID_SRR0)
+SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1, GSID_SRR1)
+SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR, GSID_DAR)
 SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
-SHARED_CACHE_WRAPPER_GET(msr, 64)
+SHARED_CACHE_WRAPPER_GET(msr, 64, GSID_MSR)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
 	if (kvmppc_shared_big_endian(vcpu))
 	       vcpu->arch.shared->msr = cpu_to_be64(val);
 	else
 	       vcpu->arch.shared->msr = cpu_to_le64(val);
+	kvmhv_papr_mark_dirty(vcpu, GSID_MSR);
 }
-SHARED_CACHE_WRAPPER(dsisr, 32)
+SHARED_CACHE_WRAPPER(dsisr, 32, GSID_DSISR)
 SHARED_WRAPPER(int_pending, 32)
 SHARED_WRAPPER(sprg4, 64)
 SHARED_WRAPPER(sprg5, 64)
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 8239c0af5eb2..b48f90884522 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -6,6 +6,7 @@
 
 #include <linux/string.h>
 #include <linux/irqflags.h>
+#include <linux/delay.h>
 
 #include <asm/hvcall.h>
 #include <asm/paca.h>
@@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
 	return rc;
 }
 
+static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	unsigned long token;
+	long rc;
+
+	token = -1UL;
+	while (true) {
+		rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
+		if (rc == H_SUCCESS) {
+			*guest_id = retbuf[0];
+			break;
+		}
+
+		if (rc == H_BUSY) {
+			token = retbuf[0];
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			token = retbuf[0];
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_create_vcpu(unsigned long flags,
+					   unsigned long guest_id,
+					   unsigned long vcpu_id)
+{
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_GUEST_CREATE_VCPU, 0, guest_id, vcpu_id);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_set_state(unsigned long flags,
+					 unsigned long guest_id,
+					 unsigned long vcpu_id,
+					 unsigned long data_buffer,
+					 unsigned long data_size,
+					 unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall(H_GUEST_SET_STATE, retbuf, flags, guest_id,
+				 vcpu_id, data_buffer, data_size);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		if (rc == H_INVALID_ELEMENT_ID ||
+		    rc == H_INVALID_ELEMENT_SIZE ||
+		    rc == H_INVALID_ELEMENT_VALUE)
+			*failed_index = retbuf[0];
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_get_state(unsigned long flags,
+					 unsigned long guest_id,
+					 unsigned long vcpu_id,
+					 unsigned long data_buffer,
+					 unsigned long data_size,
+					 unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall(H_GUEST_GET_STATE, retbuf, flags, guest_id,
+				 vcpu_id, data_buffer, data_size);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		if (rc == H_INVALID_ELEMENT_ID ||
+		    rc == H_INVALID_ELEMENT_SIZE ||
+		    rc == H_INVALID_ELEMENT_VALUE)
+			*failed_index = retbuf[0];
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_run_vcpu(unsigned long flags, unsigned long guest_id,
+					unsigned long vcpu_id, int *trap,
+					unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_RUN_VCPU, retbuf, flags, guest_id, vcpu_id);
+	if (rc == H_SUCCESS)
+		*trap = retbuf[0];
+	else if (rc == H_INVALID_ELEMENT_ID ||
+		 rc == H_INVALID_ELEMENT_SIZE ||
+		 rc == H_INVALID_ELEMENT_VALUE)
+		*failed_index = retbuf[0];
+
+	return rc;
+}
+
+static inline long plpar_guest_delete(unsigned long flags, u64 guest_id)
+{
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_GUEST_DELETE, flags, guest_id);
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_set_capabilities(unsigned long flags,
+						unsigned long capabilities)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_SET_CAPABILITIES, retbuf, flags, capabilities);
+
+	return rc;
+}
+
+static inline long plpar_guest_get_capabilities(unsigned long flags,
+						unsigned long *capabilities)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_GET_CAPABILITIES, retbuf, flags);
+	if (rc == H_SUCCESS)
+		*capabilities = retbuf[0];
+
+	return rc;
+}
+
 /*
  * Wrapper to H_RPT_INVALIDATE hcall that handles return values appropriately
  *
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index eb8445e71c14..9bb0876521ee 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -87,6 +87,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
 	book3s_hv_p9_perf.o \
+	book3s_hv_papr.o \
 	guest-state-buffer.o \
 	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 521d84621422..f22ee582e209 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+}
+
 /* Dummy value used in computing PCR value below */
 #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
 
@@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 			return RESUME_HOST;
 		break;
 #endif
-	case H_RANDOM:
+	case H_RANDOM: {
 		unsigned long rand;
 
 		if (!arch_get_random_seed_longs(&rand, 1))
 			ret = H_HARDWARE;
 		kvmppc_set_gpr(vcpu, 4, rand);
 		break;
+	}
 	case H_RPT_INVALIDATE:
 		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
 					      kvmppc_get_gpr(vcpu, 5),
@@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.shared_big_endian = false;
 #endif
 #endif
-	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
 
+	if (kvmhv_on_papr()) {
+		err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
+		if (err < 0)
+			return err;
+	}
+
+	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
 	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
 		kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
 		kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
 	}
 
 	kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
+	kvmppc_set_amor_hv(vcpu, ~0);
 	/* default to host PVR, since we can't spoof it */
 	kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
 	spin_lock_init(&vcpu->arch.vpa_update_lock);
@@ -3006,6 +3019,8 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 			kvm->arch.vcores[core] = vcore;
 			kvm->arch.online_vcores++;
 			mutex_unlock(&kvm->arch.mmu_setup_lock);
+			if (kvmhv_on_papr())
+				kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
 		}
 	}
 	mutex_unlock(&kvm->lock);
@@ -3078,6 +3093,8 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 	unpin_vpa(vcpu->kvm, &vcpu->arch.slb_shadow);
 	unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
+	if (kvmhv_on_papr())
+		kvmhv_papr_vcpu_free(vcpu, &vcpu->arch.papr_host);
 }
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
@@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
 	}
 }
 
+static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
+{
+	struct kvmhv_papr_host *ph;
+	unsigned long msr, i;
+	int trap;
+	long rc;
+
+	ph = &vcpu->arch.papr_host;
+
+	msr = mfmsr();
+	kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
+	if (lazy_irq_pending())
+		return 0;
+
+	kvmhv_papr_flush_vcpu(vcpu, time_limit);
+
+	accumulate_time(vcpu, &vcpu->arch.in_guest);
+	rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
+				  &trap, &i);
+
+	if (rc != H_SUCCESS) {
+		pr_err("KVM: Guest Run VCPU hcall failed, rc=%ld\n", rc);
+		if (rc == H_INVALID_ELEMENT_ID)
+			pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
+		else if (rc == H_INVALID_ELEMENT_SIZE)
+			pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
+		else if (rc == H_INVALID_ELEMENT_VALUE)
+			pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
+		return 0;
+	}
+	accumulate_time(vcpu, &vcpu->arch.guest_exit);
+
+	*tb = mftb();
+	gsm_reset(ph->vcpu_message);
+	gsm_reset(ph->vcore_message);
+	gsbm_zero(&ph->valids);
+
+	kvmhv_papr_parse_output(vcpu);
+
+	timer_rearm_host_dec(*tb);
+
+	return trap;
+}
+
 /* call our hypervisor to load up HV regs and go */
 static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
 {
@@ -4159,7 +4220,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	vcpu_vpa_increment_dispatch(vcpu);
 
 	if (kvmhv_on_pseries()) {
-		trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
+		if (!kvmhv_on_papr())
+			trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
+		else
+			trap = kvmhv_vcpu_entry_papr(vcpu, time_limit, lpcr, tb);
 
 		/* H_CEDE has to be handled now, not later */
 		if (trap == BOOK3S_INTERRUPT_SYSCALL && !nested &&
@@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
  */
 void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 {
+	struct kvm_vcpu *vcpu;
 	long int i;
 	u32 cores_done = 0;
 
@@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 		if (++cores_done >= kvm->arch.online_vcores)
 			break;
 	}
+
+	if (kvmhv_on_papr()) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
+		}
+	}
 }
 
 void kvmppc_setup_partition_table(struct kvm *kvm)
@@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	/* Allocate the guest's logical partition ID */
 
-	lpid = kvmppc_alloc_lpid();
-	if ((long)lpid < 0)
-		return -ENOMEM;
-	kvm->arch.lpid = lpid;
+	if (!kvmhv_on_papr()) {
+		lpid = kvmppc_alloc_lpid();
+		if ((long)lpid < 0)
+			return -ENOMEM;
+		kvm->arch.lpid = lpid;
+	}
 
 	kvmppc_alloc_host_rm_ops();
 
 	kvmhv_vm_nested_init(kvm);
 
+	if (kvmhv_on_papr()) {
+		long rc;
+		unsigned long guest_id;
+
+		rc = plpar_guest_create(0, &guest_id);
+
+		if (rc != H_SUCCESS)
+			pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
+
+		switch (rc) {
+		case H_SUCCESS:
+			break;
+		case H_PARAMETER:
+		case H_FUNCTION:
+		case H_STATE:
+			return -EINVAL;
+		case H_NOT_ENOUGH_RESOURCES:
+		case H_ABORTED:
+			return -ENOMEM;
+		case H_AUTHORITY:
+			return -EPERM;
+		case H_NOT_AVAILABLE:
+			return -EBUSY;
+		default:
+			return -EINVAL;
+		}
+		kvm->arch.lpid = guest_id;
+	}
+
 	/*
 	 * Since we don't flush the TLB when tearing down a VM,
 	 * and this lpid might have previously been used,
@@ -5483,7 +5582,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 			lpcr |= LPCR_HAIL;
 		ret = kvmppc_init_vm_radix(kvm);
 		if (ret) {
-			kvmppc_free_lpid(kvm->arch.lpid);
+			if (kvmhv_on_papr())
+				plpar_guest_delete(0, kvm->arch.lpid);
+			else
+				kvmppc_free_lpid(kvm->arch.lpid);
 			return ret;
 		}
 		kvmppc_setup_partition_table(kvm);
@@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 		kvm->arch.process_table = 0;
 		if (kvm->arch.secure_guest)
 			uv_svm_terminate(kvm->arch.lpid);
-		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
+		if (!kvmhv_on_papr())
+			kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
 	}
 
-	kvmppc_free_lpid(kvm->arch.lpid);
+	if (kvmhv_on_papr())
+		plpar_guest_delete(0, kvm->arch.lpid);
+	else
+		kvmppc_free_lpid(kvm->arch.lpid);
 
 	kvmppc_free_pimap(kvm);
 }
diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 7a7005189ab1..61d2c2b8d084 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -3,6 +3,8 @@
 /*
  * Privileged (non-hypervisor) host registers to save.
  */
+#include "asm/guest-state-buffer.h"
+
 struct p9_host_os_sprs {
 	unsigned long iamr;
 	unsigned long amr;
@@ -51,61 +53,65 @@ void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
 #define end_timing(vcpu) do {} while (0)
 #endif
 
-#define HV_WRAPPER_SET(reg, size)					\
+#define HV_WRAPPER_SET(reg, size, iden)					\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	vcpu->arch.reg = val;						\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define HV_WRAPPER_GET(reg, size)					\
+#define HV_WRAPPER_GET(reg, size, iden)					\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.reg;						\
 }
 
-#define HV_WRAPPER(reg, size)						\
-	HV_WRAPPER_SET(reg, size)					\
-	HV_WRAPPER_GET(reg, size)					\
+#define HV_WRAPPER(reg, size, iden)					\
+	HV_WRAPPER_SET(reg, size, iden)					\
+	HV_WRAPPER_GET(reg, size, iden)					\
 
-#define HV_ARRAY_WRAPPER_SET(reg, size)					\
+#define HV_ARRAY_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, int i, u##size val)	\
 {									\
 	vcpu->arch.reg[i] = val;					\
+	kvmhv_papr_mark_dirty(vcpu, iden(i));				\
 }
 
-#define HV_ARRAY_WRAPPER_GET(reg, size)					\
+#define HV_ARRAY_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu, int i)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden(i));			\
 	return vcpu->arch.reg[i];					\
 }
 
-#define HV_ARRAY_WRAPPER(reg, size)					\
-	HV_ARRAY_WRAPPER_SET(reg, size)					\
-	HV_ARRAY_WRAPPER_GET(reg, size)					\
+#define HV_ARRAY_WRAPPER(reg, size, iden)				\
+	HV_ARRAY_WRAPPER_SET(reg, size, iden)				\
+	HV_ARRAY_WRAPPER_GET(reg, size, iden)				\
 
-HV_WRAPPER(mmcra, 64)
-HV_WRAPPER(hfscr, 64)
-HV_WRAPPER(fscr, 64)
-HV_WRAPPER(dscr, 64)
-HV_WRAPPER(purr, 64)
-HV_WRAPPER(spurr, 64)
-HV_WRAPPER(amr, 64)
-HV_WRAPPER(uamor, 64)
-HV_WRAPPER(siar, 64)
-HV_WRAPPER(sdar, 64)
-HV_WRAPPER(iamr, 64)
-HV_WRAPPER(dawr0, 64)
-HV_WRAPPER(dawr1, 64)
-HV_WRAPPER(dawrx0, 64)
-HV_WRAPPER(dawrx1, 64)
-HV_WRAPPER(ciabr, 64)
-HV_WRAPPER(wort, 64)
-HV_WRAPPER(ppr, 64)
-HV_WRAPPER(ctrl, 64)
+HV_WRAPPER(mmcra, 64, GSID_MMCRA)
+HV_WRAPPER(hfscr, 64, GSID_HFSCR)
+HV_WRAPPER(fscr, 64, GSID_FSCR)
+HV_WRAPPER(dscr, 64, GSID_DSCR)
+HV_WRAPPER(purr, 64, GSID_PURR)
+HV_WRAPPER(spurr, 64, GSID_SPURR)
+HV_WRAPPER(amr, 64, GSID_AMR)
+HV_WRAPPER(uamor, 64, GSID_UAMOR)
+HV_WRAPPER(siar, 64, GSID_SIAR)
+HV_WRAPPER(sdar, 64, GSID_SDAR)
+HV_WRAPPER(iamr, 64, GSID_IAMR)
+HV_WRAPPER(dawr0, 64, GSID_DAWR0)
+HV_WRAPPER(dawr1, 64, GSID_DAWR1)
+HV_WRAPPER(dawrx0, 64, GSID_DAWRX0)
+HV_WRAPPER(dawrx1, 64, GSID_DAWRX1)
+HV_WRAPPER(ciabr, 64, GSID_CIABR)
+HV_WRAPPER(wort, 64, GSID_WORT)
+HV_WRAPPER(ppr, 64, GSID_PPR)
+HV_WRAPPER(ctrl, 64, GSID_CTRL)
+HV_WRAPPER(amor, 64, GSID_AMOR)
 
-HV_ARRAY_WRAPPER(mmcr, 64)
-HV_ARRAY_WRAPPER(sier, 64)
-HV_ARRAY_WRAPPER(pmc, 32)
+HV_ARRAY_WRAPPER(mmcr, 64, GSID_MMCR)
+HV_ARRAY_WRAPPER(sier, 64, GSID_SIER)
+HV_ARRAY_WRAPPER(pmc, 32, GSID_PMC)
 
-HV_WRAPPER(pvr, 32)
-HV_WRAPPER(pspb, 32)
+HV_WRAPPER(pspb, 32, GSID_PSPB)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 377d0b4a05ee..62e011d1e912 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -428,10 +428,12 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	return vcpu->arch.trap;
 }
 
+static unsigned long nested_capabilities;
+
 long kvmhv_nested_init(void)
 {
 	long int ptb_order;
-	unsigned long ptcr;
+	unsigned long ptcr, host_capabilities;
 	long rc;
 
 	if (!kvmhv_on_pseries())
@@ -439,6 +441,27 @@ long kvmhv_nested_init(void)
 	if (!radix_enabled())
 		return -ENODEV;
 
+	rc = plpar_guest_get_capabilities(0, &host_capabilities);
+	if (rc == H_SUCCESS) {
+		unsigned long capabilities = 0;
+
+		if (cpu_has_feature(CPU_FTR_ARCH_31))
+			capabilities |= H_GUEST_CAP_POWER10;
+		if (cpu_has_feature(CPU_FTR_ARCH_300))
+			capabilities |= H_GUEST_CAP_POWER9;
+
+		nested_capabilities = capabilities & host_capabilities;
+		rc = plpar_guest_set_capabilities(0, nested_capabilities);
+		if (rc != H_SUCCESS) {
+			pr_err("kvm-hv: Could not configure parent hypervisor capabilities (rc=%ld)",
+			       rc);
+			return -ENODEV;
+		}
+
+		__kvmhv_on_papr = true;
+		return 0;
+	}
+
 	/* Partition table entry is 1<<4 bytes in size, hence the 4. */
 	ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4;
 	/* Minimum partition table size is 1<<12 bytes */
@@ -507,10 +530,15 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1)
 		return;
 	}
 
-	pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
-	pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
-	/* L0 will do the necessary barriers */
-	kvmhv_flush_lpid(lpid);
+	if (kvmhv_on_papr()) {
+		kvmhv_papr_set_ptbl_entry(lpid, dw0, dw1);
+	} else {
+		pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+		pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+		/* L0 will do the necessary barriers */
+		kvmhv_flush_lpid(lpid);
+	}
 }
 
 static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp)
diff --git a/arch/powerpc/kvm/book3s_hv_papr.c b/arch/powerpc/kvm/book3s_hv_papr.c
new file mode 100644
index 000000000000..05d8e735e2a9
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_papr.c
@@ -0,0 +1,940 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2023 Jordan Niethe, IBM Corp. <jniethe5@gmail.com>
+ *
+ * Authors:
+ *    Jordan Niethe <jniethe5@gmail.com>
+ *
+ * Description: KVM functions specific to running on Book 3S
+ * processors as a PAPR guest.
+ *
+ */
+
+#include "linux/blk-mq.h"
+#include "linux/console.h"
+#include "linux/gfp_types.h"
+#include "linux/signal.h"
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/pgtable.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/hvcall.h>
+#include <asm/pgalloc.h>
+#include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
+#include <asm/guest-state-buffer.h>
+#include "trace_hv.h"
+
+bool __kvmhv_on_papr __read_mostly;
+EXPORT_SYMBOL_GPL(__kvmhv_on_papr);
+
+static size_t gs_msg_ops_kvmhv_papr_config_get_size(struct gs_msg *gsm)
+{
+	u16 ids[] = {
+		GSID_RUN_OUTPUT_MIN_SIZE,
+		GSID_RUN_INPUT,
+		GSID_RUN_OUTPUT,
+	};
+	size_t size = 0;
+
+	for (int i = 0; i < ARRAY_SIZE(ids); i++)
+		size += gse_total_size(gsid_size(ids[i]));
+	return size;
+}
+
+static int gs_msg_ops_kvmhv_papr_config_fill_info(struct gs_buff *gsb,
+						  struct gs_msg *gsm)
+{
+	struct kvmhv_papr_config *cfg;
+	int rc;
+
+	cfg = gsm->data;
+
+	if (gsm_includes(gsm, GSID_RUN_OUTPUT_MIN_SIZE)) {
+		rc = gse_put(gsb, GSID_RUN_OUTPUT_MIN_SIZE,
+			     cfg->vcpu_run_output_size);
+		if (rc < 0)
+			return rc;
+	}
+
+	if (gsm_includes(gsm, GSID_RUN_INPUT)) {
+		rc = gse_put(gsb, GSID_RUN_INPUT, cfg->vcpu_run_input_cfg);
+		if (rc < 0)
+			return rc;
+	}
+
+	if (gsm_includes(gsm, GSID_RUN_OUTPUT)) {
+		rc = gse_put(gsb, GSID_RUN_OUTPUT, cfg->vcpu_run_output_cfg);
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int gs_msg_ops_kvmhv_papr_config_refresh_info(struct gs_msg *gsm,
+						     struct gs_buff *gsb)
+{
+	struct kvmhv_papr_config *cfg;
+	struct gs_parser gsp = { 0 };
+	struct gs_elem *gse;
+	int rc;
+
+	cfg = gsm->data;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	gse = gsp_lookup(&gsp, GSID_RUN_OUTPUT_MIN_SIZE);
+	if (gse)
+		gse_get(gse, &cfg->vcpu_run_output_size);
+	return 0;
+}
+
+static struct gs_msg_ops config_msg_ops = {
+	.get_size = gs_msg_ops_kvmhv_papr_config_get_size,
+	.fill_info = gs_msg_ops_kvmhv_papr_config_fill_info,
+	.refresh_info = gs_msg_ops_kvmhv_papr_config_refresh_info,
+};
+
+static size_t gs_msg_ops_vcpu_get_size(struct gs_msg *gsm)
+{
+	struct gs_bitmap gsbm = { 0 };
+	size_t size = 0;
+	u16 iden;
+
+	gsbm_fill(&gsbm);
+	gsbm_for_each(&gsbm, iden) {
+		switch (iden) {
+		case GSID_HOST_STATE_SIZE:
+		case GSID_RUN_OUTPUT_MIN_SIZE:
+		case GSID_PARTITION_TABLE:
+		case GSID_PROCESS_TABLE:
+		case GSID_RUN_INPUT:
+		case GSID_RUN_OUTPUT:
+			break;
+		default:
+			size += gse_total_size(gsid_size(iden));
+		}
+	}
+	return size;
+}
+
+static int gs_msg_ops_vcpu_fill_info(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	struct kvm_vcpu *vcpu;
+	vector128 v;
+	int rc, i;
+	u16 iden;
+
+	vcpu = gsm->data;
+
+	gsm_for_each(gsm, iden) {
+		rc = 0;
+
+		if ((gsm->flags & GS_FLAGS_WIDE) !=
+		    (gsid_flags(iden) & GS_FLAGS_WIDE))
+			continue;
+
+		switch (iden) {
+		case GSID_DSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.dscr);
+			break;
+		case GSID_MMCRA:
+			rc = gse_put(gsb, iden, vcpu->arch.mmcra);
+			break;
+		case GSID_HFSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.hfscr);
+			break;
+		case GSID_PURR:
+			rc = gse_put(gsb, iden, vcpu->arch.purr);
+			break;
+		case GSID_SPURR:
+			rc = gse_put(gsb, iden, vcpu->arch.spurr);
+			break;
+		case GSID_AMR:
+			rc = gse_put(gsb, iden, vcpu->arch.amr);
+			break;
+		case GSID_UAMOR:
+			rc = gse_put(gsb, iden, vcpu->arch.uamor);
+			break;
+		case GSID_SIAR:
+			rc = gse_put(gsb, iden, vcpu->arch.siar);
+			break;
+		case GSID_SDAR:
+			rc = gse_put(gsb, iden, vcpu->arch.sdar);
+			break;
+		case GSID_IAMR:
+			rc = gse_put(gsb, iden, vcpu->arch.iamr);
+			break;
+		case GSID_DAWR0:
+			rc = gse_put(gsb, iden, vcpu->arch.dawr0);
+			break;
+		case GSID_DAWR1:
+			rc = gse_put(gsb, iden, vcpu->arch.dawr1);
+			break;
+		case GSID_DAWRX0:
+			rc = gse_put(gsb, iden, vcpu->arch.dawrx0);
+			break;
+		case GSID_DAWRX1:
+			rc = gse_put(gsb, iden, vcpu->arch.dawrx1);
+			break;
+		case GSID_CIABR:
+			rc = gse_put(gsb, iden, vcpu->arch.ciabr);
+			break;
+		case GSID_WORT:
+			rc = gse_put(gsb, iden, vcpu->arch.wort);
+			break;
+		case GSID_PPR:
+			rc = gse_put(gsb, iden, vcpu->arch.ppr);
+			break;
+		case GSID_PSPB:
+			rc = gse_put(gsb, iden, vcpu->arch.pspb);
+			break;
+		case GSID_TAR:
+			rc = gse_put(gsb, iden, vcpu->arch.tar);
+			break;
+		case GSID_FSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.fscr);
+			break;
+		case GSID_EBBHR:
+			rc = gse_put(gsb, iden, vcpu->arch.ebbhr);
+			break;
+		case GSID_EBBRR:
+			rc = gse_put(gsb, iden, vcpu->arch.ebbrr);
+			break;
+		case GSID_BESCR:
+			rc = gse_put(gsb, iden, vcpu->arch.bescr);
+			break;
+		case GSID_IC:
+			rc = gse_put(gsb, iden, vcpu->arch.ic);
+			break;
+		case GSID_CTRL:
+			rc = gse_put(gsb, iden, vcpu->arch.ctrl);
+			break;
+		case GSID_PIDR:
+			rc = gse_put(gsb, iden, vcpu->arch.pid);
+			break;
+		case GSID_AMOR:
+			rc = gse_put(gsb, iden, vcpu->arch.amor);
+			break;
+		case GSID_VRSAVE:
+			rc = gse_put(gsb, iden, vcpu->arch.vrsave);
+			break;
+		case GSID_MMCR(0) ... GSID_MMCR(3):
+			i = iden - GSID_MMCR(0);
+			rc = gse_put(gsb, iden, vcpu->arch.mmcr[i]);
+			break;
+		case GSID_SIER(0) ... GSID_SIER(2):
+			i = iden - GSID_SIER(0);
+			rc = gse_put(gsb, iden, vcpu->arch.sier[i]);
+			break;
+		case GSID_PMC(0) ... GSID_PMC(5):
+			i = iden - GSID_PMC(0);
+			rc = gse_put(gsb, iden, vcpu->arch.pmc[i]);
+			break;
+		case GSID_GPR(0) ... GSID_GPR(31):
+			i = iden - GSID_GPR(0);
+			rc = gse_put(gsb, iden, vcpu->arch.regs.gpr[i]);
+			break;
+		case GSID_CR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.ccr);
+			break;
+		case GSID_XER:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.xer);
+			break;
+		case GSID_CTR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.ctr);
+			break;
+		case GSID_LR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.link);
+			break;
+		case GSID_NIA:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.nip);
+			break;
+		case GSID_SRR0:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.srr0);
+			break;
+		case GSID_SRR1:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.srr1);
+			break;
+		case GSID_SPRG0:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg0);
+			break;
+		case GSID_SPRG1:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg1);
+			break;
+		case GSID_SPRG2:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg2);
+			break;
+		case GSID_SPRG3:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg3);
+			break;
+		case GSID_DAR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.dar);
+			break;
+		case GSID_DSISR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.dsisr);
+			break;
+		case GSID_MSR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.msr);
+			break;
+		case GSID_VTB:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->vtb);
+			break;
+		case GSID_LPCR:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->lpcr);
+			break;
+		case GSID_TB_OFFSET:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->tb_offset);
+			break;
+		case GSID_FPSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.fp.fpscr);
+			break;
+		case GSID_VSRS(0) ... GSID_VSRS(31):
+			i = iden - GSID_VSRS(0);
+			memcpy(&v, &vcpu->arch.fp.fpr[i],
+			       sizeof(vcpu->arch.fp.fpr[i]));
+			rc = gse_put(gsb, iden, v);
+			break;
+#ifdef CONFIG_VSX
+		case GSID_VSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.vr.vscr.u[3]);
+			break;
+		case GSID_VSRS(32) ... GSID_VSRS(63):
+			i = iden - GSID_VSRS(32);
+			rc = gse_put(gsb, iden, vcpu->arch.vr.vr[i]);
+			break;
+#endif
+		case GSID_DEC_EXPIRY_TB: {
+			u64 dw;
+
+			dw = vcpu->arch.dec_expires -
+			     vcpu->arch.vcore->tb_offset;
+			rc = gse_put(gsb, iden, dw);
+			break;
+		}
+		}
+
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int gs_msg_ops_vcpu_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	struct gs_parser gsp = { 0 };
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct kvm_vcpu *vcpu;
+	struct gs_elem *gse;
+	vector128 v;
+	int rc, i;
+	u16 iden;
+
+	vcpu = gsm->data;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+
+	gsp_for_each(&gsp, iden, gse)
+	{
+		switch (iden) {
+		case GSID_DSCR:
+			gse_get(gse, &vcpu->arch.dscr);
+			break;
+		case GSID_MMCRA:
+			gse_get(gse, &vcpu->arch.mmcra);
+			break;
+		case GSID_HFSCR:
+			gse_get(gse, &vcpu->arch.hfscr);
+			break;
+		case GSID_PURR:
+			gse_get(gse, &vcpu->arch.purr);
+			break;
+		case GSID_SPURR:
+			gse_get(gse, &vcpu->arch.spurr);
+			break;
+		case GSID_AMR:
+			gse_get(gse, &vcpu->arch.amr);
+			break;
+		case GSID_UAMOR:
+			gse_get(gse, &vcpu->arch.uamor);
+			break;
+		case GSID_SIAR:
+			gse_get(gse, &vcpu->arch.siar);
+			break;
+		case GSID_SDAR:
+			gse_get(gse, &vcpu->arch.sdar);
+			break;
+		case GSID_IAMR:
+			gse_get(gse, &vcpu->arch.iamr);
+			break;
+		case GSID_DAWR0:
+			gse_get(gse, &vcpu->arch.dawr0);
+			break;
+		case GSID_DAWR1:
+			gse_get(gse, &vcpu->arch.dawr1);
+			break;
+		case GSID_DAWRX0:
+			gse_get(gse, &vcpu->arch.dawrx0);
+			break;
+		case GSID_DAWRX1:
+			gse_get(gse, &vcpu->arch.dawrx1);
+			break;
+		case GSID_CIABR:
+			gse_get(gse, &vcpu->arch.ciabr);
+			break;
+		case GSID_WORT:
+			gse_get(gse, &vcpu->arch.wort);
+			break;
+		case GSID_PPR:
+			gse_get(gse, &vcpu->arch.ppr);
+			break;
+		case GSID_PSPB:
+			gse_get(gse, &vcpu->arch.pspb);
+			break;
+		case GSID_TAR:
+			gse_get(gse, &vcpu->arch.tar);
+			break;
+		case GSID_FSCR:
+			gse_get(gse, &vcpu->arch.fscr);
+			break;
+		case GSID_EBBHR:
+			gse_get(gse, &vcpu->arch.ebbhr);
+			break;
+		case GSID_EBBRR:
+			gse_get(gse, &vcpu->arch.ebbrr);
+			break;
+		case GSID_BESCR:
+			gse_get(gse, &vcpu->arch.bescr);
+			break;
+		case GSID_IC:
+			gse_get(gse, &vcpu->arch.ic);
+			break;
+		case GSID_CTRL:
+			gse_get(gse, &vcpu->arch.ctrl);
+			break;
+		case GSID_PIDR:
+			gse_get(gse, &vcpu->arch.pid);
+			break;
+		case GSID_AMOR:
+			gse_get(gse, &vcpu->arch.amor);
+			break;
+		case GSID_VRSAVE:
+			gse_get(gse, &vcpu->arch.vrsave);
+			break;
+		case GSID_MMCR(0) ... GSID_MMCR(3):
+			i = iden - GSID_MMCR(0);
+			gse_get(gse, &vcpu->arch.mmcr[i]);
+			break;
+		case GSID_SIER(0) ... GSID_SIER(2):
+			i = iden - GSID_SIER(0);
+			gse_get(gse, &vcpu->arch.sier[i]);
+			break;
+		case GSID_PMC(0) ... GSID_PMC(5):
+			i = iden - GSID_PMC(0);
+			gse_get(gse, &vcpu->arch.pmc[i]);
+			break;
+		case GSID_GPR(0) ... GSID_GPR(31):
+			i = iden - GSID_GPR(0);
+			gse_get(gse, &vcpu->arch.regs.gpr[i]);
+			break;
+		case GSID_CR:
+			gse_get(gse, &vcpu->arch.regs.ccr);
+			break;
+		case GSID_XER:
+			gse_get(gse, &vcpu->arch.regs.xer);
+			break;
+		case GSID_CTR:
+			gse_get(gse, &vcpu->arch.regs.ctr);
+			break;
+		case GSID_LR:
+			gse_get(gse, &vcpu->arch.regs.link);
+			break;
+		case GSID_NIA:
+			gse_get(gse, &vcpu->arch.regs.nip);
+			break;
+		case GSID_SRR0:
+			gse_get(gse, &vcpu->arch.shregs.srr0);
+			break;
+		case GSID_SRR1:
+			gse_get(gse, &vcpu->arch.shregs.srr1);
+			break;
+		case GSID_SPRG0:
+			gse_get(gse, &vcpu->arch.shregs.sprg0);
+			break;
+		case GSID_SPRG1:
+			gse_get(gse, &vcpu->arch.shregs.sprg1);
+			break;
+		case GSID_SPRG2:
+			gse_get(gse, &vcpu->arch.shregs.sprg2);
+			break;
+		case GSID_SPRG3:
+			gse_get(gse, &vcpu->arch.shregs.sprg3);
+			break;
+		case GSID_DAR:
+			gse_get(gse, &vcpu->arch.shregs.dar);
+			break;
+		case GSID_DSISR:
+			gse_get(gse, &vcpu->arch.shregs.dsisr);
+			break;
+		case GSID_MSR:
+			gse_get(gse, &vcpu->arch.shregs.msr);
+			break;
+		case GSID_VTB:
+			gse_get(gse, &vcpu->arch.vcore->vtb);
+			break;
+		case GSID_LPCR:
+			gse_get(gse, &vcpu->arch.vcore->lpcr);
+			break;
+		case GSID_TB_OFFSET:
+			gse_get(gse, &vcpu->arch.vcore->tb_offset);
+			break;
+		case GSID_FPSCR:
+			gse_get(gse, &vcpu->arch.fp.fpscr);
+			break;
+		case GSID_VSRS(0) ... GSID_VSRS(31):
+			gse_get(gse, &v);
+			i = iden - GSID_VSRS(0);
+			memcpy(&vcpu->arch.fp.fpr[i], &v,
+			       sizeof(vcpu->arch.fp.fpr[i]));
+			break;
+#ifdef CONFIG_VSX
+		case GSID_VSCR:
+			gse_get(gse, &vcpu->arch.vr.vscr.u[3]);
+			break;
+		case GSID_VSRS(32) ... GSID_VSRS(63):
+			i = iden - GSID_VSRS(32);
+			gse_get(gse, &vcpu->arch.vr.vr[i]);
+			break;
+#endif
+		case GSID_HDAR:
+			gse_get(gse, &vcpu->arch.fault_dar);
+			break;
+		case GSID_HDSISR:
+			gse_get(gse, &vcpu->arch.fault_dsisr);
+			break;
+		case GSID_ASDR:
+			gse_get(gse, &vcpu->arch.fault_gpa);
+			break;
+		case GSID_HEIR:
+			gse_get(gse, &vcpu->arch.emul_inst);
+			break;
+		case GSID_DEC_EXPIRY_TB: {
+			u64 dw;
+
+			gse_get(gse, &dw);
+			vcpu->arch.dec_expires =
+				dw + vcpu->arch.vcore->tb_offset;
+			break;
+		}
+		default:
+			continue;
+		}
+		gsbm_set(valids, iden);
+	}
+
+	return 0;
+}
+
+static struct gs_msg_ops vcpu_message_ops = {
+	.get_size = gs_msg_ops_vcpu_get_size,
+	.fill_info = gs_msg_ops_vcpu_fill_info,
+	.refresh_info = gs_msg_ops_vcpu_refresh_info,
+};
+
+static int kvmhv_papr_host_create(struct kvm_vcpu *vcpu,
+				  struct kvmhv_papr_host *ph)
+{
+	struct kvmhv_papr_config *cfg;
+	struct gs_buff *gsb, *vcpu_run_output, *vcpu_run_input;
+	unsigned long guest_id, vcpu_id;
+	struct gs_msg *gsm, *vcpu_message, *vcore_message;
+	int rc;
+
+	cfg = &ph->cfg;
+	guest_id = vcpu->kvm->arch.lpid;
+	vcpu_id = vcpu->vcpu_id;
+
+	gsm = gsm_new(&config_msg_ops, cfg, GS_FLAGS_WIDE, GFP_KERNEL);
+	if (!gsm) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	gsb = gsb_new(gsm_size(gsm), guest_id, vcpu_id, GFP_KERNEL);
+	if (!gsb) {
+		rc = -ENOMEM;
+		goto free_gsm;
+	}
+
+	rc = gsb_receive_datum(gsb, gsm, GSID_RUN_OUTPUT_MIN_SIZE);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't get vcpu run output buffer minimum size\n");
+		goto free_gsb;
+	}
+
+	vcpu_run_output = gsb_new(cfg->vcpu_run_output_size, guest_id, vcpu_id, GFP_KERNEL);
+	if (!vcpu_run_output) {
+		rc = -ENOMEM;
+		goto free_gsb;
+	}
+
+	cfg->vcpu_run_output_cfg.address = gsb_paddress(vcpu_run_output);
+	cfg->vcpu_run_output_cfg.size = gsb_capacity(vcpu_run_output);
+	ph->vcpu_run_output = vcpu_run_output;
+
+	gsm->flags = 0;
+	rc = gsb_send_datum(gsb, gsm, GSID_RUN_OUTPUT);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set vcpu run output buffer\n");
+		goto free_gs_out;
+	}
+
+	vcpu_message = gsm_new(&vcpu_message_ops, vcpu, 0, GFP_KERNEL);
+	if (!vcpu_message) {
+		rc = -ENOMEM;
+		goto free_gs_out;
+	}
+	gsm_include_all(vcpu_message);
+
+	ph->vcpu_message = vcpu_message;
+
+	vcpu_run_input = gsb_new(gsm_size(vcpu_message), guest_id, vcpu_id, GFP_KERNEL);
+	if (!vcpu_run_input) {
+		rc = -ENOMEM;
+		goto free_vcpu_message;
+	}
+
+	ph->vcpu_run_input = vcpu_run_input;
+	cfg->vcpu_run_input_cfg.address = gsb_paddress(vcpu_run_input);
+	cfg->vcpu_run_input_cfg.size = gsb_capacity(vcpu_run_input);
+	rc = gsb_send_datum(gsb, gsm, GSID_RUN_INPUT);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set vcpu run input buffer\n");
+		goto free_vcpu_run_input;
+	}
+
+	vcore_message =
+		gsm_new(&vcpu_message_ops, vcpu, GS_FLAGS_WIDE, GFP_KERNEL);
+	if (!vcore_message) {
+		rc = -ENOMEM;
+		goto free_vcpu_run_input;
+	}
+
+	gsm_include_all(vcore_message);
+	ph->vcore_message = vcore_message;
+
+	gsbm_fill(&ph->valids);
+	gsm_free(gsm);
+	gsb_free(gsb);
+	return 0;
+
+free_vcpu_run_input:
+	gsb_free(vcpu_run_input);
+free_vcpu_message:
+	gsm_free(vcpu_message);
+free_gs_out:
+	gsb_free(vcpu_run_output);
+free_gsb:
+	gsb_free(gsb);
+free_gsm:
+	gsm_free(gsm);
+err:
+	return rc;
+}
+
+/**
+ * __kvmhv_papr_mark_dirty() - mark a Guest State ID to be sent to the host
+ * @vcpu: vcpu
+ * @iden: guest state ID
+ *
+ * Mark a guest state ID as having been changed by the L1 host and thus
+ * the new value must be sent to the L0 hypervisor. See kvmhv_papr_flush_vcpu()
+ */
+int __kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct gs_msg *gsm;
+
+	if (!iden)
+		return 0;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+	gsm = ph->vcpu_message;
+	gsm_include(gsm, iden);
+	gsm = ph->vcore_message;
+	gsm_include(gsm, iden);
+	gsbm_set(valids, iden);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_mark_dirty);
+
+/**
+ * __kvmhv_papr_cached_reload() - reload a Guest State ID from the host
+ * @vcpu: vcpu
+ * @iden: guest state ID
+ *
+ * Reload the value for the guest state ID from the L0 host into the L1 host.
+ * This is cached so that going out to the L0 host only happens if necessary.
+ */
+int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct gs_buff *gsb;
+	struct gs_msg gsm;
+	int rc;
+
+	if (!iden)
+		return 0;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+	if (gsbm_test(valids, iden))
+		return 0;
+
+	gsb = ph->vcpu_run_input;
+	gsm_init(&gsm, &vcpu_message_ops, vcpu, gsid_flags(iden));
+	rc = gsb_receive_datum(gsb, &gsm, iden);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't get GSID: 0x%x\n", iden);
+		return rc;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_cached_reload);
+
+/**
+ * kvmhv_papr_flush_vcpu() - send modified Guest State IDs to the host
+ * @vcpu: vcpu
+ * @time_limit: hdec expiry tb
+ *
+ * Send the values marked by __kvmhv_papr_mark_dirty() to the L0 host. Thread
+ * wide values are copied to the H_GUEST_RUN_VCPU input buffer. Guest wide
+ * values need to be sent with H_GUEST_SET first.
+ *
+ * The hdec tb offset is always sent to the L0 host.
+ */
+int kvmhv_papr_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_buff *gsb;
+	struct gs_msg *gsm;
+	int rc;
+
+	ph = &vcpu->arch.papr_host;
+	gsb = ph->vcpu_run_input;
+	gsm = ph->vcore_message;
+	rc = gsb_send_data(gsb, gsm);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set guest wide elements\n");
+		return rc;
+	}
+
+	gsm = ph->vcpu_message;
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't fill vcpu run input buffer\n");
+		return rc;
+	}
+
+	rc = gse_put(gsb, GSID_HDEC_EXPIRY_TB, time_limit);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_flush_vcpu);
+
+/**
+ * kvmhv_papr_set_ptbl_entry() - send partition and process table state to L0 host
+ * @lpid: guest id
+ * @dw0: partition table double word
+ * @dw1: process table double word
+ */
+int kvmhv_papr_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1)
+{
+	struct gs_part_table patbl;
+	struct gs_proc_table prtbl;
+	struct gs_buff *gsb;
+	size_t size;
+	int rc;
+
+	size = gse_total_size(gsid_size(GSID_PARTITION_TABLE)) +
+	       gse_total_size(gsid_size(GSID_PROCESS_TABLE)) +
+	       sizeof(struct gs_header);
+	gsb = gsb_new(size, lpid, 0, GFP_KERNEL);
+	if (!gsb)
+		return -ENOMEM;
+
+	patbl.address = dw0 & RPDB_MASK;
+	patbl.ea_bits = ((((dw0 & RTS1_MASK) >> (RTS1_SHIFT - 3)) |
+			  ((dw0 & RTS2_MASK) >> RTS2_SHIFT)) +
+			 31);
+	patbl.gpd_size = 1ul << ((dw0 & RPDS_MASK) + 3);
+	rc = gse_put(gsb, GSID_PARTITION_TABLE, patbl);
+	if (rc < 0)
+		goto free_gsb;
+
+	prtbl.address = dw1 & PRTB_MASK;
+	prtbl.gpd_size = 1ul << ((dw1 & PRTS_MASK) + 12);
+	rc = gse_put(gsb, GSID_PROCESS_TABLE, prtbl);
+	if (rc < 0)
+		goto free_gsb;
+
+	rc = gsb_send(gsb, GS_FLAGS_WIDE);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set the PATE\n");
+		goto free_gsb;
+	}
+
+	gsb_free(gsb);
+	return 0;
+
+free_gsb:
+	gsb_free(gsb);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_set_ptbl_entry);
+
+/**
+ * kvmhv_papr_parse_output() - receive values from H_GUEST_RUN_VCPU output
+ * @vcpu: vcpu
+ *
+ * Parse the output buffer from H_GUEST_RUN_VCPU to update vcpu.
+ */
+int kvmhv_papr_parse_output(struct kvm_vcpu *vcpu)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_buff *gsb;
+	struct gs_msg gsm;
+
+	ph = &vcpu->arch.papr_host;
+	gsb = ph->vcpu_run_output;
+
+	vcpu->arch.fault_dar = 0;
+	vcpu->arch.fault_dsisr = 0;
+	vcpu->arch.fault_gpa = 0;
+	vcpu->arch.emul_inst = KVM_INST_FETCH_FAILED;
+
+	gsm_init(&gsm, &vcpu_message_ops, vcpu, 0);
+	gsm_refresh_info(&gsm, gsb);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_parse_output);
+
+static void kvmhv_papr_host_free(struct kvm_vcpu *vcpu,
+				 struct kvmhv_papr_host *ph)
+{
+	gsm_free(ph->vcpu_message);
+	gsm_free(ph->vcore_message);
+	gsb_free(ph->vcpu_run_input);
+	gsb_free(ph->vcpu_run_output);
+}
+
+int __kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	int rc;
+
+	for (int i = 0; i < 32; i++) {
+		rc = kvmhv_papr_cached_reload(vcpu, GSID_GPR(i));
+		if (rc < 0)
+			return rc;
+	}
+
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_CR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_XER);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_CTR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_LR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_NIA);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_reload_ptregs);
+
+int __kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	for (int i = 0; i < 32; i++)
+		kvmhv_papr_mark_dirty(vcpu, GSID_GPR(i));
+
+	kvmhv_papr_mark_dirty(vcpu, GSID_CR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_XER);
+	kvmhv_papr_mark_dirty(vcpu, GSID_CTR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_LR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_mark_dirty_ptregs);
+
+/**
+ * kvmhv_papr_vcpu_create() - create nested vcpu for the PAPR API
+ * @vcpu: vcpu
+ * @ph: PAPR nested host state
+ *
+ * Create the L0 vcpu with H_GUEST_CREATE_VCPU and allocate the PAPR host state.
+ */
+int kvmhv_papr_vcpu_create(struct kvm_vcpu *vcpu,
+			   struct kvmhv_papr_host *ph)
+{
+	long rc;
+
+	rc = plpar_guest_create_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id);
+
+	if (rc != H_SUCCESS) {
+		pr_err("KVM: Create Guest vcpu hcall failed, rc=%ld\n", rc);
+		switch (rc) {
+		case H_NOT_ENOUGH_RESOURCES:
+		case H_ABORTED:
+			return -ENOMEM;
+		case H_AUTHORITY:
+			return -EPERM;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	rc = kvmhv_papr_host_create(vcpu, ph);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_vcpu_create);
+
+/**
+ * kvmhv_papr_vcpu_free() - free the PAPR host state
+ * @vcpu: vcpu
+ * @ph: PAPR nested host state
+ */
+void kvmhv_papr_vcpu_free(struct kvm_vcpu *vcpu,
+			  struct kvmhv_papr_host *ph)
+{
+	kvmhv_papr_host_free(vcpu, ph);
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_vcpu_free);
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index e6e66c3792f8..663403fa86d4 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -92,7 +92,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmio_host_swabbed = 0;
 
 	emulated = EMULATE_FAIL;
-	vcpu->arch.regs.msr = vcpu->arch.shared->msr;
+	vcpu->arch.regs.msr = kvmppc_get_msr(vcpu);
+	kvmhv_papr_reload_ptregs(vcpu, &vcpu->arch.regs);
 	if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) {
 		int type = op.type & INSTR_TYPE_MASK;
 		int size = GETSIZE(op.type);
@@ -357,6 +358,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 	}
 
 	trace_kvm_ppc_instr(ppc_inst_val(inst), kvmppc_get_pc(vcpu), emulated);
+	kvmhv_papr_mark_dirty_ptregs(vcpu, &vcpu->arch.regs);
 
 	/* Advance past emulated instruction. */
 	if (emulated != EMULATE_FAIL)
diff --git a/arch/powerpc/kvm/guest-state-buffer.c b/arch/powerpc/kvm/guest-state-buffer.c
index db4a79bfcaf1..cc3a7a416867 100644
--- a/arch/powerpc/kvm/guest-state-buffer.c
+++ b/arch/powerpc/kvm/guest-state-buffer.c
@@ -561,3 +561,52 @@ int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
 	return gsm->ops->refresh_info(gsm, gsb);
 }
 EXPORT_SYMBOL(gsm_refresh_info);
+
+/**
+ * gsb_send - send all elements in the buffer to the hypervisor.
+ * @gsb: guest state buffer
+ * @flags: guest wide or thread wide
+ *
+ * Performs the H_GUEST_SET_STATE hcall for the guest state buffer.
+ */
+int gsb_send(struct gs_buff *gsb, unsigned long flags)
+{
+	unsigned long hflags = 0;
+	unsigned long i;
+	int rc;
+
+	if (gsb_nelems(gsb) == 0)
+		return 0;
+
+	if (flags & GS_FLAGS_WIDE)
+		hflags |= H_GUEST_FLAGS_WIDE;
+
+	rc = plpar_guest_set_state(hflags, gsb->guest_id, gsb->vcpu_id,
+				   __pa(gsb->hdr), gsb->capacity, &i);
+	return rc;
+}
+EXPORT_SYMBOL(gsb_send);
+
+/**
+ * gsb_recv - request that all elements in the buffer have their values updated.
+ * @gsb: guest state buffer
+ * @flags: guest wide or thread wide
+ *
+ * Performs the H_GUEST_GET_STATE hcall for the guest state buffer.
+ * After returning from the hcall the guest state elements that were
+ * present in the buffer will have updated values from the hypervisor.
+ */
+int gsb_recv(struct gs_buff *gsb, unsigned long flags)
+{
+	unsigned long hflags = 0;
+	unsigned long i;
+	int rc;
+
+	if (flags & GS_FLAGS_WIDE)
+		hflags |= H_GUEST_FLAGS_WIDE;
+
+	rc = plpar_guest_get_state(hflags, gsb->guest_id, gsb->vcpu_id,
+				   __pa(gsb->hdr), gsb->capacity, &i);
+	return rc;
+}
+EXPORT_SYMBOL(gsb_recv);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

A series of hcalls has been added to PAPR which allows a regular
guest partition to create and manage guest partitions of its own. Add
support to KVM to use these hcalls to enable running nested guests.

Overview of the new hcall usage (a sketch in code follows the list):

- L1 and L0 negotiate capabilities with
  H_GUEST_{G,S}ET_CAPABILITIES

- L1 requests the L0 create an L2 with
  H_GUEST_CREATE and receives a handle to use in future hcalls

- L1 requests the L0 create an L2 vCPU with
  H_GUEST_CREATE_VCPU

- L1 sets up the L2 using H_GUEST_SET and the
  H_GUEST_VCPU_RUN input buffer

- L1 requests that the L0 run the L2 vCPU using H_GUEST_VCPU_RUN

- L2 returns to L1 with an exit reason and L1 reads the
  H_GUEST_VCPU_RUN output buffer populated by the L0

- L1 handles the exit using H_GUEST_GET_STATE if necessary

- L1 reruns the L2 vCPU with H_GUEST_VCPU_RUN

- L1 frees the L2 in the L0 with H_GUEST_DELETE

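As a rough end-to-end sketch (illustrative only, not code from this
series; state setup and exit handling are elided and handle_exit() is
a hypothetical stand-in):

	unsigned long guest_id, fail_idx;
	int trap;

	if (plpar_guest_create(0, &guest_id) != H_SUCCESS)
		return -ENOMEM;
	if (plpar_guest_create_vcpu(0, guest_id, 0) != H_SUCCESS)
		goto delete;
	do {
		/* dirty L2 state is flushed to the input buffer first */
		if (plpar_guest_run_vcpu(0, guest_id, 0, &trap,
					 &fail_idx) != H_SUCCESS)
			break;
	} while (handle_exit(trap));
delete:
	plpar_guest_delete(0, guest_id);
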
Support for the new API is determined by trying
H_GUEST_GET_CAPABILITIES. On a successful return, the new API will then
be used.

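For example, the probe amounts to (simplified from kvmhv_nested_init()
in this series; caps and our_caps are illustrative locals, with
our_caps derived from CPU features):

	if (plpar_guest_get_capabilities(0, &caps) == H_SUCCESS) {
		plpar_guest_set_capabilities(0, caps & our_caps);
		__kvmhv_on_papr = true;
	}
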
Use the vcpu register state setters for tracking modified guest state
elements and copy the thread wide values into the H_GUEST_VCPU_RUN
input buffer immediately before running an L2. The guest wide elements
cannot be added to the input buffer so they are sent with a separate
H_GUEST_SET call if necessary.

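For example, a setter marks its guest state ID dirty; this is the
actual shape of the wrappers added in this patch:

	static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
	{
		vcpu->arch.regs.nip = val;
		kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
	}

kvmhv_papr_flush_vcpu() then serializes everything marked dirty into
the run vCPU input buffer before H_GUEST_RUN_VCPU.
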
Make the vcpu register getter load the corresponding value from the L0
host with H_GUEST_GET_STATE. To avoid unnecessary hcalls, track which
values have already been loaded between H_GUEST_VCPU_RUN calls. If an
element is present in the H_GUEST_VCPU_RUN output buffer it also does
not need to be loaded again.

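The getter path reduces to (simplified from __kvmhv_papr_cached_reload()
in this patch):

	if (gsbm_test(&vcpu->arch.papr_host.valids, iden))
		return 0;	/* still valid since the last run, no hcall */
	rc = gsb_receive_datum(gsb, &gsm, iden);	/* H_GUEST_GET_STATE */
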
There is existing support for running nested guests on KVM
with powernv. However, the interface it uses is not supported by
other PAPR hosts. This existing API remains supported.

Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Declare op structs as static
  - Use expressions in switch case with local variables
  - Do not use the PVR for the LOGICAL PVR ID
  - Handle emul_inst as now a double word
  - Use new GPR(), etc macros
  - Determine PAPR nested capabilities from cpu features
---
 arch/powerpc/include/asm/guest-state-buffer.h | 105 +-
 arch/powerpc/include/asm/hvcall.h             |  30 +
 arch/powerpc/include/asm/kvm_book3s.h         | 122 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h      |   6 +
 arch/powerpc/include/asm/kvm_host.h           |  21 +
 arch/powerpc/include/asm/kvm_ppc.h            |  64 +-
 arch/powerpc/include/asm/plpar_wrappers.h     | 198 ++++
 arch/powerpc/kvm/Makefile                     |   1 +
 arch/powerpc/kvm/book3s_hv.c                  | 126 ++-
 arch/powerpc/kvm/book3s_hv.h                  |  74 +-
 arch/powerpc/kvm/book3s_hv_nested.c           |  38 +-
 arch/powerpc/kvm/book3s_hv_papr.c             | 940 ++++++++++++++++++
 arch/powerpc/kvm/emulate_loadstore.c          |   4 +-
 arch/powerpc/kvm/guest-state-buffer.c         |  49 +
 14 files changed, 1684 insertions(+), 94 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_papr.c

diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
index 65a840abf1bb..116126edd8e2 100644
--- a/arch/powerpc/include/asm/guest-state-buffer.h
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -5,6 +5,7 @@
 #ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
 #define _ASM_POWERPC_GUEST_STATE_BUFFER_H
 
+#include "asm/hvcall.h"
 #include <linux/gfp.h>
 #include <linux/bitmap.h>
 #include <asm/plpar_wrappers.h>
@@ -14,16 +15,16 @@
  **************************************************************************/
 #define GSID_BLANK			0x0000
 
-#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
-#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
-#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
-#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
-#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
-#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
+#define GSID_HOST_STATE_SIZE		0x0001
+#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002
+#define GSID_LOGICAL_PVR		0x0003
+#define GSID_TB_OFFSET			0x0004
+#define GSID_PARTITION_TABLE		0x0005
+#define GSID_PROCESS_TABLE		0x0006
 
-#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
-#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
-#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
+#define GSID_RUN_INPUT			0x0C00
+#define GSID_RUN_OUTPUT			0x0C01
+#define GSID_VPA			0x0C02
 
 #define GSID_GPR(x)			(0x1000 + (x))
 #define GSID_HDEC_EXPIRY_TB		0x1020
@@ -300,6 +301,8 @@ struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
 			unsigned long vcpu_id, gfp_t flags);
 void gsb_free(struct gs_buff *gsb);
 void *gsb_put(struct gs_buff *gsb, size_t size);
+int gsb_send(struct gs_buff *gsb, unsigned long flags);
+int gsb_recv(struct gs_buff *gsb, unsigned long flags);
 
 /**
  * gsb_header() - the header of a guest state buffer
@@ -898,4 +901,88 @@ static inline void gsm_reset(struct gs_msg *gsm)
 	gsbm_zero(&gsm->bitmap);
 }
 
+/**
+ * gsb_receive_data - flexibly update values from a guest state buffer
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ *
+ * Requests updated values for the guest state values included in the guest
+ * state message. The guest state message will then deserialize the guest state
+ * buffer.
+ */
+static inline int gsb_receive_data(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	int rc;
+
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+
+	rc = gsb_recv(gsb, gsm->flags);
+	if (rc < 0)
+		return rc;
+
+	rc = gsm_refresh_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+
+/**
+ * gsb_receive_datum - receive a single guest state ID
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ * @iden: guest state identity
+ */
+static inline int gsb_receive_datum(struct gs_buff *gsb, struct gs_msg *gsm,
+				    u16 iden)
+{
+	int rc;
+
+	gsm_include(gsm, iden);
+	rc = gsb_receive_data(gsb, gsm);
+	if (rc < 0)
+		return rc;
+	gsm_reset(gsm);
+	return 0;
+}
+
+/**
+ * gsb_send_data - flexibly send values from a guest state buffer
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ *
+ * Sends the guest state values included in the guest state message.
+ */
+static inline int gsb_send_data(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	int rc;
+
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0)
+		return rc;
+	rc = gsb_send(gsb, gsm->flags);
+
+	return rc;
+}
+
+/**
+ * gsb_send_datum - send a single guest state ID
+ * @gsb: guest state buffer
+ * @gsm: guest state message
+ * @iden: guest state identity
+ */
+static inline int gsb_send_datum(struct gs_buff *gsb, struct gs_msg *gsm,
+				 u16 iden)
+{
+	int rc;
+
+	gsm_include(gsm, iden);
+	rc = gsb_send_data(gsb, gsm);
+	if (rc < 0)
+		return rc;
+	gsm_reset(gsm);
+	return 0;
+}
+
 #endif /* _ASM_POWERPC_GUEST_STATE_BUFFER_H */
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index c099780385dd..ddb99e982917 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -100,6 +100,18 @@
 #define H_COP_HW	-74
 #define H_STATE		-75
 #define H_IN_USE	-77
+
+#define H_INVALID_ELEMENT_ID			-79
+#define H_INVALID_ELEMENT_SIZE			-80
+#define H_INVALID_ELEMENT_VALUE			-81
+#define H_INPUT_BUFFER_NOT_DEFINED		-82
+#define H_INPUT_BUFFER_TOO_SMALL		-83
+#define H_OUTPUT_BUFFER_NOT_DEFINED		-84
+#define H_OUTPUT_BUFFER_TOO_SMALL		-85
+#define H_PARTITION_PAGE_TABLE_NOT_DEFINED	-86
+#define H_GUEST_VCPU_STATE_NOT_HV_OWNED		-87
+
 #define H_UNSUPPORTED_FLAG_START	-256
 #define H_UNSUPPORTED_FLAG_END		-511
 #define H_MULTI_THREADS_ACTIVE	-9005
@@ -381,6 +393,15 @@
 #define H_ENTER_NESTED		0xF804
 #define H_TLB_INVALIDATE	0xF808
 #define H_COPY_TOFROM_GUEST	0xF80C
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE		0x470
+#define H_GUEST_CREATE_VCPU	0x474
+#define H_GUEST_GET_STATE	0x478
+#define H_GUEST_SET_STATE	0x47C
+#define H_GUEST_RUN_VCPU	0x480
+#define H_GUEST_COPY_MEMORY	0x484
+#define H_GUEST_DELETE		0x488
 
 /* Flags for H_SVM_PAGE_IN */
 #define H_PAGE_IN_SHARED        0x1
@@ -467,6 +488,15 @@
 #define H_RPTI_PAGE_1G	0x08
 #define H_RPTI_PAGE_ALL (-1UL)
 
+/* Flags for H_GUEST_{S,G}ET_STATE */
+#define H_GUEST_FLAGS_WIDE     (1UL<<(63-0))
+
+/* Flag values used for H_GUEST_{S,G}ET_CAPABILITIES */
+#define H_GUEST_CAP_COPY_MEM	(1UL<<(63-0))
+#define H_GUEST_CAP_POWER9	(1UL<<(63-1))
+#define H_GUEST_CAP_POWER10	(1UL<<(63-2))
+#define H_GUEST_CAP_BITMAP2	(1UL<<(63-63))
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 0ca2d8b37b42..c5c57552b447 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_book3s_asm.h>
+#include <asm/guest-state-buffer.h>
 
 struct kvmppc_bat {
 	u64 raw;
@@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
 
 void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+
+extern bool __kvmhv_on_papr;
+
+static inline bool kvmhv_on_papr(void)
+{
+	return __kvmhv_on_papr;
+}
+
+#else
+
+static inline bool kvmhv_on_papr(void)
+{
+	return false;
+}
+
+#endif
+
+int __kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
+int __kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
+int __kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden);
+int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden);
+
+static inline int kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_reload_ptregs(vcpu, regs);
+	return 0;
+}
+static inline int kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_mark_dirty_ptregs(vcpu, regs);
+	return 0;
+}
+
+static inline int kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_mark_dirty(vcpu, iden);
+	return 0;
+}
+
+static inline int kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	if (kvmhv_on_papr())
+		return __kvmhv_papr_cached_reload(vcpu, iden);
+	return 0;
+}
+
 extern int kvm_irq_bypass;
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
@@ -335,70 +387,84 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
 	vcpu->arch.regs.gpr[num] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_GPR(num));
 }
 
 static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_GPR(num));
 	return vcpu->arch.regs.gpr[num];
 }
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.regs.ccr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_CR);
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_CR);
 	return vcpu->arch.regs.ccr;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.xer = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_XER);
 }
 
 static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_XER);
 	return vcpu->arch.regs.xer;
 }
 
 static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.ctr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_CTR);
 }
 
 static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_CTR);
 	return vcpu->arch.regs.ctr;
 }
 
 static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.link = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_LR);
 }
 
 static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_LR);
 	return vcpu->arch.regs.link;
 }
 
 static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.regs.nip = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
 }
 
 static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_NIA);
 	return vcpu->arch.regs.nip;
 }
 
 static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.pid = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_PIDR);
 }
 
 static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_PIDR);
 	return vcpu->arch.pid;
 }
 
@@ -415,111 +481,129 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 
 static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(i));
 	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
 }
 
 static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
 {
 	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(i));
 }
 
 static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_FPSCR);
 	return vcpu->arch.fp.fpscr;
 }
 
 static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
 {
 	vcpu->arch.fp.fpscr = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_FPSCR);
 }
 
 
 static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(i));
 	return vcpu->arch.fp.fpr[i][j];
 }
 
 static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 val)
 {
 	vcpu->arch.fp.fpr[i][j] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(i));
 }
 
 #ifdef CONFIG_VSX
 static inline vector128 kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSRS(32 + i));
 	return vcpu->arch.vr.vr[i];
 }
 
 static inline void kvmppc_set_vsx_vr(struct kvm_vcpu *vcpu, int i, vector128 val)
 {
 	vcpu->arch.vr.vr[i] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSRS(32 + i));
 }
 
 static inline u32 kvmppc_get_vscr(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_VSCR);
 	return vcpu->arch.vr.vscr.u[3];
 }
 
 static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
 {
 	vcpu->arch.vr.vscr.u[3] = val;
+	kvmhv_papr_mark_dirty(vcpu, GSID_VSCR);
 }
 #endif
 
-#define BOOK3S_WRAPPER_SET(reg, size)					\
+#define BOOK3S_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 									\
 	vcpu->arch.reg = val;						\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define BOOK3S_WRAPPER_GET(reg, size)					\
+#define BOOK3S_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.reg;						\
 }
 
-#define BOOK3S_WRAPPER(reg, size)					\
-	BOOK3S_WRAPPER_SET(reg, size)					\
-	BOOK3S_WRAPPER_GET(reg, size)					\
+#define BOOK3S_WRAPPER(reg, size, iden)					\
+	BOOK3S_WRAPPER_SET(reg, size, iden)				\
+	BOOK3S_WRAPPER_GET(reg, size, iden)				\
 
-BOOK3S_WRAPPER(tar, 64)
-BOOK3S_WRAPPER(ebbhr, 64)
-BOOK3S_WRAPPER(ebbrr, 64)
-BOOK3S_WRAPPER(bescr, 64)
-BOOK3S_WRAPPER(ic, 64)
-BOOK3S_WRAPPER(vrsave, 64)
+BOOK3S_WRAPPER(tar, 64, GSID_TAR)
+BOOK3S_WRAPPER(ebbhr, 64, GSID_EBBHR)
+BOOK3S_WRAPPER(ebbrr, 64, GSID_EBBRR)
+BOOK3S_WRAPPER(bescr, 64, GSID_BESCR)
+BOOK3S_WRAPPER(ic, 64, GSID_IC)
+BOOK3S_WRAPPER(vrsave, 64, GSID_VRSAVE)
 
 
-#define VCORE_WRAPPER_SET(reg, size)					\
+#define VCORE_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	vcpu->arch.vcore->reg = val;					\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define VCORE_WRAPPER_GET(reg, size)					\
+#define VCORE_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.vcore->reg;					\
 }
 
-#define VCORE_WRAPPER(reg, size)					\
-	VCORE_WRAPPER_SET(reg, size)					\
-	VCORE_WRAPPER_GET(reg, size)					\
+#define VCORE_WRAPPER(reg, size, iden)					\
+	VCORE_WRAPPER_SET(reg, size, iden)				\
+	VCORE_WRAPPER_GET(reg, size, iden)				\
 
 
-VCORE_WRAPPER(vtb, 64)
-VCORE_WRAPPER(tb_offset, 64)
-VCORE_WRAPPER(lpcr, 64)
+VCORE_WRAPPER(vtb, 64, GSID_VTB)
+VCORE_WRAPPER(tb_offset, 64, GSID_TB_OFFSET)
+VCORE_WRAPPER(lpcr, 64, GSID_LPCR)
 
 static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
 {
+	kvmhv_papr_cached_reload(vcpu, GSID_TB_OFFSET);
+	kvmhv_papr_cached_reload(vcpu, GSID_DEC_EXPIRY_TB);
 	return vcpu->arch.dec_expires;
 }
 
 static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
 {
 	vcpu->arch.dec_expires = val;
+	kvmhv_papr_cached_reload(vcpu, GSID_TB_OFFSET);
+	kvmhv_papr_mark_dirty(vcpu, GSID_DEC_EXPIRY_TB);
 }
 
 /* Expiry time of vcpu DEC relative to host TB */
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index d49065af08e9..689e14284127 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -677,6 +677,12 @@ static inline pte_t *find_kvm_host_pte(struct kvm *kvm, unsigned long mmu_seq,
 extern pte_t *find_kvm_nested_guest_pte(struct kvm *kvm, unsigned long lpid,
 					unsigned long ea, unsigned *hshift);
 
+int kvmhv_papr_vcpu_create(struct kvm_vcpu *vcpu, struct kvmhv_papr_host *nested_state);
+void kvmhv_papr_vcpu_free(struct kvm_vcpu *vcpu, struct kvmhv_papr_host *nested_state);
+int kvmhv_papr_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit);
+int kvmhv_papr_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
+int kvmhv_papr_parse_output(struct kvm_vcpu *vcpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..21e8bf9e530a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
 #include <asm/mce.h>
+#include <asm/guest-state-buffer.h>
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
@@ -509,6 +510,23 @@ union xive_tma_w01 {
 	__be64 w01;
 };
 
+/* Nested PAPR host H_GUEST_RUN_VCPU configuration */
+struct kvmhv_papr_config {
+	struct gs_buff_info vcpu_run_output_cfg;
+	struct gs_buff_info vcpu_run_input_cfg;
+	u64 vcpu_run_output_size;
+};
+
+/* Nested PAPR host state */
+struct kvmhv_papr_host {
+	struct kvmhv_papr_config cfg;
+	struct gs_buff *vcpu_run_output;
+	struct gs_buff *vcpu_run_input;
+	struct gs_msg *vcpu_message;
+	struct gs_msg *vcore_message;
+	struct gs_bitmap valids;
+};
+
 struct kvm_vcpu_arch {
 	ulong host_stack;
 	u32 host_pid;
@@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
 	ulong dscr;
 	ulong amr;
 	ulong uamor;
+	ulong amor;
 	ulong iamr;
 	u32 ctrl;
 	u32 dabrx;
@@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
 	u64 nested_hfscr;	/* HFSCR that the L1 requested for the nested guest */
 	u32 nested_vcpu_id;
 	gpa_t nested_io_gpr;
+	/* For nested APIv2 guests */
+	struct kvmhv_papr_host papr_host;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fbac353ac46b..4d43bb29ba7c 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -615,6 +615,35 @@ static inline bool kvmhv_on_pseries(void)
 {
 	return false;
 }
+
+#endif
+
+#ifndef CONFIG_PPC_BOOK3S
+
+static inline bool kvmhv_on_papr(void)
+{
+	return false;
+}
+
+static inline int kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	return 0;
+}
+static inline int kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	return 0;
+}
+
+static inline int kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	return 0;
+}
+
+static inline int kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	return 0;
+}
+
 #endif
 
 #ifdef CONFIG_KVM_XICS
@@ -957,31 +986,33 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
 }									\
 
-#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
+#define SHARED_CACHE_WRAPPER_GET(reg, size, iden)			\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
 	else								\
 	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
 }									\
 
-#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
+#define SHARED_CACHE_WRAPPER_SET(reg, size, iden)			\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
 	else								\
 	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }									\
 
 #define SHARED_WRAPPER(reg, size)					\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
-#define SHARED_CACHE_WRAPPER(reg, size)					\
-	SHARED_CACHE_WRAPPER_GET(reg, size)				\
-	SHARED_CACHE_WRAPPER_SET(reg, size)				\
+#define SHARED_CACHE_WRAPPER(reg, size, iden)				\
+	SHARED_CACHE_WRAPPER_GET(reg, size, iden)			\
+	SHARED_CACHE_WRAPPER_SET(reg, size, iden)			\
 
 #define SPRNG_WRAPPER(reg, bookehv_spr)					\
 	SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
@@ -1000,29 +1031,30 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 #define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SHARED_WRAPPER(reg, size)					\
 
-#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr)		\
-	SHARED_CACHE_WRAPPER(reg, size)					\
+#define SHARED_SPRNG_CACHE_WRAPPER(reg, size, bookehv_spr, iden)	\
+	SHARED_CACHE_WRAPPER(reg, size, iden)				\
 
 #endif
 
 SHARED_WRAPPER(critical, 64)
-SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR)
+SHARED_SPRNG_CACHE_WRAPPER(sprg0, 64, SPRN_GSPRG0, GSID_SPRG0)
+SHARED_SPRNG_CACHE_WRAPPER(sprg1, 64, SPRN_GSPRG1, GSID_SPRG1)
+SHARED_SPRNG_CACHE_WRAPPER(sprg2, 64, SPRN_GSPRG2, GSID_SPRG2)
+SHARED_SPRNG_CACHE_WRAPPER(sprg3, 64, SPRN_GSPRG3, GSID_SPRG3)
+SHARED_SPRNG_CACHE_WRAPPER(srr0, 64, SPRN_GSRR0, GSID_SRR0)
+SHARED_SPRNG_CACHE_WRAPPER(srr1, 64, SPRN_GSRR1, GSID_SRR1)
+SHARED_SPRNG_CACHE_WRAPPER(dar, 64, SPRN_GDEAR, GSID_DAR)
 SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
-SHARED_CACHE_WRAPPER_GET(msr, 64)
+SHARED_CACHE_WRAPPER_GET(msr, 64, GSID_MSR)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
 	if (kvmppc_shared_big_endian(vcpu))
 	       vcpu->arch.shared->msr = cpu_to_be64(val);
 	else
 	       vcpu->arch.shared->msr = cpu_to_le64(val);
+	kvmhv_papr_mark_dirty(vcpu, GSID_MSR);
 }
-SHARED_CACHE_WRAPPER(dsisr, 32)
+SHARED_CACHE_WRAPPER(dsisr, 32, GSID_DSISR)
 SHARED_WRAPPER(int_pending, 32)
 SHARED_WRAPPER(sprg4, 64)
 SHARED_WRAPPER(sprg5, 64)
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 8239c0af5eb2..b48f90884522 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -6,6 +6,7 @@
 
 #include <linux/string.h>
 #include <linux/irqflags.h>
+#include <linux/delay.h>
 
 #include <asm/hvcall.h>
 #include <asm/paca.h>
@@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
 	return rc;
 }
 
+static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	unsigned long token;
+	long rc;
+
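+	/*
+	 * H_GUEST_CREATE returns a continue token in retbuf[0] when it
+	 * reports busy; the token is passed back in on the retry (the
+	 * initial call passes -1).
+	 */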
+	token = -1UL;
+	while (true) {
+		rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
+		if (rc == H_SUCCESS) {
+			*guest_id = retbuf[0];
+			break;
+		}
+
+		if (rc == H_BUSY) {
+			token = retbuf[0];
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			token = retbuf[0];
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_create_vcpu(unsigned long flags,
+					   unsigned long guest_id,
+					   unsigned long vcpu_id)
+{
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_GUEST_CREATE_VCPU, 0, guest_id, vcpu_id);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_set_state(unsigned long flags,
+					 unsigned long guest_id,
+					 unsigned long vcpu_id,
+					 unsigned long data_buffer,
+					 unsigned long data_size,
+					 unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall(H_GUEST_SET_STATE, retbuf, flags, guest_id,
+				 vcpu_id, data_buffer, data_size);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		if (rc == H_INVALID_ELEMENT_ID)
+			*failed_index = retbuf[0];
+		else if (rc == H_INVALID_ELEMENT_SIZE)
+			*failed_index = retbuf[0];
+		else if (rc == H_INVALID_ELEMENT_VALUE)
+			*failed_index = retbuf[0];
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_get_state(unsigned long flags,
+					 unsigned long guest_id,
+					 unsigned long vcpu_id,
+					 unsigned long data_buffer,
+					 unsigned long data_size,
+					 unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall(H_GUEST_GET_STATE, retbuf, flags, guest_id,
+				 vcpu_id, data_buffer, data_size);
+
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		if (rc == H_INVALID_ELEMENT_ID)
+			*failed_index = retbuf[0];
+		else if (rc == H_INVALID_ELEMENT_SIZE)
+			*failed_index = retbuf[0];
+		else if (rc == H_INVALID_ELEMENT_VALUE)
+			*failed_index = retbuf[0];
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_run_vcpu(unsigned long flags, unsigned long guest_id,
+					unsigned long vcpu_id, int *trap,
+					unsigned long *failed_index)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_RUN_VCPU, retbuf, flags, guest_id, vcpu_id);
+	if (rc == H_SUCCESS)
+		*trap = retbuf[0];
+	else if (rc == H_INVALID_ELEMENT_ID)
+		*failed_index = retbuf[0];
+	else if (rc == H_INVALID_ELEMENT_SIZE)
+		*failed_index = retbuf[0];
+	else if (rc == H_INVALID_ELEMENT_VALUE)
+		*failed_index = retbuf[0];
+
+	return rc;
+}
+
+static inline long plpar_guest_delete(unsigned long flags, u64 guest_id)
+{
+	long rc;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_GUEST_DELETE, flags, guest_id);
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		}
+
+		if (H_IS_LONG_BUSY(rc)) {
+			mdelay(get_longbusy_msecs(rc));
+			continue;
+		}
+
+		break;
+	}
+
+	return rc;
+}
+
+static inline long plpar_guest_set_capabilities(unsigned long flags,
+						unsigned long capabilities)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_SET_CAPABILITIES, retbuf, flags, capabilities);
+
+	return rc;
+}
+
+static inline long plpar_guest_get_capabilities(unsigned long flags,
+						unsigned long *capabilities)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+
+	rc = plpar_hcall(H_GUEST_GET_CAPABILITIES, retbuf, flags);
+	if (rc == H_SUCCESS)
+		*capabilities = retbuf[0];
+
+	return rc;
+}
+
 /*
  * Wrapper to H_RPT_INVALIDATE hcall that handles return values appropriately
  *
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index eb8445e71c14..9bb0876521ee 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -87,6 +87,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
 	book3s_hv_p9_perf.o \
+	book3s_hv_papr.o \
 	guest-state-buffer.o \
 	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 521d84621422..f22ee582e209 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+}
+
 /* Dummy value used in computing PCR value below */
 #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
 
@@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 			return RESUME_HOST;
 		break;
 #endif
-	case H_RANDOM:
+	case H_RANDOM: {
 		unsigned long rand;
 
 		if (!arch_get_random_seed_longs(&rand, 1))
 			ret = H_HARDWARE;
 		kvmppc_set_gpr(vcpu, 4, rand);
 		break;
+	}
 	case H_RPT_INVALIDATE:
 		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
 					      kvmppc_get_gpr(vcpu, 5),
@@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.shared_big_endian = false;
 #endif
 #endif
-	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
 
+	if (kvmhv_on_papr()) {
+		err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
+		if (err < 0)
+			return err;
+	}
+
+	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
 	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
 		kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
 		kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
 	}
 
 	kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
+	kvmppc_set_amor_hv(vcpu, ~0);
 	/* default to host PVR, since we can't spoof it */
 	kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
 	spin_lock_init(&vcpu->arch.vpa_update_lock);
@@ -3006,6 +3019,8 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 			kvm->arch.vcores[core] = vcore;
 			kvm->arch.online_vcores++;
 			mutex_unlock(&kvm->arch.mmu_setup_lock);
+			if (kvmhv_on_papr())
+				kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
 		}
 	}
 	mutex_unlock(&kvm->lock);
@@ -3078,6 +3093,8 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 	unpin_vpa(vcpu->kvm, &vcpu->arch.slb_shadow);
 	unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
+	if (kvmhv_on_papr())
+		kvmhv_papr_vcpu_free(vcpu, &vcpu->arch.papr_host);
 }
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
@@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
 	}
 }
 
+static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
+{
+	struct kvmhv_papr_host *ph;
+	unsigned long msr, i;
+	int trap;
+	long rc;
+
+	ph = &vcpu->arch.papr_host;
+
+	msr = mfmsr();
+	kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
+	if (lazy_irq_pending())
+		return 0;
+
+	kvmhv_papr_flush_vcpu(vcpu, time_limit);
+
+	accumulate_time(vcpu, &vcpu->arch.in_guest);
+	rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
+				  &trap, &i);
+
+	if (rc != H_SUCCESS) {
+		pr_err("KVM: Guest Run VCPU hcall failed, rc=%ld\n", rc);
+		if (rc == H_INVALID_ELEMENT_ID)
+			pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
+		else if (rc == H_INVALID_ELEMENT_SIZE)
+			pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
+		else if (rc == H_INVALID_ELEMENT_VALUE)
+			pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
+		return 0;
+	}
+	accumulate_time(vcpu, &vcpu->arch.guest_exit);
+
+	*tb = mftb();
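+	/*
+	 * The L0 may have changed any guest state while the L2 ran;
+	 * invalidate the cached values so they are re-fetched from the
+	 * L0 on next access.
+	 */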
+	gsm_reset(ph->vcpu_message);
+	gsm_reset(ph->vcore_message);
+	gsbm_zero(&ph->valids);
+
+	kvmhv_papr_parse_output(vcpu);
+
+	timer_rearm_host_dec(*tb);
+
+	return trap;
+}
+
 /* call our hypervisor to load up HV regs and go */
 static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
 {
@@ -4159,7 +4220,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	vcpu_vpa_increment_dispatch(vcpu);
 
 	if (kvmhv_on_pseries()) {
-		trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
+		if (!kvmhv_on_papr())
+			trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
+		else
+			trap = kvmhv_vcpu_entry_papr(vcpu, time_limit, lpcr, tb);
 
 		/* H_CEDE has to be handled now, not later */
 		if (trap = BOOK3S_INTERRUPT_SYSCALL && !nested &&
@@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
  */
 void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 {
+	struct kvm_vcpu *vcpu;
 	long int i;
 	u32 cores_done = 0;
 
@@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 		if (++cores_done >= kvm->arch.online_vcores)
 			break;
 	}
+
+	if (kvmhv_on_papr()) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
+		}
+	}
 }
 
 void kvmppc_setup_partition_table(struct kvm *kvm)
@@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	/* Allocate the guest's logical partition ID */
 
-	lpid = kvmppc_alloc_lpid();
-	if ((long)lpid < 0)
-		return -ENOMEM;
-	kvm->arch.lpid = lpid;
+	if (!kvmhv_on_papr()) {
+		lpid = kvmppc_alloc_lpid();
+		if ((long)lpid < 0)
+			return -ENOMEM;
+		kvm->arch.lpid = lpid;
+	}
 
 	kvmppc_alloc_host_rm_ops();
 
 	kvmhv_vm_nested_init(kvm);
 
+	if (kvmhv_on_papr()) {
+		long rc;
+		unsigned long guest_id;
+
+		rc = plpar_guest_create(0, &guest_id);
+
+		if (rc != H_SUCCESS)
+			pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
+
+		switch (rc) {
+		case H_PARAMETER:
+		case H_FUNCTION:
+		case H_STATE:
+			return -EINVAL;
+		case H_NOT_ENOUGH_RESOURCES:
+		case H_ABORTED:
+			return -ENOMEM;
+		case H_AUTHORITY:
+			return -EPERM;
+		case H_NOT_AVAILABLE:
+			return -EBUSY;
+		}
+		kvm->arch.lpid = guest_id;
+	}
+
 	/*
 	 * Since we don't flush the TLB when tearing down a VM,
 	 * and this lpid might have previously been used,
@@ -5483,7 +5582,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 			lpcr |= LPCR_HAIL;
 		ret = kvmppc_init_vm_radix(kvm);
 		if (ret) {
-			kvmppc_free_lpid(kvm->arch.lpid);
+			if (kvmhv_on_papr())
+				plpar_guest_delete(0, kvm->arch.lpid);
+			else
+				kvmppc_free_lpid(kvm->arch.lpid);
 			return ret;
 		}
 		kvmppc_setup_partition_table(kvm);
@@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 		kvm->arch.process_table = 0;
 		if (kvm->arch.secure_guest)
 			uv_svm_terminate(kvm->arch.lpid);
-		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
+		if (!kvmhv_on_papr())
+			kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
 	}
 
-	kvmppc_free_lpid(kvm->arch.lpid);
+	if (kvmhv_on_papr())
+		plpar_guest_delete(0, kvm->arch.lpid);
+	else
+		kvmppc_free_lpid(kvm->arch.lpid);
 
 	kvmppc_free_pimap(kvm);
 }
diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 7a7005189ab1..61d2c2b8d084 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -3,6 +3,8 @@
 /*
  * Privileged (non-hypervisor) host registers to save.
  */
+#include "asm/guest-state-buffer.h"
+
 struct p9_host_os_sprs {
 	unsigned long iamr;
 	unsigned long amr;
@@ -51,61 +53,65 @@ void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
 #define end_timing(vcpu) do {} while (0)
 #endif
 
-#define HV_WRAPPER_SET(reg, size)					\
+#define HV_WRAPPER_SET(reg, size, iden)					\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
 {									\
 	vcpu->arch.reg = val;						\
+	kvmhv_papr_mark_dirty(vcpu, iden);				\
 }
 
-#define HV_WRAPPER_GET(reg, size)					\
+#define HV_WRAPPER_GET(reg, size, iden)					\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden);				\
 	return vcpu->arch.reg;						\
 }
 
-#define HV_WRAPPER(reg, size)						\
-	HV_WRAPPER_SET(reg, size)					\
-	HV_WRAPPER_GET(reg, size)					\
+#define HV_WRAPPER(reg, size, iden)					\
+	HV_WRAPPER_SET(reg, size, iden)					\
+	HV_WRAPPER_GET(reg, size, iden)					\
 
-#define HV_ARRAY_WRAPPER_SET(reg, size)					\
+#define HV_ARRAY_WRAPPER_SET(reg, size, iden)				\
 static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, int i, u##size val)	\
 {									\
 	vcpu->arch.reg[i] = val;					\
+	kvmhv_papr_mark_dirty(vcpu, iden(i));				\
 }
 
-#define HV_ARRAY_WRAPPER_GET(reg, size)					\
+#define HV_ARRAY_WRAPPER_GET(reg, size, iden)				\
 static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu, int i)	\
 {									\
+	kvmhv_papr_cached_reload(vcpu, iden(i));			\
 	return vcpu->arch.reg[i];					\
 }
 
-#define HV_ARRAY_WRAPPER(reg, size)					\
-	HV_ARRAY_WRAPPER_SET(reg, size)					\
-	HV_ARRAY_WRAPPER_GET(reg, size)					\
+#define HV_ARRAY_WRAPPER(reg, size, iden)				\
+	HV_ARRAY_WRAPPER_SET(reg, size, iden)				\
+	HV_ARRAY_WRAPPER_GET(reg, size, iden)				\
 
-HV_WRAPPER(mmcra, 64)
-HV_WRAPPER(hfscr, 64)
-HV_WRAPPER(fscr, 64)
-HV_WRAPPER(dscr, 64)
-HV_WRAPPER(purr, 64)
-HV_WRAPPER(spurr, 64)
-HV_WRAPPER(amr, 64)
-HV_WRAPPER(uamor, 64)
-HV_WRAPPER(siar, 64)
-HV_WRAPPER(sdar, 64)
-HV_WRAPPER(iamr, 64)
-HV_WRAPPER(dawr0, 64)
-HV_WRAPPER(dawr1, 64)
-HV_WRAPPER(dawrx0, 64)
-HV_WRAPPER(dawrx1, 64)
-HV_WRAPPER(ciabr, 64)
-HV_WRAPPER(wort, 64)
-HV_WRAPPER(ppr, 64)
-HV_WRAPPER(ctrl, 64)
+HV_WRAPPER(mmcra, 64, GSID_MMCRA)
+HV_WRAPPER(hfscr, 64, GSID_HFSCR)
+HV_WRAPPER(fscr, 64, GSID_FSCR)
+HV_WRAPPER(dscr, 64, GSID_DSCR)
+HV_WRAPPER(purr, 64, GSID_PURR)
+HV_WRAPPER(spurr, 64, GSID_SPURR)
+HV_WRAPPER(amr, 64, GSID_AMR)
+HV_WRAPPER(uamor, 64, GSID_UAMOR)
+HV_WRAPPER(siar, 64, GSID_SIAR)
+HV_WRAPPER(sdar, 64, GSID_SDAR)
+HV_WRAPPER(iamr, 64, GSID_IAMR)
+HV_WRAPPER(dawr0, 64, GSID_DAWR0)
+HV_WRAPPER(dawr1, 64, GSID_DAWR1)
+HV_WRAPPER(dawrx0, 64, GSID_DAWRX0)
+HV_WRAPPER(dawrx1, 64, GSID_DAWRX1)
+HV_WRAPPER(ciabr, 64, GSID_CIABR)
+HV_WRAPPER(wort, 64, GSID_WORT)
+HV_WRAPPER(ppr, 64, GSID_PPR)
+HV_WRAPPER(ctrl, 64, GSID_CTRL)
+HV_WRAPPER(amor, 64, GSID_AMOR)
 
-HV_ARRAY_WRAPPER(mmcr, 64)
-HV_ARRAY_WRAPPER(sier, 64)
-HV_ARRAY_WRAPPER(pmc, 32)
+HV_ARRAY_WRAPPER(mmcr, 64, GSID_MMCR)
+HV_ARRAY_WRAPPER(sier, 64, GSID_SIER)
+HV_ARRAY_WRAPPER(pmc, 32, GSID_PMC)
 
-HV_WRAPPER(pvr, 32)
-HV_WRAPPER(pspb, 32)
+HV_WRAPPER(pspb, 32, GSID_PSPB)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 377d0b4a05ee..62e011d1e912 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -428,10 +428,12 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	return vcpu->arch.trap;
 }
 
+static unsigned long nested_capabilities;
+
 long kvmhv_nested_init(void)
 {
 	long int ptb_order;
-	unsigned long ptcr;
+	unsigned long ptcr, host_capabilities;
 	long rc;
 
 	if (!kvmhv_on_pseries())
@@ -439,6 +441,27 @@ long kvmhv_nested_init(void)
 	if (!radix_enabled())
 		return -ENODEV;
 
+	rc = plpar_guest_get_capabilities(0, &host_capabilities);
+	if (rc == H_SUCCESS) {
+		unsigned long capabilities = 0;
+
+		if (cpu_has_feature(CPU_FTR_ARCH_31))
+			capabilities |= H_GUEST_CAP_POWER10;
+		if (cpu_has_feature(CPU_FTR_ARCH_300))
+			capabilities |= H_GUEST_CAP_POWER9;
+
+		nested_capabilities = capabilities & host_capabilities;
+		rc = plpar_guest_set_capabilities(0, nested_capabilities);
+		if (rc != H_SUCCESS) {
+			pr_err("kvm-hv: Could not configure parent hypervisor capabilities (rc=%ld)",
+			       rc);
+			return -ENODEV;
+		}
+
+		__kvmhv_on_papr = true;
+		return 0;
+	}
+
 	/* Partition table entry is 1<<4 bytes in size, hence the 4. */
 	ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4;
 	/* Minimum partition table size is 1<<12 bytes */
@@ -507,10 +530,15 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1)
 		return;
 	}
 
-	pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
-	pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
-	/* L0 will do the necessary barriers */
-	kvmhv_flush_lpid(lpid);
+	if (!kvmhv_on_papr()) {
+		pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+		pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+		/* L0 will do the necessary barriers */
+		kvmhv_flush_lpid(lpid);
+	}
+
+	if (kvmhv_on_papr())
+		kvmhv_papr_set_ptbl_entry(lpid, dw0, dw1);
 }
 
 static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp)
diff --git a/arch/powerpc/kvm/book3s_hv_papr.c b/arch/powerpc/kvm/book3s_hv_papr.c
new file mode 100644
index 000000000000..05d8e735e2a9
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_papr.c
@@ -0,0 +1,940 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2023 Jordan Niethe, IBM Corp. <jniethe5@gmail.com>
+ *
+ * Authors:
+ *    Jordan Niethe <jniethe5@gmail.com>
+ *
+ * Description: KVM functions specific to running on Book 3S
+ * processors as a PAPR guest.
+ *
+ */
+
+#include <linux/blk-mq.h>
+#include <linux/console.h>
+#include <linux/gfp_types.h>
+#include <linux/signal.h>
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/pgtable.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/hvcall.h>
+#include <asm/pgalloc.h>
+#include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
+#include <asm/guest-state-buffer.h>
+#include "trace_hv.h"
+
+bool __kvmhv_on_papr __read_mostly;
+EXPORT_SYMBOL_GPL(__kvmhv_on_papr);
+
+static size_t gs_msg_ops_kvmhv_papr_config_get_size(struct gs_msg *gsm)
+{
+	u16 ids[] = {
+		GSID_RUN_OUTPUT_MIN_SIZE,
+		GSID_RUN_INPUT,
+		GSID_RUN_OUTPUT,
+	};
+	size_t size = 0;
+
+	for (int i = 0; i < ARRAY_SIZE(ids); i++)
+		size += gse_total_size(gsid_size(ids[i]));
+	return size;
+}
+
+static int gs_msg_ops_kvmhv_papr_config_fill_info(struct gs_buff *gsb,
+						  struct gs_msg *gsm)
+{
+	struct kvmhv_papr_config *cfg;
+	int rc;
+
+	cfg = gsm->data;
+
+	if (gsm_includes(gsm, GSID_RUN_OUTPUT_MIN_SIZE)) {
+		rc = gse_put(gsb, GSID_RUN_OUTPUT_MIN_SIZE,
+			     cfg->vcpu_run_output_size);
+		if (rc < 0)
+			return rc;
+	}
+
+	if (gsm_includes(gsm, GSID_RUN_INPUT)) {
+		rc = gse_put(gsb, GSID_RUN_INPUT, cfg->vcpu_run_input_cfg);
+		if (rc < 0)
+			return rc;
+	}
+
+	if (gsm_includes(gsm, GSID_RUN_OUTPUT)) {
+		rc = gse_put(gsb, GSID_RUN_OUTPUT, cfg->vcpu_run_output_cfg);
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int gs_msg_ops_kvmhv_papr_config_refresh_info(struct gs_msg *gsm,
+						     struct gs_buff *gsb)
+{
+	struct kvmhv_papr_config *cfg;
+	struct gs_parser gsp = { 0 };
+	struct gs_elem *gse;
+	int rc;
+
+	cfg = gsm->data;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	gse = gsp_lookup(&gsp, GSID_RUN_OUTPUT_MIN_SIZE);
+	if (gse)
+		gse_get(gse, &cfg->vcpu_run_output_size);
+	return 0;
+}
+
+static struct gs_msg_ops config_msg_ops = {
+	.get_size = gs_msg_ops_kvmhv_papr_config_get_size,
+	.fill_info = gs_msg_ops_kvmhv_papr_config_fill_info,
+	.refresh_info = gs_msg_ops_kvmhv_papr_config_refresh_info,
+};
+
+static size_t gs_msg_ops_vcpu_get_size(struct gs_msg *gsm)
+{
+	struct gs_bitmap gsbm = { 0 };
+	size_t size = 0;
+	u16 iden;
+
+	gsbm_fill(&gsbm);
+	gsbm_for_each(&gsbm, iden) {
+		switch (iden) {
+		case GSID_HOST_STATE_SIZE:
+		case GSID_RUN_OUTPUT_MIN_SIZE:
+		case GSID_PARTITION_TABLE:
+		case GSID_PROCESS_TABLE:
+		case GSID_RUN_INPUT:
+		case GSID_RUN_OUTPUT:
+			break;
+		default:
+			size += gse_total_size(gsid_size(iden));
+		}
+	}
+	return size;
+}
+
+static int gs_msg_ops_vcpu_fill_info(struct gs_buff *gsb, struct gs_msg *gsm)
+{
+	struct kvm_vcpu *vcpu;
+	vector128 v;
+	int rc, i;
+	u16 iden;
+
+	vcpu = gsm->data;
+
+	gsm_for_each(gsm, iden) {
+		rc = 0;
+
+		if ((gsm->flags & GS_FLAGS_WIDE) !=
+		    (gsid_flags(iden) & GS_FLAGS_WIDE))
+			continue;
+
+		switch (iden) {
+		case GSID_DSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.dscr);
+			break;
+		case GSID_MMCRA:
+			rc = gse_put(gsb, iden, vcpu->arch.mmcra);
+			break;
+		case GSID_HFSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.hfscr);
+			break;
+		case GSID_PURR:
+			rc = gse_put(gsb, iden, vcpu->arch.purr);
+			break;
+		case GSID_SPURR:
+			rc = gse_put(gsb, iden, vcpu->arch.spurr);
+			break;
+		case GSID_AMR:
+			rc = gse_put(gsb, iden, vcpu->arch.amr);
+			break;
+		case GSID_UAMOR:
+			rc = gse_put(gsb, iden, vcpu->arch.uamor);
+			break;
+		case GSID_SIAR:
+			rc = gse_put(gsb, iden, vcpu->arch.siar);
+			break;
+		case GSID_SDAR:
+			rc = gse_put(gsb, iden, vcpu->arch.sdar);
+			break;
+		case GSID_IAMR:
+			rc = gse_put(gsb, iden, vcpu->arch.iamr);
+			break;
+		case GSID_DAWR0:
+			rc = gse_put(gsb, iden, vcpu->arch.dawr0);
+			break;
+		case GSID_DAWR1:
+			rc = gse_put(gsb, iden, vcpu->arch.dawr1);
+			break;
+		case GSID_DAWRX0:
+			rc = gse_put(gsb, iden, vcpu->arch.dawrx0);
+			break;
+		case GSID_DAWRX1:
+			rc = gse_put(gsb, iden, vcpu->arch.dawrx1);
+			break;
+		case GSID_CIABR:
+			rc = gse_put(gsb, iden, vcpu->arch.ciabr);
+			break;
+		case GSID_WORT:
+			rc = gse_put(gsb, iden, vcpu->arch.wort);
+			break;
+		case GSID_PPR:
+			rc = gse_put(gsb, iden, vcpu->arch.ppr);
+			break;
+		case GSID_PSPB:
+			rc = gse_put(gsb, iden, vcpu->arch.pspb);
+			break;
+		case GSID_TAR:
+			rc = gse_put(gsb, iden, vcpu->arch.tar);
+			break;
+		case GSID_FSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.fscr);
+			break;
+		case GSID_EBBHR:
+			rc = gse_put(gsb, iden, vcpu->arch.ebbhr);
+			break;
+		case GSID_EBBRR:
+			rc = gse_put(gsb, iden, vcpu->arch.ebbrr);
+			break;
+		case GSID_BESCR:
+			rc = gse_put(gsb, iden, vcpu->arch.bescr);
+			break;
+		case GSID_IC:
+			rc = gse_put(gsb, iden, vcpu->arch.ic);
+			break;
+		case GSID_CTRL:
+			rc = gse_put(gsb, iden, vcpu->arch.ctrl);
+			break;
+		case GSID_PIDR:
+			rc = gse_put(gsb, iden, vcpu->arch.pid);
+			break;
+		case GSID_AMOR:
+			rc = gse_put(gsb, iden, vcpu->arch.amor);
+			break;
+		case GSID_VRSAVE:
+			rc = gse_put(gsb, iden, vcpu->arch.vrsave);
+			break;
+		case GSID_MMCR(0) ... GSID_MMCR(3):
+			i = iden - GSID_MMCR(0);
+			rc = gse_put(gsb, iden, vcpu->arch.mmcr[i]);
+			break;
+		case GSID_SIER(0) ... GSID_SIER(2):
+			i = iden - GSID_SIER(0);
+			rc = gse_put(gsb, iden, vcpu->arch.sier[i]);
+			break;
+		case GSID_PMC(0) ... GSID_PMC(5):
+			i = iden - GSID_PMC(0);
+			rc = gse_put(gsb, iden, vcpu->arch.pmc[i]);
+			break;
+		case GSID_GPR(0) ... GSID_GPR(31):
+			i = iden - GSID_GPR(0);
+			rc = gse_put(gsb, iden, vcpu->arch.regs.gpr[i]);
+			break;
+		case GSID_CR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.ccr);
+			break;
+		case GSID_XER:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.xer);
+			break;
+		case GSID_CTR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.ctr);
+			break;
+		case GSID_LR:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.link);
+			break;
+		case GSID_NIA:
+			rc = gse_put(gsb, iden, vcpu->arch.regs.nip);
+			break;
+		case GSID_SRR0:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.srr0);
+			break;
+		case GSID_SRR1:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.srr1);
+			break;
+		case GSID_SPRG0:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg0);
+			break;
+		case GSID_SPRG1:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg1);
+			break;
+		case GSID_SPRG2:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg2);
+			break;
+		case GSID_SPRG3:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.sprg3);
+			break;
+		case GSID_DAR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.dar);
+			break;
+		case GSID_DSISR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.dsisr);
+			break;
+		case GSID_MSR:
+			rc = gse_put(gsb, iden, vcpu->arch.shregs.msr);
+			break;
+		case GSID_VTB:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->vtb);
+			break;
+		case GSID_LPCR:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->lpcr);
+			break;
+		case GSID_TB_OFFSET:
+			rc = gse_put(gsb, iden, vcpu->arch.vcore->tb_offset);
+			break;
+		case GSID_FPSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.fp.fpscr);
+			break;
+		case GSID_VSRS(0) ... GSID_VSRS(31):
+			i = iden - GSID_VSRS(0);
+			memcpy(&v, &vcpu->arch.fp.fpr[i],
+			       sizeof(vcpu->arch.fp.fpr[i]));
+			rc = gse_put(gsb, iden, v);
+			break;
+#ifdef CONFIG_VSX
+		case GSID_VSCR:
+			rc = gse_put(gsb, iden, vcpu->arch.vr.vscr.u[3]);
+			break;
+		case GSID_VSRS(32) ... GSID_VSRS(63):
+			i = iden - GSID_VSRS(32);
+			rc = gse_put(gsb, iden, vcpu->arch.vr.vr[i]);
+			break;
+#endif
+		case GSID_DEC_EXPIRY_TB: {
+			u64 dw;
+
+			dw = vcpu->arch.dec_expires -
+			     vcpu->arch.vcore->tb_offset;
+			rc = gse_put(gsb, iden, dw);
+			break;
+		}
+		}
+
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int gs_msg_ops_vcpu_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
+{
+	struct gs_parser gsp = { 0 };
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct kvm_vcpu *vcpu;
+	struct gs_elem *gse;
+	vector128 v;
+	int rc, i;
+	u16 iden;
+
+	vcpu = gsm->data;
+
+	rc = gse_parse(&gsp, gsb);
+	if (rc < 0)
+		return rc;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+
+	gsp_for_each(&gsp, iden, gse) {
+		switch (iden) {
+		case GSID_DSCR:
+			gse_get(gse, &vcpu->arch.dscr);
+			break;
+		case GSID_MMCRA:
+			gse_get(gse, &vcpu->arch.mmcra);
+			break;
+		case GSID_HFSCR:
+			gse_get(gse, &vcpu->arch.hfscr);
+			break;
+		case GSID_PURR:
+			gse_get(gse, &vcpu->arch.purr);
+			break;
+		case GSID_SPURR:
+			gse_get(gse, &vcpu->arch.spurr);
+			break;
+		case GSID_AMR:
+			gse_get(gse, &vcpu->arch.amr);
+			break;
+		case GSID_UAMOR:
+			gse_get(gse, &vcpu->arch.uamor);
+			break;
+		case GSID_SIAR:
+			gse_get(gse, &vcpu->arch.siar);
+			break;
+		case GSID_SDAR:
+			gse_get(gse, &vcpu->arch.sdar);
+			break;
+		case GSID_IAMR:
+			gse_get(gse, &vcpu->arch.iamr);
+			break;
+		case GSID_DAWR0:
+			gse_get(gse, &vcpu->arch.dawr0);
+			break;
+		case GSID_DAWR1:
+			gse_get(gse, &vcpu->arch.dawr1);
+			break;
+		case GSID_DAWRX0:
+			gse_get(gse, &vcpu->arch.dawrx0);
+			break;
+		case GSID_DAWRX1:
+			gse_get(gse, &vcpu->arch.dawrx1);
+			break;
+		case GSID_CIABR:
+			gse_get(gse, &vcpu->arch.ciabr);
+			break;
+		case GSID_WORT:
+			gse_get(gse, &vcpu->arch.wort);
+			break;
+		case GSID_PPR:
+			gse_get(gse, &vcpu->arch.ppr);
+			break;
+		case GSID_PSPB:
+			gse_get(gse, &vcpu->arch.pspb);
+			break;
+		case GSID_TAR:
+			gse_get(gse, &vcpu->arch.tar);
+			break;
+		case GSID_FSCR:
+			gse_get(gse, &vcpu->arch.fscr);
+			break;
+		case GSID_EBBHR:
+			gse_get(gse, &vcpu->arch.ebbhr);
+			break;
+		case GSID_EBBRR:
+			gse_get(gse, &vcpu->arch.ebbrr);
+			break;
+		case GSID_BESCR:
+			gse_get(gse, &vcpu->arch.bescr);
+			break;
+		case GSID_IC:
+			gse_get(gse, &vcpu->arch.ic);
+			break;
+		case GSID_CTRL:
+			gse_get(gse, &vcpu->arch.ctrl);
+			break;
+		case GSID_PIDR:
+			gse_get(gse, &vcpu->arch.pid);
+			break;
+		case GSID_AMOR:
+			gse_get(gse, &vcpu->arch.amor);
+			break;
+		case GSID_VRSAVE:
+			gse_get(gse, &vcpu->arch.vrsave);
+			break;
+		case GSID_MMCR(0) ... GSID_MMCR(3):
+			i = iden - GSID_MMCR(0);
+			gse_get(gse, &vcpu->arch.mmcr[i]);
+			break;
+		case GSID_SIER(0) ... GSID_SIER(2):
+			i = iden - GSID_SIER(0);
+			gse_get(gse, &vcpu->arch.sier[i]);
+			break;
+		case GSID_PMC(0) ... GSID_PMC(5):
+			i = iden - GSID_PMC(0);
+			gse_get(gse, &vcpu->arch.pmc[i]);
+			break;
+		case GSID_GPR(0) ... GSID_GPR(31):
+			i = iden - GSID_GPR(0);
+			gse_get(gse, &vcpu->arch.regs.gpr[i]);
+			break;
+		case GSID_CR:
+			gse_get(gse, &vcpu->arch.regs.ccr);
+			break;
+		case GSID_XER:
+			gse_get(gse, &vcpu->arch.regs.xer);
+			break;
+		case GSID_CTR:
+			gse_get(gse, &vcpu->arch.regs.ctr);
+			break;
+		case GSID_LR:
+			gse_get(gse, &vcpu->arch.regs.link);
+			break;
+		case GSID_NIA:
+			gse_get(gse, &vcpu->arch.regs.nip);
+			break;
+		case GSID_SRR0:
+			gse_get(gse, &vcpu->arch.shregs.srr0);
+			break;
+		case GSID_SRR1:
+			gse_get(gse, &vcpu->arch.shregs.srr1);
+			break;
+		case GSID_SPRG0:
+			gse_get(gse, &vcpu->arch.shregs.sprg0);
+			break;
+		case GSID_SPRG1:
+			gse_get(gse, &vcpu->arch.shregs.sprg1);
+			break;
+		case GSID_SPRG2:
+			gse_get(gse, &vcpu->arch.shregs.sprg2);
+			break;
+		case GSID_SPRG3:
+			gse_get(gse, &vcpu->arch.shregs.sprg3);
+			break;
+		case GSID_DAR:
+			gse_get(gse, &vcpu->arch.shregs.dar);
+			break;
+		case GSID_DSISR:
+			gse_get(gse, &vcpu->arch.shregs.dsisr);
+			break;
+		case GSID_MSR:
+			gse_get(gse, &vcpu->arch.shregs.msr);
+			break;
+		case GSID_VTB:
+			gse_get(gse, &vcpu->arch.vcore->vtb);
+			break;
+		case GSID_LPCR:
+			gse_get(gse, &vcpu->arch.vcore->lpcr);
+			break;
+		case GSID_TB_OFFSET:
+			gse_get(gse, &vcpu->arch.vcore->tb_offset);
+			break;
+		case GSID_FPSCR:
+			gse_get(gse, &vcpu->arch.fp.fpscr);
+			break;
+		case GSID_VSRS(0) ... GSID_VSRS(31):
+			gse_get(gse, &v);
+			i = iden - GSID_VSRS(0);
+			memcpy(&vcpu->arch.fp.fpr[i], &v,
+			       sizeof(vcpu->arch.fp.fpr[i]));
+			break;
+#ifdef CONFIG_VSX
+		case GSID_VSCR:
+			gse_get(gse, &vcpu->arch.vr.vscr.u[3]);
+			break;
+		case GSID_VSRS(32) ... GSID_VSRS(63):
+			i = iden - GSID_VSRS(32);
+			gse_get(gse, &vcpu->arch.vr.vr[i]);
+			break;
+#endif
+		case GSID_HDAR:
+			gse_get(gse, &vcpu->arch.fault_dar);
+			break;
+		case GSID_HDSISR:
+			gse_get(gse, &vcpu->arch.fault_dsisr);
+			break;
+		case GSID_ASDR:
+			gse_get(gse, &vcpu->arch.fault_gpa);
+			break;
+		case GSID_HEIR:
+			gse_get(gse, &vcpu->arch.emul_inst);
+			break;
+		case GSID_DEC_EXPIRY_TB: {
+			u64 dw;
+
+			gse_get(gse, &dw);
+			vcpu->arch.dec_expires =
+				dw + vcpu->arch.vcore->tb_offset;
+			break;
+		}
+		default:
+			continue;
+		}
+		gsbm_set(valids, iden);
+	}
+
+	return 0;
+}
+
+static struct gs_msg_ops vcpu_message_ops = {
+	.get_size = gs_msg_ops_vcpu_get_size,
+	.fill_info = gs_msg_ops_vcpu_fill_info,
+	.refresh_info = gs_msg_ops_vcpu_refresh_info,
+};
+
+static int kvmhv_papr_host_create(struct kvm_vcpu *vcpu,
+				  struct kvmhv_papr_host *ph)
+{
+	struct kvmhv_papr_config *cfg;
+	struct gs_buff *gsb, *vcpu_run_output, *vcpu_run_input;
+	unsigned long guest_id, vcpu_id;
+	struct gs_msg *gsm, *vcpu_message, *vcore_message;
+	int rc;
+
+	cfg = &ph->cfg;
+	guest_id = vcpu->kvm->arch.lpid;
+	vcpu_id = vcpu->vcpu_id;
+
+	gsm = gsm_new(&config_msg_ops, cfg, GS_FLAGS_WIDE, GFP_KERNEL);
+	if (!gsm) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	gsb = gsb_new(gsm_size(gsm), guest_id, vcpu_id, GFP_KERNEL);
+	if (!gsb) {
+		rc = -ENOMEM;
+		goto free_gsm;
+	}
+
+	rc = gsb_receive_datum(gsb, gsm, GSID_RUN_OUTPUT_MIN_SIZE);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't get vcpu run output buffer minimum size\n");
+		goto free_gsb;
+	}
+
+	vcpu_run_output = gsb_new(cfg->vcpu_run_output_size, guest_id, vcpu_id, GFP_KERNEL);
+	if (!vcpu_run_output) {
+		rc = -ENOMEM;
+		goto free_gsb;
+	}
+
+	cfg->vcpu_run_output_cfg.address = gsb_paddress(vcpu_run_output);
+	cfg->vcpu_run_output_cfg.size = gsb_capacity(vcpu_run_output);
+	ph->vcpu_run_output = vcpu_run_output;
+
+	gsm->flags = 0;
+	rc = gsb_send_datum(gsb, gsm, GSID_RUN_OUTPUT);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set vcpu run output buffer\n");
+		goto free_gs_out;
+	}
+
+	vcpu_message = gsm_new(&vcpu_message_ops, vcpu, 0, GFP_KERNEL);
+	if (!vcpu_message) {
+		rc = -ENOMEM;
+		goto free_gs_out;
+	}
+	gsm_include_all(vcpu_message);
+
+	ph->vcpu_message = vcpu_message;
+
+	vcpu_run_input = gsb_new(gsm_size(vcpu_message), guest_id, vcpu_id, GFP_KERNEL);
+	if (!vcpu_run_input) {
+		rc = -ENOMEM;
+		goto free_vcpu_message;
+	}
+
+	ph->vcpu_run_input = vcpu_run_input;
+	cfg->vcpu_run_input_cfg.address = gsb_paddress(vcpu_run_input);
+	cfg->vcpu_run_input_cfg.size = gsb_capacity(vcpu_run_input);
+	rc = gsb_send_datum(gsb, gsm, GSID_RUN_INPUT);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set vcpu run input buffer\n");
+		goto free_vcpu_run_input;
+	}
+
+	vcore_message =
+		gsm_new(&vcpu_message_ops, vcpu, GS_FLAGS_WIDE, GFP_KERNEL);
+	if (!vcore_message) {
+		rc = -ENOMEM;
+		goto free_vcpu_run_input;
+	}
+
+	gsm_include_all(vcore_message);
+	ph->vcore_message = vcore_message;
+
+	gsbm_fill(&ph->valids);
+	gsm_free(gsm);
+	gsb_free(gsb);
+	return 0;
+
+free_vcpu_run_input:
+	gsb_free(vcpu_run_input);
+free_vcpu_message:
+	gsm_free(vcpu_message);
+free_gs_out:
+	gsb_free(vcpu_run_output);
+free_gsb:
+	gsb_free(gsb);
+free_gsm:
+	gsm_free(gsm);
+err:
+	return rc;
+}
+
+/**
+ * __kvmhv_papr_mark_dirty() - mark a Guest State ID to be sent to the host
+ * @vcpu: vcpu
+ * @iden: guest state ID
+ *
+ * Mark a guest state ID as having been changed by the L1 host and thus
+ * the new value must be sent to the L0 hypervisor. See kvmhv_papr_flush_vcpu()
+ */
+int __kvmhv_papr_mark_dirty(struct kvm_vcpu *vcpu, u16 iden)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct gs_msg *gsm;
+
+	if (!iden)
+		return 0;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+	gsm = ph->vcpu_message;
+	gsm_include(gsm, iden);
+	gsm = ph->vcore_message;
+	gsm_include(gsm, iden);
+	gsbm_set(valids, iden);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_mark_dirty);
+
+/**
+ * __kvmhv_papr_cached_reload() - reload a Guest State ID from the host
+ * @vcpu: vcpu
+ * @iden: guest state ID
+ *
+ * Reload the value for the guest state ID from the L0 host into the L1 host.
+ * This is cached so that going out to the L0 host only happens if necessary.
+ */
+int __kvmhv_papr_cached_reload(struct kvm_vcpu *vcpu, u16 iden)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_bitmap *valids;
+	struct gs_buff *gsb;
+	struct gs_msg gsm;
+	int rc;
+
+	if (!iden)
+		return 0;
+
+	ph = &vcpu->arch.papr_host;
+	valids = &ph->valids;
+	if (gsbm_test(valids, iden))
+		return 0;
+
+	gsb = ph->vcpu_run_input;
+	gsm_init(&gsm, &vcpu_message_ops, vcpu, gsid_flags(iden));
+	rc = gsb_receive_datum(gsb, &gsm, iden);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't get GSID: 0x%x\n", iden);
+		return rc;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_cached_reload);
+
+/**
+ * kvmhv_papr_flush_vcpu() - send modified Guest State IDs to the host
+ * @vcpu: vcpu
+ * @time_limit: hdec expiry tb
+ *
+ * Send the values marked by __kvmhv_papr_mark_dirty() to the L0 host. Thread
+ * wide values are copied to the H_GUEST_RUN_VCPU input buffer. Guest wide
+ * values need to be sent with H_GUEST_SET first.
+ *
+ * The hdec tb offset is always sent to L0 host.
+ */
+int kvmhv_papr_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_buff *gsb;
+	struct gs_msg *gsm;
+	int rc;
+
+	ph = &vcpu->arch.papr_host;
+	gsb = ph->vcpu_run_input;
+	gsm = ph->vcore_message;
+	rc = gsb_send_data(gsb, gsm);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set guest wide elements\n");
+		return rc;
+	}
+
+	gsm = ph->vcpu_message;
+	rc = gsm_fill_info(gsm, gsb);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't fill vcpu run input buffer\n");
+		return rc;
+	}
+
+	rc = gse_put(gsb, GSID_HDEC_EXPIRY_TB, time_limit);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_flush_vcpu);
+
+/**
+ * kvmhv_papr_set_ptbl_entry() - send partition and process table state to L0 host
+ * @lpid: guest id
+ * @dw0: partition table double word
+ * @dw1: process table double word
+ */
+int kvmhv_papr_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1)
+{
+	struct gs_part_table patbl;
+	struct gs_proc_table prtbl;
+	struct gs_buff *gsb;
+	size_t size;
+	int rc;
+
+	size = gse_total_size(gsid_size(GSID_PARTITION_TABLE)) +
+	       gse_total_size(gsid_size(GSID_PROCESS_TABLE)) +
+	       sizeof(struct gs_header);
+	gsb = gsb_new(size, lpid, 0, GFP_KERNEL);
+	if (!gsb)
+		return -ENOMEM;
+
+	patbl.address = dw0 & RPDB_MASK;
+	patbl.ea_bits = ((((dw0 & RTS1_MASK) >> (RTS1_SHIFT - 3)) |
+			  ((dw0 & RTS2_MASK) >> RTS2_SHIFT)) +
+			 31);
+	patbl.gpd_size = 1ul << ((dw0 & RPDS_MASK) + 3);
+	rc = gse_put(gsb, GSID_PARTITION_TABLE, patbl);
+	if (rc < 0)
+		goto free_gsb;
+
+	prtbl.address = dw1 & PRTB_MASK;
+	prtbl.gpd_size = 1ul << ((dw1 & PRTS_MASK) + 12);
+	rc = gse_put(gsb, GSID_PROCESS_TABLE, prtbl);
+	if (rc < 0)
+		goto free_gsb;
+
+	rc = gsb_send(gsb, GS_FLAGS_WIDE);
+	if (rc < 0) {
+		pr_err("KVM-PAPR: couldn't set the PATE\n");
+		goto free_gsb;
+	}
+
+	gsb_free(gsb);
+	return 0;
+
+free_gsb:
+	gsb_free(gsb);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_set_ptbl_entry);
+
+/**
+ * kvmhv_papr_parse_output() - receive values from H_GUEST_RUN_VCPU output
+ * @vcpu: vcpu
+ *
+ * Parse the output buffer from H_GUEST_RUN_VCPU to update vcpu.
+ */
+int kvmhv_papr_parse_output(struct kvm_vcpu *vcpu)
+{
+	struct kvmhv_papr_host *ph;
+	struct gs_buff *gsb;
+	struct gs_msg gsm;
+
+	ph = &vcpu->arch.papr_host;
+	gsb = ph->vcpu_run_output;
+
+	vcpu->arch.fault_dar = 0;
+	vcpu->arch.fault_dsisr = 0;
+	vcpu->arch.fault_gpa = 0;
+	vcpu->arch.emul_inst = KVM_INST_FETCH_FAILED;
+
+	gsm_init(&gsm, &vcpu_message_ops, vcpu, 0);
+	gsm_refresh_info(&gsm, gsb);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_parse_output);
+
+static void kvmhv_papr_host_free(struct kvm_vcpu *vcpu,
+				 struct kvmhv_papr_host *ph)
+{
+	gsm_free(ph->vcpu_message);
+	gsm_free(ph->vcore_message);
+	gsb_free(ph->vcpu_run_input);
+	gsb_free(ph->vcpu_run_output);
+}
+
+int __kvmhv_papr_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	int rc;
+
+	for (int i = 0; i < 32; i++) {
+		rc = kvmhv_papr_cached_reload(vcpu, GSID_GPR(i));
+		if (rc < 0)
+			return rc;
+	}
+
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_CR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_XER);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_CTR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_LR);
+	if (rc < 0)
+		return rc;
+	rc = kvmhv_papr_cached_reload(vcpu, GSID_NIA);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_reload_ptregs);
+
+int __kvmhv_papr_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs)
+{
+	for (int i = 0; i < 32; i++)
+		kvmhv_papr_mark_dirty(vcpu, GSID_GPR(i));
+
+	kvmhv_papr_mark_dirty(vcpu, GSID_CR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_XER);
+	kvmhv_papr_mark_dirty(vcpu, GSID_CTR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_LR);
+	kvmhv_papr_mark_dirty(vcpu, GSID_NIA);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__kvmhv_papr_mark_dirty_ptregs);
+
+/**
+ * kvmhv_papr_vcpu_create() - create nested vcpu for the PAPR API
+ * @vcpu: vcpu
+ * @ph: PAPR nested host state
+ *
+ * Create a nested vCPU in the L0 with H_GUEST_CREATE_VCPU and set up
+ * the PAPR host state used to communicate with it.
+ */
+int kvmhv_papr_vcpu_create(struct kvm_vcpu *vcpu,
+			   struct kvmhv_papr_host *ph)
+{
+	long rc;
+
+	rc = plpar_guest_create_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id);
+
+	if (rc != H_SUCCESS) {
+		pr_err("KVM: Create Guest vcpu hcall failed, rc=%ld\n", rc);
+		switch (rc) {
+		case H_NOT_ENOUGH_RESOURCES:
+		case H_ABORTED:
+			return -ENOMEM;
+		case H_AUTHORITY:
+			return -EPERM;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	rc = kvmhv_papr_host_create(vcpu, ph);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_vcpu_create);
+
+/**
+ * kvmhv_papr_vcpu_free() - free the PAPR host state
+ * @vcpu: vcpu
+ * @ph: PAPR nested host state
+ */
+void kvmhv_papr_vcpu_free(struct kvm_vcpu *vcpu,
+			  struct kvmhv_papr_host *ph)
+{
+	kvmhv_papr_host_free(vcpu, ph);
+}
+EXPORT_SYMBOL_GPL(kvmhv_papr_vcpu_free);
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index e6e66c3792f8..663403fa86d4 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -92,7 +92,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmio_host_swabbed = 0;
 
 	emulated = EMULATE_FAIL;
-	vcpu->arch.regs.msr = vcpu->arch.shared->msr;
+	vcpu->arch.regs.msr = kvmppc_get_msr(vcpu);
+	kvmhv_papr_reload_ptregs(vcpu, &vcpu->arch.regs);
 	if (analyse_instr(&op, &vcpu->arch.regs, inst) = 0) {
 		int type = op.type & INSTR_TYPE_MASK;
 		int size = GETSIZE(op.type);
@@ -357,6 +358,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 	}
 
 	trace_kvm_ppc_instr(ppc_inst_val(inst), kvmppc_get_pc(vcpu), emulated);
+	kvmhv_papr_mark_dirty_ptregs(vcpu, &vcpu->arch.regs);
 
 	/* Advance past emulated instruction. */
 	if (emulated != EMULATE_FAIL)
diff --git a/arch/powerpc/kvm/guest-state-buffer.c b/arch/powerpc/kvm/guest-state-buffer.c
index db4a79bfcaf1..cc3a7a416867 100644
--- a/arch/powerpc/kvm/guest-state-buffer.c
+++ b/arch/powerpc/kvm/guest-state-buffer.c
@@ -561,3 +561,52 @@ int gsm_refresh_info(struct gs_msg *gsm, struct gs_buff *gsb)
 	return gsm->ops->refresh_info(gsm, gsb);
 }
 EXPORT_SYMBOL(gsm_refresh_info);
+
+/**
+ * gsb_send - send all elements in the buffer to the hypervisor.
+ * @gsb: guest state buffer
+ * @flags: guest wide or thread wide
+ *
+ * Performs the H_GUEST_SET_STATE hcall for the guest state buffer.
+ */
+int gsb_send(struct gs_buff *gsb, unsigned long flags)
+{
+	unsigned long hflags = 0;
+	unsigned long i;
+	int rc;
+
+	if (gsb_nelems(gsb) == 0)
+		return 0;
+
+	if (flags & GS_FLAGS_WIDE)
+		hflags |= H_GUEST_FLAGS_WIDE;
+
+	rc = plpar_guest_set_state(hflags, gsb->guest_id, gsb->vcpu_id,
+				   __pa(gsb->hdr), gsb->capacity, &i);
+	return rc;
+}
+EXPORT_SYMBOL(gsb_send);
+
+/**
+ * gsb_recv - request all elements in the buffer have their value updated.
+ * @gsb: guest state buffer
+ * @flags: guest wide or thread wide
+ *
+ * Performs the H_GUEST_GET_STATE hcall for the guest state buffer.
+ * After returning from the hcall the guest state elements that were
+ * present in the buffer will have updated values from the hypervisor.
+ */
+int gsb_recv(struct gs_buff *gsb, unsigned long flags)
+{
+	unsigned long hflags = 0;
+	unsigned long i;
+	int rc;
+
+	if (flags & GS_FLAGS_WIDE)
+		hflags |= H_GUEST_FLAGS_WIDE;
+
+	rc = plpar_guest_get_state(hflags, gsb->guest_id, gsb->vcpu_id,
+				   __pa(gsb->hdr), gsb->capacity, &i);
+	return rc;
+}
+EXPORT_SYMBOL(gsb_recv);
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 6/6] docs: powerpc: Document nested KVM on POWER
  2023-06-05  6:48 ` Jordan Niethe
  (?)
@ 2023-06-05  6:48   ` Jordan Niethe
  -1 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

From: Michael Neuling <mikey@neuling.org>

Document support for nested KVM on POWER using the existing API as well
as the new PAPR API. This includes the new HCALL interface and how it
is used by KVM.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Separated into individual patch
---
 Documentation/powerpc/index.rst      |   1 +
 Documentation/powerpc/kvm-nested.rst | 636 +++++++++++++++++++++++++++
 2 files changed, 637 insertions(+)
 create mode 100644 Documentation/powerpc/kvm-nested.rst

diff --git a/Documentation/powerpc/index.rst b/Documentation/powerpc/index.rst
index 85e80e30160b..5a15dc6389ab 100644
--- a/Documentation/powerpc/index.rst
+++ b/Documentation/powerpc/index.rst
@@ -25,6 +25,7 @@ powerpc
     isa-versions
     kaslr-booke32
     mpc52xx
+    kvm-nested
     papr_hcalls
     pci_iov_resource_on_powernv
     pmu-ebb
diff --git a/Documentation/powerpc/kvm-nested.rst b/Documentation/powerpc/kvm-nested.rst
new file mode 100644
index 000000000000..c0c2e29a59d3
--- /dev/null
+++ b/Documentation/powerpc/kvm-nested.rst
@@ -0,0 +1,636 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+Nested KVM on POWER
+====================================
+
+Introduction
+============
+
+This document explains how a guest operating system can act as a
+hypervisor and run nested guests through the use of hypercalls, if the
+hypervisor has implemented them. The terms L0, L1, and L2 are used to
+refer to different software entities. L0 is the hypervisor mode entity
+that would normally be called the "host" or "hypervisor". L1 is a
+guest virtual machine that is directly run under L0 and is initiated
+and controlled by L0. L2 is a guest virtual machine that is initiated
+and controlled by L1 acting as a hypervisor.
+
+Existing API
+============
+
+Linux/KVM has had support for nesting as an L0 or L1 since 2018.
+
+The L0 code was added::
+
+   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
+   Author: Paul Mackerras <paulus@ozlabs.org>
+   Date:   Mon Oct 8 16:31:03 2018 +1100
+   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
+
+The L1 code was added::
+
+   commit 360cae313702cdd0b90f82c261a8302fecef030a
+   Author: Paul Mackerras <paulus@ozlabs.org>
+   Date:   Mon Oct 8 16:31:04 2018 +1100
+   KVM: PPC: Book3S HV: Nested guest entry via hypercall
+
+This API works primarily using a single hcall h_enter_nested(). This
+call is made by the L1 to tell the L0 to start an L2 vCPU with the
+given state. The L0 then starts this L2 and runs it until an L2 exit condition
+is reached. Once the L2 exits, the state of the L2 is given back to
+the L1 by the L0. The full L2 vCPU state is always transferred from
+and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
+vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
+-> L1 exit).
+
+The only state kept by the L0 is the partition table. The L1 registers
+its partition table using the h_set_partition_table() hcall. All
+other state held by the L0 about the L2s is cached state (such as
+shadow page tables).
+
+The L1 may run any L2 or vCPU without first informing the L0. It
+simply starts the vCPU using h_enter_nested(). The creation of L2s and
+vCPUs is done implicitly whenever h_enter_nested() is called.
+
+In this document, we call this existing API the v1 API.
+
+New PAPR API
+===============
+
+The new PAPR API changes from the v1 API such that creating the L2 and
+its associated vCPUs is explicit. In this document, we call this the v2
+API.
+
+h_enter_nested() is replaced with H_GUEST_VCPU_RUN().  Before this can
+be called the L1 must explicitly create the L2 using h_guest_create()
+and create any associated vCPUs with h_guest_create_vcpu(). Getting
+and setting vCPU state can also be performed using the
+h_guest_{g,s}et hcalls.
+
+The basic execution flow for an L1 to create an L2, run it, and
+delete it is (a simplified sketch in code follows the list):
+
+- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
+  (normally at L1 boot time).
+
+- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token
+
+- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()
+
+- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
+
+- L1 requests the L0 run the vCPU using the H_GUEST_VCPU_RUN() hcall
+
+- L1 deletes L2 with H_GUEST_DELETE()
+
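+As a rough illustration only, this flow maps onto the Linux wrapper
+functions added by this series as below. This is a simplified sketch:
+state setup, run buffer registration and all error handling are
+elided, and the guest state buffer variables (gsb, gsb_size) are
+placeholders::
+
+  long rc;
+  unsigned long guest_id, fail_idx;
+  int trap;
+
+  rc = plpar_guest_create(0, &guest_id);            /* H_GUEST_CREATE */
+  rc = plpar_guest_create_vcpu(0, guest_id, 0);     /* H_GUEST_CREATE_VCPU */
+  rc = plpar_guest_set_state(0, guest_id, 0, __pa(gsb),
+                             gsb_size, &fail_idx);  /* H_GUEST_SET_STATE */
+  rc = plpar_guest_run_vcpu(0, guest_id, 0,
+                            &trap, &fail_idx);      /* H_GUEST_VCPU_RUN */
+  rc = plpar_guest_delete(0, guest_id);             /* H_GUEST_DELETE */
+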
+More details of the individual hcalls follow:
+
+HCALL Details
+=============
+
+This documentation is provided to give an overall understanding of the
+API. It doesn't aim to provide all the details required to implement
+an L1 or L0. The latest version of PAPR can be referred to for more details.
+
+All these HCALLs are made by the L1 to the L0.
+
+H_GUEST_GET_CAPABILITIES()
+--------------------------
+
+This is called to get the capabilities of the L0 nested
+hypervisor. This includes capabilities such as the CPU versions (eg
+POWER9, POWER10) that are supported as L2s::
+
+  H_GUEST_GET_CAPABILITIES(uint64 flags)
+
+  Parameters:
+    Input:
+      flags: Reserved
+    Output:
+      R3: Return code
+      R4: Hypervisor Supported Capabilities bitmap 1
+
+H_GUEST_SET_CAPABILITIES()
+--------------------------
+
+This is called to inform the L0 of the capabilities of the L1
+hypervisor. The set of flags passed here are the same as
+H_GUEST_GET_CAPABILITIES()
+
+Typically, GET will be called first and then SET will be called with a
+subset of the flags returned from GET. This process allows the L0 and
+L1 to negotiate an agreed set of capabilities::
+
+  H_GUEST_SET_CAPABILITIES(uint64 flags,
+                           uint64 capabilitiesBitmap1)
+  Parameters:
+    Input:
+      flags: Reserved
+      capabilitiesBitmap1: Only capabilities advertised through
+                           H_GUEST_GET_CAPABILITIES
+    Output:
+      R3: Return code
+      R4: If R3 = H_P2: The number of invalid bitmaps
+      R5: If R3 = H_P2: The index of first invalid bitmap
+
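+In this series, kvmhv_nested_init() performs this negotiation; a
+condensed sketch of its logic (error handling elided)::
+
+  unsigned long host_caps, caps = 0;
+
+  if (plpar_guest_get_capabilities(0, &host_caps) == H_SUCCESS) {
+          if (cpu_has_feature(CPU_FTR_ARCH_31))
+                  caps |= H_GUEST_CAP_POWER10;
+          if (cpu_has_feature(CPU_FTR_ARCH_300))
+                  caps |= H_GUEST_CAP_POWER9;
+          /* offer the L0 only what both the L1 and the L0 support */
+          plpar_guest_set_capabilities(0, caps & host_caps);
+  }
+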
+H_GUEST_CREATE()
+----------------
+
+This is called to create an L2. A unique ID of the L2 created
+(similar to an LPID) is returned, which can be used on subsequent HCALLs to
+identify the L2::
+
+  H_GUEST_CREATE(uint64 flags,
+                 uint64 continueToken);
+  Parameters:
+    Input:
+      flags: Reserved
+      continueToken: Initial call set to -1. Subsequent calls,
+                     after H_Busy or H_LongBusyOrder has been
+                     returned, value that was returned in R4.
+    Output:
+      R3: Return code. Notable:
+        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
+        enough Hypervisor memory. See H_GUEST_CREATE_GET_STATE(flags =
+        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
+      R4: If R3 = H_Busy or H_LongBusyOrder -> continueToken
+
+H_GUEST_CREATE_VCPU()
+---------------------
+
+This is called to create a vCPU associated with an L2. The L2 id
+(returned from H_GUEST_CREATE()) should be passed in. Also passed in
+is a unique (for this L2) vCPUid. This vCPUid is allocated by the
+L1::
+
+  H_GUEST_CREATE_VCPU(uint64 flags,
+                      uint64 guestId,
+                      uint64 vcpuId);
+  Parameters:
+    Input:
+      flags: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU to be created. This must be within the
+              range of 0 to 2047
+    Output:
+      R3: Return code. Notable:
+        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
+        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
+        takeOwnershipOfVcpuState)
+
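+For example, the L1 in this series maps the status of this hcall to an
+errno along these lines (condensed from kvmhv_papr_vcpu_create())::
+
+  rc = plpar_guest_create_vcpu(0, guest_id, vcpu_id);
+  switch (rc) {
+  case H_SUCCESS:
+          break;
+  case H_NOT_ENOUGH_RESOURCES:
+  case H_ABORTED:
+          return -ENOMEM;
+  case H_AUTHORITY:
+          return -EPERM;
+  default:
+          return -EINVAL;
+  }
+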
+H_GUEST_GET_STATE()
+-------------------
+
+This is called to get state associated with an L2 (Guest-wide or vCPU specific).
+This info is passed via the Guest State Buffer (GSB), a standard format as
+explained later in this doc, necessary details below:
+
+This can get either L2 wide or vCPU specific information. Examples of
+L2 wide state are the timebase offset or process scoped page table
+info. Examples of vCPU specific are GPRs or VSRs. A bit in the flags
+parameter specifies if this call is L2 wide or vCPU specific and the
+IDs in the GSB must match this.
+
+The L1 provides a pointer to the GSB as a parameter to this call. Also
+provided are the L2 and vCPU IDs associated with the state to get.
+
+The L1 writes only the IDs and sizes in the GSB.  The L0 writes the
+associated values for each ID in the GSB::
+
+  H_GUEST_GET_STATE(uint64 flags,
+                           uint64 guestId,
+                           uint64 vcpuId,
+                           uint64 dataBuffer,
+                           uint64 dataBufferSizeInBytes);
+  Parameters:
+    Input:
+      flags:
+         Bit 0: getGuestWideState: Request state of the Guest instead
+           of an individual VCPU.
+         Bit 1: takeOwnershipOfVcpuState: Indicates the L1 is taking
+           over ownership of the VCPU state and that the L0 can free
+           the storage holding the state. The VCPU state will need to
+           be returned to the Hypervisor via H_GUEST_SET_STATE prior
+           to H_GUEST_RUN_VCPU being called for this VCPU. The data
+           returned in the dataBuffer is in a Hypervisor internal
+           format.
+         Bits 2-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
+      dataBuffer: A L1 real address of the GSB.
+        If takeOwnershipOfVcpuState, size must be at least the size
+        returned by ID=0x0001
+      dataBufferSizeInBytes: Size of dataBuffer
+    Output:
+      R3: Return code
+      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
+            element ID.
+          If R3 = H_Invalid_Element_Size: The array index of the bad
+             element size.
+          If R3 = H_Invalid_Element_Value: The array index of the bad
+             element value.
+
+H_GUEST_SET_STATE()
+-------------------
+
+This is called to set L2 wide or vCPU specific L2 state. This info is
+passed via the Guest State Buffer (GSB), necessary details below:
+
+This can set either L2 wide or vCPU specific information. Examples of
+L2 wide state are the timebase offset or process scoped page table
+info. Examples of vCPU specific are GPRs or VSRs. A bit in the flags
+parameter specifies if this call is L2 wide or vCPU specific and the
+IDs in the GSB must match this.
+
+The L1 provides a pointer to the GSB as a parameter to this call. Also
+provided are the L2 and vCPU IDs associated with the state to set.
+
+The L1 writes all values in the GSB and the L0 only reads the GSB for
+this call::
+
+  H_GUEST_SET_STATE(uint64 flags,
+                    uint64 guestId,
+                    uint64 vcpuId,
+                    uint64 dataBuffer,
+                    uint64 dataBufferSizeInBytes);
+  Parameters:
+    Input:
+      flags:
+         Bit 0: getGuestWideState: Request state of the Guest instead
+           of an individual VCPU.
+         Bit 1: returnOwnershipOfVcpuState: Return Guest VCPU state. See
+           GET_STATE takeOwnershipOfVcpuState
+         Bits 2-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
+      dataBuffer: A L1 real address of the GSB.
+        If takeOwnershipOfVcpuState, size must be at least the size
+        returned by ID=0x0001
+      dataBufferSizeInBytes: Size of dataBuffer
+    Output:
+      R3: Return code
+      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
+            element ID.
+          If R3 = H_Invalid_Element_Size: The array index of the bad
+             element size.
+          If R3 = H_Invalid_Element_Value: The array index of the bad
+             element value.
+
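+In this series the guest state buffer helpers wrap these two hcalls;
+gsb_send(), for instance, reduces to roughly (a condensed sketch)::
+
+  unsigned long hflags = 0, fail_idx;
+
+  if (flags & GS_FLAGS_WIDE)      /* guest wide rather than vCPU state */
+          hflags |= H_GUEST_FLAGS_WIDE;
+
+  rc = plpar_guest_set_state(hflags, gsb->guest_id, gsb->vcpu_id,
+                             __pa(gsb->hdr), gsb->capacity, &fail_idx);
+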
+H_GUEST_RUN_VCPU()
+------------------
+
+This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
+parameters. The vCPU runs with the state set previously using
+H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
+hcall.
+
+This hcall also has associated input and output GSBs. Unlike
+H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
+parameters to the hcall (This was done in the interest of
+performance). The locations of these GSBs must be preregistered using
+the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table
+below).
+
+The input GSB may contain only VCPU specific elements to be set. This
+GSB may also contain zero elements (ie 0 in the first 4 bytes of the
+GSB) if nothing needs to be set.
+
+On exit from the hcall, the output buffer is filled with elements
+determined by the L0. The reason for the exit is contained in GPR4 (ie
+NIP is put in GPR4).  The elements returned depend on the exit
+type. For example, if the exit reason is the L2 doing a hcall (GPR4 =
+0xc00), then GPR3-12 are provided in the output GSB as this is the
+state likely needed to service the hcall. If additional state is
+needed, H_GUEST_GET_STATE() may be called by the L1.
+
+To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
+the L1 may set a flag (as a hcall parameter) and the L0 will
+synthesize the interrupt in the L2. Alternatively, the L1 may
+synthesize the interrupt itself using H_GUEST_SET_STATE() or the
+H_GUEST_RUN_VCPU() input GSB to set the state appropriately::
+
+  H_GUEST_RUN_VCPU(uint64 flags,
+                   uint64 guestId,
+                   uint64 vcpuId,
+                   uint64 dataBuffer,
+                   uint64 dataBufferSizeInBytes);
+  Parameters:
+    Input:
+      flags:
+         Bit 0: generateExternalInterrupt: Generate an external interrupt
+         Bit 1: generatePrivilegedDoorbell: Generate a Privileged Doorbell
+         Bit 2: sendToSystemReset: Generate a System Reset Interrupt
+         Bits 3-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
+    Output:
+      R3: Return code
+      R4: If R3 = H_Success: The reason the L2 vCPU exited (ie. NIA)
+            0x000: The VCPU stopped running for an unspecified reason. An
+              example of this is the Hypervisor stopping a VCPU from
+              running due to an outstanding interrupt for the Host
+              Partition.
+            0x980: HDEC
+            0xC00: HCALL
+            0xE00: HDSI
+            0xE20: HISI
+            0xE40: HEA
+            0xF80: HV Fac Unavail
+          If R3 = H_Invalid_Element_Id, H_Invalid_Element_Size, or
+            H_Invalid_Element_Value: R4 is offset of the invalid element
+            in the input buffer.
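+
+A sketch of how an L1 might dispatch on these exit reasons follows.
+The function names are hypothetical stand-ins for the L1's hcall and
+fault handling plumbing, not real kernel symbols::
+
+  #include <stdint.h>
+
+  /* Hypothetical wrappers around the hcall and L1 handling paths. */
+  long h_guest_run_vcpu(uint64_t flags, uint64_t guest_id,
+                        uint64_t vcpu_id);
+  void do_hcall(uint64_t vcpu_id);
+  void handle_page_fault(uint64_t vcpu_id);
+
+  void l1_run_loop(uint64_t guest_id, uint64_t vcpu_id)
+  {
+        for (;;) {
+              /* On H_Success, R4 holds the exit reason (see above). */
+              long reason = h_guest_run_vcpu(0, guest_id, vcpu_id);
+
+              switch (reason) {
+              case 0xC00:  /* hcall: GPR3-12 are in the output GSB */
+                    do_hcall(vcpu_id);
+                    break;
+              case 0x980:  /* HDEC expired: simply re-enter        */
+                    break;
+              case 0xE00:  /* HDSI */
+              case 0xE20:  /* HISI: resolve the fault, re-enter    */
+                    handle_page_fault(vcpu_id);
+                    break;
+              default:     /* 0x000 etc.: back to the L1 scheduler */
+                    return;
+              }
+        }
+  }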
+
+H_GUEST_DELETE()
+----------------
+
+This is called to delete an L2. All associated vCPUs are also
+deleted. No specific vCPU delete call is provided.
+
+A flag may be provided to delete all guests. This is used to reset the
+L0 in the case of kdump/kexec::
+
+  H_GUEST_DELETE(uint64 flags,
+                 uint64 guestId)
+  Parameters:
+    Input:
+      flags:
+         Bit 0: deleteAllGuests: deletes all guests
+         Bits 1-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+    Output:
+      R3: Return code
+
+Guest State Buffer
+==================
+
+The Guest State Buffer (GSB) is the main method of communicating state
+about the L2 between the L1 and L0 via H_GUEST_{G,S}ET_STATE() and
+H_GUEST_RUN_VCPU() calls.
+
+State may be associated with a whole L2 (eg timebase offset) or a
+specific L2 vCPU (eg. GPR state). Only L2 vCPU state may be set by
+H_GUEST_RUN_VCPU().
+
+All data in the GSB is big endian (as is standard in PAPR).
+
+The Guest state buffer has a header which gives the number of
+elements, followed by the GSB elements themselves.
+
+GSB header:
+
++----------+----------+-------------------------------------------+
+|  Offset  |  Size    |  Purpose                                  |
+|  Bytes   |  Bytes   |                                           |
++==========+==========+===========================================+
+|    0     |    4     |  Number of elements                       |
++----------+----------+-------------------------------------------+
+|    4     |          |  Guest state buffer elements              |
++----------+----------+-------------------------------------------+
+
+GSB element:
+
++----------+----------+-------------------------------------------+
+|  Offset  |  Size    |  Purpose                                  |
+|  Bytes   |  Bytes   |                                           |
++==========+==========+===========================================+
+|    0     |    2     |  ID                                       |
++----------+----------+-------------------------------------------+
+|    2     |    2     |  Size of Value                            |
++----------+----------+-------------------------------------------+
+|    4     | As above |  Value                                    |
++----------+----------+-------------------------------------------+
+
+The ID in the GSB element specifies what is to be set. This includes
+architected state like GPRs, VSRs, SPRs, as well as some metadata
+about the partition such as the timebase offset and partition scoped
+page table information.
+
++--------+-------+----+--------+----------------------------------+
+|   ID   | Size  | RW | Thread | Details                          |
+|        | Bytes |    | Guest  |                                  |
+|        |       |    | Scope  |                                  |
++========+=======+====+========+==================================+
+| 0x0000 |       | RW |   TG   | NOP element                      |
++--------+-------+----+--------+----------------------------------+
+| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state. See:      |
+|        |       |    |        | H_GUEST_GET_STATE:               |
+|        |       |    |        | flags = takeOwnershipOfVcpuState |
++--------+-------+----+--------+----------------------------------+
+| 0x0002 | 0x08  | R  |   G    | Size of Run vCPU out buffer      |
++--------+-------+----+--------+----------------------------------+
+| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
++--------+-------+----+--------+----------------------------------+
+| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
++--------+-------+----+--------+----------------------------------+
+| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x00 Addr part scope table      |
+|        |       |    |        |- 0x08 Num addr bits              |
+|        |       |    |        |- 0x10 Size root dir              |
++--------+-------+----+--------+----------------------------------+
+| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr proc scope table       |
+|        |       |    |        |- 0x8 Table size.                 |
++--------+-------+----+--------+----------------------------------+
+| 0x0007-|       |    |        | Reserved                         |
+| 0x0BFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr of buffer              |
+|        |       |    |        |- 0x8 Buffer Size.                |
++--------+-------+----+--------+----------------------------------+
+| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr of buffer              |
+|        |       |    |        |- 0x8 Buffer Size.                |
++--------+-------+----+--------+----------------------------------+
+| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
++--------+-------+----+--------+----------------------------------+
+| 0x0C03-|       |    |        | Reserved                         |
+| 0x0FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
+| 0x101F |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x1020 | 0x08  | RW |   T    | HDEC expiry TB                   |
++--------+-------+----+--------+----------------------------------+
+| 0x1021 | 0x08  | RW |   T    | NIA                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1022 | 0x08  | RW |   T    | MSR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1023 | 0x08  | RW |   T    | LR                               |
++--------+-------+----+--------+----------------------------------+
+| 0x1024 | 0x08  | RW |   T    | XER                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1025 | 0x08  | RW |   T    | CTR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1026 | 0x08  | RW |   T    | CFAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1027 | 0x08  | RW |   T    | SRR0                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1028 | 0x08  | RW |   T    | SRR1                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1029 | 0x08  | RW |   T    | DAR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
++--------+-------+----+--------+----------------------------------+
+| 0x102B | 0x08  | RW |   T    | VTB                              |
++--------+-------+----+--------+----------------------------------+
+| 0x102C | 0x08  | RW |   T    | LPCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x102D | 0x08  | RW |   T    | HFSCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x102E | 0x08  | RW |   T    | FSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x102F | 0x08  | RW |   T    | FPSCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1032 | 0x08  | RW |   T    | CIABR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1033 | 0x08  | RW |   T    | PURR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1034 | 0x08  | RW |   T    | SPURR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1035 | 0x08  | RW |   T    | IC                               |
++--------+-------+----+--------+----------------------------------+
+| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
+| 0x1039 |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x103A | 0x08  | W  |   T    | PPR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
+| 0x103E |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x103F | 0x08  | RW |   T    | MMCRA                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1040 | 0x08  | RW |   T    | SIER                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1043 | 0x08  | RW |   T    | BESCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1046 | 0x08  | RW |   T    | AMR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1047 | 0x08  | RW |   T    | IAMR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1048 | 0x08  | RW |   T    | AMOR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x104A | 0x08  | RW |   T    | SDAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104B | 0x08  | RW |   T    | SIAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104C | 0x08  | RW |   T    | DSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104D | 0x08  | RW |   T    | TAR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x104E | 0x08  | RW |   T    | DEXCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
++--------+-------+----+--------+----------------------------------+
+| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
++--------+-------+----+--------+----------------------------------+
+| 0x1052 | 0x08  | RW |   T    | CTRL                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1053-|       |    |        | Reserved                         |
+| 0x1FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x2000 | 0x04  | RW |   T    | CR                               |
++--------+-------+----+--------+----------------------------------+
+| 0x2001 | 0x04  | RW |   T    | PIDR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x2002 | 0x04  | RW |   T    | DSISR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x2003 | 0x04  | RW |   T    | VSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
+| 0x200C |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x200D | 0x04  | RW |   T    | WORT                             |
++--------+-------+----+--------+----------------------------------+
+| 0x200E | 0x04  | RW |   T    | PSPB                             |
++--------+-------+----+--------+----------------------------------+
+| 0x200F-|       |    |        | Reserved                         |
+| 0x2FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
+| 0x303F |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x3040-|       |    |        | Reserved                         |
+| 0xEFFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0xF000 | 0x08  | R  |   T    | HDAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
++--------+-------+----+--------+----------------------------------+
+| 0xF002 | 0x04  | R  |   T    | HEIR                             |
++--------+-------+----+--------+----------------------------------+
+| 0xF003 | 0x08  | R  |   T    | ASDR                             |
++--------+-------+----+--------+----------------------------------+
+
+
+Miscellaneous info
+==================
+
+State not in ptregs/hvregs
+--------------------------
+
+In the v1 API, some state is not in the ptregs/hvstate. This includes
+the vector registers and some SPRs. For the L1 to set this state for
+the L2, the L1 loads up these hardware registers before the
+h_enter_nested() call and the L0 ensures they end up as the L2 state
+(by not touching them).
+
+The v2 API removes this implicit transfer and instead sets this state
+explicitly via the GSB.
+
+L1 Implementation details: Caching state
+----------------------------------------
+
+In the v1 API, all state is sent from the L1 to the L0 and vice versa
+on every h_enter_nested() hcall. If the L0 is not currently running
+any L2s, the L0 has no state information about them. The only
+exception to this is the location of the partition table, registered
+via h_set_partition_table().
+
+The v2 API changes this so that the L0 retains the L2 state even when
+its vCPUs are no longer running. This means that the L1 only needs to
+communicate with the L0 about L2 state when it needs to modify the L2
+state, or when its value is out of date. This provides an opportunity
+for performance optimisation.
+
+When a vCPU exits from a H_GUEST_RUN_VCPU() call, the L1 internally
+marks all L2 state as invalid. This means that if the L1 wants to know
+the L2 state (say via a kvm_get_one_reg() call), it needs to call
+H_GUEST_GET_STATE() to get that state. Once read, the state is marked
+as valid in the L1 until the L2 is run again.
+
+Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
+to the L0 until that L2 vCPU runs again. Hence when the L1 updates
+state (say via a kvm_set_one_reg() call), it writes to an internal L1
+copy and only flushes this copy to the L0 when the L2 runs again via
+the H_GUEST_RUN_VCPU() input buffer.
+
+This lazy updating of state by the L1 avoids unnecessary
+H_GUEST_{G|S}ET_STATE() calls.
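+
+A sketch of the shape such a per-register cache might take is below;
+the names are invented for illustration and do not correspond to the
+kernel's actual data structures::
+
+  #include <stdbool.h>
+  #include <stdint.h>
+
+  uint64_t h_guest_get_state_one(uint16_t id);  /* hypothetical */
+
+  struct l2_reg_cache {
+        uint64_t val;
+        bool valid;   /* val mirrors the L0's copy               */
+        bool dirty;   /* modified locally; flush before next run */
+  };
+
+  /* kvm_set_one_reg() path: update locally, defer the hcall. */
+  static void cache_set(struct l2_reg_cache *c, uint64_t v)
+  {
+        c->val = v;
+        c->valid = true;
+        c->dirty = true;  /* flushed via the H_GUEST_RUN_VCPU input GSB */
+  }
+
+  /* kvm_get_one_reg() path: only go to the L0 on a cache miss. */
+  static uint64_t cache_get(struct l2_reg_cache *c, uint16_t id)
+  {
+        if (!c->valid) {
+              c->val = h_guest_get_state_one(id);
+              c->valid = true;
+        }
+        return c->val;
+  }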
+
+
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC PATCH v2 6/6] docs: powerpc: Document nested KVM on POWER
@ 2023-06-05  6:48   ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-05  6:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, npiggin, mikey, paulus, kautuk.consul.1980,
	vaibhav, sbhat, Jordan Niethe

From: Michael Neuling <mikey@neuling.org>

Document support for nested KVM on POWER using the existing API as well
as the new PAPR API. This includes the new HCALL interface and how it
used by KVM.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
---
v2:
  - Separated into individual patch
---
 Documentation/powerpc/index.rst      |   1 +
 Documentation/powerpc/kvm-nested.rst | 636 +++++++++++++++++++++++++++
 2 files changed, 637 insertions(+)
 create mode 100644 Documentation/powerpc/kvm-nested.rst

diff --git a/Documentation/powerpc/index.rst b/Documentation/powerpc/index.rst
index 85e80e30160b..5a15dc6389ab 100644
--- a/Documentation/powerpc/index.rst
+++ b/Documentation/powerpc/index.rst
@@ -25,6 +25,7 @@ powerpc
     isa-versions
     kaslr-booke32
     mpc52xx
+    kvm-nested
     papr_hcalls
     pci_iov_resource_on_powernv
     pmu-ebb
diff --git a/Documentation/powerpc/kvm-nested.rst b/Documentation/powerpc/kvm-nested.rst
new file mode 100644
index 000000000000..c0c2e29a59d3
--- /dev/null
+++ b/Documentation/powerpc/kvm-nested.rst
@@ -0,0 +1,636 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+Nested KVM on POWER
+==================
+
+Introduction
+======
+
+This document explains how a guest operating system can act as a
+hypervisor and run nested guests through the use of hypercalls, if the
+hypervisor has implemented them. The terms L0, L1, and L2 are used to
+refer to different software entities. L0 is the hypervisor mode entity
+that would normally be called the "host" or "hypervisor". L1 is a
+guest virtual machine that is directly run under L0 and is initiated
+and controlled by L0. L2 is a guest virtual machine that is initiated
+and controlled by L1 acting as a hypervisor.
+
+Existing API
+======
+
+Linux/KVM has had support for Nesting as an L0 or L1 since 2018
+
+The L0 code was added::
+
+   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
+   Author: Paul Mackerras <paulus@ozlabs.org>
+   Date:   Mon Oct 8 16:31:03 2018 +1100
+   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
+
+The L1 code was added::
+
+   commit 360cae313702cdd0b90f82c261a8302fecef030a
+   Author: Paul Mackerras <paulus@ozlabs.org>
+   Date:   Mon Oct 8 16:31:04 2018 +1100
+   KVM: PPC: Book3S HV: Nested guest entry via hypercall
+
+This API works primarily using a single hcall h_enter_nested(). This
+call made by the L1 to tell the L0 to start an L2 vCPU with the given
+state. The L0 then starts this L2 and runs until an L2 exit condition
+is reached. Once the L2 exits, the state of the L2 is given back to
+the L1 by the L0. The full L2 vCPU state is always transferred from
+and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
+vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
+-> L1 exit).
+
+The only state kept by the L0 is the partition table. The L1 registers
+it's partition table using the h_set_partition_table() hcall. All
+other state held by the L0 about the L2s is cached state (such as
+shadow page tables).
+
+The L1 may run any L2 or vCPU without first informing the L0. It
+simply starts the vCPU using h_enter_nested(). The creation of L2s and
+vCPUs is done implicitly whenever h_enter_nested() is called.
+
+In this document, we call this existing API the v1 API.
+
+New PAPR API
+=======+
+The new PAPR API changes from the v1 API such that the creating L2 and
+associated vCPUs is explicit. In this document, we call this the v2
+API.
+
+h_enter_nested() is replaced with H_GUEST_VCPU_RUN().  Before this can
+be called the L1 must explicitly create the L2 using h_guest_create()
+and any associated vCPUs() created with h_guest_create_vCPU(). Getting
+and setting vCPU state can also be performed using h_guest_{g|s}et
+hcall.
+
+The basic execution flow is for an L1 to create an L2, run it, and
+delete it is:
+
+- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
+  (normally at L1 boot time).
+
+- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token
+
+- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()
+
+- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
+
+- L1 requests the L0 runs the vCPU running H_GUEST_VCPU_RUN() hcall
+
+- L1 deletes L2 with H_GUEST_DELETE()
+
+More details of the individual hcalls follows:
+
+HCALL Details
+======+
+This documentation is provided to give an overall understating of the
+API. It doesn't aim to provide all the details required to implement
+an L1 or L0. Latest version of PAPR can be referred to for more details.
+
+All these HCALLs are made by the L1 to the L0.
+
+H_GUEST_GET_CAPABILITIES()
+--------------------------
+
+This is called to get the capabilities of the L0 nested
+hypervisor. This includes capabilities such the CPU versions (eg
+POWER9, POWER10) that are supported as L2s::
+
+  H_GUEST_GET_CAPABILITIES(uint64 flags)
+
+  Parameters:
+    Input:
+      flags: Reserved
+    Output:
+      R3: Return code
+      R4: Hypervisor Supported Capabilities bitmap 1
+
+H_GUEST_SET_CAPABILITIES()
+--------------------------
+
+This is called to inform the L0 of the capabilities of the L1
+hypervisor. The set of flags passed here are the same as
+H_GUEST_GET_CAPABILITIES()
+
+Typically, GET will be called first and then SET will be called with a
+subset of the flags returned from GET. This process allows the L0 and
+L1 to negotiate an agreed set of capabilities::
+
+  H_GUEST_SET_CAPABILITIES(uint64 flags,
+                           uint64 capabilitiesBitmap1)
+  Parameters:
+    Input:
+      flags: Reserved
+      capabilitiesBitmap1: Only capabilities advertised through
+                           H_GUEST_GET_CAPABILITIES
+    Output:
+      R3: Return code
+      R4: If R3 = H_P2: The number of invalid bitmaps
+      R5: If R3 = H_P2: The index of first invalid bitmap
+
+H_GUEST_CREATE()
+----------------
+
+This is called to create an L2. A unique ID of the L2 created
+(similar to an LPID) is returned, which can be used on subsequent HCALLs to
+identify the L2::
+
+  H_GUEST_CREATE(uint64 flags,
+                 uint64 continueToken);
+  Parameters:
+    Input:
+      flags: Reserved
+      continueToken: Initial call set to -1. Subsequent calls,
+                     after H_Busy or H_LongBusyOrder has been
+                     returned, value that was returned in R4.
+    Output:
+      R3: Return code. Notable:
+        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
+        enough Hypervisor memory. See H_GUEST_CREATE_GET_STATE(flags +        takeOwnershipOfVcpuState)
+      R4: If R3 = H_Busy or_H_LongBusyOrder -> continueToken
+
+H_GUEST_CREATE_VCPU()
+---------------------
+
+This is called to create a vCPU associated with an L2. The L2 id
+(returned from H_GUEST_CREATE()) should be passed it. Also passed in
+is a unique (for this L2) vCPUid. This vCPUid is allocated by the
+L1::
+
+  H_GUEST_CREATE_VCPU(uint64 flags,
+                      uint64 guestId,
+                      uint64 vcpuId);
+  Parameters:
+    Input:
+      flags: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU to be created. This must be within the
+              range of 0 to 2047
+    Output:
+      R3: Return code. Notable:
+        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
+        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
+        takeOwnershipOfVcpuState)
+
+H_GUEST_GET_STATE()
+-------------------
+
+This is called to get state associated with an L2 (Guest-wide or vCPU
+specific). This info is passed via the Guest State Buffer (GSB), a
+standard format explained later in this document; the necessary
+details are below.
+
+This can get either L2 wide or vCPU specific information. Examples of
+L2 wide state are the timebase offset or process scoped page table
+info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
+flags parameter specifies if this call is L2 wide or vCPU specific and
+the IDs in the GSB must match this.
+
+The L1 provides a pointer to the GSB as a parameter to this call. Also
+provided are the L2 and vCPU IDs associated with the state to get.
+
+The L1 writes only the IDs and sizes in the GSB. The L0 writes the
+associated values for each ID in the GSB::
+
+  H_GUEST_GET_STATE(uint64 flags,
+                    uint64 guestId,
+                    uint64 vcpuId,
+                    uint64 dataBuffer,
+                    uint64 dataBufferSizeInBytes);
+  Parameters:
+    Input:
+      flags:
+         Bit 0: getGuestWideState: Request state of the Guest instead
+           of an individual VCPU.
+         Bit 1: takeOwnershipOfVcpuState Indicate the L1 is taking
+           over ownership of the VCPU state and that the L0 can free
+           the storage holding the state. The VCPU state will need to
+           be returned to the Hypervisor via H_GUEST_SET_STATE prior
+           to H_GUEST_RUN_VCPU being called for this VCPU. The data
+           returned in the dataBuffer is in a Hypervisor internal
+           format.
+         Bits 2-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
+      dataBuffer: An L1 real address of the GSB.
+        If takeOwnershipOfVcpuState, size must be at least the size
+        returned by ID=0x0001
+      dataBufferSizeInBytes: Size of dataBuffer
+    Output:
+      R3: Return code
+      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
+            element ID.
+          If R3 = H_Invalid_Element_Size: The array index of the bad
+             element size.
+          If R3 = H_Invalid_Element_Value: The array index of the bad
+             element value.
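+
+For illustration, a sketch of reading a single vCPU element in C (the
+element ID 0x1021 (NIA) comes from the ID table later in this
+document; the buffer and error handling are simplified, and the
+structure name is illustrative)::
+
+  /* Sketch: read the L2 NIA via H_GUEST_GET_STATE. */
+  struct gsb_one {
+          __be32 nelems;                   /* = 1 */
+          __be16 id;                       /* element ID */
+          __be16 size;                     /* element size in bytes */
+          __be64 val;                      /* filled in by the L0 */
+  } __packed;
+
+  static long get_l2_nia(unsigned long guest_id, unsigned long vcpu_id,
+                         struct gsb_one *gsb, u64 *nia)
+  {
+          unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+          long rc;
+
+          gsb->nelems = cpu_to_be32(1);
+          gsb->id = cpu_to_be16(0x1021);   /* NIA */
+          gsb->size = cpu_to_be16(0x08);   /* L1 writes ID and size only */
+
+          rc = plpar_hcall(H_GUEST_GET_STATE, retbuf,
+                           0 /* vCPU scope: guest-wide bit clear */,
+                           guest_id, vcpu_id, __pa(gsb), sizeof(*gsb));
+          if (rc == H_SUCCESS)
+                  *nia = be64_to_cpu(gsb->val);
+          return rc;
+  }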
+
+H_GUEST_SET_STATE()
+-------------------
+
+This is called to set L2 wide or vCPU specific L2 state. This info is
+passed via the Guest State Buffer (GSB); the necessary details are
+below.
+
+This can set either L2 wide or vCPU specific information. Examples of
+L2 wide state are the timebase offset or process scoped page table
+info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
+flags
+parameter specifies if this call is L2 wide or vCPU specific and the
+IDs in the GSB must match this.
+
+The L1 provides a pointer to the GSB as a parameter to this call. Also
+provided are the L2 and vCPU IDs associated with the state to set.
+
+The L1 writes all values in the GSB and the L0 only reads the GSB for
+this call::
+
+  H_GUEST_SET_STATE(uint64 flags,
+                    uint64 guestId,
+                    uint64 vcpuId,
+                    uint64 dataBuffer,
+                    uint64 dataBufferSizeInBytes);
+  Parameters:
+    Input:
+      flags:
+         Bit 0: getGuestWideState: Set state of the Guest instead
+           of an individual VCPU.
+         Bit 1: returnOwnershipOfVcpuState: Return Guest VCPU state. See
+           GET_STATE takeOwnershipOfVcpuState
+         Bits 2-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
+      dataBuffer: An L1 real address of the GSB.
+        If returnOwnershipOfVcpuState, size must be at least the size
+        returned by ID=0x0001
+      dataBufferSizeInBytes: Size of dataBuffer
+    Output:
+      R3: Return code
+      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
+            element ID.
+          If R3 = H_Invalid_Element_Size: The array index of the bad
+             element size.
+          If R3 = H_Invalid_Element_Value: The array index of the bad
+             element value.
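+
+For illustration, a sketch of using this call to preregister the
+H_GUEST_VCPU_RUN() input and output buffers (IDs 0x0C00 and 0x0C01
+from the ID table later in this document). The structure names, the
+run_in/run_out buffers and their sizes are illustrative placeholders,
+not the series' actual definitions::
+
+  /* Sketch: register the run-vCPU input/output GSBs for one vCPU.
+   * 'gsb' is assumed to point at a zeroed, real-addressable buffer.
+   */
+  struct gsb_buf_desc {                    /* value format for 0x0C00/01 */
+          __be64 addr;                     /* L1 real address of buffer */
+          __be64 size;                     /* buffer size in bytes */
+  } __packed;
+
+  struct gsb_run_regs {
+          __be32 nelems;                   /* = 2 */
+          __be16 in_id, in_size;           /* 0x0C00, 0x10 */
+          struct gsb_buf_desc in;
+          __be16 out_id, out_size;         /* 0x0C01, 0x10 */
+          struct gsb_buf_desc out;
+  } __packed;
+
+  gsb->nelems = cpu_to_be32(2);
+  gsb->in_id = cpu_to_be16(0x0c00);
+  gsb->in_size = cpu_to_be16(0x10);
+  gsb->in.addr = cpu_to_be64(__pa(run_in));
+  gsb->in.size = cpu_to_be64(RUN_IN_SIZE);
+  gsb->out_id = cpu_to_be16(0x0c01);
+  gsb->out_size = cpu_to_be16(0x10);
+  gsb->out.addr = cpu_to_be64(__pa(run_out));
+  gsb->out.size = cpu_to_be64(RUN_OUT_SIZE);
+
+  rc = plpar_hcall(H_GUEST_SET_STATE, retbuf, 0 /* vCPU scope */,
+                   guest_id, vcpu_id, __pa(gsb), sizeof(*gsb));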
+
+H_GUEST_RUN_VCPU()
+------------------
+
+This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
+parameters. The vCPU runs with the state set previously using
+H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
+hcall.
+
+This hcall also has associated input and output GSBs. Unlike
+H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
+parameters to the hcall (this was done in the interest of
+performance). The locations of these GSBs must be preregistered using
+the H_GUEST_SET_STATE() call with IDs 0x0C00 and 0x0C01 (see table
+below).
+
+The input GSB may contain only VCPU specific elements to be set. This
+GSB may also contain zero elements (i.e. 0 in the first 4 bytes of the
+GSB) if nothing needs to be set.
+
+On exit from the hcall, the output buffer is filled with elements
+determined by the L0. The reason for the exit is contained in GPR4
+(i.e. the NIP is put in GPR4). The elements returned depend on the exit
+type. For example, if the exit reason is the L2 doing an hcall
+(GPR4 = 0xC00), then GPR3-12 are provided in the output GSB as this is the
+state likely needed to service the hcall. If additional state is
+needed, H_GUEST_GET_STATE() may be called by the L1.
+
+To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
+the L1 may set a flag (as a hcall parameter) and the L0 will
+synthesize the interrupt in the L2. Alternatively, the L1 may
+synthesize the interrupt itself using H_GUEST_SET_STATE() or the
+H_GUEST_RUN_VCPU() input GSB to set the state appropriately::
+
+  H_GUEST_RUN_VCPU(uint64 flags,
+                   uint64 guestId,
+                   uint64 vcpuId,
+                   uint64 dataBuffer,
+                   uint64 dataBufferSizeInBytes);
+  Parameters:
+    Input:
+      flags:
+         Bit 0: generateExternalInterrupt: Generate an external interrupt
+         Bit 1: generatePrivilegedDoorbell: Generate a Privileged Doorbell
+         Bit 2: sendToSystemReset: Generate a System Reset Interrupt
+         Bits 3-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
+    Output:
+      R3: Return code
+      R4: If R3 = H_Success: The reason the L2 vCPU exited (i.e. NIA)
+            0x000: The VCPU stopped running for an unspecified reason. An
+              example of this is the Hypervisor stopping a VCPU running
+              due to an outstanding interrupt for the Host Partition.
+            0x980: HDEC
+            0xC00: HCALL
+            0xE00: HDSI
+            0xE20: HISI
+            0xE40: HEA
+            0xF80: HV Fac Unavail
+          If R3 = H_Invalid_Element_Id, H_Invalid_Element_Size, or
+            H_Invalid_Element_Value: R4 is offset of the invalid element
+            in the input buffer.
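+
+For illustration, a sketch of the run loop an L1 might use (the
+handler names are hypothetical; the reason codes are those listed
+above, and the GSB locations are assumed to have been preregistered
+as described)::
+
+  /* Sketch: run an L2 vCPU and dispatch on the exit reason. */
+  rc = plpar_hcall(H_GUEST_RUN_VCPU, retbuf, 0 /* flags */,
+                   guest_id, vcpu_id);
+  if (rc == H_SUCCESS) {
+          switch (retbuf[0]) {             /* R4: exit reason (NIA) */
+          case 0xC00:                      /* L2 made an hcall; GPR3-12
+                                            * are in the output GSB */
+                  handle_l2_hcall();       /* hypothetical */
+                  break;
+          case 0xE00:                      /* HDSI */
+                  handle_l2_page_fault();  /* hypothetical */
+                  break;
+          case 0x000:                      /* stopped, e.g. for a host
+                                            * interrupt; just rerun */
+          default:
+                  break;
+          }
+  }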
+
+H_GUEST_DELETE()
+----------------
+
+This is called to delete an L2. All associated vCPUs are also
+deleted. No specific vCPU delete call is provided.
+
+A flag may be provided to delete all guests. This is used to reset the
+L0 in the case of kdump/kexec::
+
+  H_GUEST_DELETE(uint64 flags,
+                 uint64 guestId)
+  Parameters:
+    Input:
+      flags:
+         Bit 0: deleteAllGuests: deletes all guests
+         Bits 1-63: Reserved
+      guestId: ID obtained from H_GUEST_CREATE
+    Output:
+      R3: Return code
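+
+For illustration (plpar_hcall_norets() is the kernel's existing
+wrapper for hcalls with no outputs beyond R3; the flag value assumes
+PAPR's usual MSB-0 bit numbering)::
+
+  /* Sketch: delete one L2, or every L2 during kdump/kexec. */
+  rc = plpar_hcall_norets(H_GUEST_DELETE, 0, guest_id);
+
+  rc = plpar_hcall_norets(H_GUEST_DELETE,
+                          1UL << 63 /* deleteAllGuests: PAPR bit 0 */, 0);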
+
+Guest State Buffer
+==================
+
+The Guest State Buffer (GSB) is the main method of communicating state
+about the L2 between the L1 and L0 via H_GUEST_{G,S}ET_STATE() and
+H_GUEST_VCPU_RUN() calls.
+
+State may be associated with a whole L2 (e.g. timebase offset) or a
+specific L2 vCPU (e.g. GPR state). Only L2 vCPU state may be set by
+H_GUEST_VCPU_RUN().
+
+All data in the GSB is big endian (as is standard in PAPR).
+
+The Guest State Buffer has a header which gives the number of
+elements, followed by the GSB elements themselves.
+
+GSB header:
+
++----------+----------+-------------------------------------------+
+|  Offset  |  Size    |  Purpose                                  |
+|  Bytes   |  Bytes   |                                           |
++=====+=====+======================+
+|    0     |    4     |  Number of elements                       |
++----------+----------+-------------------------------------------+
+|    4     |          |  Guest state buffer elements              |
++----------+----------+-------------------------------------------+
+
+GSB element:
+
++----------+----------+-------------------------------------------+
+|  Offset  |  Size    |  Purpose                                  |
+|  Bytes   |  Bytes   |                                           |
++=====+=====+======================+
+|    0     |    2     |  ID                                       |
++----------+----------+-------------------------------------------+
+|    2     |    2     |  Size of Value                            |
++----------+----------+-------------------------------------------+
+|    4     | As above |  Value                                    |
++----------+----------+-------------------------------------------+
+
+The ID in the GSB element specifies what is to be set. This includes
+architected state like GPRs, VSRs, and SPRs, as well as some metadata
+about the partition like the timebase offset and partition scoped
+page table information.
+
++--------+-------+----+--------+----------------------------------+
+|   ID   | Size  | RW | Thread | Details                          |
+|        | Bytes |    | Guest  |                                  |
+|        |       |    | Scope  |                                  |
++====+====+==+====+=================+
+| 0x0000 |       | RW |   TG   | NOP element                      |
++--------+-------+----+--------+----------------------------------+
+| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state. See:      |
+|        |       |    |        | H_GUEST_GET_STATE:               |
+|        |       |    |        | flags = takeOwnershipOfVcpuState |
++--------+-------+----+--------+----------------------------------+
+| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
++--------+-------+----+--------+----------------------------------+
+| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
++--------+-------+----+--------+----------------------------------+
+| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
++--------+-------+----+--------+----------------------------------+
+| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x00 Addr part scope table      |
+|        |       |    |        |- 0x08 Num addr bits              |
+|        |       |    |        |- 0x10 Size root dir              |
++--------+-------+----+--------+----------------------------------+
+| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr proc scope table       |
+|        |       |    |        |- 0x8 Table size.                 |
++--------+-------+----+--------+----------------------------------+
+| 0x0007-|       |    |        | Reserved                         |
+| 0x0BFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr of buffer              |
+|        |       |    |        |- 0x8 Buffer Size.                |
++--------+-------+----+--------+----------------------------------+
+| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr of buffer              |
+|        |       |    |        |- 0x8 Buffer Size.                |
++--------+-------+----+--------+----------------------------------+
+| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
++--------+-------+----+--------+----------------------------------+
+| 0x0C03-|       |    |        | Reserved                         |
+| 0x0FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
+| 0x101F |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x1020 | 0x08  | RW |   T    | HDEC expiry TB                   |
++--------+-------+----+--------+----------------------------------+
+| 0x1021 | 0x08  | RW |   T    | NIA                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1022 | 0x08  | RW |   T    | MSR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1023 | 0x08  | RW |   T    | LR                               |
++--------+-------+----+--------+----------------------------------+
+| 0x1024 | 0x08  | RW |   T    | XER                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1025 | 0x08  | RW |   T    | CTR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1026 | 0x08  | RW |   T    | CFAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1027 | 0x08  | RW |   T    | SRR0                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1028 | 0x08  | RW |   T    | SRR1                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1029 | 0x08  | RW |   T    | DAR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
++--------+-------+----+--------+----------------------------------+
+| 0x102B | 0x08  | RW |   T    | VTB                              |
++--------+-------+----+--------+----------------------------------+
+| 0x102C | 0x08  | RW |   T    | LPCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x102D | 0x08  | RW |   T    | HFSCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x102E | 0x08  | RW |   T    | FSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x102F | 0x08  | RW |   T    | FPSCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1032 | 0x08  | RW |   T    | CIABR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1033 | 0x08  | RW |   T    | PURR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1034 | 0x08  | RW |   T    | SPURR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1035 | 0x08  | RW |   T    | IC                               |
++--------+-------+----+--------+----------------------------------+
+| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
+| 0x1039 |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x103A | 0x08  | W  |   T    | PPR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
+| 0x103E |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x103F | 0x08  | RW |   T    | MMCRA                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1040 | 0x08  | RW |   T    | SIER                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1043 | 0x08  | RW |   T    | BESCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1046 | 0x08  | RW |   T    | AMR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1047 | 0x08  | RW |   T    | IAMR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1048 | 0x08  | RW |   T    | AMOR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x104A | 0x08  | RW |   T    | SDAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104B | 0x08  | RW |   T    | SIAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104C | 0x08  | RW |   T    | DSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104D | 0x08  | RW |   T    | TAR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x104E | 0x08  | RW |   T    | DEXCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
++--------+-------+----+--------+----------------------------------+
+| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
++--------+-------+----+--------+----------------------------------+
+| 0x1052 | 0x08  | RW |   T    | CTRL                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1053-|       |    |        | Reserved                         |
+| 0x1FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x2000 | 0x04  | RW |   T    | CR                               |
++--------+-------+----+--------+----------------------------------+
+| 0x2001 | 0x04  | RW |   T    | PIDR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x2002 | 0x04  | RW |   T    | DSISR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x2003 | 0x04  | RW |   T    | VSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
+| 0x200C |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x200D | 0x04  | RW |   T    | WORT                             |
++--------+-------+----+--------+----------------------------------+
+| 0x200E | 0x04  | RW |   T    | PSPB                             |
++--------+-------+----+--------+----------------------------------+
+| 0x200F-|       |    |        | Reserved                         |
+| 0x2FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
+| 0x303F |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x3040-|       |    |        | Reserved                         |
+| 0xEFFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0xF000 | 0x08  | R  |   T    | HDAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
++--------+-------+----+--------+----------------------------------+
+| 0xF002 | 0x04  | R  |   T    | HEIR                             |
++--------+-------+----+--------+----------------------------------+
+| 0xF003 | 0x08  | R  |   T    | ASDR                             |
++--------+-------+----+--------+----------------------------------+
+
+
+Miscellaneous info
+==================
+
+State not in ptregs/hvregs
+--------------------------
+
+In the v1 API, some state is not in the ptregs/hvregs. This includes
+the vector registers and some SPRs. For the L1 to set this state for
+the L2, the L1 loads up these hardware registers before the
+h_enter_nested() call and the L0 ensures they end up as the L2 state
+(by not touching them).
+
+The v2 API removes this and explicitly sets this state via the GSB.
+
+L1 Implementation details: Caching state
+----------------------------------------
+
+In the v1 API, all state is sent from the L1 to the L0 and vice versa
+on every h_enter_nested() hcall. If the L0 is not currently running
+any L2s, the L0 has no state information about them. The only
+exception to this is the location of the partition table, registered
+via h_set_partition_table().
+
+The v2 API changes this so that the L0 retains the L2 state even when
+its vCPUs are no longer running. This means that the L1 only needs to
+communicate with the L0 about L2 state when it needs to modify the L2
+state, or when its value is out of date. This provides an opportunity
+for performance optimisation.
+
+When a vCPU exits from an H_GUEST_RUN_VCPU() call, the L1 internally
+marks all L2 state as invalid. This means that if the L1 wants to know
+the L2 state (say via a kvm_get_one_reg() call), it needs to call
+H_GUEST_GET_STATE() to get that state. Once it's read, it's marked as
+valid in L1 until the L2 is run again.
+
+Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
+to the L0 until that L2 vCPU runs again. Hence when the L1 updates
+state (say via a kvm_set_one_reg() call), it writes to an internal L1
+copy and only flushes this copy to the L0 when the L2 runs again via
+the H_GUEST_VCPU_RUN() input buffer.
+
+This lazy updating of state by the L1 avoids unnecessary
+H_GUEST_{G|S}ET_STATE() calls.
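+
+For illustration, a sketch of the shape such an L1-side cache could
+take (the names, sizes and helpers here are hypothetical, not the
+series' actual implementation)::
+
+  /* Sketch: per-vCPU lazy shadow of L2 state in the L1. */
+  struct l2_vcpu_cache {
+          u64 val[NR_GS_IDS];              /* NR_GS_IDS is hypothetical */
+          DECLARE_BITMAP(valid, NR_GS_IDS);
+          DECLARE_BITMAP(dirty, NR_GS_IDS);
+  };
+
+  static u64 l2_get_reg(struct l2_vcpu_cache *c, int id)
+  {
+          if (!test_bit(id, c->valid)) {   /* stale since last exit */
+                  c->val[id] = gsb_get_one(id); /* H_GUEST_GET_STATE */
+                  set_bit(id, c->valid);
+          }
+          return c->val[id];
+  }
+
+  static void l2_set_reg(struct l2_vcpu_cache *c, int id, u64 v)
+  {
+          c->val[id] = v;
+          set_bit(id, c->valid);
+          set_bit(id, c->dirty);           /* flushed via the run-vCPU
+                                            * input GSB at next entry */
+  }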
+
+
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH RFC v2 6/6] docs: powerpc: Document nested KVM on POWER
  2023-06-05  6:48   ` Jordan Niethe
  (?)
@ 2023-06-07  5:37     ` Gautam Menghani
  -1 siblings, 0 replies; 57+ messages in thread
From: Gautam Menghani @ 2023-06-07  5:37 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: linuxppc-dev, mikey, kautuk.consul.1980, kvm, npiggin, kvm-ppc,
	sbhat, vaibhav

On Mon, Jun 05, 2023 at 04:48:48PM +1000, Jordan Niethe wrote:
> From: Michael Neuling <mikey@neuling.org>

Hi,
There are some minor typos in the documentation pointed out below


> +H_GUEST_GET_STATE()
> +-------------------
> +
> +This is called to get state associated with an L2 (Guest-wide or vCPU specific).
> +This info is passed via the Guest State Buffer (GSB), a standard format as
> +explained later in this doc, necessary details below:
> +
> +This can set either L2 wide or vcpu specific information. Examples of

We are getting the info about vcpu here : s/set/get

> +H_GUEST_RUN_VCPU()
> +------------------
> +
> +This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
> +parameters. The vCPU run with the state set previously using

Minor nit : s/run/runs

> +H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
> +hcall.
> +
> +This hcall also has associated input and output GSBs. Unlike
> +H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
> +parameters to the hcall (This was done in the interest of
> +performance). The locations of these GSBs must be preregistered using
> +the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table
> +below).
> +
> 
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 0/6] KVM: PPC: Nested PAPR guests
  2023-06-05  6:48 ` Jordan Niethe
  (?)
@ 2023-06-07  5:53   ` Nicholas Piggin
  -1 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  5:53 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: kvm, kvm-ppc, mikey, paulus, kautuk.consul.1980, vaibhav, sbhat

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> There is existing support for nested guests on powernv hosts however the
> hcall interface this uses is not support by other PAPR hosts.

I kind of liked it being called nested-HV v1 and v2 APIs as short and
to the point, but I suppose that's ambiguous with version 2 of the v1
API, so papr is okay. What's the old API called in this scheme, then?
"Existing API" is not great after patches go upstream.

And, you've probably explained it pretty well but slightly more of
a background first up could be helpful. E.g.,

  A nested-HV API for PAPR has been developed based on the KVM-specific
  nested-HV API that is upstream in Linux/KVM and QEMU. The PAPR API
  had to break compatibility to accommodate implementation in other
  hypervisors and partitioning firmware.

And key overall differences

  The control flow and interrupt processing between L0, L1, and L2
  in the new PAPR API are conceptually unchanged. Where the old API
  is almost stateless, the PAPR API is stateful, with the L1 registering
  L2 virtual machines and vCPUs with the L0. Supervisor-privileged
  register switching duty is now the responsibility for the L0, which
  holds canonical L2 register state and handles all switching. This
  new register handling motivates the "getters and setters" wrappers
  ...

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state
  2023-06-05  6:48   ` Jordan Niethe
  (?)
@ 2023-06-07  7:51     ` Nicholas Piggin
  -1 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  7:51 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: mikey, kautuk.consul.1980, kvm, kvm-ppc, sbhat, vaibhav

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> There are already some getter and setter functions used for accessing
> vcpu register state, e.g. kvmppc_get_pc(). There are also more
> complicated examples that are generated by macros like
> kvmppc_get_sprg0() which are generated by the SHARED_SPRNG_WRAPPER()
> macro.
>
> In the new PAPR API for nested guest partitions the L1 is required to
> communicate with the L0 to modify and read nested guest state.
>
> Prepare to support this by replacing direct accesses to vcpu register
> state with wrapper functions. Follow the existing pattern of using
> macros to generate individual wrappers. These wrappers will
> be augmented for supporting PAPR nested guests later.
>
> Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s.h  |  68 +++++++-
>  arch/powerpc/include/asm/kvm_ppc.h     |  48 ++++--
>  arch/powerpc/kvm/book3s.c              |  22 +--
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |   4 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |   9 +-
>  arch/powerpc/kvm/book3s_64_vio.c       |   4 +-
>  arch/powerpc/kvm/book3s_hv.c           | 222 +++++++++++++------------
>  arch/powerpc/kvm/book3s_hv.h           |  59 +++++++
>  arch/powerpc/kvm/book3s_hv_builtin.c   |  10 +-
>  arch/powerpc/kvm/book3s_hv_p9_entry.c  |   4 +-
>  arch/powerpc/kvm/book3s_hv_ras.c       |   5 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c    |   8 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c   |   4 +-
>  arch/powerpc/kvm/book3s_xive.c         |   9 +-
>  arch/powerpc/kvm/powerpc.c             |   4 +-
>  15 files changed, 322 insertions(+), 158 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index bbf5e2c5fe09..4e91f54a3f9f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -392,6 +392,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.regs.nip;
>  }
>  
> +static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
> +{
> +	vcpu->arch.pid = val;
> +}
> +
> +static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pid;
> +}
> +
>  static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
>  static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
>  {
> @@ -403,10 +413,66 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.fault_dar;
>  }
>  
> +#define BOOK3S_WRAPPER_SET(reg, size)					\
> +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +									\
> +	vcpu->arch.reg = val;						\
> +}
> +
> +#define BOOK3S_WRAPPER_GET(reg, size)					\
> +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
> +{									\
> +	return vcpu->arch.reg;						\
> +}
> +
> +#define BOOK3S_WRAPPER(reg, size)					\
> +	BOOK3S_WRAPPER_SET(reg, size)					\
> +	BOOK3S_WRAPPER_GET(reg, size)					\
> +
> +BOOK3S_WRAPPER(tar, 64)
> +BOOK3S_WRAPPER(ebbhr, 64)
> +BOOK3S_WRAPPER(ebbrr, 64)
> +BOOK3S_WRAPPER(bescr, 64)
> +BOOK3S_WRAPPER(ic, 64)
> +BOOK3S_WRAPPER(vrsave, 64)
> +
> +
> +#define VCORE_WRAPPER_SET(reg, size)					\
> +static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +	vcpu->arch.vcore->reg = val;					\
> +}
> +
> +#define VCORE_WRAPPER_GET(reg, size)					\
> +static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
> +{									\
> +	return vcpu->arch.vcore->reg;					\
> +}
> +
> +#define VCORE_WRAPPER(reg, size)					\
> +	VCORE_WRAPPER_SET(reg, size)					\
> +	VCORE_WRAPPER_GET(reg, size)					\
> +
> +
> +VCORE_WRAPPER(vtb, 64)
> +VCORE_WRAPPER(tb_offset, 64)
> +VCORE_WRAPPER(lpcr, 64)

The general idea is fine, some of the names could use a bit of
improvement. What's a BOOK3S_WRAPPER for example, is it not a
VCPU_WRAPPER, or alternatively why isn't a VCORE_WRAPPER Book3S
as well?

> +
> +static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.dec_expires;
> +}
> +
> +static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
> +{
> +	vcpu->arch.dec_expires = val;
> +}
> +
>  /* Expiry time of vcpu DEC relative to host TB */
>  static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
> +	return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset_hv(vcpu);
>  }
>  
>  static inline bool is_kvmppc_resume_guest(int r)
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 79a9c0bb8bba..fbac353ac46b 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -936,7 +936,7 @@ static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
>  #define SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
>  static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
>  {									\
> -	mtspr(bookehv_spr, val);						\
> +	mtspr(bookehv_spr, val);					\
>  }									\
>  
>  #define SHARED_WRAPPER_GET(reg, size)					\

Stray hunk I think.

> @@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
>  	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
>  }									\
>  
> +#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
> +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
> +{									\
> +	if (kvmppc_shared_big_endian(vcpu))				\
> +	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
> +	else								\
> +	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
> +}									\
> +
> +#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
> +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +	if (kvmppc_shared_big_endian(vcpu))				\
> +	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
> +	else								\
> +	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
> +}									\
> +
>  #define SHARED_WRAPPER(reg, size)					\
>  	SHARED_WRAPPER_GET(reg, size)					\
>  	SHARED_WRAPPER_SET(reg, size)					\
>  
> +#define SHARED_CACHE_WRAPPER(reg, size)					\
> +	SHARED_CACHE_WRAPPER_GET(reg, size)				\
> +	SHARED_CACHE_WRAPPER_SET(reg, size)				\

SHARED_CACHE_WRAPPER that does the same thing as SHARED_WRAPPER.

I know some of the names are a bit crufty but it's probably a good time
to rethink them a bit.

KVMPPC_VCPU_SHARED_REG_ACCESSOR or something like that. A few
more keystrokes could help immensely.

> diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> index 34f1db212824..34bc0a8a1288 100644
> --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> @@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
>  	u32 pid;
>  
>  	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> -	pid = vcpu->arch.pid;
> +	pid = kvmppc_get_pid(vcpu);
>  
>  	/*
>  	 * Prior memory accesses to host PID Q3 must be completed before we

Could add some accessors for get_lpid / get_guest_id which check for the
correct KVM mode maybe.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state
@ 2023-06-07  7:51     ` Nicholas Piggin
  0 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  7:51 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: kvm, kvm-ppc, mikey, paulus, kautuk.consul.1980, vaibhav, sbhat

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> There are already some getter and setter functions used for accessing
> vcpu register state, e.g. kvmppc_get_pc(). There are also more
> complicated examples that are generated by macros like
> kvmppc_get_sprg0() which are generated by the SHARED_SPRNG_WRAPPER()
> macro.
>
> In the new PAPR API for nested guest partitions the L1 is required to
> communicate with the L0 to modify and read nested guest state.
>
> Prepare to support this by replacing direct accesses to vcpu register
> state with wrapper functions. Follow the existing pattern of using
> macros to generate individual wrappers. These wrappers will
> be augmented for supporting PAPR nested guests later.
>
> Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s.h  |  68 +++++++-
>  arch/powerpc/include/asm/kvm_ppc.h     |  48 ++++--
>  arch/powerpc/kvm/book3s.c              |  22 +--
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |   4 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |   9 +-
>  arch/powerpc/kvm/book3s_64_vio.c       |   4 +-
>  arch/powerpc/kvm/book3s_hv.c           | 222 +++++++++++++------------
>  arch/powerpc/kvm/book3s_hv.h           |  59 +++++++
>  arch/powerpc/kvm/book3s_hv_builtin.c   |  10 +-
>  arch/powerpc/kvm/book3s_hv_p9_entry.c  |   4 +-
>  arch/powerpc/kvm/book3s_hv_ras.c       |   5 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c    |   8 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c   |   4 +-
>  arch/powerpc/kvm/book3s_xive.c         |   9 +-
>  arch/powerpc/kvm/powerpc.c             |   4 +-
>  15 files changed, 322 insertions(+), 158 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index bbf5e2c5fe09..4e91f54a3f9f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -392,6 +392,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.regs.nip;
>  }
>  
> +static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
> +{
> +	vcpu->arch.pid = val;
> +}
> +
> +static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pid;
> +}
> +
>  static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
>  static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
>  {
> @@ -403,10 +413,66 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.fault_dar;
>  }
>  
> +#define BOOK3S_WRAPPER_SET(reg, size)					\
> +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +									\
> +	vcpu->arch.reg = val;						\
> +}
> +
> +#define BOOK3S_WRAPPER_GET(reg, size)					\
> +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
> +{									\
> +	return vcpu->arch.reg;						\
> +}
> +
> +#define BOOK3S_WRAPPER(reg, size)					\
> +	BOOK3S_WRAPPER_SET(reg, size)					\
> +	BOOK3S_WRAPPER_GET(reg, size)					\
> +
> +BOOK3S_WRAPPER(tar, 64)
> +BOOK3S_WRAPPER(ebbhr, 64)
> +BOOK3S_WRAPPER(ebbrr, 64)
> +BOOK3S_WRAPPER(bescr, 64)
> +BOOK3S_WRAPPER(ic, 64)
> +BOOK3S_WRAPPER(vrsave, 64)
> +
> +
> +#define VCORE_WRAPPER_SET(reg, size)					\
> +static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +	vcpu->arch.vcore->reg = val;					\
> +}
> +
> +#define VCORE_WRAPPER_GET(reg, size)					\
> +static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
> +{									\
> +	return vcpu->arch.vcore->reg;					\
> +}
> +
> +#define VCORE_WRAPPER(reg, size)					\
> +	VCORE_WRAPPER_SET(reg, size)					\
> +	VCORE_WRAPPER_GET(reg, size)					\
> +
> +
> +VCORE_WRAPPER(vtb, 64)
> +VCORE_WRAPPER(tb_offset, 64)
> +VCORE_WRAPPER(lpcr, 64)

The general idea is fine, some of the names could use a bit of
improvement. What's a BOOK3S_WRAPPER for example, is it not a
VCPU_WRAPPER, or alternatively why isn't a VCORE_WRAPPER Book3S
as well?

> +
> +static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.dec_expires;
> +}
> +
> +static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
> +{
> +	vcpu->arch.dec_expires = val;
> +}
> +
>  /* Expiry time of vcpu DEC relative to host TB */
>  static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
> +	return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset_hv(vcpu);
>  }
>  
>  static inline bool is_kvmppc_resume_guest(int r)
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 79a9c0bb8bba..fbac353ac46b 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -936,7 +936,7 @@ static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
>  #define SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
>  static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
>  {									\
> -	mtspr(bookehv_spr, val);						\
> +	mtspr(bookehv_spr, val);					\
>  }									\
>  
>  #define SHARED_WRAPPER_GET(reg, size)					\

Stray hunk I think.

> @@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
>  	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
>  }									\
>  
> +#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
> +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
> +{									\
> +	if (kvmppc_shared_big_endian(vcpu))				\
> +	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
> +	else								\
> +	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
> +}									\
> +
> +#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
> +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +	if (kvmppc_shared_big_endian(vcpu))				\
> +	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
> +	else								\
> +	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
> +}									\
> +
>  #define SHARED_WRAPPER(reg, size)					\
>  	SHARED_WRAPPER_GET(reg, size)					\
>  	SHARED_WRAPPER_SET(reg, size)					\
>  
> +#define SHARED_CACHE_WRAPPER(reg, size)					\
> +	SHARED_CACHE_WRAPPER_GET(reg, size)				\
> +	SHARED_CACHE_WRAPPER_SET(reg, size)				\

SHARED_CACHE_WRAPPER that does the same thing as SHARED_WRAPPER.

I know some of the names are a but crufty but it's probably a good time
to rethink them a bit.

KVMPPC_VCPU_SHARED_REG_ACCESSOR or something like that. A few
more keystrokes could help imensely.

> diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> index 34f1db212824..34bc0a8a1288 100644
> --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> @@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
>  	u32 pid;
>  
>  	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> -	pid = vcpu->arch.pid;
> +	pid = kvmppc_get_pid(vcpu);
>  
>  	/*
>  	 * Prior memory accesses to host PID Q3 must be completed before we

Could add some accessors for get_lpid / get_guest_id which check for the
correct KVM mode maybe.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state
@ 2023-06-07  7:51     ` Nicholas Piggin
  0 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  7:51 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: mikey, kautuk.consul.1980, kvm, kvm-ppc, sbhat, vaibhav

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> There are already some getter and setter functions used for accessing
> vcpu register state, e.g. kvmppc_get_pc(). There are also more
> complicated examples that are generated by macros like
> kvmppc_get_sprg0() which are generated by the SHARED_SPRNG_WRAPPER()
> macro.
>
> In the new PAPR API for nested guest partitions the L1 is required to
> communicate with the L0 to modify and read nested guest state.
>
> Prepare to support this by replacing direct accesses to vcpu register
> state with wrapper functions. Follow the existing pattern of using
> macros to generate individual wrappers. These wrappers will
> be augmented for supporting PAPR nested guests later.
>
> Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s.h  |  68 +++++++-
>  arch/powerpc/include/asm/kvm_ppc.h     |  48 ++++--
>  arch/powerpc/kvm/book3s.c              |  22 +--
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |   4 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |   9 +-
>  arch/powerpc/kvm/book3s_64_vio.c       |   4 +-
>  arch/powerpc/kvm/book3s_hv.c           | 222 +++++++++++++------------
>  arch/powerpc/kvm/book3s_hv.h           |  59 +++++++
>  arch/powerpc/kvm/book3s_hv_builtin.c   |  10 +-
>  arch/powerpc/kvm/book3s_hv_p9_entry.c  |   4 +-
>  arch/powerpc/kvm/book3s_hv_ras.c       |   5 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c    |   8 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c   |   4 +-
>  arch/powerpc/kvm/book3s_xive.c         |   9 +-
>  arch/powerpc/kvm/powerpc.c             |   4 +-
>  15 files changed, 322 insertions(+), 158 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index bbf5e2c5fe09..4e91f54a3f9f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -392,6 +392,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.regs.nip;
>  }
>  
> +static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 val)
> +{
> +	vcpu->arch.pid = val;
> +}
> +
> +static inline u32 kvmppc_get_pid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pid;
> +}
> +
>  static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
>  static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
>  {
> @@ -403,10 +413,66 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.fault_dar;
>  }
>  
> +#define BOOK3S_WRAPPER_SET(reg, size)					\
> +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +									\
> +	vcpu->arch.reg = val;						\
> +}
> +
> +#define BOOK3S_WRAPPER_GET(reg, size)					\
> +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
> +{									\
> +	return vcpu->arch.reg;						\
> +}
> +
> +#define BOOK3S_WRAPPER(reg, size)					\
> +	BOOK3S_WRAPPER_SET(reg, size)					\
> +	BOOK3S_WRAPPER_GET(reg, size)					\
> +
> +BOOK3S_WRAPPER(tar, 64)
> +BOOK3S_WRAPPER(ebbhr, 64)
> +BOOK3S_WRAPPER(ebbrr, 64)
> +BOOK3S_WRAPPER(bescr, 64)
> +BOOK3S_WRAPPER(ic, 64)
> +BOOK3S_WRAPPER(vrsave, 64)
> +
> +
> +#define VCORE_WRAPPER_SET(reg, size)					\
> +static inline void kvmppc_set_##reg ##_hv(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +	vcpu->arch.vcore->reg = val;					\
> +}
> +
> +#define VCORE_WRAPPER_GET(reg, size)					\
> +static inline u##size kvmppc_get_##reg ##_hv(struct kvm_vcpu *vcpu)	\
> +{									\
> +	return vcpu->arch.vcore->reg;					\
> +}
> +
> +#define VCORE_WRAPPER(reg, size)					\
> +	VCORE_WRAPPER_SET(reg, size)					\
> +	VCORE_WRAPPER_GET(reg, size)					\
> +
> +
> +VCORE_WRAPPER(vtb, 64)
> +VCORE_WRAPPER(tb_offset, 64)
> +VCORE_WRAPPER(lpcr, 64)

The general idea is fine, some of the names could use a bit of
improvement. What's a BOOK3S_WRAPPER for example, is it not a
VCPU_WRAPPER, or alternatively why isn't a VCORE_WRAPPER Book3S
as well?

> +
> +static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.dec_expires;
> +}
> +
> +static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
> +{
> +	vcpu->arch.dec_expires = val;
> +}
> +
>  /* Expiry time of vcpu DEC relative to host TB */
>  static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
> +	return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset_hv(vcpu);
>  }
>  
>  static inline bool is_kvmppc_resume_guest(int r)
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 79a9c0bb8bba..fbac353ac46b 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -936,7 +936,7 @@ static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
>  #define SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
>  static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
>  {									\
> -	mtspr(bookehv_spr, val);						\
> +	mtspr(bookehv_spr, val);					\
>  }									\
>  
>  #define SHARED_WRAPPER_GET(reg, size)					\

Stray hunk I think.

> @@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
>  	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
>  }									\
>  
> +#define SHARED_CACHE_WRAPPER_GET(reg, size)				\
> +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
> +{									\
> +	if (kvmppc_shared_big_endian(vcpu))				\
> +	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
> +	else								\
> +	       return le##size##_to_cpu(vcpu->arch.shared->reg);	\
> +}									\
> +
> +#define SHARED_CACHE_WRAPPER_SET(reg, size)				\
> +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
> +{									\
> +	if (kvmppc_shared_big_endian(vcpu))				\
> +	       vcpu->arch.shared->reg = cpu_to_be##size(val);		\
> +	else								\
> +	       vcpu->arch.shared->reg = cpu_to_le##size(val);		\
> +}									\
> +
>  #define SHARED_WRAPPER(reg, size)					\
>  	SHARED_WRAPPER_GET(reg, size)					\
>  	SHARED_WRAPPER_SET(reg, size)					\
>  
> +#define SHARED_CACHE_WRAPPER(reg, size)					\
> +	SHARED_CACHE_WRAPPER_GET(reg, size)				\
> +	SHARED_CACHE_WRAPPER_SET(reg, size)				\

SHARED_CACHE_WRAPPER that does the same thing as SHARED_WRAPPER.

I know some of the names are a but crufty but it's probably a good time
to rethink them a bit.

KVMPPC_VCPU_SHARED_REG_ACCESSOR or something like that. A few
more keystrokes could help imensely.

> diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> index 34f1db212824..34bc0a8a1288 100644
> --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> @@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
>  	u32 pid;
>  
>  	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> -	pid = vcpu->arch.pid;
> +	pid = kvmppc_get_pid(vcpu);
>  
>  	/*
>  	 * Prior memory accesses to host PID Q3 must be completed before we

Could add some accessors for get_lpid / get_guest_id which check for the
correct KVM mode maybe.
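
Something like this, say (untested; it just lifts the check already
open-coded in this hunk):

  static inline unsigned int kvmppc_get_lpid(struct kvm_vcpu *vcpu)
  {
  	struct kvm_nested_guest *nested = vcpu->arch.nested;

  	if (nested)
  		return nested->shadow_lpid;
  	return vcpu->kvm->arch.lpid;
  }

so each caller doesn't have to open-code the nested test itself.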

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 2/6] KVM: PPC: Add fpr getters and setters
  2023-06-05  6:48   ` Jordan Niethe
@ 2023-06-07  7:55     ` Nicholas Piggin
  -1 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  7:55 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: kvm, kvm-ppc, mikey, paulus, kautuk.consul.1980, vaibhav, sbhat

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> Add wrappers for fpr registers to prepare for supporting PAPR nested
> guests.
>
> Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s.h | 31 +++++++++++++++++++++++++++
>  arch/powerpc/include/asm/kvm_booke.h  | 10 +++++++++
>  arch/powerpc/kvm/book3s.c             | 16 +++++++-------
>  arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
>  arch/powerpc/kvm/powerpc.c            | 22 +++++++++----------
>  5 files changed, 61 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 4e91f54a3f9f..a632e79639f0 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -413,6 +413,37 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.fault_dar;
>  }
>  
> +static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
> +{
> +	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
> +}
> +
> +static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
> +{
> +	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
> +}
> +
> +static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.fp.fpscr;
> +}
> +
> +static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
> +{
> +	vcpu->arch.fp.fpscr = val;
> +}
> +
> +
> +static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
> +{
> +	return vcpu->arch.fp.fpr[i][j];
> +}
> +
> +static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j, u64 val)
> +{
> +	vcpu->arch.fp.fpr[i][j] = val;
> +}
> +
>  #define BOOK3S_WRAPPER_SET(reg, size)					\
>  static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
>  {									\
> diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
> index 0c3401b2e19e..7c3291aa8922 100644
> --- a/arch/powerpc/include/asm/kvm_booke.h
> +++ b/arch/powerpc/include/asm/kvm_booke.h
> @@ -89,6 +89,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.regs.nip;
>  }
>  
> +static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
> +{
> +	vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
> +}
> +
> +static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
> +{
> +	return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
> +}
> +
>  #ifdef CONFIG_BOOKE
>  static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 2fe31b518886..6cd20ab9e94e 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -636,17 +636,17 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
>  			break;
>  		case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
>  			i = id - KVM_REG_PPC_FPR0;
> -			*val = get_reg_val(id, VCPU_FPR(vcpu, i));
> +			*val = get_reg_val(id, kvmppc_get_fpr(vcpu, i));
>  			break;
>  		case KVM_REG_PPC_FPSCR:
> -			*val = get_reg_val(id, vcpu->arch.fp.fpscr);
> +			*val = get_reg_val(id, kvmppc_get_fpscr(vcpu));
>  			break;
>  #ifdef CONFIG_VSX
>  		case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
>  			if (cpu_has_feature(CPU_FTR_VSX)) {
>  				i = id - KVM_REG_PPC_VSR0;
> -				val->vsxval[0] = vcpu->arch.fp.fpr[i][0];
> -				val->vsxval[1] = vcpu->arch.fp.fpr[i][1];
> +				val->vsxval[0] = kvmppc_get_vsx_fpr(vcpu, i, 0);
> +				val->vsxval[1] = kvmppc_get_vsx_fpr(vcpu, i, 1);
>  			} else {
>  				r = -ENXIO;
>  			}
> @@ -724,7 +724,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
>  			break;
>  		case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
>  			i = id - KVM_REG_PPC_FPR0;
> -			VCPU_FPR(vcpu, i) = set_reg_val(id, *val);
> +			kvmppc_set_fpr(vcpu, i, set_reg_val(id, *val));
>  			break;
>  		case KVM_REG_PPC_FPSCR:
>  			vcpu->arch.fp.fpscr = set_reg_val(id, *val);
> @@ -733,8 +733,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
>  		case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
>  			if (cpu_has_feature(CPU_FTR_VSX)) {
>  				i = id - KVM_REG_PPC_VSR0;
> -				vcpu->arch.fp.fpr[i][0] = val->vsxval[0];
> -				vcpu->arch.fp.fpr[i][1] = val->vsxval[1];
> +				kvmppc_set_vsx_fpr(vcpu, i, 0, val->vsxval[0]);
> +				kvmppc_set_vsx_fpr(vcpu, i, 1, val->vsxval[1]);
>  			} else {
>  				r = -ENXIO;
>  			}
> @@ -765,7 +765,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
>  			break;
>  #endif /* CONFIG_KVM_XIVE */
>  		case KVM_REG_PPC_FSCR:
> -			vcpu->arch.fscr = set_reg_val(id, *val);
> +			kvmppc_set_fpscr(vcpu, set_reg_val(id, *val));
>  			break;
>  		case KVM_REG_PPC_TAR:
>  			kvmppc_set_tar(vcpu, set_reg_val(id, *val));
> diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
> index 059c08ae0340..e6e66c3792f8 100644
> --- a/arch/powerpc/kvm/emulate_loadstore.c
> +++ b/arch/powerpc/kvm/emulate_loadstore.c
> @@ -250,7 +250,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
>  				vcpu->arch.mmio_sp64_extend = 1;
>  
>  			emulated = kvmppc_handle_store(vcpu,
> -					VCPU_FPR(vcpu, op.reg), size, 1);
> +					kvmppc_get_fpr(vcpu, op.reg), size, 1);
>  
>  			if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
>  				kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index ca9793c3d437..7f913e68342a 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -938,7 +938,7 @@ static inline void kvmppc_set_vsr_dword(struct kvm_vcpu *vcpu,
>  		val.vsxval[offset] = gpr;
>  		VCPU_VSX_VR(vcpu, index - 32) = val.vval;
>  	} else {
> -		VCPU_VSX_FPR(vcpu, index, offset) = gpr;
> +		kvmppc_set_vsx_fpr(vcpu, index, offset, gpr);
>  	}
>  }
>  

Is there a particular reason some reg sets are broken into their own
patches? Looking at this hunk you'd think the VR one got missed, but
it's in its own patch.

Not really a big deal, but I wouldn't mind them all in one patch. Or at
least the FP/VR/VSR in one since they're quite regular and similar.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers
  2023-06-05  6:48   ` Jordan Niethe
@ 2023-06-07  8:26     ` Nicholas Piggin
  -1 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  8:26 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: kvm, kvm-ppc, mikey, paulus, kautuk.consul.1980, vaibhav, sbhat

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> The new PAPR nested guest API introduces the concept of a Guest State
> Buffer for communication about L2 guests between L1 and L0 hosts.
>
> In the new API, the L0 manages the L2 on behalf of the L1. This means
> that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
> table...), it must request the L0 perform the modification. If the
> nested host needs to read L2 state likewise this request must
> go through the L0.
>
> The Guest State Buffer is a Type-Length-Value style data format defined
> in the PAPR which assigns all relevant partition state a unique
> identity. Unlike a typical TLV format the length is redundant as the
> length of each identity is fixed but is included for checking
> correctness.
>
> A guest state buffer consists of an element count followed by a stream
> of elements, where elements are composed of an ID number, data length,
> then the data:
>
>   Header:
>
>    <---4 bytes--->
>   +----------------+-----
>   | Element Count  | Elements...
>   +----------------+-----
>
>   Element:
>
>    <----2 bytes---> <-2 bytes-> <-Length bytes->
>   +----------------+-----------+----------------+
>   | Guest State ID |  Length   |      Data      |
>   +----------------+-----------+----------------+
>
> Guest State IDs have other attributes defined in the PAPR such as
> whether they are per thread or per guest, or read-only.
>
> Introduce a library for using guest state buffers. This includes support
> for actions such as creating buffers, adding elements to buffers,
> reading the value of elements and parsing buffers. This will be used
> later by the PAPR nested guest support.

This is a tour de force in one of these things, so I hate to be
the "me smash with club" guy, but what if you allocated buffers
with enough room for all the state (or 99% of cases, in which
case an overflow would make an hcall)?

What's actually a fast-path that we don't get from the interrupt
return buffer? Getting and setting a few regs for MMIO emulation?
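
I.e. something in this direction (a sketch; the count and length
constants are invented here):

  size_t max;

  /* worst case: every identity present, each with a 4-byte header */
  max = sizeof(struct gs_header) +
        GSID_COUNT * (GSE_HEADER_LEN + GSE_MAX_DATA_LEN);
  gsb = gsb_new(max, guest_id, vcpu_id, GFP_KERNEL);

and only take the slow path (hcall or realloc) when an element really
doesn't fit.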


> Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
> ---
> v2:
>   - Add missing #ifdef CONFIG_VSXs
>   - Move files from lib/ to kvm/
>   - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
>   - Use kunit for guest state buffer tests
>   - Add configuration option for the tests
>   - Use macros for contiguous id ranges like GPRs
>   - Add some missing EXPORTs to functions
>   - HEIR element is a double word not a word
> ---
>  arch/powerpc/Kconfig.debug                    |  12 +
>  arch/powerpc/include/asm/guest-state-buffer.h | 901 ++++++++++++++++++
>  arch/powerpc/include/asm/kvm_book3s.h         |   2 +
>  arch/powerpc/kvm/Makefile                     |   3 +
>  arch/powerpc/kvm/guest-state-buffer.c         | 563 +++++++++++
>  arch/powerpc/kvm/test-guest-state-buffer.c    | 321 +++++++
>  6 files changed, 1802 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
>  create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
>  create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c
>
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 6aaf8dc60610..ed830a714720 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
>  	bool "Run self-tests of the MSI bitmap code"
>  	depends on DEBUG_KERNEL
>  
> +config GUEST_STATE_BUFFER_TEST
> +	def_tristate n
> +	prompt "Enable Guest State Buffer unit tests"
> +	depends on KUNIT
> +	depends on KVM_BOOK3S_HV_POSSIBLE
> +	default KUNIT_ALL_TESTS
> +	help
> +	  The Guest State Buffer is a data format specified in the PAPR.
> +	  It is used by hcalls to communicate the state of L2 guests between
> +	  the L1 and L0 hypervisors. Enable unit tests for the library
> +	  used to create and use guest state buffers.
> +
>  config PPC_IRQ_SOFT_MASK_DEBUG
>  	bool "Include extra checks for powerpc irq soft masking"
>  	depends on PPC64
> diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
> new file mode 100644
> index 000000000000..65a840abf1bb
> --- /dev/null
> +++ b/arch/powerpc/include/asm/guest-state-buffer.h
> @@ -0,0 +1,901 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Interface based on include/net/netlink.h
> + */
> +#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
> +#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
> +
> +#include <linux/gfp.h>
> +#include <linux/bitmap.h>
> +#include <asm/plpar_wrappers.h>
> +
> +/**************************************************************************
> + * Guest State Buffer Constants
> + **************************************************************************/
> +#define GSID_BLANK			0x0000

The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
you're calling the API that.
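
E.g. (purely illustrative):

  #define KVM_PAPR_GSID_LOGICAL_PVR	0x0003
  #define KVM_PAPR_GSID_TB_OFFSET	0x0004

is less likely to clash or confuse when it turns up outside this file.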

> +
> +#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
> +#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
> +#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
> +#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
> +#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
> +#define GSID_PROCESS_TABLE		0x0006 /* Process Table */

> +
> +#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
> +#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
> +#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
> +
> +#define GSID_GPR(x)			(0x1000 + (x))
> +#define GSID_HDEC_EXPIRY_TB		0x1020
> +#define GSID_NIA			0x1021
> +#define GSID_MSR			0x1022
> +#define GSID_LR				0x1023
> +#define GSID_XER			0x1024
> +#define GSID_CTR			0x1025
> +#define GSID_CFAR			0x1026
> +#define GSID_SRR0			0x1027
> +#define GSID_SRR1			0x1028
> +#define GSID_DAR			0x1029

It's a shame you have to rip up all your wrapper functions now to
shoehorn these in.

If you included names analogous to the reg field names in the kvm
structures, the wrappers could do macro expansions that get them.

#define __GSID_WRAPPER_dar		GSID_DAR

Or similar.
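
Then the accessor generators could look the identity up themselves,
roughly (sketch only; the reload helper is a made-up name):

  #define GSID_FOR_FIELD(field)	__GSID_WRAPPER_##field

  #define BOOK3S_WRAPPER_GET(reg, size)				\
  static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)	\
  {								\
  	kvmhv_reload_from_l0(vcpu, GSID_FOR_FIELD(reg));	\
  	return vcpu->arch.reg;					\
  }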

And since of course you have to explicitly enumerate all these, I
wouldn't mind defining the types and lengths up-front rather than
down in the type function. You'd like to be able to go through the
spec and eyeball type, number, size.
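
For example, one table up-front (layout invented here):

  /*  identity,          size, attributes	*/
  GSE(GSID_LOGICAL_PVR,  0x04, GS_GUEST_WIDE)
  GSE(GSID_TB_OFFSET,    0x08, GS_GUEST_WIDE)
  GSE(GSID_NIA,          0x08, GS_THREAD_WIDE)

with the size/type/mask helpers all generated from it.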

[snip]

> +/**
> + * gsb_paddress() - the physical address of buffer
> + * @gsb: guest state buffer
> + *
> + * Returns the physical address of the buffer.
> + */
> +static inline u64 gsb_paddress(struct gs_buff *gsb)
> +{
> +	return __pa(gsb_header(gsb));
> +}

> +/**
> + * __gse_put_reg() - add a register type guest state element to a buffer
> + * @gsb: guest state buffer to add element to
> + * @iden: guest state ID
> + * @val: host endian value
> + *
> + * Adds a register type guest state element. Uses the guest state ID for
> + * determining the length of the guest element. If the guest state ID has
> + * bits that can not be set they will be cleared.
> + */
> +static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
> +{
> +	val &= gsid_mask(iden);
> +	if (gsid_size(iden) == sizeof(u64))
> +		return gse_put_u64(gsb, iden, val);
> +
> +	if (gsid_size(iden) == sizeof(u32)) {
> +		u32 tmp;
> +
> +		tmp = (u32)val;
> +		if (tmp != val)
> +			return -EINVAL;
> +
> +		return gse_put_u32(gsb, iden, tmp);
> +	}
> +	return -EINVAL;
> +}

There is a clever accessor that derives the length from the type, but
then you fall back to this.

> +/**
> + * gse_put - add a guest state element to a buffer
> + * @gsb: guest state buffer to add to
> + * @iden: guest state identity
> + * @v: generic value
> + */
> +#define gse_put(gsb, iden, v)					\
> +	(_Generic((v),						\
> +		  u64 : __gse_put_reg,				\
> +		  long unsigned int : __gse_put_reg,		\
> +		  u32 : __gse_put_reg,				\
> +		  struct gs_buff_info : gse_put_buff_info,	\
> +		  struct gs_proc_table : gse_put_proc_table,	\
> +		  struct gs_part_table : gse_put_part_table,	\
> +		  vector128 : gse_put_vector128)(gsb, iden, v))
> +
> +/**
> + * gse_get - return the data of a guest state element
> + * @gsb: guest state element to add to
> + * @v: generic value pointer to return in
> + */
> +#define gse_get(gse, v)						\
> +	(*v = (_Generic((v),					\
> +			u64 * : __gse_get_reg,			\
> +			unsigned long * : __gse_get_reg,	\
> +			u32 * : __gse_get_reg,			\
> +			vector128 * : gse_get_vector128)(gse)))

I don't see the benefit of this. The caller always knows the type,
doesn't it? It seems like the right function could be called directly.
It makes the calling convention a bit clunky too. I know there's
similar precedent for the uaccess functions, but I'm not sure I like
it for this.
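
I.e. just call it as (assuming typed getters like gse_get_u64() are
exported, or added where missing):

  u64 val;

  val = gse_get_u64(gse);

rather than gse_get(gse, &val).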

> +struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
> +			unsigned long vcpu_id, gfp_t flags)
> +{
> +	struct gs_buff *gsb;
> +
> +	gsb = kzalloc(sizeof(*gsb), flags);
> +	if (!gsb)
> +		return NULL;
> +
> +	size = roundup_pow_of_two(size);
> +	gsb->hdr = kzalloc(size, GFP_KERNEL);
> +	if (!gsb->hdr)
> +		goto free;
> +
> +	gsb->capacity = size;
> +	gsb->len = sizeof(struct gs_header);
> +	gsb->vcpu_id = vcpu_id;
> +	gsb->guest_id = guest_id;
> +
> +	gsb->hdr->nelems = cpu_to_be32(0);
> +
> +	return gsb;
> +
> +free:
> +	kfree(gsb);
> +	return NULL;
> +}
> +EXPORT_SYMBOL(gsb_new);

Should all be GPL exports.

Needs more namespace too, I reckon (not just exports but any kernel-wide
name this short and non-descriptive needs a kvmppc or kvm_papr or
something).
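
E.g. (names indicative only):

  struct kvmppc_gs_buff *kvmppc_gsb_new(size_t size, unsigned long guest_id,
  				        unsigned long vcpu_id, gfp_t flags);
  EXPORT_SYMBOL_GPL(kvmppc_gsb_new);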

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
  2023-06-05  6:48   ` Jordan Niethe
@ 2023-06-07  9:08     ` Nicholas Piggin
  -1 siblings, 0 replies; 57+ messages in thread
From: Nicholas Piggin @ 2023-06-07  9:08 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev
  Cc: mikey, kautuk.consul.1980, kvm, kvm-ppc, sbhat, vaibhav

On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> A series of hcalls have been added to the PAPR which allow a regular
> guest partition to create and manage guest partitions of its own. Add
> support to KVM to utilize these hcalls to enable running nested guests.
>
> Overview of the new hcall usage:
>
> - L1 and L0 negotiate capabilities with
>   H_GUEST_{G,S}ET_CAPABILITIES()
>
> - L1 requests the L0 create a L2 with
>   H_GUEST_CREATE() and receives a handle to use in future hcalls
>
> - L1 requests the L0 create a L2 vCPU with
>   H_GUEST_CREATE_VCPU()
>
> - L1 sets up the L2 using H_GUEST_SET and the
>   H_GUEST_VCPU_RUN input buffer
>
> - L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN()
>
> - L2 returns to L1 with an exit reason and L1 reads the
>   H_GUEST_VCPU_RUN output buffer populated by the L0
>
> - L1 handles the exit using H_GET_STATE if necessary
>
> - L1 reruns L2 vCPU with H_GUEST_VCPU_RUN
>
> - L1 frees the L2 in the L0 with H_GUEST_DELETE()
>
> Support for the new API is determined by trying
> H_GUEST_GET_CAPABILITIES. On a successful return, the new API will then
> be used.
>
> Use the vcpu register state setters for tracking modified guest state
> elements and copy the thread wide values into the H_GUEST_VCPU_RUN input
> buffer immediately before running a L2. The guest wide
> elements can not be added to the input buffer so send them with a
> separate H_GUEST_SET call if necessary.
>
> Make the vcpu register getter load the corresponding value from the real
> host with H_GUEST_GET. To avoid unnecessarily calling H_GUEST_GET, track
> which values have already been loaded between H_GUEST_VCPU_RUN calls. If
> an element is present in the H_GUEST_VCPU_RUN output buffer it also does
> not need to be loaded again.
>
> There is existing support for running nested guests on KVM
> with powernv. However the interface used for this is not supported by
> other PAPR hosts. This existing API is still supported.
>
> Signed-off-by: Jordan Niethe <jpn@linux.vnet.ibm.com>
> ---
> v2:
>   - Declare op structs as static
>   - Use expressions in switch case with local variables
>   - Do not use the PVR for the LOGICAL PVR ID
>   - Handle emul_inst as now a double word
>   - Use new GPR(), etc macros
>   - Determine PAPR nested capabilities from cpu features
> ---
>  arch/powerpc/include/asm/guest-state-buffer.h | 105 +-
>  arch/powerpc/include/asm/hvcall.h             |  30 +
>  arch/powerpc/include/asm/kvm_book3s.h         | 122 ++-
>  arch/powerpc/include/asm/kvm_book3s_64.h      |   6 +
>  arch/powerpc/include/asm/kvm_host.h           |  21 +
>  arch/powerpc/include/asm/kvm_ppc.h            |  64 +-
>  arch/powerpc/include/asm/plpar_wrappers.h     | 198 ++++
>  arch/powerpc/kvm/Makefile                     |   1 +
>  arch/powerpc/kvm/book3s_hv.c                  | 126 ++-
>  arch/powerpc/kvm/book3s_hv.h                  |  74 +-
>  arch/powerpc/kvm/book3s_hv_nested.c           |  38 +-
>  arch/powerpc/kvm/book3s_hv_papr.c             | 940 ++++++++++++++++++
>  arch/powerpc/kvm/emulate_loadstore.c          |   4 +-
>  arch/powerpc/kvm/guest-state-buffer.c         |  49 +
>  14 files changed, 1684 insertions(+), 94 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_hv_papr.c
>
> diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
> index 65a840abf1bb..116126edd8e2 100644
> --- a/arch/powerpc/include/asm/guest-state-buffer.h
> +++ b/arch/powerpc/include/asm/guest-state-buffer.h
> @@ -5,6 +5,7 @@
>  #ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
>  #define _ASM_POWERPC_GUEST_STATE_BUFFER_H
>  
> +#include "asm/hvcall.h"
>  #include <linux/gfp.h>
>  #include <linux/bitmap.h>
>  #include <asm/plpar_wrappers.h>
> @@ -14,16 +15,16 @@
>   **************************************************************************/
>  #define GSID_BLANK			0x0000
>  
> -#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
> -#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
> -#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
> -#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
> -#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
> -#define GSID_PROCESS_TABLE		0x0006 /* Process Table */
> +#define GSID_HOST_STATE_SIZE		0x0001
> +#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002
> +#define GSID_LOGICAL_PVR		0x0003
> +#define GSID_TB_OFFSET			0x0004
> +#define GSID_PARTITION_TABLE		0x0005
> +#define GSID_PROCESS_TABLE		0x0006

You lost your comments.

> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 0ca2d8b37b42..c5c57552b447 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -12,6 +12,7 @@
>  #include <linux/types.h>
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_book3s_asm.h>
> +#include <asm/guest-state-buffer.h>
>  
>  struct kvmppc_bat {
>  	u64 raw;
> @@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
>  
>  void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
>  
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +
> +extern bool __kvmhv_on_papr;
> +
> +static inline bool kvmhv_on_papr(void)
> +{
> +	return __kvmhv_on_papr;
> +}

It's a nitpick, but kvmhv_on_pseries() is named that way because we're
running KVM-HV on a pseries guest kernel, which is itself a papr guest
kernel. So this name kind of doesn't make sense if you read it the
same way.

kvmhv_nested_using_papr() or something like that might read a bit
better.

This could be a static key too.
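
E.g. (untested sketch):

  static DEFINE_STATIC_KEY_FALSE(kvmhv_nested_papr_key);

  static inline bool kvmhv_nested_using_papr(void)
  {
  	return static_branch_likely(&kvmhv_nested_papr_key);
  }

flipped once with static_branch_enable() when the capabilities
negotiation succeeds.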

> @@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
>  	ulong dscr;
>  	ulong amr;
>  	ulong uamor;
> +	ulong amor;
>  	ulong iamr;
>  	u32 ctrl;
>  	u32 dabrx;

This belongs somewhere else.

> @@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
>  	u64 nested_hfscr;	/* HFSCR that the L1 requested for the nested guest */
>  	u32 nested_vcpu_id;
>  	gpa_t nested_io_gpr;
> +	/* For nested APIv2 guests */
> +	struct kvmhv_papr_host papr_host;
>  #endif

This is not exactly a papr host. We might have to come up with a better
name, especially if we implement an L0; things could get confusing.

> @@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
>  	return rc;
>  }
>  
> +static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
> +{
> +	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> +	unsigned long token;
> +	long rc;
> +
> +	token = -1UL;
> +	while (true) {
> +		rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
> +		if (rc == H_SUCCESS) {
> +			*guest_id = retbuf[0];
> +			break;
> +		}
> +
> +		if (rc == H_BUSY) {
> +			token = retbuf[0];
> +			cpu_relax();
> +			continue;
> +		}
> +
> +		if (H_IS_LONG_BUSY(rc)) {
> +			token = retbuf[0];
> +			mdelay(get_longbusy_msecs(rc));

Do all of these things need a non-sleeping delay? Can we sleep instead?
If not, we might have to think about going back to the caller so it
can retry.

get/set state might be a bit inconvenient to make sleepable, although I
don't expect those should take as long as guest and vcpu create/delete,
so at least the create/delete ones would be good to sleep in if they're
called while preemptible.
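
For the sleepable paths, a minimal sketch of what I mean (assuming the
wrapper is only ever called from preemptible context; msleep() is from
<linux/delay.h>):

	if (H_IS_LONG_BUSY(rc)) {
		token = retbuf[0];
		/* sleep instead of burning CPU in mdelay() */
		msleep(get_longbusy_msecs(rc));
		continue;
	}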

> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 521d84621422..f22ee582e209 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
>  	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
>  }
>  
> +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
> +{
> +	vcpu->arch.pvr = pvr;
> +}

Didn't you lose this in a previous patch? I thought it must have moved
to a header but it reappears.

> +
>  /* Dummy value used in computing PCR value below */
>  #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
>  
> @@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  			return RESUME_HOST;
>  		break;
>  #endif
> -	case H_RANDOM:
> +	case H_RANDOM: {
>  		unsigned long rand;
>  
>  		if (!arch_get_random_seed_longs(&rand, 1))
>  			ret = H_HARDWARE;
>  		kvmppc_set_gpr(vcpu, 4, rand);
>  		break;
> +	}
>  	case H_RPT_INVALIDATE:
>  		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
>  					      kvmppc_get_gpr(vcpu, 5),

Compile fix for a previous patch.

> @@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
>  	vcpu->arch.shared_big_endian = false;
>  #endif
>  #endif
> -	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
>  
> +	if (kvmhv_on_papr()) {
> +		err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
> +		if (err < 0)
> +			return err;
> +	}
> +
> +	kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
>  	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>  		kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
>  		kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
>  	}
>  
>  	kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
> +	kvmppc_set_amor_hv(vcpu, ~0);

This AMOR thing should go somewhere else. I'm not actually sure why it
needs to be added to the vcpu since it's always ~0. Maybe just put that
in a #define somewhere and use that.
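
i.e. something like this (hypothetical name):

	/* AMOR is always set to allow everything */
	#define KVMPPC_AMOR_MASK	(~0UL)

and then use KVMPPC_AMOR_MASK directly where the register is written,
instead of carrying a copy around in the vcpu.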

> @@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
> +{
> +	struct kvmhv_papr_host *ph;
> +	unsigned long msr, i;
> +	int trap;
> +	long rc;
> +
> +	ph = &vcpu->arch.papr_host;
> +
> +	msr = mfmsr();
> +	kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
> +	if (lazy_irq_pending())
> +		return 0;
> +
> +	kvmhv_papr_flush_vcpu(vcpu, time_limit);
> +
> +	accumulate_time(vcpu, &vcpu->arch.in_guest);
> +	rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
> +				  &trap, &i);
> +
> +	if (rc != H_SUCCESS) {
> +		pr_err("KVM Guest Run VCPU hcall failed\n");
> +		if (rc == H_INVALID_ELEMENT_ID)
> +			pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
> +		else if (rc == H_INVALID_ELEMENT_SIZE)
> +			pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
> +		else if (rc == H_INVALID_ELEMENT_VALUE)
> +			pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
> +		return 0;
> +	}

This needs the proper error handling. Were you going to wait until I
sent that up for existing code?

> @@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>   */
>  void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
>  {
> +	struct kvm_vcpu *vcpu;
>  	long int i;
>  	u32 cores_done = 0;
>  
> @@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
>  		if (++cores_done >= kvm->arch.online_vcores)
>  			break;
>  	}
> +
> +	if (kvmhv_on_papr()) {
> +		kvm_for_each_vcpu(i, vcpu, kvm) {
> +			kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
> +		}
> +	}
>  }

The vcpu definition could go in that scope, I guess.

> @@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  
>  	/* Allocate the guest's logical partition ID */
>  
> -	lpid = kvmppc_alloc_lpid();
> -	if ((long)lpid < 0)
> -		return -ENOMEM;
> -	kvm->arch.lpid = lpid;
> +	if (!kvmhv_on_papr()) {
> +		lpid = kvmppc_alloc_lpid();
> +		if ((long)lpid < 0)
> +			return -ENOMEM;
> +		kvm->arch.lpid = lpid;
> +	}
>  
>  	kvmppc_alloc_host_rm_ops();
>  
>  	kvmhv_vm_nested_init(kvm);
>  
> +	if (kvmhv_on_papr()) {
> +		long rc;
> +		unsigned long guest_id;
> +
> +		rc = plpar_guest_create(0, &guest_id);
> +
> +		if (rc != H_SUCCESS)
> +			pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
> +
> +		switch (rc) {
> +		case H_PARAMETER:
> +		case H_FUNCTION:
> +		case H_STATE:
> +			return -EINVAL;
> +		case H_NOT_ENOUGH_RESOURCES:
> +		case H_ABORTED:
> +			return -ENOMEM;
> +		case H_AUTHORITY:
> +			return -EPERM;
> +		case H_NOT_AVAILABLE:
> +			return -EBUSY;
> +		}
> +		kvm->arch.lpid = guest_id;
> +	}

I wouldn't mind putting lpid and guest_id in different variables.
guest_id is 64-bit, isn't it? LPIDR is 32. If nothing else, that could
cause issues if the hypervisor does something clever with the token.
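
i.e. keep the handle separate, something like this (field name just
for illustration):

	struct kvm_arch {
		...
		unsigned int lpid;	/* LPIDR value, 32 bits */
		u64 guest_id;		/* 64-bit token from H_GUEST_CREATE */
		...
	};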

> @@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
>  		kvm->arch.process_table = 0;
>  		if (kvm->arch.secure_guest)
>  			uv_svm_terminate(kvm->arch.lpid);
> -		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> +		if (!kvmhv_on_papr())
> +			kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
>  	}

It would be nice to have a positive test for the "existing" API. All
we have to do is think of a name for it.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH RFC v2 6/6] docs: powerpc: Document nested KVM on POWER
  2023-06-07  5:37     ` Gautam Menghani
@ 2023-06-10  1:39       ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  1:39 UTC (permalink / raw)
  To: Gautam Menghani
  Cc: Jordan Niethe, mikey, kautuk.consul.1980, kvm, kvm-ppc, npiggin,
	sbhat, vaibhav, linuxppc-dev

On Wed, Jun 7, 2023 at 3:38 PM Gautam Menghani <gautam@linux.ibm.com> wrote:
>
> On Mon, Jun 05, 2023 at 04:48:48PM +1000, Jordan Niethe wrote:
> > From: Michael Neuling <mikey@neuling.org>
>
> Hi,
> There are some minor typos in the documentation pointed out below

Thank you, will correct in the next revision.

Jordan
>
>
> > +H_GUEST_GET_STATE()
> > +-------------------
> > +
> > +This is called to get state associated with an L2 (Guest-wide or vCPU specific).
> > +This info is passed via the Guest State Buffer (GSB), a standard format as
> > +explained later in this doc, necessary details below:
> > +
> > +This can set either L2 wide or vcpu specific information. Examples of
>
> We are getting the info about vcpu here : s/set/get
>
> > +H_GUEST_RUN_VCPU()
> > +------------------
> > +
> > +This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
> > +parameters. The vCPU run with the state set previously using
>
> Minor nit : s/run/runs
>
> > +H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
> > +hcall.
> > +
> > +This hcall also has associated input and output GSBs. Unlike
> > +H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
> > +parameters to the hcall (This was done in the interest of
> > +performance). The locations of these GSBs must be preregistered using
> > +the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table
> > +below).
> > +
> >
> > --
> > 2.31.1
> >
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 0/6] KVM: PPC: Nested PAPR guests
  2023-06-07  5:53   ` Nicholas Piggin
@ 2023-06-10  1:46     ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  1:46 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jordan Niethe, linuxppc-dev, mikey, kautuk.consul.1980, kvm,
	kvm-ppc, sbhat, vaibhav

On Wed, Jun 7, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> > There is existing support for nested guests on powernv hosts however the
> > hcall interface this uses is not support by other PAPR hosts.
>
> I kind of liked it being called nested-HV v1 and v2 APIs as short and
> to the point, but I suppose that's ambiguous with version 2 of the v1
> API, so papr is okay. What's the old API called in this scheme, then?
> "Existing API" is not great after patches go upstream.

Yes, I was trying for a more descriptive name, but it is just more
confusing and I'm struggling for a better alternative.

In the next revision I'll use v1 and v2. For version 2 of v1, do we
now call it v1.2 or something like that?

>
> And, you've probably explained it pretty well but slightly more of
> a background first up could be helpful. E.g.,
>
>   A nested-HV API for PAPR has been developed based on the KVM-specific
>   nested-HV API that is upstream in Linux/KVM and QEMU. The PAPR API
>   had to break compatibility to accommodate implementation in other
>   hypervisors and partitioning firmware.
>
> And key overall differences
>
>   The control flow and interrupt processing between L0, L1, and L2
>   in the new PAPR API are conceptually unchanged. Where the old API
>   is almost stateless, the PAPR API is stateful, with the L1 registering
>   L2 virtual machines and vCPUs with the L0. Supervisor-privileged
>   register switching duty is now the responsibility for the L0, which
>   holds canonical L2 register state and handles all switching. This
>   new register handling motivates the "getters and setters" wrappers
>   ...

I'll include something along those lines.

Thanks,
Jordan

>
> Thanks,
> Nick
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state
  2023-06-07  7:51     ` Nicholas Piggin
@ 2023-06-10  1:52       ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  1:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jordan Niethe, linuxppc-dev, mikey, kvm, kvm-ppc, sbhat, vaibhav

On Wed, Jun 7, 2023 at 5:53 PM Nicholas Piggin <npiggin@gmail.com> wrote:
[snip]
>
> The general idea is fine, some of the names could use a bit of
> improvement. What's a BOOK3S_WRAPPER for example, is it not a
> VCPU_WRAPPER, or alternatively why isn't a VCORE_WRAPPER Book3S
> as well?

Yeah, the names are not great. I didn't call it VCPU_WRAPPER because I
wanted to keep BOOK3S_WRAPPER for book3s registers separate from
HV_WRAPPER for HV-specific registers. I will change it to something
like you suggested.

[snip]
>
> Stray hunk I think.

Yep.

>
> > @@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val) \
> >              vcpu->arch.shared->reg = cpu_to_le##size(val);           \
> >  }                                                                    \
> >
> > +#define SHARED_CACHE_WRAPPER_GET(reg, size)                          \
> > +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)                \
> > +{                                                                    \
> > +     if (kvmppc_shared_big_endian(vcpu))                             \
> > +            return be##size##_to_cpu(vcpu->arch.shared->reg);        \
> > +     else                                                            \
> > +            return le##size##_to_cpu(vcpu->arch.shared->reg);        \
> > +}                                                                    \
> > +
> > +#define SHARED_CACHE_WRAPPER_SET(reg, size)                          \
> > +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)      \
> > +{                                                                    \
> > +     if (kvmppc_shared_big_endian(vcpu))                             \
> > +            vcpu->arch.shared->reg = cpu_to_be##size(val);           \
> > +     else                                                            \
> > +            vcpu->arch.shared->reg = cpu_to_le##size(val);           \
> > +}                                                                    \
> > +
> >  #define SHARED_WRAPPER(reg, size)                                    \
> >       SHARED_WRAPPER_GET(reg, size)                                   \
> >       SHARED_WRAPPER_SET(reg, size)                                   \
> >
> > +#define SHARED_CACHE_WRAPPER(reg, size)                                      \
> > +     SHARED_CACHE_WRAPPER_GET(reg, size)                             \
> > +     SHARED_CACHE_WRAPPER_SET(reg, size)                             \
>
> SHARED_CACHE_WRAPPER that does the same thing as SHARED_WRAPPER.

That changes once the guest state buffer IDs are included in a later
patch.

>
> I know some of the names are a bit crufty but it's probably a good time
> to rethink them a bit.
>
> KVMPPC_VCPU_SHARED_REG_ACCESSOR or something like that. A few
> more keystrokes could help immensely.

Yes, I will do something like that for the BOOK3S_WRAPPER and
HV_WRAPPER too.
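
e.g. a sketch of what the renamed shared-register accessor could look
like (final name still to be decided):

	#define KVMPPC_VCPU_SHARED_REG_ACCESSOR_GET(reg, size)		  \
	static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)	  \
	{								  \
		if (kvmppc_shared_big_endian(vcpu))			  \
			return be##size##_to_cpu(vcpu->arch.shared->reg); \
		else							  \
			return le##size##_to_cpu(vcpu->arch.shared->reg); \
	}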

>
> > diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> > index 34f1db212824..34bc0a8a1288 100644
> > --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> > +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> > @@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
> >       u32 pid;
> >
> >       lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> > -     pid = vcpu->arch.pid;
> > +     pid = kvmppc_get_pid(vcpu);
> >
> >       /*
> >        * Prior memory accesses to host PID Q3 must be completed before we
>
> Could add some accessors for get_lpid / get_guest_id which check for the
> correct KVM mode maybe.

True.
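
Something like this, maybe (untested; it warns if the wrong API's ID
is read rather than silently returning it):

	static inline unsigned int kvmppc_get_lpid(struct kvm *kvm)
	{
		/* under the new API, arch.lpid holds a guest handle */
		WARN_ON_ONCE(kvmhv_on_papr());
		return kvm->arch.lpid;
	}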

Thanks,
Jordan

>
> Thanks,
> Nick
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 2/6] KVM: PPC: Add fpr getters and setters
  2023-06-07  7:55     ` Nicholas Piggin
@ 2023-06-10  1:54       ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  1:54 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jordan Niethe, linuxppc-dev, mikey, kvm, kvm-ppc, sbhat, vaibhav

On Wed, Jun 7, 2023 at 5:56 PM Nicholas Piggin <npiggin@gmail.com> wrote:
[snip]
>
> Is there a particular reason some reg sets are broken into their own
> patches? Looking at this hunk you think the VR one got missed, but it's
> in its own patch.
>
> Not really a big deal but I wouldn't mind them all in one patch. Or at
> least the FP/VR/VSR in one since they're quite regular and similar.

There's not really a reason.

Originally I had things even more broken apart but then thought one
patch made more sense. Part way through squashing the patches I had
a change of heart and thought I'd see if people had a preference.

I'll just finish the squashing for the next series.

Thanks,
Jordan
>
> Thanks,
> Nick
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers
  2023-06-07  8:26     ` Nicholas Piggin
@ 2023-06-10  2:09       ` Jordan Niethe
  -1 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  2:09 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jordan Niethe, linuxppc-dev, mikey, kvm, kvm-ppc, sbhat, vaibhav

On Wed, Jun 7, 2023 at 6:27 PM Nicholas Piggin <npiggin@gmail.com> wrote:
[snip]
>
> This is a tour de force in one of these things, so I hate to be
> the "me smash with club" guy, but what if you allocated buffers
> with enough room for all the state (or 99% of cases, in which
> case an overflow would make an hcall)?
>
> What's actually a fast-path that we don't get from the interrupt
> return buffer? Getting and setting a few regs for MMIO emulation?

As it is, a vcpu uses four buffers:

- One for registering its input and output buffers.
  This is allocated just large enough for GSID_RUN_OUTPUT_MIN_SIZE,
  GSID_RUN_INPUT and GSID_RUN_OUTPUT, and freed once the buffers are
  registered. I suppose we could just make a buffer big enough to be
  used as the vcpu run input buffer and then have it register its own
  address.

- One for process and partition table entries.
  Because kvmhv_set_ptbl_entry() isn't associated with a vcpu,
  kvmhv_papr_set_ptbl_entry() allocates and frees a minimally sized
  buffer on demand.

- The run vcpu input buffer.
  Persists over the lifetime of the vcpu after creation. Large enough
  to hold all VCPU-wide elements. The same buffer is also reused for:

    * GET state hcalls
    * SET guest wide state hcalls (guest wide state can not be passed
      in the vcpu run buffer)

- The run vcpu output buffer.
  Persists over the lifetime of the vcpu after creation. This is sized
  to be GSID_RUN_OUTPUT_MIN_SIZE as returned by the L0. It's unlikely
  that it would be larger than the run vcpu input buffer, so I guess
  you could make it that size too. Probably you could even use the run
  vcpu input buffer as the vcpu output buffer.

The buffers could all be that max size and could combine the
configuration buffer, input and output buffers, but I feel it's more
understandable like this.

[snip]

>
> The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
> you're calling the API that.

Will we go with KVM_NESTED_V2_ ?

>
> > +
> > +#define GSID_HOST_STATE_SIZE         0x0001 /* Size of Hypervisor Internal Format VCPU state */
> > +#define GSID_RUN_OUTPUT_MIN_SIZE     0x0002 /* Minimum size of the Run VCPU output buffer */
> > +#define GSID_LOGICAL_PVR             0x0003 /* Logical PVR */
> > +#define GSID_TB_OFFSET                       0x0004 /* Timebase Offset */
> > +#define GSID_PARTITION_TABLE         0x0005 /* Partition Scoped Page Table */
> > +#define GSID_PROCESS_TABLE           0x0006 /* Process Table */
>
> > +
> > +#define GSID_RUN_INPUT                       0x0C00 /* Run VCPU Input Buffer */
> > +#define GSID_RUN_OUTPUT                      0x0C01 /* Run VCPU Out Buffer */
> > +#define GSID_VPA                     0x0C02 /* HRA to Guest VCPU VPA */
> > +
> > +#define GSID_GPR(x)                  (0x1000 + (x))
> > +#define GSID_HDEC_EXPIRY_TB          0x1020
> > +#define GSID_NIA                     0x1021
> > +#define GSID_MSR                     0x1022
> > +#define GSID_LR                              0x1023
> > +#define GSID_XER                     0x1024
> > +#define GSID_CTR                     0x1025
> > +#define GSID_CFAR                    0x1026
> > +#define GSID_SRR0                    0x1027
> > +#define GSID_SRR1                    0x1028
> > +#define GSID_DAR                     0x1029
>
> It's a shame you have to rip up all your wrapper functions now to
> shoehorn these in.
>
> If you included names analogous to the reg field names in the kvm
> structures, the wrappers could do macro expansions that get them.
>
> #define __GSID_WRAPPER_dar              GSID_DAR
>
> Or similar.

Before, I had something pretty hacky: in the macro accessors I had
something along the lines of

     gsid_table[offsetof(vcpu, reg)]

to get the GSID for the register.

We can do the wrapper idea, I just worry if it is getting too magic.
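
Concretely, I take the suggestion to be something like this (sketch
only, the helper name is invented):

#define __GSID_WRAPPER_dar	GSID_DAR
#define __GSID_WRAPPER_srr0	GSID_SRR0
/* ... one define per kvm field name ... */

#define GSID_FOR_FIELD(field)	__GSID_WRAPPER_##field

so the accessor macros can token-paste the field name into the
matching guest state ID without each call site spelling it out.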

>
> And since of course you have to explicitly enumerate all these, I
> wouldn't mind defining the types and lengths up-front rather than
> down in the type function. You'd like to be able to go through the
> spec and eyeball type, number, size.

Something like

#define KVM_NESTED_V2_GS_NIA \
	(KVM_NESTED_V2_GSID_NIA | VCPU_WIDE | READ_WRITE | DOUBLE_WORD)

etc?
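
For that to work the attribute flags would have to live above the
16-bit ID space, along the lines of (bit positions purely
illustrative):

#define VCPU_WIDE	BIT(16)
#define READ_WRITE	BIT(17)
#define DOUBLE_WORD	BIT(18)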

>
> [snip]
>
> > +/**
> > + * gsb_paddress() - the physical address of buffer
> > + * @gsb: guest state buffer
> > + *
> > + * Returns the physical address of the buffer.
> > + */
> > +static inline u64 gsb_paddress(struct gs_buff *gsb)
> > +{
> > +     return __pa(gsb_header(gsb));
> > +}
>
> > +/**
> > + * __gse_put_reg() - add a register type guest state element to a buffer
> > + * @gsb: guest state buffer to add element to
> > + * @iden: guest state ID
> > + * @val: host endian value
> > + *
> > + * Adds a register type guest state element. Uses the guest state ID for
> > + * determining the length of the guest element. If the guest state ID has
> > + * bits that can not be set they will be cleared.
> > + */
> > +static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
> > +{
> > +     val &= gsid_mask(iden);
> > +     if (gsid_size(iden) == sizeof(u64))
> > +             return gse_put_u64(gsb, iden, val);
> > +
> > +     if (gsid_size(iden) == sizeof(u32)) {
> > +             u32 tmp;
> > +
> > +             tmp = (u32)val;
> > +             if (tmp != val)
> > +                     return -EINVAL;
> > +
> > +             return gse_put_u32(gsb, iden, tmp);
> > +     }
> > +     return -EINVAL;
> > +}
>
> There is a clever accessor that derives the length from the type, but
> then you fall back to this.

It's basically just to massage the cases where the kvm representation
and the guest state buffer representation mismatch, e.g. unsigned long
ccr; is 8 bytes in kvm but the CR is 4 bytes in the spec.

>
> > +/**
> > + * gse_put - add a guest state element to a buffer
> > + * @gsb: guest state buffer to add to
> > + * @iden: guest state identity
> > + * @v: generic value
> > + */
> > +#define gse_put(gsb, iden, v)                                        \
> > +     (_Generic((v),                                          \
> > +               u64 : __gse_put_reg,                          \
> > +               long unsigned int : __gse_put_reg,            \
> > +               u32 : __gse_put_reg,                          \
> > +               struct gs_buff_info : gse_put_buff_info,      \
> > +               struct gs_proc_table : gse_put_proc_table,    \
> > +               struct gs_part_table : gse_put_part_table,    \
> > +               vector128 : gse_put_vector128)(gsb, iden, v))
> > +
> > +/**
> > + * gse_get - return the data of a guest state element
> > + * @gsb: guest state element to add to
> > + * @v: generic value pointer to return in
> > + */
> > +#define gse_get(gse, v)                                              \
> > +     (*v = (_Generic((v),                                    \
> > +                     u64 * : __gse_get_reg,                  \
> > +                     unsigned long * : __gse_get_reg,        \
> > +                     u32 * : __gse_get_reg,                  \
> > +                     vector128 * : gse_get_vector128)(gse)))
>
> I don't see the benefit of this. Caller always knows the type doesn't
> it? It seems like the right function could be called directly. It
> makes the calling convention a bit clunky too. I know there's similar
> precedent for uaccess functions, but not sure I like it for this.

The compiler also knows, so I just thought I'd save some typing.
I agree it's kind of ugly, happy to drop it.
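
Dropping it, call sites would just be explicit, e.g. (sketch):

/* instead of gse_put(gsb, GSID_NIA, val) */
rc = __gse_put_reg(gsb, GSID_NIA, kvmppc_get_pc(vcpu));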

[snip]
>
> Should all be GPL exports.
>
> Needs more namespace too, I reckon (not just exports but any kernel-wide
> name this short and non-descriptive needs a kvmppc or kvm_papr or
> something).

Will do.

Thanks,
Jordan

>
> Thanks,
> Nick
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
  2023-06-07  9:08     ` Nicholas Piggin
@ 2023-06-10  2:16       ` Jordan Niethe
  -1 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  2:16 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jordan Niethe, linuxppc-dev, mikey, kvm, kvm-ppc, sbhat, vaibhav

On Wed, Jun 7, 2023 at 7:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
[snip]
>
> You lost your comments.

Thanks

>
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> > index 0ca2d8b37b42..c5c57552b447 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -12,6 +12,7 @@
> >  #include <linux/types.h>
> >  #include <linux/kvm_host.h>
> >  #include <asm/kvm_book3s_asm.h>
> > +#include <asm/guest-state-buffer.h>
> >
> >  struct kvmppc_bat {
> >       u64 raw;
> > @@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
> >
> >  void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
> >
> > +
> > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > +
> > +extern bool __kvmhv_on_papr;
> > +
> > +static inline bool kvmhv_on_papr(void)
> > +{
> > +     return __kvmhv_on_papr;
> > +}
>
> It's a nitpick, but kvmhv_on_pseries() is because we're running KVM-HV
> on a pseries guest kernel. Which is a papr guest kernel. So this kind of
> doesn't make sense if you read it the same way.
>
> kvmhv_nested_using_papr() or something like that might read a bit
> better.

Will we go with kvmhv_using_nested_v2()?

>
> This could be a static key too.

Will do.
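
A sketch of the static key version, assuming it is flipped once at
init when the new API is detected (key name invented):

/* header */
DECLARE_STATIC_KEY_FALSE(__kvmhv_is_nestedv2);

static inline bool kvmhv_using_nested_v2(void)
{
	return static_branch_unlikely(&__kvmhv_is_nestedv2);
}

/* .c file */
DEFINE_STATIC_KEY_FALSE(__kvmhv_is_nestedv2);

/* and static_branch_enable(&__kvmhv_is_nestedv2) once detected */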

>
> > @@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
> >       ulong dscr;
> >       ulong amr;
> >       ulong uamor;
> > +     ulong amor;
> >       ulong iamr;
> >       u32 ctrl;
> >       u32 dabrx;
>
> This belongs somewhere else.

It can be dropped.

>
> > @@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
> >       u64 nested_hfscr;       /* HFSCR that the L1 requested for the nested guest */
> >       u32 nested_vcpu_id;
> >       gpa_t nested_io_gpr;
> > +     /* For nested APIv2 guests*/
> > +     struct kvmhv_papr_host papr_host;
> >  #endif
>
> This is not exactly a papr host. Might have to come up with a better
> name, especially if we implement an L0, as things could get confusing.

Any name ideas? nestedv2_state?

>
> > @@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
> >       return rc;
> >  }
> >
> > +static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
> > +{
> > +     unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> > +     unsigned long token;
> > +     long rc;
> > +
> > +     token = -1UL;
> > +     while (true) {
> > +             rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
> > +             if (rc == H_SUCCESS) {
> > +                     *guest_id = retbuf[0];
> > +                     break;
> > +             }
> > +
> > +             if (rc == H_BUSY) {
> > +                     token = retbuf[0];
> > +                     cpu_relax();
> > +                     continue;
> > +             }
> > +
> > +             if (H_IS_LONG_BUSY(rc)) {
> > +                     token = retbuf[0];
> > +                     mdelay(get_longbusy_msecs(rc));
>
> All of these things need a non-sleeping delay? Can we sleep instead?
> Or if not, might have to think about going back to the caller and it
> can retry.
>
> get/set state might be a bit inconvenient, although I don't expect
> that should potentially take so long as guest and vcpu create/delete,
> so at least those ones would be good if they're called while
> preemptable.

Yeah no reason not to sleep except for get/set, let me try it out.
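
For the sleepable paths, presumably just (untested):

	if (H_IS_LONG_BUSY(rc)) {
		token = retbuf[0];
		msleep(get_longbusy_msecs(rc));
		continue;
	}

and cond_resched() rather than cpu_relax() for the plain H_BUSY case.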

>
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 521d84621422..f22ee582e209 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
> >       spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
> >  }
> >
> > +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
> > +{
> > +     vcpu->arch.pvr = pvr;
> > +}
>
> Didn't you lose this in a previous patch? I thought it must have moved
> to a header but it reappears.

Yes, that was meant to stay put.

>
> > +
> >  /* Dummy value used in computing PCR value below */
> >  #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
> >
> > @@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> >                       return RESUME_HOST;
> >               break;
> >  #endif
> > -     case H_RANDOM:
> > +     case H_RANDOM: {
> >               unsigned long rand;
> >
> >               if (!arch_get_random_seed_longs(&rand, 1))
> >                       ret = H_HARDWARE;
> >               kvmppc_set_gpr(vcpu, 4, rand);
> >               break;
> > +     }
> >       case H_RPT_INVALIDATE:
> >               ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
> >                                             kvmppc_get_gpr(vcpu, 5),
>
> Compile fix for a previous patch.

Thanks.

>
> > @@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
> >       vcpu->arch.shared_big_endian = false;
> >  #endif
> >  #endif
> > -     kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >
> > +     if (kvmhv_on_papr()) {
> > +             err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
> > +             if (err < 0)
> > +                     return err;
> > +     }
> > +
> > +     kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >       if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> >               kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
> >               kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
> >       }
> >
> >       kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
> > +     kvmppc_set_amor_hv(vcpu, ~0);
>
> This AMOR thing should go somewhere else. Not actually sure why it needs
> to be added to the vcpu since it's always ~0. Maybe just put that in a
> #define somewhere and use that.

Yes, you are right, we can just get rid of it from the vcpu entirely.
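
e.g. nothing more than (sketch):

#define KVMPPC_AMOR_DEFAULT	(~0UL)

used wherever the value is loaded into the guest state.
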
>
> > @@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
> >       }
> >  }
> >
> > +static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
> > +{
> > +     struct kvmhv_papr_host *ph;
> > +     unsigned long msr, i;
> > +     int trap;
> > +     long rc;
> > +
> > +     ph = &vcpu->arch.papr_host;
> > +
> > +     msr = mfmsr();
> > +     kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
> > +     if (lazy_irq_pending())
> > +             return 0;
> > +
> > +     kvmhv_papr_flush_vcpu(vcpu, time_limit);
> > +
> > +     accumulate_time(vcpu, &vcpu->arch.in_guest);
> > +     rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
> > +                               &trap, &i);
> > +
> > +     if (rc != H_SUCCESS) {
> > +             pr_err("KVM Guest Run VCPU hcall failed\n");
> > +             if (rc == H_INVALID_ELEMENT_ID)
> > +                     pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
> > +             else if (rc == H_INVALID_ELEMENT_SIZE)
> > +                     pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
> > +             else if (rc == H_INVALID_ELEMENT_VALUE)
> > +                     pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
> > +             return 0;
> > +     }
>
> This needs the proper error handling. Were you going to wait until I
> sent that up for existing code?

Overall the unhappy paths need to be tightened up in the next revision.
But yeah this hits the same thing as the v1 API.

>
> > @@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
> >   */
> >  void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
> >  {
> > +     struct kvm_vcpu *vcpu;
> >       long int i;
> >       u32 cores_done = 0;
> >
> > @@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
> >               if (++cores_done >= kvm->arch.online_vcores)
> >                       break;
> >       }
> > +
> > +     if (kvmhv_on_papr()) {
> > +             kvm_for_each_vcpu(i, vcpu, kvm) {
> > +                     kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
> > +             }
> > +     }
> >  }
>
> vcpu define could go in that scope I guess.

True.
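
i.e. (sketch, using the v2 predicate name from above):

	if (kvmhv_using_nested_v2()) {
		struct kvm_vcpu *vcpu;
		unsigned long i;

		kvm_for_each_vcpu(i, vcpu, kvm)
			kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
	}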

>
> > @@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> >
> >       /* Allocate the guest's logical partition ID */
> >
> > -     lpid = kvmppc_alloc_lpid();
> > -     if ((long)lpid < 0)
> > -             return -ENOMEM;
> > -     kvm->arch.lpid = lpid;
> > +     if (!kvmhv_on_papr()) {
> > +             lpid = kvmppc_alloc_lpid();
> > +             if ((long)lpid < 0)
> > +                     return -ENOMEM;
> > +             kvm->arch.lpid = lpid;
> > +     }
> >
> >       kvmppc_alloc_host_rm_ops();
> >
> >       kvmhv_vm_nested_init(kvm);
> >
> > +     if (kvmhv_on_papr()) {
> > +             long rc;
> > +             unsigned long guest_id;
> > +
> > +             rc = plpar_guest_create(0, &guest_id);
> > +
> > +             if (rc != H_SUCCESS)
> > +                     pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
> > +
> > +             switch (rc) {
> > +             case H_PARAMETER:
> > +             case H_FUNCTION:
> > +             case H_STATE:
> > +                     return -EINVAL;
> > +             case H_NOT_ENOUGH_RESOURCES:
> > +             case H_ABORTED:
> > +                     return -ENOMEM;
> > +             case H_AUTHORITY:
> > +                     return -EPERM;
> > +             case H_NOT_AVAILABLE:
> > +                     return -EBUSY;
> > +             }
> > +             kvm->arch.lpid = guest_id;
> > +     }
>
> I wouldn't mind putting lpid/guest_id in different variables. guest_id
> is 64-bit isn't it? LPIDR is 32. If nothing else that could cause
> issues if the hypervisor does something clever with the token.

I was trying to get rid of a difference between this API and the
others, but I'd forgotten about the 64-bit / 32-bit difference.
Will put it back in its own variable.

>
> > @@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
> >               kvm->arch.process_table = 0;
> >               if (kvm->arch.secure_guest)
> >                       uv_svm_terminate(kvm->arch.lpid);
> > -             kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> > +             if (!kvmhv_on_papr())
> > +                     kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> >       }
>
> Would be nice to have a +ve test for the "existing" API. All we have to
> do is think of a name for it.

Will we go with nestedv1?

Thanks,
Jordan

>
> Thanks,
> Nick
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
@ 2023-06-10  2:16       ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  2:16 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: mikey, kvm, sbhat, kvm-ppc, Jordan Niethe, vaibhav, linuxppc-dev

On Wed, Jun 7, 2023 at 7:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
[snip]
>
> You lost your comments.

Thanks

>
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> > index 0ca2d8b37b42..c5c57552b447 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -12,6 +12,7 @@
> >  #include <linux/types.h>
> >  #include <linux/kvm_host.h>
> >  #include <asm/kvm_book3s_asm.h>
> > +#include <asm/guest-state-buffer.h>
> >
> >  struct kvmppc_bat {
> >       u64 raw;
> > @@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
> >
> >  void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
> >
> > +
> > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > +
> > +extern bool __kvmhv_on_papr;
> > +
> > +static inline bool kvmhv_on_papr(void)
> > +{
> > +     return __kvmhv_on_papr;
> > +}
>
> It's a nitpick, but kvmhv_on_pseries() is because we're runnning KVM-HV
> on a pseries guest kernel. Which is a papr guest kernel. So this kind of
> doesn't make sense if you read it the same way.
>
> kvmhv_nested_using_papr() or something like that might read a bit
> better.

Will we go with kvmhv_using_nested_v2()?

>
> This could be a static key too.

Will do.

>
> > @@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
> >       ulong dscr;
> >       ulong amr;
> >       ulong uamor;
> > +     ulong amor;
> >       ulong iamr;
> >       u32 ctrl;
> >       u32 dabrx;
>
> This belongs somewhere else.

It can be dropped.

>
> > @@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
> >       u64 nested_hfscr;       /* HFSCR that the L1 requested for the nested guest */
> >       u32 nested_vcpu_id;
> >       gpa_t nested_io_gpr;
> > +     /* For nested APIv2 guests*/
> > +     struct kvmhv_papr_host papr_host;
> >  #endif
>
> This is not exactly a papr host. Might have to come up with a better
> name especially if we implement a L0 things could get confusing.

Any name ideas? nestedv2_state?

>
> > @@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
> >       return rc;
> >  }
> >
> > +static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
> > +{
> > +     unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> > +     unsigned long token;
> > +     long rc;
> > +
> > +     token = -1UL;
> > +     while (true) {
> > +             rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
> > +             if (rc == H_SUCCESS) {
> > +                     *guest_id = retbuf[0];
> > +                     break;
> > +             }
> > +
> > +             if (rc == H_BUSY) {
> > +                     token = retbuf[0];
> > +                     cpu_relax();
> > +                     continue;
> > +             }
> > +
> > +             if (H_IS_LONG_BUSY(rc)) {
> > +                     token = retbuf[0];
> > +                     mdelay(get_longbusy_msecs(rc));
>
> All of these things need a non-sleeping delay? Can we sleep instead?
> Or if not, might have to think about going back to the caller and it
> can retry.
>
> get/set state might be a bit inconvenient, although I don't expect
> that should potentially take so long as guest and vcpu create/delete,
> so at least those ones would be good if they're called while
> preemptable.

Yeah no reason not to sleep except for get/set, let me try it out.

>
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 521d84621422..f22ee582e209 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
> >       spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
> >  }
> >
> > +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
> > +{
> > +     vcpu->arch.pvr = pvr;
> > +}
>
> Didn't you lose this in a previous patch? I thought it must have moved
> to a header but it reappears.

Yes, that was meant to stay put.

>
> > +
> >  /* Dummy value used in computing PCR value below */
> >  #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
> >
> > @@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> >                       return RESUME_HOST;
> >               break;
> >  #endif
> > -     case H_RANDOM:
> > +     case H_RANDOM: {
> >               unsigned long rand;
> >
> >               if (!arch_get_random_seed_longs(&rand, 1))
> >                       ret = H_HARDWARE;
> >               kvmppc_set_gpr(vcpu, 4, rand);
> >               break;
> > +     }
> >       case H_RPT_INVALIDATE:
> >               ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
> >                                             kvmppc_get_gpr(vcpu, 5),
>
> Compile fix for a previous patch.

Thanks.

>
> > @@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
> >       vcpu->arch.shared_big_endian = false;
> >  #endif
> >  #endif
> > -     kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >
> > +     if (kvmhv_on_papr()) {
> > +             err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
> > +             if (err < 0)
> > +                     return err;
> > +     }
> > +
> > +     kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >       if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> >               kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
> >               kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
> >       }
> >
> >       kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
> > +     kvmppc_set_amor_hv(vcpu, ~0);
>
> This AMOR thing should go somewhere else. Not actually sure why it needs
> to be added to the vcpu since it's always ~0. Maybe just put that in a
> #define somewhere and use that.

Yes, you are right, just can get rid of it from the vcpu entirely.
>
> > @@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
> >       }
> >  }
> >
> > +static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
> > +{
> > +     struct kvmhv_papr_host *ph;
> > +     unsigned long msr, i;
> > +     int trap;
> > +     long rc;
> > +
> > +     ph = &vcpu->arch.papr_host;
> > +
> > +     msr = mfmsr();
> > +     kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
> > +     if (lazy_irq_pending())
> > +             return 0;
> > +
> > +     kvmhv_papr_flush_vcpu(vcpu, time_limit);
> > +
> > +     accumulate_time(vcpu, &vcpu->arch.in_guest);
> > +     rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
> > +                               &trap, &i);
> > +
> > +     if (rc != H_SUCCESS) {
> > +             pr_err("KVM Guest Run VCPU hcall failed\n");
> > +             if (rc == H_INVALID_ELEMENT_ID)
> > +                     pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
> > +             else if (rc == H_INVALID_ELEMENT_SIZE)
> > +                     pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
> > +             else if (rc == H_INVALID_ELEMENT_VALUE)
> > +                     pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
> > +             return 0;
> > +     }
>
> This needs the proper error handling. Were you going to wait until I
> sent that up for existing code?

Overall the unhappy paths need to be tightened up in the next revision.
But yeah this hits the same thing as the v1 API.

>
> > @@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
> >   */
> >  void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
> >  {
> > +     struct kvm_vcpu *vcpu;
> >       long int i;
> >       u32 cores_done = 0;
> >
> > @@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
> >               if (++cores_done >= kvm->arch.online_vcores)
> >                       break;
> >       }
> > +
> > +     if (kvmhv_on_papr()) {
> > +             kvm_for_each_vcpu(i, vcpu, kvm) {
> > +                     kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
> > +             }
> > +     }
> >  }
>
> vcpu define could go in that scope I guess.

True.

>
> > @@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> >
> >       /* Allocate the guest's logical partition ID */
> >
> > -     lpid = kvmppc_alloc_lpid();
> > -     if ((long)lpid < 0)
> > -             return -ENOMEM;
> > -     kvm->arch.lpid = lpid;
> > +     if (!kvmhv_on_papr()) {
> > +             lpid = kvmppc_alloc_lpid();
> > +             if ((long)lpid < 0)
> > +                     return -ENOMEM;
> > +             kvm->arch.lpid = lpid;
> > +     }
> >
> >       kvmppc_alloc_host_rm_ops();
> >
> >       kvmhv_vm_nested_init(kvm);
> >
> > +     if (kvmhv_on_papr()) {
> > +             long rc;
> > +             unsigned long guest_id;
> > +
> > +             rc = plpar_guest_create(0, &guest_id);
> > +
> > +             if (rc != H_SUCCESS)
> > +                     pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
> > +
> > +             switch (rc) {
> > +             case H_PARAMETER:
> > +             case H_FUNCTION:
> > +             case H_STATE:
> > +                     return -EINVAL;
> > +             case H_NOT_ENOUGH_RESOURCES:
> > +             case H_ABORTED:
> > +                     return -ENOMEM;
> > +             case H_AUTHORITY:
> > +                     return -EPERM;
> > +             case H_NOT_AVAILABLE:
> > +                     return -EBUSY;
> > +             }
> > +             kvm->arch.lpid = guest_id;
> > +     }
>
> I wouldn't mind putting lpid/guest_id in different variables. guest_id
> is 64-bit isn't it? LPIDR is 32. If nothing else that could cause
> issues if the hypervisor does something clever with the token.

I was trying to get rid of a difference between this API and  the
others, but I'd forgotten about the 64bit / 32bit difference.
Will put it back in its own variable.

>
> > @@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
> >               kvm->arch.process_table = 0;
> >               if (kvm->arch.secure_guest)
> >                       uv_svm_terminate(kvm->arch.lpid);
> > -             kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> > +             if (!kvmhv_on_papr())
> > +                     kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> >       }
>
> Would be nice to have a +ve test for the "existing" API. All we have to
> do is think of a name for it.

Will we go with nestedv1?

Thanks,
Jordan

>
> Thanks,
> Nick
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests
@ 2023-06-10  2:16       ` Jordan Niethe
  0 siblings, 0 replies; 57+ messages in thread
From: Jordan Niethe @ 2023-06-10  2:16 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Jordan Niethe, linuxppc-dev, mikey, kvm, kvm-ppc, sbhat, vaibhav

On Wed, Jun 7, 2023 at 7:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
[snip]
>
> You lost your comments.

Thanks

>
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> > index 0ca2d8b37b42..c5c57552b447 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -12,6 +12,7 @@
> >  #include <linux/types.h>
> >  #include <linux/kvm_host.h>
> >  #include <asm/kvm_book3s_asm.h>
> > +#include <asm/guest-state-buffer.h>
> >
> >  struct kvmppc_bat {
> >       u64 raw;
> > @@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
> >
> >  void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
> >
> > +
> > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > +
> > +extern bool __kvmhv_on_papr;
> > +
> > +static inline bool kvmhv_on_papr(void)
> > +{
> > +     return __kvmhv_on_papr;
> > +}
>
> It's a nitpick, but kvmhv_on_pseries() is because we're runnning KVM-HV
> on a pseries guest kernel. Which is a papr guest kernel. So this kind of
> doesn't make sense if you read it the same way.
>
> kvmhv_nested_using_papr() or something like that might read a bit
> better.

Will we go with kvmhv_using_nested_v2()?

>
> This could be a static key too.

Will do.

>
> > @@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
> >       ulong dscr;
> >       ulong amr;
> >       ulong uamor;
> > +     ulong amor;
> >       ulong iamr;
> >       u32 ctrl;
> >       u32 dabrx;
>
> This belongs somewhere else.

It can be dropped.

>
> > @@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
> >       u64 nested_hfscr;       /* HFSCR that the L1 requested for the nested guest */
> >       u32 nested_vcpu_id;
> >       gpa_t nested_io_gpr;
> > +     /* For nested APIv2 guests*/
> > +     struct kvmhv_papr_host papr_host;
> >  #endif
>
> This is not exactly a papr host. Might have to come up with a better
> name especially if we implement a L0 things could get confusing.

Any name ideas? nestedv2_state?

>
> > @@ -342,6 +343,203 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
> >       return rc;
> >  }
> >
> > +static inline long plpar_guest_create(unsigned long flags, unsigned long *guest_id)
> > +{
> > +     unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> > +     unsigned long token;
> > +     long rc;
> > +
> > +     token = -1UL;
> > +     while (true) {
> > +             rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
> > +             if (rc == H_SUCCESS) {
> > +                     *guest_id = retbuf[0];
> > +                     break;
> > +             }
> > +
> > +             if (rc == H_BUSY) {
> > +                     token = retbuf[0];
> > +                     cpu_relax();
> > +                     continue;
> > +             }
> > +
> > +             if (H_IS_LONG_BUSY(rc)) {
> > +                     token = retbuf[0];
> > +                     mdelay(get_longbusy_msecs(rc));
>
> All of these things need a non-sleeping delay? Can we sleep instead?
> Or if not, might have to think about going back to the caller and it
> can retry.
>
> get/set state might be a bit inconvenient, although I don't expect
> that should potentially take so long as guest and vcpu create/delete,
> so at least those ones would be good if they're called while
> preemptable.

Yeah no reason not to sleep except for get/set, let me try it out.

>
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 521d84621422..f22ee582e209 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
> >       spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
> >  }
> >
> > +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
> > +{
> > +     vcpu->arch.pvr = pvr;
> > +}
>
> Didn't you lose this in a previous patch? I thought it must have moved
> to a header but it reappears.

Yes, that was meant to stay put.

>
> > +
> >  /* Dummy value used in computing PCR value below */
> >  #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
> >
> > @@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> >                       return RESUME_HOST;
> >               break;
> >  #endif
> > -     case H_RANDOM:
> > +     case H_RANDOM: {
> >               unsigned long rand;
> >
> >               if (!arch_get_random_seed_longs(&rand, 1))
> >                       ret = H_HARDWARE;
> >               kvmppc_set_gpr(vcpu, 4, rand);
> >               break;
> > +     }
> >       case H_RPT_INVALIDATE:
> >               ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
> >                                             kvmppc_get_gpr(vcpu, 5),
>
> Compile fix for a previous patch.

Thanks.

>
> > @@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
> >       vcpu->arch.shared_big_endian = false;
> >  #endif
> >  #endif
> > -     kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >
> > +     if (kvmhv_on_papr()) {
> > +             err = kvmhv_papr_vcpu_create(vcpu, &vcpu->arch.papr_host);
> > +             if (err < 0)
> > +                     return err;
> > +     }
> > +
> > +     kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >       if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> >               kvmppc_set_mmcr_hv(vcpu, 0, kvmppc_get_mmcr_hv(vcpu, 0) | MMCR0_PMCCEXT);
> >               kvmppc_set_mmcra_hv(vcpu, MMCRA_BHRB_DISABLE);
> >       }
> >
> >       kvmppc_set_ctrl_hv(vcpu, CTRL_RUNLATCH);
> > +     kvmppc_set_amor_hv(vcpu, ~0);
>
> This AMOR thing should go somewhere else. Not actually sure why it needs
> to be added to the vcpu since it's always ~0. Maybe just put that in a
> #define somewhere and use that.

Yes, you are right, just can get rid of it from the vcpu entirely.
>
> > @@ -4042,6 +4059,50 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
> >       }
> >  }
> >
> > +static int kvmhv_vcpu_entry_papr(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr, u64 *tb)
> > +{
> > +     struct kvmhv_papr_host *ph;
> > +     unsigned long msr, i;
> > +     int trap;
> > +     long rc;
> > +
> > +     ph = &vcpu->arch.papr_host;
> > +
> > +     msr = mfmsr();
> > +     kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
> > +     if (lazy_irq_pending())
> > +             return 0;
> > +
> > +     kvmhv_papr_flush_vcpu(vcpu, time_limit);
> > +
> > +     accumulate_time(vcpu, &vcpu->arch.in_guest);
> > +     rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
> > +                               &trap, &i);
> > +
> > +     if (rc != H_SUCCESS) {
> > +             pr_err("KVM Guest Run VCPU hcall failed\n");
> > +             if (rc == H_INVALID_ELEMENT_ID)
> > +                     pr_err("KVM: Guest Run VCPU invalid element id at %ld\n", i);
> > +             else if (rc == H_INVALID_ELEMENT_SIZE)
> > +                     pr_err("KVM: Guest Run VCPU invalid element size at %ld\n", i);
> > +             else if (rc == H_INVALID_ELEMENT_VALUE)
> > +                     pr_err("KVM: Guest Run VCPU invalid element value at %ld\n", i);
> > +             return 0;
> > +     }
>
> This needs the proper error handling. Were you going to wait until I
> sent that up for existing code?

Overall the unhappy paths need to be tightened up in the next revision.
But yeah this hits the same thing as the v1 API.

>
> > @@ -5119,6 +5183,7 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
> >   */
> >  void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
> >  {
> > +     struct kvm_vcpu *vcpu;
> >       long int i;
> >       u32 cores_done = 0;
> >
> > @@ -5139,6 +5204,12 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
> >               if (++cores_done >= kvm->arch.online_vcores)
> >                       break;
> >       }
> > +
> > +     if (kvmhv_on_papr()) {
> > +             kvm_for_each_vcpu(i, vcpu, kvm) {
> > +                     kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
> > +             }
> > +     }
> >  }
>
> The vcpu definition could go in that scope, I guess.

True.
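
i.e. the same hunk, with vcpu moved into the only scope that uses it:

	if (kvmhv_on_papr()) {
		struct kvm_vcpu *vcpu;

		kvm_for_each_vcpu(i, vcpu, kvm)
			kvmppc_set_lpcr_hv(vcpu, vcpu->arch.vcore->lpcr);
	}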

>
> > @@ -5405,15 +5476,43 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> >
> >       /* Allocate the guest's logical partition ID */
> >
> > -     lpid = kvmppc_alloc_lpid();
> > -     if ((long)lpid < 0)
> > -             return -ENOMEM;
> > -     kvm->arch.lpid = lpid;
> > +     if (!kvmhv_on_papr()) {
> > +             lpid = kvmppc_alloc_lpid();
> > +             if ((long)lpid < 0)
> > +                     return -ENOMEM;
> > +             kvm->arch.lpid = lpid;
> > +     }
> >
> >       kvmppc_alloc_host_rm_ops();
> >
> >       kvmhv_vm_nested_init(kvm);
> >
> > +     if (kvmhv_on_papr()) {
> > +             long rc;
> > +             unsigned long guest_id;
> > +
> > +             rc = plpar_guest_create(0, &guest_id);
> > +
> > +             if (rc != H_SUCCESS)
> > +                     pr_err("KVM: Create Guest hcall failed, rc=%ld\n", rc);
> > +
> > +             switch (rc) {
> > +             case H_PARAMETER:
> > +             case H_FUNCTION:
> > +             case H_STATE:
> > +                     return -EINVAL;
> > +             case H_NOT_ENOUGH_RESOURCES:
> > +             case H_ABORTED:
> > +                     return -ENOMEM;
> > +             case H_AUTHORITY:
> > +                     return -EPERM;
> > +             case H_NOT_AVAILABLE:
> > +                     return -EBUSY;
> > +             }
> > +             kvm->arch.lpid = guest_id;
> > +     }
>
> I wouldn't mind putting lpid/guest_id in different variables. guest_id
> is 64-bit isn't it? LPIDR is 32. If nothing else that could cause
> issues if the hypervisor does something clever with the token.

I was trying to get rid of a difference between this API and the
others, but I'd forgotten about the 64-bit/32-bit difference.
I'll put it back in its own variable.
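
Roughly (a sketch; the guest_id field name is illustrative, not
final):

	/*
	 * In struct kvm_arch: keep the 64-bit token returned by
	 * H_GUEST_CREATE separate from the 32-bit LPIDR value.
	 */
	unsigned int lpid;		/* LPIDR, 32 bits */
	unsigned long guest_id;		/* H_GUEST_CREATE token, 64 bits */

	/* and in kvmppc_core_init_vm_hv(): */
	rc = plpar_guest_create(0, &kvm->arch.guest_id);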

>
> > @@ -5573,10 +5675,14 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
> >               kvm->arch.process_table = 0;
> >               if (kvm->arch.secure_guest)
> >                       uv_svm_terminate(kvm->arch.lpid);
> > -             kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> > +             if (!kvmhv_on_papr())
> > +                     kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
> >       }
>
> Would be nice to have a positive test for the "existing" API. All we
> have to do is think of a name for it.

Will we go with nestedv1?
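
e.g. (a sketch, name obviously up for debate):

	/* positive test for the existing (v1) nested API */
	static inline bool kvmhv_on_nestedv1(void)
	{
		return !kvmhv_on_papr();
	}

which would turn the hunk above into:

	if (kvmhv_on_nestedv1())
		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);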

Thanks,
Jordan

>
> Thanks,
> Nick
>
