* [RFC 00/37] KVM: s390: Add support for protected VMs
@ 2019-10-24 11:40 Janosch Frank
  2019-10-24 11:40 ` [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction Janosch Frank
                   ` (36 more replies)
  0 siblings, 37 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Protected VMs (PVMs) are KVM VMs whose state, such as guest memory and
guest registers, can no longer be accessed by KVM. Instead, PVMs are
mostly managed by a new entity called the Ultravisor (UV), which provides
an API through which KVM and the PVM can request management actions.

PVMs are encrypted at rest and protected from hypervisor access while
running. They start out in normal operation and then switch into
protected mode, so the standard boot process can still be used to load
an encrypted blob, which is then moved into protected mode.

Rebooting is only possible by passing through unprotected/normal mode
and then switching back to protected mode.
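
To make the flow concrete, here is a minimal, illustrative sketch of
how a host user space VMM could drive the new interface introduced in
patch 04 of this series. It is not part of the series itself; vm_fd,
hdr_addr, hdr_len, img_gaddr, img_size and tweak are placeholders, and
error handling is omitted:

	struct kvm_pv_cmd cmd = { .cmd = KVM_PV_VM_CREATE };

	/* Register the guest with the UV and donate the needed storage. */
	ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);

	/* Hand over the SE header belonging to the encrypted boot blob. */
	struct kvm_s390_pv_sec_parm parm = {
		.origin = hdr_addr,
		.length = hdr_len,
	};
	cmd.cmd = KVM_PV_VM_SET_SEC_PARMS;
	cmd.data = (__u64)(unsigned long)&parm;
	ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);

	/* Unpack (decrypt) the image loaded via the normal boot path. */
	struct kvm_s390_pv_unp unp = {
		.addr = img_gaddr,
		.size = img_size,
		.tweak = tweak,
	};
	cmd.cmd = KVM_PV_VM_UNPACK;
	cmd.data = (__u64)(unsigned long)&unp;
	ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);

	/* Let the UV verify the image integrity before starting VCPUs. */
	cmd.cmd = KVM_PV_VM_VERIFY;
	cmd.data = 0;
	ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);

For a reboot, the protected VCPUs and the protected configuration are
destroyed (KVM_PV_VCPU_DESTROY, KVM_PV_VM_DESTROY), the guest runs
through the normal boot path again, and the sequence above is repeated.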

All patches are in the protvirt branch of the korg s390 kvm git.

Claudio will present the technology at KVM Forum 2019.

Christian Borntraeger (1):
  KVM: s390: protvirt: Add SCLP handling

Claudio Imbrenda (2):
  KVM: s390: add missing include in gmap.h
  KVM: s390: protvirt: Implement on-demand pinning

Janosch Frank (27):
  DOCUMENTATION: protvirt: Protected virtual machine introduction
  KVM: s390: protvirt: Add initial lifecycle handling
  s390: KVM: Export PV handle to gmap
  s390: UV: Add import and export to UV library
  KVM: s390: protvirt: Secure memory is not mergeable
  DOCUMENTATION: protvirt: Interrupt injection
  KVM: s390: protvirt: Handle SE notification interceptions
  DOCUMENTATION: protvirt: Instruction emulation
  KVM: s390: protvirt: Handle spec exception loops
  KVM: s390: protvirt: Add new gprs location handling
  KVM: S390: protvirt: Introduce instruction data area bounce buffer
  KVM: S390: protvirt: Instruction emulation
  KVM: s390: protvirt: Make sure prefix is always protected
  KVM: s390: protvirt: Write sthyi data to instruction data area
  KVM: s390: protvirt: STSI handling
  KVM: s390: protvirt: Only sync fmt4 registers
  KVM: s390: protvirt: SIGP handling
  KVM: s390: protvirt: Add program exception injection
  KVM: s390: protvirt: Sync pv state
  DOCUMENTATION: protvirt: Diag 308 IPL
  KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  KVM: s390: protvirt: UV calls diag308 0, 1
  KVM: s390: Introduce VCPU reset IOCTL
  KVM: s390: protvirt: Report CPU state to Ultravisor
  KVM: s390: Fix cpu reset local IRQ clearing
  KVM: s390: protvirt: Support cmd 5 operation state
  KVM: s390: protvirt: Add UV debug trace

Michael Mueller (4):
  KVM: s390: protvirt: Add interruption injection controls
  KVM: s390: protvirt: Implement interruption injection
  KVM: s390: protvirt: Add machine-check interruption injection controls
  KVM: s390: protvirt: Implement machine-check interruption injection

Vasily Gorbik (3):
  s390/protvirt: introduce host side setup
  s390/protvirt: add ultravisor initialization
  s390: add (non)secure page access exceptions handlers

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/virtual/kvm/s390-pv-boot.txt    |  62 +++
 Documentation/virtual/kvm/s390-pv.txt         |  97 ++++
 arch/s390/boot/Makefile                       |   2 +-
 arch/s390/boot/uv.c                           |  20 +-
 arch/s390/include/asm/gmap.h                  |   4 +
 arch/s390/include/asm/kvm_host.h              | 103 +++-
 arch/s390/include/asm/uv.h                    | 255 +++++++++-
 arch/s390/include/uapi/asm/kvm.h              |   5 +-
 arch/s390/kernel/Makefile                     |   1 +
 arch/s390/kernel/pgm_check.S                  |   4 +-
 arch/s390/kernel/setup.c                      |   7 +-
 arch/s390/kernel/uv.c                         | 121 +++++
 arch/s390/kvm/Kconfig                         |   9 +
 arch/s390/kvm/Makefile                        |   2 +-
 arch/s390/kvm/diag.c                          |   7 +
 arch/s390/kvm/intercept.c                     |  91 +++-
 arch/s390/kvm/interrupt.c                     | 208 ++++++--
 arch/s390/kvm/kvm-s390.c                      | 476 +++++++++++++++---
 arch/s390/kvm/kvm-s390.h                      |  58 +++
 arch/s390/kvm/priv.c                          |   9 +-
 arch/s390/kvm/pv.c                            | 317 ++++++++++++
 arch/s390/mm/fault.c                          |  64 +++
 arch/s390/mm/gmap.c                           |  28 +-
 include/uapi/linux/kvm.h                      |  42 ++
 25 files changed, 1848 insertions(+), 149 deletions(-)
 create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt
 create mode 100644 Documentation/virtual/kvm/s390-pv.txt
 create mode 100644 arch/s390/kernel/uv.c
 create mode 100644 arch/s390/kvm/pv.c

-- 
2.20.1

* [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-01  8:18   ` Christian Borntraeger
  2019-11-04 14:18   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 02/37] s390/protvirt: introduce host side setup Janosch Frank
                   ` (35 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Introduction to Protected VMs.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 Documentation/virtual/kvm/s390-pv.txt | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 Documentation/virtual/kvm/s390-pv.txt

diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
new file mode 100644
index 000000000000..86ed95f36759
--- /dev/null
+++ b/Documentation/virtual/kvm/s390-pv.txt
@@ -0,0 +1,23 @@
+Ultravisor and Protected VMs
+===========================
+
+Summary:
+
+Protected VMs (PVMs) are KVM VMs whose state, such as guest memory and
+guest registers, can no longer be accessed by KVM. Instead, PVMs are
+mostly managed by a new entity called the Ultravisor (UV), which provides
+an API through which KVM and the PVM can request management actions.
+
+Each guest starts in non-protected mode and then transitions into
+protected mode. On transition, KVM registers the guest and its VCPUs
+with the Ultravisor and prepares everything for running it.
+
+The Ultravisor will secure and decrypt the guest's boot memory
+(i.e. kernel/initrd). It will safeguard state changes like VCPU
+starts/stops and injected interrupts while the guest is running.
+
+As access to the guest's state, like the SIE state description, is
+normally needed to run a VM, some changes have been made to SIE
+behavior, and some fields have a different meaning for a PVM. SIE
+exits are minimized as much as possible to improve speed and reduce
+exposed guest state.
-- 
2.20.1

* [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
  2019-10-24 11:40 ` [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-24 13:25   ` David Hildenbrand
                     ` (3 more replies)
  2019-10-24 11:40 ` [RFC 03/37] s390/protvirt: add ultravisor initialization Janosch Frank
                   ` (34 subsequent siblings)
  36 siblings, 4 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Vasily Gorbik <gor@linux.ibm.com>

Introduce the KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
the code that supports hosting protected virtual machines.

Add a "prot_virt" command line option which controls whether the
kernel's protected VM hosting support is enabled at runtime (see the
usage sketch below).

Extend the ultravisor info definitions and expose them via the uv_info
struct, which is filled in during startup.
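
A short usage sketch (not part of the patch): with the new Kconfig
option enabled, hosting support is requested by booting the host
kernel with

	prot_virt=1

Host-side code can then gate on the helper introduced here, for
example (the -EOPNOTSUPP return value is only illustrative):

	if (!is_prot_virt_host())
		return -EOPNOTSUPP;

The option is ignored, with an informational message, when the kernel
itself runs as a protected virtualization guest or when the ultravisor
call facility (facility 158) is not available.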

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 ++
 arch/s390/boot/Makefile                       |  2 +-
 arch/s390/boot/uv.c                           | 20 +++++++-
 arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
 arch/s390/kernel/Makefile                     |  1 +
 arch/s390/kernel/setup.c                      |  4 --
 arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
 arch/s390/kvm/Kconfig                         |  9 ++++
 8 files changed, 126 insertions(+), 9 deletions(-)
 create mode 100644 arch/s390/kernel/uv.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7ac2f3ac99f..aa22e36b3105 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3693,6 +3693,11 @@
 			before loading.
 			See Documentation/admin-guide/blockdev/ramdisk.rst.
 
+	prot_virt=	[S390] enable hosting protected virtual machines
+			isolated from the hypervisor (if hardware supports
+			that).
+			Format: <bool>
+
 	psi=		[KNL] Enable or disable pressure stall information
 			tracking.
 			Format: <bool>
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index e2c47d3a1c89..82247e71617a 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
 obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
 obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
 obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
-obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
+obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST))	+= uv.o
 obj-$(CONFIG_RELOCATABLE)	+= machine_kexec_reloc.o
 obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 targets	:= bzImage startup.a section_cmp.boot.data section_cmp.boot.preserved.data $(obj-y)
diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
index ed007f4a6444..88cf8825d169 100644
--- a/arch/s390/boot/uv.c
+++ b/arch/s390/boot/uv.c
@@ -3,7 +3,12 @@
 #include <asm/facility.h>
 #include <asm/sections.h>
 
+#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
 int __bootdata_preserved(prot_virt_guest);
+#endif
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+struct uv_info __bootdata_preserved(uv_info);
+#endif
 
 void uv_query_info(void)
 {
@@ -18,7 +23,20 @@ void uv_query_info(void)
 	if (uv_call(0, (uint64_t)&uvcb))
 		return;
 
-	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
+	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)) {
+		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
+		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
+		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
+		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
+		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
+		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
+		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
+		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
+		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
+	}
+
+	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
+	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
 	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))
 		prot_virt_guest = 1;
 }
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index ef3c00b049ab..6db1bc495e67 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -44,7 +44,19 @@ struct uv_cb_qui {
 	struct uv_cb_header header;
 	u64 reserved08;
 	u64 inst_calls_list[4];
-	u64 reserved30[15];
+	u64 reserved30[2];
+	u64 uv_base_stor_len;
+	u64 reserved48;
+	u64 conf_base_phys_stor_len;
+	u64 conf_base_virt_stor_len;
+	u64 conf_virt_var_stor_len;
+	u64 cpu_stor_len;
+	u32 reserved68[3];
+	u32 max_num_sec_conf;
+	u64 max_guest_stor_addr;
+	u8  reserved80[150-128];
+	u16 max_guest_cpus;
+	u64 reserved98;
 } __packed __aligned(8);
 
 struct uv_cb_share {
@@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
 	return cc;
 }
 
-#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
+struct uv_info {
+	unsigned long inst_calls_list[4];
+	unsigned long uv_base_stor_len;
+	unsigned long guest_base_stor_len;
+	unsigned long guest_virt_base_stor_len;
+	unsigned long guest_virt_var_stor_len;
+	unsigned long guest_cpu_stor_len;
+	unsigned long max_sec_stor_addr;
+	unsigned int max_num_sec_conf;
+	unsigned short max_guest_cpus;
+};
+extern struct uv_info uv_info;
 extern int prot_virt_guest;
 
+#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
 static inline int is_prot_virt_guest(void)
 {
 	return prot_virt_guest;
@@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
 	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
 }
 
-void uv_query_info(void);
 #else
 #define is_prot_virt_guest() 0
 static inline int uv_set_shared(unsigned long addr) { return 0; }
 static inline int uv_remove_shared(unsigned long addr) { return 0; }
+#endif
+
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+extern int prot_virt_host;
+
+static inline int is_prot_virt_host(void)
+{
+	return prot_virt_host;
+}
+#else
+#define is_prot_virt_host() 0
+#endif
+
+#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
+	defined(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)
+void uv_query_info(void);
+#else
 static inline void uv_query_info(void) {}
 #endif
 
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index 7edbbcd8228a..fe4fe475f526 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -78,6 +78,7 @@ obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_events.o perf_regs.o
 obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_diag.o
 
 obj-$(CONFIG_TRACEPOINTS)	+= trace.o
+obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST))	+= uv.o
 
 # vdso
 obj-y				+= vdso64/
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 3ff291bc63b7..f36370f8af38 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -92,10 +92,6 @@ char elf_platform[ELF_PLATFORM_SIZE];
 
 unsigned long int_hwcap = 0;
 
-#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
-int __bootdata_preserved(prot_virt_guest);
-#endif
-
 int __bootdata(noexec_disabled);
 int __bootdata(memory_end_set);
 unsigned long __bootdata(memory_end);
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
new file mode 100644
index 000000000000..35ce89695509
--- /dev/null
+++ b/arch/s390/kernel/uv.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common Ultravisor functions and initialization
+ *
+ * Copyright IBM Corp. 2019
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/bitmap.h>
+#include <linux/memblock.h>
+#include <asm/facility.h>
+#include <asm/sections.h>
+#include <asm/uv.h>
+
+#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
+int __bootdata_preserved(prot_virt_guest);
+#endif
+
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+int prot_virt_host;
+EXPORT_SYMBOL(prot_virt_host);
+struct uv_info __bootdata_preserved(uv_info);
+EXPORT_SYMBOL(uv_info);
+
+static int __init prot_virt_setup(char *val)
+{
+	bool enabled;
+	int rc;
+
+	rc = kstrtobool(val, &enabled);
+	if (!rc && enabled)
+		prot_virt_host = 1;
+
+	if (is_prot_virt_guest() && prot_virt_host) {
+		prot_virt_host = 0;
+		pr_info("Running as protected virtualization guest.");
+	}
+
+	if (prot_virt_host && !test_facility(158)) {
+		prot_virt_host = 0;
+		pr_info("The ultravisor call facility is not available.");
+	}
+
+	return rc;
+}
+early_param("prot_virt", prot_virt_setup);
+#endif
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index d3db3d7ed077..652b36f0efca 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -55,6 +55,15 @@ config KVM_S390_UCONTROL
 
 	  If unsure, say N.
 
+config KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+	bool "Protected guests execution support"
+	depends on KVM
+	---help---
+	  Support hosting protected virtual machines isolated from the
+	  hypervisor.
+
+	  If unsure, say Y.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source "drivers/vhost/Kconfig"
-- 
2.20.1

* [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
  2019-10-24 11:40 ` [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction Janosch Frank
  2019-10-24 11:40 ` [RFC 02/37] s390/protvirt: introduce host side setup Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-25  9:21   ` David Hildenbrand
                     ` (2 more replies)
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
                   ` (33 subsequent siblings)
  36 siblings, 3 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Vasily Gorbik <gor@linux.ibm.com>

Before being able to host protected virtual machines, donate some
memory to the ultravisor. Besides that, the ultravisor might impose
addressing limitations on the memory used to back protected VM
storage. Treat that limit as the protected virtualization host's
virtual memory limit.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 arch/s390/include/asm/uv.h | 16 ++++++++++++
 arch/s390/kernel/setup.c   |  3 +++
 arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 6db1bc495e67..82a46fb913e7 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -23,12 +23,14 @@
 #define UVC_RC_NO_RESUME	0x0007
 
 #define UVC_CMD_QUI			0x0001
+#define UVC_CMD_INIT_UV			0x000f
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
 /* Bits in installed uv calls */
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
+	BIT_UVC_CMD_INIT_UV = 1,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
 };
@@ -59,6 +61,15 @@ struct uv_cb_qui {
 	u64 reserved98;
 } __packed __aligned(8);
 
+struct uv_cb_init {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 stor_origin;
+	u64 stor_len;
+	u64 reserved28[4];
+
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -158,8 +169,13 @@ static inline int is_prot_virt_host(void)
 {
 	return prot_virt_host;
 }
+
+void setup_uv(void);
+void adjust_to_uv_max(unsigned long *vmax);
 #else
 #define is_prot_virt_host() 0
+static inline void setup_uv(void) {}
+static inline void adjust_to_uv_max(unsigned long *vmax) {}
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index f36370f8af38..d29d83c0b8df 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -567,6 +567,8 @@ static void __init setup_memory_end(void)
 			vmax = _REGION1_SIZE; /* 4-level kernel page table */
 	}
 
+	adjust_to_uv_max(&vmax);
+
 	/* module area is at the end of the kernel address space. */
 	MODULES_END = vmax;
 	MODULES_VADDR = MODULES_END - MODULES_LEN;
@@ -1147,6 +1149,7 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	memblock_trim_memory(1UL << (MAX_ORDER - 1 + PAGE_SHIFT));
 
+	setup_uv();
 	setup_memory_end();
 	setup_memory();
 	dma_contiguous_reserve(memory_end);
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 35ce89695509..f7778493e829 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -45,4 +45,57 @@ static int __init prot_virt_setup(char *val)
 	return rc;
 }
 early_param("prot_virt", prot_virt_setup);
+
+static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
+{
+	struct uv_cb_init uvcb = {
+		.header.cmd = UVC_CMD_INIT_UV,
+		.header.len = sizeof(uvcb),
+		.stor_origin = stor_base,
+		.stor_len = stor_len,
+	};
+	int cc;
+
+	cc = uv_call(0, (uint64_t)&uvcb);
+	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
+		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
+		       uvcb.header.rc);
+		return -1;
+	}
+	return 0;
+}
+
+void __init setup_uv(void)
+{
+	unsigned long uv_stor_base;
+
+	if (!prot_virt_host)
+		return;
+
+	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
+		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
+		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
+	if (!uv_stor_base) {
+		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
+			uv_info.uv_base_stor_len);
+		goto fail;
+	}
+
+	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
+		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
+		goto fail;
+	}
+
+	pr_info("Reserving %luMB as ultravisor base storage\n",
+		uv_info.uv_base_stor_len >> 20);
+	return;
+fail:
+	prot_virt_host = 0;
+}
+
+void adjust_to_uv_max(unsigned long *vmax)
+{
+	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
+		*vmax = uv_info.max_sec_stor_addr;
+}
 #endif
-- 
2.20.1

* [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (2 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 03/37] s390/protvirt: add ultravisor initialization Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-25  8:58   ` David Hildenbrand
                     ` (5 more replies)
  2019-10-24 11:40 ` [RFC 05/37] s390: KVM: Export PV handle to gmap Janosch Frank
                   ` (32 subsequent siblings)
  36 siblings, 6 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Let's add a KVM interface to create and destroy protected VMs.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  24 +++-
 arch/s390/include/asm/uv.h       | 110 ++++++++++++++
 arch/s390/kvm/Makefile           |   2 +-
 arch/s390/kvm/kvm-s390.c         | 173 +++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.h         |  47 ++++++
 arch/s390/kvm/pv.c               | 237 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h         |  33 +++++
 7 files changed, 622 insertions(+), 4 deletions(-)
 create mode 100644 arch/s390/kvm/pv.c

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 02f4c21c57f6..d4fd0f3af676 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -155,7 +155,13 @@ struct kvm_s390_sie_block {
 	__u8	reserved08[4];		/* 0x0008 */
 #define PROG_IN_SIE (1<<0)
 	__u32	prog0c;			/* 0x000c */
-	__u8	reserved10[16];		/* 0x0010 */
+	union {
+		__u8	reserved10[16];		/* 0x0010 */
+		struct {
+			__u64	pv_handle_cpu;
+			__u64	pv_handle_config;
+		};
+	};
 #define PROG_BLOCK_SIE	(1<<0)
 #define PROG_REQUEST	(1<<1)
 	atomic_t prog20;		/* 0x0020 */
@@ -228,7 +234,7 @@ struct kvm_s390_sie_block {
 #define ECB3_RI  0x01
 	__u8    ecb3;			/* 0x0063 */
 	__u32	scaol;			/* 0x0064 */
-	__u8	reserved68;		/* 0x0068 */
+	__u8	sdf;			/* 0x0068 */
 	__u8    epdx;			/* 0x0069 */
 	__u8    reserved6a[2];		/* 0x006a */
 	__u32	todpr;			/* 0x006c */
@@ -640,6 +646,11 @@ struct kvm_guestdbg_info_arch {
 	unsigned long last_bp;
 };
 
+struct kvm_s390_pv_vcpu {
+	u64 handle;
+	unsigned long stor_base;
+};
+
 struct kvm_vcpu_arch {
 	struct kvm_s390_sie_block *sie_block;
 	/* if vsie is active, currently executed shadow sie control block */
@@ -668,6 +679,7 @@ struct kvm_vcpu_arch {
 	__u64 cputm_start;
 	bool gs_enabled;
 	bool skey_enabled;
+	struct kvm_s390_pv_vcpu pv;
 };
 
 struct kvm_vm_stat {
@@ -841,6 +853,13 @@ struct kvm_s390_gisa_interrupt {
 	DECLARE_BITMAP(kicked_mask, KVM_MAX_VCPUS);
 };
 
+struct kvm_s390_pv {
+	u64 handle;
+	u64 guest_len;
+	unsigned long stor_base;
+	void *stor_var;
+};
+
 struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -876,6 +895,7 @@ struct kvm_arch{
 	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
 	DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
 	struct kvm_s390_gisa_interrupt gisa_int;
+	struct kvm_s390_pv pv;
 };
 
 #define KVM_HVA_ERR_BAD		(-1UL)
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 82a46fb913e7..0bfbafcca136 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -21,9 +21,19 @@
 #define UVC_RC_INV_STATE	0x0003
 #define UVC_RC_INV_LEN		0x0005
 #define UVC_RC_NO_RESUME	0x0007
+#define UVC_RC_NEED_DESTROY	0x8000
 
 #define UVC_CMD_QUI			0x0001
 #define UVC_CMD_INIT_UV			0x000f
+#define UVC_CMD_CREATE_SEC_CONF		0x0100
+#define UVC_CMD_DESTROY_SEC_CONF	0x0101
+#define UVC_CMD_CREATE_SEC_CPU		0x0120
+#define UVC_CMD_DESTROY_SEC_CPU		0x0121
+#define UVC_CMD_CONV_TO_SEC_STOR	0x0200
+#define UVC_CMD_CONV_FROM_SEC_STOR	0x0201
+#define UVC_CMD_SET_SEC_CONF_PARAMS	0x0300
+#define UVC_CMD_UNPACK_IMG		0x0301
+#define UVC_CMD_VERIFY_IMG		0x0302
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
@@ -31,8 +41,17 @@
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
 	BIT_UVC_CMD_INIT_UV = 1,
+	BIT_UVC_CMD_CREATE_SEC_CONF = 2,
+	BIT_UVC_CMD_DESTROY_SEC_CONF = 3,
+	BIT_UVC_CMD_CREATE_SEC_CPU = 4,
+	BIT_UVC_CMD_DESTROY_SEC_CPU = 5,
+	BIT_UVC_CMD_CONV_TO_SEC_STOR = 6,
+	BIT_UVC_CMD_CONV_FROM_SEC_STOR = 7,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
+	BIT_UVC_CMD_SET_SEC_PARMS = 11,
+	BIT_UVC_CMD_UNPACK_IMG = 13,
+	BIT_UVC_CMD_VERIFY_IMG = 14,
 };
 
 struct uv_cb_header {
@@ -70,6 +89,76 @@ struct uv_cb_init {
 
 } __packed __aligned(8);
 
+struct uv_cb_cgc {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 conf_base_stor_origin;
+	u64 conf_var_stor_origin;
+	u64 reserved30;
+	u64 guest_stor_origin;
+	u64 guest_stor_len;
+	u64 reserved48;
+	u64 guest_asce;
+	u64 reserved60[5];
+} __packed __aligned(8);
+
+struct uv_cb_csc {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 cpu_handle;
+	u64 guest_handle;
+	u64 stor_origin;
+	u8  reserved30[6];
+	u16 num;
+	u64 state_origin;
+	u64 reserved[4];
+} __packed __aligned(8);
+
+struct uv_cb_cts {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 gaddr;
+} __packed __aligned(8);
+
+struct uv_cb_cfs {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 paddr;
+} __packed __aligned(8);
+
+struct uv_cb_ssc {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 sec_header_origin;
+	u32 sec_header_len;
+	u32 reserved32;
+	u64 reserved38[4];
+} __packed __aligned(8);
+
+struct uv_cb_unp {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 gaddr;
+	u64 tweak[2];
+	u64 reserved28[3];
+} __packed __aligned(8);
+
+/*
+ * A common UV call struct for the following calls:
+ * Destroy cpu/config
+ * Verify
+ */
+struct uv_cb_nodata {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 handle;
+	u64 reserved20[4];
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -170,12 +259,33 @@ static inline int is_prot_virt_host(void)
 	return prot_virt_host;
 }
 
+/*
+ * Generic cmd executor for calls that only transport the cpu or guest
+ * handle and the command.
+ */
+static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
+{
+	int rc;
+	struct uv_cb_nodata uvcb = {
+		.header.cmd = cmd,
+		.header.len = sizeof(uvcb),
+		.handle = handle,
+	};
+
+	WARN(!handle, "No handle provided to Ultravisor call cmd %x\n", cmd);
+	rc = uv_call(0, (u64)&uvcb);
+	if (ret)
+		*ret = *(u32 *)&uvcb.header.rc;
+	return rc ? -EINVAL : 0;
+}
+
 void setup_uv(void);
 void adjust_to_uv_max(unsigned long *vmax);
 #else
 #define is_prot_virt_host() 0
 static inline void setup_uv(void) {}
 static inline void adjust_to_uv_max(unsigned long *vmax) {}
+static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 05ee90a5ea08..eaaedf9e61a7 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,6 +9,6 @@ common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o  $(KVM)/async_pf.o $(KVM)/irqch
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-objs := $(common-objs) kvm-s390.o intercept.o interrupt.o priv.o sigp.o
-kvm-objs += diag.o gaccess.o guestdbg.o vsie.o
+kvm-objs += diag.o gaccess.o guestdbg.o vsie.o $(if $(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST),pv.o)
 
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3b5ebf48f802..924132d92782 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -44,6 +44,7 @@
 #include <asm/cpacf.h>
 #include <asm/timex.h>
 #include <asm/ap.h>
+#include <asm/uv.h>
 #include "kvm-s390.h"
 #include "gaccess.h"
 
@@ -235,6 +236,7 @@ int kvm_arch_check_processor_compat(void)
 
 static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
 			      unsigned long end);
+static int sca_switch_to_extended(struct kvm *kvm);
 
 static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta)
 {
@@ -563,6 +565,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_BPB:
 		r = test_facility(82);
 		break;
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+	case KVM_CAP_S390_PROTECTED:
+		r = is_prot_virt_host();
+		break;
+#endif
 	default:
 		r = 0;
 	}
@@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 	return r;
 }
 
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
+{
+	int r = 0;
+	void __user *argp = (void __user *)cmd->data;
+
+	switch (cmd->cmd) {
+	case KVM_PV_VM_CREATE: {
+		r = kvm_s390_pv_alloc_vm(kvm);
+		if (r)
+			break;
+
+		mutex_lock(&kvm->lock);
+		kvm_s390_vcpu_block_all(kvm);
+		/* FMT 4 SIE needs esca */
+		r = sca_switch_to_extended(kvm);
+		if (!r)
+			r = kvm_s390_pv_create_vm(kvm);
+		kvm_s390_vcpu_unblock_all(kvm);
+		mutex_unlock(&kvm->lock);
+		break;
+	}
+	case KVM_PV_VM_DESTROY: {
+		/* All VCPUs have to be destroyed before this call. */
+		mutex_lock(&kvm->lock);
+		kvm_s390_vcpu_block_all(kvm);
+		r = kvm_s390_pv_destroy_vm(kvm);
+		if (!r)
+			kvm_s390_pv_dealloc_vm(kvm);
+		kvm_s390_vcpu_unblock_all(kvm);
+		mutex_unlock(&kvm->lock);
+		break;
+	}
+	case KVM_PV_VM_SET_SEC_PARMS: {
+		struct kvm_s390_pv_sec_parm parms = {};
+		void *hdr;
+
+		r = -EFAULT;
+		if (copy_from_user(&parms, argp, sizeof(parms)))
+			break;
+
+		/* Currently restricted to 8KB */
+		r = -EINVAL;
+		if (parms.length > PAGE_SIZE * 2)
+			break;
+
+		r = -ENOMEM;
+		hdr = vmalloc(parms.length);
+		if (!hdr)
+			break;
+
+		r = -EFAULT;
+		if (!copy_from_user(hdr, (void __user *)parms.origin,
+				   parms.length))
+			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
+
+		vfree(hdr);
+		break;
+	}
+	case KVM_PV_VM_UNPACK: {
+		struct kvm_s390_pv_unp unp = {};
+
+		r = -EFAULT;
+		if (copy_from_user(&unp, argp, sizeof(unp)))
+			break;
+
+		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
+		break;
+	}
+	case KVM_PV_VM_VERIFY: {
+		u32 ret;
+
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+				  UVC_CMD_VERIFY_IMG,
+				  &ret);
+		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
+			 ret >> 16, ret & 0x0000ffff);
+		break;
+	}
+	default:
+		return -ENOTTY;
+	}
+	return r;
+}
+#endif
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -2254,6 +2351,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		mutex_unlock(&kvm->slots_lock);
 		break;
 	}
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+	case KVM_S390_PV_COMMAND: {
+		struct kvm_pv_cmd args;
+
+		r = -EINVAL;
+		if (!is_prot_virt_host())
+			break;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+
+		r = kvm_s390_handle_pv(kvm, &args);
+		break;
+	}
+#endif
 	default:
 		r = -ENOTTY;
 	}
@@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 	if (vcpu->kvm->arch.use_cmma)
 		kvm_s390_vcpu_unsetup_cmma(vcpu);
+	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
+	    kvm_s390_pv_handle_cpu(vcpu))
+		kvm_s390_pv_destroy_cpu(vcpu);
 	free_page((unsigned long)(vcpu->arch.sie_block));
 
 	kvm_vcpu_uninit(vcpu);
@@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	kvm_free_vcpus(kvm);
 	sca_dispose(kvm);
-	debug_unregister(kvm->arch.dbf);
 	kvm_s390_gisa_destroy(kvm);
+	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
+	    kvm_s390_pv_is_protected(kvm)) {
+		kvm_s390_pv_destroy_vm(kvm);
+		kvm_s390_pv_dealloc_vm(kvm);
+	}
+	debug_unregister(kvm->arch.dbf);
 	free_page((unsigned long)kvm->arch.sie_page2);
 	if (!kvm_is_ucontrol(kvm))
 		gmap_remove(kvm->arch.gmap);
@@ -2652,6 +2773,9 @@ static int sca_switch_to_extended(struct kvm *kvm)
 	unsigned int vcpu_idx;
 	u32 scaol, scaoh;
 
+	if (kvm->arch.use_esca)
+		return 0;
+
 	new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL|__GFP_ZERO);
 	if (!new_sca)
 		return -ENOMEM;
@@ -3073,6 +3197,15 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 	rc = kvm_vcpu_init(vcpu, kvm, id);
 	if (rc)
 		goto out_free_sie_block;
+
+	if (kvm_s390_pv_is_protected(kvm)) {
+		rc = kvm_s390_pv_create_cpu(vcpu);
+		if (rc) {
+			kvm_vcpu_uninit(vcpu);
+			goto out_free_sie_block;
+		}
+	}
+
 	VM_EVENT(kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK", id, vcpu,
 		 vcpu->arch.sie_block);
 	trace_kvm_s390_create_vcpu(id, vcpu, vcpu->arch.sie_block);
@@ -4338,6 +4471,28 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
 	return -ENOIOCTLCMD;
 }
 
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
+				   struct kvm_pv_cmd *cmd)
+{
+	int r = 0;
+
+	switch (cmd->cmd) {
+	case KVM_PV_VCPU_CREATE: {
+		r = kvm_s390_pv_create_cpu(vcpu);
+		break;
+	}
+	case KVM_PV_VCPU_DESTROY: {
+		r = kvm_s390_pv_destroy_cpu(vcpu);
+		break;
+	}
+	default:
+		r = -ENOTTY;
+	}
+	return r;
+}
+#endif
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -4470,6 +4625,22 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 					   irq_state.len);
 		break;
 	}
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+	case KVM_S390_PV_COMMAND_VCPU: {
+		struct kvm_pv_cmd args;
+
+		r = -EINVAL;
+		if (!is_prot_virt_host())
+			break;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+
+		r = kvm_s390_handle_pv_vcpu(vcpu, &args);
+		break;
+	}
+#endif
 	default:
 		r = -ENOTTY;
 	}
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 6d9448dbd052..0d61dcc51f0e 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
 	return kvm->arch.user_cpu_state_ctrl != 0;
 }
 
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+/* implemented in pv.c */
+void kvm_s390_pv_unpin(struct kvm *kvm);
+void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
+int kvm_s390_pv_alloc_vm(struct kvm *kvm);
+int kvm_s390_pv_create_vm(struct kvm *kvm);
+int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
+int kvm_s390_pv_destroy_vm(struct kvm *kvm);
+int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
+int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
+int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
+		       unsigned long tweak);
+int kvm_s390_pv_verify(struct kvm *kvm);
+
+static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
+{
+	return !!kvm->arch.pv.handle;
+}
+
+static inline u64 kvm_s390_pv_handle(struct kvm *kvm)
+{
+	return kvm->arch.pv.handle;
+}
+
+static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pv.handle;
+}
+#else
+static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
+static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
+static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
+static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
+static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
+static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
+static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
+static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
+					    void *hdr, u64 length) { return 0; }
+static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
+				     unsigned long size,  unsigned long tweak)
+{ return 0; }
+static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
+static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
+static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
+static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
+#endif
+
 /* implemented in interrupt.c */
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
 void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
new file mode 100644
index 000000000000..94cf16f40f25
--- /dev/null
+++ b/arch/s390/kvm/pv.c
@@ -0,0 +1,237 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hosting Secure Execution virtual machines
+ *
+ * Copyright IBM Corp. 2019
+ *    Author(s): Janosch Frank <frankja@linux.ibm.com>
+ */
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <asm/pgalloc.h>
+#include <asm/gmap.h>
+#include <asm/uv.h>
+#include <asm/gmap.h>
+#include <asm/mman.h>
+#include "kvm-s390.h"
+
+void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
+{
+	vfree(kvm->arch.pv.stor_var);
+	free_pages(kvm->arch.pv.stor_base,
+		   get_order(uv_info.guest_base_stor_len));
+	memset(&kvm->arch.pv, 0, sizeof(kvm->arch.pv));
+}
+
+int kvm_s390_pv_alloc_vm(struct kvm *kvm)
+{
+	unsigned long base = uv_info.guest_base_stor_len;
+	unsigned long virt = uv_info.guest_virt_var_stor_len;
+	unsigned long npages = 0, vlen = 0;
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+
+	kvm->arch.pv.stor_var = NULL;
+	kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL, get_order(base));
+	if (!kvm->arch.pv.stor_base)
+		return -ENOMEM;
+
+	/*
+	 * Calculate current guest storage for allocation of the
+	 * variable storage, which is based on the length in MB.
+	 *
+	 * Slots are sorted by GFN
+	 */
+	mutex_lock(&kvm->slots_lock);
+	slots = kvm_memslots(kvm);
+	memslot = slots->memslots;
+	npages = memslot->base_gfn + memslot->npages;
+
+	mutex_unlock(&kvm->slots_lock);
+	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
+
+	/* Allocate variable storage */
+	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
+	vlen += uv_info.guest_virt_base_stor_len;
+	kvm->arch.pv.stor_var = vzalloc(vlen);
+	if (!kvm->arch.pv.stor_var) {
+		kvm_s390_pv_dealloc_vm(kvm);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+int kvm_s390_pv_destroy_vm(struct kvm *kvm)
+{
+	int rc;
+	u32 ret;
+
+	rc = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+			   UVC_CMD_DESTROY_SEC_CONF, &ret);
+	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
+		 ret >> 16, ret & 0x0000ffff);
+	return rc;
+}
+
+int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
+{
+	int rc = 0;
+	u32 ret;
+
+	if (kvm_s390_pv_handle_cpu(vcpu)) {
+		rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+				   UVC_CMD_DESTROY_SEC_CPU,
+				   &ret);
+
+		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
+			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+	}
+
+	free_pages(vcpu->arch.pv.stor_base,
+		   get_order(uv_info.guest_cpu_stor_len));
+	/* Clear cpu and vm handle */
+	memset(&vcpu->arch.sie_block->reserved10, 0,
+	       sizeof(vcpu->arch.sie_block->reserved10));
+	memset(&vcpu->arch.pv, 0, sizeof(vcpu->arch.pv));
+	vcpu->arch.sie_block->sdf = 0;
+	return rc;
+}
+
+int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
+{
+	int rc;
+	struct uv_cb_csc uvcb = {
+		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
+		.header.len = sizeof(uvcb),
+	};
+
+	/* EEXIST and ENOENT? */
+	if (kvm_s390_pv_handle_cpu(vcpu))
+		return -EINVAL;
+
+	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
+						   get_order(uv_info.guest_cpu_stor_len));
+	if (!vcpu->arch.pv.stor_base)
+		return -ENOMEM;
+
+	/* Input */
+	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
+	uvcb.num = vcpu->arch.sie_block->icpua;
+	uvcb.state_origin = (u64)vcpu->arch.sie_block;
+	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
+
+	rc = uv_call(0, (u64)&uvcb);
+	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
+		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
+		   uvcb.header.rrc);
+
+	/* Output */
+	vcpu->arch.pv.handle = uvcb.cpu_handle;
+	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
+	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
+	vcpu->arch.sie_block->sdf = 2;
+	if (!rc)
+		return 0;
+
+	kvm_s390_pv_destroy_cpu(vcpu);
+	return -EINVAL;
+}
+
+int kvm_s390_pv_create_vm(struct kvm *kvm)
+{
+	int rc;
+
+	struct uv_cb_cgc uvcb = {
+		.header.cmd = UVC_CMD_CREATE_SEC_CONF,
+		.header.len = sizeof(uvcb)
+	};
+
+	if (kvm_s390_pv_handle(kvm))
+		return -EINVAL;
+
+	/* Inputs */
+	uvcb.guest_stor_origin = 0; /* MSO is 0 for KVM */
+	uvcb.guest_stor_len = kvm->arch.pv.guest_len;
+	uvcb.guest_asce = kvm->arch.gmap->asce;
+	uvcb.conf_base_stor_origin = (u64)kvm->arch.pv.stor_base;
+	uvcb.conf_var_stor_origin = (u64)kvm->arch.pv.stor_var;
+
+	rc = uv_call(0, (u64)&uvcb);
+	VM_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
+		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
+		 uvcb.header.rrc);
+
+	/* Outputs */
+	kvm->arch.pv.handle = uvcb.guest_handle;
+
+	if (rc && (uvcb.header.rc & 0x8000)) {
+		kvm_s390_pv_destroy_vm(kvm);
+		kvm_s390_pv_dealloc_vm(kvm);
+		return -EINVAL;
+	}
+	return rc;
+}
+
+int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
+			      void *hdr, u64 length)
+{
+	int rc;
+	struct uv_cb_ssc uvcb = {
+		.header.cmd = UVC_CMD_SET_SEC_CONF_PARAMS,
+		.header.len = sizeof(uvcb),
+		.sec_header_origin = (u64)hdr,
+		.sec_header_len = length,
+		.guest_handle = kvm_s390_pv_handle(kvm),
+	};
+
+	if (!kvm_s390_pv_handle(kvm))
+		return -EINVAL;
+
+	rc = uv_call(0, (u64)&uvcb);
+	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
+		 uvcb.header.rc, uvcb.header.rrc);
+	if (rc)
+		return -EINVAL;
+	return 0;
+}
+
+int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
+		       unsigned long tweak)
+{
+	int i, rc = 0;
+	struct uv_cb_unp uvcb = {
+		.header.cmd = UVC_CMD_UNPACK_IMG,
+		.header.len = sizeof(uvcb),
+		.guest_handle = kvm_s390_pv_handle(kvm),
+		.tweak[0] = tweak
+	};
+
+	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
+		return -EINVAL;
+
+
+	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
+		 addr, size);
+	for (i = 0; i < size / PAGE_SIZE; i++) {
+		uvcb.gaddr = addr + i * PAGE_SIZE;
+		uvcb.tweak[1] = i * PAGE_SIZE;
+retry:
+		rc = uv_call(0, (u64)&uvcb);
+		if (!rc)
+			continue;
+		/* If not yet mapped fault and retry */
+		if (uvcb.header.rc == 0x10a) {
+			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
+					FAULT_FLAG_WRITE);
+			if (rc)
+				return rc;
+			goto retry;
+		}
+		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
+			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
+		break;
+	}
+	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
+		 uvcb.header.rc, uvcb.header.rrc);
+	return rc;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52641d8ca9e8..bb37d5710c89 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1000,6 +1000,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PMU_EVENT_FILTER 173
 #define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
 #define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
+#define KVM_CAP_S390_PROTECTED 180
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1461,6 +1462,38 @@ struct kvm_enc_region {
 /* Available with KVM_CAP_ARM_SVE */
 #define KVM_ARM_VCPU_FINALIZE	  _IOW(KVMIO,  0xc2, int)
 
+struct kvm_s390_pv_sec_parm {
+	__u64	origin;
+	__u64	length;
+};
+
+struct kvm_s390_pv_unp {
+	__u64 addr;
+	__u64 size;
+	__u64 tweak;
+};
+
+enum pv_cmd_id {
+	KVM_PV_VM_CREATE,
+	KVM_PV_VM_DESTROY,
+	KVM_PV_VM_SET_SEC_PARMS,
+	KVM_PV_VM_UNPACK,
+	KVM_PV_VM_VERIFY,
+	KVM_PV_VCPU_CREATE,
+	KVM_PV_VCPU_DESTROY,
+};
+
+struct kvm_pv_cmd {
+	__u32	cmd;
+	__u16	rc;
+	__u16	rrc;
+	__u64	data;
+};
+
+/* Available with KVM_CAP_S390_SE */
+#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc3, struct kvm_pv_cmd)
+#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc4, struct kvm_pv_cmd)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.20.1

* [RFC 05/37] s390: KVM: Export PV handle to gmap
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (3 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-25  9:04   ` David Hildenbrand
  2019-10-24 11:40 ` [RFC 06/37] s390: UV: Add import and export to UV library Janosch Frank
                   ` (31 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

The next patch needs the handle when doing memory management for the
guest in the kernel's fault handler, where we otherwise would not have
access to it.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h | 1 +
 arch/s390/kvm/pv.c           | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 37f96b6f0e61..6efc0b501227 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -61,6 +61,7 @@ struct gmap {
 	spinlock_t shadow_lock;
 	struct gmap *parent;
 	unsigned long orig_asce;
+	unsigned long se_handle;
 	int edat_level;
 	bool removed;
 	bool initialized;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 94cf16f40f25..80aecd5bea9e 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -169,6 +169,7 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
 		kvm_s390_pv_dealloc_vm(kvm);
 		return -EINVAL;
 	}
+	kvm->arch.gmap->se_handle = uvcb.guest_handle;
 	return rc;
 }
 
-- 
2.20.1

* [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (4 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 05/37] s390: KVM: Export PV handle to gmap Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-25  8:31   ` David Hildenbrand
                     ` (3 more replies)
  2019-10-24 11:40 ` [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable Janosch Frank
                   ` (30 subsequent siblings)
  36 siblings, 4 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

The convert to/from secure (also called "import/export") ultravisor
calls are needed for page management, i.e. paging, of secure execution
VMs.

Export encrypts a secure guest's page and makes it accessible to the
host for paging.

Import makes a page accessible to a secure guest. On the first import
of a page, the page will be cleared by the Ultravisor before it is
given to the guest.

All following imports will decrypt an exported page and verify its
integrity before giving the page to the guest.
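
A minimal sketch of how host-side callers might use the two helpers
(the wrapper names are hypothetical; only uv_convert_from_secure() and
uv_convert_to_secure() come from this patch):

	/* Encrypt a guest page so the host can page it out. */
	static int export_page(unsigned long paddr)
	{
		return uv_convert_from_secure(paddr);
	}

	/* Give a page (back) to the secure guest. */
	static int import_page(struct gmap *gmap, unsigned long gaddr)
	{
		return uv_convert_to_secure(gmap, gaddr);
	}

uv_convert_to_secure() maps the ultravisor return code 0x10a to
-EFAULT, which mirrors the retry logic in patch 04's unpack path,
where that code means the page is not mapped yet and a gmap_fault()
is needed before retrying.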

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 0bfbafcca136..99cdd2034503 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -15,6 +15,7 @@
 #include <linux/errno.h>
 #include <linux/bug.h>
 #include <asm/page.h>
+#include <asm/gmap.h>
 
 #define UVC_RC_EXECUTED		0x0001
 #define UVC_RC_INV_CMD		0x0002
@@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
 	return rc ? -EINVAL : 0;
 }
 
+/*
+ * Requests the Ultravisor to encrypt a guest page and make it
+ * accessible to the host for paging (export).
+ *
+ * @paddr: Absolute host address of page to be exported
+ */
+static inline int uv_convert_from_secure(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.paddr = paddr
+	};
+	if (!uv_call(0, (u64)&uvcb))
+		return 0;
+	return -EINVAL;
+}
+
+/*
+ * Requests the Ultravisor to make a page accessible to a guest
+ * (import). If it's brought in the first time, it will be cleared. If
+ * it has been exported before, it will be decrypted and integrity
+ * checked.
+ *
+ * @handle: Ultravisor guest handle
+ * @gaddr: Guest 2 absolute address to be imported
+ */
+static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
+{
+	int cc;
+	struct uv_cb_cts uvcb = {
+		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.guest_handle = gmap->se_handle,
+		.gaddr = gaddr
+	};
+
+	cc = uv_call(0, (u64)&uvcb);
+
+	if (!cc)
+		return 0;
+	if (uvcb.header.rc == 0x104)
+		return -EEXIST;
+	if (uvcb.header.rc == 0x10a)
+		return -EFAULT;
+	return -EINVAL;
+}
+
 void setup_uv(void);
 void adjust_to_uv_max(unsigned long *vmax);
 #else
@@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
 static inline void setup_uv(void) {}
 static inline void adjust_to_uv_max(unsigned long *vmax) {}
 static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
+static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
+static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr) { return 0; }
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
-- 
2.20.1

* [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (5 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 06/37] s390: UV: Add import and export to UV library Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-24 16:07   ` David Hildenbrand
                     ` (2 more replies)
  2019-10-24 11:40 ` [RFC 08/37] KVM: s390: add missing include in gmap.h Janosch Frank
                   ` (29 subsequent siblings)
  36 siblings, 3 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

KSM will not work on secure pages, because when the kernel reads a
secure page, the content it sees is encrypted, and hence no two pages
will ever look the same.

Let's mark the guest pages as unmergeable when we transition to secure
mode.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h |  1 +
 arch/s390/kvm/kvm-s390.c     |  6 ++++++
 arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 6efc0b501227..eab6a2ec3599 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
 
 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
+int gmap_mark_unmergeable(void);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 924132d92782..d1ba12f857e7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		if (r)
 			break;
 
+		down_write(&current->mm->mmap_sem);
+		r = gmap_mark_unmergeable();
+		up_write(&current->mm->mmap_sem);
+		if (r)
+			break;
+
 		mutex_lock(&kvm->lock);
 		kvm_s390_vcpu_block_all(kvm);
 		/* FMT 4 SIE needs esca */
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index edcdca97e85e..bf365a09f900 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
 }
 EXPORT_SYMBOL_GPL(s390_enable_sie);
 
+int gmap_mark_unmergeable(void)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+				MADV_UNMERGEABLE, &vma->vm_flags)) {
+			mm->context.uses_skeys = 0;
+			return -ENOMEM;
+		}
+	}
+	mm->def_flags &= ~VM_MERGEABLE;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
+
 /*
  * Enable storage key handling from now on and initialize the storage
  * keys with the default key.
@@ -2593,7 +2610,6 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
 int s390_enable_skey(void)
 {
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma;
 	int rc = 0;
 
 	down_write(&mm->mmap_sem);
@@ -2601,15 +2617,7 @@ int s390_enable_skey(void)
 		goto out_up;
 
 	mm->context.uses_skeys = 1;
-	for (vma = mm->mmap; vma; vma = vma->vm_next) {
-		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
-				MADV_UNMERGEABLE, &vma->vm_flags)) {
-			mm->context.uses_skeys = 0;
-			rc = -ENOMEM;
-			goto out_up;
-		}
-	}
-	mm->def_flags &= ~VM_MERGEABLE;
+	gmap_mark_unmergeable();
 
 	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
 
-- 
2.20.1

* [RFC 08/37] KVM: s390: add missing include in gmap.h
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (6 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-25  8:24   ` David Hildenbrand
  2019-11-13 12:27   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning Janosch Frank
                   ` (28 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

gmap.h references radix trees, but does not include linux/radix-tree.h
itself. Sources that include gmap.h but not also radix-tree.h will
therefore fail to compile.

This simple patch adds the include for linux/radix-tree.h in gmap.h so
that users of gmap.h will be able to compile.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index eab6a2ec3599..99b3eedda26e 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -10,6 +10,7 @@
 #define _ASM_S390_GMAP_H
 
 #include <linux/refcount.h>
+#include <linux/radix-tree.h>
 
 /* Generic bits for GMAP notification on DAT table entry changes. */
 #define GMAP_NOTIFY_SHADOW	0x2
-- 
2.20.1

* [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (7 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 08/37] KVM: s390: add missing include in gmap.h Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-25  8:49   ` David Hildenbrand
                     ` (2 more replies)
  2019-10-24 11:40 ` [RFC 10/37] s390: add (non)secure page access exceptions handlers Janosch Frank
                   ` (27 subsequent siblings)
  36 siblings, 3 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

Pin the guest pages when they are first accessed, instead of all at
the same time when starting the guest.
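
As an illustration (not part of the patch), a minimal user-space sketch
of the bookkeeping this introduces: one slot per guest frame number,
filled on the first access, -EEXIST on a repeated pin. The names are
placeholders, not the kernel API.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define NPAGES 16

static void *pinned[NPAGES];		/* models gmap->pinned_pages */

static int pin_page(unsigned long gfn)
{
	if (gfn >= NPAGES)
		return -EFAULT;		/* address not backed by the memslot */
	if (pinned[gfn])
		return -EEXIST;		/* already pinned by an earlier access */
	pinned[gfn] = malloc(1);	/* stands in for get_user_pages_fast() */
	return pinned[gfn] ? 0 : -ENOMEM;
}

int main(void)
{
	printf("first access:  %d\n", pin_page(3));	/* 0 */
	printf("second access: %d\n", pin_page(3));	/* -EEXIST */
	return 0;
}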

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h |  1 +
 arch/s390/include/asm/uv.h   |  6 +++++
 arch/s390/kernel/uv.c        | 20 ++++++++++++++
 arch/s390/kvm/kvm-s390.c     |  2 ++
 arch/s390/kvm/pv.c           | 51 ++++++++++++++++++++++++++++++------
 5 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 99b3eedda26e..483f64427c0e 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -63,6 +63,7 @@ struct gmap {
 	struct gmap *parent;
 	unsigned long orig_asce;
 	unsigned long se_handle;
+	struct page **pinned_pages;
 	int edat_level;
 	bool removed;
 	bool initialized;
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 99cdd2034503..9ce9363aee1c 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -298,6 +298,7 @@ static inline int uv_convert_from_secure(unsigned long paddr)
 	return -EINVAL;
 }
 
+int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa);
 /*
  * Requests the Ultravisor to make a page accessible to a guest
  * (import). If it's brought in the first time, it will be cleared. If
@@ -317,6 +318,11 @@ static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
 		.gaddr = gaddr
 	};
 
+	down_read(&gmap->mm->mmap_sem);
+	cc = kvm_s390_pv_pin_page(gmap, gaddr);
+	up_read(&gmap->mm->mmap_sem);
+	if (cc)
+		return cc;
 	cc = uv_call(0, (u64)&uvcb);
 
 	if (!cc)
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index f7778493e829..36554402b5c6 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -98,4 +98,24 @@ void adjust_to_uv_max(unsigned long *vmax)
 	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
 		*vmax = uv_info.max_sec_stor_addr;
 }
+
+int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa)
+{
+	unsigned long hva, gfn = gpa / PAGE_SIZE;
+	int rc;
+
+	if (!gmap->pinned_pages)
+		return -EINVAL;
+	hva = __gmap_translate(gmap, gpa);
+	if (IS_ERR_VALUE(hva))
+		return -EFAULT;
+	if (gmap->pinned_pages[gfn])
+		return -EEXIST;
+	rc = get_user_pages_fast(hva, 1, FOLL_WRITE, gmap->pinned_pages + gfn);
+	if (rc < 0)
+		return rc;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pv_pin_page);
+
 #endif
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d1ba12f857e7..490fde080107 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2196,6 +2196,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		/* All VCPUs have to be destroyed before this call. */
 		mutex_lock(&kvm->lock);
 		kvm_s390_vcpu_block_all(kvm);
+		kvm_s390_pv_unpin(kvm);
 		r = kvm_s390_pv_destroy_vm(kvm);
 		if (!r)
 			kvm_s390_pv_dealloc_vm(kvm);
@@ -2680,6 +2681,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_s390_gisa_destroy(kvm);
 	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
 	    kvm_s390_pv_is_protected(kvm)) {
+		kvm_s390_pv_unpin(kvm);
 		kvm_s390_pv_destroy_vm(kvm);
 		kvm_s390_pv_dealloc_vm(kvm);
 	}
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 80aecd5bea9e..383e660e2221 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -15,8 +15,35 @@
 #include <asm/mman.h>
 #include "kvm-s390.h"
 
+static void unpin_destroy(struct page **pages, int nr)
+{
+	int i;
+	struct page *page;
+	u8 *val;
+
+	for (i = 0; i < nr; i++) {
+		page = pages[i];
+		if (!page)	/* page was never used */
+			continue;
+		val = (void *)page_to_phys(page);
+		READ_ONCE(*val);
+		put_page(page);
+	}
+}
+
+void kvm_s390_pv_unpin(struct kvm *kvm)
+{
+	unsigned long npages = kvm->arch.pv.guest_len / PAGE_SIZE;
+
+	mutex_lock(&kvm->slots_lock);
+	unpin_destroy(kvm->arch.gmap->pinned_pages, npages);
+	mutex_unlock(&kvm->slots_lock);
+}
+
 void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
 {
+	vfree(kvm->arch.gmap->pinned_pages);
+	kvm->arch.gmap->pinned_pages = NULL;
 	vfree(kvm->arch.pv.stor_var);
 	free_pages(kvm->arch.pv.stor_base,
 		   get_order(uv_info.guest_base_stor_len));
@@ -28,7 +55,6 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	unsigned long base = uv_info.guest_base_stor_len;
 	unsigned long virt = uv_info.guest_virt_var_stor_len;
 	unsigned long npages = 0, vlen = 0;
-	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
 
 	kvm->arch.pv.stor_var = NULL;
@@ -43,22 +69,26 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	 * Slots are sorted by GFN
 	 */
 	mutex_lock(&kvm->slots_lock);
-	slots = kvm_memslots(kvm);
-	memslot = slots->memslots;
+	memslot = kvm_memslots(kvm)->memslots;
 	npages = memslot->base_gfn + memslot->npages;
-
 	mutex_unlock(&kvm->slots_lock);
+
+	kvm->arch.gmap->pinned_pages = vzalloc(npages * sizeof(struct page *));
+	if (!kvm->arch.gmap->pinned_pages)
+		goto out_err;
 	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
 
 	/* Allocate variable storage */
 	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
 	vlen += uv_info.guest_virt_base_stor_len;
 	kvm->arch.pv.stor_var = vzalloc(vlen);
-	if (!kvm->arch.pv.stor_var) {
-		kvm_s390_pv_dealloc_vm(kvm);
-		return -ENOMEM;
-	}
+	if (!kvm->arch.pv.stor_var)
+		goto out_err;
 	return 0;
+
+out_err:
+	kvm_s390_pv_dealloc_vm(kvm);
+	return -ENOMEM;
 }
 
 int kvm_s390_pv_destroy_vm(struct kvm *kvm)
@@ -216,6 +246,11 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 	for (i = 0; i < size / PAGE_SIZE; i++) {
 		uvcb.gaddr = addr + i * PAGE_SIZE;
 		uvcb.tweak[1] = i * PAGE_SIZE;
+		down_read(&kvm->mm->mmap_sem);
+		rc = kvm_s390_pv_pin_page(kvm->arch.gmap, uvcb.gaddr);
+		up_read(&kvm->mm->mmap_sem);
+		if (rc && (rc != -EEXIST))
+			break;
 retry:
 		rc = uv_call(0, (u64)&uvcb);
 		if (!rc)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 10/37] s390: add (non)secure page access exceptions handlers
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (8 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-24 11:40 ` [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection Janosch Frank
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Vasily Gorbik <gor@linux.ibm.com>

Add exception handlers that perform the transparent transition of
non-secure pages to secure (import) upon guest access and of secure
pages to non-secure (export) upon hypervisor access.

The current assumption is that guest pages are pinned.
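
For orientation, a tiny stand-alone sketch of which direction of
conversion each handler requests (purely illustrative, not the kernel
code):

#include <stdio.h>

enum access_origin { GUEST_ACCESS, HYPERVISOR_ACCESS };

/* pgm code 0x3e: the guest touched a non-secure page, import it;
 * pgm code 0x3d: the hypervisor touched a secure page, export it */
static const char *conversion_for(enum access_origin origin)
{
	return origin == GUEST_ACCESS ? "import (make secure)"
				      : "export (make non-secure)";
}

int main(void)
{
	printf("guest access:      %s\n", conversion_for(GUEST_ACCESS));
	printf("hypervisor access: %s\n", conversion_for(HYPERVISOR_ACCESS));
	return 0;
}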

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 arch/s390/kernel/pgm_check.S |  4 +--
 arch/s390/mm/fault.c         | 64 ++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kernel/pgm_check.S b/arch/s390/kernel/pgm_check.S
index 59dee9d3bebf..27ac4f324c70 100644
--- a/arch/s390/kernel/pgm_check.S
+++ b/arch/s390/kernel/pgm_check.S
@@ -78,8 +78,8 @@ PGM_CHECK(do_dat_exception)		/* 39 */
 PGM_CHECK(do_dat_exception)		/* 3a */
 PGM_CHECK(do_dat_exception)		/* 3b */
 PGM_CHECK_DEFAULT			/* 3c */
-PGM_CHECK_DEFAULT			/* 3d */
-PGM_CHECK_DEFAULT			/* 3e */
+PGM_CHECK(do_secure_storage_access)	/* 3d */
+PGM_CHECK(do_non_secure_storage_access)	/* 3e */
 PGM_CHECK_DEFAULT			/* 3f */
 PGM_CHECK_DEFAULT			/* 40 */
 PGM_CHECK_DEFAULT			/* 41 */
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 7b0bb475c166..0c4577472432 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -38,6 +38,7 @@
 #include <asm/irq.h>
 #include <asm/mmu_context.h>
 #include <asm/facility.h>
+#include <asm/uv.h>
 #include "../kernel/entry.h"
 
 #define __FAIL_ADDR_MASK -4096L
@@ -816,3 +817,66 @@ static int __init pfault_irq_init(void)
 early_initcall(pfault_irq_init);
 
 #endif /* CONFIG_PFAULT */
+
+#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
+
+void do_secure_storage_access(struct pt_regs *regs)
+{
+	unsigned long addr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct page *page;
+
+	switch (get_fault_type(regs)) {
+	case USER_FAULT:
+		mm = current->mm;
+		down_read(&mm->mmap_sem);
+		vma = find_vma(mm, addr);
+		if (!vma) {
+			up_read(&mm->mmap_sem);
+			do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+			break;
+		}
+		page = follow_page(vma, addr, FOLL_GET);
+		if (IS_ERR_OR_NULL(page)) {
+			up_read(&mm->mmap_sem);
+			break;
+		}
+		uv_convert_from_secure(page_to_phys(page));
+		put_page(page);
+		up_read(&mm->mmap_sem);
+		break;
+	case KERNEL_FAULT:
+		uv_convert_from_secure(__pa(addr));
+		break;
+	case VDSO_FAULT:
+		/* fallthrough */
+	case GMAP_FAULT:
+		/* fallthrough */
+	default:
+		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+		WARN_ON_ONCE(1);
+	}
+}
+NOKPROBE_SYMBOL(do_secure_storage_access);
+
+void do_non_secure_storage_access(struct pt_regs *regs)
+{
+	unsigned long gaddr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct gmap *gmap = (struct gmap *)S390_lowcore.gmap;
+
+	uv_convert_to_secure(gmap, gaddr);
+}
+NOKPROBE_SYMBOL(do_non_secure_storage_access);
+
+#else
+void do_secure_storage_access(struct pt_regs *regs)
+{
+	default_trap_handler(regs);
+}
+
+void do_non_secure_storage_access(struct pt_regs *regs)
+{
+	default_trap_handler(regs);
+}
+#endif
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (9 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 10/37] s390: add (non)secure page access exceptions handlers Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-14 13:09   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions Janosch Frank
                   ` (25 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Interrupt injection has changed a lot for protected guests, as KVM
can't access the CPUs' lowcores anymore. New fields in the state
description, like the interrupt injection control, and masked values
safeguard the guest from KVM.

Let's add some documentation on the interrupt injection basics for
protected guests.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 Documentation/virtual/kvm/s390-pv.txt | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
index 86ed95f36759..e09f2dc5f164 100644
--- a/Documentation/virtual/kvm/s390-pv.txt
+++ b/Documentation/virtual/kvm/s390-pv.txt
@@ -21,3 +21,30 @@ normally needed to be able to run a VM, some changes have been made in
 SIE behavior and fields have different meaning for a PVM. SIE exits
 are minimized as much as possible to improve speed and reduce exposed
 guest state.
+
+
+Interrupt injection:
+
+Interrupt injection is safeguarded by the Ultravisor and, as KVM lost
+access to the VCPUs' lowcores, is handled via the format 4 state
+description.
+
+Machine check, external, IO and restart interruptions each can be
+injected on SIE entry via a bit in the interrupt injection control
+field (offset 0x54). If the guest cpu is not enabled for the interrupt
+at the time of injection, a validity interception is recognized. The
+interrupt's data is transported via parts of the interception data
+block.
+
+Program and Service Call exceptions have another layer of
+safeguarding: they are only injectable when an instruction has been
+intercepted into KVM and such an exception can be an emulation result.
+
+
+Mask notification interceptions:
+As a replacement for the lctl(g) and lpsw(e) interception, two new
+interception codes have been introduced. One which tells us that CRs
+0, 6 or 14 have been changed and therefore interrupt masking might
+have changed. And one for PSW bit 13 changes. The CRs and the PSW in
+the state description only contain the mask bits and no further info
+like the current instruction address.
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (10 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-30 15:50   ` David Hildenbrand
  2019-11-05 18:04   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls Janosch Frank
                   ` (24 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Since KVM doesn't emulate any form of the load control and load psw
instructions anymore, we wouldn't get an interception if PSWs or CRs
are changed in the guest. That means we can't inject IRQs right after
the guest has been enabled for them.

The new interception codes solve that problem by being a notification
for changes to the IRQ enablement relevant bits in CRs 0, 6 and 14, as
well as the machine check mask bit in the PSW.

No special handling is needed for these interception codes; the KVM
pre-run code will consult all necessary CRs and PSW bits and inject
IRQs the guest is enabled for.
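
As a rough model of that pre-run gating (self-contained sketch; the bit
position is illustrative, not the architected CR0 layout):

#include <stdbool.h>
#include <stdio.h>

#define CR0_SERVICE_SIGNAL_SUBMASK (1UL << 9)	/* illustrative bit only */

/* a pending service signal is only deliverable once the PSW enables
 * external interrupts and the CR0 subclass mask bit is set */
static bool service_irq_deliverable(unsigned long cr0, bool psw_ext_enabled)
{
	return psw_ext_enabled && (cr0 & CR0_SERVICE_SIGNAL_SUBMASK);
}

int main(void)
{
	printf("%d\n", service_irq_deliverable(0, true));			    /* 0 */
	printf("%d\n", service_irq_deliverable(CR0_SERVICE_SIGNAL_SUBMASK, true)); /* 1 */
	return 0;
}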

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  2 ++
 arch/s390/kvm/intercept.c        | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index d4fd0f3af676..6cc3b73ca904 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -210,6 +210,8 @@ struct kvm_s390_sie_block {
 #define ICPT_PARTEXEC	0x38
 #define ICPT_IOINST	0x40
 #define ICPT_KSS	0x5c
+#define ICPT_PV_MCHKR	0x60
+#define ICPT_PV_INT_EN	0x64
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index a389fa85cca2..acc1710fc472 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -480,6 +480,24 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 	case ICPT_KSS:
 		rc = kvm_s390_skey_check_enable(vcpu);
 		break;
+	case ICPT_PV_MCHKR:
+		/*
+		 * A protected guest changed PSW bit 13 to one and is now
+		 * enabled for interrupts. The pre-run code will check
+		 * the registers and inject pending MCHKs based on the
+		 * PSW and CRs. No additional work to do.
+		 */
+		rc = 0;
+		break;
+	case  ICPT_PV_INT_EN:
+		/*
+		 * A protected guest changed CR 0,6,14 and may now be
+		 * enabled for interrupts. The pre-run code will check
+		 * the registers and inject pending IRQs based on the
+		 * CRs. No additional work to do.
+		 */
+		rc = 0;
+	break;
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (11 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-30 15:53   ` David Hildenbrand
                     ` (2 more replies)
  2019-10-24 11:40 ` [RFC 14/37] KVM: s390: protvirt: Implement interruption injection Janosch Frank
                   ` (23 subsequent siblings)
  36 siblings, 3 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Michael Mueller <mimu@linux.ibm.com>

Define the interruption injection codes and the related fields in the
sie control block for PVM interruption injection.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 6cc3b73ca904..82443236d4cc 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -215,7 +215,15 @@ struct kvm_s390_sie_block {
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
-	__u8	reserved54[2];		/* 0x0054 */
+	__u8	reserved54;		/* 0x0054 */
+#define IICTL_CODE_NONE		 0x00
+#define IICTL_CODE_MCHK		 0x01
+#define IICTL_CODE_EXT		 0x02
+#define IICTL_CODE_IO		 0x03
+#define IICTL_CODE_RESTART	 0x04
+#define IICTL_CODE_SPECIFICATION 0x10
+#define IICTL_CODE_OPERAND	 0x11
+	__u8	iictl;			/* 0x0055 */
 	__u16	ipa;			/* 0x0056 */
 	__u32	ipb;			/* 0x0058 */
 	__u32	scaoh;			/* 0x005c */
@@ -252,7 +260,8 @@ struct kvm_s390_sie_block {
 #define HPID_KVM	0x4
 #define HPID_VSIE	0x5
 	__u8	hpid;			/* 0x00b8 */
-	__u8	reservedb9[11];		/* 0x00b9 */
+	__u8	reservedb9[7];		/* 0x00b9 */
+	__u32	eiparams;		/* 0x00c0 */
 	__u16	extcpuaddr;		/* 0x00c4 */
 	__u16	eic;			/* 0x00c6 */
 	__u32	reservedc8;		/* 0x00c8 */
@@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
 	__u8	oai;			/* 0x00e2 */
 	__u8	armid;			/* 0x00e3 */
 	__u8	reservede4[4];		/* 0x00e4 */
-	__u64	tecmc;			/* 0x00e8 */
-	__u8	reservedf0[12];		/* 0x00f0 */
+	union {
+		__u64	tecmc;		/* 0x00e8 */
+		struct {
+			__u16	subchannel_id;	/* 0x00e8 */
+			__u16	subchannel_nr;	/* 0x00ea */
+			__u32	io_int_parm;	/* 0x00ec */
+			__u32	io_int_word;	/* 0x00f0 */
+		};
+	} __packed;
+	__u8	reservedf4[8];		/* 0x00f4 */
 #define CRYCB_FORMAT_MASK 0x00000003
 #define CRYCB_FORMAT0 0x00000000
 #define CRYCB_FORMAT1 0x00000001
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 14/37] KVM: s390: protvirt: Implement interruption injection
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (12 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-04 10:29   ` David Hildenbrand
  2019-11-14 12:07   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 15/37] KVM: s390: protvirt: Add machine-check interruption injection controls Janosch Frank
                   ` (22 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Michael Mueller <mimu@linux.ibm.com>

The patch implements interruption injection for the following
list of interruption types:

  - I/O
    __deliver_io (III)

  - External
    __deliver_cpu_timer (IEI)
    __deliver_ckc (IEI)
    __deliver_emergency_signal (IEI)
    __deliver_external_call (IEI)
    __deliver_service (IEI)

  - cpu restart
    __deliver_restart (IRI)
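
The common pattern in these routines, shown as a small stand-alone
model (simplified; the iictl/eic/eiparams names follow the sie block
additions from the previous patch, everything else is a placeholder):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define IICTL_CODE_EXT		0x02	/* from the sie block definitions */
#define EXT_IRQ_SERVICE_SIG	0x2401	/* service-signal external irq code */

struct sie_injection {			/* models iictl/eic/eiparams */
	uint8_t  iictl;
	uint16_t eic;
	uint32_t eiparams;
};

/* protected guests get the payload via the injection controls, while
 * non-protected guests keep the classic lowcore write path */
static int deliver_service(bool protected_guest, struct sie_injection *sie,
			   uint32_t parm)
{
	if (protected_guest) {
		sie->iictl = IICTL_CODE_EXT;
		sie->eic = EXT_IRQ_SERVICE_SIG;
		sie->eiparams = parm;
		return 0;
	}
	/* non-protected: put_guest_lc()/write_guest_lc() into the lowcore */
	return 0;
}

int main(void)
{
	struct sie_injection sie = { 0 };

	deliver_service(true, &sie, 0x43000);
	printf("iictl=%#x eic=%#x parm=%#x\n", sie.iictl, sie.eic, sie.eiparams);
	return 0;
}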

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [interrupt masking]
---
 arch/s390/include/asm/kvm_host.h |  10 ++
 arch/s390/kvm/interrupt.c        | 182 +++++++++++++++++++++++--------
 2 files changed, 149 insertions(+), 43 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 82443236d4cc..63fc32d38aa9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -496,6 +496,7 @@ enum irq_types {
 	IRQ_PEND_PFAULT_INIT,
 	IRQ_PEND_EXT_HOST,
 	IRQ_PEND_EXT_SERVICE,
+	IRQ_PEND_EXT_SERVICE_EV,
 	IRQ_PEND_EXT_TIMING,
 	IRQ_PEND_EXT_CPU_TIMER,
 	IRQ_PEND_EXT_CLOCK_COMP,
@@ -540,6 +541,7 @@ enum irq_types {
 			   (1UL << IRQ_PEND_EXT_TIMING)     | \
 			   (1UL << IRQ_PEND_EXT_HOST)       | \
 			   (1UL << IRQ_PEND_EXT_SERVICE)    | \
+			   (1UL << IRQ_PEND_EXT_SERVICE_EV) | \
 			   (1UL << IRQ_PEND_VIRTIO)         | \
 			   (1UL << IRQ_PEND_PFAULT_INIT)    | \
 			   (1UL << IRQ_PEND_PFAULT_DONE))
@@ -556,6 +558,13 @@ enum irq_types {
 #define IRQ_PEND_MCHK_MASK ((1UL << IRQ_PEND_MCHK_REP) | \
 			    (1UL << IRQ_PEND_MCHK_EX))
 
+#define IRQ_PEND_EXT_II_MASK ((1UL << IRQ_PEND_EXT_CPU_TIMER)  | \
+			      (1UL << IRQ_PEND_EXT_CLOCK_COMP) | \
+			      (1UL << IRQ_PEND_EXT_EMERGENCY)  | \
+			      (1UL << IRQ_PEND_EXT_EXTERNAL)   | \
+			      (1UL << IRQ_PEND_EXT_SERVICE)    | \
+			      (1UL << IRQ_PEND_EXT_SERVICE_EV))
+
 struct kvm_s390_interrupt_info {
 	struct list_head list;
 	u64	type;
@@ -614,6 +623,7 @@ struct kvm_s390_local_interrupt {
 
 struct kvm_s390_float_interrupt {
 	unsigned long pending_irqs;
+	unsigned long masked_irqs;
 	spinlock_t lock;
 	struct list_head lists[FIRQ_LIST_COUNT];
 	int counters[FIRQ_MAX_COUNT];
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 165dea4c7f19..c919dfe4dfd3 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -324,8 +324,10 @@ static inline int gisa_tac_ipm_gisc(struct kvm_s390_gisa *gisa, u32 gisc)
 
 static inline unsigned long pending_irqs_no_gisa(struct kvm_vcpu *vcpu)
 {
-	return vcpu->kvm->arch.float_int.pending_irqs |
-		vcpu->arch.local_int.pending_irqs;
+	unsigned long pending = vcpu->kvm->arch.float_int.pending_irqs | vcpu->arch.local_int.pending_irqs;
+
+	pending &= ~vcpu->kvm->arch.float_int.masked_irqs;
+	return pending;
 }
 
 static inline unsigned long pending_irqs(struct kvm_vcpu *vcpu)
@@ -383,10 +385,16 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
 		__clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &active_mask);
 	if (!(vcpu->arch.sie_block->gcr[0] & CR0_CPU_TIMER_SUBMASK))
 		__clear_bit(IRQ_PEND_EXT_CPU_TIMER, &active_mask);
-	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK))
+	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK)) {
 		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
+		__clear_bit(IRQ_PEND_EXT_SERVICE_EV, &active_mask);
+	}
 	if (psw_mchk_disabled(vcpu))
 		active_mask &= ~IRQ_PEND_MCHK_MASK;
+	/* PV guest cpus can have a single interruption injected at a time. */
+	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
+	    vcpu->arch.sie_block->iictl != IICTL_CODE_NONE)
+		active_mask &= ~(IRQ_PEND_EXT_II_MASK | IRQ_PEND_IO_MASK);
 	/*
 	 * Check both floating and local interrupt's cr14 because
 	 * bit IRQ_PEND_MCHK_REP could be set in both cases.
@@ -479,19 +487,23 @@ static void set_intercept_indicators(struct kvm_vcpu *vcpu)
 static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-	int rc;
+	int rc = 0;
 
 	vcpu->stat.deliver_cputm++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
 					 0, 0);
-
-	rc  = put_guest_lc(vcpu, EXT_IRQ_CPU_TIMER,
-			   (u16 *)__LC_EXT_INT_CODE);
-	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
-	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_CPU_TIMER;
+	} else {
+		rc  = put_guest_lc(vcpu, EXT_IRQ_CPU_TIMER,
+				   (u16 *)__LC_EXT_INT_CODE);
+		rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
+		rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
+				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	}
 	clear_bit(IRQ_PEND_EXT_CPU_TIMER, &li->pending_irqs);
 	return rc ? -EFAULT : 0;
 }
@@ -499,19 +511,23 @@ static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
 static int __must_check __deliver_ckc(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-	int rc;
+	int rc = 0;
 
 	vcpu->stat.deliver_ckc++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
 					 0, 0);
-
-	rc  = put_guest_lc(vcpu, EXT_IRQ_CLK_COMP,
-			   (u16 __user *)__LC_EXT_INT_CODE);
-	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
-	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_CLK_COMP;
+	} else {
+		rc  = put_guest_lc(vcpu, EXT_IRQ_CLK_COMP,
+				   (u16 __user *)__LC_EXT_INT_CODE);
+		rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
+		rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
+				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	}
 	clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &li->pending_irqs);
 	return rc ? -EFAULT : 0;
 }
@@ -533,7 +549,6 @@ static int __must_check __deliver_pfault_init(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 					 KVM_S390_INT_PFAULT_INIT,
 					 0, ext.ext_params2);
-
 	rc  = put_guest_lc(vcpu, EXT_IRQ_CP_SERVICE, (u16 *) __LC_EXT_INT_CODE);
 	rc |= put_guest_lc(vcpu, PFAULT_INIT, (u16 *) __LC_EXT_CPU_ADDR);
 	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
@@ -696,17 +711,21 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
 static int __must_check __deliver_restart(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-	int rc;
+	int rc = 0;
 
 	VCPU_EVENT(vcpu, 3, "%s", "deliver: cpu restart");
 	vcpu->stat.deliver_restart_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_RESTART, 0, 0);
 
-	rc  = write_guest_lc(vcpu,
-			     offsetof(struct lowcore, restart_old_psw),
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, offsetof(struct lowcore, restart_psw),
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_RESTART;
+	} else {
+		rc  = write_guest_lc(vcpu,
+				     offsetof(struct lowcore, restart_old_psw),
+				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= read_guest_lc(vcpu, offsetof(struct lowcore, restart_psw),
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	}
 	clear_bit(IRQ_PEND_RESTART, &li->pending_irqs);
 	return rc ? -EFAULT : 0;
 }
@@ -748,6 +767,12 @@ static int __must_check __deliver_emergency_signal(struct kvm_vcpu *vcpu)
 	vcpu->stat.deliver_emergency_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
 					 cpu_addr, 0);
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_EMERGENCY_SIG;
+		vcpu->arch.sie_block->extcpuaddr = cpu_addr;
+		return 0;
+	}
 
 	rc  = put_guest_lc(vcpu, EXT_IRQ_EMERGENCY_SIG,
 			   (u16 *)__LC_EXT_INT_CODE);
@@ -776,6 +801,12 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 					 KVM_S390_INT_EXTERNAL_CALL,
 					 extcall.code, 0);
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_EXTERNAL_CALL;
+		vcpu->arch.sie_block->extcpuaddr = extcall.code;
+		return 0;
+	}
 
 	rc  = put_guest_lc(vcpu, EXT_IRQ_EXTERNAL_CALL,
 			   (u16 *)__LC_EXT_INT_CODE);
@@ -902,6 +933,31 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 	return rc ? -EFAULT : 0;
 }
 
+#define SCCB_MASK 0xFFFFFFF8
+#define SCCB_EVENT_PENDING 0x3
+
+static int write_sclp(struct kvm_vcpu *vcpu, u32 parm)
+{
+	int rc;
+
+	if (kvm_s390_pv_handle_cpu(vcpu)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_SERVICE_SIG;
+		vcpu->arch.sie_block->eiparams = parm;
+		return 0;
+	}
+
+	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
+	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
+	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
+			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
+			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	rc |= put_guest_lc(vcpu, parm,
+			   (u32 *)__LC_EXT_PARAMS);
+	return rc;
+}
+
 static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
@@ -909,13 +965,17 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 	int rc = 0;
 
 	spin_lock(&fi->lock);
-	if (!(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
+	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs) ||
+	    !(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
 		spin_unlock(&fi->lock);
 		return 0;
 	}
 	ext = fi->srv_signal;
 	memset(&fi->srv_signal, 0, sizeof(ext));
 	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
+	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		set_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
 	spin_unlock(&fi->lock);
 
 	VCPU_EVENT(vcpu, 4, "deliver: sclp parameter 0x%x",
@@ -924,15 +984,33 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
 					 ext.ext_params, 0);
 
-	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
-	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
-	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= put_guest_lc(vcpu, ext.ext_params,
-			   (u32 *)__LC_EXT_PARAMS);
+	rc = write_sclp(vcpu, ext.ext_params);
+	return rc ? -EFAULT : 0;
+}
 
+static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
+{
+	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
+	struct kvm_s390_ext_info ext;
+	int rc = 0;
+
+	spin_lock(&fi->lock);
+	if (!(test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs))) {
+		spin_unlock(&fi->lock);
+		return 0;
+	}
+	ext = fi->srv_signal;
+	/* only clear the event bit */
+	fi->srv_signal.ext_params &= ~SCCB_EVENT_PENDING;
+	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
+	spin_unlock(&fi->lock);
+
+	VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
+	vcpu->stat.deliver_service_signal++;
+	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
+					 ext.ext_params, 0);
+
+	rc = write_sclp(vcpu, SCCB_EVENT_PENDING);
 	return rc ? -EFAULT : 0;
 }
 
@@ -1028,6 +1106,15 @@ static int __do_deliver_io(struct kvm_vcpu *vcpu, struct kvm_s390_io_info *io)
 {
 	int rc;
 
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_IO;
+		vcpu->arch.sie_block->subchannel_id = io->subchannel_id;
+		vcpu->arch.sie_block->subchannel_nr = io->subchannel_nr;
+		vcpu->arch.sie_block->io_int_parm = io->io_int_parm;
+		vcpu->arch.sie_block->io_int_word = io->io_int_word;
+		return 0;
+	}
+
 	rc  = put_guest_lc(vcpu, io->subchannel_id, (u16 *)__LC_SUBCHANNEL_ID);
 	rc |= put_guest_lc(vcpu, io->subchannel_nr, (u16 *)__LC_SUBCHANNEL_NR);
 	rc |= put_guest_lc(vcpu, io->io_int_parm, (u32 *)__LC_IO_INT_PARM);
@@ -1329,6 +1416,9 @@ int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
 		case IRQ_PEND_EXT_SERVICE:
 			rc = __deliver_service(vcpu);
 			break;
+		case IRQ_PEND_EXT_SERVICE_EV:
+			rc = __deliver_service_ev(vcpu);
+			break;
 		case IRQ_PEND_PFAULT_DONE:
 			rc = __deliver_pfault_done(vcpu);
 			break;
@@ -1421,7 +1511,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 	if (kvm_get_vcpu_by_id(vcpu->kvm, src_id) == NULL)
 		return -EINVAL;
 
-	if (sclp.has_sigpif)
+	if (sclp.has_sigpif && !kvm_s390_pv_handle_cpu(vcpu))
 		return sca_inject_ext_call(vcpu, src_id);
 
 	if (test_and_set_bit(IRQ_PEND_EXT_EXTERNAL, &li->pending_irqs))
@@ -1681,9 +1771,6 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
 	return inti;
 }
 
-#define SCCB_MASK 0xFFFFFFF8
-#define SCCB_EVENT_PENDING 0x3
-
 static int __inject_service(struct kvm *kvm,
 			     struct kvm_s390_interrupt_info *inti)
 {
@@ -1692,6 +1779,11 @@ static int __inject_service(struct kvm *kvm,
 	kvm->stat.inject_service_signal++;
 	spin_lock(&fi->lock);
 	fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_EVENT_PENDING;
+
+	/* We always allow events, track them separately from the sccb ints */
+	if (fi->srv_signal.ext_params & SCCB_EVENT_PENDING)
+		set_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
+
 	/*
 	 * Early versions of the QEMU s390 bios will inject several
 	 * service interrupts after another without handling a
@@ -1834,7 +1926,8 @@ static void __floating_irq_kick(struct kvm *kvm, u64 type)
 		break;
 	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
 		if (!(type & KVM_S390_INT_IO_AI_MASK &&
-		      kvm->arch.gisa_int.origin))
+		      kvm->arch.gisa_int.origin) ||
+		      kvm_s390_pv_handle_cpu(dst_vcpu))
 			kvm_s390_set_cpuflags(dst_vcpu, CPUSTAT_IO_INT);
 		break;
 	default:
@@ -2082,6 +2175,8 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
 
 	spin_lock(&fi->lock);
 	fi->pending_irqs = 0;
+	if (!kvm_s390_pv_is_protected(kvm))
+		fi->masked_irqs = 0;
 	memset(&fi->srv_signal, 0, sizeof(fi->srv_signal));
 	memset(&fi->mchk, 0, sizeof(fi->mchk));
 	for (i = 0; i < FIRQ_LIST_COUNT; i++)
@@ -2146,7 +2241,8 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
 			n++;
 		}
 	}
-	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs)) {
+	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs) ||
+	    test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs)) {
 		if (n == max_irqs) {
 			/* signal userspace to try again */
 			ret = -ENOMEM;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 15/37] KVM: s390: protvirt: Add machine-check interruption injection controls
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (13 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 14/37] KVM: s390: protvirt: Implement interruption injection Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-13 14:49   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 16/37] KVM: s390: protvirt: Implement machine-check interruption injection Janosch Frank
                   ` (21 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Michael Mueller <mimu@linux.ibm.com>

The following fields are added to the sie control block type 4:
     - Machine Check Interruption Code (mcic)
     - External Damage Code (edc)
     - Failing Storage Address (faddr)

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 33 +++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 63fc32d38aa9..0ab309b7bf4c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -261,16 +261,31 @@ struct kvm_s390_sie_block {
 #define HPID_VSIE	0x5
 	__u8	hpid;			/* 0x00b8 */
 	__u8	reservedb9[7];		/* 0x00b9 */
-	__u32	eiparams;		/* 0x00c0 */
-	__u16	extcpuaddr;		/* 0x00c4 */
-	__u16	eic;			/* 0x00c6 */
+	union {
+		struct {
+			__u32	eiparams;	/* 0x00c0 */
+			__u16	extcpuaddr;	/* 0x00c4 */
+			__u16	eic;		/* 0x00c6 */
+		};
+		__u64	mcic;			/* 0x00c0 */
+	} __packed;
 	__u32	reservedc8;		/* 0x00c8 */
-	__u16	pgmilc;			/* 0x00cc */
-	__u16	iprcc;			/* 0x00ce */
-	__u32	dxc;			/* 0x00d0 */
-	__u16	mcn;			/* 0x00d4 */
-	__u8	perc;			/* 0x00d6 */
-	__u8	peratmid;		/* 0x00d7 */
+	union {
+		struct {
+			__u16	pgmilc;		/* 0x00cc */
+			__u16	iprcc;		/* 0x00ce */
+		};
+		__u32	edc;			/* 0x00cc */
+	} __packed;
+	union {
+		struct {
+			__u32	dxc;		/* 0x00d0 */
+			__u16	mcn;		/* 0x00d4 */
+			__u8	perc;		/* 0x00d6 */
+			__u8	peratmid;	/* 0x00d7 */
+		};
+		__u64	faddr;			/* 0x00d0 */
+	} __packed;
 	__u64	peraddr;		/* 0x00d8 */
 	__u8	eai;			/* 0x00e0 */
 	__u8	peraid;			/* 0x00e1 */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 16/37] KVM: s390: protvirt: Implement machine-check interruption injection
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (14 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 15/37] KVM: s390: protvirt: Add machine-check interruption injection controls Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-05 18:11   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation Janosch Frank
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Michael Mueller <mimu@linux.ibm.com>

Similar to external interrupts, the hypervisor can inject machine
checks by providing the right data in the interrupt injection controls.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
---
 arch/s390/kvm/interrupt.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index c919dfe4dfd3..1f87c7d3fa3e 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -568,6 +568,14 @@ static int __write_machine_check(struct kvm_vcpu *vcpu,
 	union mci mci;
 	int rc;
 
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_MCHK;
+		vcpu->arch.sie_block->mcic = mchk->mcic;
+		vcpu->arch.sie_block->faddr = mchk->failing_storage_address;
+		vcpu->arch.sie_block->edc = mchk->ext_damage_code;
+		return 0;
+	}
+
 	mci.val = mchk->mcic;
 	/* take care of lazy register loading */
 	save_fpu_regs();
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (15 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 16/37] KVM: s390: protvirt: Implement machine-check interruption injection Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-14 15:15   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 18/37] KVM: s390: protvirt: Handle spec exception loops Janosch Frank
                   ` (19 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

As guest memory is inaccessible and information about the guest's
state is very limited, new ways for instruction emulation have been
introduced.

With a bounce area for guest GRs and instruction data, guest state
leaks can be limited by the Ultravisor. KVM now has to move
instruction input and output through these areas.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 Documentation/virtual/kvm/s390-pv.txt | 47 +++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
index e09f2dc5f164..cb08d78a7922 100644
--- a/Documentation/virtual/kvm/s390-pv.txt
+++ b/Documentation/virtual/kvm/s390-pv.txt
@@ -48,3 +48,50 @@ interception codes have been introduced. One which tells us that CRs
 have changed. And one for PSW bit 13 changes. The CRs and the PSW in
 the state description only contain the mask bits and no further info
 like the current instruction address.
+
+
+Instruction emulation:
+With the format 4 state description the SIE instruction already
+interprets more instructions than it does with format 2. As it is not
+able to interpret all instructions, the SIE and the UV safeguard KVM's
+emulation inputs and outputs.
+
+Guest GRs and most of the instruction data, like IO data structures,
+are filtered. Instruction data is copied to and from the Secure
+Instruction Data Area. Guest GRs are put into / retrieved from the
+Interception-Data block.
+
+The Interception-Data block from the state description's offset 0x380
+contains GRs 0 - 15. Only GR values needed to emulate an instruction
+will be copied into this area.
+
+The Interception Parameters state description field still contains the
+bytes of the instruction text but with pre-set register
+values. I.e. each instruction always uses the same instruction text,
+to not leak guest instruction text.
+
+The Secure Instruction Data Area contains instruction storage
+data. Data for diag 500 is exempt from that and has to be moved
+through shared buffers to KVM.
+
+When SIE intercepts an instruction, it will only allow data and
+program interrupts for this instruction to be moved to the guest via
+the two data areas discussed before. Other data is ignored or results
+in validity interceptions.
+
+
+Instruction emulation interceptions:
+There are two types of SIE secure instruction intercepts. The normal
+and the notification type. Normal secure instruction intercepts will
+make the guest pending for instruction completion of the intercepted
+instruction type, i.e. on SIE entry it is attempted to complete
+emulation of the instruction with the data provided by KVM. That might
+be a program exception or instruction completion.
+
+The notification type intercepts inform KVM about guest environment
+changes due to guest instruction interpretation. Such an interception
+is recognized for the store prefix instruction and provides the new
+lowcore location for mapping change notification arming. Any KVM data
+in the data areas is ignored, program exceptions are not injected and
+execution continues on next SIE entry, as if no intercept had
+happened.
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 18/37] KVM: s390: protvirt: Handle spec exception loops
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (16 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-14 14:22   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling Janosch Frank
                   ` (18 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

SIE intercept code 8 is only used for specification exception loops of
protected guests. That means we need to stop the guest when we see it.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/intercept.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index acc1710fc472..b013a9c88d43 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -231,6 +231,13 @@ static int handle_prog(struct kvm_vcpu *vcpu)
 
 	vcpu->stat.exit_program_interruption++;
 
+	/*
+	 * Intercept 8 indicates a loop of specification exceptions
+	 * for protected guests
+	 */
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		return -EOPNOTSUPP;
+
 	if (guestdbg_enabled(vcpu) && per_event(vcpu)) {
 		rc = kvm_s390_handle_per_event(vcpu);
 		if (rc)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (17 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 18/37] KVM: s390: protvirt: Handle spec exception loops Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-04 11:25   ` David Hildenbrand
  2019-11-14 14:44   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer Janosch Frank
                   ` (17 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Guest registers for protected guests are stored at offset 0x380.
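
A quick stand-alone check of that layout claim (sizes taken from the
structure comments below, not from the real kernel headers):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sie_page_model {
	uint8_t  sie_block[0x200];	/* struct kvm_s390_sie_block */
	uint8_t  mcck_info[0x18];	/* mcck_volatile_info at 0x0200 */
	uint8_t  reserved218[360];	/* 0x0218 */
	uint64_t pv_grregs[16];		/* expected at 0x0380 */
};

int main(void)
{
	assert(offsetof(struct sie_page_model, pv_grregs) == 0x380);
	return 0;
}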

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  4 +++-
 arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 0ab309b7bf4c..5deabf9734d9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -336,7 +336,9 @@ struct kvm_s390_itdb {
 struct sie_page {
 	struct kvm_s390_sie_block sie_block;
 	struct mcck_volatile_info mcck_info;	/* 0x0200 */
-	__u8 reserved218[1000];		/* 0x0218 */
+	__u8 reserved218[360];		/* 0x0218 */
+	__u64 pv_grregs[16];		/* 0x380 */
+	__u8 reserved400[512];
 	struct kvm_s390_itdb itdb;	/* 0x0600 */
 	__u8 reserved700[2304];		/* 0x0700 */
 };
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 490fde080107..97d3a81e5074 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
 	int rc, exit_reason;
+	struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
 
 	/*
 	 * We try to hold kvm->srcu during most of vcpu_run (except when run-
@@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 		guest_enter_irqoff();
 		__disable_cpu_timer_accounting(vcpu);
 		local_irq_enable();
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			memcpy(sie_page->pv_grregs,
+			       vcpu->run->s.regs.gprs,
+			       sizeof(sie_page->pv_grregs));
+		}
 		exit_reason = sie64a(vcpu->arch.sie_block,
 				     vcpu->run->s.regs.gprs);
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			memcpy(vcpu->run->s.regs.gprs,
+			       sie_page->pv_grregs,
+			       sizeof(sie_page->pv_grregs));
+		}
 		local_irq_disable();
 		__enable_cpu_timer_accounting(vcpu);
 		guest_exit_irqoff();
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (18 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-14 15:36   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 21/37] KVM: S390: protvirt: Instruction emulation Janosch Frank
                   ` (16 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Now that we can't access guest memory anymore, we have a dedicated
satellite block that's a bounce buffer for instruction data.

We re-use the memop interface to copy the instruction data to / from
userspace. This lets us re-use a lot of QEMU code which used that
interface to make logical guest memory accesses, which are no longer
possible in protected mode anyway.
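
A rough sketch of the user-space side of that re-use (the vcpu fd and
guest address are placeholders; error handling is trimmed):

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* read intercepted instruction data through the existing memop ioctl;
 * for a protected guest KVM serves this from the new bounce buffer */
static int read_instr_data(int vcpu_fd, uint64_t gaddr, void *buf, uint32_t len)
{
	struct kvm_s390_mem_op op;

	memset(&op, 0, sizeof(op));
	op.gaddr = gaddr;
	op.size  = len;
	op.op    = KVM_S390_MEMOP_LOGICAL_READ;
	op.buf   = (uint64_t)(uintptr_t)buf;

	return ioctl(vcpu_fd, KVM_S390_MEM_OP, &op);
}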

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  5 ++++-
 arch/s390/kvm/kvm-s390.c         | 31 +++++++++++++++++++++++++++++++
 arch/s390/kvm/pv.c               |  9 +++++++++
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 5deabf9734d9..2a8a1e21e1c3 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -308,7 +308,10 @@ struct kvm_s390_sie_block {
 #define CRYCB_FORMAT2 0x00000003
 	__u32	crycbd;			/* 0x00fc */
 	__u64	gcr[16];		/* 0x0100 */
-	__u64	gbea;			/* 0x0180 */
+	union {
+		__u64	gbea;			/* 0x0180 */
+		__u64	sidad;
+	};
 	__u8    reserved188[8];		/* 0x0188 */
 	__u64   sdnxo;			/* 0x0190 */
 	__u8    reserved198[8];		/* 0x0198 */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 97d3a81e5074..6747cb6cf062 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4416,6 +4416,13 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 	if (mop->size > MEM_OP_MAX_SIZE)
 		return -E2BIG;
 
+	/* Protected guests move instruction data over the satellite
+	 * block which has its own size limit
+	 */
+	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
+	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
+		return -E2BIG;
+
 	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
 		tmpbuf = vmalloc(mop->size);
 		if (!tmpbuf)
@@ -4427,10 +4434,22 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 	switch (mop->op) {
 	case KVM_S390_MEMOP_LOGICAL_READ:
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
+			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+				r = 0;
+				break;
+			}
 			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
 					    mop->size, GACC_FETCH);
 			break;
 		}
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			r = 0;
+			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
+					 (mop->gaddr & ~PAGE_MASK),
+					 mop->size))
+				r = -EFAULT;
+			break;
+		}
 		r = read_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
 		if (r == 0) {
 			if (copy_to_user(uaddr, tmpbuf, mop->size))
@@ -4439,10 +4458,22 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_S390_MEMOP_LOGICAL_WRITE:
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
+			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+				r = 0;
+				break;
+			}
 			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
 					    mop->size, GACC_STORE);
 			break;
 		}
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			r = 0;
+			if (copy_from_user((void *)vcpu->arch.sie_block->sidad +
+					   (mop->gaddr & ~PAGE_MASK), uaddr,
+					   mop->size))
+				r = -EFAULT;
+			break;
+		}
 		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
 			r = -EFAULT;
 			break;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 383e660e2221..be7d558ab897 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -119,6 +119,7 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 
 	free_pages(vcpu->arch.pv.stor_base,
 		   get_order(uv_info.guest_cpu_stor_len));
+	free_page(vcpu->arch.sie_block->sidad);
 	/* Clear cpu and vm handle */
 	memset(&vcpu->arch.sie_block->reserved10, 0,
 	       sizeof(vcpu->arch.sie_block->reserved10));
@@ -150,6 +151,14 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
 	uvcb.state_origin = (u64)vcpu->arch.sie_block;
 	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
 
+	/* Alloc Secure Instruction Data Area Designation */
+	vcpu->arch.sie_block->sidad = __get_free_page(GFP_KERNEL | __GFP_ZERO);
+	if (!vcpu->arch.sie_block->sidad) {
+		free_pages(vcpu->arch.pv.stor_base,
+			   get_order(uv_info.guest_cpu_stor_len));
+		return -ENOMEM;
+	}
+
 	rc = uv_call(0, (u64)&uvcb);
 	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
 		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 21/37] KVM: S390: protvirt: Instruction emulation
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (19 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-14 15:38   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 22/37] KVM: s390: protvirt: Add SCLP handling Janosch Frank
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

We have two new SIE exit codes: 104 (0x68) for a secure instruction
interception, on which the SIE needs hypervisor action to complete the
instruction, and 108 (0x6c), which is merely a notification and
provides data for tracking and management. For example, for the
lowcore we set notification bits for the lowcore pages.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  2 ++
 arch/s390/kvm/intercept.c        | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 2a8a1e21e1c3..a42dfe98128b 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -212,6 +212,8 @@ struct kvm_s390_sie_block {
 #define ICPT_KSS	0x5c
 #define ICPT_PV_MCHKR	0x60
 #define ICPT_PV_INT_EN	0x64
+#define ICPT_PV_INSTR	0x68
+#define ICPT_PV_NOT	0x6c
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index b013a9c88d43..a1df8a43c88b 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -451,6 +451,23 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
 	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
 }
 
+static int handle_pv_spx(struct kvm_vcpu *vcpu)
+{
+	u32 pref = *(u32 *)vcpu->arch.sie_block->sidad;
+
+	kvm_s390_set_prefix(vcpu, pref);
+	trace_kvm_s390_handle_prefix(vcpu, 1, pref);
+	return 0;
+}
+
+static int handle_pv_not(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.sie_block->ipa == 0xb210)
+		return handle_pv_spx(vcpu);
+
+	return handle_instruction(vcpu);
+}
+
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
 	int rc, per_rc = 0;
@@ -505,6 +522,12 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 		 */
 		rc = 0;
 	break;
+	case ICPT_PV_INSTR:
+		rc = handle_instruction(vcpu);
+		break;
+	case ICPT_PV_NOT:
+		rc = handle_pv_not(vcpu);
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 22/37] KVM: s390: protvirt: Add SCLP handling
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (20 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 21/37] KVM: S390: protvirt: Instruction emulation Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-24 11:40 ` [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected Janosch Frank
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

From: Christian Borntraeger <borntraeger@de.ibm.com>

Unmask the SCLP IRQ once we get the notification intercept on SCLP.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/intercept.c | 25 +++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c  |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index a1df8a43c88b..510b1dee3320 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -460,11 +460,36 @@ static int handle_pv_spx(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int handle_pv_sclp(struct kvm_vcpu *vcpu)
+{
+	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
+
+	spin_lock(&fi->lock);
+	/*
+	 * 2 cases:
+	 * a: an sccb answering interrupt was already pending or in flight.
+	 *    As the sccb value is not used we can simply set some more bits
+	 *    and make sure that we deliver something
+	 * b: an error sccb interrupt needs to be injected so we also inject
+	 *    something and let firmware do the right thing.
+	 * This makes sure, that both errors and real sccb returns will only
+	 * be delivered when we are unmasked.
+	 */
+	fi->srv_signal.ext_params |= 0x43000;
+	set_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
+	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
+	spin_unlock(&fi->lock);
+	return 0;
+}
+
 static int handle_pv_not(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.sie_block->ipa == 0xb210)
 		return handle_pv_spx(vcpu);
 
+	if (vcpu->arch.sie_block->ipa == 0xb220)
+		return handle_pv_sclp(vcpu);
+
 	return handle_instruction(vcpu);
 }
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6747cb6cf062..eddc9508c1b1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2189,6 +2189,8 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		if (!r)
 			r = kvm_s390_pv_create_vm(kvm);
 		kvm_s390_vcpu_unblock_all(kvm);
+		/* we need to block service interrupts from now on */
+		set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
 		mutex_unlock(&kvm->lock);
 		break;
 	}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (21 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 22/37] KVM: s390: protvirt: Add SCLP handling Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-18 16:39   ` Cornelia Huck
  2019-11-19 10:18   ` David Hildenbrand
  2019-10-24 11:40 ` [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area Janosch Frank
                   ` (13 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index eddc9508c1b1..17a78774c617 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3646,6 +3646,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 		rc = gmap_mprotect_notify(vcpu->arch.gmap,
 					  kvm_s390_get_prefix(vcpu),
 					  PAGE_SIZE * 2, PROT_WRITE);
+		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
+			rc = uv_convert_to_secure(vcpu->arch.gmap,
+						  kvm_s390_get_prefix(vcpu));
+			WARN_ON_ONCE(rc && rc != -EEXIST);
+			rc = uv_convert_to_secure(vcpu->arch.gmap,
+						  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
+			WARN_ON_ONCE(rc && rc != -EEXIST);
+			rc = 0;
+		}
 		if (rc) {
 			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 			return rc;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (22 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15  8:04   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 25/37] KVM: s390: protvirt: STSI handling Janosch Frank
                   ` (12 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

For protected guests, the STHYI response data has to go through the
sidad bounce buffer (instruction data area), since KVM cannot write to
guest memory directly.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/intercept.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 510b1dee3320..37cb62bc261b 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -391,7 +391,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
-	if (addr & ~PAGE_MASK)
+	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (addr & ~PAGE_MASK))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
 	sctns = (void *)get_zeroed_page(GFP_KERNEL);
@@ -402,10 +402,15 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
 
 out:
 	if (!cc) {
-		r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
-		if (r) {
-			free_page((unsigned long)sctns);
-			return kvm_s390_inject_prog_cond(vcpu, r);
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			memcpy((void *)vcpu->arch.sie_block->sidad, sctns,
+			       PAGE_SIZE);
+		} else {
+			r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
+			if (r) {
+				free_page((unsigned long)sctns);
+				return kvm_s390_inject_prog_cond(vcpu, r);
+			}
 		}
 	}
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 25/37] KVM: s390: protvirt: STSI handling
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (23 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15  8:27   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers Janosch Frank
                   ` (11 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

For protected guests, save the response to the sidad and disable the
operand address alignment check, since the response cannot be written
to guest memory directly.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/priv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index ed52ffa8d5d4..06c7e7a10825 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -872,7 +872,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
 
 	operand2 = kvm_s390_get_base_disp_s(vcpu, &ar);
 
-	if (operand2 & 0xfff)
+	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (operand2 & 0xfff))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
 	switch (fc) {
@@ -893,8 +893,13 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
 		handle_stsi_3_2_2(vcpu, (void *) mem);
 		break;
 	}
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		memcpy((void *)vcpu->arch.sie_block->sidad, (void *)mem,
+		       PAGE_SIZE);
+		rc = 0;
+	} else
+		rc = write_guest(vcpu, operand2, ar, (void *)mem, PAGE_SIZE);
 
-	rc = write_guest(vcpu, operand2, ar, (void *)mem, PAGE_SIZE);
 	if (rc) {
 		rc = kvm_s390_inject_prog_cond(vcpu, rc);
 		goto out;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (24 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 25/37] KVM: s390: protvirt: STSI handling Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15  9:02   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 27/37] KVM: s390: protvirt: SIGP handling Janosch Frank
                   ` (10 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Many of the registers are controlled by the Ultravisor and are never
visible to KVM. Some registers are also overlaid, like gbea is with
sidad, which might leak data to userspace.

Hence we sync a minimal set of registers for both SIE formats and only
check and sync the format 2 registers where necessary.

We also disable the set/get one reg interface for the same reason; it
is an old interface anyway.
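
For illustration, a hedged userspace-side sketch of the consequence
(not part of this patch; vcpu_fd is a placeholder for an open KVM vcpu
file descriptor):

  #include <errno.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Once the VM is protected, KVM_GET/SET_ONE_REG fails with EINVAL,
   * so a VMM has to fall back to the synced register set. */
  static int read_gbea(int vcpu_fd, __u64 *gbea)
  {
          struct kvm_one_reg reg = {
                  .id   = KVM_REG_S390_GBEA,
                  .addr = (__u64)(unsigned long)gbea,
          };

          if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) == 0)
                  return 0;
          if (errno == EINVAL)
                  return -1;      /* protected guest: one-reg is blocked */
          perror("KVM_GET_ONE_REG");
          return -1;
  }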

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 138 +++++++++++++++++++++++----------------
 1 file changed, 82 insertions(+), 56 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 17a78774c617..f623c64aeade 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2997,7 +2997,8 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
 	/* make sure the new fpc will be lazily loaded */
 	save_fpu_regs();
 	current->thread.fpu.fpc = 0;
-	vcpu->arch.sie_block->gbea = 1;
+	if (!kvm_s390_pv_is_protected(vcpu->kvm))
+		vcpu->arch.sie_block->gbea = 1;
 	vcpu->arch.sie_block->pp = 0;
 	vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
 	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
@@ -3367,6 +3368,10 @@ static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
 			     (u64 __user *)reg->addr);
 		break;
 	case KVM_REG_S390_GBEA:
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			r = 0;
+			break;
+		}
 		r = put_user(vcpu->arch.sie_block->gbea,
 			     (u64 __user *)reg->addr);
 		break;
@@ -3420,6 +3425,10 @@ static int kvm_arch_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu,
 			     (u64 __user *)reg->addr);
 		break;
 	case KVM_REG_S390_GBEA:
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			r = 0;
+			break;
+		}
 		r = get_user(vcpu->arch.sie_block->gbea,
 			     (u64 __user *)reg->addr);
 		break;
@@ -4023,28 +4032,19 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 	return rc;
 }
 
-static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	struct runtime_instr_cb *riccb;
 	struct gs_cb *gscb;
 
 	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
 	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;
-	vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
-	vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
-	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
-		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
-	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
-		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
-		/* some control register changes require a tlb flush */
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-	}
-	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
-		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
-		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
-		vcpu->arch.sie_block->todpr = kvm_run->s.regs.todpr;
-		vcpu->arch.sie_block->pp = kvm_run->s.regs.pp;
-		vcpu->arch.sie_block->gbea = kvm_run->s.regs.gbea;
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0)
+		vcpu->arch.sie_block->gbea = kvm_run->s.regs.gbea;
+	if ((kvm_run->kvm_dirty_regs & KVM_SYNC_BPBC) &&
+	    test_kvm_facility(vcpu->kvm, 82)) {
+		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
+		vcpu->arch.sie_block->fpf |= kvm_run->s.regs.bpbc ? FPF_BPBC : 0;
 	}
 	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PFAULT) {
 		vcpu->arch.pfault_token = kvm_run->s.regs.pft;
@@ -4077,25 +4077,6 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		vcpu->arch.sie_block->ecd |= ECD_HOSTREGMGMT;
 		vcpu->arch.gs_enabled = 1;
 	}
-	if ((kvm_run->kvm_dirty_regs & KVM_SYNC_BPBC) &&
-	    test_kvm_facility(vcpu->kvm, 82)) {
-		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
-		vcpu->arch.sie_block->fpf |= kvm_run->s.regs.bpbc ? FPF_BPBC : 0;
-	}
-	save_access_regs(vcpu->arch.host_acrs);
-	restore_access_regs(vcpu->run->s.regs.acrs);
-	/* save host (userspace) fprs/vrs */
-	save_fpu_regs();
-	vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc;
-	vcpu->arch.host_fpregs.regs = current->thread.fpu.regs;
-	if (MACHINE_HAS_VX)
-		current->thread.fpu.regs = vcpu->run->s.regs.vrs;
-	else
-		current->thread.fpu.regs = vcpu->run->s.regs.fprs;
-	current->thread.fpu.fpc = vcpu->run->s.regs.fpc;
-	if (test_fp_ctl(current->thread.fpu.fpc))
-		/* User space provided an invalid FPC, let's clear it */
-		current->thread.fpu.fpc = 0;
 	if (MACHINE_HAS_GS) {
 		preempt_disable();
 		__ctl_set_bit(2, 4);
@@ -4111,33 +4092,50 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		preempt_enable();
 	}
 	/* SIE will load etoken directly from SDNX and therefore kvm_run */
+}
 
+static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
+	vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
+		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
+		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
+		/* some control register changes require a tlb flush */
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+	}
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
+		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
+		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
+		vcpu->arch.sie_block->todpr = kvm_run->s.regs.todpr;
+		vcpu->arch.sie_block->pp = kvm_run->s.regs.pp;
+	}
+	save_access_regs(vcpu->arch.host_acrs);
+	restore_access_regs(vcpu->run->s.regs.acrs);
+	/* save host (userspace) fprs/vrs */
+	save_fpu_regs();
+	vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc;
+	vcpu->arch.host_fpregs.regs = current->thread.fpu.regs;
+	if (MACHINE_HAS_VX)
+		current->thread.fpu.regs = vcpu->run->s.regs.vrs;
+	else
+		current->thread.fpu.regs = vcpu->run->s.regs.fprs;
+	current->thread.fpu.fpc = vcpu->run->s.regs.fpc;
+	if (test_fp_ctl(current->thread.fpu.fpc))
+		/* User space provided an invalid FPC, let's clear it */
+		current->thread.fpu.fpc = 0;
+
+	/* Sync fmt2 only data */
+	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
+		sync_regs_fmt2(vcpu, kvm_run);
 	kvm_run->kvm_dirty_regs = 0;
 }
 
-static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+static void store_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
-	kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask;
-	kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr;
-	kvm_run->s.regs.prefix = kvm_s390_get_prefix(vcpu);
-	memcpy(&kvm_run->s.regs.crs, &vcpu->arch.sie_block->gcr, 128);
-	kvm_run->s.regs.cputm = kvm_s390_get_cpu_timer(vcpu);
-	kvm_run->s.regs.ckc = vcpu->arch.sie_block->ckc;
-	kvm_run->s.regs.todpr = vcpu->arch.sie_block->todpr;
-	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
 	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
-	kvm_run->s.regs.pft = vcpu->arch.pfault_token;
-	kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
-	kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
 	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
-	save_access_regs(vcpu->run->s.regs.acrs);
-	restore_access_regs(vcpu->arch.host_acrs);
-	/* Save guest register state */
-	save_fpu_regs();
-	vcpu->run->s.regs.fpc = current->thread.fpu.fpc;
-	/* Restore will be done lazily at return */
-	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
-	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
 	if (MACHINE_HAS_GS) {
 		__ctl_set_bit(2, 4);
 		if (vcpu->arch.gs_enabled)
@@ -4153,6 +4151,31 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	/* SIE will save etoken directly into SDNX and therefore kvm_run */
 }
 
+static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask;
+	kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr;
+	kvm_run->s.regs.prefix = kvm_s390_get_prefix(vcpu);
+	memcpy(&kvm_run->s.regs.crs, &vcpu->arch.sie_block->gcr, 128);
+	kvm_run->s.regs.cputm = kvm_s390_get_cpu_timer(vcpu);
+	kvm_run->s.regs.ckc = vcpu->arch.sie_block->ckc;
+	kvm_run->s.regs.todpr = vcpu->arch.sie_block->todpr;
+	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
+	kvm_run->s.regs.pft = vcpu->arch.pfault_token;
+	kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
+	kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
+	save_access_regs(vcpu->run->s.regs.acrs);
+	restore_access_regs(vcpu->arch.host_acrs);
+	/* Save guest register state */
+	save_fpu_regs();
+	vcpu->run->s.regs.fpc = current->thread.fpu.fpc;
+	/* Restore will be done lazily at return */
+	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
+	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
+	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
+		store_regs_fmt2(vcpu, kvm_run);
+}
+
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	int rc;
@@ -4585,6 +4608,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	case KVM_SET_ONE_REG:
 	case KVM_GET_ONE_REG: {
 		struct kvm_one_reg reg;
+		r = -EINVAL;
+		if (kvm_s390_pv_is_protected(vcpu->kvm))
+			break;
 		r = -EFAULT;
 		if (copy_from_user(&reg, argp, sizeof(reg)))
 			break;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 27/37] KVM: s390: protvirt: SIGP handling
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (25 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-30 18:29   ` David Hildenbrand
  2019-11-15 11:15   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 28/37] KVM: s390: protvirt: Add program exception injection Janosch Frank
                   ` (9 subsequent siblings)
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/intercept.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 37cb62bc261b..a89738e4f761 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -72,7 +72,8 @@ static int handle_stop(struct kvm_vcpu *vcpu)
 	if (!stop_pending)
 		return 0;
 
-	if (flags & KVM_S390_STOP_FLAG_STORE_STATUS) {
+	if (flags & KVM_S390_STOP_FLAG_STORE_STATUS &&
+	    !kvm_s390_pv_is_protected(vcpu->kvm)) {
 		rc = kvm_s390_vcpu_store_status(vcpu,
 						KVM_S390_STORE_STATUS_NOADDR);
 		if (rc)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 28/37] KVM: s390: protvirt: Add program exception injection
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (26 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 27/37] KVM: s390: protvirt: SIGP handling Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-24 11:40 ` [RFC 29/37] KVM: s390: protvirt: Sync pv state Janosch Frank
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Only two program exceptions can be injected for a protected guest:
specification and operand.

Both have their code placed at offset 248 of the state description, as
the lowcore is not accessible by KVM for such guests.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/interrupt.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 1f87c7d3fa3e..2a711bae69a7 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -826,6 +826,21 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
 	return rc ? -EFAULT : 0;
 }
 
+static int __deliver_prog_pv(struct kvm_vcpu *vcpu, u16 code)
+{
+	switch (code) {
+	case PGM_SPECIFICATION:
+		vcpu->arch.sie_block->iictl = IICTL_CODE_SPECIFICATION;
+		break;
+	case PGM_OPERAND:
+		vcpu->arch.sie_block->iictl = IICTL_CODE_OPERAND;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
@@ -846,6 +861,9 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
 					 pgm_info.code, 0);
 
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		return __deliver_prog_pv(vcpu, pgm_info.code & ~PGM_PER);
+
 	switch (pgm_info.code & ~PGM_PER) {
 	case PGM_AFX_TRANSLATION:
 	case PGM_ASX_TRANSLATION:
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 29/37] KVM: s390: protvirt: Sync pv state
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (27 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 28/37] KVM: s390: protvirt: Add program exception injection Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15  9:36   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL Janosch Frank
                   ` (7 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Indicate via register sync if the VM is in secure mode.
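
A hedged userspace sketch of how a VMM could consume the new bit (run
is assumed to be the mmap'ed kvm_run area of a vcpu; not part of this
patch):

  #include <linux/kvm.h>

  /* After KVM_RUN: if the pv bit is set, the format-2-only fields
   * (e.g. gbea) in the sync area were not filled in by KVM. */
  static int guest_is_protected(struct kvm_run *run)
  {
          return (run->kvm_valid_regs & KVM_SYNC_PV) && run->s.regs.pv;
  }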

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/uapi/asm/kvm.h | 5 ++++-
 arch/s390/kvm/kvm-s390.c         | 7 ++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 436ec7636927..b44c02426c2e 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -231,11 +231,13 @@ struct kvm_guest_debug_arch {
 #define KVM_SYNC_GSCB   (1UL << 9)
 #define KVM_SYNC_BPBC   (1UL << 10)
 #define KVM_SYNC_ETOKEN (1UL << 11)
+#define KVM_SYNC_PV	(1UL << 12)
 
 #define KVM_SYNC_S390_VALID_FIELDS \
 	(KVM_SYNC_PREFIX | KVM_SYNC_GPRS | KVM_SYNC_ACRS | KVM_SYNC_CRS | \
 	 KVM_SYNC_ARCH0 | KVM_SYNC_PFAULT | KVM_SYNC_VRS | KVM_SYNC_RICCB | \
-	 KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN)
+	 KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN | \
+	 KVM_SYNC_PV)
 
 /* length and alignment of the sdnx as a power of two */
 #define SDNXC 8
@@ -261,6 +263,7 @@ struct kvm_sync_regs {
 	__u8  reserved[512];	/* for future vector expansion */
 	__u32 fpc;		/* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
 	__u8 bpbc : 1;		/* bp mode */
+	__u8 pv : 1;		/* pv mode */
 	__u8 reserved2 : 7;
 	__u8 padding1[51];	/* riccb needs to be 64byte aligned */
 	__u8 riccb[64];		/* runtime instrumentation controls block */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f623c64aeade..500972a1f742 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2856,6 +2856,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 		vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
 	if (test_kvm_facility(vcpu->kvm, 156))
 		vcpu->run->kvm_valid_regs |= KVM_SYNC_ETOKEN;
+	if (test_kvm_facility(vcpu->kvm, 161))
+		vcpu->run->kvm_valid_regs |= KVM_SYNC_PV;
 	/* fprs can be synchronized via vrs, even if the guest has no vx. With
 	 * MACHINE_HAS_VX, (load|store)_fpu_regs() will work with vrs format.
 	 */
@@ -4136,6 +4138,7 @@ static void store_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
 	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
+	kvm_run->s.regs.pv = 0;
 	if (MACHINE_HAS_GS) {
 		__ctl_set_bit(2, 4);
 		if (vcpu->arch.gs_enabled)
@@ -4172,8 +4175,10 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	/* Restore will be done lazily at return */
 	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
 	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
-	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
+	if (likely(!kvm_s390_pv_handle_cpu(vcpu)))
 		store_regs_fmt2(vcpu, kvm_run);
+	else
+		kvm_run->s.regs.pv = 1;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (28 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 29/37] KVM: s390: protvirt: Sync pv state Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-06 16:48   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Janosch Frank
                   ` (6 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Description of changes that are necessary to move a KVM VM into
Protected Virtualization mode.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt

diff --git a/Documentation/virtual/kvm/s390-pv-boot.txt b/Documentation/virtual/kvm/s390-pv-boot.txt
new file mode 100644
index 000000000000..af883c928c08
--- /dev/null
+++ b/Documentation/virtual/kvm/s390-pv-boot.txt
@@ -0,0 +1,62 @@
+Boot/IPL of Protected VMs
+=========================
+
+Summary:
+
+Protected VMs are encrypted while not running. On IPL, a small
+plaintext bootloader is started, which provides KVM with information
+about the encrypted components and the metadata needed to decrypt them.
+
+Based on this data, KVM will make the PV known to the Ultravisor and
+instruct it to secure its memory, decrypt the components and verify
+the data and address list hashes, to ensure integrity. Afterwards KVM
+can run the PV via SIE which the UV will intercept and execute on
+KVM's behalf.
+
+The switch into PV mode lets us load encrypted guest executables and
+data via every available method (network, dasd, scsi, direct kernel,
+...) without the need to change the boot process.
+
+
+Diag308:
+
+This diagnose instruction is the basis for VM IPL. The VM can set and
+retrieve IPL information blocks that specify the IPL method/devices,
+and request VM memory and subsystem resets, as well as IPLs.
+
+For PVs this concept has been continued with new subcodes:
+
+Subcode 8: Set an IPL Information Block of type 5.
+Subcode 9: Store the saved block in guest memory.
+Subcode 10: Move into Protected Virtualization mode.
+
+The new PV load-device-specific-parameters field specifies all data
+that is necessary to move into PV mode.
+
+* PV Header origin
+* PV Header length
+* List of Components composed of:
+  * AES-XTS Tweak prefix
+  * Origin
+  * Size
+
+The PV header contains the keys and hashes, which the UV will use to
+decrypt and verify the PV, as well as control flags and a start PSW.
+
+The components are, for instance, an encrypted kernel, kernel command
+line and initrd. The components are decrypted by the UV.
+
+All non-decrypted data of the non-PV guest instance are zero on first
+access of the PV.
+
+
+When running in protected mode, some subcodes will result in
+exceptions or return error codes.
+
+Subcodes 4 and 7 will result in specification exceptions.
+When removing a secure VM, the UV will clear all memory, so we can't
+have non-clearing IPL subcodes.
+
+Subcodes 8, 9, 10 will result in specification exceptions.
+Re-IPL into protected mode is only possible via a detour through
+non-protected mode.
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (29 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15 10:04   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1 Janosch Frank
                   ` (5 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

If the host initialized the Ultravisor, we can set stfle bit 161
(protected virtual IPL enhancements facility), which indicates that
the IPL subcodes 8, 9 and 10 are valid. These subcodes are used by a
normal guest to set/retrieve an IPIB of type 5 and to transition into
protected mode.

Once in protected mode, the VM will lose the facility bit, as each
boot into protected mode has to go through non-protected mode. There
is no secure re-IPL with subcode 10 without a previous subcode 3.

In protected mode, subcode 4 is not available, as the VM no longer has
access to its memory from non-protected mode, i.e. each IPL clears.
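
For illustration, a minimal guest-side sketch of the flow enabled by
this facility bit (the DIAG308_*_PV names and the diag308() wrapper are
assumptions made for this sketch, not definitions from this patch):

  #include <linux/errno.h>
  #include <asm/facility.h>

  #define DIAG308_SET_PV	8	/* set an IPIB of type 5 */
  #define DIAG308_LOAD_PV	10	/* transition into protected mode */

  /* diag 0x308: IPIB address in r0, return code in r1, subcode as the
   * second operand, following the existing subcode calling convention. */
  static inline unsigned long diag308(unsigned long subcode, void *addr)
  {
          register unsigned long _addr asm("0") = (unsigned long)addr;
          register unsigned long _rc asm("1") = 0;

          asm volatile("diag %0,%2,0x308"
                       : "+d" (_addr), "+d" (_rc)
                       : "d" (subcode)
                       : "cc", "memory");
          return _rc;
  }

  static int ipl_into_pv_mode(void *ipib_pv)
  {
          if (!test_facility(161))	/* bit set by this patch on the host */
                  return -EOPNOTSUPP;
          if (diag308(DIAG308_SET_PV, ipib_pv) != 0x01)	/* 0x01: rc "ok" */
                  return -EIO;
          diag308(DIAG308_LOAD_PV, NULL);	/* does not return on success */
          return -EIO;
  }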

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/diag.c     | 6 ++++++
 arch/s390/kvm/kvm-s390.c | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 3fb54ec2cf3e..b951dbdcb6a0 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -197,6 +197,12 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
 	case 4:
 		vcpu->run->s390_reset_flags = 0;
 		break;
+	case 8:
+	case 9:
+	case 10:
+		if (!test_kvm_facility(vcpu->kvm, 161))
+			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+		/* fall through */
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 500972a1f742..8947f1812b12 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2590,6 +2590,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (css_general_characteristics.aiv && test_facility(65))
 		set_kvm_facility(kvm->arch.model.fac_mask, 65);
 
+	if (is_prot_virt_host()) {
+		set_kvm_facility(kvm->arch.model.fac_mask, 161);
+		set_kvm_facility(kvm->arch.model.fac_list, 161);
+	}
+
 	kvm->arch.model.cpuid = kvm_s390_get_initial_cpuid();
 	kvm->arch.model.ibc = sclp.ibc & 0x0fff;
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (30 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15 10:07   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL Janosch Frank
                   ` (4 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/uv.h | 25 +++++++++++++++++++++++++
 arch/s390/kvm/diag.c       |  1 +
 arch/s390/kvm/kvm-s390.c   | 20 ++++++++++++++++++++
 arch/s390/kvm/kvm-s390.h   |  2 ++
 arch/s390/kvm/pv.c         | 19 +++++++++++++++++++
 include/uapi/linux/kvm.h   |  2 ++
 6 files changed, 69 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 9ce9363aee1c..33b52ba306af 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -35,6 +35,12 @@
 #define UVC_CMD_SET_SEC_CONF_PARAMS	0x0300
 #define UVC_CMD_UNPACK_IMG		0x0301
 #define UVC_CMD_VERIFY_IMG		0x0302
+#define UVC_CMD_CPU_RESET		0x0310
+#define UVC_CMD_CPU_RESET_INITIAL	0x0311
+#define UVC_CMD_PERF_CONF_CLEAR_RESET	0x0320
+#define UVC_CMD_CPU_RESET_CLEAR		0x0321
+#define UVC_CMD_CPU_SET_STATE		0x0330
+#define UVC_CMD_SET_UNSHARED_ALL	0x0340
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
@@ -53,6 +59,12 @@ enum uv_cmds_inst {
 	BIT_UVC_CMD_SET_SEC_PARMS = 11,
 	BIT_UVC_CMD_UNPACK_IMG = 13,
 	BIT_UVC_CMD_VERIFY_IMG = 14,
+	BIT_UVC_CMD_CPU_RESET = 15,
+	BIT_UVC_CMD_CPU_RESET_INITIAL = 16,
+	BIT_UVC_CMD_CPU_SET_STATE = 17,
+	BIT_UVC_CMD_PREPARE_CLEAR_RESET = 18,
+	BIT_UVC_CMD_CPU_PERFORM_CLEAR_RESET = 19,
+	BIT_UVC_CMD_REMOVE_SHARED_ACCES = 20,
 };
 
 struct uv_cb_header {
@@ -148,6 +160,19 @@ struct uv_cb_unp {
 	u64 reserved28[3];
 } __packed __aligned(8);
 
+#define PV_CPU_STATE_OPR	1
+#define PV_CPU_STATE_STP	2
+#define PV_CPU_STATE_CHKSTP	3
+
+struct uv_cb_cpu_set_state {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 cpu_handle;
+	u8  reserved20[7];
+	u8  state;
+	u64 reserved28[5];
+};
+
 /*
  * A common UV call struct for the following calls:
  * Destroy cpu/config
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index b951dbdcb6a0..1c53eb7ba152 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -13,6 +13,7 @@
 #include <asm/pgalloc.h>
 #include <asm/gmap.h>
 #include <asm/virtio-ccw.h>
+#include <asm/uv.h>
 #include "kvm-s390.h"
 #include "trace.h"
 #include "trace-s390.h"
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 8947f1812b12..d3fd3ad1d09b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2256,6 +2256,26 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 			 ret >> 16, ret & 0x0000ffff);
 		break;
 	}
+	case KVM_PV_VM_PERF_CLEAR_RESET: {
+		u32 ret;
+
+		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+				  UVC_CMD_PERF_CONF_CLEAR_RESET,
+				  &ret);
+		VM_EVENT(kvm, 3, "PROTVIRT PERF CLEAR: rc %x rrc %x",
+			 ret >> 16, ret & 0x0000ffff);
+		break;
+	}
+	case KVM_PV_VM_UNSHARE: {
+		u32 ret;
+
+		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+				  UVC_CMD_SET_UNSHARED_ALL,
+				  &ret);
+		VM_EVENT(kvm, 3, "PROTVIRT UNSHARE: %d rc %x rrc %x",
+			 r, ret >> 16, ret & 0x0000ffff);
+		break;
+	}
 	default:
 		return -ENOTTY;
 	}
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 0d61dcc51f0e..8cd2e978363d 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -209,6 +209,7 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
 int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 		       unsigned long tweak);
 int kvm_s390_pv_verify(struct kvm *kvm);
+int kvm_s390_pv_set_cpu_state(struct kvm_vcpu *vcpu, u8 state);
 
 static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
 {
@@ -238,6 +239,7 @@ static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
 				     unsigned long size,  unsigned long tweak)
 { return 0; }
 static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
+static inline int kvm_s390_pv_set_cpu_state(struct kvm_vcpu *vcpu, u8 state) { return 0; }
 static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
 static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
 static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index be7d558ab897..cf79a6503e1c 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -280,3 +280,22 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 		 uvcb.header.rc, uvcb.header.rrc);
 	return rc;
 }
+
+int kvm_s390_pv_set_cpu_state(struct kvm_vcpu *vcpu, u8 state)
+{
+	int rc;
+	struct uv_cb_cpu_set_state uvcb = {
+		.header.cmd	= UVC_CMD_CPU_SET_STATE,
+		.header.len	= sizeof(uvcb),
+		.cpu_handle	= kvm_s390_pv_handle_cpu(vcpu),
+		.state		= state,
+	};
+
+	if (!kvm_s390_pv_handle_cpu(vcpu))
+		return -EINVAL;
+
+	rc = uv_call(0, (u64)&uvcb);
+	if (rc)
+		return -EINVAL;
+	return 0;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index bb37d5710c89..f75a051a7705 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1479,6 +1479,8 @@ enum pv_cmd_id {
 	KVM_PV_VM_SET_SEC_PARMS,
 	KVM_PV_VM_UNPACK,
 	KVM_PV_VM_VERIFY,
+	KVM_PV_VM_PERF_CLEAR_RESET,
+	KVM_PV_VM_UNSHARE,
 	KVM_PV_VCPU_CREATE,
 	KVM_PV_VCPU_DESTROY,
 };
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (31 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1 Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15 10:47   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 34/37] KVM: s390: protvirt: Report CPU state to Ultravisor Janosch Frank
                   ` (3 subsequent siblings)
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

With PV, we need to take action (i.e. issue Ultravisor reset calls) for
all reset types, not only the initial one, so introduce a dedicated
VCPU reset IOCTL that covers normal, initial and clear resets.
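
A hedged userspace sketch of the new interface (not part of this
patch; vcpu_fd is a placeholder for an open vcpu file descriptor):

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* The reset type is passed as the plain ioctl argument. */
  static int vcpu_clear_reset(int vcpu_fd)
  {
          if (ioctl(vcpu_fd, KVM_S390_VCPU_RESET,
                    KVM_S390_VCPU_RESET_CLEAR) < 0) {
                  perror("KVM_S390_VCPU_RESET");
                  return -1;
          }
          return 0;
  }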

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 53 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h |  6 +++++
 2 files changed, 59 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d3fd3ad1d09b..d8ee3a98e961 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3472,6 +3472,53 @@ static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
+				     unsigned long type)
+{
+	int rc;
+	u32 ret;
+
+	switch (type) {
+	case KVM_S390_VCPU_RESET_NORMAL:
+		/*
+		 * Only very little is reset, userspace handles the
+		 * non-protected case.
+		 */
+		rc = 0;
+		if (kvm_s390_pv_handle_cpu(vcpu)) {
+			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+					   UVC_CMD_CPU_RESET, &ret);
+			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET NORMAL VCPU: cpu %d rc %x rrc %x",
+				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+		}
+		break;
+	case KVM_S390_VCPU_RESET_INITIAL:
+		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
+		if (kvm_s390_pv_handle_cpu(vcpu)) {
+			uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+				      UVC_CMD_CPU_RESET_INITIAL,
+				      &ret);
+			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET INITIAL VCPU: cpu %d rc %x rrc %x",
+				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+		}
+		break;
+	case KVM_S390_VCPU_RESET_CLEAR:
+		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
+		if (kvm_s390_pv_handle_cpu(vcpu)) {
+			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+					   UVC_CMD_CPU_RESET_CLEAR, &ret);
+			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET CLEAR VCPU: cpu %d rc %x rrc %x",
+				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+		}
+		break;
+	default:
+		rc = -EINVAL;
+		break;
+	}
+	return rc;
+}
+
+
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	vcpu_load(vcpu);
@@ -4633,8 +4680,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		break;
 	}
 	case KVM_S390_INITIAL_RESET:
+		r = -EINVAL;
+		if (kvm_s390_pv_is_protected(vcpu->kvm))
+			break;
 		r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
 		break;
+	case KVM_S390_VCPU_RESET:
+		r = kvm_arch_vcpu_ioctl_reset(vcpu, arg);
+		break;
 	case KVM_SET_ONE_REG:
 	case KVM_GET_ONE_REG: {
 		struct kvm_one_reg reg;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f75a051a7705..2846ed5e5dd9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1496,6 +1496,12 @@ struct kvm_pv_cmd {
 #define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc3, struct kvm_pv_cmd)
 #define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc4, struct kvm_pv_cmd)
 
+#define KVM_S390_VCPU_RESET_NORMAL	0
+#define KVM_S390_VCPU_RESET_INITIAL	1
+#define KVM_S390_VCPU_RESET_CLEAR	2
+
+#define KVM_S390_VCPU_RESET    _IO(KVMIO,   0xd0)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 34/37] KVM: s390: protvirt: Report CPU state to Ultravisor
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (32 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-10-24 11:40 ` [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing Janosch Frank
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

VCPU states have to be reported to the Ultravisor for SIGP
interpretation.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d8ee3a98e961..ba6144fdb5d1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4439,7 +4439,8 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 		 */
 		__disable_ibs_on_all_vcpus(vcpu->kvm);
 	}
-
+	/* Let's tell the UV that we want to start again */
+	kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR);
 	kvm_s390_clear_cpuflags(vcpu, CPUSTAT_STOPPED);
 	/*
 	 * Another VCPU might have used IBS while we were offline.
@@ -4467,6 +4468,8 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 	kvm_s390_clear_stop_irq(vcpu);
 
 	kvm_s390_set_cpuflags(vcpu, CPUSTAT_STOPPED);
+	/* Let's tell the UV that we successfully stopped the vcpu */
+	kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_STP);
 	__disable_ibs_on_vcpu(vcpu);
 
 	for (i = 0; i < online_vcpus; i++) {
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (33 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 34/37] KVM: s390: protvirt: Report CPU state to Ultravisor Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15 11:23   ` Thomas Huth
  2019-10-24 11:40 ` [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state Janosch Frank
  2019-10-24 11:40 ` [RFC 37/37] KVM: s390: protvirt: Add UV debug trace Janosch Frank
  36 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

The architecture states that we need to reset local IRQs for all CPU
resets. Because the old reset interface did not support the normal CPU
reset, we never did that.

Now that we have a new interface, let's properly clear out local IRQs
and let this commit be a reminder.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ba6144fdb5d1..cc5feb67f145 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3485,6 +3485,8 @@ static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
 		 * non-protected case.
 		 */
 		rc = 0;
+		kvm_clear_async_pf_completion_queue(vcpu);
+		kvm_s390_clear_local_irqs(vcpu);
 		if (kvm_s390_pv_handle_cpu(vcpu)) {
 			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
 					   UVC_CMD_CPU_RESET, &ret);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (34 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  2019-11-15 11:25   ` Thomas Huth
  2019-11-18 17:38   ` Cornelia Huck
  2019-10-24 11:40 ` [RFC 37/37] KVM: s390: protvirt: Add UV debug trace Janosch Frank
  36 siblings, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Code 5 for the set CPU state UV call tells the UV to load a PSW from
the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
0/1). It also sets the CPU to the operating state afterwards, so we can
start it.
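
A hedged userspace sketch (not part of this patch; it assumes struct
kvm_pv_cmd, introduced earlier in this series, only needs its cmd field
filled in here, and vcpu_fd is a placeholder for an open vcpu fd):

  #include <string.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Ask the UV to load the IPL PSW and put the vcpu into the operating
   * state, so that it can be started afterwards. */
  static int vcpu_set_ipl_psw(int vcpu_fd)
  {
          struct kvm_pv_cmd pv;

          memset(&pv, 0, sizeof(pv));
          pv.cmd = KVM_PV_VCPU_SET_IPL_PSW;
          if (ioctl(vcpu_fd, KVM_S390_PV_COMMAND_VCPU, &pv) < 0) {
                  perror("KVM_S390_PV_COMMAND_VCPU");
                  return -1;
          }
          return 0;
  }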

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/uv.h | 1 +
 arch/s390/kvm/kvm-s390.c   | 4 ++++
 include/uapi/linux/kvm.h   | 1 +
 3 files changed, 6 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 33b52ba306af..8d10ae731458 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -163,6 +163,7 @@ struct uv_cb_unp {
 #define PV_CPU_STATE_OPR	1
 #define PV_CPU_STATE_STP	2
 #define PV_CPU_STATE_CHKSTP	3
+#define PV_CPU_STATE_OPR_LOAD	5
 
 struct uv_cb_cpu_set_state {
 	struct uv_cb_header header;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index cc5feb67f145..5cc9108c94e4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4652,6 +4652,10 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
 		r = kvm_s390_pv_destroy_cpu(vcpu);
 		break;
 	}
+	case KVM_PV_VCPU_SET_IPL_PSW: {
+		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2846ed5e5dd9..973007d27d55 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1483,6 +1483,7 @@ enum pv_cmd_id {
 	KVM_PV_VM_UNSHARE,
 	KVM_PV_VCPU_CREATE,
 	KVM_PV_VCPU_DESTROY,
+	KVM_PV_VCPU_SET_IPL_PSW,
 };
 
 struct kvm_pv_cmd {
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [RFC 37/37] KVM: s390: protvirt: Add UV debug trace
  2019-10-24 11:40 [RFC 00/37] KVM: s390: Add support for protected VMs Janosch Frank
                   ` (35 preceding siblings ...)
  2019-10-24 11:40 ` [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state Janosch Frank
@ 2019-10-24 11:40 ` Janosch Frank
  36 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-24 11:40 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

Let's have some debug traces which stay around for longer than the
guest.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 10 +++++++++-
 arch/s390/kvm/kvm-s390.h |  9 +++++++++
 arch/s390/kvm/pv.c       | 16 ++++++++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5cc9108c94e4..56627968f2ed 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -221,6 +221,7 @@ static struct kvm_s390_vm_cpu_subfunc kvm_s390_available_subfunc;
 static struct gmap_notifier gmap_notifier;
 static struct gmap_notifier vsie_gmap_notifier;
 debug_info_t *kvm_s390_dbf;
+debug_info_t *kvm_s390_dbf_uv;
 
 /* Section: not file related */
 int kvm_arch_hardware_enable(void)
@@ -462,7 +463,13 @@ int kvm_arch_init(void *opaque)
 	if (!kvm_s390_dbf)
 		return -ENOMEM;
 
-	if (debug_register_view(kvm_s390_dbf, &debug_sprintf_view))
+	kvm_s390_dbf_uv = debug_register("kvm-uv", 32, 1, 7 * sizeof(long));
+	if (!kvm_s390_dbf_uv)
+		return -ENOMEM;
+
+
+	if (debug_register_view(kvm_s390_dbf, &debug_sprintf_view) ||
+	    debug_register_view(kvm_s390_dbf_uv, &debug_sprintf_view))
 		goto out;
 
 	kvm_s390_cpu_feat_init();
@@ -489,6 +496,7 @@ void kvm_arch_exit(void)
 {
 	kvm_s390_gib_destroy();
 	debug_unregister(kvm_s390_dbf);
+	debug_unregister(kvm_s390_dbf_uv);
 }
 
 /* Section: device related */
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 8cd2e978363d..d13ddf2113c0 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -25,6 +25,15 @@
 #define IS_ITDB_VALID(vcpu)	((*(char *)vcpu->arch.sie_block->itdba == TDB_FORMAT1))
 
 extern debug_info_t *kvm_s390_dbf;
+extern debug_info_t *kvm_s390_dbf_uv;
+
+#define KVM_UV_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
+do { \
+	debug_sprintf_event(kvm_s390_dbf_uv, d_loglevel, \
+			    "%s: " d_string "\n", d_kvm->arch.dbf->name, \
+			    d_args); \
+} while (0)
+
 #define KVM_EVENT(d_loglevel, d_string, d_args...)\
 do { \
 	debug_sprintf_event(kvm_s390_dbf, d_loglevel, d_string "\n", \
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index cf79a6503e1c..78e4510a7776 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -100,6 +100,8 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
 			   UVC_CMD_DESTROY_SEC_CONF, &ret);
 	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
 		 ret >> 16, ret & 0x0000ffff);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
+		 ret >> 16, ret & 0x0000ffff);
 	return rc;
 }
 
@@ -115,6 +117,8 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 
 		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
 			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+		KVM_UV_EVENT(vcpu->kvm, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
+			     vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
 	}
 
 	free_pages(vcpu->arch.pv.stor_base,
@@ -163,6 +167,10 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
 	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
 		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
 		   uvcb.header.rrc);
+	KVM_UV_EVENT(vcpu->kvm, 3,
+		     "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
+		     vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
+		     uvcb.header.rrc);
 
 	/* Output */
 	vcpu->arch.pv.handle = uvcb.cpu_handle;
@@ -200,6 +208,10 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
 		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
 		 uvcb.header.rrc);
 
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
+		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
+		 uvcb.header.rrc);
+
 	/* Outputs */
 	kvm->arch.pv.handle = uvcb.guest_handle;
 
@@ -230,6 +242,8 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
 	rc = uv_call(0, (u64)&uvcb);
 	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
 		 uvcb.header.rc, uvcb.header.rrc);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
+		     uvcb.header.rc, uvcb.header.rrc);
 	if (rc)
 		return -EINVAL;
 	return 0;
@@ -278,6 +292,8 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 	}
 	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
 		 uvcb.header.rc, uvcb.header.rrc);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
+		     uvcb.header.rc, uvcb.header.rrc);
 	return rc;
 }
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 11:40 ` [RFC 02/37] s390/protvirt: introduce host side setup Janosch Frank
@ 2019-10-24 13:25   ` David Hildenbrand
  2019-10-24 13:27     ` David Hildenbrand
  2019-10-28 14:54   ` Cornelia Huck
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-24 13:25 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
> protected virtual machines hosting support code.
> 
> Add "prot_virt" command line option which controls if the kernel
> protected VMs support is enabled at runtime.
> 
> Extend ultravisor info definitions and expose it via uv_info struct
> filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  5 ++
>   arch/s390/boot/Makefile                       |  2 +-
>   arch/s390/boot/uv.c                           | 20 +++++++-
>   arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>   arch/s390/kernel/Makefile                     |  1 +
>   arch/s390/kernel/setup.c                      |  4 --
>   arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>   arch/s390/kvm/Kconfig                         |  9 ++++
>   8 files changed, 126 insertions(+), 9 deletions(-)
>   create mode 100644 arch/s390/kernel/uv.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c7ac2f3ac99f..aa22e36b3105 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3693,6 +3693,11 @@
>   			before loading.
>   			See Documentation/admin-guide/blockdev/ramdisk.rst.
>   
> +	prot_virt=	[S390] enable hosting protected virtual machines
> +			isolated from the hypervisor (if hardware supports
> +			that).
> +			Format: <bool>

Isn't that a virt driver detail that should come in via KVM module 
parameters? I don't see quite yet why this has to be a kernel parameter 
(that can be changed at runtime).

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 13:25   ` David Hildenbrand
@ 2019-10-24 13:27     ` David Hildenbrand
  2019-10-24 13:40       ` Christian Borntraeger
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-24 13:27 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 15:25, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> From: Vasily Gorbik <gor@linux.ibm.com>
>>
>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
>> protected virtual machines hosting support code.
>>
>> Add "prot_virt" command line option which controls if the kernel
>> protected VMs support is enabled at runtime.
>>
>> Extend ultravisor info definitions and expose it via uv_info struct
>> filled in during startup.
>>
>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>> ---
>>    .../admin-guide/kernel-parameters.txt         |  5 ++
>>    arch/s390/boot/Makefile                       |  2 +-
>>    arch/s390/boot/uv.c                           | 20 +++++++-
>>    arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>>    arch/s390/kernel/Makefile                     |  1 +
>>    arch/s390/kernel/setup.c                      |  4 --
>>    arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>>    arch/s390/kvm/Kconfig                         |  9 ++++
>>    8 files changed, 126 insertions(+), 9 deletions(-)
>>    create mode 100644 arch/s390/kernel/uv.c
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index c7ac2f3ac99f..aa22e36b3105 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3693,6 +3693,11 @@
>>    			before loading.
>>    			See Documentation/admin-guide/blockdev/ramdisk.rst.
>>    
>> +	prot_virt=	[S390] enable hosting protected virtual machines
>> +			isolated from the hypervisor (if hardware supports
>> +			that).
>> +			Format: <bool>
> 
> Isn't that a virt driver detail that should come in via KVM module
> parameters? I don't see quite yet why this has to be a kernel parameter
> (that can be changed at runtime).
> 

I was confused by "runtime" in "which controls if the kernel protected 
VMs support is enabled at runtime"

So this can't be changed at runtime. Can you clarify why kvm can't 
initialize that when loaded and why we need a kernel parameter?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 13:27     ` David Hildenbrand
@ 2019-10-24 13:40       ` Christian Borntraeger
  2019-10-24 15:52         ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-10-24 13:40 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 15:27, David Hildenbrand wrote:
> On 24.10.19 15:25, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> From: Vasily Gorbik <gor@linux.ibm.com>
>>>
>>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
>>> protected virtual machines hosting support code.
>>>
>>> Add "prot_virt" command line option which controls if the kernel
>>> protected VMs support is enabled at runtime.
>>>
>>> Extend ultravisor info definitions and expose it via uv_info struct
>>> filled in during startup.
>>>
>>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>>> ---
>>>    .../admin-guide/kernel-parameters.txt         |  5 ++
>>>    arch/s390/boot/Makefile                       |  2 +-
>>>    arch/s390/boot/uv.c                           | 20 +++++++-
>>>    arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>>>    arch/s390/kernel/Makefile                     |  1 +
>>>    arch/s390/kernel/setup.c                      |  4 --
>>>    arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>>>    arch/s390/kvm/Kconfig                         |  9 ++++
>>>    8 files changed, 126 insertions(+), 9 deletions(-)
>>>    create mode 100644 arch/s390/kernel/uv.c
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index c7ac2f3ac99f..aa22e36b3105 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -3693,6 +3693,11 @@
>>>                before loading.
>>>                See Documentation/admin-guide/blockdev/ramdisk.rst.
>>>    +    prot_virt=    [S390] enable hosting protected virtual machines
>>> +            isolated from the hypervisor (if hardware supports
>>> +            that).
>>> +            Format: <bool>
>>
>> Isn't that a virt driver detail that should come in via KVM module
>> parameters? I don't see quite yet why this has to be a kernel parameter
>> (that can be changed at runtime).
>>
> 
> I was confused by "runtime" in "which controls if the kernel protected VMs support is enabled at runtime"
> 
> So this can't be changed at runtime. Can you clarify why kvm can't initialize that when loaded and why we need a kernel parameter?

We have to do the opt-in very early for several reasons:
- we have to donate a potentially largish contiguous (in real) range of memory to the ultravisor
- The opt-in will also disable some features in the host that could affect guest integrity (e.g. 
time sync via STP to avoid the host messing with the guest time stepping). Linux is not happy
when you remove features at a later point in time
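
For illustration, the shape of that early opt-in (a sketch, not the code
from this patch; prot_virt_host is an assumed variable name):

#include <linux/init.h>
#include <linux/kernel.h>

static int prot_virt_host;

static int __init prot_virt_setup(char *val)
{
        bool enabled;
        int rc;

        rc = kstrtobool(val, &enabled);
        if (!rc && enabled)
                prot_virt_host = 1;
        return rc;
}
early_param("prot_virt", prot_virt_setup);

early_param() handlers run while memblock is still the only allocator,
so the contiguous donation to the ultravisor can be carved out before
the buddy allocator takes over.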

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 13:40       ` Christian Borntraeger
@ 2019-10-24 15:52         ` David Hildenbrand
  2019-10-24 16:30           ` Claudio Imbrenda
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-24 15:52 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 15:40, Christian Borntraeger wrote:
> 
> 
> On 24.10.19 15:27, David Hildenbrand wrote:
>> On 24.10.19 15:25, David Hildenbrand wrote:
>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>> From: Vasily Gorbik <gor@linux.ibm.com>
>>>>
>>>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
>>>> protected virtual machines hosting support code.
>>>>
>>>> Add "prot_virt" command line option which controls if the kernel
>>>> protected VMs support is enabled at runtime.
>>>>
>>>> Extend ultravisor info definitions and expose it via uv_info struct
>>>> filled in during startup.
>>>>
>>>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>>>> ---
>>>>     .../admin-guide/kernel-parameters.txt         |  5 ++
>>>>     arch/s390/boot/Makefile                       |  2 +-
>>>>     arch/s390/boot/uv.c                           | 20 +++++++-
>>>>     arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>>>>     arch/s390/kernel/Makefile                     |  1 +
>>>>     arch/s390/kernel/setup.c                      |  4 --
>>>>     arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>>>>     arch/s390/kvm/Kconfig                         |  9 ++++
>>>>     8 files changed, 126 insertions(+), 9 deletions(-)
>>>>     create mode 100644 arch/s390/kernel/uv.c
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index c7ac2f3ac99f..aa22e36b3105 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -3693,6 +3693,11 @@
>>>>                 before loading.
>>>>                 See Documentation/admin-guide/blockdev/ramdisk.rst.
>>>>     +    prot_virt=    [S390] enable hosting protected virtual machines
>>>> +            isolated from the hypervisor (if hardware supports
>>>> +            that).
>>>> +            Format: <bool>
>>>
>>> Isn't that a virt driver detail that should come in via KVM module
>>> parameters? I don't see quite yet why this has to be a kernel parameter
>>> (that can be changed at runtime).
>>>
>>
>> I was confused by "runtime" in "which controls if the kernel protected VMs support is enabled at runtime"
>>
>> So this can't be changed at runtime. Can you clarify why kvm can't initialize that when loaded and why we need a kernel parameter?
> 
> We have to do the opt-in very early for several reasons:
> - we have to donate a potentially largish contiguous (in real) range of memory to the ultravisor

If you'd be using CMA (or alloc_contig_pages() with less guarantees) you 
could be making good use of the memory until you actually start an 
encrypted guest.

I can see that using the memblock allocator early is easier. But you 
waste "largish ... range of memory" even if you never run VMs.

Maybe something to work on in the future.
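
Something like this is what I have in mind (a sketch, assuming the UV
would accept memory donated that late; the names and the 512M size are
made up):

#include <linux/cma.h>
#include <linux/sizes.h>

static struct cma *uv_cma;

/* called from early arch setup, like any other CMA area */
static int __init uv_cma_reserve(void)
{
        return cma_declare_contiguous(0, SZ_512M, 0, 0, 0, false,
                                      "uv", &uv_cma);
}

/* only done when the first protected guest is actually created */
static struct page *uv_cma_take(unsigned long nr_pages)
{
        return cma_alloc(uv_cma, nr_pages, 0, false);
}

Until uv_cma_take() runs, those pages remain usable as movable memory.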

> - The opt-in will also disable some features in the host that could affect guest integrity (e.g.
> time sync via STP to avoid the host messing with the guest time stepping). Linux is not happy
> when you remove features at a later point in time

At least disabling STP shouldn't be a real issue if I'm not wrong (maybe 
I am). But there seem to be more features.

(when I saw "prot_virt" it felt like somebody is using a big hammer for 
small nails (e.g., compared to "stp=off").)

Can you guys add these details to the patch description?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 11:40 ` [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable Janosch Frank
@ 2019-10-24 16:07   ` David Hildenbrand
  2019-10-24 16:33     ` Claudio Imbrenda
  2019-10-25  7:18     ` Janosch Frank
  2019-10-25  7:46   ` David Hildenbrand
  2019-10-25  8:24   ` [RFC v2] " Janosch Frank
  2 siblings, 2 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-24 16:07 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> KSM will not work on secure pages, because when the kernel reads a
> secure page, it will be encrypted and hence no two pages will look the
> same.
> 
> Let's mark the guest pages as unmergeable when we transition to secure
> mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/gmap.h |  1 +
>   arch/s390/kvm/kvm-s390.c     |  6 ++++++
>   arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
>   3 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 6efc0b501227..eab6a2ec3599 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
>   
>   void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>   			     unsigned long gaddr, unsigned long vmaddr);
> +int gmap_mark_unmergeable(void);
>   #endif /* _ASM_S390_GMAP_H */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 924132d92782..d1ba12f857e7 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   		if (r)
>   			break;
>   
> +		down_write(&current->mm->mmap_sem);
> +		r = gmap_mark_unmergeable();
> +		up_write(&current->mm->mmap_sem);
> +		if (r)
> +			break;
> +
>   		mutex_lock(&kvm->lock);
>   		kvm_s390_vcpu_block_all(kvm);
>   		/* FMT 4 SIE needs esca */
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index edcdca97e85e..bf365a09f900 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>   }
>   EXPORT_SYMBOL_GPL(s390_enable_sie);
>   
> +int gmap_mark_unmergeable(void)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				MADV_UNMERGEABLE, &vma->vm_flags)) {
> +			mm->context.uses_skeys = 0;

That skey setting does not make too much sense when coming via 
kvm_s390_handle_pv(). handle that in the caller?

> +			return -ENOMEM;
> +		}
> +	}
> +	mm->def_flags &= ~VM_MERGEABLE;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
> +
>   /*
>    * Enable storage key handling from now on and initialize the storage
>    * keys with the default key.
> @@ -2593,7 +2610,6 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
>   int s390_enable_skey(void)
>   {
>   	struct mm_struct *mm = current->mm;
> -	struct vm_area_struct *vma;
>   	int rc = 0;
>   
>   	down_write(&mm->mmap_sem);
> @@ -2601,15 +2617,7 @@ int s390_enable_skey(void)
>   		goto out_up;
>   
>   	mm->context.uses_skeys = 1;
> -	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> -		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> -				MADV_UNMERGEABLE, &vma->vm_flags)) {
> -			mm->context.uses_skeys = 0;
> -			rc = -ENOMEM;
> -			goto out_up;
> -		}
> -	}
> -	mm->def_flags &= ~VM_MERGEABLE;
> +	gmap_mark_unmergeable();
>   
>   	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
>   
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 15:52         ` David Hildenbrand
@ 2019-10-24 16:30           ` Claudio Imbrenda
  2019-10-24 16:54             ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Claudio Imbrenda @ 2019-10-24 16:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	mihajlov, mimu, cohuck, gor

On Thu, 24 Oct 2019 17:52:31 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 24.10.19 15:40, Christian Borntraeger wrote:
> > 
> > 
> > On 24.10.19 15:27, David Hildenbrand wrote:  
> >> On 24.10.19 15:25, David Hildenbrand wrote:  
> >>> On 24.10.19 13:40, Janosch Frank wrote:  
> >>>> From: Vasily Gorbik <gor@linux.ibm.com>
> >>>>
> >>>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option
> >>>> for protected virtual machines hosting support code.
> >>>>
> >>>> Add "prot_virt" command line option which controls if the kernel
> >>>> protected VMs support is enabled at runtime.
> >>>>
> >>>> Extend ultravisor info definitions and expose it via uv_info
> >>>> struct filled in during startup.
> >>>>
> >>>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> >>>> ---
> >>>>     .../admin-guide/kernel-parameters.txt         |  5 ++
> >>>>     arch/s390/boot/Makefile                       |  2 +-
> >>>>     arch/s390/boot/uv.c                           | 20 +++++++-
> >>>>     arch/s390/include/asm/uv.h                    | 46
> >>>> ++++++++++++++++-- arch/s390/kernel/Makefile
> >>>> |  1 + arch/s390/kernel/setup.c                      |  4 --
> >>>>     arch/s390/kernel/uv.c                         | 48
> >>>> +++++++++++++++++++
> >>>> arch/s390/kvm/Kconfig                         |  9 ++++ 8 files
> >>>> changed, 126 insertions(+), 9 deletions(-) create mode 100644
> >>>> arch/s390/kernel/uv.c
> >>>>
> >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> >>>> b/Documentation/admin-guide/kernel-parameters.txt index
> >>>> c7ac2f3ac99f..aa22e36b3105 100644 ---
> >>>> a/Documentation/admin-guide/kernel-parameters.txt +++
> >>>> b/Documentation/admin-guide/kernel-parameters.txt @@ -3693,6
> >>>> +3693,11 @@ before loading.
> >>>>                 See
> >>>> Documentation/admin-guide/blockdev/ramdisk.rst.
> >>>>     +    prot_virt=    [S390] enable hosting protected virtual
> >>>> machines
> >>>> +            isolated from the hypervisor (if hardware supports
> >>>> +            that).
> >>>> +            Format: <bool>  
> >>>
> >>> Isn't that a virt driver detail that should come in via KVM module
> >>> parameters? I don't see quite yet why this has to be a kernel
> >>> parameter (that can be changed at runtime).
> >>>  
> >>
> >> I was confused by "runtime" in "which controls if the kernel
> >> protected VMs support is enabled at runtime"
> >>
> >> So this can't be changed at runtime. Can you clarify why kvm can't
> >> initialize that when loaded and why we need a kernel parameter?  
> > 
> > We have to do the opt-in very early for several reasons:
> > - we have to donate a potentially largish contiguous (in real)
> > range of memory to the ultravisor  
> 
> If you'd be using CMA (or alloc_contig_pages() with less guarantees)
> you could be making good use of the memory until you actually start
> an encrypted guest.

no, the memory needs to be allocated before any other interaction with
the ultravisor is attempted, and the size depends on the size of the
_host_ memory. it can be a very substantial amount of memory, and thus
it's very likely to fail unless it's done very early at boot time.

> 
> I can see that using the memblock allocator early is easier. But you 
> waste "largish ... range of memory" even if you never run VMs.

this is inevitable

> 
> Maybe something to work on in the future.

Honestly unlikely, which is why protected virtualization needs to be
enabled explicitly.

> 
> > - The opt-in will also disable some features in the host that could
> > affect guest integrity (e.g. time sync via STP to avoid the host
> > messing with the guest time stepping). Linux is not happy when you
> > remove features at a later point in time  
> 
> At least disabling STP shouldn't be a real issue if I'm not wrong
> (maybe I am). But there seem to be more features.
> 
> (when I saw "prot_virt" it felt like somebody is using a big hammer
> for small nails (e.g., compared to "stp=off").)
> 
> Can you guys add these details to the patch description?

yeah, probably a good idea

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 16:07   ` David Hildenbrand
@ 2019-10-24 16:33     ` Claudio Imbrenda
  2019-10-24 16:49       ` David Hildenbrand
  2019-10-25  7:18     ` Janosch Frank
  1 sibling, 1 reply; 213+ messages in thread
From: Claudio Imbrenda @ 2019-10-24 16:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Janosch Frank, kvm, linux-s390, thuth, borntraeger, mihajlov,
	mimu, cohuck, gor

On Thu, 24 Oct 2019 18:07:14 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 24.10.19 13:40, Janosch Frank wrote:
> > KSM will not work on secure pages, because when the kernel reads a
> > secure page, it will be encrypted and hence no two pages will look
> > the same.
> > 
> > Let's mark the guest pages as unmergeable when we transition to
> > secure mode.
> > 
> > Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> > ---
> >   arch/s390/include/asm/gmap.h |  1 +
> >   arch/s390/kvm/kvm-s390.c     |  6 ++++++
> >   arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
> >   3 files changed, 25 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/s390/include/asm/gmap.h
> > b/arch/s390/include/asm/gmap.h index 6efc0b501227..eab6a2ec3599
> > 100644 --- a/arch/s390/include/asm/gmap.h
> > +++ b/arch/s390/include/asm/gmap.h
> > @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *,
> > unsigned long start, 
> >   void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long
> > dirty_bitmap[4], unsigned long gaddr, unsigned long vmaddr);
> > +int gmap_mark_unmergeable(void);
> >   #endif /* _ASM_S390_GMAP_H */
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 924132d92782..d1ba12f857e7 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm
> > *kvm, struct kvm_pv_cmd *cmd) if (r)
> >   			break;
> >   
> > +		down_write(&current->mm->mmap_sem);
> > +		r = gmap_mark_unmergeable();
> > +		up_write(&current->mm->mmap_sem);
> > +		if (r)
> > +			break;
> > +
> >   		mutex_lock(&kvm->lock);
> >   		kvm_s390_vcpu_block_all(kvm);
> >   		/* FMT 4 SIE needs esca */
> > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > index edcdca97e85e..bf365a09f900 100644
> > --- a/arch/s390/mm/gmap.c
> > +++ b/arch/s390/mm/gmap.c
> > @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
> >   }
> >   EXPORT_SYMBOL_GPL(s390_enable_sie);
> >   
> > +int gmap_mark_unmergeable(void)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	struct vm_area_struct *vma;
> > +
> > +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > +				MADV_UNMERGEABLE, &vma->vm_flags))
> > {
> > +			mm->context.uses_skeys = 0;  
> 
> That skey setting does not make too much sense when coming via 
> kvm_s390_handle_pv(). handle that in the caller?

protected guests run keyless; any attempt to use keys in the guest will
result in an exception in the guest.

> 
> > +			return -ENOMEM;
> > +		}
> > +	}
> > +	mm->def_flags &= ~VM_MERGEABLE;
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
> > +
> >   /*
> >    * Enable storage key handling from now on and initialize the
> > storage
> >    * keys with the default key.
> > @@ -2593,7 +2610,6 @@ static const struct mm_walk_ops
> > enable_skey_walk_ops = { int s390_enable_skey(void)
> >   {
> >   	struct mm_struct *mm = current->mm;
> > -	struct vm_area_struct *vma;
> >   	int rc = 0;
> >   
> >   	down_write(&mm->mmap_sem);
> > @@ -2601,15 +2617,7 @@ int s390_enable_skey(void)
> >   		goto out_up;
> >   
> >   	mm->context.uses_skeys = 1;
> > -	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > -		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > -				MADV_UNMERGEABLE, &vma->vm_flags))
> > {
> > -			mm->context.uses_skeys = 0;
> > -			rc = -ENOMEM;
> > -			goto out_up;
> > -		}
> > -	}
> > -	mm->def_flags &= ~VM_MERGEABLE;
> > +	gmap_mark_unmergeable();
> >   
> >   	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops,
> > NULL); 
> >   
> 
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 16:33     ` Claudio Imbrenda
@ 2019-10-24 16:49       ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-24 16:49 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Janosch Frank, kvm, linux-s390, thuth, borntraeger, mihajlov,
	mimu, cohuck, gor

On 24.10.19 18:33, Claudio Imbrenda wrote:
> On Thu, 24 Oct 2019 18:07:14 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> KSM will not work on secure pages, because when the kernel reads a
>>> secure page, it will be encrypted and hence no two pages will look
>>> the same.
>>>
>>> Let's mark the guest pages as unmergeable when we transition to
>>> secure mode.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>    arch/s390/include/asm/gmap.h |  1 +
>>>    arch/s390/kvm/kvm-s390.c     |  6 ++++++
>>>    arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
>>>    3 files changed, 25 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/s390/include/asm/gmap.h
>>> b/arch/s390/include/asm/gmap.h index 6efc0b501227..eab6a2ec3599
>>> 100644 --- a/arch/s390/include/asm/gmap.h
>>> +++ b/arch/s390/include/asm/gmap.h
>>> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *,
>>> unsigned long start,
>>>    void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long
>>> dirty_bitmap[4], unsigned long gaddr, unsigned long vmaddr);
>>> +int gmap_mark_unmergeable(void);
>>>    #endif /* _ASM_S390_GMAP_H */
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 924132d92782..d1ba12f857e7 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm
>>> *kvm, struct kvm_pv_cmd *cmd) if (r)
>>>    			break;
>>>    
>>> +		down_write(&current->mm->mmap_sem);
>>> +		r = gmap_mark_unmergeable();
>>> +		up_write(&current->mm->mmap_sem);
>>> +		if (r)
>>> +			break;
>>> +
>>>    		mutex_lock(&kvm->lock);
>>>    		kvm_s390_vcpu_block_all(kvm);
>>>    		/* FMT 4 SIE needs esca */
>>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>>> index edcdca97e85e..bf365a09f900 100644
>>> --- a/arch/s390/mm/gmap.c
>>> +++ b/arch/s390/mm/gmap.c
>>> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>>>    }
>>>    EXPORT_SYMBOL_GPL(s390_enable_sie);
>>>    
>>> +int gmap_mark_unmergeable(void)
>>> +{
>>> +	struct mm_struct *mm = current->mm;
>>> +	struct vm_area_struct *vma;
>>> +
>>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>>> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
>>> +				MADV_UNMERGEABLE, &vma->vm_flags))
>>> {
>>> +			mm->context.uses_skeys = 0;
>>
>> That skey setting does not make too much sense when coming via
>> kvm_s390_handle_pv(). handle that in the caller?
> 
> protected guests run keyless; any attempt to use keys in the guest will
> result in an exception in the guest.

Still, this is the recovery path for the "mm->context.uses_skeys = 1;" 
in enable_skey_walk_ops() and it confuses the reader (like me).


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 16:30           ` Claudio Imbrenda
@ 2019-10-24 16:54             ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-24 16:54 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	mihajlov, mimu, cohuck, gor

On 24.10.19 18:30, Claudio Imbrenda wrote:
> On Thu, 24 Oct 2019 17:52:31 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 24.10.19 15:40, Christian Borntraeger wrote:
>>>
>>>
>>> On 24.10.19 15:27, David Hildenbrand wrote:
>>>> On 24.10.19 15:25, David Hildenbrand wrote:
>>>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>>>> From: Vasily Gorbik <gor@linux.ibm.com>
>>>>>>
>>>>>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option
>>>>>> for protected virtual machines hosting support code.
>>>>>>
>>>>>> Add "prot_virt" command line option which controls if the kernel
>>>>>> protected VMs support is enabled at runtime.
>>>>>>
>>>>>> Extend ultravisor info definitions and expose it via uv_info
>>>>>> struct filled in during startup.
>>>>>>
>>>>>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>>>>>> ---
>>>>>>      .../admin-guide/kernel-parameters.txt         |  5 ++
>>>>>>      arch/s390/boot/Makefile                       |  2 +-
>>>>>>      arch/s390/boot/uv.c                           | 20 +++++++-
>>>>>>      arch/s390/include/asm/uv.h                    | 46
>>>>>> ++++++++++++++++-- arch/s390/kernel/Makefile
>>>>>> |  1 + arch/s390/kernel/setup.c                      |  4 --
>>>>>>      arch/s390/kernel/uv.c                         | 48
>>>>>> +++++++++++++++++++
>>>>>> arch/s390/kvm/Kconfig                         |  9 ++++ 8 files
>>>>>> changed, 126 insertions(+), 9 deletions(-) create mode 100644
>>>>>> arch/s390/kernel/uv.c
>>>>>>
>>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>>>>> b/Documentation/admin-guide/kernel-parameters.txt index
>>>>>> c7ac2f3ac99f..aa22e36b3105 100644 ---
>>>>>> a/Documentation/admin-guide/kernel-parameters.txt +++
>>>>>> b/Documentation/admin-guide/kernel-parameters.txt @@ -3693,6
>>>>>> +3693,11 @@ before loading.
>>>>>>                  See
>>>>>> Documentation/admin-guide/blockdev/ramdisk.rst.
>>>>>>      +    prot_virt=    [S390] enable hosting protected virtual
>>>>>> machines
>>>>>> +            isolated from the hypervisor (if hardware supports
>>>>>> +            that).
>>>>>> +            Format: <bool>
>>>>>
>>>>> Isn't that a virt driver detail that should come in via KVM module
>>>>> parameters? I don't see quite yet why this has to be a kernel
>>>>> parameter (that can be changed at runtime).
>>>>>   
>>>>
>>>> I was confused by "runtime" in "which controls if the kernel
>>>> protected VMs support is enabled at runtime"
>>>>
>>>> So this can't be changed at runtime. Can you clarify why kvm can't
>>>> initialize that when loaded and why we need a kernel parameter?
>>>
>>> We have to do the opt-in very early for several reasons:
>>> - we have to donate a potentially largish contiguous (in real)
>>> range of memory to the ultravisor
>>
>> If you'd be using CMA (or alloc_contig_pages() with less guarantees)
>> you could be making good use of the memory until you actually start
>> an encrypted guest.
> 
> no, the memory needs to be allocated before any other interaction with
> the ultravisor is attempted, and the size depends on the size of the

I fail to see why you need interaction with the UV before you actually 
start/create an encrypted guest (IOW why you can't defer uv_init()) - 
but I am not past "[RFC 07/37] KVM: s390: protvirt: Secure memory is not 
mergeable" yet and ...

> _host_ memory. it can be a very substantial amount of memory, and thus
> it's very likely to fail unless it's done very early at boot time.

... I guess you could still do that via CMA ... but it doesn't really 
matter right now :) I understood the rationale.

> 
>>
>>> - The opt-in will also disable some features in the host that could
>>> affect guest integrity (e.g. time sync via STP to avoid the host
>>> messing with the guest time stepping). Linux is not happy when you
>>> remove features at a later point in time
>>
>> At least disabling STP shouldn't be a real issue if I'm not wrong
>> (maybe I am). But there seem to be more features.
>>
>> (when I saw "prot_virt" it felt like somebody is using a big hammer
>> for small nails (e.g., compared to "stp=off").)
>>
>> Can you guys add these details to the patch description?
> 
> yeah, probably a good idea
> 

A patch that explains why it does something and not only what it does 
usually makes me ask fewer stupid questions :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 16:07   ` David Hildenbrand
  2019-10-24 16:33     ` Claudio Imbrenda
@ 2019-10-25  7:18     ` Janosch Frank
  2019-10-25  8:04       ` David Hildenbrand
  1 sibling, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-25  7:18 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor


On 10/24/19 6:07 PM, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> KSM will not work on secure pages, because when the kernel reads a
>> secure page, it will be encrypted and hence no two pages will look the
>> same.
>>
>> Let's mark the guest pages as unmergeable when we transition to secure
>> mode.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/gmap.h |  1 +
>>   arch/s390/kvm/kvm-s390.c     |  6 ++++++
>>   arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
>>   3 files changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>> index 6efc0b501227..eab6a2ec3599 100644
>> --- a/arch/s390/include/asm/gmap.h
>> +++ b/arch/s390/include/asm/gmap.h
>> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
>>   
>>   void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>>   			     unsigned long gaddr, unsigned long vmaddr);
>> +int gmap_mark_unmergeable(void);
>>   #endif /* _ASM_S390_GMAP_H */
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 924132d92782..d1ba12f857e7 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>>   		if (r)
>>   			break;
>>   
>> +		down_write(&current->mm->mmap_sem);
>> +		r = gmap_mark_unmergeable();
>> +		up_write(&current->mm->mmap_sem);
>> +		if (r)
>> +			break;
>> +
>>   		mutex_lock(&kvm->lock);
>>   		kvm_s390_vcpu_block_all(kvm);
>>   		/* FMT 4 SIE needs esca */
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index edcdca97e85e..bf365a09f900 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>>   }
>>   EXPORT_SYMBOL_GPL(s390_enable_sie);
>>   
>> +int gmap_mark_unmergeable(void)
>> +{
>> +	struct mm_struct *mm = current->mm;
>> +	struct vm_area_struct *vma;
>> +
>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
>> +				MADV_UNMERGEABLE, &vma->vm_flags)) {
>> +			mm->context.uses_skeys = 0;
> 
> That skey setting does not make too much sense when coming via 
> kvm_s390_handle_pv(). handle that in the caller?

Hmm, I think the name of that variable is just plain wrong.
It should be "can_use_skeys" or "uses_unmergeable" (which would fit
better into the mm context anyway) and then we could add a
kvm->arch.uses_skeys to tell that we actually used them for migration
checks, etc.

I had long discussions with Martin over these variable names a long time
ago.




^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 11:40 ` [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable Janosch Frank
  2019-10-24 16:07   ` David Hildenbrand
@ 2019-10-25  7:46   ` David Hildenbrand
  2019-10-25  8:24   ` [RFC v2] " Janosch Frank
  2 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  7:46 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> KSM will not work on secure pages, because when the kernel reads a
> secure page, it will be encrypted and hence no two pages will look the
> same.
> 
> Let's mark the guest pages as unmergeable when we transition to secure
> mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/gmap.h |  1 +
>   arch/s390/kvm/kvm-s390.c     |  6 ++++++
>   arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
>   3 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 6efc0b501227..eab6a2ec3599 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
>   
>   void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>   			     unsigned long gaddr, unsigned long vmaddr);
> +int gmap_mark_unmergeable(void);
>   #endif /* _ASM_S390_GMAP_H */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 924132d92782..d1ba12f857e7 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   		if (r)
>   			break;
>   
> +		down_write(&current->mm->mmap_sem);
> +		r = gmap_mark_unmergeable();
> +		up_write(&current->mm->mmap_sem);
> +		if (r)
> +			break;
> +
>   		mutex_lock(&kvm->lock);
>   		kvm_s390_vcpu_block_all(kvm);
>   		/* FMT 4 SIE needs esca */
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index edcdca97e85e..bf365a09f900 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>   }
>   EXPORT_SYMBOL_GPL(s390_enable_sie);
>   
> +int gmap_mark_unmergeable(void)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				MADV_UNMERGEABLE, &vma->vm_flags)) {
> +			mm->context.uses_skeys = 0;
> +			return -ENOMEM;
> +		}
> +	}
> +	mm->def_flags &= ~VM_MERGEABLE;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
> +
>   /*
>    * Enable storage key handling from now on and initialize the storage
>    * keys with the default key.
> @@ -2593,7 +2610,6 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
>   int s390_enable_skey(void)
>   {
>   	struct mm_struct *mm = current->mm;
> -	struct vm_area_struct *vma;
>   	int rc = 0;
>   
>   	down_write(&mm->mmap_sem);
> @@ -2601,15 +2617,7 @@ int s390_enable_skey(void)
>   		goto out_up;
>   
>   	mm->context.uses_skeys = 1;
> -	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> -		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> -				MADV_UNMERGEABLE, &vma->vm_flags)) {
> -			mm->context.uses_skeys = 0;
> -			rc = -ENOMEM;
> -			goto out_up;
> -		}
> -	}
> -	mm->def_flags &= ~VM_MERGEABLE;
> +	gmap_mark_unmergeable();

also, here we are now ignoring errors?

This looks broken.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-25  7:18     ` Janosch Frank
@ 2019-10-25  8:04       ` David Hildenbrand
  2019-10-25  8:20         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  8:04 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 25.10.19 09:18, Janosch Frank wrote:
> On 10/24/19 6:07 PM, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> KSM will not work on secure pages, because when the kernel reads a
>>> secure page, it will be encrypted and hence no two pages will look the
>>> same.
>>>
>>> Let's mark the guest pages as unmergeable when we transition to secure
>>> mode.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>    arch/s390/include/asm/gmap.h |  1 +
>>>    arch/s390/kvm/kvm-s390.c     |  6 ++++++
>>>    arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
>>>    3 files changed, 25 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>>> index 6efc0b501227..eab6a2ec3599 100644
>>> --- a/arch/s390/include/asm/gmap.h
>>> +++ b/arch/s390/include/asm/gmap.h
>>> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
>>>    
>>>    void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>>>    			     unsigned long gaddr, unsigned long vmaddr);
>>> +int gmap_mark_unmergeable(void);
>>>    #endif /* _ASM_S390_GMAP_H */
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 924132d92782..d1ba12f857e7 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>>>    		if (r)
>>>    			break;
>>>    
>>> +		down_write(&current->mm->mmap_sem);
>>> +		r = gmap_mark_unmergeable();
>>> +		up_write(&current->mm->mmap_sem);
>>> +		if (r)
>>> +			break;
>>> +
>>>    		mutex_lock(&kvm->lock);
>>>    		kvm_s390_vcpu_block_all(kvm);
>>>    		/* FMT 4 SIE needs esca */
>>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>>> index edcdca97e85e..bf365a09f900 100644
>>> --- a/arch/s390/mm/gmap.c
>>> +++ b/arch/s390/mm/gmap.c
>>> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>>>    }
>>>    EXPORT_SYMBOL_GPL(s390_enable_sie);
>>>    
>>> +int gmap_mark_unmergeable(void)
>>> +{
>>> +	struct mm_struct *mm = current->mm;
>>> +	struct vm_area_struct *vma;
>>> +
>>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>>> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
>>> +				MADV_UNMERGEABLE, &vma->vm_flags)) {
>>> +			mm->context.uses_skeys = 0;
>>
>> That skey setting does not make too much sense when coming via
>> kvm_s390_handle_pv(). handle that in the caller?
> 
> Hmm, I think the name of that variable is just plain wrong.
> It should be "can_use_skeys" or "uses_unmergeable" (which would fit
> better into the mm context anyway) and then we could add a
> kvm->arch.uses_skeys to tell that we actually used them for migration
> checks, etc..
> 
> I had long discussions with Martin over these variable names a long time
> ago..

uses_skeys is set during s390_enable_skey(), which is called when we

a) Call an skey instruction
b) Migrate skeys

So it should match "uses" or what am I missing?

If you look at the users of "mm_uses_skeys(mm)" (e.g., pgste_set_key()), 
I think "uses_unmergeable" would actually be misleading. It really means 
"somebody used skeys"; being unmergeable is just a required side effect.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-25  8:04       ` David Hildenbrand
@ 2019-10-25  8:20         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-25  8:20 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 10/25/19 10:04 AM, David Hildenbrand wrote:
> On 25.10.19 09:18, Janosch Frank wrote:
>> On 10/24/19 6:07 PM, David Hildenbrand wrote:
>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>> KSM will not work on secure pages, because when the kernel reads a
>>>> secure page, it will be encrypted and hence no two pages will look the
>>>> same.
>>>>
>>>> Let's mark the guest pages as unmergeable when we transition to secure
>>>> mode.
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>    arch/s390/include/asm/gmap.h |  1 +
>>>>    arch/s390/kvm/kvm-s390.c     |  6 ++++++
>>>>    arch/s390/mm/gmap.c          | 28 ++++++++++++++++++----------
>>>>    3 files changed, 25 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>>>> index 6efc0b501227..eab6a2ec3599 100644
>>>> --- a/arch/s390/include/asm/gmap.h
>>>> +++ b/arch/s390/include/asm/gmap.h
>>>> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
>>>>    
>>>>    void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>>>>    			     unsigned long gaddr, unsigned long vmaddr);
>>>> +int gmap_mark_unmergeable(void);
>>>>    #endif /* _ASM_S390_GMAP_H */
>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>> index 924132d92782..d1ba12f857e7 100644
>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>>>>    		if (r)
>>>>    			break;
>>>>    
>>>> +		down_write(&current->mm->mmap_sem);
>>>> +		r = gmap_mark_unmergeable();
>>>> +		up_write(&current->mm->mmap_sem);
>>>> +		if (r)
>>>> +			break;
>>>> +
>>>>    		mutex_lock(&kvm->lock);
>>>>    		kvm_s390_vcpu_block_all(kvm);
>>>>    		/* FMT 4 SIE needs esca */
>>>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>>>> index edcdca97e85e..bf365a09f900 100644
>>>> --- a/arch/s390/mm/gmap.c
>>>> +++ b/arch/s390/mm/gmap.c
>>>> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(s390_enable_sie);
>>>>    
>>>> +int gmap_mark_unmergeable(void)
>>>> +{
>>>> +	struct mm_struct *mm = current->mm;
>>>> +	struct vm_area_struct *vma;
>>>> +
>>>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>>>> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
>>>> +				MADV_UNMERGEABLE, &vma->vm_flags)) {
>>>> +			mm->context.uses_skeys = 0;
>>>
>>> That skey setting does not make too much sense when coming via
>>> kvm_s390_handle_pv(). handle that in the caller?
>>
>> Hmm, I think the name of that variable is just plain wrong.
>> It should be "can_use_skeys" or "uses_unmergeable" (which would fit
>> better into the mm context anyway) and then we could add a
>> kvm->arch.uses_skeys to tell that we actually used them for migration
>> checks, etc..
>>
>> I had long discussions with Martin over these variable names a long time
>> ago..
> 
> uses_skeys is set during s390_enable_skey(). that is used when we
> 
> a) Call an skey instruction
> b) Migrate skeys
> 
> So it should match "uses" or what am I missing?
> 
> If you look at the users of "mm_uses_skeys(mm)" I think 
> "uses_unmergeable" would actually be misleading. (e.g., 
> pgste_set_key()). it really means "somebody used skeys". The unmergable 
> is just a required side effect.

Hmm, we couldn't check struct kvm from pgtable.c anyway.
Oh well, I still don't like it very much but your arguments are better
:-) Let's fix this.



^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 08/37] KVM: s390: add missing include in gmap.h
  2019-10-24 11:40 ` [RFC 08/37] KVM: s390: add missing include in gmap.h Janosch Frank
@ 2019-10-25  8:24   ` David Hildenbrand
  2019-11-13 12:27   ` Thomas Huth
  1 sibling, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  8:24 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> gmap.h references radix trees, but does not include linux/radix-tree.h
> itself. Sources that include gmap.h but not also radix-tree.h will
> therefore fail to compile.
> 
> This simple patch adds the include for linux/radix-tree.h in gmap.h so
> that users of gmap.h will be able to compile.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/include/asm/gmap.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index eab6a2ec3599..99b3eedda26e 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -10,6 +10,7 @@
>   #define _ASM_S390_GMAP_H
>   
>   #include <linux/refcount.h>
> +#include <linux/radix-tree.h>
>   
>   /* Generic bits for GMAP notification on DAT table entry changes. */
>   #define GMAP_NOTIFY_SHADOW	0x2
> 

Not sure if that's worth a separate patch, just squash it into the patch 
that needs it?

We usually don't care about includes as long as it compiles ...

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-24 11:40 ` [RFC 07/37] KVM: s390: protvirt: Secure memory is not mergeable Janosch Frank
  2019-10-24 16:07   ` David Hildenbrand
  2019-10-25  7:46   ` David Hildenbrand
@ 2019-10-25  8:24   ` Janosch Frank
  2019-11-01 13:02     ` Christian Borntraeger
                       ` (2 more replies)
  2 siblings, 3 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-25  8:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, mimu,
	cohuck, gor, frankja

KSM will not work on secure pages, because when the kernel reads a
secure page, it will be encrypted and hence no two pages will look the
same.

Let's mark the guest pages as unmergeable when we transition to secure
mode.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h |  1 +
 arch/s390/kvm/kvm-s390.c     |  6 ++++++
 arch/s390/mm/gmap.c          | 32 +++++++++++++++++++++-----------
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 6efc0b501227..eab6a2ec3599 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
 
 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
+int gmap_mark_unmergeable(void);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 924132d92782..d1ba12f857e7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		if (r)
 			break;
 
+		down_write(&current->mm->mmap_sem);
+		r = gmap_mark_unmergeable();
+		up_write(&current->mm->mmap_sem);
+		if (r)
+			break;
+
 		mutex_lock(&kvm->lock);
 		kvm_s390_vcpu_block_all(kvm);
 		/* FMT 4 SIE needs esca */
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index edcdca97e85e..faecdf81abdb 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
 }
 EXPORT_SYMBOL_GPL(s390_enable_sie);
 
+int gmap_mark_unmergeable(void)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+				MADV_UNMERGEABLE, &vma->vm_flags))
+			return -ENOMEM;
+	}
+	mm->def_flags &= ~VM_MERGEABLE;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
+
 /*
  * Enable storage key handling from now on and initialize the storage
  * keys with the default key.
@@ -2593,24 +2610,17 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
 int s390_enable_skey(void)
 {
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma;
 	int rc = 0;
 
 	down_write(&mm->mmap_sem);
 	if (mm_uses_skeys(mm))
 		goto out_up;
 
-	mm->context.uses_skeys = 1;
-	for (vma = mm->mmap; vma; vma = vma->vm_next) {
-		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
-				MADV_UNMERGEABLE, &vma->vm_flags)) {
-			mm->context.uses_skeys = 0;
-			rc = -ENOMEM;
-			goto out_up;
-		}
-	}
-	mm->def_flags &= ~VM_MERGEABLE;
+	rc = gmap_mark_unmergeable();
+	if (rc)
+		goto out_up;
 
+	mm->context.uses_skeys = 1;
 	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
 
 out_up:
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-24 11:40 ` [RFC 06/37] s390: UV: Add import and export to UV library Janosch Frank
@ 2019-10-25  8:31   ` David Hildenbrand
  2019-10-25  8:39     ` Janosch Frank
  2019-11-01 11:26   ` Christian Borntraeger
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  8:31 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> The convert to/from secure (or also "import/export") ultravisor calls
> are needed for page management, i.e. paging, of secure execution VMs.
> 
> Export encrypts a secure guest's page and makes it accessible to the
> host for paging.

How does paging play along with pinning the pages (from 
uv_convert_to_secure() -> kvm_s390_pv_pin_page()) in a follow up patch? 
Can you paint me the bigger picture?

Just so I understand:

When a page is "secure", it is actually unencrypted but only the guest 
can access it. If the host accesses it, there is an exception.

When a page is "not secure", it is encrypted but only the host can read 
it. If the guest accesses it, there is an exception.

Based on these exceptions, you are able to request to convert back and 
forth.
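
In code terms, I picture the host side roughly like this (handler names
invented here, only the uv_convert_* helpers are from this patch):

#include <asm/uv.h>

/* host touched a secure page: encrypt it and hand it back to the host */
static int handle_secure_access(unsigned long paddr)
{
        return uv_convert_from_secure(paddr);
}

/* guest touched a page that is currently not secure: import it again */
static int handle_non_secure_access(struct gmap *gmap, unsigned long gaddr)
{
        return uv_convert_to_secure(gmap, gaddr);
}

Is that the intended model?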


> 
> Import makes a page accessible to a secure guest.
> On the first import of that page, the page will be cleared by the
> Ultravisor before it is given to the guest.
> 
> All following imports will decrypt an exported page and verify
> integrity before giving the page to the guest.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 51 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 0bfbafcca136..99cdd2034503 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -15,6 +15,7 @@
>   #include <linux/errno.h>
>   #include <linux/bug.h>
>   #include <asm/page.h>
> +#include <asm/gmap.h>
>   
>   #define UVC_RC_EXECUTED		0x0001
>   #define UVC_RC_INV_CMD		0x0002
> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>   	return rc ? -EINVAL : 0;
>   }
>   
> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +static inline int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +	if (!uv_call(0, (u64)&uvcb))
> +		return 0;
> +	return -EINVAL;
> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest
> + * (import). If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + *
> + * @handle: Ultravisor guest handle
> + * @gaddr: Guest 2 absolute address to be imported
> + */
> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	int cc;
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->se_handle,
> +		.gaddr = gaddr
> +	};
> +
> +	cc = uv_call(0, (u64)&uvcb);
> +
> +	if (!cc)
> +		return 0;
> +	if (uvcb.header.rc == 0x104)
> +		return -EEXIST;
> +	if (uvcb.header.rc == 0x10a)
> +		return -EFAULT;
> +	return -EINVAL;
> +}
> +
>   void setup_uv(void);
>   void adjust_to_uv_max(unsigned long *vmax);
>   #else
> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>   static inline void setup_uv(void) {}
>   static inline void adjust_to_uv_max(unsigned long *vmax) {}
>   static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>   #endif
>   
>   #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-25  8:31   ` David Hildenbrand
@ 2019-10-25  8:39     ` Janosch Frank
  2019-10-25  8:40       ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-25  8:39 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 10/25/19 10:31 AM, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> The convert to/from secure (or also "import/export") ultravisor calls
>> are needed for page management, i.e. paging, of secure execution VMs.
>>
>> Export encrypts a secure guest's page and makes it accessible to the
>> host for paging.
> 
> How does paging play along with pinning the pages (from 
> uv_convert_to_secure() -> kvm_s390_pv_pin_page()) in a follow up patch? 
> Can you paint me the bigger picture?

That's a stale comment I should have removed before sending...
The current patches do not support paging.

> 
> Just so I understand:
> 
> When a page is "secure", it is actually unencrypted but only the guest 
> can access it. If the host accesses it, there is an exception.
> 
> When a page is "not secure", it is encrypted but only the host can read 
> it. If the guest accesses it, there is an exception.
> 
> Based on these exceptions, you are able to request to convert back and 
> forth.

Yes
Shared pages are the exception, because they are accessible to both parties.
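
For completeness, sharing is always initiated by the guest. A sketch
(not part of this series; the wrapper name is made up, uv_set_shared()
is the existing guest-side helper):

#include <asm/page.h>
#include <asm/uv.h>

/* addr must be page aligned, e.g. a virtio or swiotlb buffer */
static int share_with_host(unsigned long addr, unsigned int pages)
{
        unsigned int i;
        int rc;

        for (i = 0; i < pages; i++) {
                rc = uv_set_shared(addr + i * PAGE_SIZE);
                if (rc)
                        return rc;
        }
        return 0;
}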

> 
> 
>>
>> Import makes a page accessible to a secure guest.
>> On the first import of that page, the page will be cleared by the
>> Ultravisor before it is given to the guest.
>>
>> All following imports will decrypt an exported page and verify
>> integrity before giving the page to the guest.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 51 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index 0bfbafcca136..99cdd2034503 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -15,6 +15,7 @@
>>   #include <linux/errno.h>
>>   #include <linux/bug.h>
>>   #include <asm/page.h>
>> +#include <asm/gmap.h>
>>   
>>   #define UVC_RC_EXECUTED		0x0001
>>   #define UVC_RC_INV_CMD		0x0002
>> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>>   	return rc ? -EINVAL : 0;
>>   }
>>   
>> +/*
>> + * Requests the Ultravisor to encrypt a guest page and make it
>> + * accessible to the host for paging (export).
>> + *
>> + * @paddr: Absolute host address of page to be exported
>> + */
>> +static inline int uv_convert_from_secure(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.paddr = paddr
>> +	};
>> +	if (!uv_call(0, (u64)&uvcb))
>> +		return 0;
>> +	return -EINVAL;
>> +}
>> +
>> +/*
>> + * Requests the Ultravisor to make a page accessible to a guest
>> + * (import). If it's brought in the first time, it will be cleared. If
>> + * it has been exported before, it will be decrypted and integrity
>> + * checked.
>> + *
>> + * @handle: Ultravisor guest handle
>> + * @gaddr: Guest 2 absolute address to be imported
>> + */
>> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	int cc;
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->se_handle,
>> +		.gaddr = gaddr
>> +	};
>> +
>> +	cc = uv_call(0, (u64)&uvcb);
>> +
>> +	if (!cc)
>> +		return 0;
>> +	if (uvcb.header.rc == 0x104)
>> +		return -EEXIST;
>> +	if (uvcb.header.rc == 0x10a)
>> +		return -EFAULT;
>> +	return -EINVAL;
>> +}
>> +
>>   void setup_uv(void);
>>   void adjust_to_uv_max(unsigned long *vmax);
>>   #else
>> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>>   static inline void setup_uv(void) {}
>>   static inline void adjust_to_uv_max(unsigned long *vmax) {}
>>   static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
>> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
>> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>>   #endif
>>   
>>   #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-25  8:39     ` Janosch Frank
@ 2019-10-25  8:40       ` David Hildenbrand
  2019-10-25  8:42         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  8:40 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 25.10.19 10:39, Janosch Frank wrote:
> On 10/25/19 10:31 AM, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> The convert to/from secure (or also "import/export") ultravisor calls
>>> are needed for page management, i.e. paging, of secure execution VMs.
>>>
>>> Export encrypts a secure guest's page and makes it accessible to the
>>> guest for paging.
>>
>> How does paging play along with pinning the pages (from
>> uv_convert_to_secure() -> kvm_s390_pv_pin_page()) in a follow up patch?
>> Can you paint me the bigger picture?
> 
> That's a stale comment I should have removed before sending...
> The current patches do not support paging.

Note that once you pin you really have to disable the balloon in QEMU
(inhibit it).
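
(IIRC the same way vfio does it, via qemu_balloon_inhibit(), but don't
quote me on the exact helper.)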


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-25  8:40       ` David Hildenbrand
@ 2019-10-25  8:42         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-25  8:42 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor


[-- Attachment #1.1: Type: text/plain, Size: 943 bytes --]

On 10/25/19 10:40 AM, David Hildenbrand wrote:
> On 25.10.19 10:39, Janosch Frank wrote:
>> On 10/25/19 10:31 AM, David Hildenbrand wrote:
>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>> The convert to/from secure (or also "import/export") ultravisor calls
>>>> are needed for page management, i.e. paging, of secure execution VMs.
>>>>
>>>> Export encrypts a secure guest's page and makes it accessible to the
>>>> guest for paging.
>>>
>>> How does paging play along with pinning the pages (from
>>> uv_convert_to_secure() -> kvm_s390_pv_pin_page()) in a follow up patch?
>>> Can you paint me the bigger picture?
>>
>> That's a stale comment I should have removed before sending...
>> The current patches do not support paging.
> 
> Note that once you pin you really have to disable the balloon in QEMU
> (inhibit it).
> 

Yes, and you need the iommu for virtio.
We haven't fully discussed how to handle that yet.
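
One possible direction (nothing decided yet) would be to run the virtio
devices with the platform-IOMMU feature, e.g. something like
"-device virtio-blk-ccw,iommu_platform=on,...", so the guest driver goes
through the DMA API and the buffers end up in memory that is explicitly
shared with the host.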



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-24 11:40 ` [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning Janosch Frank
@ 2019-10-25  8:49   ` David Hildenbrand
  2019-10-31 15:41     ` Christian Borntraeger
  2019-11-02  8:53   ` Christian Borntraeger
  2019-11-04 14:17   ` David Hildenbrand
  2 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  8:49 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> Pin the guest pages when they are first accessed, instead of all at
> the same time when starting the guest.

Please explain why you do stuff. Why do we have to pin the whole guest
memory? Why can't we mlock() the whole memory to avoid swapping in user
space?

This really screams for a proper explanation.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/include/asm/gmap.h |  1 +
>   arch/s390/include/asm/uv.h   |  6 +++++
>   arch/s390/kernel/uv.c        | 20 ++++++++++++++
>   arch/s390/kvm/kvm-s390.c     |  2 ++
>   arch/s390/kvm/pv.c           | 51 ++++++++++++++++++++++++++++++------
>   5 files changed, 72 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 99b3eedda26e..483f64427c0e 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -63,6 +63,7 @@ struct gmap {
>   	struct gmap *parent;
>   	unsigned long orig_asce;
>   	unsigned long se_handle;
> +	struct page **pinned_pages;
>   	int edat_level;
>   	bool removed;
>   	bool initialized;
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 99cdd2034503..9ce9363aee1c 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -298,6 +298,7 @@ static inline int uv_convert_from_secure(unsigned long paddr)
>   	return -EINVAL;
>   }
>   
> +int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa);
>   /*
>    * Requests the Ultravisor to make a page accessible to a guest
>    * (import). If it's brought in the first time, it will be cleared. If
> @@ -317,6 +318,11 @@ static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>   		.gaddr = gaddr
>   	};
>   
> +	down_read(&gmap->mm->mmap_sem);
> +	cc = kvm_s390_pv_pin_page(gmap, gaddr);
> +	up_read(&gmap->mm->mmap_sem);
> +	if (cc)
> +		return cc;
>   	cc = uv_call(0, (u64)&uvcb);
>   
>   	if (!cc)
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index f7778493e829..36554402b5c6 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -98,4 +98,24 @@ void adjust_to_uv_max(unsigned long *vmax)
>   	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>   		*vmax = uv_info.max_sec_stor_addr;
>   }
> +
> +int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa)
> +{
> +	unsigned long hva, gfn = gpa / PAGE_SIZE;
> +	int rc;
> +
> +	if (!gmap->pinned_pages)
> +		return -EINVAL;
> +	hva = __gmap_translate(gmap, gpa);
> +	if (IS_ERR_VALUE(hva))
> +		return -EFAULT;
> +	if (gmap->pinned_pages[gfn])
> +		return -EEXIST;
> +	rc = get_user_pages_fast(hva, 1, FOLL_WRITE, gmap->pinned_pages + gfn);
> +	if (rc < 0)
> +		return rc;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_s390_pv_pin_page);
> +
>   #endif
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index d1ba12f857e7..490fde080107 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2196,6 +2196,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   		/* All VCPUs have to be destroyed before this call. */
>   		mutex_lock(&kvm->lock);
>   		kvm_s390_vcpu_block_all(kvm);
> +		kvm_s390_pv_unpin(kvm);
>   		r = kvm_s390_pv_destroy_vm(kvm);
>   		if (!r)
>   			kvm_s390_pv_dealloc_vm(kvm);
> @@ -2680,6 +2681,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   	kvm_s390_gisa_destroy(kvm);
>   	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>   	    kvm_s390_pv_is_protected(kvm)) {
> +		kvm_s390_pv_unpin(kvm);
>   		kvm_s390_pv_destroy_vm(kvm);
>   		kvm_s390_pv_dealloc_vm(kvm);
>   	}
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 80aecd5bea9e..383e660e2221 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -15,8 +15,35 @@
>   #include <asm/mman.h>
>   #include "kvm-s390.h"
>   
> +static void unpin_destroy(struct page **pages, int nr)
> +{
> +	int i;
> +	struct page *page;
> +	u8 *val;
> +
> +	for (i = 0; i < nr; i++) {
> +		page = pages[i];
> +		if (!page)	/* page was never used */
> +			continue;
> +		val = (void *)page_to_phys(page);
> +		READ_ONCE(*val);
> +		put_page(page);
> +	}
> +}
> +
> +void kvm_s390_pv_unpin(struct kvm *kvm)
> +{
> +	unsigned long npages = kvm->arch.pv.guest_len / PAGE_SIZE;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	unpin_destroy(kvm->arch.gmap->pinned_pages, npages);
> +	mutex_unlock(&kvm->slots_lock);
> +}
> +
>   void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
>   {
> +	vfree(kvm->arch.gmap->pinned_pages);
> +	kvm->arch.gmap->pinned_pages = NULL;
>   	vfree(kvm->arch.pv.stor_var);
>   	free_pages(kvm->arch.pv.stor_base,
>   		   get_order(uv_info.guest_base_stor_len));
> @@ -28,7 +55,6 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>   	unsigned long base = uv_info.guest_base_stor_len;
>   	unsigned long virt = uv_info.guest_virt_var_stor_len;
>   	unsigned long npages = 0, vlen = 0;
> -	struct kvm_memslots *slots;
>   	struct kvm_memory_slot *memslot;
>   
>   	kvm->arch.pv.stor_var = NULL;
> @@ -43,22 +69,26 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>   	 * Slots are sorted by GFN
>   	 */
>   	mutex_lock(&kvm->slots_lock);
> -	slots = kvm_memslots(kvm);
> -	memslot = slots->memslots;
> +	memslot = kvm_memslots(kvm)->memslots;
>   	npages = memslot->base_gfn + memslot->npages;
> -
>   	mutex_unlock(&kvm->slots_lock);
> +
> +	kvm->arch.gmap->pinned_pages = vzalloc(npages * sizeof(struct page *));
> +	if (!kvm->arch.gmap->pinned_pages)
> +		goto out_err;
>   	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
>   
>   	/* Allocate variable storage */
>   	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
>   	vlen += uv_info.guest_virt_base_stor_len;
>   	kvm->arch.pv.stor_var = vzalloc(vlen);
> -	if (!kvm->arch.pv.stor_var) {
> -		kvm_s390_pv_dealloc_vm(kvm);
> -		return -ENOMEM;
> -	}
> +	if (!kvm->arch.pv.stor_var)
> +		goto out_err;
>   	return 0;
> +
> +out_err:
> +	kvm_s390_pv_dealloc_vm(kvm);
> +	return -ENOMEM;
>   }
>   
>   int kvm_s390_pv_destroy_vm(struct kvm *kvm)
> @@ -216,6 +246,11 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>   	for (i = 0; i < size / PAGE_SIZE; i++) {
>   		uvcb.gaddr = addr + i * PAGE_SIZE;
>   		uvcb.tweak[1] = i * PAGE_SIZE;
> +		down_read(&kvm->mm->mmap_sem);
> +		rc = kvm_s390_pv_pin_page(kvm->arch.gmap, uvcb.gaddr);
> +		up_read(&kvm->mm->mmap_sem);
> +		if (rc && (rc != -EEXIST))
> +			break;
>   retry:
>   		rc = uv_call(0, (u64)&uvcb);
>   		if (!rc)
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
@ 2019-10-25  8:58   ` David Hildenbrand
  2019-10-25  9:02     ` David Hildenbrand
  2019-11-04  8:18   ` Christian Borntraeger
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  8:58 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> Let's add a KVM interface to create and destroy protected VMs.

More details please.

[...]

>   
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +	void __user *argp = (void __user *)cmd->data;
> +
> +	switch (cmd->cmd) {
> +	case KVM_PV_VM_CREATE: {
> +		r = kvm_s390_pv_alloc_vm(kvm);
> +		if (r)
> +			break;

So ... I can create multiple VMs?

Especially, I can call KVM_PV_VM_CREATE two times, setting
"kvm->arch.pv.stor_var = NULL" and leaking memory on the second call.
Not sure if that's desirable.


Shouldn't this be something like "KVM_PV_VM_INIT" and then make sure it 
can only be called once?
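
Something like the following (just a sketch on top of this patch,
reusing the existing helper) would at least reject the second call:

	case KVM_PV_VM_CREATE: {
		r = -EEXIST;
		if (kvm_s390_pv_handle(kvm))
			break;
		r = kvm_s390_pv_alloc_vm(kvm);
		if (r)
			break;
		...
	}

But renaming it to KVM_PV_VM_INIT/DEINIT and documenting the "only
once" semantics would still be the cleaner interface.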

> +
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		/* FMT 4 SIE needs esca */
> +		r = sca_switch_to_extended(kvm);
> +		if (!r)
> +			r = kvm_s390_pv_create_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_DESTROY: {
> +		/* All VCPUs have to be destroyed before this call. */

Then please verify that? "KVM_PV_VM_DEINIT"

Also, who guarantees that user space calls this at all? Why is that 
needed? (IOW, when does user space call this?)

> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		r = kvm_s390_pv_destroy_vm(kvm);
> +		if (!r)
> +			kvm_s390_pv_dealloc_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_SET_SEC_PARMS: {
> +		struct kvm_s390_pv_sec_parm parms = {};
> +		void *hdr;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&parms, argp, sizeof(parms)))
> +			break;
> +
> +		/* Currently restricted to 8KB */
> +		r = -EINVAL;
> +		if (parms.length > PAGE_SIZE * 2)
> +			break;
> +
> +		r = -ENOMEM;
> +		hdr = vmalloc(parms.length);
> +		if (!hdr)
> +			break;
> +
> +		r = -EFAULT;
> +		if (!copy_from_user(hdr, (void __user *)parms.origin,
> +				   parms.length))
> +			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
> +
> +		vfree(hdr);
> +		break;
> +	}
> +	case KVM_PV_VM_UNPACK: {
> +		struct kvm_s390_pv_unp unp = {};
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&unp, argp, sizeof(unp)))
> +			break;
> +
> +		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
> +		break;
> +	}
> +	case KVM_PV_VM_VERIFY: {
> +		u32 ret;
> +
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
> +				  UVC_CMD_VERIFY_IMG,
> +				  &ret);
> +		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
> +			 ret >> 16, ret & 0x0000ffff);
> +		break;
> +	}
> +	default:
> +		return -ENOTTY;
> +	}
> +	return r;
> +}
> +#endif
> +
>   long kvm_arch_vm_ioctl(struct file *filp,
>   		       unsigned int ioctl, unsigned long arg)
>   {
> @@ -2254,6 +2351,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   		mutex_unlock(&kvm->slots_lock);
>   		break;
>   	}
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +	case KVM_S390_PV_COMMAND: {
> +		struct kvm_pv_cmd args;
> +
> +		r = -EINVAL;
> +		if (!is_prot_virt_host())
> +			break;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&args, argp, sizeof(args)))
> +			break;
> +
> +		r = kvm_s390_handle_pv(kvm, &args);
> +		break;
> +	}
> +#endif
>   	default:
>   		r = -ENOTTY;
>   	}
> @@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>   
>   	if (vcpu->kvm->arch.use_cmma)
>   		kvm_s390_vcpu_unsetup_cmma(vcpu);
> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> +	    kvm_s390_pv_handle_cpu(vcpu))
> +		kvm_s390_pv_destroy_cpu(vcpu);
>   	free_page((unsigned long)(vcpu->arch.sie_block));
>   
>   	kvm_vcpu_uninit(vcpu);
> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   {
>   	kvm_free_vcpus(kvm);
>   	sca_dispose(kvm);
> -	debug_unregister(kvm->arch.dbf);
>   	kvm_s390_gisa_destroy(kvm);
> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> +	    kvm_s390_pv_is_protected(kvm)) {
> +		kvm_s390_pv_destroy_vm(kvm);
> +		kvm_s390_pv_dealloc_vm(kvm);
> +	}
> +	debug_unregister(kvm->arch.dbf);
>   	free_page((unsigned long)kvm->arch.sie_page2);
>   	if (!kvm_is_ucontrol(kvm))
>   		gmap_remove(kvm->arch.gmap);
> @@ -2652,6 +2773,9 @@ static int sca_switch_to_extended(struct kvm *kvm)
>   	unsigned int vcpu_idx;
>   	u32 scaol, scaoh;
>   
> +	if (kvm->arch.use_esca)
> +		return 0;
> +
>   	new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL|__GFP_ZERO);
>   	if (!new_sca)
>   		return -ENOMEM;
> @@ -3073,6 +3197,15 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>   	rc = kvm_vcpu_init(vcpu, kvm, id);
>   	if (rc)
>   		goto out_free_sie_block;
> +
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		rc = kvm_s390_pv_create_cpu(vcpu);
> +		if (rc) {
> +			kvm_vcpu_uninit(vcpu);
> +			goto out_free_sie_block;
> +		}
> +	}
> +
>   	VM_EVENT(kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK", id, vcpu,
>   		 vcpu->arch.sie_block);
>   	trace_kvm_s390_create_vcpu(id, vcpu, vcpu->arch.sie_block);
> @@ -4338,6 +4471,28 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
>   	return -ENOIOCTLCMD;
>   }
>   
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
> +				   struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +
> +	switch (cmd->cmd) {
> +	case KVM_PV_VCPU_CREATE: {
> +		r = kvm_s390_pv_create_cpu(vcpu);
> +		break;
> +	}
> +	case KVM_PV_VCPU_DESTROY: {
> +		r = kvm_s390_pv_destroy_cpu(vcpu);
> +		break;
> +	}
> +	default:
> +		r = -ENOTTY;
> +	}
> +	return r;
> +}
> +#endif
> +
>   long kvm_arch_vcpu_ioctl(struct file *filp,
>   			 unsigned int ioctl, unsigned long arg)
>   {
> @@ -4470,6 +4625,22 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>   					   irq_state.len);
>   		break;
>   	}
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +	case KVM_S390_PV_COMMAND_VCPU: {
> +		struct kvm_pv_cmd args;
> +
> +		r = -EINVAL;
> +		if (!is_prot_virt_host())
> +			break;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&args, argp, sizeof(args)))
> +			break;
> +
> +		r = kvm_s390_handle_pv_vcpu(vcpu, &args);
> +		break;
> +	}
> +#endif
>   	default:
>   		r = -ENOTTY;
>   	}
> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> index 6d9448dbd052..0d61dcc51f0e 100644
> --- a/arch/s390/kvm/kvm-s390.h
> +++ b/arch/s390/kvm/kvm-s390.h
> @@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
>   	return kvm->arch.user_cpu_state_ctrl != 0;
>   }
>   
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +/* implemented in pv.c */
> +void kvm_s390_pv_unpin(struct kvm *kvm);
> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
> +int kvm_s390_pv_alloc_vm(struct kvm *kvm);
> +int kvm_s390_pv_create_vm(struct kvm *kvm);
> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
> +int kvm_s390_pv_destroy_vm(struct kvm *kvm);
> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak);
> +int kvm_s390_pv_verify(struct kvm *kvm);
> +
> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
> +{
> +	return !!kvm->arch.pv.handle;
> +}
> +
> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm)
> +{
> +	return kvm->arch.pv.handle;
> +}
> +
> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pv.handle;
> +}
> +#else
> +static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
> +static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
> +static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
> +static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
> +static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
> +static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
> +static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
> +static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
> +					    u64 origin, u64 length) { return 0; }
> +static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
> +				     unsigned long size,  unsigned long tweak)
> +{ return 0; }
> +static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
> +#endif
> +
>   /* implemented in interrupt.c */
>   int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
>   void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> new file mode 100644
> index 000000000000..94cf16f40f25
> --- /dev/null
> +++ b/arch/s390/kvm/pv.c
> @@ -0,0 +1,237 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hosting Secure Execution virtual machines
> + *
> + * Copyright IBM Corp. 2019
> + *    Author(s): Janosch Frank <frankja@linux.ibm.com>
> + */
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pagemap.h>
> +#include <asm/pgalloc.h>
> +#include <asm/gmap.h>
> +#include <asm/uv.h>
> +#include <asm/gmap.h>
> +#include <asm/mman.h>
> +#include "kvm-s390.h"
> +
> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
> +{
> +	vfree(kvm->arch.pv.stor_var);
> +	free_pages(kvm->arch.pv.stor_base,
> +		   get_order(uv_info.guest_base_stor_len));
> +	memset(&kvm->arch.pv, 0, sizeof(kvm->arch.pv));
> +}
> +
> +int kvm_s390_pv_alloc_vm(struct kvm *kvm)
> +{
> +	unsigned long base = uv_info.guest_base_stor_len;
> +	unsigned long virt = uv_info.guest_virt_var_stor_len;
> +	unsigned long npages = 0, vlen = 0;
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +
> +	kvm->arch.pv.stor_var = NULL;
> +	kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL, get_order(base));
> +	if (!kvm->arch.pv.stor_base)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Calculate current guest storage for allocation of the
> +	 * variable storage, which is based on the length in MB.
> +	 *
> +	 * Slots are sorted by GFN
> +	 */
> +	mutex_lock(&kvm->slots_lock);
> +	slots = kvm_memslots(kvm);
> +	memslot = slots->memslots;
> +	npages = memslot->base_gfn + memslot->npages;

What if

a) your guest has multiple memory slots
b) you hotplug memory and add memslots later

Do you fence that, and if so, how?
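
For a) I would have expected something like the sketch below, iterating
over all slots instead of looking only at the first one (holes between
slots would still be accounted, not sure if that is what the UV
expects):

	struct kvm_memory_slot *ms;
	unsigned long npages = 0;

	kvm_for_each_memslot(ms, kvm_memslots(kvm))
		npages = max_t(unsigned long, npages,
			       ms->base_gfn + ms->npages);

For b) a one-time allocation at creation time obviously cannot grow
later.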

> +
> +	mutex_unlock(&kvm->slots_lock);
> +	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
> +
> +	/* Allocate variable storage */
> +	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);

I get the feeling that prot virt mainly consumes memory ;)
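
Just to put a rough number on it (the 512 bytes per segment for
guest_virt_var_stor_len are made up, I don't know the real value): a
16 GB guest has 16384 1 MB segments, so this vzalloc() alone would be
16384 * 512 = 8 MB, on top of the base storage, the per-VCPU storage
and (with the later pinning patch) the guest pages themselves.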

> +	vlen += uv_info.guest_virt_base_stor_len;
> +	kvm->arch.pv.stor_var = vzalloc(vlen);
> +	if (!kvm->arch.pv.stor_var) {
> +		kvm_s390_pv_dealloc_vm(kvm);
> +		return -ENOMEM;
> +	}
> +	return 0;
> +}
> +
> +int kvm_s390_pv_destroy_vm(struct kvm *kvm)
> +{
> +	int rc;
> +	u32 ret;
> +
> +	rc = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
> +			   UVC_CMD_DESTROY_SEC_CONF, &ret);
> +	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
> +		 ret >> 16, ret & 0x0000ffff);
> +	return rc;
> +}
> +
> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
> +{
> +	int rc = 0;
> +	u32 ret;
> +
> +	if (kvm_s390_pv_handle_cpu(vcpu)) {
> +		rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
> +				   UVC_CMD_DESTROY_SEC_CPU,
> +				   &ret);
> +
> +		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
> +			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
> +	}
> +
> +	free_pages(vcpu->arch.pv.stor_base,
> +		   get_order(uv_info.guest_cpu_stor_len));
> +	/* Clear cpu and vm handle */
> +	memset(&vcpu->arch.sie_block->reserved10, 0,
> +	       sizeof(vcpu->arch.sie_block->reserved10));
> +	memset(&vcpu->arch.pv, 0, sizeof(vcpu->arch.pv));
> +	vcpu->arch.sie_block->sdf = 0;
> +	return rc;
> +}
> +
> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
> +{
> +	int rc;
> +	struct uv_cb_csc uvcb = {
> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
> +		.header.len = sizeof(uvcb),
> +	};
> +
> +	/* EEXIST and ENOENT? */
> +	if (kvm_s390_pv_handle_cpu(vcpu))
> +		return -EINVAL;
> +
> +	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
> +						   get_order(uv_info.guest_cpu_stor_len));
> +	if (!vcpu->arch.pv.stor_base)
> +		return -ENOMEM;
> +
> +	/* Input */
> +	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
> +	uvcb.num = vcpu->arch.sie_block->icpua;
> +	uvcb.state_origin = (u64)vcpu->arch.sie_block;
> +	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
> +
> +	rc = uv_call(0, (u64)&uvcb);
> +	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
> +		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
> +		   uvcb.header.rrc);
> +
> +	/* Output */
> +	vcpu->arch.pv.handle = uvcb.cpu_handle;
> +	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
> +	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
> +	vcpu->arch.sie_block->sdf = 2;
> +	if (!rc)
> +		return 0;
> +
> +	kvm_s390_pv_destroy_cpu(vcpu);
> +	return -EINVAL;
> +}
> +
> +int kvm_s390_pv_create_vm(struct kvm *kvm)
> +{
> +	int rc;
> +
> +	struct uv_cb_cgc uvcb = {
> +		.header.cmd = UVC_CMD_CREATE_SEC_CONF,
> +		.header.len = sizeof(uvcb)
> +	};
> +
> +	if (kvm_s390_pv_handle(kvm))
> +		return -EINVAL;
> +
> +	/* Inputs */
> +	uvcb.guest_stor_origin = 0; /* MSO is 0 for KVM */
> +	uvcb.guest_stor_len = kvm->arch.pv.guest_len;
> +	uvcb.guest_asce = kvm->arch.gmap->asce;
> +	uvcb.conf_base_stor_origin = (u64)kvm->arch.pv.stor_base;
> +	uvcb.conf_var_stor_origin = (u64)kvm->arch.pv.stor_var;
> +
> +	rc = uv_call(0, (u64)&uvcb);
> +	VM_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
> +		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
> +		 uvcb.header.rrc);
> +
> +	/* Outputs */
> +	kvm->arch.pv.handle = uvcb.guest_handle;
> +
> +	if (rc && (uvcb.header.rc & 0x8000)) {
> +		kvm_s390_pv_destroy_vm(kvm);
> +		kvm_s390_pv_dealloc_vm(kvm);
> +		return -EINVAL;
> +	}
> +	return rc;
> +}
> +
> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
> +			      void *hdr, u64 length)
> +{
> +	int rc;
> +	struct uv_cb_ssc uvcb = {
> +		.header.cmd = UVC_CMD_SET_SEC_CONF_PARAMS,
> +		.header.len = sizeof(uvcb),
> +		.sec_header_origin = (u64)hdr,
> +		.sec_header_len = length,
> +		.guest_handle = kvm_s390_pv_handle(kvm),
> +	};
> +
> +	if (!kvm_s390_pv_handle(kvm))
> +		return -EINVAL;
> +
> +	rc = uv_call(0, (u64)&uvcb);
> +	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
> +		 uvcb.header.rc, uvcb.header.rrc);
> +	if (rc)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak)
> +{
> +	int i, rc = 0;
> +	struct uv_cb_unp uvcb = {
> +		.header.cmd = UVC_CMD_UNPACK_IMG,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = kvm_s390_pv_handle(kvm),
> +		.tweak[0] = tweak
> +	};
> +
> +	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
> +		return -EINVAL;
> +
> +
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
> +		 addr, size);
> +	for (i = 0; i < size / PAGE_SIZE; i++) {
> +		uvcb.gaddr = addr + i * PAGE_SIZE;
> +		uvcb.tweak[1] = i * PAGE_SIZE;
> +retry:
> +		rc = uv_call(0, (u64)&uvcb);
> +		if (!rc)
> +			continue;
> +		/* If not yet mapped fault and retry */
> +		if (uvcb.header.rc == 0x10a) {
> +			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> +					FAULT_FLAG_WRITE);
> +			if (rc)
> +				return rc;
> +			goto retry;
> +		}
> +		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> +			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> +		break;
> +	}
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
> +		 uvcb.header.rc, uvcb.header.rrc);
> +	return rc;
> +}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 52641d8ca9e8..bb37d5710c89 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1000,6 +1000,7 @@ struct kvm_ppc_resize_hpt {
>   #define KVM_CAP_PMU_EVENT_FILTER 173
>   #define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
>   #define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
> +#define KVM_CAP_S390_PROTECTED 180
>   
>   #ifdef KVM_CAP_IRQ_ROUTING
>   
> @@ -1461,6 +1462,38 @@ struct kvm_enc_region {
>   /* Available with KVM_CAP_ARM_SVE */
>   #define KVM_ARM_VCPU_FINALIZE	  _IOW(KVMIO,  0xc2, int)
>   
> +struct kvm_s390_pv_sec_parm {
> +	__u64	origin;
> +	__u64	length;
> +};
> +
> +struct kvm_s390_pv_unp {
> +	__u64 addr;
> +	__u64 size;
> +	__u64 tweak;
> +};
> +
> +enum pv_cmd_id {
> +	KVM_PV_VM_CREATE,
> +	KVM_PV_VM_DESTROY,
> +	KVM_PV_VM_SET_SEC_PARMS,
> +	KVM_PV_VM_UNPACK,
> +	KVM_PV_VM_VERIFY,
> +	KVM_PV_VCPU_CREATE,
> +	KVM_PV_VCPU_DESTROY,
> +};
> +
> +struct kvm_pv_cmd {
> +	__u32	cmd;
> +	__u16	rc;
> +	__u16	rrc;
> +	__u64	data;
> +};
> +
> +/* Available with KVM_CAP_S390_SE */
> +#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc3, struct kvm_pv_cmd)
> +#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc4, struct kvm_pv_cmd)
> +
>   /* Secure Encrypted Virtualization command */
>   enum sev_cmd_id {
>   	/* Guest initialization commands */
> 

This is a lengthy patch and I haven't explored anything yet :)

I do wonder if it makes sense to split this up. VM, VCPUs, parameters, 
Extract+verify ...

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-25  8:58   ` David Hildenbrand
@ 2019-10-25  9:02     ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  9:02 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 25.10.19 10:58, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> Let's add a KVM interface to create and destroy protected VMs.
> 
> More details please.
> 
> [...]
> 
>>    
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>> +{
>> +	int r = 0;
>> +	void __user *argp = (void __user *)cmd->data;
>> +
>> +	switch (cmd->cmd) {
>> +	case KVM_PV_VM_CREATE: {
>> +		r = kvm_s390_pv_alloc_vm(kvm);
>> +		if (r)
>> +			break;
> 
> So ... I can create multiple VMs?
> 
> Especially, I can call KVM_PV_VM_CREATE two times, setting
> "kvm->arch.pv.stor_var = NULL" and leaking memory on the second call.
> Not sure if that's desirable.
> 
> 
> Shouldn't this be something like "KVM_PV_VM_INIT" and then make sure it
> can only be called once?
> 
>> +
>> +		mutex_lock(&kvm->lock);
>> +		kvm_s390_vcpu_block_all(kvm);
>> +		/* FMT 4 SIE needs esca */
>> +		r = sca_switch_to_extended(kvm);
>> +		if (!r)
>> +			r = kvm_s390_pv_create_vm(kvm);
>> +		kvm_s390_vcpu_unblock_all(kvm);
>> +		mutex_unlock(&kvm->lock);
>> +		break;
>> +	}
>> +	case KVM_PV_VM_DESTROY: {
>> +		/* All VCPUs have to be destroyed before this call. */
> 
> Then please verify that? "KVM_PV_VM_DEINIT"
> 
> Also, who guarantees that user space calls this at all? Why is that
> needed? (IOW, when does user space call this?)
> 
>> +		mutex_lock(&kvm->lock);
>> +		kvm_s390_vcpu_block_all(kvm);
>> +		r = kvm_s390_pv_destroy_vm(kvm);
>> +		if (!r)
>> +			kvm_s390_pv_dealloc_vm(kvm);
>> +		kvm_s390_vcpu_unblock_all(kvm);
>> +		mutex_unlock(&kvm->lock);
>> +		break;
>> +	}
>> +	case KVM_PV_VM_SET_SEC_PARMS: {
>> +		struct kvm_s390_pv_sec_parm parms = {};
>> +		void *hdr;
>> +
>> +		r = -EFAULT;
>> +		if (copy_from_user(&parms, argp, sizeof(parms)))
>> +			break;
>> +
>> +		/* Currently restricted to 8KB */
>> +		r = -EINVAL;
>> +		if (parms.length > PAGE_SIZE * 2)
>> +			break;
>> +
>> +		r = -ENOMEM;
>> +		hdr = vmalloc(parms.length);
>> +		if (!hdr)
>> +			break;
>> +
>> +		r = -EFAULT;
>> +		if (!copy_from_user(hdr, (void __user *)parms.origin,
>> +				   parms.length))
>> +			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
>> +
>> +		vfree(hdr);
>> +		break;
>> +	}
>> +	case KVM_PV_VM_UNPACK: {
>> +		struct kvm_s390_pv_unp unp = {};
>> +
>> +		r = -EFAULT;
>> +		if (copy_from_user(&unp, argp, sizeof(unp)))
>> +			break;
>> +
>> +		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
>> +		break;
>> +	}
>> +	case KVM_PV_VM_VERIFY: {
>> +		u32 ret;
>> +
>> +		r = -EINVAL;
>> +		if (!kvm_s390_pv_is_protected(kvm))
>> +			break;
>> +
>> +		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
>> +				  UVC_CMD_VERIFY_IMG,
>> +				  &ret);
>> +		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
>> +			 ret >> 16, ret & 0x0000ffff);
>> +		break;
>> +	}
>> +	default:
>> +		return -ENOTTY;
>> +	}
>> +	return r;
>> +}
>> +#endif
>> +
>>    long kvm_arch_vm_ioctl(struct file *filp,
>>    		       unsigned int ioctl, unsigned long arg)
>>    {
>> @@ -2254,6 +2351,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>    		mutex_unlock(&kvm->slots_lock);
>>    		break;
>>    	}
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +	case KVM_S390_PV_COMMAND: {
>> +		struct kvm_pv_cmd args;
>> +
>> +		r = -EINVAL;
>> +		if (!is_prot_virt_host())
>> +			break;
>> +
>> +		r = -EFAULT;
>> +		if (copy_from_user(&args, argp, sizeof(args)))
>> +			break;
>> +
>> +		r = kvm_s390_handle_pv(kvm, &args);
>> +		break;
>> +	}
>> +#endif
>>    	default:
>>    		r = -ENOTTY;
>>    	}
>> @@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>    
>>    	if (vcpu->kvm->arch.use_cmma)
>>    		kvm_s390_vcpu_unsetup_cmma(vcpu);
>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>> +	    kvm_s390_pv_handle_cpu(vcpu))
>> +		kvm_s390_pv_destroy_cpu(vcpu);
>>    	free_page((unsigned long)(vcpu->arch.sie_block));
>>    
>>    	kvm_vcpu_uninit(vcpu);
>> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>    {
>>    	kvm_free_vcpus(kvm);
>>    	sca_dispose(kvm);
>> -	debug_unregister(kvm->arch.dbf);
>>    	kvm_s390_gisa_destroy(kvm);
>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>> +	    kvm_s390_pv_is_protected(kvm)) {
>> +		kvm_s390_pv_destroy_vm(kvm);
>> +		kvm_s390_pv_dealloc_vm(kvm);
>> +	}
>> +	debug_unregister(kvm->arch.dbf);
>>    	free_page((unsigned long)kvm->arch.sie_page2);
>>    	if (!kvm_is_ucontrol(kvm))
>>    		gmap_remove(kvm->arch.gmap);
>> @@ -2652,6 +2773,9 @@ static int sca_switch_to_extended(struct kvm *kvm)
>>    	unsigned int vcpu_idx;
>>    	u32 scaol, scaoh;
>>    
>> +	if (kvm->arch.use_esca)
>> +		return 0;
>> +
>>    	new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL|__GFP_ZERO);
>>    	if (!new_sca)
>>    		return -ENOMEM;
>> @@ -3073,6 +3197,15 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>>    	rc = kvm_vcpu_init(vcpu, kvm, id);
>>    	if (rc)
>>    		goto out_free_sie_block;
>> +
>> +	if (kvm_s390_pv_is_protected(kvm)) {
>> +		rc = kvm_s390_pv_create_cpu(vcpu);
>> +		if (rc) {
>> +			kvm_vcpu_uninit(vcpu);
>> +			goto out_free_sie_block;
>> +		}
>> +	}
>> +
>>    	VM_EVENT(kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK", id, vcpu,
>>    		 vcpu->arch.sie_block);
>>    	trace_kvm_s390_create_vcpu(id, vcpu, vcpu->arch.sie_block);
>> @@ -4338,6 +4471,28 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
>>    	return -ENOIOCTLCMD;
>>    }
>>    
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
>> +				   struct kvm_pv_cmd *cmd)
>> +{
>> +	int r = 0;
>> +
>> +	switch (cmd->cmd) {
>> +	case KVM_PV_VCPU_CREATE: {
>> +		r = kvm_s390_pv_create_cpu(vcpu);
>> +		break;
>> +	}
>> +	case KVM_PV_VCPU_DESTROY: {
>> +		r = kvm_s390_pv_destroy_cpu(vcpu);
>> +		break;
>> +	}
>> +	default:
>> +		r = -ENOTTY;
>> +	}
>> +	return r;
>> +}
>> +#endif
>> +
>>    long kvm_arch_vcpu_ioctl(struct file *filp,
>>    			 unsigned int ioctl, unsigned long arg)
>>    {
>> @@ -4470,6 +4625,22 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>    					   irq_state.len);
>>    		break;
>>    	}
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +	case KVM_S390_PV_COMMAND_VCPU: {
>> +		struct kvm_pv_cmd args;
>> +
>> +		r = -EINVAL;
>> +		if (!is_prot_virt_host())
>> +			break;
>> +
>> +		r = -EFAULT;
>> +		if (copy_from_user(&args, argp, sizeof(args)))
>> +			break;
>> +
>> +		r = kvm_s390_handle_pv_vcpu(vcpu, &args);
>> +		break;
>> +	}
>> +#endif
>>    	default:
>>    		r = -ENOTTY;
>>    	}
>> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
>> index 6d9448dbd052..0d61dcc51f0e 100644
>> --- a/arch/s390/kvm/kvm-s390.h
>> +++ b/arch/s390/kvm/kvm-s390.h
>> @@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
>>    	return kvm->arch.user_cpu_state_ctrl != 0;
>>    }
>>    
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +/* implemented in pv.c */
>> +void kvm_s390_pv_unpin(struct kvm *kvm);
>> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
>> +int kvm_s390_pv_alloc_vm(struct kvm *kvm);
>> +int kvm_s390_pv_create_vm(struct kvm *kvm);
>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
>> +int kvm_s390_pv_destroy_vm(struct kvm *kvm);
>> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
>> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
>> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>> +		       unsigned long tweak);
>> +int kvm_s390_pv_verify(struct kvm *kvm);
>> +
>> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
>> +{
>> +	return !!kvm->arch.pv.handle;
>> +}
>> +
>> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm)
>> +{
>> +	return kvm->arch.pv.handle;
>> +}
>> +
>> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
>> +{
>> +	return vcpu->arch.pv.handle;
>> +}
>> +#else
>> +static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
>> +static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
>> +static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
>> +static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
>> +static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
>> +static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
>> +static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
>> +static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
>> +					    u64 origin, u64 length) { return 0; }
>> +static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
>> +				     unsigned long size,  unsigned long tweak)
>> +{ return 0; }
>> +static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
>> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
>> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
>> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
>> +#endif
>> +
>>    /* implemented in interrupt.c */
>>    int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
>>    void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
>> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
>> new file mode 100644
>> index 000000000000..94cf16f40f25
>> --- /dev/null
>> +++ b/arch/s390/kvm/pv.c
>> @@ -0,0 +1,237 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Hosting Secure Execution virtual machines
>> + *
>> + * Copyright IBM Corp. 2019
>> + *    Author(s): Janosch Frank <frankja@linux.ibm.com>
>> + */
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/pagemap.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/gmap.h>
>> +#include <asm/uv.h>
>> +#include <asm/gmap.h>
>> +#include <asm/mman.h>
>> +#include "kvm-s390.h"
>> +
>> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
>> +{
>> +	vfree(kvm->arch.pv.stor_var);
>> +	free_pages(kvm->arch.pv.stor_base,
>> +		   get_order(uv_info.guest_base_stor_len));
>> +	memset(&kvm->arch.pv, 0, sizeof(kvm->arch.pv));
>> +}
>> +
>> +int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>> +{
>> +	unsigned long base = uv_info.guest_base_stor_len;
>> +	unsigned long virt = uv_info.guest_virt_var_stor_len;
>> +	unsigned long npages = 0, vlen = 0;
>> +	struct kvm_memslots *slots;
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	kvm->arch.pv.stor_var = NULL;
>> +	kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL, get_order(base));
>> +	if (!kvm->arch.pv.stor_base)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * Calculate current guest storage for allocation of the
>> +	 * variable storage, which is based on the length in MB.
>> +	 *
>> +	 * Slots are sorted by GFN
>> +	 */
>> +	mutex_lock(&kvm->slots_lock);
>> +	slots = kvm_memslots(kvm);
>> +	memslot = slots->memslots;
>> +	npages = memslot->base_gfn + memslot->npages;
> 
> What if
> 
> a) your guest has multiple memory slots
> b) you hotplug memory and add memslots later
> 
> Do you fence that, and if so, how?
> 
>> +
>> +	mutex_unlock(&kvm->slots_lock);
>> +	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
>> +
>> +	/* Allocate variable storage */
>> +	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
> 
> I get the feeling that prot virt mainly consumes memory ;)
> 
>> +	vlen += uv_info.guest_virt_base_stor_len;
>> +	kvm->arch.pv.stor_var = vzalloc(vlen);
>> +	if (!kvm->arch.pv.stor_var) {
>> +		kvm_s390_pv_dealloc_vm(kvm);
>> +		return -ENOMEM;
>> +	}
>> +	return 0;
>> +}
>> +
>> +int kvm_s390_pv_destroy_vm(struct kvm *kvm)
>> +{
>> +	int rc;
>> +	u32 ret;
>> +
>> +	rc = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
>> +			   UVC_CMD_DESTROY_SEC_CONF, &ret);
>> +	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
>> +		 ret >> 16, ret & 0x0000ffff);
>> +	return rc;
>> +}
>> +
>> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
>> +{
>> +	int rc = 0;
>> +	u32 ret;
>> +
>> +	if (kvm_s390_pv_handle_cpu(vcpu)) {
>> +		rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>> +				   UVC_CMD_DESTROY_SEC_CPU,
>> +				   &ret);
>> +
>> +		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
>> +			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>> +	}
>> +
>> +	free_pages(vcpu->arch.pv.stor_base,
>> +		   get_order(uv_info.guest_cpu_stor_len));
>> +	/* Clear cpu and vm handle */
>> +	memset(&vcpu->arch.sie_block->reserved10, 0,
>> +	       sizeof(vcpu->arch.sie_block->reserved10));
>> +	memset(&vcpu->arch.pv, 0, sizeof(vcpu->arch.pv));
>> +	vcpu->arch.sie_block->sdf = 0;
>> +	return rc;
>> +}
>> +
>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
>> +{
>> +	int rc;
>> +	struct uv_cb_csc uvcb = {
>> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
>> +		.header.len = sizeof(uvcb),
>> +	};
>> +
>> +	/* EEXIST and ENOENT? */
>> +	if (kvm_s390_pv_handle_cpu(vcpu))
>> +		return -EINVAL;
>> +
>> +	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
>> +						   get_order(uv_info.guest_cpu_stor_len));
>> +	if (!vcpu->arch.pv.stor_base)
>> +		return -ENOMEM;
>> +
>> +	/* Input */
>> +	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
>> +	uvcb.num = vcpu->arch.sie_block->icpua;
>> +	uvcb.state_origin = (u64)vcpu->arch.sie_block;
>> +	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
>> +
>> +	rc = uv_call(0, (u64)&uvcb);
>> +	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
>> +		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
>> +		   uvcb.header.rrc);
>> +
>> +	/* Output */
>> +	vcpu->arch.pv.handle = uvcb.cpu_handle;
>> +	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
>> +	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
>> +	vcpu->arch.sie_block->sdf = 2;
>> +	if (!rc)
>> +		return 0;
>> +
>> +	kvm_s390_pv_destroy_cpu(vcpu);
>> +	return -EINVAL;
>> +}
>> +
>> +int kvm_s390_pv_create_vm(struct kvm *kvm)
>> +{
>> +	int rc;
>> +
>> +	struct uv_cb_cgc uvcb = {
>> +		.header.cmd = UVC_CMD_CREATE_SEC_CONF,
>> +		.header.len = sizeof(uvcb)
>> +	};
>> +
>> +	if (kvm_s390_pv_handle(kvm))
>> +		return -EINVAL;
>> +
>> +	/* Inputs */
>> +	uvcb.guest_stor_origin = 0; /* MSO is 0 for KVM */
>> +	uvcb.guest_stor_len = kvm->arch.pv.guest_len;
>> +	uvcb.guest_asce = kvm->arch.gmap->asce;
>> +	uvcb.conf_base_stor_origin = (u64)kvm->arch.pv.stor_base;
>> +	uvcb.conf_var_stor_origin = (u64)kvm->arch.pv.stor_var;
>> +
>> +	rc = uv_call(0, (u64)&uvcb);
>> +	VM_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
>> +		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
>> +		 uvcb.header.rrc);
>> +
>> +	/* Outputs */
>> +	kvm->arch.pv.handle = uvcb.guest_handle;
>> +
>> +	if (rc && (uvcb.header.rc & 0x8000)) {
>> +		kvm_s390_pv_destroy_vm(kvm);
>> +		kvm_s390_pv_dealloc_vm(kvm);
>> +		return -EINVAL;
>> +	}
>> +	return rc;
>> +}
>> +
>> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
>> +			      void *hdr, u64 length)
>> +{
>> +	int rc;
>> +	struct uv_cb_ssc uvcb = {
>> +		.header.cmd = UVC_CMD_SET_SEC_CONF_PARAMS,
>> +		.header.len = sizeof(uvcb),
>> +		.sec_header_origin = (u64)hdr,
>> +		.sec_header_len = length,
>> +		.guest_handle = kvm_s390_pv_handle(kvm),
>> +	};
>> +
>> +	if (!kvm_s390_pv_handle(kvm))
>> +		return -EINVAL;
>> +
>> +	rc = uv_call(0, (u64)&uvcb);
>> +	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
>> +		 uvcb.header.rc, uvcb.header.rrc);
>> +	if (rc)
>> +		return -EINVAL;
>> +	return 0;
>> +}
>> +
>> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>> +		       unsigned long tweak)
>> +{
>> +	int i, rc = 0;
>> +	struct uv_cb_unp uvcb = {
>> +		.header.cmd = UVC_CMD_UNPACK_IMG,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = kvm_s390_pv_handle(kvm),
>> +		.tweak[0] = tweak
>> +	};
>> +
>> +	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
>> +		return -EINVAL;
>> +
>> +
>> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
>> +		 addr, size);
>> +	for (i = 0; i < size / PAGE_SIZE; i++) {
>> +		uvcb.gaddr = addr + i * PAGE_SIZE;
>> +		uvcb.tweak[1] = i * PAGE_SIZE;
>> +retry:
>> +		rc = uv_call(0, (u64)&uvcb);
>> +		if (!rc)
>> +			continue;
>> +		/* If not yet mapped fault and retry */
>> +		if (uvcb.header.rc == 0x10a) {
>> +			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
>> +					FAULT_FLAG_WRITE);
>> +			if (rc)
>> +				return rc;
>> +			goto retry;
>> +		}
>> +		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
>> +			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
>> +		break;
>> +	}
>> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
>> +		 uvcb.header.rc, uvcb.header.rrc);
>> +	return rc;
>> +}
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 52641d8ca9e8..bb37d5710c89 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1000,6 +1000,7 @@ struct kvm_ppc_resize_hpt {
>>    #define KVM_CAP_PMU_EVENT_FILTER 173
>>    #define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
>>    #define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
>> +#define KVM_CAP_S390_PROTECTED 180
>>    
>>    #ifdef KVM_CAP_IRQ_ROUTING
>>    
>> @@ -1461,6 +1462,38 @@ struct kvm_enc_region {
>>    /* Available with KVM_CAP_ARM_SVE */
>>    #define KVM_ARM_VCPU_FINALIZE	  _IOW(KVMIO,  0xc2, int)
>>    
>> +struct kvm_s390_pv_sec_parm {
>> +	__u64	origin;
>> +	__u64	length;
>> +};
>> +
>> +struct kvm_s390_pv_unp {
>> +	__u64 addr;
>> +	__u64 size;
>> +	__u64 tweak;
>> +};
>> +
>> +enum pv_cmd_id {
>> +	KVM_PV_VM_CREATE,
>> +	KVM_PV_VM_DESTROY,
>> +	KVM_PV_VM_SET_SEC_PARMS,
>> +	KVM_PV_VM_UNPACK,
>> +	KVM_PV_VM_VERIFY,
>> +	KVM_PV_VCPU_CREATE,
>> +	KVM_PV_VCPU_DESTROY,
>> +};
>> +
>> +struct kvm_pv_cmd {
>> +	__u32	cmd;
>> +	__u16	rc;
>> +	__u16	rrc;
>> +	__u64	data;
>> +};
>> +
>> +/* Available with KVM_CAP_S390_SE */
>> +#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc3, struct kvm_pv_cmd)
>> +#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc4, struct kvm_pv_cmd)
>> +
>>    /* Secure Encrypted Virtualization command */
>>    enum sev_cmd_id {
>>    	/* Guest initialization commands */
>>
> 
> This is a lengthy patch and I haven't explored anything yet :)

lol, everything. :)


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 05/37] s390: KVM: Export PV handle to gmap
  2019-10-24 11:40 ` [RFC 05/37] s390: KVM: Export PV handle to gmap Janosch Frank
@ 2019-10-25  9:04   ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  9:04 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> We need it in the next patch, when doing memory management for the
> guest in the kernel's fault handler, where otherwise we wouldn't have
> access to the handle.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/gmap.h | 1 +
>   arch/s390/kvm/pv.c           | 1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 37f96b6f0e61..6efc0b501227 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -61,6 +61,7 @@ struct gmap {
>   	spinlock_t shadow_lock;
>   	struct gmap *parent;
>   	unsigned long orig_asce;
> +	unsigned long se_handle;
>   	int edat_level;
>   	bool removed;
>   	bool initialized;
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 94cf16f40f25..80aecd5bea9e 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -169,6 +169,7 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
>   		kvm_s390_pv_dealloc_vm(kvm);
>   		return -EINVAL;
>   	}
> +	kvm->arch.gmap->se_handle = uvcb.guest_handle;
>   	return rc;
>   }
>   
> 

I'd suggest squashing that into the patch that needs it.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-10-24 11:40 ` [RFC 03/37] s390/protvirt: add ultravisor initialization Janosch Frank
@ 2019-10-25  9:21   ` David Hildenbrand
  2019-10-28 15:48     ` Vasily Gorbik
  2019-11-01 10:07   ` Christian Borntraeger
  2019-11-07 15:28   ` Cornelia Huck
  2 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-25  9:21 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Before being able to host protected virtual machines, donate some of
> the memory to the ultravisor. Besides that the ultravisor might impose
> addressing limitations for memory used to back protected VM storage. Treat
> that limit as protected virtualization host's virtual memory limit.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>   arch/s390/include/asm/uv.h | 16 ++++++++++++
>   arch/s390/kernel/setup.c   |  3 +++
>   arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 72 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 6db1bc495e67..82a46fb913e7 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -23,12 +23,14 @@
>   #define UVC_RC_NO_RESUME	0x0007
>   
>   #define UVC_CMD_QUI			0x0001
> +#define UVC_CMD_INIT_UV			0x000f
>   #define UVC_CMD_SET_SHARED_ACCESS	0x1000
>   #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
>   
>   /* Bits in installed uv calls */
>   enum uv_cmds_inst {
>   	BIT_UVC_CMD_QUI = 0,
> +	BIT_UVC_CMD_INIT_UV = 1,
>   	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
>   	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
>   };
> @@ -59,6 +61,15 @@ struct uv_cb_qui {
>   	u64 reserved98;
>   } __packed __aligned(8);
>   
> +struct uv_cb_init {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 stor_origin;
> +	u64 stor_len;
> +	u64 reserved28[4];
> +
> +} __packed __aligned(8);
> +
>   struct uv_cb_share {
>   	struct uv_cb_header header;
>   	u64 reserved08[3];
> @@ -158,8 +169,13 @@ static inline int is_prot_virt_host(void)
>   {
>   	return prot_virt_host;
>   }
> +
> +void setup_uv(void);
> +void adjust_to_uv_max(unsigned long *vmax);
>   #else
>   #define is_prot_virt_host() 0
> +static inline void setup_uv(void) {}
> +static inline void adjust_to_uv_max(unsigned long *vmax) {}
>   #endif
>   
>   #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index f36370f8af38..d29d83c0b8df 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -567,6 +567,8 @@ static void __init setup_memory_end(void)
>   			vmax = _REGION1_SIZE; /* 4-level kernel page table */
>   	}
>   
> +	adjust_to_uv_max(&vmax);

I do wonder what would happen if vmax < max_physmem_end. Not sure if 
that is relevant at all.

> +
>   	/* module area is at the end of the kernel address space. */
>   	MODULES_END = vmax;
>   	MODULES_VADDR = MODULES_END - MODULES_LEN;
> @@ -1147,6 +1149,7 @@ void __init setup_arch(char **cmdline_p)
>   	 */
>   	memblock_trim_memory(1UL << (MAX_ORDER - 1 + PAGE_SHIFT));
>   
> +	setup_uv();
>   	setup_memory_end();
>   	setup_memory();
>   	dma_contiguous_reserve(memory_end);
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 35ce89695509..f7778493e829 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -45,4 +45,57 @@ static int __init prot_virt_setup(char *val)
>   	return rc;
>   }
>   early_param("prot_virt", prot_virt_setup);
> +
> +static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
> +{
> +	struct uv_cb_init uvcb = {
> +		.header.cmd = UVC_CMD_INIT_UV,
> +		.header.len = sizeof(uvcb),
> +		.stor_origin = stor_base,
> +		.stor_len = stor_len,
> +	};
> +	int cc;
> +
> +	cc = uv_call(0, (uint64_t)&uvcb);
> +	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
> +		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
> +		       uvcb.header.rc);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +void __init setup_uv(void)
> +{
> +	unsigned long uv_stor_base;
> +
> +	if (!prot_virt_host)
> +		return;
> +
> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
> +	if (!uv_stor_base) {
> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
> +			uv_info.uv_base_stor_len);
> +		goto fail;
> +	}

If I'm not wrong, we could setup/reserve a CMA area here and defer the 
actual allocation. Then, any MOVABLE data can end up on this CMA area 
until needed.

But I am neither an expert on CMA nor on UV, so most probably what I say 
is wrong ;)

> +
> +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
> +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
> +		goto fail;
> +	}
> +
> +	pr_info("Reserving %luMB as ultravisor base storage\n",
> +		uv_info.uv_base_stor_len >> 20);
> +	return;
> +fail:
> +	prot_virt_host = 0;
> +}
> +
> +void adjust_to_uv_max(unsigned long *vmax)
> +{
> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
> +		*vmax = uv_info.max_sec_stor_addr;
> +}
>   #endif
> 

Looks good to me from what I can tell.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 11:40 ` [RFC 02/37] s390/protvirt: introduce host side setup Janosch Frank
  2019-10-24 13:25   ` David Hildenbrand
@ 2019-10-28 14:54   ` Cornelia Huck
  2019-10-28 20:20     ` Christian Borntraeger
  2019-11-01  8:53   ` Christian Borntraeger
  2019-11-04 15:54   ` Cornelia Huck
  3 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-10-28 14:54 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:24 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
> protected virtual machines hosting support code.
> 
> Add "prot_virt" command line option which controls if the kernel
> protected VMs support is enabled at runtime.
> 
> Extend ultravisor info definitions and expose it via uv_info struct
> filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 ++
>  arch/s390/boot/Makefile                       |  2 +-
>  arch/s390/boot/uv.c                           | 20 +++++++-
>  arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>  arch/s390/kernel/Makefile                     |  1 +
>  arch/s390/kernel/setup.c                      |  4 --
>  arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>  arch/s390/kvm/Kconfig                         |  9 ++++
>  8 files changed, 126 insertions(+), 9 deletions(-)
>  create mode 100644 arch/s390/kernel/uv.c

(...)

> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> index d3db3d7ed077..652b36f0efca 100644
> --- a/arch/s390/kvm/Kconfig
> +++ b/arch/s390/kvm/Kconfig
> @@ -55,6 +55,15 @@ config KVM_S390_UCONTROL
>  
>  	  If unsure, say N.
>  
> +config KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +	bool "Protected guests execution support"
> +	depends on KVM
> +	---help---
> +	  Support hosting protected virtual machines isolated from the
> +	  hypervisor.

I'm currently in the process of glancing across this patch set (won't
be able to get around to properly looking at it until next week at the
earliest), so just a very high level comment:

I think there's not enough information in here to allow someone
configuring the kernel to decide what this is and if it would be useful
to them. This should probably at least point to some document giving
some more details. Also, can you add a sentence on where this feature is
actually expected to be available?

> +
> +	  If unsure, say Y.

Is 'Y' really the safe choice here? AFAICS, this is introducing new
code and not only trying to call new interfaces, if available. Is there
any drawback to enabling this on a kernel that won't run on a platform
supporting this feature? Is this supposed to be a common setup?

> +
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
>  source "drivers/vhost/Kconfig"

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-10-25  9:21   ` David Hildenbrand
@ 2019-10-28 15:48     ` Vasily Gorbik
  2019-10-28 15:54       ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Vasily Gorbik @ 2019-10-28 15:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Janosch Frank, kvm, linux-s390, thuth, borntraeger, imbrenda,
	mihajlov, mimu, cohuck

On Fri, Oct 25, 2019 at 11:21:05AM +0200, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
> > From: Vasily Gorbik <gor@linux.ibm.com>
> > 
> > Before being able to host protected virtual machines, donate some of
> > the memory to the ultravisor. Besides that the ultravisor might impose
> > addressing limitations for memory used to back protected VM storage. Treat
> > that limit as protected virtualization host's virtual memory limit.
> > 
> > Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> > ---
> >   arch/s390/include/asm/uv.h | 16 ++++++++++++
> >   arch/s390/kernel/setup.c   |  3 +++
> >   arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 72 insertions(+)
> > 
> > --- a/arch/s390/kernel/setup.c
> > +++ b/arch/s390/kernel/setup.c
> > @@ -567,6 +567,8 @@ static void __init setup_memory_end(void)
> >   			vmax = _REGION1_SIZE; /* 4-level kernel page table */
> >   	}
> > +	adjust_to_uv_max(&vmax);
> 
> I do wonder what would happen if vmax < max_physmem_end. Not sure if that is
> relevant at all.

Then the identity mapping would be shorter than the actual physical memory
available and everything above would be lost. But in reality "max_sec_stor_addr"
is big enough not to worry about in the foreseeable future.

> > +void __init setup_uv(void)
> > +{
> > +	unsigned long uv_stor_base;
> > +
> > +	if (!prot_virt_host)
> > +		return;
> > +
> > +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
> > +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
> > +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
> > +	if (!uv_stor_base) {
> > +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
> > +			uv_info.uv_base_stor_len);
> > +		goto fail;
> > +	}
> 
> If I'm not wrong, we could setup/reserve a CMA area here and defer the
> actual allocation. Then, any MOVABLE data can end up on this CMA area until
> needed.
> 
> But I am neither an expert on CMA nor on UV, so most probably what I say is
> wrong ;)

From a pure memory management perspective this sounds like a good idea. I tried
it, and cma_declare_contiguous() fulfills our needs; I just had to export the
cma_alloc/cma_release symbols. Nevertheless, delaying ultravisor init means we
could potentially be left with vmax == max_sec_stor_addr even if we wouldn't
be able to run protected VMs after all (currently setup_uv() is called
before the kernel address space layout is set up). Another, much more fundamental
reason is that ultravisor init has to be called with a single CPU running,
which means it's easy to do before bringing the other CPUs up, and we currently
don't have an API to stop CPUs at a later point (stop_machine() won't cut it).
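
For illustration, a rough sketch of how such a deferred CMA setup could look.
cma_declare_contiguous() and cma_alloc() are the existing CMA interfaces;
uv_cma, setup_uv_cma() and uv_alloc_base_storage() are made-up names for this
sketch and are not code from the series:

#include <linux/cma.h>
#include <linux/sizes.h>

static struct cma *uv_cma;	/* made-up name, sketch only */

void __init setup_uv_cma(void)
{
	if (!prot_virt_host)
		return;
	/* Reserve the area below 2G early, but defer the actual allocation. */
	if (cma_declare_contiguous(0, uv_info.uv_base_stor_len, SZ_2G, SZ_1M,
				   0, false, "uv_base", &uv_cma))
		prot_virt_host = 0;
}

/* Called later, right before uv_init() actually needs the memory. */
static int uv_alloc_base_storage(unsigned long *stor_base)
{
	struct page *pages;

	pages = cma_alloc(uv_cma, uv_info.uv_base_stor_len >> PAGE_SHIFT,
			  get_order(SZ_1M), false);
	if (!pages)
		return -ENOMEM;
	*stor_base = page_to_phys(pages);
	return 0;
}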

> > +
> > +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
> > +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
> > +		goto fail;
> > +	}
> > +
> > +	pr_info("Reserving %luMB as ultravisor base storage\n",
> > +		uv_info.uv_base_stor_len >> 20);
> > +	return;
> > +fail:
> > +	prot_virt_host = 0;
> > +}
> > +
> > +void adjust_to_uv_max(unsigned long *vmax)
> > +{
> > +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
> > +		*vmax = uv_info.max_sec_stor_addr;
> > +}
> >   #endif
> > 
> 
> Looks good to me from what I can tell.
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-10-28 15:48     ` Vasily Gorbik
@ 2019-10-28 15:54       ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-28 15:54 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Janosch Frank, kvm, linux-s390, thuth, borntraeger, imbrenda,
	mihajlov, mimu, cohuck

On 28.10.19 16:48, Vasily Gorbik wrote:
> On Fri, Oct 25, 2019 at 11:21:05AM +0200, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> From: Vasily Gorbik <gor@linux.ibm.com>
>>>
>>> Before being able to host protected virtual machines, donate some of
>>> the memory to the ultravisor. Besides that the ultravisor might impose
>>> addressing limitations for memory used to back protected VM storage. Treat
>>> that limit as protected virtualization host's virtual memory limit.
>>>
>>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>>> ---
>>>    arch/s390/include/asm/uv.h | 16 ++++++++++++
>>>    arch/s390/kernel/setup.c   |  3 +++
>>>    arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
>>>    3 files changed, 72 insertions(+)
>>>
>>> --- a/arch/s390/kernel/setup.c
>>> +++ b/arch/s390/kernel/setup.c
>>> @@ -567,6 +567,8 @@ static void __init setup_memory_end(void)
>>>    			vmax = _REGION1_SIZE; /* 4-level kernel page table */
>>>    	}
>>> +	adjust_to_uv_max(&vmax);
>>
>> I do wonder what would happen if vmax < max_physmem_end. Not sure if that is
>> relevant at all.
> 
> Then the identity mapping would be shorter than the actual physical memory
> available and everything above would be lost. But in reality "max_sec_stor_addr"
> is big enough not to worry about in the foreseeable future.
> 
>>> +void __init setup_uv(void)
>>> +{
>>> +	unsigned long uv_stor_base;
>>> +
>>> +	if (!prot_virt_host)
>>> +		return;
>>> +
>>> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
>>> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
>>> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
>>> +	if (!uv_stor_base) {
>>> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
>>> +			uv_info.uv_base_stor_len);
>>> +		goto fail;
>>> +	}
>>
>> If I'm not wrong, we could setup/reserve a CMA area here and defer the
>> actual allocation. Then, any MOVABLE data can end up on this CMA area until
>> needed.
>>
>> But I am neither an expert on CMA nor on UV, so most probably what I say is
>> wrong ;)
> 
> From a pure memory management perspective this sounds like a good idea. I tried
> it, and cma_declare_contiguous() fulfills our needs; I just had to export the
> cma_alloc/cma_release symbols. Nevertheless, delaying ultravisor init means we
> could potentially be left with vmax == max_sec_stor_addr even if we wouldn't
> be able to run protected VMs after all (currently setup_uv() is called
> before the kernel address space layout is set up). Another, much more fundamental
> reason is that ultravisor init has to be called with a single CPU running,
> which means it's easy to do before bringing the other CPUs up, and we currently
> don't have an API to stop CPUs at a later point (stop_machine() won't cut it).

Interesting point, I guess. One could hack around that. Emphasis on
*hack* :) In stop_machine() you catch all CPUs. You could just
temporarily SIGP STOP all running ones, issue the UV init call, and SIGP
START them again. Not sure how that works with SMP, though ...

But yeah, this is stuff for the future, just an idea from my side :)
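
Spelled out, the hack would look something like the sketch below; uv_init_late()
and both sigp_*_all_other_cpus() helpers are pure placeholders that do not exist
today, only uv_init() is from the series:

/* Sketch only: stop the other CPUs, do the UV init, start them again. */
static int uv_init_late(unsigned long stor_base, unsigned long stor_len)
{
	int rc;

	sigp_stop_all_other_cpus();	/* placeholder: SIGP STOP the others */
	rc = uv_init(stor_base, stor_len);
	sigp_start_all_other_cpus();	/* placeholder: SIGP START them again */
	return rc;
}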

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-28 14:54   ` Cornelia Huck
@ 2019-10-28 20:20     ` Christian Borntraeger
  0 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-10-28 20:20 UTC (permalink / raw)
  To: Cornelia Huck, Janosch Frank
  Cc: kvm, linux-s390, thuth, david, imbrenda, mihajlov, mimu, gor



On 28.10.19 15:54, Cornelia Huck wrote:

> I think there's not enough information in here to allow someone
> configuring the kernel to decide what this is and whether it would be
> useful to them. This should probably at least point to some document
> giving some more details. Also, can you add a sentence about where this
> feature is actually expected to be available?
> 
>> +
>> +	  If unsure, say Y.
> 
> Is 'Y' really the safe choice here? AFAICS, this is introducing new
> code and not only trying to call new interfaces, if available. Is there
> any drawback to enabling this on a kernel that won't run on a platform
> supporting this feature? Is this supposed to be a common setup?

I would expect that this is enabled on distributions in the future. So
I think we should actually get rid of this Kconfig option and always enable that code.
We just have to pay attention to fence off all the new code if the user does
not opt in (e.g. prot_virt=0). We need to do that anyway, and not having a
Kconfig option forces us to be extra careful.
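
To make that concrete, a minimal sketch of such a runtime fence, assuming the
existing is_prot_virt_host() helper stays the single source of truth; the
wrapper name kvm_s390_pv_check_enabled() and the error code are made up:

static int kvm_s390_pv_check_enabled(void)
{
	/* prot_virt=0, a missing facility or a failed UV init all land here */
	if (!is_prot_virt_host())
		return -EINVAL;
	return 0;
}

Every new PV entry point (ioctls, intercept handlers) would then bail out early
via such a check instead of being compiled out.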

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions
  2019-10-24 11:40 ` [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions Janosch Frank
@ 2019-10-30 15:50   ` David Hildenbrand
  2019-10-30 17:58     ` Janosch Frank
  2019-11-05 18:04   ` Cornelia Huck
  1 sibling, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-30 15:50 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> Since KVM doesn't emulate any form of load control and load psw
> instructions anymore, we wouldn't get an interception if PSWs or CRs
> are changed in the guest. That means we can't inject IRQs right after
> the guest is enabled for them.
> 
> The new interception codes solve that problem by being a notification
> for changes to IRQ enablement relevant bits in CRs 0, 6 and 14, as
> well as the machine check mask bit in the PSW.
> 
> No special handling is needed for these interception codes, the KVM
> pre-run code will consult all necessary CRs and PSW bits and inject
> IRQs the guest is enabled for.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h |  2 ++
>   arch/s390/kvm/intercept.c        | 18 ++++++++++++++++++
>   2 files changed, 20 insertions(+)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index d4fd0f3af676..6cc3b73ca904 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -210,6 +210,8 @@ struct kvm_s390_sie_block {
>   #define ICPT_PARTEXEC	0x38
>   #define ICPT_IOINST	0x40
>   #define ICPT_KSS	0x5c
> +#define ICPT_PV_MCHKR	0x60
> +#define ICPT_PV_INT_EN	0x64
>   	__u8	icptcode;		/* 0x0050 */
>   	__u8	icptstatus;		/* 0x0051 */
>   	__u16	ihcpu;			/* 0x0052 */
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index a389fa85cca2..acc1710fc472 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -480,6 +480,24 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>   	case ICPT_KSS:
>   		rc = kvm_s390_skey_check_enable(vcpu);
>   		break;
> +	case ICPT_PV_MCHKR:
> +		/*
> +		 * A protected guest changed PSW bit 13 to one and is now
> +		 * enabled for interrupts. The pre-run code will check
> +		 * the registers and inject pending MCHKs based on the
> +		 * PSW and CRs. No additional work to do.
> +		 */
> +		rc = 0;
> +		break;
> +	case  ICPT_PV_INT_EN:
> +		/*
> +		 * A protected guest changed CR 0,6,14 and may now be
> +		 * enabled for interrupts. The pre-run code will check
> +		 * the registers and inject pending IRQs based on the
> +		 * CRs. No additional work to do.
> +		 */
> +		rc = 0;
> +	break;

Wrong indentation.

Maybe simply

case ICPT_PV_MCHKR:
case ICPT_PV_INT_EN:
	/*
	 * PSW bit 13 or a CR (0, 6, 14) changed and we might now be
	 * able to deliver interrupts. The pre-run code will take care
	 * of this.
	 */
	rc = 0;
	break;

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-24 11:40 ` [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls Janosch Frank
@ 2019-10-30 15:53   ` David Hildenbrand
  2019-10-31  8:48     ` Michael Mueller
  2019-11-05 17:51   ` Cornelia Huck
  2019-11-14 11:48   ` Thomas Huth
  2 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-30 15:53 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Michael Mueller <mimu@linux.ibm.com>
> 
> Define the interruption injection codes and the related fields in the
> sie control block for PVM interruption injection.
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 6cc3b73ca904..82443236d4cc 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -215,7 +215,15 @@ struct kvm_s390_sie_block {
>   	__u8	icptcode;		/* 0x0050 */
>   	__u8	icptstatus;		/* 0x0051 */
>   	__u16	ihcpu;			/* 0x0052 */
> -	__u8	reserved54[2];		/* 0x0054 */
> +	__u8	reserved54;		/* 0x0054 */
> +#define IICTL_CODE_NONE		 0x00
> +#define IICTL_CODE_MCHK		 0x01
> +#define IICTL_CODE_EXT		 0x02
> +#define IICTL_CODE_IO		 0x03
> +#define IICTL_CODE_RESTART	 0x04
> +#define IICTL_CODE_SPECIFICATION 0x10
> +#define IICTL_CODE_OPERAND	 0x11
> +	__u8	iictl;			/* 0x0055 */
>   	__u16	ipa;			/* 0x0056 */
>   	__u32	ipb;			/* 0x0058 */
>   	__u32	scaoh;			/* 0x005c */
> @@ -252,7 +260,8 @@ struct kvm_s390_sie_block {
>   #define HPID_KVM	0x4
>   #define HPID_VSIE	0x5
>   	__u8	hpid;			/* 0x00b8 */
> -	__u8	reservedb9[11];		/* 0x00b9 */
> +	__u8	reservedb9[7];		/* 0x00b9 */
> +	__u32	eiparams;		/* 0x00c0 */
>   	__u16	extcpuaddr;		/* 0x00c4 */
>   	__u16	eic;			/* 0x00c6 */
>   	__u32	reservedc8;		/* 0x00c8 */
> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>   	__u8	oai;			/* 0x00e2 */
>   	__u8	armid;			/* 0x00e3 */
>   	__u8	reservede4[4];		/* 0x00e4 */
> -	__u64	tecmc;			/* 0x00e8 */
> -	__u8	reservedf0[12];		/* 0x00f0 */
> +	union {
> +		__u64	tecmc;		/* 0x00e8 */
> +		struct {
> +			__u16	subchannel_id;	/* 0x00e8 */
> +			__u16	subchannel_nr;	/* 0x00ea */
> +			__u32	io_int_parm;	/* 0x00ec */
> +			__u32	io_int_word;	/* 0x00f0 */
> +		};

I only wonder if we should give this member a fitting name, e.g., "ioparams"

> +	} __packed;
> +	__u8	reservedf4[8];		/* 0x00f4 */
>   #define CRYCB_FORMAT_MASK 0x00000003
>   #define CRYCB_FORMAT0 0x00000000
>   #define CRYCB_FORMAT1 0x00000001
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions
  2019-10-30 15:50   ` David Hildenbrand
@ 2019-10-30 17:58     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-10-30 17:58 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor


On 10/30/19 4:50 PM, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> Since KVM doesn't emulate any form of load control and load psw
>> instructions anymore, we wouldn't get an interception if PSWs or CRs
>> are changed in the guest. That means we can't inject IRQs right after
>> the guest is enabled for them.
>>
>> The new interception codes solve that problem by being a notification
>> for changes to IRQ enablement relevant bits in CRs 0, 6 and 14, as
>> well as the machine check mask bit in the PSW.
>>
>> No special handling is needed for these interception codes, the KVM
>> pre-run code will consult all necessary CRs and PSW bits and inject
>> IRQs the guest is enabled for.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h |  2 ++
>>   arch/s390/kvm/intercept.c        | 18 ++++++++++++++++++
>>   2 files changed, 20 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index d4fd0f3af676..6cc3b73ca904 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -210,6 +210,8 @@ struct kvm_s390_sie_block {
>>   #define ICPT_PARTEXEC	0x38
>>   #define ICPT_IOINST	0x40
>>   #define ICPT_KSS	0x5c
>> +#define ICPT_PV_MCHKR	0x60
>> +#define ICPT_PV_INT_EN	0x64
>>   	__u8	icptcode;		/* 0x0050 */
>>   	__u8	icptstatus;		/* 0x0051 */
>>   	__u16	ihcpu;			/* 0x0052 */
>> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
>> index a389fa85cca2..acc1710fc472 100644
>> --- a/arch/s390/kvm/intercept.c
>> +++ b/arch/s390/kvm/intercept.c
>> @@ -480,6 +480,24 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>>   	case ICPT_KSS:
>>   		rc = kvm_s390_skey_check_enable(vcpu);
>>   		break;
>> +	case ICPT_PV_MCHKR:
>> +		/*
>> +		 * A protected guest changed PSW bit 13 to one and is now
>> +		 * enabled for interrupts. The pre-run code will check
>> +		 * the registers and inject pending MCHKs based on the
>> +		 * PSW and CRs. No additional work to do.
>> +		 */
>> +		rc = 0;
>> +		break;
>> +	case  ICPT_PV_INT_EN:
>> +		/*
>> +		 * A protected guest changed CR 0,6,14 and may now be
>> +		 * enabled for interrupts. The pre-run code will check
>> +		 * the registers and inject pending IRQs based on the
>> +		 * CRs. No additional work to do.
>> +		 */
>> +		rc = 0;
>> +	break;
> 
> Wrong indentation.
> 
> Maybe simply
> 
> case ICPT_PV_MCHKR:
> case ICPT_PV_INT_EN:
> 	/*
> 	 * PSW bit 13 or a CR (0, 6, 14) changed and we might now be
> 	 * able to deliver interrupts. The pre-run code will take care
> 	 * of this.
> 	 */
> 	rc = 0;
> 	break;

Sounds good, I'll fix it




^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 27/37] KVM: s390: protvirt: SIGP handling
  2019-10-24 11:40 ` [RFC 27/37] KVM: s390: protvirt: SIGP handling Janosch Frank
@ 2019-10-30 18:29   ` David Hildenbrand
  2019-11-15 11:15   ` Thomas Huth
  1 sibling, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-30 18:29 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>

Can you add why this is necessary and how handle_stop() is intended to 
work in prot mode?

How is SIGP handled in general in prot mode (which intercepts are
handled by QEMU)?
Would it be valid for user space to inject a STOP interrupt with "flags
& KVM_S390_STOP_FLAG_STORE_STATUS"? I think not (legacy QEMU only).

I think we should rather disallow injecting such stop interrupts 
(KVM_S390_STOP_FLAG_STORE_STATUS) in prot mode in the first place. Also, 
we should disallow prot virt without user_sigp.
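
A rough sketch of the first suggestion, i.e. refusing such an injection up
front instead of silently skipping the store in handle_stop(); the function
name and its exact placement in the injection path are assumptions,
kvm_s390_pv_is_protected() is from the series:

static int check_pv_stop_irq(struct kvm_vcpu *vcpu,
			     const struct kvm_s390_irq *irq)
{
	/* Storing status is not possible for a protected guest. */
	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
	    (irq->u.stop.flags & KVM_S390_STOP_FLAG_STORE_STATUS))
		return -EINVAL;
	return 0;
}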

> ---
>   arch/s390/kvm/intercept.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index 37cb62bc261b..a89738e4f761 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -72,7 +72,8 @@ static int handle_stop(struct kvm_vcpu *vcpu)
>   	if (!stop_pending)
>   		return 0;
>   
> -	if (flags & KVM_S390_STOP_FLAG_STORE_STATUS) {
> +	if (flags & KVM_S390_STOP_FLAG_STORE_STATUS &&
> +	    !kvm_s390_pv_is_protected(vcpu->kvm)) {
>   		rc = kvm_s390_vcpu_store_status(vcpu,
>   						KVM_S390_STORE_STATUS_NOADDR);
>   		if (rc)
> 

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-30 15:53   ` David Hildenbrand
@ 2019-10-31  8:48     ` Michael Mueller
  2019-10-31  9:15       ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Michael Mueller @ 2019-10-31  8:48 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, cohuck, gor



On 30.10.19 16:53, David Hildenbrand wrote:
>> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>>       __u8    oai;            /* 0x00e2 */
>>       __u8    armid;            /* 0x00e3 */
>>       __u8    reservede4[4];        /* 0x00e4 */
>> -    __u64    tecmc;            /* 0x00e8 */
>> -    __u8    reservedf0[12];        /* 0x00f0 */
>> +    union {
>> +        __u64    tecmc;        /* 0x00e8 */
>> +        struct {
>> +            __u16    subchannel_id;    /* 0x00e8 */
>> +            __u16    subchannel_nr;    /* 0x00ea */
>> +            __u32    io_int_parm;    /* 0x00ec */
>> +            __u32    io_int_word;    /* 0x00f0 */
>> +        };
> 
> I only wonder if we should give this member a fitting name, e.g., 
> "ioparams"

Do you see a real gain for that? We have a lot of other unnamed structs
defined here as well.


> 
>> +    } __packed;
>> +    __u8    reservedf4[8];        /* 0x00f4 */
>>   #define CRYCB_FORMAT_MASK 0x00000003
>>   #define CRYCB_FORMAT0 0x00000000
>>   #define CRYCB_FORMAT1 0x00000001
>>

Thanks,
Michael

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-31  8:48     ` Michael Mueller
@ 2019-10-31  9:15       ` David Hildenbrand
  2019-10-31 12:10         ` Michael Mueller
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-10-31  9:15 UTC (permalink / raw)
  To: mimu, Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, cohuck, gor

On 31.10.19 09:48, Michael Mueller wrote:
> 
> 
> On 30.10.19 16:53, David Hildenbrand wrote:
>>> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>>>       __u8    oai;            /* 0x00e2 */
>>>       __u8    armid;            /* 0x00e3 */
>>>       __u8    reservede4[4];        /* 0x00e4 */
>>> -    __u64    tecmc;            /* 0x00e8 */
>>> -    __u8    reservedf0[12];        /* 0x00f0 */
>>> +    union {
>>> +        __u64    tecmc;        /* 0x00e8 */
>>> +        struct {
>>> +            __u16    subchannel_id;    /* 0x00e8 */
>>> +            __u16    subchannel_nr;    /* 0x00ea */
>>> +            __u32    io_int_parm;    /* 0x00ec */
>>> +            __u32    io_int_word;    /* 0x00f0 */
>>> +        };
>>
>> I only wonder if we should give this member a fitting name, e.g., 
>> "ioparams"
> 
> Do you see a real gain for that? We have a lot of other unnamed structs
> defined here as well.

I was wondering if we could just copy the whole struct when delivering
the interrupt.

You could even reuse  "struct kvm_s390_io_info" here to make that more
clear.
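
For illustration, that could look roughly like the fragment below; the field
name "ioparams" and the local variable names in the delivery line are made up,
struct kvm_s390_io_info is the existing uapi type:

	union {
		__u64	tecmc;				/* 0x00e8 */
		struct kvm_s390_io_info	ioparams;	/* 0x00e8 */
	} __packed;
	__u8	reservedf4[8];				/* 0x00f4 */

	/* delivery could then copy the whole payload in one statement: */
	vcpu->arch.sie_block->ioparams = inti->io;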

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-31  9:15       ` David Hildenbrand
@ 2019-10-31 12:10         ` Michael Mueller
  0 siblings, 0 replies; 213+ messages in thread
From: Michael Mueller @ 2019-10-31 12:10 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, cohuck, gor



On 31.10.19 10:15, David Hildenbrand wrote:
> On 31.10.19 09:48, Michael Mueller wrote:
>>
>>
>> On 30.10.19 16:53, David Hildenbrand wrote:
>>>> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>>>>        __u8    oai;            /* 0x00e2 */
>>>>        __u8    armid;            /* 0x00e3 */
>>>>        __u8    reservede4[4];        /* 0x00e4 */
>>>> -    __u64    tecmc;            /* 0x00e8 */
>>>> -    __u8    reservedf0[12];        /* 0x00f0 */
>>>> +    union {
>>>> +        __u64    tecmc;        /* 0x00e8 */
>>>> +        struct {
>>>> +            __u16    subchannel_id;    /* 0x00e8 */
>>>> +            __u16    subchannel_nr;    /* 0x00ea */
>>>> +            __u32    io_int_parm;    /* 0x00ec */
>>>> +            __u32    io_int_word;    /* 0x00f0 */
>>>> +        };
>>>
>>> I only wonder if we should give this member a fitting name, e.g.,
>>> "ioparams"
>>
>> Do you see a real gain for that? We have a lot of other unnamed structs
>> defined here as well.
> 
> I was wondering if we could just copy the whole struct when delivering
> the interrupt.
> 
> You could even reuse  "struct kvm_s390_io_info" here to make that more
> clear.

I want to keep it the way it is to have the fields in the SCB
declaration explicit.

Thanks,
Michael

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-25  8:49   ` David Hildenbrand
@ 2019-10-31 15:41     ` Christian Borntraeger
  2019-10-31 17:30       ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-10-31 15:41 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 25.10.19 10:49, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> Pin the guest pages when they are first accessed, instead of all at
>> the same time when starting the guest.
> 
> Please explain why you do stuff. Why do we have to pin the whole guest memory? Why can't we mlock() the whole memory to avoid swapping in user space?

Basically we pin the guest for the same reason as AMD did it for their SEV. It is hard
to synchronize page import/export with the I/O for paging. For example you can actually
fault in a page that is currently under paging I/O. What do you do? import (so that the
guest can run) or export (so that the I/O will work). As this turned out to be harder than
we thought, we decided to defer paging to a later point in time.

As we do not want to rely on userspace to do the mlock, this is now done in the kernel.



> 
> This really screams for a proper explanation.
>>
>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/gmap.h |  1 +
>>   arch/s390/include/asm/uv.h   |  6 +++++
>>   arch/s390/kernel/uv.c        | 20 ++++++++++++++
>>   arch/s390/kvm/kvm-s390.c     |  2 ++
>>   arch/s390/kvm/pv.c           | 51 ++++++++++++++++++++++++++++++------
>>   5 files changed, 72 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>> index 99b3eedda26e..483f64427c0e 100644
>> --- a/arch/s390/include/asm/gmap.h
>> +++ b/arch/s390/include/asm/gmap.h
>> @@ -63,6 +63,7 @@ struct gmap {
>>       struct gmap *parent;
>>       unsigned long orig_asce;
>>       unsigned long se_handle;
>> +    struct page **pinned_pages;
>>       int edat_level;
>>       bool removed;
>>       bool initialized;
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index 99cdd2034503..9ce9363aee1c 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -298,6 +298,7 @@ static inline int uv_convert_from_secure(unsigned long paddr)
>>       return -EINVAL;
>>   }
>>   +int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa);
>>   /*
>>    * Requests the Ultravisor to make a page accessible to a guest
>>    * (import). If it's brought in the first time, it will be cleared. If
>> @@ -317,6 +318,11 @@ static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>>           .gaddr = gaddr
>>       };
>>   +    down_read(&gmap->mm->mmap_sem);
>> +    cc = kvm_s390_pv_pin_page(gmap, gaddr);
>> +    up_read(&gmap->mm->mmap_sem);
>> +    if (cc)
>> +        return cc;
>>       cc = uv_call(0, (u64)&uvcb);
>>         if (!cc)
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index f7778493e829..36554402b5c6 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -98,4 +98,24 @@ void adjust_to_uv_max(unsigned long *vmax)
>>       if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>>           *vmax = uv_info.max_sec_stor_addr;
>>   }
>> +
>> +int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa)
>> +{
>> +    unsigned long hva, gfn = gpa / PAGE_SIZE;
>> +    int rc;
>> +
>> +    if (!gmap->pinned_pages)
>> +        return -EINVAL;
>> +    hva = __gmap_translate(gmap, gpa);
>> +    if (IS_ERR_VALUE(hva))
>> +        return -EFAULT;
>> +    if (gmap->pinned_pages[gfn])
>> +        return -EEXIST;
>> +    rc = get_user_pages_fast(hva, 1, FOLL_WRITE, gmap->pinned_pages + gfn);
>> +    if (rc < 0)
>> +        return rc;
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_s390_pv_pin_page);
>> +
>>   #endif
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index d1ba12f857e7..490fde080107 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -2196,6 +2196,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>>           /* All VCPUs have to be destroyed before this call. */
>>           mutex_lock(&kvm->lock);
>>           kvm_s390_vcpu_block_all(kvm);
>> +        kvm_s390_pv_unpin(kvm);
>>           r = kvm_s390_pv_destroy_vm(kvm);
>>           if (!r)
>>               kvm_s390_pv_dealloc_vm(kvm);
>> @@ -2680,6 +2681,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>       kvm_s390_gisa_destroy(kvm);
>>       if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>>           kvm_s390_pv_is_protected(kvm)) {
>> +        kvm_s390_pv_unpin(kvm);
>>           kvm_s390_pv_destroy_vm(kvm);
>>           kvm_s390_pv_dealloc_vm(kvm);
>>       }
>> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
>> index 80aecd5bea9e..383e660e2221 100644
>> --- a/arch/s390/kvm/pv.c
>> +++ b/arch/s390/kvm/pv.c
>> @@ -15,8 +15,35 @@
>>   #include <asm/mman.h>
>>   #include "kvm-s390.h"
>>   +static void unpin_destroy(struct page **pages, int nr)
>> +{
>> +    int i;
>> +    struct page *page;
>> +    u8 *val;
>> +
>> +    for (i = 0; i < nr; i++) {
>> +        page = pages[i];
>> +        if (!page)    /* page was never used */
>> +            continue;
>> +        val = (void *)page_to_phys(page);
>> +        READ_ONCE(*val);
>> +        put_page(page);
>> +    }
>> +}
>> +
>> +void kvm_s390_pv_unpin(struct kvm *kvm)
>> +{
>> +    unsigned long npages = kvm->arch.pv.guest_len / PAGE_SIZE;
>> +
>> +    mutex_lock(&kvm->slots_lock);
>> +    unpin_destroy(kvm->arch.gmap->pinned_pages, npages);
>> +    mutex_unlock(&kvm->slots_lock);
>> +}
>> +
>>   void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
>>   {
>> +    vfree(kvm->arch.gmap->pinned_pages);
>> +    kvm->arch.gmap->pinned_pages = NULL;
>>       vfree(kvm->arch.pv.stor_var);
>>       free_pages(kvm->arch.pv.stor_base,
>>              get_order(uv_info.guest_base_stor_len));
>> @@ -28,7 +55,6 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>>       unsigned long base = uv_info.guest_base_stor_len;
>>       unsigned long virt = uv_info.guest_virt_var_stor_len;
>>       unsigned long npages = 0, vlen = 0;
>> -    struct kvm_memslots *slots;
>>       struct kvm_memory_slot *memslot;
>>         kvm->arch.pv.stor_var = NULL;
>> @@ -43,22 +69,26 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>>        * Slots are sorted by GFN
>>        */
>>       mutex_lock(&kvm->slots_lock);
>> -    slots = kvm_memslots(kvm);
>> -    memslot = slots->memslots;
>> +    memslot = kvm_memslots(kvm)->memslots;
>>       npages = memslot->base_gfn + memslot->npages;
>> -
>>       mutex_unlock(&kvm->slots_lock);
>> +
>> +    kvm->arch.gmap->pinned_pages = vzalloc(npages * sizeof(struct page *));
>> +    if (!kvm->arch.gmap->pinned_pages)
>> +        goto out_err;
>>       kvm->arch.pv.guest_len = npages * PAGE_SIZE;
>>         /* Allocate variable storage */
>>       vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
>>       vlen += uv_info.guest_virt_base_stor_len;
>>       kvm->arch.pv.stor_var = vzalloc(vlen);
>> -    if (!kvm->arch.pv.stor_var) {
>> -        kvm_s390_pv_dealloc_vm(kvm);
>> -        return -ENOMEM;
>> -    }
>> +    if (!kvm->arch.pv.stor_var)
>> +        goto out_err;
>>       return 0;
>> +
>> +out_err:
>> +    kvm_s390_pv_dealloc_vm(kvm);
>> +    return -ENOMEM;
>>   }
>>     int kvm_s390_pv_destroy_vm(struct kvm *kvm)
>> @@ -216,6 +246,11 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>>       for (i = 0; i < size / PAGE_SIZE; i++) {
>>           uvcb.gaddr = addr + i * PAGE_SIZE;
>>           uvcb.tweak[1] = i * PAGE_SIZE;
>> +        down_read(&kvm->mm->mmap_sem);
>> +        rc = kvm_s390_pv_pin_page(kvm->arch.gmap, uvcb.gaddr);
>> +        up_read(&kvm->mm->mmap_sem);
>> +        if (rc && (rc != -EEXIST))
>> +            break;
>>   retry:
>>           rc = uv_call(0, (u64)&uvcb);
>>           if (!rc)
>>
> 
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-31 15:41     ` Christian Borntraeger
@ 2019-10-31 17:30       ` David Hildenbrand
  2019-10-31 20:57         ` Janosch Frank
  2019-11-01  8:50         ` Claudio Imbrenda
  0 siblings, 2 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-10-31 17:30 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 31.10.19 16:41, Christian Borntraeger wrote:
> 
> 
> On 25.10.19 10:49, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>
>>> Pin the guest pages when they are first accessed, instead of all at
>>> the same time when starting the guest.
>>
>> Please explain why you do stuff. Why do we have to pin the whole guest memory? Why can't we mlock() the whole memory to avoid swapping in user space?
> 
> Basically we pin the guest for the same reason as AMD did it for their SEV. It is hard

Pinning all guest memory is very ugly. What you want is "don't page", 
what you get is unmovable pages all over the place. I was hoping that 
you could get around this by having an automatic back-and-forth 
conversion in place (due to the special new exceptions).

> to synchronize page import/export with the I/O for paging. For example you can actually
> fault in a page that is currently under paging I/O. What do you do? import (so that the
> guest can run) or export (so that the I/O will work). As this turned out to be harder than
> we thought, we decided to defer paging to a later point in time.

I don't quite see the issue yet. If you page out, the page will 
automatically (on access) be converted to !secure/encrypted memory. If 
the UV/guest wants to access it, it will be automatically converted to 
secure/unencrypted memory. If you have concurrent access, it will be 
converted back and forth until one party is done.

A proper automatic conversion should make this work. What am I missing?

> 
> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.

I wonder if we could come up with an alternative (similar to how we 
override VM_MERGEABLE in the kernel) that can be called and ensured in 
the kernel. E.g., marking whole VMAs as "don't page" (I remember 
something like "special VMAs" like used for VDSOs that achieve exactly 
that, but I am absolutely no expert on that). That would be much nicer 
than pinning all pages and remembering what you pinned in huge page 
arrays ...

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-31 17:30       ` David Hildenbrand
@ 2019-10-31 20:57         ` Janosch Frank
  2019-11-04 10:19           ` David Hildenbrand
  2019-11-01  8:50         ` Claudio Imbrenda
  1 sibling, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-10-31 20:57 UTC (permalink / raw)
  To: David Hildenbrand, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 10/31/19 6:30 PM, David Hildenbrand wrote:
> On 31.10.19 16:41, Christian Borntraeger wrote:
>>
>>
>> On 25.10.19 10:49, David Hildenbrand wrote:
>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>>
>>>> Pin the guest pages when they are first accessed, instead of all at
>>>> the same time when starting the guest.
>>>
>>> Please explain why you do stuff. Why do we have to pin the whole guest memory? Why can't we mlock() the whole memory to avoid swapping in user space?
>>
>> Basically we pin the guest for the same reason as AMD did it for their SEV. It is hard
> 
> Pinning all guest memory is very ugly. What you want is "don't page", 
> what you get is unmovable pages all over the place. I was hoping that 
> you could get around this by having an automatic back-and-forth 
> conversion in place (due to the special new exceptions).

Yes, that's one of the ideas that have been circulating.

> 
>> to synchronize page import/export with the I/O for paging. For example you can actually
>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>> guest can run) or export (so that the I/O will work). As this turned out to be harder than
>> we thought, we decided to defer paging to a later point in time.
> 
> I don't quite see the issue yet. If you page out, the page will 
> automatically (on access) be converted to !secure/encrypted memory. If 
> the UV/guest wants to access it, it will be automatically converted to 
> secure/unencrypted memory. If you have concurrent access, it will be 
> converted back and forth until one party is done.

I/O does not trigger an export on an imported page, but an error
condition in the I/O subsystem. The paging code does not read pages through
the CPU, but often just asks the device to read directly, and that's
where everything goes wrong. We could bounce swapping, but chose to pin
for now until we find a proper solution to that problem which nicely
integrates into Linux.

> 
> A proper automatic conversion should make this work. What am I missing?
> 
>>
>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
> 
> I wonder if we could come up with an alternative (similar to how we 
> override VM_MERGEABLE in the kernel) that can be called and ensured in 
> the kernel. E.g., marking whole VMAs as "don't page" (I remember 
> something like "special VMAs" like used for VDSOs that achieve exactly 
> that, but I am absolutely no expert on that). That would be much nicer 
> than pinning all pages and remembering what you pinned in huge page 
> arrays ...

It might be more worthwhile to just accept one or two releases with
pinning and fix the root of the problem than to design a nice stopgap.

Btw. s390 is not alone with this problem, and we'll try to have another
discussion tomorrow with AMD to find a solution which works for more
than one architecture.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction
  2019-10-24 11:40 ` [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction Janosch Frank
@ 2019-11-01  8:18   ` Christian Borntraeger
  2019-11-04 14:18   ` Cornelia Huck
  1 sibling, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01  8:18 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> Introduction to Protected VMs.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  Documentation/virtual/kvm/s390-pv.txt | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/s390-pv.txt
> 
> diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
> new file mode 100644
> index 000000000000..86ed95f36759
> --- /dev/null
> +++ b/Documentation/virtual/kvm/s390-pv.txt
> @@ -0,0 +1,23 @@
> +Ultravisor and Protected VMs
> +===========================
> +
> +Summary:
> +
> +Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's state
> +like guest memory and guest registers anymore. Instead the PVMs are
> +mostly managed by a new entity called Ultravisor (UV), which provides
> +an API, so KVM and the PVM can request management actions.
> +
> +Each guest starts in the non-protected mode and then transitions into
> +protected mode. On transition KVM registers the guest and its VCPUs
> +with the Ultravisor and prepares everything for running it.
> +
> +The Ultravisor will secure and decrypt the guest's boot memory
> +(i.e. kernel/initrd). It will safeguard state changes like VCPU
> +starts/stops and injected interrupts while the guest is running.
> +
> +As access to the guest's state, like the SIE state description is
                     not a native speaker, but do we need a , /here\ ?
> +normally needed to be able to run a VM, some changes have been made in

> +SIE behavior and fields have different meaning for a PVM. SIE exits
> +are minimized as much as possible to improve speed and reduce exposed
> +guest state.
> 

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

After review we could merge all documentation patches into one, if we want.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-31 17:30       ` David Hildenbrand
  2019-10-31 20:57         ` Janosch Frank
@ 2019-11-01  8:50         ` Claudio Imbrenda
  2019-11-04 10:22           ` David Hildenbrand
  1 sibling, 1 reply; 213+ messages in thread
From: Claudio Imbrenda @ 2019-11-01  8:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	mihajlov, mimu, cohuck, gor

On Thu, 31 Oct 2019 18:30:30 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 31.10.19 16:41, Christian Borntraeger wrote:
> > 
> > 
> > On 25.10.19 10:49, David Hildenbrand wrote:  
> >> On 24.10.19 13:40, Janosch Frank wrote:  
> >>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> >>>
> >>> Pin the guest pages when they are first accessed, instead of all
> >>> at the same time when starting the guest.  
> >>
> >> Please explain why you do stuff. Why do we have to pin the whole
> >> guest memory? Why can't we mlock() the whole memory to avoid
> >> swapping in user space?
> > 
> > Basically we pin the guest for the same reason as AMD did it for
> > their SEV. It is hard  
> 
> Pinning all guest memory is very ugly. What you want is "don't page", 
> what you get is unmovable pages all over the place. I was hoping that 
> you could get around this by having an automatic back-and-forth 
> conversion in place (due to the special new exceptions).

We're not pinning all of guest memory, btw, but only the pages that are
actually used.

So if you have a *huge* guest, only the few pages used by the kernel and
initrd are actually pinned at VM start. Then, one by one, the pages that
are actually used while the guest is running get pinned on first use.

I don't need to add anything regarding the other points since the others
have commented already :)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 11:40 ` [RFC 02/37] s390/protvirt: introduce host side setup Janosch Frank
  2019-10-24 13:25   ` David Hildenbrand
  2019-10-28 14:54   ` Cornelia Huck
@ 2019-11-01  8:53   ` Christian Borntraeger
  2019-11-04 14:26     ` Cornelia Huck
  2019-11-04 15:54   ` Cornelia Huck
  3 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01  8:53 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
> protected virtual machines hosting support code.
> 
> Add "prot_virt" command line option which controls if the kernel
> protected VMs support is enabled at runtime.
> 
> Extend ultravisor info definitions and expose it via uv_info struct
> filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 ++
>  arch/s390/boot/Makefile                       |  2 +-
>  arch/s390/boot/uv.c                           | 20 +++++++-
>  arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>  arch/s390/kernel/Makefile                     |  1 +
>  arch/s390/kernel/setup.c                      |  4 --
>  arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>  arch/s390/kvm/Kconfig                         |  9 ++++
>  8 files changed, 126 insertions(+), 9 deletions(-)
>  create mode 100644 arch/s390/kernel/uv.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c7ac2f3ac99f..aa22e36b3105 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3693,6 +3693,11 @@
>  			before loading.
>  			See Documentation/admin-guide/blockdev/ramdisk.rst.
> 
> +	prot_virt=	[S390] enable hosting protected virtual machines
> +			isolated from the hypervisor (if hardware supports
> +			that).
> +			Format: <bool>
> +
>  	psi=		[KNL] Enable or disable pressure stall information
>  			tracking.
>  			Format: <bool>
> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index e2c47d3a1c89..82247e71617a 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
>  obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
>  obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
>  obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
> -obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST))	+= uv.o
>  obj-$(CONFIG_RELOCATABLE)	+= machine_kexec_reloc.o
>  obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
>  targets	:= bzImage startup.a section_cmp.boot.data section_cmp.boot.preserved.data $(obj-y)
> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
> index ed007f4a6444..88cf8825d169 100644
> --- a/arch/s390/boot/uv.c
> +++ b/arch/s390/boot/uv.c
> @@ -3,7 +3,12 @@
>  #include <asm/facility.h>
>  #include <asm/sections.h>
> 
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  int __bootdata_preserved(prot_virt_guest);
> +#endif
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +struct uv_info __bootdata_preserved(uv_info);
> +#endif
> 
>  void uv_query_info(void)
>  {
> @@ -18,7 +23,20 @@ void uv_query_info(void)
>  	if (uv_call(0, (uint64_t)&uvcb))
>  		return;
> 
> -	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)) {
> +		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
> +		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
> +		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
> +		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
> +		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
> +		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
> +		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
> +		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
> +		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
> +	}
> +
> +	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
> +	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
>  	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))
>  		prot_virt_guest = 1;
>  }
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index ef3c00b049ab..6db1bc495e67 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -44,7 +44,19 @@ struct uv_cb_qui {
>  	struct uv_cb_header header;
>  	u64 reserved08;
>  	u64 inst_calls_list[4];
> -	u64 reserved30[15];
> +	u64 reserved30[2];
> +	u64 uv_base_stor_len;
> +	u64 reserved48;
> +	u64 conf_base_phys_stor_len;
> +	u64 conf_base_virt_stor_len;
> +	u64 conf_virt_var_stor_len;
> +	u64 cpu_stor_len;
> +	u32 reserved68[3];
> +	u32 max_num_sec_conf;
> +	u64 max_guest_stor_addr;
> +	u8  reserved80[150-128];
> +	u16 max_guest_cpus;
> +	u64 reserved98;
>  } __packed __aligned(8);
> 
>  struct uv_cb_share {
> @@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>  	return cc;
>  }
> 
> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +struct uv_info {
> +	unsigned long inst_calls_list[4];
> +	unsigned long uv_base_stor_len;
> +	unsigned long guest_base_stor_len;
> +	unsigned long guest_virt_base_stor_len;
> +	unsigned long guest_virt_var_stor_len;
> +	unsigned long guest_cpu_stor_len;
> +	unsigned long max_sec_stor_addr;
> +	unsigned int max_num_sec_conf;
> +	unsigned short max_guest_cpus;
> +};
> +extern struct uv_info uv_info;
>  extern int prot_virt_guest;
> 
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  static inline int is_prot_virt_guest(void)
>  {
>  	return prot_virt_guest;
> @@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
>  	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
>  }
> 
> -void uv_query_info(void);
>  #else
>  #define is_prot_virt_guest() 0
>  static inline int uv_set_shared(unsigned long addr) { return 0; }
>  static inline int uv_remove_shared(unsigned long addr) { return 0; }
> +#endif
> +
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +extern int prot_virt_host;
> +
> +static inline int is_prot_virt_host(void)
> +{
> +	return prot_virt_host;
> +}
> +#else
> +#define is_prot_virt_host() 0
> +#endif
> +
> +#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> +	defined(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)
> +void uv_query_info(void);
> +#else
>  static inline void uv_query_info(void) {}
>  #endif
> 
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index 7edbbcd8228a..fe4fe475f526 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -78,6 +78,7 @@ obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_events.o perf_regs.o
>  obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_diag.o
> 
>  obj-$(CONFIG_TRACEPOINTS)	+= trace.o
> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST))	+= uv.o
> 
>  # vdso
>  obj-y				+= vdso64/
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index 3ff291bc63b7..f36370f8af38 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -92,10 +92,6 @@ char elf_platform[ELF_PLATFORM_SIZE];
> 
>  unsigned long int_hwcap = 0;
> 
> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> -int __bootdata_preserved(prot_virt_guest);
> -#endif
> -
>  int __bootdata(noexec_disabled);
>  int __bootdata(memory_end_set);
>  unsigned long __bootdata(memory_end);
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> new file mode 100644
> index 000000000000..35ce89695509
> --- /dev/null
> +++ b/arch/s390/kernel/uv.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Common Ultravisor functions and initialization
> + *
> + * Copyright IBM Corp. 2019
> + */
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/sizes.h>
> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <asm/facility.h>
> +#include <asm/sections.h>
> +#include <asm/uv.h>
> +
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +int __bootdata_preserved(prot_virt_guest);
> +#endif
> +
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +int prot_virt_host;
> +EXPORT_SYMBOL(prot_virt_host);
> +struct uv_info __bootdata_preserved(uv_info);
> +EXPORT_SYMBOL(uv_info);
> +
> +static int __init prot_virt_setup(char *val)
> +{
> +	bool enabled;
> +	int rc;
> +
> +	rc = kstrtobool(val, &enabled);
> +	if (!rc && enabled)
> +		prot_virt_host = 1;
> +
> +	if (is_prot_virt_guest() && prot_virt_host) {
> +		prot_virt_host = 0;
> +		pr_info("Running as protected virtualization guest.");
> +	}
> +
> +	if (prot_virt_host && !test_facility(158)) {
> +		prot_virt_host = 0;
> +		pr_info("The ultravisor call facility is not available.");
> +	}
> +
> +	return rc;
> +}
> +early_param("prot_virt", prot_virt_setup);
> +#endif
> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> index d3db3d7ed077..652b36f0efca 100644
> --- a/arch/s390/kvm/Kconfig
> +++ b/arch/s390/kvm/Kconfig
> @@ -55,6 +55,15 @@ config KVM_S390_UCONTROL
> 
>  	  If unsure, say N.
> 
> +config KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +	bool "Protected guests execution support"
> +	depends on KVM
> +	---help---
> +	  Support hosting protected virtual machines isolated from the
> +	  hypervisor.
> +
> +	  If unsure, say Y.
> +
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
>  source "drivers/vhost/Kconfig"
> 

As we have the prot_virt kernel parameter, there is a way to fence this off at runtime.
I'm not sure if we really need a build time fence. We could get rid of
CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST and just use CONFIG_KVM instead,
assuming that in the long run all distros will enable that anyway.
If other reviewers prefer to keep that extra option, what about the following for the
help section:

----
Support hosting protected virtual machines in KVM. The state of these machines,
like memory content or register content, is protected from the host or host
administrators.

Enabling this option will enable extra code that talks to a new firmware instance
called ultravisor that will take care of protecting the guest while also enabling
KVM to run this guest.

This feature must be enabled by the kernel command line option prot_virt.

	  If unsure, say Y.

----
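
As a Kconfig stanza, that proposed help text would look roughly like this
(wording as above, only wrapped for Kconfig):

config KVM_S390_PROTECTED_VIRTUALIZATION_HOST
	bool "Protected guests execution support"
	depends on KVM
	---help---
	  Support hosting protected virtual machines in KVM. The state of
	  these machines, like memory content or register content, is
	  protected from the host or host administrators.

	  Enabling this option will enable extra code that talks to a new
	  firmware instance called ultravisor that will take care of
	  protecting the guest while also enabling KVM to run this guest.

	  This feature must be enabled by the kernel command line option
	  prot_virt.

	  If unsure, say Y.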


Either way, the remaining part is

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-10-24 11:40 ` [RFC 03/37] s390/protvirt: add ultravisor initialization Janosch Frank
  2019-10-25  9:21   ` David Hildenbrand
@ 2019-11-01 10:07   ` Christian Borntraeger
  2019-11-07 15:28   ` Cornelia Huck
  2 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01 10:07 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Before being able to host protected virtual machines, donate some of
> the memory to the ultravisor. Besides that the ultravisor might impose
> addressing limitations for memory used to back protected VM storage. Treat
> that limit as protected virtualization host's virtual memory limit.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>


> ---
>  arch/s390/include/asm/uv.h | 16 ++++++++++++
>  arch/s390/kernel/setup.c   |  3 +++
>  arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 72 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 6db1bc495e67..82a46fb913e7 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -23,12 +23,14 @@
>  #define UVC_RC_NO_RESUME	0x0007
>  
>  #define UVC_CMD_QUI			0x0001
> +#define UVC_CMD_INIT_UV			0x000f
>  #define UVC_CMD_SET_SHARED_ACCESS	0x1000
>  #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
>  
>  /* Bits in installed uv calls */
>  enum uv_cmds_inst {
>  	BIT_UVC_CMD_QUI = 0,
> +	BIT_UVC_CMD_INIT_UV = 1,
>  	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
>  	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
>  };
> @@ -59,6 +61,15 @@ struct uv_cb_qui {
>  	u64 reserved98;
>  } __packed __aligned(8);
>  
> +struct uv_cb_init {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 stor_origin;
> +	u64 stor_len;
> +	u64 reserved28[4];
> +
> +} __packed __aligned(8);
> +
>  struct uv_cb_share {
>  	struct uv_cb_header header;
>  	u64 reserved08[3];
> @@ -158,8 +169,13 @@ static inline int is_prot_virt_host(void)
>  {
>  	return prot_virt_host;
>  }
> +
> +void setup_uv(void);
> +void adjust_to_uv_max(unsigned long *vmax);
>  #else
>  #define is_prot_virt_host() 0
> +static inline void setup_uv(void) {}
> +static inline void adjust_to_uv_max(unsigned long *vmax) {}
>  #endif
>  
>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index f36370f8af38..d29d83c0b8df 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -567,6 +567,8 @@ static void __init setup_memory_end(void)
>  			vmax = _REGION1_SIZE; /* 4-level kernel page table */
>  	}
>  
> +	adjust_to_uv_max(&vmax);
> +
>  	/* module area is at the end of the kernel address space. */
>  	MODULES_END = vmax;
>  	MODULES_VADDR = MODULES_END - MODULES_LEN;
> @@ -1147,6 +1149,7 @@ void __init setup_arch(char **cmdline_p)
>  	 */
>  	memblock_trim_memory(1UL << (MAX_ORDER - 1 + PAGE_SHIFT));
>  
> +	setup_uv();
>  	setup_memory_end();
>  	setup_memory();
>  	dma_contiguous_reserve(memory_end);
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 35ce89695509..f7778493e829 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -45,4 +45,57 @@ static int __init prot_virt_setup(char *val)
>  	return rc;
>  }
>  early_param("prot_virt", prot_virt_setup);
> +
> +static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
> +{
> +	struct uv_cb_init uvcb = {
> +		.header.cmd = UVC_CMD_INIT_UV,
> +		.header.len = sizeof(uvcb),
> +		.stor_origin = stor_base,
> +		.stor_len = stor_len,
> +	};
> +	int cc;
> +
> +	cc = uv_call(0, (uint64_t)&uvcb);
> +	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
> +		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
> +		       uvcb.header.rc);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +void __init setup_uv(void)
> +{
> +	unsigned long uv_stor_base;
> +
> +	if (!prot_virt_host)
> +		return;
> +
> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
> +	if (!uv_stor_base) {
> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
> +			uv_info.uv_base_stor_len);
> +		goto fail;
> +	}
> +
> +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
> +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
> +		goto fail;
> +	}
> +
> +	pr_info("Reserving %luMB as ultravisor base storage\n",
> +		uv_info.uv_base_stor_len >> 20);
> +	return;
> +fail:
> +	prot_virt_host = 0;
> +}
> +
> +void adjust_to_uv_max(unsigned long *vmax)
> +{
> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
> +		*vmax = uv_info.max_sec_stor_addr;
> +}
>  #endif
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-24 11:40 ` [RFC 06/37] s390: UV: Add import and export to UV library Janosch Frank
  2019-10-25  8:31   ` David Hildenbrand
@ 2019-11-01 11:26   ` Christian Borntraeger
  2019-11-01 12:25     ` Janosch Frank
  2019-11-01 12:42   ` Christian Borntraeger
  2019-11-11 16:40   ` Cornelia Huck
  3 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01 11:26 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> The convert to/from secure (or also "import/export") ultravisor calls
> are needed for page management, i.e. paging, of secure execution VMs.
> 
> Export encrypts a secure guest's page and makes it accessible to the
> guest for paging.
> 
> Import makes a page accessible to a secure guest.
> On the first import of that page, the page will be cleared by the
> Ultravisor before it is given to the guest.
> 
> All following imports will decrypt an exported page and verify
> integrity before giving the page to the guest.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 0bfbafcca136..99cdd2034503 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -15,6 +15,7 @@
>  #include <linux/errno.h>
>  #include <linux/bug.h>
>  #include <asm/page.h>
> +#include <asm/gmap.h>
>  
>  #define UVC_RC_EXECUTED		0x0001
>  #define UVC_RC_INV_CMD		0x0002
> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>  	return rc ? -EINVAL : 0;
>  }
>  
> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +static inline int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +	if (!uv_call(0, (u64)&uvcb))
> +		return 0;

As discussed at KVM Forum, we should also check for
uvcb.header.rc != UVC_RC_EXECUTED.
I know we can't really do much if this fails, but we certainly want to know.
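I.e. something like this maybe (just a sketch, not sure about the message level):

	int cc = uv_call(0, (u64)&uvcb);

	if (!cc && uvcb.header.rc == UVC_RC_EXECUTED)
		return 0;
	/* at least make the failure visible somewhere */
	pr_warn("uv cfs failed: cc %d rc 0x%hx\n", cc, uvcb.header.rc);
	return -EINVAL;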






> +	return -EINVAL;
> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest
> + * (import). If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + *
> + * @handle: Ultravisor guest handle
> + * @gaddr: Guest 2 absolute address to be imported
> + */
> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	int cc;
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->se_handle,
> +		.gaddr = gaddr
> +	};
> +
> +	cc = uv_call(0, (u64)&uvcb);
> +
> +	if (!cc)
> +		return 0;
> +	if (uvcb.header.rc == 0x104)
> +		return -EEXIST;
> +	if (uvcb.header.rc == 0x10a)
> +		return -EFAULT;

again, we should probably check for rc != UVC_RC_EXECUTED to detect any other problem.

> +	return -EINVAL;
> +}
> +
>  void setup_uv(void);
>  void adjust_to_uv_max(unsigned long *vmax);
>  #else
> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>  static inline void setup_uv(void) {}
>  static inline void adjust_to_uv_max(unsigned long *vmax) {}
>  static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>  #endif
>  
>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-11-01 11:26   ` Christian Borntraeger
@ 2019-11-01 12:25     ` Janosch Frank
  2019-11-01 12:39       ` Christian Borntraeger
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-01 12:25 UTC (permalink / raw)
  To: Christian Borntraeger, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor


[-- Attachment #1.1: Type: text/plain, Size: 3826 bytes --]

On 11/1/19 12:26 PM, Christian Borntraeger wrote:
> 
> 
> On 24.10.19 13:40, Janosch Frank wrote:
>> The convert to/from secure (or also "import/export") ultravisor calls
>> are needed for page management, i.e. paging, of secure execution VMs.
>>
>> Export encrypts a secure guest's page and makes it accessible to the
>> guest for paging.
>>
>> Import makes a page accessible to a secure guest.
>> On the first import of that page, the page will be cleared by the
>> Ultravisor before it is given to the guest.
>>
>> All following imports will decrypt an exported page and verify
>> integrity before giving the page to the guest.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 51 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index 0bfbafcca136..99cdd2034503 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -15,6 +15,7 @@
>>  #include <linux/errno.h>
>>  #include <linux/bug.h>
>>  #include <asm/page.h>
>> +#include <asm/gmap.h>
>>  
>>  #define UVC_RC_EXECUTED		0x0001
>>  #define UVC_RC_INV_CMD		0x0002
>> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>>  	return rc ? -EINVAL : 0;
>>  }
>>  
>> +/*
>> + * Requests the Ultravisor to encrypt a guest page and make it
>> + * accessible to the host for paging (export).
>> + *
>> + * @paddr: Absolute host address of page to be exported
>> + */
>> +static inline int uv_convert_from_secure(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.paddr = paddr
>> +	};
>> +	if (!uv_call(0, (u64)&uvcb))
>> +		return 0;
> 
> As discussed at KVM Forum, we should also check for
> uvcb.header.rc != UVC_RC_EXECUTED.
> I know we can't really do much if this fails, but we certainly want to know.
> 
> 
> 
> 
> 
> 
>> +	return -EINVAL;
>> +}
>> +
>> +/*
>> + * Requests the Ultravisor to make a page accessible to a guest
>> + * (import). If it's brought in the first time, it will be cleared. If
>> + * it has been exported before, it will be decrypted and integrity
>> + * checked.
>> + *
>> + * @handle: Ultravisor guest handle
>> + * @gaddr: Guest 2 absolute address to be imported
>> + */
>> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	int cc;
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->se_handle,
>> +		.gaddr = gaddr
>> +	};
>> +
>> +	cc = uv_call(0, (u64)&uvcb);
>> +
>> +	if (!cc)
>> +		return 0;
>> +	if (uvcb.header.rc == 0x104)
>> +		return -EEXIST;
>> +	if (uvcb.header.rc == 0x10a)
>> +		return -EFAULT;
> 
> again, we should probably check for rc != UVC_RC_EXECUTED to detect any other problem.

That's handled by the CC and the return below.
CC == 1 means error, CC == 0 is success; that's why we return early on CC == 0.
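I.e. the pattern is (as I read it):

	cc = uv_call(0, (u64)&uvcb);
	if (!cc)	/* CC 0: UVC executed, header.rc is UVC_RC_EXECUTED */
		return 0;
	/* CC != 0: not executed, header.rc tells us why */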

> 
>> +	return -EINVAL;
>> +}
>> +
>>  void setup_uv(void);
>>  void adjust_to_uv_max(unsigned long *vmax);
>>  #else
>> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>>  static inline void setup_uv(void) {}
>>  static inline void adjust_to_uv_max(unsigned long *vmax) {}
>>  static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
>> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
>> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>>  #endif
>>  
>>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
>>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-11-01 12:25     ` Janosch Frank
@ 2019-11-01 12:39       ` Christian Borntraeger
  0 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01 12:39 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 01.11.19 13:25, Janosch Frank wrote:
> On 11/1/19 12:26 PM, Christian Borntraeger wrote:
>>
>>
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> The convert to/from secure (or also "import/export") ultravisor calls
>>> are needed for page management, i.e. paging, of secure execution VMs.
>>>
>>> Export encrypts a secure guest's page and makes it accessible to the
>>> guest for paging.
>>>
>>> Import makes a page accessible to a secure guest.
>>> On the first import of that page, the page will be cleared by the
>>> Ultravisor before it is given to the guest.
>>>
>>> All following imports will decrypt an exported page and verify
>>> integrity before giving the page to the guest.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>  arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 51 insertions(+)
>>>
>>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>>> index 0bfbafcca136..99cdd2034503 100644
>>> --- a/arch/s390/include/asm/uv.h
>>> +++ b/arch/s390/include/asm/uv.h
>>> @@ -15,6 +15,7 @@
>>>  #include <linux/errno.h>
>>>  #include <linux/bug.h>
>>>  #include <asm/page.h>
>>> +#include <asm/gmap.h>
>>>  
>>>  #define UVC_RC_EXECUTED		0x0001
>>>  #define UVC_RC_INV_CMD		0x0002
>>> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>>>  	return rc ? -EINVAL : 0;
>>>  }
>>>  
>>> +/*
>>> + * Requests the Ultravisor to encrypt a guest page and make it
>>> + * accessible to the host for paging (export).
>>> + *
>>> + * @paddr: Absolute host address of page to be exported
>>> + */
>>> +static inline int uv_convert_from_secure(unsigned long paddr)
>>> +{
>>> +	struct uv_cb_cfs uvcb = {
>>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>>> +		.header.len = sizeof(uvcb),
>>> +		.paddr = paddr
>>> +	};
>>> +	if (!uv_call(0, (u64)&uvcb))
>>> +		return 0;
>>
>> As discussed at KVM Forum, we should also check for
>> uvcb.header.rc != UVC_RC_EXECUTED.
>> I know we can't really do much if this fails, but we certainly want to know.
>>
>>
>>
>>
>>
>>
>>> +	return -EINVAL;
>>> +}
>>> +
>>> +/*
>>> + * Requests the Ultravisor to make a page accessible to a guest
>>> + * (import). If it's brought in the first time, it will be cleared. If
>>> + * it has been exported before, it will be decrypted and integrity
>>> + * checked.
>>> + *
>>> + * @handle: Ultravisor guest handle
>>> + * @gaddr: Guest 2 absolute address to be imported
>>> + */
>>> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>>> +{
>>> +	int cc;
>>> +	struct uv_cb_cts uvcb = {
>>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>>> +		.header.len = sizeof(uvcb),
>>> +		.guest_handle = gmap->se_handle,
>>> +		.gaddr = gaddr
>>> +	};
>>> +
>>> +	cc = uv_call(0, (u64)&uvcb);
>>> +
>>> +	if (!cc)
>>> +		return 0;
>>> +	if (uvcb.header.rc == 0x104)
>>> +		return -EEXIST;
>>> +	if (uvcb.header.rc == 0x10a)
>>> +		return -EFAULT;
>>
>> again, we should probably check for rc != UVC_RC_EXECUTED to detect any other problem.
> 
> That's handled by the CC and the return below.
> CC == 1 means error, CC == 0 is success; that's why we return early on CC == 0.

Right, uv_call return depends on CC. Nevermind.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-24 11:40 ` [RFC 06/37] s390: UV: Add import and export to UV library Janosch Frank
  2019-10-25  8:31   ` David Hildenbrand
  2019-11-01 11:26   ` Christian Borntraeger
@ 2019-11-01 12:42   ` Christian Borntraeger
  2019-11-11 16:40   ` Cornelia Huck
  3 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01 12:42 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> The convert to/from secure (or also "import/export") ultravisor calls
> are needed for page management, i.e. paging, of secure execution VMs.
> 
> Export encrypts a secure guest's page and makes it accessible to the
> guest for paging.
> 
> Import makes a page accessible to a secure guest.
> On the first import of that page, the page will be cleared by the
> Ultravisor before it is given to the guest.
> 
> All following imports will decrypt an exported page and verify
> integrity before giving the page to the guest.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>

After re-reading.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 0bfbafcca136..99cdd2034503 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -15,6 +15,7 @@
>  #include <linux/errno.h>
>  #include <linux/bug.h>
>  #include <asm/page.h>
> +#include <asm/gmap.h>
>  
>  #define UVC_RC_EXECUTED		0x0001
>  #define UVC_RC_INV_CMD		0x0002
> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>  	return rc ? -EINVAL : 0;
>  }
>  
> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +static inline int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +	if (!uv_call(0, (u64)&uvcb))
> +		return 0;
> +	return -EINVAL;
> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest
> + * (import). If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + *
> + * @handle: Ultravisor guest handle
> + * @gaddr: Guest 2 absolute address to be imported
> + */
> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	int cc;
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->se_handle,
> +		.gaddr = gaddr
> +	};
> +
> +	cc = uv_call(0, (u64)&uvcb);
> +
> +	if (!cc)
> +		return 0;
> +	if (uvcb.header.rc == 0x104)
> +		return -EEXIST;
> +	if (uvcb.header.rc == 0x10a)
> +		return -EFAULT;
> +	return -EINVAL;
> +}
> +
>  void setup_uv(void);
>  void adjust_to_uv_max(unsigned long *vmax);
>  #else
> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>  static inline void setup_uv(void) {}
>  static inline void adjust_to_uv_max(unsigned long *vmax) {}
>  static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>  #endif
>  
>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-25  8:24   ` [RFC v2] " Janosch Frank
@ 2019-11-01 13:02     ` Christian Borntraeger
  2019-11-04 14:32     ` David Hildenbrand
  2019-11-13 12:23     ` Thomas Huth
  2 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-01 13:02 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 25.10.19 10:24, Janosch Frank wrote:
> KSM will not work on secure pages, because when the kernel reads a
> secure page, it will be encrypted and hence no two pages will look the
> same.
> 
> Let's mark the guest pages as unmergeable when we transition to secure
> mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>

v2 Looks better.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

> ---
>  arch/s390/include/asm/gmap.h |  1 +
>  arch/s390/kvm/kvm-s390.c     |  6 ++++++
>  arch/s390/mm/gmap.c          | 32 +++++++++++++++++++++-----------
>  3 files changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 6efc0b501227..eab6a2ec3599 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -145,4 +145,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
>  
>  void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>  			     unsigned long gaddr, unsigned long vmaddr);
> +int gmap_mark_unmergeable(void);
>  #endif /* _ASM_S390_GMAP_H */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 924132d92782..d1ba12f857e7 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2176,6 +2176,12 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>  		if (r)
>  			break;
>  
> +		down_write(&current->mm->mmap_sem);
> +		r = gmap_mark_unmergeable();
> +		up_write(&current->mm->mmap_sem);
> +		if (r)
> +			break;
> +
>  		mutex_lock(&kvm->lock);
>  		kvm_s390_vcpu_block_all(kvm);
>  		/* FMT 4 SIE needs esca */
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index edcdca97e85e..faecdf81abdb 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>  }
>  EXPORT_SYMBOL_GPL(s390_enable_sie);
>  
> +int gmap_mark_unmergeable(void)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +
> +
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				MADV_UNMERGEABLE, &vma->vm_flags))
> +			return -ENOMEM;
> +	}
> +	mm->def_flags &= ~VM_MERGEABLE;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
> +
>  /*
>   * Enable storage key handling from now on and initialize the storage
>   * keys with the default key.
> @@ -2593,24 +2610,17 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
>  int s390_enable_skey(void)
>  {
>  	struct mm_struct *mm = current->mm;
> -	struct vm_area_struct *vma;
>  	int rc = 0;
>  
>  	down_write(&mm->mmap_sem);
>  	if (mm_uses_skeys(mm))
>  		goto out_up;
>  
> -	mm->context.uses_skeys = 1;
> -	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> -		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> -				MADV_UNMERGEABLE, &vma->vm_flags)) {
> -			mm->context.uses_skeys = 0;
> -			rc = -ENOMEM;
> -			goto out_up;
> -		}
> -	}
> -	mm->def_flags &= ~VM_MERGEABLE;
> +	rc = gmap_mark_unmergeable();
> +	if (rc)
> +		goto out_up;
>  
> +	mm->context.uses_skeys = 1;
>  	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
>  
>  out_up:
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-24 11:40 ` [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning Janosch Frank
  2019-10-25  8:49   ` David Hildenbrand
@ 2019-11-02  8:53   ` Christian Borntraeger
  2019-11-04 14:17   ` David Hildenbrand
  2 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-02  8:53 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> Pin the guest pages when they are first accessed, instead of all at
> the same time when starting the guest.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[...]
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 80aecd5bea9e..383e660e2221 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -15,8 +15,35 @@
>  #include <asm/mman.h>
>  #include "kvm-s390.h"
>  
> +static void unpin_destroy(struct page **pages, int nr)
> +{
> +	int i;
> +	struct page *page;
> +	u8 *val;
> +
> +	for (i = 0; i < nr; i++) {
> +		page = pages[i];
> +		if (!page)	/* page was never used */
> +			continue;
> +		val = (void *)page_to_phys(page);

Why don't we call the convert-from-secure directly here to avoid the fault overhead?

> +		READ_ONCE(*val);
> +		put_page(page);

As we also do the export here (via implicitly reading that page), this can take a while.
I think we must add a cond_resched() here.
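I.e. something like this maybe (untested sketch, still ignoring the return
value just like the current code does):

	for (i = 0; i < nr; i++) {
		page = pages[i];
		if (!page)	/* page was never used */
			continue;
		/* export directly instead of faulting it back via a read */
		uv_convert_from_secure(page_to_phys(page));
		put_page(page);
		cond_resched();
	}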

> +	}
> +}
> +

[...]
> -
>  	mutex_unlock(&kvm->slots_lock);
> +
> +	kvm->arch.gmap->pinned_pages = vzalloc(npages * sizeof(struct page *));
> +	if (!kvm->arch.gmap->pinned_pages)
> +		goto out_err;
>  	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
>  
>  	/* Allocate variable storage */
>  	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
>  	vlen += uv_info.guest_virt_base_stor_len;
>  	kvm->arch.pv.stor_var = vzalloc(vlen);
> -	if (!kvm->arch.pv.stor_var) {
> -		kvm_s390_pv_dealloc_vm(kvm);
> -		return -ENOMEM;
> -	}
> +	if (!kvm->arch.pv.stor_var)
> +		goto out_err;
>  	return 0;
> +
> +out_err:
> +	kvm_s390_pv_dealloc_vm(kvm);
> +	return -ENOMEM;
>  }
>  
>  int kvm_s390_pv_destroy_vm(struct kvm *kvm)
> @@ -216,6 +246,11 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>  	for (i = 0; i < size / PAGE_SIZE; i++) {
>  		uvcb.gaddr = addr + i * PAGE_SIZE;
>  		uvcb.tweak[1] = i * PAGE_SIZE;
> +		down_read(&kvm->mm->mmap_sem);
> +		rc = kvm_s390_pv_pin_page(kvm->arch.gmap, uvcb.gaddr);
> +		up_read(&kvm->mm->mmap_sem);

Here we should also have a cond_resched();
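E.g. right after the up_read(), something like:

		up_read(&kvm->mm->mmap_sem);
		cond_resched();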

> +		if (rc && (rc != -EEXIST))
> +			break;
>  retry:
>  		rc = uv_call(0, (u64)&uvcb);
>  		if (!rc)
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
  2019-10-25  8:58   ` David Hildenbrand
@ 2019-11-04  8:18   ` Christian Borntraeger
  2019-11-04  8:41     ` Janosch Frank
  2019-11-07 16:29   ` Cornelia Huck
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-04  8:18 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor



On 24.10.19 13:40, Janosch Frank wrote:
> Let's add a KVM interface to create and destroy protected VMs.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  24 +++-
>  arch/s390/include/asm/uv.h       | 110 ++++++++++++++
>  arch/s390/kvm/Makefile           |   2 +-
>  arch/s390/kvm/kvm-s390.c         | 173 +++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h         |  47 ++++++
>  arch/s390/kvm/pv.c               | 237 +++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  33 +++++
>  7 files changed, 622 insertions(+), 4 deletions(-)
>  create mode 100644 arch/s390/kvm/pv.c
[...]

> +	case KVM_PV_VM_UNPACK: {
> +		struct kvm_s390_pv_unp unp = {};
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&unp, argp, sizeof(unp)))
> +			break;
> +
> +		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
> +		break;
> +	}



[....]

> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak)
> +{
> +	int i, rc = 0;
> +	struct uv_cb_unp uvcb = {
> +		.header.cmd = UVC_CMD_UNPACK_IMG,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = kvm_s390_pv_handle(kvm),
> +		.tweak[0] = tweak
> +	};
> +
> +	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
> +		return -EINVAL;
> +
> +
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
> +		 addr, size);

Does it make sense to check for addr and addr+size to be within the memory
size of the guest? The uv_call or gmap_fault will fail later on, but we 
could do an early exit if the address range is wrong.
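
E.g. something like this at the start of the function (just a sketch, assuming
the memslots describe all of guest memory; kvm_is_error_gpa() checks whether a
guest address is backed by a memslot):

	if (kvm_is_error_gpa(kvm, addr) ||
	    kvm_is_error_gpa(kvm, addr + size - 1))
		return -EINVAL;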




> +	for (i = 0; i < size / PAGE_SIZE; i++) {
> +		uvcb.gaddr = addr + i * PAGE_SIZE;
> +		uvcb.tweak[1] = i * PAGE_SIZE;


> +retry:

> +		rc = uv_call(0, (u64)&uvcb);
> +		if (!rc)
> +			continue;
> +		/* If not yet mapped fault and retry */
> +		if (uvcb.header.rc == 0x10a) {
> +			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> +					FAULT_FLAG_WRITE);
> +			if (rc)
> +				return rc;
> +			goto retry;
> +		}
> +		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> +			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> +		break;
> +	}
[...]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-04  8:18   ` Christian Borntraeger
@ 2019-11-04  8:41     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-04  8:41 UTC (permalink / raw)
  To: Christian Borntraeger, kvm
  Cc: linux-s390, thuth, david, imbrenda, mihajlov, mimu, cohuck, gor


[-- Attachment #1.1: Type: text/plain, Size: 2583 bytes --]

On 11/4/19 9:18 AM, Christian Borntraeger wrote:
> 
> 
> On 24.10.19 13:40, Janosch Frank wrote:
>> Let's add a KVM interface to create and destroy protected VMs.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  24 +++-
>>  arch/s390/include/asm/uv.h       | 110 ++++++++++++++
>>  arch/s390/kvm/Makefile           |   2 +-
>>  arch/s390/kvm/kvm-s390.c         | 173 +++++++++++++++++++++-
>>  arch/s390/kvm/kvm-s390.h         |  47 ++++++
>>  arch/s390/kvm/pv.c               | 237 +++++++++++++++++++++++++++++++
>>  include/uapi/linux/kvm.h         |  33 +++++
>>  7 files changed, 622 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/s390/kvm/pv.c
> [...]
> 
>> +	case KVM_PV_VM_UNPACK: {
>> +		struct kvm_s390_pv_unp unp = {};
>> +
>> +		r = -EFAULT;
>> +		if (copy_from_user(&unp, argp, sizeof(unp)))
>> +			break;
>> +
>> +		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
>> +		break;
>> +	}
> 
> 
> 
> [....]
> 
>> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>> +		       unsigned long tweak)
>> +{
>> +	int i, rc = 0;
>> +	struct uv_cb_unp uvcb = {
>> +		.header.cmd = UVC_CMD_UNPACK_IMG,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = kvm_s390_pv_handle(kvm),
>> +		.tweak[0] = tweak
>> +	};
>> +
>> +	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
>> +		return -EINVAL;
>> +
>> +
>> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
>> +		 addr, size);
> 
> Does it make sense to check for addr and addr+size to be within the memory
> size of the guest? The uv_call or gmap_fault will fail later on, but we 
> could do an early exit if the address range is wrong.

Yeah, Marc already brought that up because of a testcase of his.
I'll add a check, but before that I need to understand what makes us
loop so long that we get RCU warnings.

> 
> 
> 
> 
>> +	for (i = 0; i < size / PAGE_SIZE; i++) {
>> +		uvcb.gaddr = addr + i * PAGE_SIZE;
>> +		uvcb.tweak[1] = i * PAGE_SIZE;
> 
> 
>> +retry:
> 
>> +		rc = uv_call(0, (u64)&uvcb);
>> +		if (!rc)
>> +			continue;
>> +		/* If not yet mapped fault and retry */
>> +		if (uvcb.header.rc == 0x10a) {
>> +			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
>> +					FAULT_FLAG_WRITE);
>> +			if (rc)
>> +				return rc;
>> +			goto retry;
>> +		}
>> +		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
>> +			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
>> +		break;
>> +	}
> [...]
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-31 20:57         ` Janosch Frank
@ 2019-11-04 10:19           ` David Hildenbrand
  2019-11-04 10:25             ` Janosch Frank
  2019-11-04 13:58             ` Christian Borntraeger
  0 siblings, 2 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 10:19 UTC (permalink / raw)
  To: Janosch Frank, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>> guest can run) or export (so that the I/O will work). As this turned out to be harder than
>>> we thought, we decided to defer paging to a later point in time.
>>
>> I don't quite see the issue yet. If you page out, the page will
>> automatically (on access) be converted to !secure/encrypted memory. If
>> the UV/guest wants to access it, it will be automatically converted to
>> secure/unencrypted memory. If you have concurrent access, it will be
>> converted back and forth until one party is done.
> 
> IO does not trigger an export on an imported page, but an error
> condition in the IO subsystem. The page code does not read pages through

Ah, that makes it much clearer. Thanks!

> the cpu, but often just asks the device to read directly and that's
> where everything goes wrong. We could bounce swapping, but chose to pin
> for now until we find a proper solution to that problem which nicely
> integrates into linux.

How hard would it be to

1. Detect the error condition
2. Try a read on the affected page from the CPU (which will automatically
convert to encrypted/!secure)
3. Restart the I/O

I assume that this is a corner case where we don't really have to care 
about performance in the first shot.
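
For 2., I would assume an explicit export of the affected page is enough, e.g.
(very rough, assuming the error handler can identify the struct page the I/O
failed on):

	uv_convert_from_secure(page_to_phys(page));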

> 
>>
>> A proper automatic conversion should make this work. What am I missing?
>>
>>>
>>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
>>
>> I wonder if we could come up with an alternative (similar to how we
>> override VM_MERGEABLE in the kernel) that can be called and ensured in
>> the kernel. E.g., marking whole VMAs as "don't page" (I remember
>> something like "special VMAs" like used for VDSOs that achieve exactly
>> that, but I am absolutely no expert on that). That would be much nicer
>> than pinning all pages and remembering what you pinned in huge page
>> arrays ...
> 
> It might be more worthwhile to just accept one or two releases with
> pinning and fix the root of the problem than design a nice stopgap.

Quite honestly, to me this feels like a prototype hack that deserves a 
proper solution first. The issue with this hack is that it affects user 
space (esp. MADV_DONTNEED no longer working correctly). It's not just 
something you once fix in the kernel and be done with it.
> 
> Btw. s390 is not alone with the problem and we'll try to have another
> discussion tomorrow with AMD to find a solution which works for more
> than one architecture.

Let me know if there was an interesting outcome.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-01  8:50         ` Claudio Imbrenda
@ 2019-11-04 10:22           ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 10:22 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	mihajlov, mimu, cohuck, gor

On 01.11.19 09:50, Claudio Imbrenda wrote:
> On Thu, 31 Oct 2019 18:30:30 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 31.10.19 16:41, Christian Borntraeger wrote:
>>>
>>>
>>> On 25.10.19 10:49, David Hildenbrand wrote:
>>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>>>
>>>>> Pin the guest pages when they are first accessed, instead of all
>>>>> at the same time when starting the guest.
>>>>
>>>> Please explain why you do stuff. Why do we have to pin the whole
>>>> guest memory? Why can't we mlock() the whole memory to avoid
>>>> swapping in user space?
>>>
>>> Basically we pin the guest for the same reason as AMD did it for
>>> their SEV. It is hard
>>
>> Pinning all guest memory is very ugly. What you want is "don't page",
>> what you get is unmovable pages all over the place. I was hoping that
>> you could get around this by having an automatic back-and-forth
>> conversion in place (due to the special new exceptions).
> 
> we're not pinning all of guest memory, btw, but only the pages that are
> actually used.

Any longer-running guest will eventually touch all guest physical memory 
(e.g., page cache, page shuffling), so this is only an optimization for 
short running guests.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 10:19           ` David Hildenbrand
@ 2019-11-04 10:25             ` Janosch Frank
  2019-11-04 10:27               ` David Hildenbrand
  2019-11-04 13:58             ` Christian Borntraeger
  1 sibling, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-04 10:25 UTC (permalink / raw)
  To: David Hildenbrand, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor


[-- Attachment #1.1: Type: text/plain, Size: 3116 bytes --]

On 11/4/19 11:19 AM, David Hildenbrand wrote:
>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder than
>>>> we thought, we decided to defer paging to a later point in time.
>>>
>>> I don't quite see the issue yet. If you page out, the page will
>>> automatically (on access) be converted to !secure/encrypted memory. If
>>> the UV/guest wants to access it, it will be automatically converted to
>>> secure/unencrypted memory. If you have concurrent access, it will be
>>> converted back and forth until one party is done.
>>
>> IO does not trigger an export on an imported page, but an error
>> condition in the IO subsystem. The page code does not read pages through
> 
> Ah, that makes it much clearer. Thanks!
> 
>> the cpu, but often just asks the device to read directly and that's
>> where everything goes wrong. We could bounce swapping, but chose to pin
>> for now until we find a proper solution to that problem which nicely
>> integrates into linux.
> 
> How hard would it be to
> 
> 1. Detect the error condition
> 2. Try a read on the affected page from the CPU (which will automatically
> convert to encrypted/!secure)
> 3. Restart the I/O
> 
> I assume that this is a corner case where we don't really have to care 
> about performance in the first shot.

Restarting I/O can be quite difficult with CCW; we might need to change
request data...

> 
>>
>>>
>>> A proper automatic conversion should make this work. What am I missing?
>>>
>>>>
>>>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
>>>
>>> I wonder if we could come up with an alternative (similar to how we
>>> override VM_MERGEABLE in the kernel) that can be called and ensured in
>>> the kernel. E.g., marking whole VMAs as "don't page" (I remember
>>> something like "special VMAs" like used for VDSOs that achieve exactly
>>> that, but I am absolutely no expert on that). That would be much nicer
>>> than pinning all pages and remembering what you pinned in huge page
>>> arrays ...
>>
>> It might be more worthwhile to just accept one or two releases with
>> pinning and fix the root of the problem than design a nice stopgap.
> 
> Quite honestly, to me this feels like a prototype hack that deserves a 
> proper solution first. The issue with this hack is that it affects user 
> space (esp. MADV_DONTNEED no longer working correctly). It's not just 
> something you once fix in the kernel and be done with it.

It is a hack, yes.
But we're not the only architecture to need it; x86 pins all the memory
at the start of the VM and that code is already upstream...

>>
>> Btw. s390 is not alone with the problem and we'll try to have another
>> discussion tomorrow with AMD to find a solution which works for more
>> than one architecture.
> 
> Let me know if there was an interesting outcome.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 10:25             ` Janosch Frank
@ 2019-11-04 10:27               ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 10:27 UTC (permalink / raw)
  To: Janosch Frank, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 04.11.19 11:25, Janosch Frank wrote:
> On 11/4/19 11:19 AM, David Hildenbrand wrote:
>>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder than
>>>>> we thought, we decided to defer paging to a later point in time.
>>>>
>>>> I don't quite see the issue yet. If you page out, the page will
>>>> automatically (on access) be converted to !secure/encrypted memory. If
>>>> the UV/guest wants to access it, it will be automatically converted to
>>>> secure/unencrypted memory. If you have concurrent access, it will be
>>>> converted back and forth until one party is done.
>>>
>>> IO does not trigger an export on an imported page, but an error
>>> condition in the IO subsystem. The page code does not read pages through
>>
>> Ah, that makes it much clearer. Thanks!
>>
>>> the cpu, but often just asks the device to read directly and that's
>>> where everything goes wrong. We could bounce swapping, but chose to pin
>>> for now until we find a proper solution to that problem which nicely
>>> integrates into linux.
>>
>> How hard would it be to
>>
>> 1. Detect the error condition
>> 2. Try a read on the affected page from the CPU (which will automatically
>> convert to encrypted/!secure)
>> 3. Restart the I/O
>>
>> I assume that this is a corner case where we don't really have to care
>> about performance in the first shot.
> 
> Restarting I/O can be quite difficult with CCW; we might need to change
> request data...

I am no I/O expert, so I can't comment if that would be possible :(


>>>> A proper automatic conversion should make this work. What am I missing?
>>>>
>>>>>
>>>>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
>>>>
>>>> I wonder if we could come up with an alternative (similar to how we
>>>> override VM_MERGEABLE in the kernel) that can be called and ensured in
>>>> the kernel. E.g., marking whole VMAs as "don't page" (I remember
>>>> something like "special VMAs" like used for VDSOs that achieve exactly
>>>> that, but I am absolutely no expert on that). That would be much nicer
>>>> than pinning all pages and remembering what you pinned in huge page
>>>> arrays ...
>>>
>>> It might be more worthwhile to just accept one or two releases with
>>> pinning and fix the root of the problem than design a nice stopgap.
>>
>> Quite honestly, to me this feels like a prototype hack that deserves a
>> proper solution first. The issue with this hack is that it affects user
>> space (esp. MADV_DONTNEED no longer working correctly). It's not just
>> something you once fix in the kernel and be done with it.
> 
> It is a hack, yes.
> But we're not the only architecture to need it; x86 pins all the memory
> at the start of the VM and that code is already upstream...

IMHO that doesn't make it any better. It is and remains a prototype hack 
in my opinion.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 14/37] KVM: s390: protvirt: Implement interruption injection
  2019-10-24 11:40 ` [RFC 14/37] KVM: s390: protvirt: Implement interruption injection Janosch Frank
@ 2019-11-04 10:29   ` David Hildenbrand
  2019-11-04 14:05     ` Christian Borntraeger
  2019-11-14 12:07   ` Thomas Huth
  1 sibling, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 10:29 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Michael Mueller <mimu@linux.ibm.com>
> 
> The patch implements interruption injection for the following
> list of interruption types:
> 
>    - I/O
>      __deliver_io (III)
> 
>    - External
>      __deliver_cpu_timer (IEI)
>      __deliver_ckc (IEI)
>      __deliver_emergency_signal (IEI)
>      __deliver_external_call (IEI)
>      __deliver_service (IEI)
> 
>    - cpu restart
>      __deliver_restart (IRI)

What exactly is IRQ_PEND_EXT_SERVICE_EV? Can you add some comments on what
the new interrupt does and why it is needed in this context? Thanks

> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [interrupt masking]
> ---
>   arch/s390/include/asm/kvm_host.h |  10 ++
>   arch/s390/kvm/interrupt.c        | 182 +++++++++++++++++++++++--------
>   2 files changed, 149 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 82443236d4cc..63fc32d38aa9 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -496,6 +496,7 @@ enum irq_types {
>   	IRQ_PEND_PFAULT_INIT,
>   	IRQ_PEND_EXT_HOST,
>   	IRQ_PEND_EXT_SERVICE,
> +	IRQ_PEND_EXT_SERVICE_EV,
>   	IRQ_PEND_EXT_TIMING,
>   	IRQ_PEND_EXT_CPU_TIMER,
>   	IRQ_PEND_EXT_CLOCK_COMP,
> @@ -540,6 +541,7 @@ enum irq_types {
>   			   (1UL << IRQ_PEND_EXT_TIMING)     | \
>   			   (1UL << IRQ_PEND_EXT_HOST)       | \
>   			   (1UL << IRQ_PEND_EXT_SERVICE)    | \
> +			   (1UL << IRQ_PEND_EXT_SERVICE_EV) | \
>   			   (1UL << IRQ_PEND_VIRTIO)         | \
>   			   (1UL << IRQ_PEND_PFAULT_INIT)    | \
>   			   (1UL << IRQ_PEND_PFAULT_DONE))
> @@ -556,6 +558,13 @@ enum irq_types {
>   #define IRQ_PEND_MCHK_MASK ((1UL << IRQ_PEND_MCHK_REP) | \
>   			    (1UL << IRQ_PEND_MCHK_EX))
>   
> +#define IRQ_PEND_EXT_II_MASK ((1UL << IRQ_PEND_EXT_CPU_TIMER)  | \
> +			      (1UL << IRQ_PEND_EXT_CLOCK_COMP) | \
> +			      (1UL << IRQ_PEND_EXT_EMERGENCY)  | \
> +			      (1UL << IRQ_PEND_EXT_EXTERNAL)   | \
> +			      (1UL << IRQ_PEND_EXT_SERVICE)    | \
> +			      (1UL << IRQ_PEND_EXT_SERVICE_EV))
> +
>   struct kvm_s390_interrupt_info {
>   	struct list_head list;
>   	u64	type;
> @@ -614,6 +623,7 @@ struct kvm_s390_local_interrupt {
>   
>   struct kvm_s390_float_interrupt {
>   	unsigned long pending_irqs;
> +	unsigned long masked_irqs;
>   	spinlock_t lock;
>   	struct list_head lists[FIRQ_LIST_COUNT];
>   	int counters[FIRQ_MAX_COUNT];
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 165dea4c7f19..c919dfe4dfd3 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -324,8 +324,10 @@ static inline int gisa_tac_ipm_gisc(struct kvm_s390_gisa *gisa, u32 gisc)
>   
>   static inline unsigned long pending_irqs_no_gisa(struct kvm_vcpu *vcpu)
>   {
> -	return vcpu->kvm->arch.float_int.pending_irqs |
> -		vcpu->arch.local_int.pending_irqs;
> +	unsigned long pending = vcpu->kvm->arch.float_int.pending_irqs | vcpu->arch.local_int.pending_irqs;
> +
> +	pending &= ~vcpu->kvm->arch.float_int.masked_irqs;
> +	return pending;
>   }
>   
>   static inline unsigned long pending_irqs(struct kvm_vcpu *vcpu)
> @@ -383,10 +385,16 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
>   		__clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &active_mask);
>   	if (!(vcpu->arch.sie_block->gcr[0] & CR0_CPU_TIMER_SUBMASK))
>   		__clear_bit(IRQ_PEND_EXT_CPU_TIMER, &active_mask);
> -	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK))
> +	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK)) {
>   		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
> +		__clear_bit(IRQ_PEND_EXT_SERVICE_EV, &active_mask);
> +	}
>   	if (psw_mchk_disabled(vcpu))
>   		active_mask &= ~IRQ_PEND_MCHK_MASK;
> +	/* PV guest cpus can have a single interruption injected at a time. */
> +	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
> +	    vcpu->arch.sie_block->iictl != IICTL_CODE_NONE)
> +		active_mask &= ~(IRQ_PEND_EXT_II_MASK | IRQ_PEND_IO_MASK);
>   	/*
>   	 * Check both floating and local interrupt's cr14 because
>   	 * bit IRQ_PEND_MCHK_REP could be set in both cases.
> @@ -479,19 +487,23 @@ static void set_intercept_indicators(struct kvm_vcpu *vcpu)
>   static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
> -	int rc;
> +	int rc = 0;
>   
>   	vcpu->stat.deliver_cputm++;
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
>   					 0, 0);
> -
> -	rc  = put_guest_lc(vcpu, EXT_IRQ_CPU_TIMER,
> -			   (u16 *)__LC_EXT_INT_CODE);
> -	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> -	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> -			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> -			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
> +		vcpu->arch.sie_block->eic = EXT_IRQ_CPU_TIMER;
> +	} else {
> +		rc  = put_guest_lc(vcpu, EXT_IRQ_CPU_TIMER,
> +				   (u16 *)__LC_EXT_INT_CODE);
> +		rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> +		rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> +				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +		rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> +				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	}
>   	clear_bit(IRQ_PEND_EXT_CPU_TIMER, &li->pending_irqs);
>   	return rc ? -EFAULT : 0;
>   }
> @@ -499,19 +511,23 @@ static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
>   static int __must_check __deliver_ckc(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
> -	int rc;
> +	int rc = 0;
>   
>   	vcpu->stat.deliver_ckc++;
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
>   					 0, 0);
> -
> -	rc  = put_guest_lc(vcpu, EXT_IRQ_CLK_COMP,
> -			   (u16 __user *)__LC_EXT_INT_CODE);
> -	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> -	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> -			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> -			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
> +		vcpu->arch.sie_block->eic = EXT_IRQ_CLK_COMP;
> +	} else {
> +		rc  = put_guest_lc(vcpu, EXT_IRQ_CLK_COMP,
> +				   (u16 __user *)__LC_EXT_INT_CODE);
> +		rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> +		rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> +				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +		rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> +				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	}
>   	clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &li->pending_irqs);
>   	return rc ? -EFAULT : 0;
>   }
> @@ -533,7 +549,6 @@ static int __must_check __deliver_pfault_init(struct kvm_vcpu *vcpu)
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
>   					 KVM_S390_INT_PFAULT_INIT,
>   					 0, ext.ext_params2);
> -
>   	rc  = put_guest_lc(vcpu, EXT_IRQ_CP_SERVICE, (u16 *) __LC_EXT_INT_CODE);
>   	rc |= put_guest_lc(vcpu, PFAULT_INIT, (u16 *) __LC_EXT_CPU_ADDR);
>   	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> @@ -696,17 +711,21 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
>   static int __must_check __deliver_restart(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
> -	int rc;
> +	int rc = 0;
>   
>   	VCPU_EVENT(vcpu, 3, "%s", "deliver: cpu restart");
>   	vcpu->stat.deliver_restart_signal++;
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_RESTART, 0, 0);
>   
> -	rc  = write_guest_lc(vcpu,
> -			     offsetof(struct lowcore, restart_old_psw),
> -			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= read_guest_lc(vcpu, offsetof(struct lowcore, restart_psw),
> -			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_RESTART;
> +	} else {
> +		rc  = write_guest_lc(vcpu,
> +				     offsetof(struct lowcore, restart_old_psw),
> +				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +		rc |= read_guest_lc(vcpu, offsetof(struct lowcore, restart_psw),
> +				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	}
>   	clear_bit(IRQ_PEND_RESTART, &li->pending_irqs);
>   	return rc ? -EFAULT : 0;
>   }
> @@ -748,6 +767,12 @@ static int __must_check __deliver_emergency_signal(struct kvm_vcpu *vcpu)
>   	vcpu->stat.deliver_emergency_signal++;
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
>   					 cpu_addr, 0);
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
> +		vcpu->arch.sie_block->eic = EXT_IRQ_EMERGENCY_SIG;
> +		vcpu->arch.sie_block->extcpuaddr = cpu_addr;
> +		return 0;
> +	}
>   
>   	rc  = put_guest_lc(vcpu, EXT_IRQ_EMERGENCY_SIG,
>   			   (u16 *)__LC_EXT_INT_CODE);
> @@ -776,6 +801,12 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
>   					 KVM_S390_INT_EXTERNAL_CALL,
>   					 extcall.code, 0);
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
> +		vcpu->arch.sie_block->eic = EXT_IRQ_EXTERNAL_CALL;
> +		vcpu->arch.sie_block->extcpuaddr = extcall.code;
> +		return 0;
> +	}
>   
>   	rc  = put_guest_lc(vcpu, EXT_IRQ_EXTERNAL_CALL,
>   			   (u16 *)__LC_EXT_INT_CODE);
> @@ -902,6 +933,31 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
>   	return rc ? -EFAULT : 0;
>   }
>   
> +#define SCCB_MASK 0xFFFFFFF8
> +#define SCCB_EVENT_PENDING 0x3
> +
> +static int write_sclp(struct kvm_vcpu *vcpu, u32 parm)
> +{
> +	int rc;
> +
> +	if (kvm_s390_pv_handle_cpu(vcpu)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
> +		vcpu->arch.sie_block->eic = EXT_IRQ_SERVICE_SIG;
> +		vcpu->arch.sie_block->eiparams = parm;
> +		return 0;
> +	}
> +
> +	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
> +	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> +	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> +			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> +			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	rc |= put_guest_lc(vcpu, parm,
> +			   (u32 *)__LC_EXT_PARAMS);
> +	return rc;
> +}
> +
>   static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
> @@ -909,13 +965,17 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
>   	int rc = 0;
>   
>   	spin_lock(&fi->lock);
> -	if (!(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
> +	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs) ||
> +	    !(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
>   		spin_unlock(&fi->lock);
>   		return 0;
>   	}
>   	ext = fi->srv_signal;
>   	memset(&fi->srv_signal, 0, sizeof(ext));
>   	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
> +	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
> +	if (kvm_s390_pv_is_protected(vcpu->kvm))
> +		set_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
>   	spin_unlock(&fi->lock);
>   
>   	VCPU_EVENT(vcpu, 4, "deliver: sclp parameter 0x%x",
> @@ -924,15 +984,33 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
>   	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
>   					 ext.ext_params, 0);
>   
> -	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
> -	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> -	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> -			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> -			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= put_guest_lc(vcpu, ext.ext_params,
> -			   (u32 *)__LC_EXT_PARAMS);
> +	rc = write_sclp(vcpu, ext.ext_params);
> +	return rc ? -EFAULT : 0;
> +}
>   
> +static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
> +	struct kvm_s390_ext_info ext;
> +	int rc = 0;
> +
> +	spin_lock(&fi->lock);
> +	if (!(test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs))) {
> +		spin_unlock(&fi->lock);
> +		return 0;
> +	}
> +	ext = fi->srv_signal;
> +	/* only clear the event bit */
> +	fi->srv_signal.ext_params &= ~SCCB_EVENT_PENDING;
> +	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
> +	spin_unlock(&fi->lock);
> +
> +	VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
> +	vcpu->stat.deliver_service_signal++;
> +	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
> +					 ext.ext_params, 0);
> +
> +	rc = write_sclp(vcpu, SCCB_EVENT_PENDING);
>   	return rc ? -EFAULT : 0;
>   }
>   
> @@ -1028,6 +1106,15 @@ static int __do_deliver_io(struct kvm_vcpu *vcpu, struct kvm_s390_io_info *io)
>   {
>   	int rc;
>   
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_IO;
> +		vcpu->arch.sie_block->subchannel_id = io->subchannel_id;
> +		vcpu->arch.sie_block->subchannel_nr = io->subchannel_nr;
> +		vcpu->arch.sie_block->io_int_parm = io->io_int_parm;
> +		vcpu->arch.sie_block->io_int_word = io->io_int_word;
> +		return 0;
> +	}
> +
>   	rc  = put_guest_lc(vcpu, io->subchannel_id, (u16 *)__LC_SUBCHANNEL_ID);
>   	rc |= put_guest_lc(vcpu, io->subchannel_nr, (u16 *)__LC_SUBCHANNEL_NR);
>   	rc |= put_guest_lc(vcpu, io->io_int_parm, (u32 *)__LC_IO_INT_PARM);
> @@ -1329,6 +1416,9 @@ int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
>   		case IRQ_PEND_EXT_SERVICE:
>   			rc = __deliver_service(vcpu);
>   			break;
> +		case IRQ_PEND_EXT_SERVICE_EV:
> +			rc = __deliver_service_ev(vcpu);
> +			break;
>   		case IRQ_PEND_PFAULT_DONE:
>   			rc = __deliver_pfault_done(vcpu);
>   			break;
> @@ -1421,7 +1511,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
>   	if (kvm_get_vcpu_by_id(vcpu->kvm, src_id) == NULL)
>   		return -EINVAL;
>   
> -	if (sclp.has_sigpif)
> +	if (sclp.has_sigpif && !kvm_s390_pv_handle_cpu(vcpu))
>   		return sca_inject_ext_call(vcpu, src_id);
>   
>   	if (test_and_set_bit(IRQ_PEND_EXT_EXTERNAL, &li->pending_irqs))
> @@ -1681,9 +1771,6 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
>   	return inti;
>   }
>   
> -#define SCCB_MASK 0xFFFFFFF8
> -#define SCCB_EVENT_PENDING 0x3
> -
>   static int __inject_service(struct kvm *kvm,
>   			     struct kvm_s390_interrupt_info *inti)
>   {
> @@ -1692,6 +1779,11 @@ static int __inject_service(struct kvm *kvm,
>   	kvm->stat.inject_service_signal++;
>   	spin_lock(&fi->lock);
>   	fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_EVENT_PENDING;
> +
> +	/* We always allow events, track them separately from the sccb ints */
> +	if (fi->srv_signal.ext_params & SCCB_EVENT_PENDING)
> +		set_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
> +
>   	/*
>   	 * Early versions of the QEMU s390 bios will inject several
>   	 * service interrupts after another without handling a
> @@ -1834,7 +1926,8 @@ static void __floating_irq_kick(struct kvm *kvm, u64 type)
>   		break;
>   	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
>   		if (!(type & KVM_S390_INT_IO_AI_MASK &&
> -		      kvm->arch.gisa_int.origin))
> +		      kvm->arch.gisa_int.origin) ||
> +		      kvm_s390_pv_handle_cpu(dst_vcpu))
>   			kvm_s390_set_cpuflags(dst_vcpu, CPUSTAT_IO_INT);
>   		break;
>   	default:
> @@ -2082,6 +2175,8 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
>   
>   	spin_lock(&fi->lock);
>   	fi->pending_irqs = 0;
> +	if (!kvm_s390_pv_is_protected(kvm))
> +		fi->masked_irqs = 0;
>   	memset(&fi->srv_signal, 0, sizeof(fi->srv_signal));
>   	memset(&fi->mchk, 0, sizeof(fi->mchk));
>   	for (i = 0; i < FIRQ_LIST_COUNT; i++)
> @@ -2146,7 +2241,8 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
>   			n++;
>   		}
>   	}
> -	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs)) {
> +	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs) ||
> +	    test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs)) {
>   		if (n == max_irqs) {
>   			/* signal userspace to try again */
>   			ret = -ENOMEM;
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-10-24 11:40 ` [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling Janosch Frank
@ 2019-11-04 11:25   ` David Hildenbrand
  2019-11-05 12:01     ` Christian Borntraeger
  2019-11-14 14:44   ` Thomas Huth
  1 sibling, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 11:25 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> Guest registers for protected guests are stored at offset 0x380.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h |  4 +++-
>   arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>   2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 0ab309b7bf4c..5deabf9734d9 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>   struct sie_page {
>   	struct kvm_s390_sie_block sie_block;
>   	struct mcck_volatile_info mcck_info;	/* 0x0200 */
> -	__u8 reserved218[1000];		/* 0x0218 */
> +	__u8 reserved218[360];		/* 0x0218 */
> +	__u64 pv_grregs[16];		/* 0x380 */
> +	__u8 reserved400[512];
>   	struct kvm_s390_itdb itdb;	/* 0x0600 */
>   	__u8 reserved700[2304];		/* 0x0700 */
>   };
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 490fde080107..97d3a81e5074 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>   static int __vcpu_run(struct kvm_vcpu *vcpu)
>   {
>   	int rc, exit_reason;
> +	struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
>   
>   	/*
>   	 * We try to hold kvm->srcu during most of vcpu_run (except when run-
> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>   		guest_enter_irqoff();
>   		__disable_cpu_timer_accounting(vcpu);
>   		local_irq_enable();
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			memcpy(sie_page->pv_grregs,
> +			       vcpu->run->s.regs.gprs,
> +			       sizeof(sie_page->pv_grregs));
> +		}
>   		exit_reason = sie64a(vcpu->arch.sie_block,
>   				     vcpu->run->s.regs.gprs);
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			memcpy(vcpu->run->s.regs.gprs,
> +			       sie_page->pv_grregs,
> +			       sizeof(sie_page->pv_grregs));
> +		}

sie64a will load/save gprs 0-13 from/to vcpu->run->s.regs.gprs.

I would have assumed that this is not required for prot virt, because the 
HW has direct access via the sie block?


1. Would it make sense to have a specialized sie64a() (or a parameter, 
e.g., if you pass in NULL in r3), that optimizes this loading/saving? 
Eventually we can also optimize which host registers to save/restore then.

2. Avoid this copying here. We have to store the state to 
vcpu->run->s.regs.gprs when returning to user space and restore the 
state when coming from user space.

Also, we access the GPRS from interception handlers; there we might use 
wrappers like

kvm_s390_set_gprs()
kvm_s390_get_gprs()

to route to the right location. There are multiple options to optimize this.
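
For illustration only, here is a minimal sketch of what such routing could look
like. It uses simplified stand-in types rather than the real kvm_vcpu/sie_page
structures, and the accessor names are hypothetical (per-register variants of
the wrappers suggested above):

/* Toy stand-ins for the real structures, just to show the routing idea. */
struct toy_sie_page {
	unsigned long pv_grregs[16];		/* at 0x380 in the real sie_page */
};

struct toy_vcpu {
	int is_protected;			/* kvm_s390_pv_is_protected() stand-in */
	unsigned long run_gprs[16];		/* vcpu->run->s.regs.gprs stand-in */
	struct toy_sie_page *sie_page;
};

/* Interception handlers would read/write gprs only through these wrappers. */
static unsigned long toy_get_gpr(struct toy_vcpu *vcpu, int n)
{
	return vcpu->is_protected ? vcpu->sie_page->pv_grregs[n]
				  : vcpu->run_gprs[n];
}

static void toy_set_gpr(struct toy_vcpu *vcpu, int n, unsigned long val)
{
	if (vcpu->is_protected)
		vcpu->sie_page->pv_grregs[n] = val;
	else
		vcpu->run_gprs[n] = val;
}

With wrappers like these the memcpy() around sie64a() could go away, and the
register state would only need to be copied to/from vcpu->run->s.regs.gprs at
the user space boundary.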

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 10:19           ` David Hildenbrand
  2019-11-04 10:25             ` Janosch Frank
@ 2019-11-04 13:58             ` Christian Borntraeger
  2019-11-04 14:08               ` David Hildenbrand
  1 sibling, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-04 13:58 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 04.11.19 11:19, David Hildenbrand wrote:
>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder then
>>>> we though we decided to defer paging to a later point in time.
>>>
>>> I don't quite see the issue yet. If you page out, the page will
>>> automatically (on access) be converted to !secure/encrypted memory. If
>>> the UV/guest wants to access it, it will be automatically converted to
>>> secure/unencrypted memory. If you have concurrent access, it will be
>>> converted back and forth until one party is done.
>>
>> IO does not trigger an export on an imported page, but an error
>> condition in the IO subsystem. The page code does not read pages through
> 
> Ah, that makes it much clearer. Thanks!
> 
>> the cpu, but often just asks the device to read directly and that's
>> where everything goes wrong. We could bounce swapping, but chose to pin
>> for now until we find a proper solution to that problem which nicely
>> integrates into linux.
> 
> How hard would it be to
> 
> 1. Detect the error condition
> 2. Try a read on the affected page from the CPU (will will automatically convert to encrypted/!secure)
> 3. Restart the I/O
> 
> I assume that this is a corner case where we don't really have to care about performance in the first shot.

We have looked into this. You would need to implement this in the low level
handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
to stop?
There is no proper global bounce buffer that works for everything. 
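
For reference, the detect / touch-from-the-CPU / restart sequence quoted above
would boil down to something like the following per-driver error path. This is
purely illustrative: all names are invented, and it glosses over how the
failing guest page would be identified in a real driver.

#define TOY_PAGE_SIZE 4096UL

/*
 * Touch the pages from the CPU so they get converted back to
 * non-secure/encrypted memory (the access triggers the export), then ask
 * the driver to restart the failed request.
 */
static int toy_retry_after_secure_access_error(volatile unsigned char *buf,
					       unsigned long len,
					       int (*restart)(void *cookie),
					       void *cookie)
{
	unsigned long off;
	unsigned char dummy;

	for (off = 0; off < len; off += TOY_PAGE_SIZE)
		dummy = buf[off];
	(void)dummy;

	return restart(cookie);
}

The catch, as stated above, is that every low-level I/O path that can target
guest memory would need such a hook.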

>>>
>>> A proper automatic conversion should make this work. What am I missing?
>>>
>>>>
>>>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
>>>
>>> I wonder if we could come up with an alternative (similar to how we
>>> override VM_MERGEABLE in the kernel) that can be called and ensured in
>>> the kernel. E.g., marking whole VMAs as "don't page" (I remember
>>> something like "special VMAs" like used for VDSOs that achieve exactly
>>> that, but I am absolutely no expert on that). That would be much nicer
>>> than pinning all pages and remembering what you pinned in huge page
>>> arrays ...
>>
>> It might be more worthwhile to just accept one or two releases with
>> pinning and fix the root of the problem than design a nice stopgap.
> 
> Quite honestly, to me this feels like a prototype hack that deserves a proper solution first. The issue with this hack is that it affects user space (esp. MADV_DONTNEED no longer working correctly). It's not just something you once fix in the kernel and be done with it.

I disagree. Pinning is a valid initial version. I would find it strange to
allow it for AMD SEV, but not allowing it for s390x. 
As far as I can tell, MADV_DONTNEED continues to work within the bounds
of the specification. In fact, it does work (or does not, depending on your 
perspective :-)) exactly in the same way as on hugetlbfs, which is also
a way of pinning.

And yes, I am in full agreement that we must work on lifting that
restriction. 


>>
>> Btw. s390 is not alone with the problem and we'll try to have another
>> discussion tomorrow with AMD to find a solution which works for more
>> than one architecture.
> 
> Let me know if there was an interesting outcome.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 14/37] KVM: s390: protvirt: Implement interruption injection
  2019-11-04 10:29   ` David Hildenbrand
@ 2019-11-04 14:05     ` Christian Borntraeger
  2019-11-04 14:23       ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-04 14:05 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 04.11.19 11:29, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> From: Michael Mueller <mimu@linux.ibm.com>
>>
>> The patch implements interruption injection for the following
>> list of interruption types:
>>
>>    - I/O
>>      __deliver_io (III)
>>
>>    - External
>>      __deliver_cpu_timer (IEI)
>>      __deliver_ckc (IEI)
>>      __deliver_emergency_signal (IEI)
>>      __deliver_external_call (IEI)
>>      __deliver_service (IEI)
>>
>>    - cpu restart
>>      __deliver_restart (IRI)
> 
> What exactly is IRQ_PEND_EXT_SERVICE_EV? Can you add some comments whet the new interrupt does and why it is needed in this context? Thanks

I wrote that code. What about the following add-on description?

The ultravisor does several checks on injected interrupts. For example, it will
check that, for an sclp interrupt with an sccb address, we had a servc exit,
and it will exit with a validity intercept otherwise.
As the hypervisor must avoid validity intercepts, we now mask invalid interrupts.

There are also sclp interrupts that only inject an event (e.g. an input event
on the sclp consoles); those interrupts must not be masked.
Let us split out these "event interrupts" from the normal sccb interrupts into
IRQ_PEND_EXT_SERVICE_EV. 
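
For illustration, a minimal sketch of the distinction (the constants match the
hunks quoted earlier in the thread; the enum and helper are made up):

#define SCCB_MASK		0xFFFFFFF8	/* sccb address part of ext_params */
#define SCCB_EVENT_PENDING	0x3		/* event-only indication */

enum toy_service_irq { TOY_SERVICE_SCCB, TOY_SERVICE_EVENT_ONLY };

static enum toy_service_irq toy_classify_service(unsigned int ext_params)
{
	/*
	 * Event-only interrupts (IRQ_PEND_EXT_SERVICE_EV) carry no sccb
	 * address and may always be delivered to a protected guest.
	 * Interrupts with an sccb address (IRQ_PEND_EXT_SERVICE) may only
	 * follow a servc exit and are therefore masked after one delivery.
	 */
	if (ext_params & SCCB_MASK)
		return TOY_SERVICE_SCCB;
	return TOY_SERVICE_EVENT_ONLY;
}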

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 13:58             ` Christian Borntraeger
@ 2019-11-04 14:08               ` David Hildenbrand
  2019-11-04 14:42                 ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 14:08 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 04.11.19 14:58, Christian Borntraeger wrote:
> 
> 
> On 04.11.19 11:19, David Hildenbrand wrote:
>>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder then
>>>>> we though we decided to defer paging to a later point in time.
>>>>
>>>> I don't quite see the issue yet. If you page out, the page will
>>>> automatically (on access) be converted to !secure/encrypted memory. If
>>>> the UV/guest wants to access it, it will be automatically converted to
>>>> secure/unencrypted memory. If you have concurrent access, it will be
>>>> converted back and forth until one party is done.
>>>
>>> IO does not trigger an export on an imported page, but an error
>>> condition in the IO subsystem. The page code does not read pages through
>>
>> Ah, that makes it much clearer. Thanks!
>>
>>> the cpu, but often just asks the device to read directly and that's
>>> where everything goes wrong. We could bounce swapping, but chose to pin
>>> for now until we find a proper solution to that problem which nicely
>>> integrates into linux.
>>
>> How hard would it be to
>>
>> 1. Detect the error condition
>> 2. Try a read on the affected page from the CPU (will will automatically convert to encrypted/!secure)
>> 3. Restart the I/O
>>
>> I assume that this is a corner case where we don't really have to care about performance in the first shot.
> 
> We have looked into this. You would need to implement this in the low level
> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
> to stop?

If that's the real fix, we should do that. Maybe one can focus on the 
real use cases first. But I am no I/O expert, so my judgment might be 
completely wrong.

> There is no proper global bounce buffer that works for everything.
> 
>>>>
>>>> A proper automatic conversion should make this work. What am I missing?
>>>>
>>>>>
>>>>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
>>>>
>>>> I wonder if we could come up with an alternative (similar to how we
>>>> override VM_MERGEABLE in the kernel) that can be called and ensured in
>>>> the kernel. E.g., marking whole VMAs as "don't page" (I remember
>>>> something like "special VMAs" like used for VDSOs that achieve exactly
>>>> that, but I am absolutely no expert on that). That would be much nicer
>>>> than pinning all pages and remembering what you pinned in huge page
>>>> arrays ...
>>>
>>> It might be more worthwhile to just accept one or two releases with
>>> pinning and fix the root of the problem than design a nice stopgap.
>>
>> Quite honestly, to me this feels like a prototype hack that deserves a proper solution first. The issue with this hack is that it affects user space (esp. MADV_DONTNEED no longer working correctly). It's not just something you once fix in the kernel and be done with it.
> 
> I disagree. Pinning is a valid initial version. I would find it strange to
> allow it for AMD SEV, but not allowing it for s390x.

"not allowing" is wrong. I don't like it, but I am not NACKing it. All I 
am saying is that this is for me a big fat "prototype" marker.

As a workaround, I would much rather see a "don't page" control 
(process/vma) than pinning every single page, if "paging" is the only 
concern. Such internal handling would not imply any real user space 
changes (as noted, such as the changed MADV_DONTNEED behavior).

> As far as I can tell  MADV_DONTNEED continues to work within the bounds
> of specification. In fact, it does work (or does not depending on your

There is a reason why we disallow MADV_DONTNEED users in QEMU (e.g., 
balloon, postcopy live migration, ...) when we have such vfio devices. It 
no longer works as specified for devices that have pinned pages. You get 
inconsistent mappings.

> perspective :-) ) exactly in the same way as on hugetlbfs,which is also
> a way of pinning.

MADV_DONTNEED is documented to not work on huge pages. That's a 
different story. You have to use FALLOC_FL_PUNCH_HOLE.
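
To make that concrete, a small user space sketch (it assumes an open hugetlbfs
or shmem file descriptor fd of at least len bytes; error handling and alignment
checks are omitted):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>

static void toy_discard(int fd, size_t len)
{
	/* Anonymous memory: MADV_DONTNEED drops the pages, later reads see zeroes. */
	char *anon = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	memset(anon, 0xaa, len);
	madvise(anon, len, MADV_DONTNEED);

	/* hugetlbfs/fd-backed: the backing pages are only freed by punching a hole. */
	char *file = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	memset(file, 0xaa, len);
	fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, len);

	munmap(anon, len);
	munmap(file, len);
}

With pages pinned in the kernel (as the on-demand pinning patch does), the
madvise() call still succeeds, but the pinning user keeps referencing the old
physical page while user space gets a fresh one on the next fault; that is the
inconsistency referred to above.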


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-10-24 11:40 ` [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning Janosch Frank
  2019-10-25  8:49   ` David Hildenbrand
  2019-11-02  8:53   ` Christian Borntraeger
@ 2019-11-04 14:17   ` David Hildenbrand
  2 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 14:17 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> Pin the guest pages when they are first accessed, instead of all at
> the same time when starting the guest.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/include/asm/gmap.h |  1 +
>   arch/s390/include/asm/uv.h   |  6 +++++
>   arch/s390/kernel/uv.c        | 20 ++++++++++++++
>   arch/s390/kvm/kvm-s390.c     |  2 ++
>   arch/s390/kvm/pv.c           | 51 ++++++++++++++++++++++++++++++------
>   5 files changed, 72 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 99b3eedda26e..483f64427c0e 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -63,6 +63,7 @@ struct gmap {
>   	struct gmap *parent;
>   	unsigned long orig_asce;
>   	unsigned long se_handle;
> +	struct page **pinned_pages;
>   	int edat_level;
>   	bool removed;
>   	bool initialized;
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 99cdd2034503..9ce9363aee1c 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -298,6 +298,7 @@ static inline int uv_convert_from_secure(unsigned long paddr)
>   	return -EINVAL;
>   }
>   
> +int kvm_s390_pv_pin_page(struct gmap *gmap, unsigned long gpa);
>   /*
>    * Requests the Ultravisor to make a page accessible to a guest
>    * (import). If it's brought in the first time, it will be cleared. If
> @@ -317,6 +318,11 @@ static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>   		.gaddr = gaddr
>   	};
>   
> +	down_read(&gmap->mm->mmap_sem);
> +	cc = kvm_s390_pv_pin_page(gmap, gaddr);
> +	up_read(&gmap->mm->mmap_sem);
> +	if (cc)
> +		return cc;
>   	cc = uv_call(0, (u64)&uvcb);

So, a theoretical question: Is any in-flight I/O from paging stopped 
when we try to pin the pages? I am no expert on paging, but the comment 
from Christian ("you can actually fault in a page that is currently 
under paging I/O.") made me wonder.

Let's assume you could have parallel I/O on that page. You would do a 
uv_call() and convert the page to secure/unencrypted. The I/O can easily 
stumble over that (now inaccessible) page and report an error.

Or is any such race not possible because we are using 
get_user_pages_fast() vs. get_user_pages()? (then, I'd love to see a 
comment regarding that in the patch description)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction
  2019-10-24 11:40 ` [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction Janosch Frank
  2019-11-01  8:18   ` Christian Borntraeger
@ 2019-11-04 14:18   ` Cornelia Huck
  2019-11-12 14:38     ` Janosch Frank
  1 sibling, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-04 14:18 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:23 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> Introduction to Protected VMs.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  Documentation/virtual/kvm/s390-pv.txt | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/s390-pv.txt
> 
> diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
> new file mode 100644
> index 000000000000..86ed95f36759
> --- /dev/null
> +++ b/Documentation/virtual/kvm/s390-pv.txt

This should be under /virt/, I think. Also, maybe start out with RST
already for new files?

> @@ -0,0 +1,23 @@
> +Ultravisor and Protected VMs
> +===========================
> +
> +Summary:
> +
> +Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's state
> +like guest memory and guest registers anymore. Instead the PVMs are

s/Instead/Instead,/

> +mostly managed by a new entity called Ultravisor (UV), which provides
> +an API, so KVM and the PVM can request management actions.

Hm...

"The UV provides an API (both for guests and hypervisors), where PVMs
and KVM can request management actions." ?

> +
> +Each guest starts in the non-protected mode and then transitions into

"and then may make a request to transition into protected mode" ?

> +protected mode. On transition KVM registers the guest and its VCPUs
> +with the Ultravisor and prepares everything for running it.
> +
> +The Ultravisor will secure and decrypt the guest's boot memory
> +(i.e. kernel/initrd). It will safeguard state changes like VCPU
> +starts/stops and injected interrupts while the guest is running.
> +
> +As access to the guest's state, like the SIE state description is

"such as the SIE state description," ?

> +normally needed to be able to run a VM, some changes have been made in
> +SIE behavior and fields have different meaning for a PVM. SIE exits
> +are minimized as much as possible to improve speed and reduce exposed
> +guest state.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 14/37] KVM: s390: protvirt: Implement interruption injection
  2019-11-04 14:05     ` Christian Borntraeger
@ 2019-11-04 14:23       ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 14:23 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 04.11.19 15:05, Christian Borntraeger wrote:
> 
> 
> On 04.11.19 11:29, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> From: Michael Mueller <mimu@linux.ibm.com>
>>>
>>> The patch implements interruption injection for the following
>>> list of interruption types:
>>>
>>>     - I/O
>>>       __deliver_io (III)
>>>
>>>     - External
>>>       __deliver_cpu_timer (IEI)
>>>       __deliver_ckc (IEI)
>>>       __deliver_emergency_signal (IEI)
>>>       __deliver_external_call (IEI)
>>>       __deliver_service (IEI)
>>>
>>>     - cpu restart
>>>       __deliver_restart (IRI)
>>
>> What exactly is IRQ_PEND_EXT_SERVICE_EV? Can you add some comments whet the new interrupt does and why it is needed in this context? Thanks
> 
> I did that code. What about the following add-on description.
> 
> The ultravisor does several checks on injected interrupts. For example it will
> check that for an sclp interrupt with an sccb address we had an servc exit
> and exit with a validity intercept.
> As the hypervisor must avoid valitity intercepts we now mask invalid interrupts.

s/valitity/validity/

> 
> There are also sclp interrupts that only inject an event (e.g. an input event
> on the sclp consoles) those interrupts must not be masked.
> Let us split out these "event interupts" from the normal sccb interrupts into
> IRQ_PEND_EXT_SERVICE_EV.
> 

Thanks for the clarification. From what I see, this is transparent to 
user space - we only track these interrupts separately internally.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-11-01  8:53   ` Christian Borntraeger
@ 2019-11-04 14:26     ` Cornelia Huck
  2019-11-12 14:47       ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-04 14:26 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, imbrenda, mihajlov,
	mimu, gor

On Fri, 1 Nov 2019 09:53:12 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 24.10.19 13:40, Janosch Frank wrote:
> > From: Vasily Gorbik <gor@linux.ibm.com>
> > 
> > Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
> > protected virtual machines hosting support code.
> > 
> > Add "prot_virt" command line option which controls if the kernel
> > protected VMs support is enabled at runtime.
> > 
> > Extend ultravisor info definitions and expose it via uv_info struct
> > filled in during startup.
> > 
> > Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  5 ++
> >  arch/s390/boot/Makefile                       |  2 +-
> >  arch/s390/boot/uv.c                           | 20 +++++++-
> >  arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
> >  arch/s390/kernel/Makefile                     |  1 +
> >  arch/s390/kernel/setup.c                      |  4 --
> >  arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
> >  arch/s390/kvm/Kconfig                         |  9 ++++
> >  8 files changed, 126 insertions(+), 9 deletions(-)
> >  create mode 100644 arch/s390/kernel/uv.c

(...)

> > diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> > index d3db3d7ed077..652b36f0efca 100644
> > --- a/arch/s390/kvm/Kconfig
> > +++ b/arch/s390/kvm/Kconfig
> > @@ -55,6 +55,15 @@ config KVM_S390_UCONTROL
> > 
> >  	  If unsure, say N.
> > 
> > +config KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> > +	bool "Protected guests execution support"
> > +	depends on KVM
> > +	---help---
> > +	  Support hosting protected virtual machines isolated from the
> > +	  hypervisor.
> > +
> > +	  If unsure, say Y.
> > +
> >  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
> >  # the virtualization menu.
> >  source "drivers/vhost/Kconfig"
> >   
> 
> As we have the prot_virt kernel paramter there is a way to fence this during runtime
> Not sure if we really need a build time fence. We could get rid of
> CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST and just use CONFIG_KVM instead,
> assuming that in the long run all distros will enable that anyway. 

I still need to read through the rest of this patch set to have an
informed opinion on that, which will probably take some more time.

> If other reviewers prefer to keep that extra option what about the following to the
> help section:
> 
> ----
> Support hosting protected virtual machines in KVM. The state of these machines like
> memory content or register content is protected from the host or host administrators.
> 
> Enabling this option will enable extra code that talks to a new firmware instance

"...that allows the host kernel to talk..." ?

> called ultravisor that will take care of protecting the guest while also enabling
> KVM to run this guest.
> 
> This feature must be enable by the kernel command line option prot_virt.

s/enable by/enabled via/

> 
> 	  If unsure, say Y.

Looks better. I'm continuing to read the rest of this series before I
say more, though :)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-25  8:24   ` [RFC v2] " Janosch Frank
  2019-11-01 13:02     ` Christian Borntraeger
@ 2019-11-04 14:32     ` David Hildenbrand
  2019-11-04 14:36       ` Janosch Frank
  2019-11-13 12:23     ` Thomas Huth
  2 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 14:32 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 25.10.19 10:24, Janosch Frank wrote:
> KSM will not work on secure pages, because when the kernel reads a
> secure page, it will be encrypted and hence no two pages will look the
> same.
> 
> Let's mark the guest pages as unmergeable when we transition to secure
> mode.

Patch itself looks good to me, but I do wonder: Is this really needed 
when pinning all encrypted pages currently?

Not sure about races between KSM and the pinning/encrypting thread, 
similar to paging, though ...

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-11-04 14:32     ` David Hildenbrand
@ 2019-11-04 14:36       ` Janosch Frank
  2019-11-04 14:38         ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-04 14:36 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/4/19 3:32 PM, David Hildenbrand wrote:
> On 25.10.19 10:24, Janosch Frank wrote:
>> KSM will not work on secure pages, because when the kernel reads a
>> secure page, it will be encrypted and hence no two pages will look the
>> same.
>>
>> Let's mark the guest pages as unmergeable when we transition to secure
>> mode.
> 
> Patch itself looks good to me, but I do wonder: Is this really needed 
> when pinning all encrypted pages currently?
> 
> Not sure about races between KSM and the pinning/encrypting thread, 
> similar to paging, though ...
> 

The pinning was added several months after I wrote the patch.
Now that we have it, we really need to have another proper look at the
whole topic.

Thanks for your review :-)



^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-11-04 14:36       ` Janosch Frank
@ 2019-11-04 14:38         ` David Hildenbrand
  0 siblings, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 14:38 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 04.11.19 15:36, Janosch Frank wrote:
> On 11/4/19 3:32 PM, David Hildenbrand wrote:
>> On 25.10.19 10:24, Janosch Frank wrote:
>>> KSM will not work on secure pages, because when the kernel reads a
>>> secure page, it will be encrypted and hence no two pages will look the
>>> same.
>>>
>>> Let's mark the guest pages as unmergeable when we transition to secure
>>> mode.
>>
>> Patch itself looks good to me, but I do wonder: Is this really needed
>> when pinning all encrypted pages currently?
>>
>> Not sure about races between KSM and the pinning/encrypting thread,
>> similar to paging, though ...
>>
> 
> The pinning was added several months after I wrote the patch.
> Now that we have it, we really need to have another proper look at the
> whole topic.
> 
> Thanks for your review :-)

I'd certainly prefer this patch (+some way to mlock) over pinning ;)

You can have

Reviewed-by: David Hildenbrand <david@redhat.com>

For this patch, if you end up needing it :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 14:08               ` David Hildenbrand
@ 2019-11-04 14:42                 ` David Hildenbrand
  2019-11-04 17:17                   ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 14:42 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 04.11.19 15:08, David Hildenbrand wrote:
> On 04.11.19 14:58, Christian Borntraeger wrote:
>>
>>
>> On 04.11.19 11:19, David Hildenbrand wrote:
>>>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder then
>>>>>> we though we decided to defer paging to a later point in time.
>>>>>
>>>>> I don't quite see the issue yet. If you page out, the page will
>>>>> automatically (on access) be converted to !secure/encrypted memory. If
>>>>> the UV/guest wants to access it, it will be automatically converted to
>>>>> secure/unencrypted memory. If you have concurrent access, it will be
>>>>> converted back and forth until one party is done.
>>>>
>>>> IO does not trigger an export on an imported page, but an error
>>>> condition in the IO subsystem. The page code does not read pages through
>>>
>>> Ah, that makes it much clearer. Thanks!
>>>
>>>> the cpu, but often just asks the device to read directly and that's
>>>> where everything goes wrong. We could bounce swapping, but chose to pin
>>>> for now until we find a proper solution to that problem which nicely
>>>> integrates into linux.
>>>
>>> How hard would it be to
>>>
>>> 1. Detect the error condition
>>> 2. Try a read on the affected page from the CPU (will will automatically convert to encrypted/!secure)
>>> 3. Restart the I/O
>>>
>>> I assume that this is a corner case where we don't really have to care about performance in the first shot.
>>
>> We have looked into this. You would need to implement this in the low level
>> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
>> to stop?
> 
> If that's the real fix, we should do that. Maybe one can focus on the
> real use cases first. But I am no I/O expert, so my judgment might be
> completely wrong.
> 

Oh, and by the way, as discussed you really only have to care about 
accesses via "real" I/O devices (IOW, not via the CPU). When accessing 
via the CPU, you should have automatic conversion back and forth. As I 
am no expert on I/O, I have no idea how iscsi fits into this picture 
here (especially on s390x).

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-10-24 11:40 ` [RFC 02/37] s390/protvirt: introduce host side setup Janosch Frank
                     ` (2 preceding siblings ...)
  2019-11-01  8:53   ` Christian Borntraeger
@ 2019-11-04 15:54   ` Cornelia Huck
  2019-11-04 17:50     ` Christian Borntraeger
  3 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-04 15:54 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:24 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
> protected virtual machines hosting support code.
> 
> Add "prot_virt" command line option which controls if the kernel
> protected VMs support is enabled at runtime.
> 
> Extend ultravisor info definitions and expose it via uv_info struct
> filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 ++
>  arch/s390/boot/Makefile                       |  2 +-
>  arch/s390/boot/uv.c                           | 20 +++++++-
>  arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>  arch/s390/kernel/Makefile                     |  1 +
>  arch/s390/kernel/setup.c                      |  4 --
>  arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>  arch/s390/kvm/Kconfig                         |  9 ++++
>  8 files changed, 126 insertions(+), 9 deletions(-)
>  create mode 100644 arch/s390/kernel/uv.c

(...)

> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
> index ed007f4a6444..88cf8825d169 100644
> --- a/arch/s390/boot/uv.c
> +++ b/arch/s390/boot/uv.c
> @@ -3,7 +3,12 @@
>  #include <asm/facility.h>
>  #include <asm/sections.h>
>  
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  int __bootdata_preserved(prot_virt_guest);
> +#endif
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +struct uv_info __bootdata_preserved(uv_info);
> +#endif

Two functions with the same name, but different signatures look really
ugly.

Also, what happens if I want to build just a single kernel image for
both guest and host?

>  
>  void uv_query_info(void)
>  {
> @@ -18,7 +23,20 @@ void uv_query_info(void)
>  	if (uv_call(0, (uint64_t)&uvcb))
>  		return;
>  
> -	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)) {

Do we always have everything needed for a host if uv_call() is
successful?

> +		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
> +		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
> +		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
> +		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
> +		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
> +		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
> +		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
> +		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
> +		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
> +	}
> +
> +	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
> +	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
>  	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))

Especially as it looks like we need to test for those two commands to
determine whether we have support for a guest.

>  		prot_virt_guest = 1;
>  }
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index ef3c00b049ab..6db1bc495e67 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -44,7 +44,19 @@ struct uv_cb_qui {
>  	struct uv_cb_header header;
>  	u64 reserved08;
>  	u64 inst_calls_list[4];
> -	u64 reserved30[15];
> +	u64 reserved30[2];
> +	u64 uv_base_stor_len;
> +	u64 reserved48;
> +	u64 conf_base_phys_stor_len;
> +	u64 conf_base_virt_stor_len;
> +	u64 conf_virt_var_stor_len;
> +	u64 cpu_stor_len;
> +	u32 reserved68[3];
> +	u32 max_num_sec_conf;
> +	u64 max_guest_stor_addr;
> +	u8  reserved80[150-128];
> +	u16 max_guest_cpus;
> +	u64 reserved98;
>  } __packed __aligned(8);
>  
>  struct uv_cb_share {
> @@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>  	return cc;
>  }
>  
> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +struct uv_info {
> +	unsigned long inst_calls_list[4];
> +	unsigned long uv_base_stor_len;
> +	unsigned long guest_base_stor_len;
> +	unsigned long guest_virt_base_stor_len;
> +	unsigned long guest_virt_var_stor_len;
> +	unsigned long guest_cpu_stor_len;
> +	unsigned long max_sec_stor_addr;
> +	unsigned int max_num_sec_conf;
> +	unsigned short max_guest_cpus;
> +};

What is the main difference between uv_info and uv_cb_qui? The
alignment of max_sec_stor_addr?

> +extern struct uv_info uv_info;
>  extern int prot_virt_guest;
>  
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  static inline int is_prot_virt_guest(void)
>  {
>  	return prot_virt_guest;
> @@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
>  	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
>  }
>  
> -void uv_query_info(void);
>  #else
>  #define is_prot_virt_guest() 0
>  static inline int uv_set_shared(unsigned long addr) { return 0; }
>  static inline int uv_remove_shared(unsigned long addr) { return 0; }
> +#endif
> +
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +extern int prot_virt_host;
> +
> +static inline int is_prot_virt_host(void)
> +{
> +	return prot_virt_host;
> +}
> +#else
> +#define is_prot_virt_host() 0
> +#endif
> +
> +#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> +	defined(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)
> +void uv_query_info(void);
> +#else
>  static inline void uv_query_info(void) {}
>  #endif

(...)

> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> new file mode 100644
> index 000000000000..35ce89695509
> --- /dev/null
> +++ b/arch/s390/kernel/uv.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Common Ultravisor functions and initialization
> + *
> + * Copyright IBM Corp. 2019
> + */
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/sizes.h>
> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <asm/facility.h>
> +#include <asm/sections.h>
> +#include <asm/uv.h>
> +
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +int __bootdata_preserved(prot_virt_guest);
> +#endif
> +
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +int prot_virt_host;
> +EXPORT_SYMBOL(prot_virt_host);
> +struct uv_info __bootdata_preserved(uv_info);
> +EXPORT_SYMBOL(uv_info);
> +
> +static int __init prot_virt_setup(char *val)
> +{
> +	bool enabled;
> +	int rc;
> +
> +	rc = kstrtobool(val, &enabled);
> +	if (!rc && enabled)
> +		prot_virt_host = 1;
> +
> +	if (is_prot_virt_guest() && prot_virt_host) {
> +		prot_virt_host = 0;
> +		pr_info("Running as protected virtualization guest.");
> +	}
> +
> +	if (prot_virt_host && !test_facility(158)) {

Why not check for that facility earlier? If it is not present, we
cannot run as a prot virt guest, either.

> +		prot_virt_host = 0;
> +		pr_info("The ultravisor call facility is not available.");
> +	}
> +
> +	return rc;
> +}
> +early_param("prot_virt", prot_virt_setup);

Maybe rename this to prot_virt_host?

> +#endif

(...)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 14:42                 ` David Hildenbrand
@ 2019-11-04 17:17                   ` Cornelia Huck
  2019-11-04 17:44                     ` David Hildenbrand
  2019-11-04 18:38                     ` David Hildenbrand
  0 siblings, 2 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-04 17:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	imbrenda, mihajlov, mimu, gor

On Mon, 4 Nov 2019 15:42:11 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 04.11.19 15:08, David Hildenbrand wrote:
> > On 04.11.19 14:58, Christian Borntraeger wrote:  
> >>
> >>
> >> On 04.11.19 11:19, David Hildenbrand wrote:  
> >>>>>> to synchronize page import/export with the I/O for paging. For example you can actually
> >>>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
> >>>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder then
> >>>>>> we though we decided to defer paging to a later point in time.  
> >>>>>
> >>>>> I don't quite see the issue yet. If you page out, the page will
> >>>>> automatically (on access) be converted to !secure/encrypted memory. If
> >>>>> the UV/guest wants to access it, it will be automatically converted to
> >>>>> secure/unencrypted memory. If you have concurrent access, it will be
> >>>>> converted back and forth until one party is done.  
> >>>>
> >>>> IO does not trigger an export on an imported page, but an error
> >>>> condition in the IO subsystem. The page code does not read pages through  
> >>>
> >>> Ah, that makes it much clearer. Thanks!
> >>>  
> >>>> the cpu, but often just asks the device to read directly and that's
> >>>> where everything goes wrong. We could bounce swapping, but chose to pin
> >>>> for now until we find a proper solution to that problem which nicely
> >>>> integrates into linux.  
> >>>
> >>> How hard would it be to
> >>>
> >>> 1. Detect the error condition
> >>> 2. Try a read on the affected page from the CPU (will will automatically convert to encrypted/!secure)
> >>> 3. Restart the I/O
> >>>
> >>> I assume that this is a corner case where we don't really have to care about performance in the first shot.  
> >>
> >> We have looked into this. You would need to implement this in the low level
> >> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
> >> to stop?  
> > 
> > If that's the real fix, we should do that. Maybe one can focus on the
> > real use cases first. But I am no I/O expert, so my judgment might be
> > completely wrong.
> >   
> 
> Oh, and by the way, as discussed you really only have to care about 
> accesses via "real" I/O devices (IOW, not via the CPU). When accessing 
> via the CPU, you should have automatic conversion back and forth. As I 
> am no expert on I/O, I have no idea how iscsi fits into this picture 
> here (especially on s390x).
> 

By "real" I/O devices, you mean things like channel devices, right? (So
everything where you basically hand off control to a different kind of
processor.)

For classic channel I/O (as used by dasd), I'd expect something like
getting a check condition on a ccw if the CU or device cannot access
the memory. You will know how far the channel program has progressed,
and might be able to restart (from the beginning or from that point).
Probably has a chance of working for a subset of channel programs.

For QDIO (as used by FCP), I have no idea how this could work, as we
have long-running channel programs there and any error basically kills
the queues, which you would have to re-setup from the beginning.

For PCI devices, I have no idea how the instructions even act.

From my point of view, that error/restart approach looks nice on paper,
but it seems hard to make it work in the general case (and I'm unsure
if it's possible at all.)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 17:17                   ` Cornelia Huck
@ 2019-11-04 17:44                     ` David Hildenbrand
  2019-11-04 18:38                     ` David Hildenbrand
  1 sibling, 0 replies; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 17:44 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	imbrenda, mihajlov, mimu, gor

On 04.11.19 18:17, Cornelia Huck wrote:
> On Mon, 4 Nov 2019 15:42:11 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 04.11.19 15:08, David Hildenbrand wrote:
>>> On 04.11.19 14:58, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 04.11.19 11:19, David Hildenbrand wrote:
>>>>>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>>>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>>>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder then
>>>>>>>> we though we decided to defer paging to a later point in time.
>>>>>>>
>>>>>>> I don't quite see the issue yet. If you page out, the page will
>>>>>>> automatically (on access) be converted to !secure/encrypted memory. If
>>>>>>> the UV/guest wants to access it, it will be automatically converted to
>>>>>>> secure/unencrypted memory. If you have concurrent access, it will be
>>>>>>> converted back and forth until one party is done.
>>>>>>
>>>>>> IO does not trigger an export on an imported page, but an error
>>>>>> condition in the IO subsystem. The page code does not read pages through
>>>>>
>>>>> Ah, that makes it much clearer. Thanks!
>>>>>   
>>>>>> the cpu, but often just asks the device to read directly and that's
>>>>>> where everything goes wrong. We could bounce swapping, but chose to pin
>>>>>> for now until we find a proper solution to that problem which nicely
>>>>>> integrates into linux.
>>>>>
>>>>> How hard would it be to
>>>>>
>>>>> 1. Detect the error condition
>>>>> 2. Try a read on the affected page from the CPU (will will automatically convert to encrypted/!secure)
>>>>> 3. Restart the I/O
>>>>>
>>>>> I assume that this is a corner case where we don't really have to care about performance in the first shot.
>>>>
>>>> We have looked into this. You would need to implement this in the low level
>>>> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
>>>> to stop?
>>>
>>> If that's the real fix, we should do that. Maybe one can focus on the
>>> real use cases first. But I am no I/O expert, so my judgment might be
>>> completely wrong.
>>>    
>>
>> Oh, and by the way, as discussed you really only have to care about
>> accesses via "real" I/O devices (IOW, not via the CPU). When accessing
>> via the CPU, you should have automatic conversion back and forth. As I
>> am no expert on I/O, I have no idea how iscsi fits into this picture
>> here (especially on s390x).
>>
> 
> By "real" I/O devices, you mean things like channel devices, right? (So
> everything where you basically hand off control to a different kind of
> processor.)

Exactly.

> 
> For classic channel I/O (as used by dasd), I'd expect something like
> getting a check condition on a ccw if the CU or device cannot access
> the memory. You will know how far the channel program has progressed,
> and might be able to restart (from the beginning or from that point).
> Probably has a chance of working for a subset of channel programs.

Yeah, sounds sane to me.

> 
> For QDIO (as used by FCP), I have no idea how this is could work, as we
> have long-running channel programs there and any error basically kills
> the queues, which you would have to re-setup from the beginning.
> 
> For PCI devices, I have no idea how the instructions even act.
> 
>  From my point of view, that error/restart approach looks nice on paper,
> but it seems hard to make it work in the general case (and I'm unsure
> if it's possible at all.)

Then I'm afraid whoever designed protected virtualization didn't 
properly consider concurrent I/O access to encrypted pages. It might not 
be easy to sort out, though, so I understand why the I/O part was 
designed that way :)

I was wondering if one could implement some kind of automatic conversion 
"back and forth" on I/O access (or even on any access within the HW). I 
mean, "basically" it's just encrypting/decrypting the page and updating 
the state by the UV (+ synchronization, lol). But yeah, the UV is 
involved, and would be triggered somehow via I/O access to these pages.
Right now that conversion is performed via exceptions by the OS 
explicitly. Instead of passing exceptions, the UV could convert 
automatically. Smells like massive HW changes, if possible and desired 
at all.

I do wonder what would happen if you back your guest memory not with 
anonymous memory but with, e.g., a file. Could be that this eliminates all 
options besides pinning and fixing I/O, because we're talking about 
writeback and not paging.

HOWEVER, reading https://lwn.net/Articles/787636/

"Kara talked mostly about the writeback code; in some cases, it will 
simply skip pages that are pinned. But there are cases where those pages 
must be written out — "somebody has called fsync(), and they expect 
something to be saved". In this case, pinned pages will be written, but 
they will not be marked clean at the end of the operation; they will 
still be write-protected in the page tables while writeback is underway, 
though."

So, sounds like you will get concurrent I/O access even without paging 
... and that would leave fixing I/O as the only option with the current HW 
design AFAIKS :/

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-11-04 15:54   ` Cornelia Huck
@ 2019-11-04 17:50     ` Christian Borntraeger
  2019-11-05  9:26       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-04 17:50 UTC (permalink / raw)
  To: Cornelia Huck, Janosch Frank
  Cc: kvm, linux-s390, thuth, david, imbrenda, mihajlov, mimu, gor



On 04.11.19 16:54, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:24 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> From: Vasily Gorbik <gor@linux.ibm.com>
>>
>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
>> protected virtual machines hosting support code.
>>
>> Add "prot_virt" command line option which controls if the kernel
>> protected VMs support is enabled at runtime.
>>
>> Extend ultravisor info definitions and expose it via uv_info struct
>> filled in during startup.
>>
>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  5 ++
>>  arch/s390/boot/Makefile                       |  2 +-
>>  arch/s390/boot/uv.c                           | 20 +++++++-
>>  arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>>  arch/s390/kernel/Makefile                     |  1 +
>>  arch/s390/kernel/setup.c                      |  4 --
>>  arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>>  arch/s390/kvm/Kconfig                         |  9 ++++
>>  8 files changed, 126 insertions(+), 9 deletions(-)
>>  create mode 100644 arch/s390/kernel/uv.c
> 
> (...)
> 
>> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
>> index ed007f4a6444..88cf8825d169 100644
>> --- a/arch/s390/boot/uv.c
>> +++ b/arch/s390/boot/uv.c
>> @@ -3,7 +3,12 @@
>>  #include <asm/facility.h>
>>  #include <asm/sections.h>
>>  
>> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>>  int __bootdata_preserved(prot_virt_guest);
>> +#endif
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +struct uv_info __bootdata_preserved(uv_info);
>> +#endif
> 
> Two functions with the same name, but different signatures look really
> ugly.
> 
> Also, what happens if I want to build just a single kernel image for
> both guest and host?

This is not two functions with the same name. It is two variable declarations with
the __bootdata_preserved helper. We expect all distro kernels to enable
both. 
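
For readers unfamiliar with the helper: __bootdata_preserved() only tags a
variable definition so that it lands in a dedicated section whose contents the
boot/decompressor stage hands over to the decompressed kernel. A rough sketch
of the pattern (the macro body, section name and second variable below are
illustrative, not the exact s390 implementation):

#define toy_bootdata_preserved(var) \
	__attribute__((__section__(".boot.preserved.data." #var))) var

/* Both are plain variable definitions, not functions: */
int toy_bootdata_preserved(prot_virt_guest);
long toy_bootdata_preserved(toy_uv_info_stub);	/* stands in for the uv_info struct */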

> 
>>  
>>  void uv_query_info(void)
>>  {
>> @@ -18,7 +23,20 @@ void uv_query_info(void)
>>  	if (uv_call(0, (uint64_t)&uvcb))
>>  		return;
>>  
>> -	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)) {
> 
> Do we always have everything needed for a host if uv_call() is
> successful?

The uv_call is the query call. It will provide the list of features. We check that
later on.

> 
>> +		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
>> +		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
>> +		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
>> +		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
>> +		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
>> +		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
>> +		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
>> +		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
>> +		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
>> +	}
>> +
>> +	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
>> +	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
>>  	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))
> 
> Especially as it looks like we need to test for those two commands to
> determine whether we have support for a guest.
> 
>>  		prot_virt_guest = 1;
>>  }
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index ef3c00b049ab..6db1bc495e67 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -44,7 +44,19 @@ struct uv_cb_qui {
>>  	struct uv_cb_header header;
>>  	u64 reserved08;
>>  	u64 inst_calls_list[4];
>> -	u64 reserved30[15];
>> +	u64 reserved30[2];
>> +	u64 uv_base_stor_len;
>> +	u64 reserved48;
>> +	u64 conf_base_phys_stor_len;
>> +	u64 conf_base_virt_stor_len;
>> +	u64 conf_virt_var_stor_len;
>> +	u64 cpu_stor_len;
>> +	u32 reserved68[3];
>> +	u32 max_num_sec_conf;
>> +	u64 max_guest_stor_addr;
>> +	u8  reserved80[150-128];
>> +	u16 max_guest_cpus;
>> +	u64 reserved98;
>>  } __packed __aligned(8);
>>  
>>  struct uv_cb_share {
>> @@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>>  	return cc;
>>  }
>>  
>> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>> +struct uv_info {
>> +	unsigned long inst_calls_list[4];
>> +	unsigned long uv_base_stor_len;
>> +	unsigned long guest_base_stor_len;
>> +	unsigned long guest_virt_base_stor_len;
>> +	unsigned long guest_virt_var_stor_len;
>> +	unsigned long guest_cpu_stor_len;
>> +	unsigned long max_sec_stor_addr;
>> +	unsigned int max_num_sec_conf;
>> +	unsigned short max_guest_cpus;
>> +};
> 
> What is the main difference between uv_info and uv_cb_qui? The
> alignment of max_sec_stor_addr?

One is the hardware data structure for query, the other one is the Linux
internal state.

> 
>> +extern struct uv_info uv_info;
>>  extern int prot_virt_guest;
>>  
>> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>>  static inline int is_prot_virt_guest(void)
>>  {
>>  	return prot_virt_guest;
>> @@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
>>  	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
>>  }
>>  
>> -void uv_query_info(void);
>>  #else
>>  #define is_prot_virt_guest() 0
>>  static inline int uv_set_shared(unsigned long addr) { return 0; }
>>  static inline int uv_remove_shared(unsigned long addr) { return 0; }
>> +#endif
>> +
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +extern int prot_virt_host;
>> +
>> +static inline int is_prot_virt_host(void)
>> +{
>> +	return prot_virt_host;
>> +}
>> +#else
>> +#define is_prot_virt_host() 0
>> +#endif
>> +
>> +#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
>> +	defined(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)
>> +void uv_query_info(void);
>> +#else
>>  static inline void uv_query_info(void) {}
>>  #endif
> 
> (...)
> 
[...]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 17:17                   ` Cornelia Huck
  2019-11-04 17:44                     ` David Hildenbrand
@ 2019-11-04 18:38                     ` David Hildenbrand
  2019-11-05  9:15                       ` Cornelia Huck
  1 sibling, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-04 18:38 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	imbrenda, mihajlov, mimu, gor

On 04.11.19 18:17, Cornelia Huck wrote:
> On Mon, 4 Nov 2019 15:42:11 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 04.11.19 15:08, David Hildenbrand wrote:
>>> On 04.11.19 14:58, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 04.11.19 11:19, David Hildenbrand wrote:
>>>>>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>>>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>>>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder than
>>>>>>>> we thought, we decided to defer paging to a later point in time.
>>>>>>>
>>>>>>> I don't quite see the issue yet. If you page out, the page will
>>>>>>> automatically (on access) be converted to !secure/encrypted memory. If
>>>>>>> the UV/guest wants to access it, it will be automatically converted to
>>>>>>> secure/unencrypted memory. If you have concurrent access, it will be
>>>>>>> converted back and forth until one party is done.
>>>>>>
>>>>>> IO does not trigger an export on an imported page, but an error
>>>>>> condition in the IO subsystem. The page code does not read pages through
>>>>>
>>>>> Ah, that makes it much clearer. Thanks!
>>>>>   
>>>>>> the cpu, but often just asks the device to read directly and that's
>>>>>> where everything goes wrong. We could bounce swapping, but chose to pin
>>>>>> for now until we find a proper solution to that problem which nicely
>>>>>> integrates into linux.
>>>>>
>>>>> How hard would it be to
>>>>>
>>>>> 1. Detect the error condition
>>>>> 2. Try a read on the affected page from the CPU (which will automatically convert to encrypted/!secure)
>>>>> 3. Restart the I/O
>>>>>
>>>>> I assume that this is a corner case where we don't really have to care about performance in the first shot.
>>>>
>>>> We have looked into this. You would need to implement this in the low level
>>>> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
>>>> to stop?
>>>
>>> If that's the real fix, we should do that. Maybe one can focus on the
>>> real use cases first. But I am no I/O expert, so my judgment might be
>>> completely wrong.
>>>    
>>
>> Oh, and by the way, as discussed you really only have to care about
>> accesses via "real" I/O devices (IOW, not via the CPU). When accessing
>> via the CPU, you should have automatic conversion back and forth. As I
>> am no expert on I/O, I have no idea how iscsi fits into this picture
>> here (especially on s390x).
>>
> 
> By "real" I/O devices, you mean things like channel devices, right? (So
> everything where you basically hand off control to a different kind of
> processor.)
> 
> For classic channel I/O (as used by dasd), I'd expect something like
> getting a check condition on a ccw if the CU or device cannot access
> the memory. You will know how far the channel program has progressed,
> and might be able to restart (from the beginning or from that point).
> Probably has a chance of working for a subset of channel programs.
> 
> For QDIO (as used by FCP), I have no idea how this could work, as we
> have long-running channel programs there and any error basically kills
> the queues, which you would have to re-setup from the beginning.
> 
> For PCI devices, I have no idea how the instructions even act.
> 
>  From my point of view, that error/restart approach looks nice on paper,
> but it seems hard to make it work in the general case (and I'm unsure
> if it's possible at all.)

One thought: If all we do during an I/O request is read or write (or 
even a mixture), can we simply restart the whole I/O again, although we 
did partial reads/writes? This would eliminate the "know how far the 
channel program has progressed". On error, one would have to touch each 
involved page (e.g., try to read first byte to trigger a conversion) and 
restart the I/O. I can understand that this might sound simpler than it 
is (if it is even possible) and might still be problematic for QDIO as 
far as I understand. Just a thought.
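
In very rough code, with every name made up for illustration (this is not tied
to any real driver interface):

	/*
	 * sketch: on a "secure page" error reported by the device, touch every
	 * guest page involved in the request from the CPU, which forces the
	 * export/conversion, then resubmit the whole request from the start.
	 */
	static int retry_io_after_secure_page_error(struct page **pages, int nr,
						    int (*resubmit)(void *cookie),
						    void *cookie)
	{
		int i;

		for (i = 0; i < nr; i++)
			READ_ONCE(*(char *)page_to_virt(pages[i]));

		return resubmit(cookie);
	}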


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning
  2019-11-04 18:38                     ` David Hildenbrand
@ 2019-11-05  9:15                       ` Cornelia Huck
  0 siblings, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-05  9:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Janosch Frank, kvm, linux-s390, thuth,
	imbrenda, mihajlov, mimu, gor

On Mon, 4 Nov 2019 19:38:27 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 04.11.19 18:17, Cornelia Huck wrote:
> > On Mon, 4 Nov 2019 15:42:11 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 04.11.19 15:08, David Hildenbrand wrote:  
> >>> On 04.11.19 14:58, Christian Borntraeger wrote:  

> >>>>> How hard would it be to
> >>>>>
> >>>>> 1. Detect the error condition
> >>>>> 2. Try a read on the affected page from the CPU (which will automatically convert to encrypted/!secure)
> >>>>> 3. Restart the I/O
> >>>>>
> >>>>> I assume that this is a corner case where we don't really have to care about performance in the first shot.  
> >>>>
> >>>> We have looked into this. You would need to implement this in the low level
> >>>> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
> >>>> to stop?  
> >>>
> >>> If that's the real fix, we should do that. Maybe one can focus on the
> >>> real use cases first. But I am no I/O expert, so my judgment might be
> >>> completely wrong.
> >>>      
> >>
> >> Oh, and by the way, as discussed you really only have to care about
> >> accesses via "real" I/O devices (IOW, not via the CPU). When accessing
> >> via the CPU, you should have automatic conversion back and forth. As I
> >> am no expert on I/O, I have no idea how iscsi fits into this picture
> >> here (especially on s390x).
> >>  
> > 
> > By "real" I/O devices, you mean things like channel devices, right? (So
> > everything where you basically hand off control to a different kind of
> > processor.)
> > 
> > For classic channel I/O (as used by dasd), I'd expect something like
> > getting a check condition on a ccw if the CU or device cannot access
> > the memory. You will know how far the channel program has progressed,
> > and might be able to restart (from the beginning or from that point).
> > Probably has a chance of working for a subset of channel programs.

NB that there's more than simple reads/writes... could also be control
commands, some of which do read/writes as well.

> > 
> > For QDIO (as used by FCP), I have no idea how this could work, as we
> > have long-running channel programs there and any error basically kills
> > the queues, which you would have to re-setup from the beginning.
> > 
> > For PCI devices, I have no idea how the instructions even act.
> > 
> >  From my point of view, that error/restart approach looks nice on paper,
> > but it seems hard to make it work in the general case (and I'm unsure
> > if it's possible at all.)  
> 
> One thought: If all we do during an I/O request is read or write (or 
> even a mixture), can we simply restart the whole I/O again, although we 
> did partial reads/writes? This would eliminate the "know how far the 
> channel program has progressed". On error, one would have to touch each 
> involved page (e.g., try to read first byte to trigger a conversion) and 
> restart the I/O. I can understand that this might sound simpler than it 
> is (if it is even possible)

Any control commands might have side effects, though. Problems there
should be uncommon; there's still the _general_ case, though :(

Also, there's stuff like rewriting the channel program w/o prefetch,
jumping with TIC, etc. Linux probably does not do the former, but at
least the dasd driver uses NOP/TIC for error recovery.

> and might still be problematic for QDIO as 
> far as I understand. Just a thought.

Yes, given that for QDIO, establishing the queues is simply one
long-running channel program...

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-11-04 17:50     ` Christian Borntraeger
@ 2019-11-05  9:26       ` Cornelia Huck
  2019-11-08 12:14         ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-05  9:26 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, imbrenda, mihajlov,
	mimu, gor

On Mon, 4 Nov 2019 18:50:12 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 04.11.19 16:54, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:24 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:

> >> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
> >> index ed007f4a6444..88cf8825d169 100644
> >> --- a/arch/s390/boot/uv.c
> >> +++ b/arch/s390/boot/uv.c
> >> @@ -3,7 +3,12 @@
> >>  #include <asm/facility.h>
> >>  #include <asm/sections.h>
> >>  
> >> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> >>  int __bootdata_preserved(prot_virt_guest);
> >> +#endif
> >> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> >> +struct uv_info __bootdata_preserved(uv_info);
> >> +#endif  
> > 
> > Two functions with the same name, but different signatures look really
> > ugly.
> > 
> > Also, what happens if I want to build just a single kernel image for
> > both guest and host?  
> 
> This is not two functions with the same name. It is two variable declarations with
> the __bootdata_preserved helper. We expect all distro kernels to enable both.

Ah ok, I misread that. (I'm blaming lack of sleep :/)

> 
> >   
> >>  
> >>  void uv_query_info(void)
> >>  {
> >> @@ -18,7 +23,20 @@ void uv_query_info(void)
> >>  	if (uv_call(0, (uint64_t)&uvcb))
> >>  		return;
> >>  
> >> -	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
> >> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST)) {  
> > 
> > Do we always have everything needed for a host if uv_call() is
> > successful?  
> 
> The uv_call is the query call. It will provide the list of features. We check that
> later on.

Hm yes. I'm just seeing the guest side check for features, while the host
code just seems to go ahead and copy things. (later on == later
patches?)

> 
> >   
> >> +		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
> >> +		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
> >> +		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
> >> +		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
> >> +		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
> >> +		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
> >> +		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
> >> +		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
> >> +		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
> >> +	}
> >> +
> >> +	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
> >> +	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
> >>  	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))  
> > 
> > Especially as it looks like we need to test for those two commands to
> > determine whether we have support for a guest.
> >   
> >>  		prot_virt_guest = 1;
> >>  }
> >> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> >> index ef3c00b049ab..6db1bc495e67 100644
> >> --- a/arch/s390/include/asm/uv.h
> >> +++ b/arch/s390/include/asm/uv.h
> >> @@ -44,7 +44,19 @@ struct uv_cb_qui {
> >>  	struct uv_cb_header header;
> >>  	u64 reserved08;
> >>  	u64 inst_calls_list[4];
> >> -	u64 reserved30[15];
> >> +	u64 reserved30[2];
> >> +	u64 uv_base_stor_len;
> >> +	u64 reserved48;
> >> +	u64 conf_base_phys_stor_len;
> >> +	u64 conf_base_virt_stor_len;
> >> +	u64 conf_virt_var_stor_len;
> >> +	u64 cpu_stor_len;
> >> +	u32 reserved68[3];
> >> +	u32 max_num_sec_conf;
> >> +	u64 max_guest_stor_addr;
> >> +	u8  reserved80[150-128];
> >> +	u16 max_guest_cpus;
> >> +	u64 reserved98;
> >>  } __packed __aligned(8);
> >>  
> >>  struct uv_cb_share {
> >> @@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
> >>  	return cc;
> >>  }
> >>  
> >> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> >> +struct uv_info {
> >> +	unsigned long inst_calls_list[4];
> >> +	unsigned long uv_base_stor_len;
> >> +	unsigned long guest_base_stor_len;
> >> +	unsigned long guest_virt_base_stor_len;
> >> +	unsigned long guest_virt_var_stor_len;
> >> +	unsigned long guest_cpu_stor_len;
> >> +	unsigned long max_sec_stor_addr;
> >> +	unsigned int max_num_sec_conf;
> >> +	unsigned short max_guest_cpus;
> >> +};  
> > 
> > What is the main difference between uv_info and uv_cb_qui? The
> > alignment of max_sec_stor_addr?  
> 
> One is the hardware data structure for query, the other one is the Linux
> internal state.

That's clear; I'm mainly wondering about what is simply copied vs. what
needs to be calculated.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-04 11:25   ` David Hildenbrand
@ 2019-11-05 12:01     ` Christian Borntraeger
  2019-11-05 12:39       ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-05 12:01 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 04.11.19 12:25, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> Guest registers for protected guests are stored at offset 0x380.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h |  4 +++-
>>   arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 0ab309b7bf4c..5deabf9734d9 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>   struct sie_page {
>>       struct kvm_s390_sie_block sie_block;
>>       struct mcck_volatile_info mcck_info;    /* 0x0200 */
>> -    __u8 reserved218[1000];        /* 0x0218 */
>> +    __u8 reserved218[360];        /* 0x0218 */
>> +    __u64 pv_grregs[16];        /* 0x380 */
>> +    __u8 reserved400[512];
>>       struct kvm_s390_itdb itdb;    /* 0x0600 */
>>       __u8 reserved700[2304];        /* 0x0700 */
>>   };
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 490fde080107..97d3a81e5074 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>>   static int __vcpu_run(struct kvm_vcpu *vcpu)
>>   {
>>       int rc, exit_reason;
>> +    struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
>>         /*
>>        * We try to hold kvm->srcu during most of vcpu_run (except when run-
>> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>           guest_enter_irqoff();
>>           __disable_cpu_timer_accounting(vcpu);
>>           local_irq_enable();
>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +            memcpy(sie_page->pv_grregs,
>> +                   vcpu->run->s.regs.gprs,
>> +                   sizeof(sie_page->pv_grregs));
>> +        }
>>           exit_reason = sie64a(vcpu->arch.sie_block,
>>                        vcpu->run->s.regs.gprs);
>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +            memcpy(vcpu->run->s.regs.gprs,
>> +                   sie_page->pv_grregs,
>> +                   sizeof(sie_page->pv_grregs));
>> +        }
> 
> sie64a will load/save gprs 0-13 from/to vcpu->run->s.regs.gprs.
> 
> I would have assumed that this is not required for prot virt, because the HW has direct access via the sie block?

Yes, that is correct. The load/save in sie64a is not necessary for pv guests.

> 
> 
> 1. Would it make sense to have a specialized sie64a() (or a parameter, e.g., if you pass in NULL in r3), that optimizes this loading/saving? Eventually we can also optimize which host registers to save/restore then.

Having 2 kinds of sie64a seems not very nice for just saving a small number of cycles.

> 
> 2. Avoid this copying here. We have to store the state to vcpu->run->s.regs.gprs when returning to user space and restore the state when coming from user space.

I like this proposal better than the first one and
> 
> Also, we access the GPRS from interception handlers, there we might use wrappers like
> 
> kvm_s390_set_gprs()
> kvm_s390_get_gprs()

having register accessors might be useful anyway.
But I would like to defer that to a later point in time to keep the changes in here
minimal?

We can add a "TODO" comment in here so that we do not forget about this
for a future patch. Makes sense?

> 
> to route to the right location. There are multiple options to optimize this.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-05 12:01     ` Christian Borntraeger
@ 2019-11-05 12:39       ` Janosch Frank
  2019-11-05 13:55         ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-05 12:39 UTC (permalink / raw)
  To: Christian Borntraeger, David Hildenbrand, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 11/5/19 1:01 PM, Christian Borntraeger wrote:
> 
> 
> On 04.11.19 12:25, David Hildenbrand wrote:
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> Guest registers for protected guests are stored at offset 0x380.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>   arch/s390/include/asm/kvm_host.h |  4 +++-
>>>   arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>> index 0ab309b7bf4c..5deabf9734d9 100644
>>> --- a/arch/s390/include/asm/kvm_host.h
>>> +++ b/arch/s390/include/asm/kvm_host.h
>>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>>   struct sie_page {
>>>       struct kvm_s390_sie_block sie_block;
>>>       struct mcck_volatile_info mcck_info;    /* 0x0200 */
>>> -    __u8 reserved218[1000];        /* 0x0218 */
>>> +    __u8 reserved218[360];        /* 0x0218 */
>>> +    __u64 pv_grregs[16];        /* 0x380 */
>>> +    __u8 reserved400[512];
>>>       struct kvm_s390_itdb itdb;    /* 0x0600 */
>>>       __u8 reserved700[2304];        /* 0x0700 */
>>>   };
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 490fde080107..97d3a81e5074 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>>>   static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>   {
>>>       int rc, exit_reason;
>>> +    struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
>>>         /*
>>>        * We try to hold kvm->srcu during most of vcpu_run (except when run-
>>> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>           guest_enter_irqoff();
>>>           __disable_cpu_timer_accounting(vcpu);
>>>           local_irq_enable();
>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>> +            memcpy(sie_page->pv_grregs,
>>> +                   vcpu->run->s.regs.gprs,
>>> +                   sizeof(sie_page->pv_grregs));
>>> +        }
>>>           exit_reason = sie64a(vcpu->arch.sie_block,
>>>                        vcpu->run->s.regs.gprs);
>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>> +            memcpy(vcpu->run->s.regs.gprs,
>>> +                   sie_page->pv_grregs,
>>> +                   sizeof(sie_page->pv_grregs));
>>> +        }
>>
>> sie64a will load/save gprs 0-13 from/to vcpu->run->s.regs.gprs.
>>
>> I would have assumed that this is not required for prot virt, because the HW has direct access via the sie block?
> 
> Yes, that is correct. The load/save in sie64a is not necessary for pv guests.
> 
>>
>>
>> 1. Would it make sense to have a specialized sie64a() (or a parameter, e.g., if you pass in NULL in r3), that optimizes this loading/saving? Eventually we can also optimize which host registers to save/restore then.
> 
> Having 2 kinds of sie64a seems not very nice for just saving a small number of cycles.
> 
>>
>> 2. Avoid this copying here. We have to store the state to vcpu->run->s.regs.gprs when returning to user space and restore the state when coming from user space.
> 
> I like this proposal better than the first one and
>>
>> Also, we access the GPRS from interception handlers, there we might use wrappers like
>>
>> kvm_s390_set_gprs()
>> kvm_s390_get_gprs()
> 
> having register accessors might be useful anyway.
> But I would like to defer that to a later point in time to keep the changes in here
> minimal?
> 
> We can add a "TODO" comment in here so that we do not forget about this
> for a future patch. Makes sense?

I second all of that :-)





^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-05 12:39       ` Janosch Frank
@ 2019-11-05 13:55         ` David Hildenbrand
  2019-11-05 14:11           ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-05 13:55 UTC (permalink / raw)
  To: Janosch Frank, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 05.11.19 13:39, Janosch Frank wrote:
> On 11/5/19 1:01 PM, Christian Borntraeger wrote:
>>
>>
>> On 04.11.19 12:25, David Hildenbrand wrote:
>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>> Guest registers for protected guests are stored at offset 0x380.
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>    arch/s390/include/asm/kvm_host.h |  4 +++-
>>>>    arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>>>    2 files changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>>> index 0ab309b7bf4c..5deabf9734d9 100644
>>>> --- a/arch/s390/include/asm/kvm_host.h
>>>> +++ b/arch/s390/include/asm/kvm_host.h
>>>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>>>    struct sie_page {
>>>>        struct kvm_s390_sie_block sie_block;
>>>>        struct mcck_volatile_info mcck_info;    /* 0x0200 */
>>>> -    __u8 reserved218[1000];        /* 0x0218 */
>>>> +    __u8 reserved218[360];        /* 0x0218 */
>>>> +    __u64 pv_grregs[16];        /* 0x380 */
>>>> +    __u8 reserved400[512];
>>>>        struct kvm_s390_itdb itdb;    /* 0x0600 */
>>>>        __u8 reserved700[2304];        /* 0x0700 */
>>>>    };
>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>> index 490fde080107..97d3a81e5074 100644
>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>>>>    static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>    {
>>>>        int rc, exit_reason;
>>>> +    struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
>>>>          /*
>>>>         * We try to hold kvm->srcu during most of vcpu_run (except when run-
>>>> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>            guest_enter_irqoff();
>>>>            __disable_cpu_timer_accounting(vcpu);
>>>>            local_irq_enable();
>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>> +            memcpy(sie_page->pv_grregs,
>>>> +                   vcpu->run->s.regs.gprs,
>>>> +                   sizeof(sie_page->pv_grregs));
>>>> +        }
>>>>            exit_reason = sie64a(vcpu->arch.sie_block,
>>>>                         vcpu->run->s.regs.gprs);
>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>> +            memcpy(vcpu->run->s.regs.gprs,
>>>> +                   sie_page->pv_grregs,
>>>> +                   sizeof(sie_page->pv_grregs));
>>>> +        }
>>>
>>> sie64a will load/save gprs 0-13 from/to vcpu->run->s.regs.gprs.
>>>
>>> I would have assumed that this is not required for prot virt, because the HW has direct access via the sie block?
>>
>> Yes, that is correct. The load/save in sie64a is not necessary for pv guests.
>>
>>>
>>>
>>> 1. Would it make sense to have a specialized sie64a() (or a parameter, e.g., if you pass in NULL in r3), that optimizes this loading/saving? Eventually we can also optimize which host registers to save/restore then.
>>
>> Having 2 kinds of sie64a seems not very nice for just saving a small number of cycles.
>>
>>>
>>> 2. Avoid this copying here. We have to store the state to vcpu->run->s.regs.gprs when returning to user space and restore the state when coming from user space.
>>
>> I like this proposal better than the first one and

It was actually an additional proposal :)

1. avoids unnecessary saving/loading/saving/restoring
2. avoids the two memcpy

>>>
>>> Also, we access the GPRS from interception handlers, there we might use wrappers like
>>>
>>> kvm_s390_set_gprs()
>>> kvm_s390_get_gprs()
>>
>> having register accessors might be useful anyway.
>> But I would like to defer that to a later point in time to keep the changes in here
>> minimal?
>>
>> We can add a "TODO" comment in here so that we do not forget about this
>> for a future patch. Makes sense?

While it makes sense, I guess one could come up with a patch for 2. in 
less than 30 minutes ... but yeah, whatever you prefer. ;)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-05 13:55         ` David Hildenbrand
@ 2019-11-05 14:11           ` Janosch Frank
  2019-11-05 14:18             ` David Hildenbrand
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-05 14:11 UTC (permalink / raw)
  To: David Hildenbrand, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor



On 11/5/19 2:55 PM, David Hildenbrand wrote:
> On 05.11.19 13:39, Janosch Frank wrote:
>> On 11/5/19 1:01 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 04.11.19 12:25, David Hildenbrand wrote:
>>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>>> Guest registers for protected guests are stored at offset 0x380.
>>>>>
>>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>>> ---
>>>>>    arch/s390/include/asm/kvm_host.h |  4 +++-
>>>>>    arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>>>>    2 files changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>>>> index 0ab309b7bf4c..5deabf9734d9 100644
>>>>> --- a/arch/s390/include/asm/kvm_host.h
>>>>> +++ b/arch/s390/include/asm/kvm_host.h
>>>>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>>>>    struct sie_page {
>>>>>        struct kvm_s390_sie_block sie_block;
>>>>>        struct mcck_volatile_info mcck_info;    /* 0x0200 */
>>>>> -    __u8 reserved218[1000];        /* 0x0218 */
>>>>> +    __u8 reserved218[360];        /* 0x0218 */
>>>>> +    __u64 pv_grregs[16];        /* 0x380 */
>>>>> +    __u8 reserved400[512];
>>>>>        struct kvm_s390_itdb itdb;    /* 0x0600 */
>>>>>        __u8 reserved700[2304];        /* 0x0700 */
>>>>>    };
>>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>>> index 490fde080107..97d3a81e5074 100644
>>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>>> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>>>>>    static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>    {
>>>>>        int rc, exit_reason;
>>>>> +    struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
>>>>>          /*
>>>>>         * We try to hold kvm->srcu during most of vcpu_run (except when run-
>>>>> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>            guest_enter_irqoff();
>>>>>            __disable_cpu_timer_accounting(vcpu);
>>>>>            local_irq_enable();
>>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>>> +            memcpy(sie_page->pv_grregs,
>>>>> +                   vcpu->run->s.regs.gprs,
>>>>> +                   sizeof(sie_page->pv_grregs));
>>>>> +        }
>>>>>            exit_reason = sie64a(vcpu->arch.sie_block,
>>>>>                         vcpu->run->s.regs.gprs);
>>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>>> +            memcpy(vcpu->run->s.regs.gprs,
>>>>> +                   sie_page->pv_grregs,
>>>>> +                   sizeof(sie_page->pv_grregs));
>>>>> +        }
>>>>
>>>> sie64a will load/save gprs 0-13 from/to vcpu->run->s.regs.gprs.
>>>>
>>>> I would have assumed that this is not required for prot virt, because the HW has direct access via the sie block?
>>>
>>> Yes, that is correct. The load/save in sie64a is not necessary for pv guests.
>>>
>>>>
>>>>
>>>> 1. Would it make sense to have a specialized sie64a() (or a parameter, e.g., if you pass in NULL in r3), that optimizes this loading/saving? Eventually we can also optimize which host registers to save/restore then.
>>>
>>> Having 2 kinds of sie64a seems not very nice for just saving a small number of cycles.
>>>
>>>>
>>>> 2. Avoid this copying here. We have to store the state to vcpu->run->s.regs.gprs when returning to user space and restore the state when coming from user space.
>>>
>>> I like this proposal better than the first one and
> 
> It was actually an additional proposal :)
> 
> 1. avoids unnecessary saving/loading/saving/restoring
> 2. avoids the two memcpy
> 
>>>>
>>>> Also, we access the GPRS from interception handlers, there we might use wrappers like
>>>>
>>>> kvm_s390_set_gprs()
>>>> kvm_s390_get_gprs()
>>>
>>> having register accessors might be useful anyway.
>>> But I would like to defer that to a later point in time to keep the changes in here
>>> minimal?
>>>
>>> We can add a "TODO" comment in here so that we do not forget about this
>>> for a future patch. Makes sense?
> 
> While it makes sense, I guess one could come up with a patch for 2. in 
> less than 30 minutes ... but yeah, whatever you prefer. ;)
> 

Just to get it fully right we'd need to:
a. Synchronize registers into/from vcpu run in sync_regs/store_regs
b. Sprinkle get/set_gpr(int nr) over most of the files in arch/s390/kvm

That's your proposal?




^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-05 14:11           ` Janosch Frank
@ 2019-11-05 14:18             ` David Hildenbrand
  2019-11-14 14:46               ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-05 14:18 UTC (permalink / raw)
  To: Janosch Frank, Christian Borntraeger, kvm
  Cc: linux-s390, thuth, imbrenda, mihajlov, mimu, cohuck, gor

On 05.11.19 15:11, Janosch Frank wrote:
> On 11/5/19 2:55 PM, David Hildenbrand wrote:
>> On 05.11.19 13:39, Janosch Frank wrote:
>>> On 11/5/19 1:01 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 04.11.19 12:25, David Hildenbrand wrote:
>>>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>>>> Guest registers for protected guests are stored at offset 0x380.
>>>>>>
>>>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>>>> ---
>>>>>>     arch/s390/include/asm/kvm_host.h |  4 +++-
>>>>>>     arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>>>>>     2 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>>>>> index 0ab309b7bf4c..5deabf9734d9 100644
>>>>>> --- a/arch/s390/include/asm/kvm_host.h
>>>>>> +++ b/arch/s390/include/asm/kvm_host.h
>>>>>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>>>>>     struct sie_page {
>>>>>>         struct kvm_s390_sie_block sie_block;
>>>>>>         struct mcck_volatile_info mcck_info;    /* 0x0200 */
>>>>>> -    __u8 reserved218[1000];        /* 0x0218 */
>>>>>> +    __u8 reserved218[360];        /* 0x0218 */
>>>>>> +    __u64 pv_grregs[16];        /* 0x380 */
>>>>>> +    __u8 reserved400[512];
>>>>>>         struct kvm_s390_itdb itdb;    /* 0x0600 */
>>>>>>         __u8 reserved700[2304];        /* 0x0700 */
>>>>>>     };
>>>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>>>> index 490fde080107..97d3a81e5074 100644
>>>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>>>> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>>>>>>     static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>>     {
>>>>>>         int rc, exit_reason;
>>>>>> +    struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
>>>>>>           /*
>>>>>>          * We try to hold kvm->srcu during most of vcpu_run (except when run-
>>>>>> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>>             guest_enter_irqoff();
>>>>>>             __disable_cpu_timer_accounting(vcpu);
>>>>>>             local_irq_enable();
>>>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>>>> +            memcpy(sie_page->pv_grregs,
>>>>>> +                   vcpu->run->s.regs.gprs,
>>>>>> +                   sizeof(sie_page->pv_grregs));
>>>>>> +        }
>>>>>>             exit_reason = sie64a(vcpu->arch.sie_block,
>>>>>>                          vcpu->run->s.regs.gprs);
>>>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>>>> +            memcpy(vcpu->run->s.regs.gprs,
>>>>>> +                   sie_page->pv_grregs,
>>>>>> +                   sizeof(sie_page->pv_grregs));
>>>>>> +        }
>>>>>
>>>>> sie64a will load/save gprs 0-13 from/to vcpu->run->s.regs.gprs.
>>>>>
>>>>> I would have assumed that this is not required for prot virt, because the HW has direct access via the sie block?
>>>>
>>>> Yes, that is correct. The load/save in sie64a is not necessary for pv guests.
>>>>
>>>>>
>>>>>
>>>>> 1. Would it make sense to have a specialized sie64a() (or a parameter, e.g., if you pass in NULL in r3), that optimizes this loading/saving? Eventually we can also optimize which host registers to save/restore then.
>>>>
>>>> Having 2 kinds of sie64a seems not very nice for just saving a small number of cycles.
>>>>
>>>>>
>>>>> 2. Avoid this copying here. We have to store the state to vcpu->run->s.regs.gprs when returning to user space and restore the state when coming from user space.
>>>>
>>>> I like this proposal better than the first one and
>>
>> It was actually an additional proposal :)
>>
>> 1. avoids unnecessary saving/loading/saving/restoring
>> 2. avoids the two memcpy
>>
>>>>>
>>>>> Also, we access the GPRS from interception handlers, there we might use wrappers like
>>>>>
>>>>> kvm_s390_set_gprs()
>>>>> kvm_s390_get_gprs()
>>>>
>>>> having register accessors might be useful anyway.
>>>> But I would like to defer that to a later point in time to keep the changes in here
>>>> minimal?
>>>>
>>>> We can add a "TODO" comment in here so that we do not forget about this
>>>> for a future patch. Makes sense?
>>
>> While it makes sense, I guess one could come up with a patch for 2. in
>> less than 30 minutes ... but yeah, whatever you prefer. ;)
>>
> 
> Just to get it fully right we'd need to:
> a. Synchronize registers into/from vcpu run in sync_regs/store_regs
> b. Sprinkle get/set_gpr(int nr) over most of the files in arch/s390/kvm
> 
> That's your proposal?

Yes. Patch 1, factor out gprs access. Patch 2, avoid the memcpy by 
fixing the gprs access functions and removing the memcpys. (both as 
addons to this patch)

I guess that should be it ... but maybe we'll stumble over surprises :)
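
For patch 1 I was thinking of something like this (only a sketch, the names are
the ones from the discussion above, treat the implementation as an assumption):

	/* sketch: route gpr access to the PV save area for protected guests */
	static inline u64 kvm_s390_get_gpr(struct kvm_vcpu *vcpu, int nr)
	{
		struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;

		if (kvm_s390_pv_is_protected(vcpu->kvm))
			return sie_page->pv_grregs[nr];
		return vcpu->run->s.regs.gprs[nr];
	}

	static inline void kvm_s390_set_gpr(struct kvm_vcpu *vcpu, int nr, u64 val)
	{
		struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;

		if (kvm_s390_pv_is_protected(vcpu->kvm))
			sie_page->pv_grregs[nr] = val;
		else
			vcpu->run->s.regs.gprs[nr] = val;
	}

Patch 2 would then drop the two memcpys around sie64a() and only copy in
sync_regs()/store_regs() for the user space round trip.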

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-24 11:40 ` [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls Janosch Frank
  2019-10-30 15:53   ` David Hildenbrand
@ 2019-11-05 17:51   ` Cornelia Huck
  2019-11-07 12:42     ` Michael Mueller
  2019-11-14 11:48   ` Thomas Huth
  2 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-05 17:51 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:35 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> From: Michael Mueller <mimu@linux.ibm.com>
> 
> Define the interruption injection codes and the related fields in the
> sie control block for PVM interruption injection.
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 6cc3b73ca904..82443236d4cc 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -215,7 +215,15 @@ struct kvm_s390_sie_block {
>  	__u8	icptcode;		/* 0x0050 */
>  	__u8	icptstatus;		/* 0x0051 */
>  	__u16	ihcpu;			/* 0x0052 */
> -	__u8	reserved54[2];		/* 0x0054 */
> +	__u8	reserved54;		/* 0x0054 */
> +#define IICTL_CODE_NONE		 0x00
> +#define IICTL_CODE_MCHK		 0x01
> +#define IICTL_CODE_EXT		 0x02
> +#define IICTL_CODE_IO		 0x03
> +#define IICTL_CODE_RESTART	 0x04
> +#define IICTL_CODE_SPECIFICATION 0x10
> +#define IICTL_CODE_OPERAND	 0x11
> +	__u8	iictl;			/* 0x0055 */
>  	__u16	ipa;			/* 0x0056 */
>  	__u32	ipb;			/* 0x0058 */
>  	__u32	scaoh;			/* 0x005c */
> @@ -252,7 +260,8 @@ struct kvm_s390_sie_block {
>  #define HPID_KVM	0x4
>  #define HPID_VSIE	0x5
>  	__u8	hpid;			/* 0x00b8 */
> -	__u8	reservedb9[11];		/* 0x00b9 */
> +	__u8	reservedb9[7];		/* 0x00b9 */
> +	__u32	eiparams;		/* 0x00c0 */
>  	__u16	extcpuaddr;		/* 0x00c4 */
>  	__u16	eic;			/* 0x00c6 */
>  	__u32	reservedc8;		/* 0x00c8 */
> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>  	__u8	oai;			/* 0x00e2 */
>  	__u8	armid;			/* 0x00e3 */
>  	__u8	reservede4[4];		/* 0x00e4 */
> -	__u64	tecmc;			/* 0x00e8 */
> -	__u8	reservedf0[12];		/* 0x00f0 */
> +	union {
> +		__u64	tecmc;		/* 0x00e8 */
> +		struct {
> +			__u16	subchannel_id;	/* 0x00e8 */
> +			__u16	subchannel_nr;	/* 0x00ea */
> +			__u32	io_int_parm;	/* 0x00ec */
> +			__u32	io_int_word;	/* 0x00f0 */
> +		};
> +	} __packed;
> +	__u8	reservedf4[8];		/* 0x00f4 */

IIUC, for protected guests, you won't get an interception for which
tecmc would be valid anymore, but need to put the I/O interruption
stuff at the same place, right?

My main issue is that this makes the control block definition a bit
ugly, since the f0 value that's unused in the non-protvirt case is not
obvious anymore; but I don't know how to express this without making it
even uglier :(

>  #define CRYCB_FORMAT_MASK 0x00000003
>  #define CRYCB_FORMAT0 0x00000000
>  #define CRYCB_FORMAT1 0x00000001

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions
  2019-10-24 11:40 ` [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions Janosch Frank
  2019-10-30 15:50   ` David Hildenbrand
@ 2019-11-05 18:04   ` Cornelia Huck
  2019-11-05 18:15     ` Christian Borntraeger
  1 sibling, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-05 18:04 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:34 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> Since KVM doesn't emulate any form of load control and load psw
> instructions anymore, we wouldn't get an interception if PSWs or CRs
> are changed in the guest. That means we can't inject IRQs right after
> the guest is enabled for them.
> 
> The new interception codes solve that problem by being a notification
> for changes to IRQ enablement relevant bits in CRs 0, 6 and 14, as
> well as the machine check mask bit in the PSW.
> 
> No special handling is needed for these interception codes, the KVM
> pre-run code will consult all necessary CRs and PSW bits and inject
> IRQs the guest is enabled for.

Just to clarify: The hypervisor can still access the relevant bits for
pv guests, this is only about the notification, right?

> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  2 ++
>  arch/s390/kvm/intercept.c        | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 16/37] KVM: s390: protvirt: Implement machine-check interruption injection
  2019-10-24 11:40 ` [RFC 16/37] KVM: s390: protvirt: Implement machine-check interruption injection Janosch Frank
@ 2019-11-05 18:11   ` Cornelia Huck
  0 siblings, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-05 18:11 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:38 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> From: Michael Mueller <mimu@linux.ibm.com>
> 
> Similar to external interrupts, the hypervisor can inject machine
> checks by providing the right data in the interrupt injection controls.
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> ---
>  arch/s390/kvm/interrupt.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index c919dfe4dfd3..1f87c7d3fa3e 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -568,6 +568,14 @@ static int __write_machine_check(struct kvm_vcpu *vcpu,
>  	union mci mci;
>  	int rc;
>  
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_MCHK;
> +		vcpu->arch.sie_block->mcic = mchk->mcic;
> +		vcpu->arch.sie_block->faddr = mchk->failing_storage_address;
> +		vcpu->arch.sie_block->edc = mchk->ext_damage_code;
> +		return 0;
> +	}
> +

The other stuff this function injects in the !pv case is inaccessible
to the hypervisor in the pv case, right? (Registers, extended save
area, ...) Maybe add a comment?

>  	mci.val = mchk->mcic;
>  	/* take care of lazy register loading */
>  	save_fpu_regs();

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions
  2019-11-05 18:04   ` Cornelia Huck
@ 2019-11-05 18:15     ` Christian Borntraeger
  2019-11-05 18:37       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Christian Borntraeger @ 2019-11-05 18:15 UTC (permalink / raw)
  To: Cornelia Huck, Janosch Frank
  Cc: kvm, linux-s390, thuth, david, imbrenda, mihajlov, mimu, gor



On 05.11.19 19:04, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:34 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> Since KVM doesn't emulate any form of load control and load psw
>> instructions anymore, we wouldn't get an interception if PSWs or CRs
>> are changed in the guest. That means we can't inject IRQs right after
>> the guest is enabled for them.
>>
>> The new interception codes solve that problem by being a notification
>> for changes to IRQ enablement relevant bits in CRs 0, 6 and 14, as
>> well as the machine check mask bit in the PSW.
>>
>> No special handling is needed for these interception codes, the KVM
>> pre-run code will consult all necessary CRs and PSW bits and inject
>> IRQs the guest is enabled for.
> 
> Just to clarify: The hypervisor can still access the relevant bits for
> pv guests, this is only about the notification, right?
> 

Yes, the hypervisor (KVM) can always read the relevant PSW bits (I,E,M) and
CR bits to decide if an interrupt can be delivered. All other bits of PSW
and CRx are masked though.
This is a new intercept for notification as we no longer get an IC4 (instruction
to handle) for load control and friends, so that we can re-check the bits.
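
As an illustration of which bits stay readable (sketch; the masks are the ones
KVM's s390 interrupt code already uses for non-PV guests, shown here only to
illustrate the idea):

	/* sketch: deliverability check that only needs the unmasked PSW/CR bits */
	static bool ext_call_deliverable(struct kvm_vcpu *vcpu)
	{
		return (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT) &&
		       (vcpu->arch.sie_block->gcr[0] & CR0_EXTERNAL_CALL_SUBMASK);
	}

Everything else in the guest PSW and CRs is opaque to KVM for a PV guest.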
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  2 ++
>>  arch/s390/kvm/intercept.c        | 18 ++++++++++++++++++
>>  2 files changed, 20 insertions(+)
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 12/37] KVM: s390: protvirt: Handle SE notification interceptions
  2019-11-05 18:15     ` Christian Borntraeger
@ 2019-11-05 18:37       ` Cornelia Huck
  0 siblings, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-05 18:37 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, imbrenda, mihajlov,
	mimu, gor

On Tue, 5 Nov 2019 19:15:19 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 05.11.19 19:04, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:34 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> >> Since KVM doesn't emulate any form of load control and load psw
> >> instructions anymore, we wouldn't get an interception if PSWs or CRs
> >> are changed in the guest. That means we can't inject IRQs right after
> >> the guest is enabled for them.
> >>
> >> The new interception codes solve that problem by being a notification
> >> for changes to IRQ enablement relevant bits in CRs 0, 6 and 14, as
> >> well as the machine check mask bit in the PSW.
> >>
> >> No special handling is needed for these interception codes, the KVM
> >> pre-run code will consult all necessary CRs and PSW bits and inject
> >> IRQs the guest is enabled for.  
> > 
> > Just to clarify: The hypervisor can still access the relevant bits for
> > pv guests, this is only about the notification, right?
> >   
> 
> Yes, the hypervisor (KVM) can always read the relevant PSW bits (I,E,M) and
> CR bits to decide if an interrupt can be delivered. All other bits of PSW
> and CRx are masked though.
> This is a new intercept for notification as we no longer get an IC4 (instruction
> to handle) for load control and friends, so that we can re-check the bits.

Ok, thanks!

> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> ---
> >>  arch/s390/include/asm/kvm_host.h |  2 ++
> >>  arch/s390/kvm/intercept.c        | 18 ++++++++++++++++++
> >>  2 files changed, 20 insertions(+)  
> >   
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-10-24 11:40 ` [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL Janosch Frank
@ 2019-11-06 16:48   ` Cornelia Huck
  2019-11-06 17:05     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-06 16:48 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:52 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> Description of changes that are necessary to move a KVM VM into
> Protected Virtualization mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt
> 
> diff --git a/Documentation/virtual/kvm/s390-pv-boot.txt b/Documentation/virtual/kvm/s390-pv-boot.txt
> new file mode 100644
> index 000000000000..af883c928c08
> --- /dev/null
> +++ b/Documentation/virtual/kvm/s390-pv-boot.txt
> @@ -0,0 +1,62 @@
> +Boot/IPL of Protected VMs
> +========================
> +
> +Summary:
> +
> +Protected VMs are encrypted while not running. On IPL a small
> +plaintext bootloader is started which provides information about the
> +encrypted components and necessary metadata to KVM to decrypt it.
> +
> +Based on this data, KVM will make the PV known to the Ultravisor and
> +instruct it to secure its memory, decrypt the components and verify
> +the data and address list hashes, to ensure integrity. Afterwards KVM
> +can run the PV via SIE which the UV will intercept and execute on
> +KVM's behalf.
> +
> +The switch into PV mode lets us load encrypted guest executables and
> +data via every available method (network, dasd, scsi, direct kernel,
> +...) without the need to change the boot process.
> +
> +
> +Diag308:
> +
> +This diagnose instruction is the basis vor VM IPL. The VM can set and

s/vor/for/

> +retrieve IPL information blocks, that specify the IPL method/devices
> +and request VM memory and subsystem resets, as well as IPLs.
> +
> +For PVs this concept has been continued with new subcodes:
> +
> +Subcode 8: Set an IPL Information Block of type 5.
> +Subcode 9: Store the saved block in guest memory
> +Subcode 10: Move into Protected Virtualization mode
> +
> +The new PV load-device-specific-parameters field specifies all data,
> +that is necessary to move into PV mode.
> +
> +* PV Header origin
> +* PV Header length
> +* List of Components composed of:
> +  * AES-XTS Tweak prefix
> +  * Origin
> +  * Size
> +
> +The PV header contains the keys and hashes, which the UV will use to
> +decrypt and verify the PV, as well as control flags and a start PSW.
> +
> +The components are for instance an encrypted kernel, kernel cmd and
> +initrd. The components are decrypted by the UV.
> +
> +All non-decrypted data of the non-PV guest instance are zero on first
> +access of the PV.
> +
> +
> +When running in a protected mode some subcodes will result in
> +exceptions or return error codes.
> +
> +Subcodes 4 and 7 will result in specification exceptions.
> +When removing a secure VM, the UV will clear all memory, so we can't
> +have non-clearing IPL subcodes.
> +
> +Subcodes 8, 9, 10 will result in specification exceptions.
> +Re-IPL into a protected mode is only possible via a detour into non
> +protected mode.

So... what do we IPL from? Is there still a need for the bios?

(Sorry, I'm a bit confused here.)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-11-06 16:48   ` Cornelia Huck
@ 2019-11-06 17:05     ` Janosch Frank
  2019-11-06 17:37       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-06 17:05 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor



On 11/6/19 5:48 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:52 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> Description of changes that are necessary to move a KVM VM into
>> Protected Virtualization mode.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
>>  create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt
>>
>> diff --git a/Documentation/virtual/kvm/s390-pv-boot.txt b/Documentation/virtual/kvm/s390-pv-boot.txt
>> new file mode 100644
>> index 000000000000..af883c928c08
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/s390-pv-boot.txt
>> @@ -0,0 +1,62 @@
>> +Boot/IPL of Protected VMs
>> +========================
>> +
>> +Summary:
>> +
>> +Protected VMs are encrypted while not running. On IPL a small
>> +plaintext bootloader is started which provides information about the
>> +encrypted components and necessary metadata to KVM to decrypt it.
>> +
>> +Based on this data, KVM will make the PV known to the Ultravisor and
>> +instruct it to secure its memory, decrypt the components and verify
>> +the data and address list hashes, to ensure integrity. Afterwards KVM
>> +can run the PV via SIE which the UV will intercept and execute on
>> +KVM's behalf.
>> +
>> +The switch into PV mode lets us load encrypted guest executables and
>> +data via every available method (network, dasd, scsi, direct kernel,
>> +...) without the need to change the boot process.
>> +
>> +
>> +Diag308:
>> +
>> +This diagnose instruction is the basis vor VM IPL. The VM can set and
> 
> s/vor/for/
> 
>> +retrieve IPL information blocks, that specify the IPL method/devices
>> +and request VM memory and subsystem resets, as well as IPLs.
>> +
>> +For PVs this concept has been continued with new subcodes:
>> +
>> +Subcode 8: Set an IPL Information Block of type 5.
>> +Subcode 9: Store the saved block in guest memory
>> +Subcode 10: Move into Protected Virtualization mode
>> +
>> +The new PV load-device-specific-parameters field specifies all data,
>> +that is necessary to move into PV mode.
>> +
>> +* PV Header origin
>> +* PV Header length
>> +* List of Components composed of:
>> +  * AES-XTS Tweak prefix
>> +  * Origin
>> +  * Size
>> +
>> +The PV header contains the keys and hashes, which the UV will use to
>> +decrypt and verify the PV, as well as control flags and a start PSW.
>> +
>> +The components are for instance an encrypted kernel, kernel cmd and
>> +initrd. The components are decrypted by the UV.
>> +
>> +All non-decrypted data of the non-PV guest instance are zero on first
>> +access of the PV.
>> +
>> +
>> +When running in a protected mode some subcodes will result in
>> +exceptions or return error codes.
>> +
>> +Subcodes 4 and 7 will result in specification exceptions.
>> +When removing a secure VM, the UV will clear all memory, so we can't
>> +have non-clearing IPL subcodes.
>> +
>> +Subcodes 8, 9, 10 will result in specification exceptions.
>> +Re-IPL into a protected mode is only possible via a detour into non
>> +protected mode.
> 
> So... what do we IPL from? Is there still a need for the bios?
> 
> (Sorry, I'm a bit confused here.)
> 

We load a blob via the bios (all methods are supported) and that blob
moves itself into protected mode. I.e. it has a small unprotected stub,
the rest is an encrypted kernel.
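
For illustration, the blob could be pictured roughly like this; the
layout and the sizes are assumptions of mine, not a defined format:

/*
 * Rough, assumed picture of the IPL blob -- not a defined interface.
 * The plaintext stub runs first in normal (unprotected) mode; the PV
 * header and the encrypted payload are only interpreted by the UV.
 */
struct se_boot_blob {
	unsigned char stub[0x2000];      /* plaintext loader stub (size assumed) */
	unsigned char pv_header[0x1000]; /* keys, hashes, flags, start PSW */
	unsigned char payload[];         /* encrypted kernel (+ initrd, cmdline) */
};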


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-11-06 17:05     ` Janosch Frank
@ 2019-11-06 17:37       ` Cornelia Huck
  2019-11-06 21:02         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-06 17:37 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

[-- Attachment #1: Type: text/plain, Size: 997 bytes --]

On Wed, 6 Nov 2019 18:05:22 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/6/19 5:48 PM, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:52 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> >> Description of changes that are necessary to move a KVM VM into
> >> Protected Virtualization mode.
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> ---
> >>  Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
> >>  1 file changed, 62 insertions(+)
> >>  create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt

> > So... what do we IPL from? Is there still a need for the bios?
> > 
> > (Sorry, I'm a bit confused here.)
> >   
> 
> We load a blob via the bios (all methods are supported) and that blob
> moves itself into protected mode. I.e. it has a small unprotected stub,
> the rest is an encrypted kernel.
> 

Ok. The magic is in the loaded kernel, and we don't need modifications
to the bios?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-11-06 17:37       ` Cornelia Huck
@ 2019-11-06 21:02         ` Janosch Frank
  2019-11-07  8:53           ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-06 21:02 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 1776 bytes --]

On 11/6/19 6:37 PM, Cornelia Huck wrote:
> On Wed, 6 Nov 2019 18:05:22 +0100
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 11/6/19 5:48 PM, Cornelia Huck wrote:
>>> On Thu, 24 Oct 2019 07:40:52 -0400
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>>   
>>>> Description of changes that are necessary to move a KVM VM into
>>>> Protected Virtualization mode.
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>  Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
>>>>  1 file changed, 62 insertions(+)
>>>>  create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt
> 
>>> So... what do we IPL from? Is there still a need for the bios?
>>>
>>> (Sorry, I'm a bit confused here.)
>>>   
>>
>> We load a blob via the bios (all methods are supported) and that blob
>> moves itself into protected mode. I.e. it has a small unprotected stub,
>> the rest is an encrypted kernel.
>>
> 
> Ok. The magic is in the loaded kernel, and we don't need modifications
> to the bios?
> 

Yes.

The order is:
* We load a blob via the bios or direct kernel boot.
* That blob consists of a small stub, a header and an encrypted blob
glued together
* The small stub does the diag 308 subcodes 8 and 10 (sketched below).
* Subcode 8 basically passes the header that describes the encrypted
blob to the Ultravisor (well, rather registers it with qemu to pass on later)
* Subcode 10 tells QEMU to move the VM into protected mode
* A lot of APIs in KVM and the Ultravisor are called
* The protected VM starts
* A memory mover copies the now unencrypted but protected kernel to its
intended place and jumps into the entry function
* Linux boots and detects that it is protected and needs to use bounce
buffers
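
A minimal sketch of the stub's part of that sequence; the diag308()
wrapper mirrors the one in arch/s390/kernel/ipl.c, while the subcode
names, the rc check and the helper around them are only illustrative:

#define DIAG308_SET_PV_IPIB	8	/* register the type-5 IPIB */
#define DIAG308_UNPACK_PV	10	/* request the switch to protected mode */

static inline unsigned long diag308(unsigned long subcode, void *addr)
{
	register unsigned long _addr asm("0") = (unsigned long)addr;
	register unsigned long _rc asm("1") = 0;

	asm volatile(
		"	diag	%0,%2,0x308\n"
		: "+d" (_addr), "=d" (_rc)
		: "d" (subcode)
		: "cc", "memory");
	return _rc;
}

static void stub_enter_protected_mode(void *type5_ipib)
{
	/* subcode 8: hand the IPIB describing the encrypted blob to QEMU,
	 * which registers it and later passes it on to the Ultravisor */
	if (diag308(DIAG308_SET_PV_IPIB, type5_ipib) != 1 /* DIAG308_RC_OK */)
		return;	/* stay in normal mode and report an error */

	/* subcode 10: request the transition; on success execution
	 * continues at the start PSW from the now-decrypted PV header */
	diag308(DIAG308_UNPACK_PV, NULL);
}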


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-11-06 21:02         ` Janosch Frank
@ 2019-11-07  8:53           ` Cornelia Huck
  2019-11-07  8:59             ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-07  8:53 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

On Wed, 6 Nov 2019 22:02:41 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/6/19 6:37 PM, Cornelia Huck wrote:
> > On Wed, 6 Nov 2019 18:05:22 +0100
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> >> On 11/6/19 5:48 PM, Cornelia Huck wrote:  
> >>> On Thu, 24 Oct 2019 07:40:52 -0400
> >>> Janosch Frank <frankja@linux.ibm.com> wrote:
> >>>     
> >>>> Description of changes that are necessary to move a KVM VM into
> >>>> Protected Virtualization mode.
> >>>>
> >>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >>>> ---
> >>>>  Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
> >>>>  1 file changed, 62 insertions(+)
> >>>>  create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt  
> >   
> >>> So... what do we IPL from? Is there still a need for the bios?
> >>>
> >>> (Sorry, I'm a bit confused here.)
> >>>     
> >>
> >> We load a blob via the bios (all methods are supported) and that blob
> >> moves itself into protected mode. I.e. it has a small unprotected stub,
> >> the rest is an encrypted kernel.
> >>  
> > 
> > Ok. The magic is in the loaded kernel, and we don't need modifications
> > to the bios?
> >   
> 
> Yes.
> 
> The order is:
> * We load a blob via the bios or direct kernel boot.
> * That blob consists of a small stub, a header and an encrypted blob
> glued together
> * The small stub does the diag 308 subcode 8 and 10.
> * Subcode 8 basically passes the header that describes the encrypted
> blob to the Ultravisor (well rather registers it with qemu to pass on later)
> * Subcode 10 tells QEMU to move the VM into protected mode
> * A lot of APIs in KVM and the Ultravisor are called
> * The protected VM starts
> * A memory mover copies the now unencrypted, but protected kernel to its
> intended place and jumps into the entry function
> * Linux boots and detects, that it is protected and needs to use bounce
> buffers
> 

Thanks, this explanation makes things much clearer.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL
  2019-11-07  8:53           ` Cornelia Huck
@ 2019-11-07  8:59             ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-07  8:59 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 2312 bytes --]

On 11/7/19 9:53 AM, Cornelia Huck wrote:
> On Wed, 6 Nov 2019 22:02:41 +0100
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 11/6/19 6:37 PM, Cornelia Huck wrote:
>>> On Wed, 6 Nov 2019 18:05:22 +0100
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>>   
>>>> On 11/6/19 5:48 PM, Cornelia Huck wrote:  
>>>>> On Thu, 24 Oct 2019 07:40:52 -0400
>>>>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>>>>     
>>>>>> Description of changes that are necessary to move a KVM VM into
>>>>>> Protected Virtualization mode.
>>>>>>
>>>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>>>> ---
>>>>>>  Documentation/virtual/kvm/s390-pv-boot.txt | 62 ++++++++++++++++++++++
>>>>>>  1 file changed, 62 insertions(+)
>>>>>>  create mode 100644 Documentation/virtual/kvm/s390-pv-boot.txt  
>>>   
>>>>> So... what do we IPL from? Is there still a need for the bios?
>>>>>
>>>>> (Sorry, I'm a bit confused here.)
>>>>>     
>>>>
>>>> We load a blob via the bios (all methods are supported) and that blob
>>>> moves itself into protected mode. I.e. it has a small unprotected stub,
>>>> the rest is an encrypted kernel.
>>>>  
>>>
>>> Ok. The magic is in the loaded kernel, and we don't need modifications
>>> to the bios?
>>>   
>>
>> Yes.
>>
>> The order is:
>> * We load a blob via the bios or direct kernel boot.
>> * That blob consists of a small stub, a header and an encrypted blob
>> glued together
>> * The small stub does the diag 308 subcode 8 and 10.
>> * Subcode 8 basically passes the header that describes the encrypted
>> blob to the Ultravisor (well rather registers it with qemu to pass on later)
>> * Subcode 10 tells QEMU to move the VM into protected mode
>> * A lot of APIs in KVM and the Ultravisor are called
>> * The protected VM starts
>> * A memory mover copies the now unencrypted, but protected kernel to its
>> intended place and jumps into the entry function
>> * Linux boots and detects, that it is protected and needs to use bounce
>> buffers
>>
> 
> Thanks, this explanation makes things much clearer.

NP
We seem to assume that all of this is easily understandable, but we are
obviously biased :-)
I'll try to improve the documentation by adding Pierre to the discussion, as
he hasn't been involved in the project yet.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-11-05 17:51   ` Cornelia Huck
@ 2019-11-07 12:42     ` Michael Mueller
  0 siblings, 0 replies; 213+ messages in thread
From: Michael Mueller @ 2019-11-07 12:42 UTC (permalink / raw)
  To: Cornelia Huck, Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov, gor



On 05.11.19 18:51, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:35 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> From: Michael Mueller <mimu@linux.ibm.com>
>>
>> Define the interruption injection codes and the related fields in the
>> sie control block for PVM interruption injection.
>>
>> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h | 25 +++++++++++++++++++++----
>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 6cc3b73ca904..82443236d4cc 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -215,7 +215,15 @@ struct kvm_s390_sie_block {
>>   	__u8	icptcode;		/* 0x0050 */
>>   	__u8	icptstatus;		/* 0x0051 */
>>   	__u16	ihcpu;			/* 0x0052 */
>> -	__u8	reserved54[2];		/* 0x0054 */
>> +	__u8	reserved54;		/* 0x0054 */
>> +#define IICTL_CODE_NONE		 0x00
>> +#define IICTL_CODE_MCHK		 0x01
>> +#define IICTL_CODE_EXT		 0x02
>> +#define IICTL_CODE_IO		 0x03
>> +#define IICTL_CODE_RESTART	 0x04
>> +#define IICTL_CODE_SPECIFICATION 0x10
>> +#define IICTL_CODE_OPERAND	 0x11
>> +	__u8	iictl;			/* 0x0055 */
>>   	__u16	ipa;			/* 0x0056 */
>>   	__u32	ipb;			/* 0x0058 */
>>   	__u32	scaoh;			/* 0x005c */
>> @@ -252,7 +260,8 @@ struct kvm_s390_sie_block {
>>   #define HPID_KVM	0x4
>>   #define HPID_VSIE	0x5
>>   	__u8	hpid;			/* 0x00b8 */
>> -	__u8	reservedb9[11];		/* 0x00b9 */
>> +	__u8	reservedb9[7];		/* 0x00b9 */
>> +	__u32	eiparams;		/* 0x00c0 */
>>   	__u16	extcpuaddr;		/* 0x00c4 */
>>   	__u16	eic;			/* 0x00c6 */
>>   	__u32	reservedc8;		/* 0x00c8 */
>> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>>   	__u8	oai;			/* 0x00e2 */
>>   	__u8	armid;			/* 0x00e3 */
>>   	__u8	reservede4[4];		/* 0x00e4 */
>> -	__u64	tecmc;			/* 0x00e8 */
>> -	__u8	reservedf0[12];		/* 0x00f0 */
>> +	union {
>> +		__u64	tecmc;		/* 0x00e8 */
>> +		struct {
>> +			__u16	subchannel_id;	/* 0x00e8 */
>> +			__u16	subchannel_nr;	/* 0x00ea */
>> +			__u32	io_int_parm;	/* 0x00ec */
>> +			__u32	io_int_word;	/* 0x00f0 */
>> +		};
>> +	} __packed;
>> +	__u8	reservedf4[8];		/* 0x00f4 */
> 
> IIUC, for protected guests, you won't get an interception for which
> tecmc would be valid anymore, but need to put the I/O interruption
> stuff at the same place, right?

Yes, the format 4 architecture defines this.

> 
> My main issue is that this makes the control block definition a bit
> ugly, since the f0 value that's unused in the non-protvirt case is not
> obvious anymore; but I don't know how to express this without making it
> even uglier :(

:)

> 
>>   #define CRYCB_FORMAT_MASK 0x00000003
>>   #define CRYCB_FORMAT0 0x00000000
>>   #define CRYCB_FORMAT1 0x00000001
> 

Thanks
Michael

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-10-24 11:40 ` [RFC 03/37] s390/protvirt: add ultravisor initialization Janosch Frank
  2019-10-25  9:21   ` David Hildenbrand
  2019-11-01 10:07   ` Christian Borntraeger
@ 2019-11-07 15:28   ` Cornelia Huck
  2019-11-07 15:32     ` Janosch Frank
  2 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-07 15:28 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:25 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Before being able to host protected virtual machines, donate some of
> the memory to the ultravisor. Besides that the ultravisor might impose
> addressing limitations for memory used to back protected VM storage. Treat
> that limit as protected virtualization host's virtual memory limit.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 16 ++++++++++++
>  arch/s390/kernel/setup.c   |  3 +++
>  arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 72 insertions(+)

(...)

> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 35ce89695509..f7778493e829 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -45,4 +45,57 @@ static int __init prot_virt_setup(char *val)
>  	return rc;
>  }
>  early_param("prot_virt", prot_virt_setup);
> +
> +static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
> +{
> +	struct uv_cb_init uvcb = {
> +		.header.cmd = UVC_CMD_INIT_UV,
> +		.header.len = sizeof(uvcb),
> +		.stor_origin = stor_base,
> +		.stor_len = stor_len,
> +	};
> +	int cc;
> +
> +	cc = uv_call(0, (uint64_t)&uvcb);
> +	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
> +		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
> +		       uvcb.header.rc);
> +		return -1;

Is there any reasonable case where that call might fail if we have the
facility installed? Bad stor_base, maybe?

> +	}
> +	return 0;
> +}
> +
> +void __init setup_uv(void)
> +{
> +	unsigned long uv_stor_base;
> +
> +	if (!prot_virt_host)
> +		return;
> +
> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
> +	if (!uv_stor_base) {
> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
> +			uv_info.uv_base_stor_len);
> +		goto fail;
> +	}
> +
> +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
> +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
> +		goto fail;
> +	}
> +
> +	pr_info("Reserving %luMB as ultravisor base storage\n",
> +		uv_info.uv_base_stor_len >> 20);
> +	return;
> +fail:
> +	prot_virt_host = 0;

So, what happens if the user requested protected virtualization and any
of the above failed? We turn off host support, so any attempt to start
a protected virtualization guest on that host will fail (hopefully with
a meaningful error), I guess.

Is there any use case where we'd want to make failure to set this up
fatal?

> +}
> +
> +void adjust_to_uv_max(unsigned long *vmax)
> +{
> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
> +		*vmax = uv_info.max_sec_stor_addr;
> +}
>  #endif

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 03/37] s390/protvirt: add ultravisor initialization
  2019-11-07 15:28   ` Cornelia Huck
@ 2019-11-07 15:32     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-07 15:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 3381 bytes --]

On 11/7/19 4:28 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:25 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> From: Vasily Gorbik <gor@linux.ibm.com>
>>
>> Before being able to host protected virtual machines, donate some of
>> the memory to the ultravisor. Besides that the ultravisor might impose
>> addressing limitations for memory used to back protected VM storage. Treat
>> that limit as protected virtualization host's virtual memory limit.
>>
>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 16 ++++++++++++
>>  arch/s390/kernel/setup.c   |  3 +++
>>  arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 72 insertions(+)
> 
> (...)
> 
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index 35ce89695509..f7778493e829 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -45,4 +45,57 @@ static int __init prot_virt_setup(char *val)
>>  	return rc;
>>  }
>>  early_param("prot_virt", prot_virt_setup);
>> +
>> +static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
>> +{
>> +	struct uv_cb_init uvcb = {
>> +		.header.cmd = UVC_CMD_INIT_UV,
>> +		.header.len = sizeof(uvcb),
>> +		.stor_origin = stor_base,
>> +		.stor_len = stor_len,
>> +	};
>> +	int cc;
>> +
>> +	cc = uv_call(0, (uint64_t)&uvcb);
>> +	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
>> +		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
>> +		       uvcb.header.rc);
>> +		return -1;
> 
> Is there any reasonable case where that call might fail if we have the
> facility installed? Bad stor_base, maybe?

Yes, a wrong storage location or length, etc.
It can also fail if we are running with more than one CPU or if the
Ultravisor encountered some internal error.

> 
>> +	}
>> +	return 0;
>> +}
>> +
>> +void __init setup_uv(void)
>> +{
>> +	unsigned long uv_stor_base;
>> +
>> +	if (!prot_virt_host)
>> +		return;
>> +
>> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
>> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
>> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
>> +	if (!uv_stor_base) {
>> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
>> +			uv_info.uv_base_stor_len);
>> +		goto fail;
>> +	}
>> +
>> +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
>> +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
>> +		goto fail;
>> +	}
>> +
>> +	pr_info("Reserving %luMB as ultravisor base storage\n",
>> +		uv_info.uv_base_stor_len >> 20);
>> +	return;
>> +fail:
>> +	prot_virt_host = 0;
> 
> So, what happens if the user requested protected virtualization and any
> of the above failed? We turn off host support, so any attempt to start
> a protected virtualization guest on that host will fail (hopefully with
> a meaningful error), I guess.
> 

STFLE facility 161 and the associated diag308 subcodes 8-10 will not be
available to any VM. So yes, the stub that starts a protected guest will
print a message.
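
For what it's worth, a guest-side guard might look like this
(test_facility() is the existing helper; everything else, including the
message, is made up for illustration):

#include <linux/printk.h>
#include <asm/facility.h>

/* refuse the PV transition early instead of running into a
 * specification exception on the diag308 subcodes later on */
static bool uv_transition_possible(void)
{
	if (!test_facility(161)) {
		pr_err("Ultravisor facility (161) not available\n");
		return false;
	}
	return true;
}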

> Is there any use case where we'd want to make failure to set this up
> fatal?

Not really.

> 
>> +}
>> +
>> +void adjust_to_uv_max(unsigned long *vmax)
>> +{
>> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>> +		*vmax = uv_info.max_sec_stor_addr;
>> +}
>>  #endif
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
  2019-10-25  8:58   ` David Hildenbrand
  2019-11-04  8:18   ` Christian Borntraeger
@ 2019-11-07 16:29   ` Cornelia Huck
  2019-11-08  7:36     ` Janosch Frank
  2019-11-08 13:44   ` Thomas Huth
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-07 16:29 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:26 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> Let's add a KVM interface to create and destroy protected VMs.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  24 +++-
>  arch/s390/include/asm/uv.h       | 110 ++++++++++++++
>  arch/s390/kvm/Makefile           |   2 +-
>  arch/s390/kvm/kvm-s390.c         | 173 +++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h         |  47 ++++++
>  arch/s390/kvm/pv.c               | 237 +++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  33 +++++

Any new ioctls and caps probably want a mention in
Documentation/virt/kvm/api.txt :)

>  7 files changed, 622 insertions(+), 4 deletions(-)
>  create mode 100644 arch/s390/kvm/pv.c

(...)

> @@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
>  	return r;
>  }
>  
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +	void __user *argp = (void __user *)cmd->data;
> +
> +	switch (cmd->cmd) {
> +	case KVM_PV_VM_CREATE: {
> +		r = kvm_s390_pv_alloc_vm(kvm);
> +		if (r)
> +			break;
> +
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		/* FMT 4 SIE needs esca */
> +		r = sca_switch_to_extended(kvm);
> +		if (!r)
> +			r = kvm_s390_pv_create_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_DESTROY: {
> +		/* All VCPUs have to be destroyed before this call. */
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		r = kvm_s390_pv_destroy_vm(kvm);
> +		if (!r)
> +			kvm_s390_pv_dealloc_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}

Would be helpful to have some code that shows when/how these are called
- do you have any plans to post something soon?

(...)

> @@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  
>  	if (vcpu->kvm->arch.use_cmma)
>  		kvm_s390_vcpu_unsetup_cmma(vcpu);
> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> +	    kvm_s390_pv_handle_cpu(vcpu))

I was a bit confused by that function name... maybe
kvm_s390_pv_cpu_get_handle()?

Also, if this always returns 0 if the config option is off, you
probably don't need to check for that option?

> +		kvm_s390_pv_destroy_cpu(vcpu);
>  	free_page((unsigned long)(vcpu->arch.sie_block));
>  
>  	kvm_vcpu_uninit(vcpu);
> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
>  	kvm_free_vcpus(kvm);
>  	sca_dispose(kvm);
> -	debug_unregister(kvm->arch.dbf);
>  	kvm_s390_gisa_destroy(kvm);
> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> +	    kvm_s390_pv_is_protected(kvm)) {
> +		kvm_s390_pv_destroy_vm(kvm);
> +		kvm_s390_pv_dealloc_vm(kvm);

It seems the pv vm can be either destroyed via the ioctl above or in
the course of normal vm destruction. When is which way supposed to be
used? Also, it seems kvm_s390_pv_destroy_vm() can fail -- can that be a
problem in this code path?

> +	}
> +	debug_unregister(kvm->arch.dbf);
>  	free_page((unsigned long)kvm->arch.sie_page2);
>  	if (!kvm_is_ucontrol(kvm))
>  		gmap_remove(kvm->arch.gmap);

(...)

> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> index 6d9448dbd052..0d61dcc51f0e 100644
> --- a/arch/s390/kvm/kvm-s390.h
> +++ b/arch/s390/kvm/kvm-s390.h
> @@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
>  	return kvm->arch.user_cpu_state_ctrl != 0;
>  }
>  
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +/* implemented in pv.c */
> +void kvm_s390_pv_unpin(struct kvm *kvm);
> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
> +int kvm_s390_pv_alloc_vm(struct kvm *kvm);
> +int kvm_s390_pv_create_vm(struct kvm *kvm);
> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
> +int kvm_s390_pv_destroy_vm(struct kvm *kvm);
> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak);
> +int kvm_s390_pv_verify(struct kvm *kvm);
> +
> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
> +{
> +	return !!kvm->arch.pv.handle;
> +}
> +
> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm)

This function name is less confusing than the one below, but maybe also
rename this to kvm_s390_pv_get_handle() for consistency?

> +{
> +	return kvm->arch.pv.handle;
> +}
> +
> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pv.handle;
> +}
> +#else
> +static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
> +static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
> +static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
> +static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
> +static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
> +static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
> +static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
> +static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
> +					    u64 origin, u64 length) { return 0; }
> +static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
> +				     unsigned long size,  unsigned long tweak)
> +{ return 0; }
> +static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
> +#endif
> +
>  /* implemented in interrupt.c */
>  int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
>  void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);

(...)

> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
> +{
> +	int rc;
> +	struct uv_cb_csc uvcb = {
> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
> +		.header.len = sizeof(uvcb),
> +	};
> +
> +	/* EEXIST and ENOENT? */

?

> +	if (kvm_s390_pv_handle_cpu(vcpu))
> +		return -EINVAL;
> +
> +	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
> +						   get_order(uv_info.guest_cpu_stor_len));
> +	if (!vcpu->arch.pv.stor_base)
> +		return -ENOMEM;
> +
> +	/* Input */
> +	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
> +	uvcb.num = vcpu->arch.sie_block->icpua;
> +	uvcb.state_origin = (u64)vcpu->arch.sie_block;
> +	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
> +
> +	rc = uv_call(0, (u64)&uvcb);
> +	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
> +		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
> +		   uvcb.header.rrc);
> +
> +	/* Output */
> +	vcpu->arch.pv.handle = uvcb.cpu_handle;
> +	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
> +	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
> +	vcpu->arch.sie_block->sdf = 2;
> +	if (!rc)
> +		return 0;
> +
> +	kvm_s390_pv_destroy_cpu(vcpu);
> +	return -EINVAL;
> +}

(...)

Only a quick readthrough, as this patch is longish.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-07 16:29   ` Cornelia Huck
@ 2019-11-08  7:36     ` Janosch Frank
  2019-11-11 16:25       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-08  7:36 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 8349 bytes --]

On 11/7/19 5:29 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:26 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> Let's add a KVM interface to create and destroy protected VMs.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  24 +++-
>>  arch/s390/include/asm/uv.h       | 110 ++++++++++++++
>>  arch/s390/kvm/Makefile           |   2 +-
>>  arch/s390/kvm/kvm-s390.c         | 173 +++++++++++++++++++++-
>>  arch/s390/kvm/kvm-s390.h         |  47 ++++++
>>  arch/s390/kvm/pv.c               | 237 +++++++++++++++++++++++++++++++
>>  include/uapi/linux/kvm.h         |  33 +++++
> 
> Any new ioctls and caps probably want a mention in
> Documentation/virt/kvm/api.txt :)

Noted

> 
>>  7 files changed, 622 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/s390/kvm/pv.c
> 
> (...)
> 
>> @@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
>>  	return r;
>>  }
>>  
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>> +{
>> +	int r = 0;
>> +	void __user *argp = (void __user *)cmd->data;
>> +
>> +	switch (cmd->cmd) {
>> +	case KVM_PV_VM_CREATE: {
>> +		r = kvm_s390_pv_alloc_vm(kvm);
>> +		if (r)
>> +			break;
>> +
>> +		mutex_lock(&kvm->lock);
>> +		kvm_s390_vcpu_block_all(kvm);
>> +		/* FMT 4 SIE needs esca */
>> +		r = sca_switch_to_extended(kvm);
>> +		if (!r)
>> +			r = kvm_s390_pv_create_vm(kvm);
>> +		kvm_s390_vcpu_unblock_all(kvm);
>> +		mutex_unlock(&kvm->lock);
>> +		break;
>> +	}
>> +	case KVM_PV_VM_DESTROY: {
>> +		/* All VCPUs have to be destroyed before this call. */
>> +		mutex_lock(&kvm->lock);
>> +		kvm_s390_vcpu_block_all(kvm);
>> +		r = kvm_s390_pv_destroy_vm(kvm);
>> +		if (!r)
>> +			kvm_s390_pv_dealloc_vm(kvm);
>> +		kvm_s390_vcpu_unblock_all(kvm);
>> +		mutex_unlock(&kvm->lock);
>> +		break;
>> +	}
> 
> Would be helpful to have some code that shows when/how these are called
> - do you have any plans to post something soon?

Qemu patches will be in internal review soonish and afterwards I'll post
them upstream

> 
> (...)
> 
>> @@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  
>>  	if (vcpu->kvm->arch.use_cmma)
>>  		kvm_s390_vcpu_unsetup_cmma(vcpu);
>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>> +	    kvm_s390_pv_handle_cpu(vcpu))
> 
> I was a bit confused by that function name... maybe
> kvm_s390_pv_cpu_get_handle()?

Sure

> 
> Also, if this always returns 0 if the config option is off, you
> probably don't need to check for that option?

Hmm, if we decide to remove the config option altogether then it's not
needed anyway and I think that's what Christian wants.

> 
>> +		kvm_s390_pv_destroy_cpu(vcpu);
>>  	free_page((unsigned long)(vcpu->arch.sie_block));
>>  
>>  	kvm_vcpu_uninit(vcpu);
>> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>  {
>>  	kvm_free_vcpus(kvm);
>>  	sca_dispose(kvm);
>> -	debug_unregister(kvm->arch.dbf);
>>  	kvm_s390_gisa_destroy(kvm);
>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>> +	    kvm_s390_pv_is_protected(kvm)) {
>> +		kvm_s390_pv_destroy_vm(kvm);
>> +		kvm_s390_pv_dealloc_vm(kvm);
> 
> It seems the pv vm can be either destroyed via the ioctl above or in
> the course of normal vm destruction. When is which way supposed to be
> used? Also, it seems kvm_s390_pv_destroy_vm() can fail -- can that be a
> problem in this code path?

On a reboot we need to tear down the protected VM and boot from
unprotected mode again. If the VM shuts down we go through this cleanup
path. If it fails, the kernel will lose the memory that was allocated to
start the VM.

> 
>> +	}
>> +	debug_unregister(kvm->arch.dbf);
>>  	free_page((unsigned long)kvm->arch.sie_page2);
>>  	if (!kvm_is_ucontrol(kvm))
>>  		gmap_remove(kvm->arch.gmap);
> 
> (...)
> 
>> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
>> index 6d9448dbd052..0d61dcc51f0e 100644
>> --- a/arch/s390/kvm/kvm-s390.h
>> +++ b/arch/s390/kvm/kvm-s390.h
>> @@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
>>  	return kvm->arch.user_cpu_state_ctrl != 0;
>>  }
>>  
>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>> +/* implemented in pv.c */
>> +void kvm_s390_pv_unpin(struct kvm *kvm);
>> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
>> +int kvm_s390_pv_alloc_vm(struct kvm *kvm);
>> +int kvm_s390_pv_create_vm(struct kvm *kvm);
>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
>> +int kvm_s390_pv_destroy_vm(struct kvm *kvm);
>> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
>> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
>> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>> +		       unsigned long tweak);
>> +int kvm_s390_pv_verify(struct kvm *kvm);
>> +
>> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
>> +{
>> +	return !!kvm->arch.pv.handle;
>> +}
>> +
>> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm)
> 
> This function name is less confusing than the one below, but maybe also
> rename this to kvm_s390_pv_get_handle() for consistency?

kvm_s390_pv_kvm_handle?

> 
>> +{
>> +	return kvm->arch.pv.handle;
>> +}
>> +
>> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
>> +{
>> +	return vcpu->arch.pv.handle;
>> +}
>> +#else
>> +static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
>> +static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
>> +static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
>> +static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
>> +static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
>> +static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
>> +static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
>> +static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
>> +					    u64 origin, u64 length) { return 0; }
>> +static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
>> +				     unsigned long size,  unsigned long tweak)
>> +{ return 0; }
>> +static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
>> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
>> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
>> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
>> +#endif
>> +
>>  /* implemented in interrupt.c */
>>  int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
>>  void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
> 
> (...)
> 
>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
>> +{
>> +	int rc;
>> +	struct uv_cb_csc uvcb = {
>> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
>> +		.header.len = sizeof(uvcb),
>> +	};
>> +
>> +	/* EEXIST and ENOENT? */
> 
> ?

I was asking myself if EEXIST or ENOENT would be better error values
than EINVAL.

> 
>> +	if (kvm_s390_pv_handle_cpu(vcpu))
>> +		return -EINVAL;
>> +
>> +	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
>> +						   get_order(uv_info.guest_cpu_stor_len));
>> +	if (!vcpu->arch.pv.stor_base)
>> +		return -ENOMEM;
>> +
>> +	/* Input */
>> +	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
>> +	uvcb.num = vcpu->arch.sie_block->icpua;
>> +	uvcb.state_origin = (u64)vcpu->arch.sie_block;
>> +	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
>> +
>> +	rc = uv_call(0, (u64)&uvcb);
>> +	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
>> +		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
>> +		   uvcb.header.rrc);
>> +
>> +	/* Output */
>> +	vcpu->arch.pv.handle = uvcb.cpu_handle;
>> +	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
>> +	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
>> +	vcpu->arch.sie_block->sdf = 2;
>> +	if (!rc)
>> +		return 0;
>> +
>> +	kvm_s390_pv_destroy_cpu(vcpu);
>> +	return -EINVAL;
>> +}
> 
> (...)
> 
> Only a quick readthrough, as this patch is longish.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-11-05  9:26       ` Cornelia Huck
@ 2019-11-08 12:14         ` Thomas Huth
  0 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-08 12:14 UTC (permalink / raw)
  To: Cornelia Huck, Christian Borntraeger
  Cc: Janosch Frank, kvm, linux-s390, david, imbrenda, mihajlov, mimu, gor

On 05/11/2019 10.26, Cornelia Huck wrote:
> On Mon, 4 Nov 2019 18:50:12 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 04.11.19 16:54, Cornelia Huck wrote:
>>> On Thu, 24 Oct 2019 07:40:24 -0400
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>>>> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
>>>> index ed007f4a6444..88cf8825d169 100644
>>>> --- a/arch/s390/boot/uv.c
>>>> +++ b/arch/s390/boot/uv.c
>>>> @@ -3,7 +3,12 @@
>>>>   #include <asm/facility.h>
>>>>   #include <asm/sections.h>
>>>>   
>>>> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>>>>   int __bootdata_preserved(prot_virt_guest);
>>>> +#endif
>>>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>>>> +struct uv_info __bootdata_preserved(uv_info);
>>>> +#endif
>>>
>>> Two functions with the same name, but different signatures look really
>>> ugly.
>>>
>>> Also, what happens if I want to build just a single kernel image for
>>> both guest and host?
>>
>> This is not two functions with the same name. It is 2 variable declarations with
>> the __bootdata_preserved helper. We expect to have all distro kernels to enable
>> both.
> 
> Ah ok, I misread that. (I'm blaming lack of sleep :/)

Honestly, I have to admit that I misread this in the same way as
Cornelia at first glance. Why is that macro not using capital 
letters? ... then it would be way more obvious that it's not about a 
function prototype...

  Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
                     ` (2 preceding siblings ...)
  2019-11-07 16:29   ` Cornelia Huck
@ 2019-11-08 13:44   ` Thomas Huth
  2019-11-13 10:28   ` Thomas Huth
  2019-11-13 11:48   ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Cornelia Huck
  5 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-08 13:44 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Let's add a KVM interface to create and destroy protected VMs.

I agree with David, some more description here would be helpful.

> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h |  24 +++-
>   arch/s390/include/asm/uv.h       | 110 ++++++++++++++
>   arch/s390/kvm/Makefile           |   2 +-
>   arch/s390/kvm/kvm-s390.c         | 173 +++++++++++++++++++++-
>   arch/s390/kvm/kvm-s390.h         |  47 ++++++
>   arch/s390/kvm/pv.c               | 237 +++++++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h         |  33 +++++
>   7 files changed, 622 insertions(+), 4 deletions(-)
>   create mode 100644 arch/s390/kvm/pv.c
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 02f4c21c57f6..d4fd0f3af676 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -155,7 +155,13 @@ struct kvm_s390_sie_block {
>   	__u8	reserved08[4];		/* 0x0008 */
>   #define PROG_IN_SIE (1<<0)
>   	__u32	prog0c;			/* 0x000c */
> -	__u8	reserved10[16];		/* 0x0010 */
> +	union {
> +		__u8	reserved10[16];		/* 0x0010 */
> +		struct {
> +			__u64	pv_handle_cpu;
> +			__u64	pv_handle_config;
> +		};
> +	};

Why do you need to keep reserved10[] here? Simply replace it with the 
two new fields, and get rid of the union?


> +/*
> + * Generic cmd executor for calls that only transport the cpu or guest
> + * handle and the command.
> + */
> +static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
> +{
> +	int rc;
> +	struct uv_cb_nodata uvcb = {
> +		.header.cmd = cmd,
> +		.header.len = sizeof(uvcb),
> +		.handle = handle,
> +	};
> +
> +	WARN(!handle, "No handle provided to Ultravisor call cmd %x\n", cmd);

If this is not supposed to happen, I think you should return here 
instead of doing the uv_call() below?
Or maybe even turn this into a BUG() statement?

> +	rc = uv_call(0, (u64)&uvcb);
> +	if (ret)
> +		*ret = *(u32 *)&uvcb.header.rc;
> +	return rc ? -EINVAL : 0;
> +}
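
I.e. the helper would then read roughly like this (same body as quoted
above, only the guard changed; not tested):

static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
{
	struct uv_cb_nodata uvcb = {
		.header.cmd = cmd,
		.header.len = sizeof(uvcb),
		.handle = handle,
	};
	int rc;

	/* bail out instead of issuing an UV call without a handle */
	if (WARN_ON_ONCE(!handle))
		return -EINVAL;

	rc = uv_call(0, (u64)&uvcb);
	if (ret)
		*ret = *(u32 *)&uvcb.header.rc;
	return rc ? -EINVAL : 0;
}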
[...]
> @@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
>   	return r;
>   }
>   
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +	void __user *argp = (void __user *)cmd->data;
> +
> +	switch (cmd->cmd) {

Why are you using curly braces for the case statements below? They do 
not seem to be necessary in most cases?

> +	case KVM_PV_VM_CREATE: {
> +		r = kvm_s390_pv_alloc_vm(kvm);
> +		if (r)
> +			break;
> +
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		/* FMT 4 SIE needs esca */
> +		r = sca_switch_to_extended(kvm);
> +		if (!r)
> +			r = kvm_s390_pv_create_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_DESTROY: {
> +		/* All VCPUs have to be destroyed before this call. */
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		r = kvm_s390_pv_destroy_vm(kvm);
> +		if (!r)
> +			kvm_s390_pv_dealloc_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_SET_SEC_PARMS: {
> +		struct kvm_s390_pv_sec_parm parms = {};
> +		void *hdr;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&parms, argp, sizeof(parms)))
> +			break;
> +
> +		/* Currently restricted to 8KB */
> +		r = -EINVAL;
> +		if (parms.length > PAGE_SIZE * 2)
> +			break;

I think you should also check for parms.length == 0 ... otherwise you'll 
get an unfriendly complaint from vmalloc().

> +		r = -ENOMEM;
> +		hdr = vmalloc(parms.length);
> +		if (!hdr)
> +			break;
> +
> +		r = -EFAULT;
> +		if (!copy_from_user(hdr, (void __user *)parms.origin,
> +				   parms.length))
> +			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
> +
> +		vfree(hdr);
> +		break;
> +	}
> +	case KVM_PV_VM_UNPACK: {
> +		struct kvm_s390_pv_unp unp = {};
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&unp, argp, sizeof(unp)))
> +			break;
> +
> +		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
> +		break;
> +	}
> +	case KVM_PV_VM_VERIFY: {
> +		u32 ret;
> +
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
> +				  UVC_CMD_VERIFY_IMG,
> +				  &ret);
> +		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
> +			 ret >> 16, ret & 0x0000ffff);
> +		break;
> +	}
> +	default:
> +		return -ENOTTY;

Is ENOTTY the right thing to return for an invalid cmd here? It might 
get confused with the ioctl not being available at all? Maybe EINVAL 
would be better?

> +	}
> +	return r;
> +}
> +#endif
> +
[...]
> @@ -4338,6 +4471,28 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
>   	return -ENOIOCTLCMD;
>   }
>   
> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> +static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
> +				   struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +
> +	switch (cmd->cmd) {

Also no need for the curly braces of the case statements here?

> +	case KVM_PV_VCPU_CREATE: {
> +		r = kvm_s390_pv_create_cpu(vcpu);
> +		break;
> +	}
> +	case KVM_PV_VCPU_DESTROY: {
> +		r = kvm_s390_pv_destroy_cpu(vcpu);
> +		break;
> +	}
> +	default:
> +		r = -ENOTTY;

Or EINVAL?

> +	}
> +	return r;
> +}
> +#endif

  Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-08  7:36     ` Janosch Frank
@ 2019-11-11 16:25       ` Cornelia Huck
  2019-11-11 16:39         ` Janosch Frank
  2019-11-13 10:05         ` Thomas Huth
  0 siblings, 2 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-11 16:25 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

[-- Attachment #1: Type: text/plain, Size: 8857 bytes --]

On Fri, 8 Nov 2019 08:36:35 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/7/19 5:29 PM, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:26 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:

> >> @@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
> >>  	return r;
> >>  }
> >>  
> >> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> >> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> >> +{
> >> +	int r = 0;
> >> +	void __user *argp = (void __user *)cmd->data;
> >> +
> >> +	switch (cmd->cmd) {
> >> +	case KVM_PV_VM_CREATE: {
> >> +		r = kvm_s390_pv_alloc_vm(kvm);
> >> +		if (r)
> >> +			break;
> >> +
> >> +		mutex_lock(&kvm->lock);
> >> +		kvm_s390_vcpu_block_all(kvm);
> >> +		/* FMT 4 SIE needs esca */
> >> +		r = sca_switch_to_extended(kvm);

Looking at this again: this function calls kvm_s390_vcpu_block_all()
(which probably does not hurt), but then kvm_s390_vcpu_unblock_all()...
don't we want to keep the block across pv_create_vm() as well?

Also, can you maybe skip calling this function if we use the esca
already?

> >> +		if (!r)
> >> +			r = kvm_s390_pv_create_vm(kvm);
> >> +		kvm_s390_vcpu_unblock_all(kvm);
> >> +		mutex_unlock(&kvm->lock);
> >> +		break;
> >> +	}
> >> +	case KVM_PV_VM_DESTROY: {
> >> +		/* All VCPUs have to be destroyed before this call. */
> >> +		mutex_lock(&kvm->lock);
> >> +		kvm_s390_vcpu_block_all(kvm);
> >> +		r = kvm_s390_pv_destroy_vm(kvm);
> >> +		if (!r)
> >> +			kvm_s390_pv_dealloc_vm(kvm);
> >> +		kvm_s390_vcpu_unblock_all(kvm);
> >> +		mutex_unlock(&kvm->lock);
> >> +		break;
> >> +	}  
> > 
> > Would be helpful to have some code that shows when/how these are called
> > - do you have any plans to post something soon?  
> 
> Qemu patches will be in internal review soonish and afterwards I'll post
> them upstream

Great, looking forward to this :)

> 
> > 
> > (...)
> >   
> >> @@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>  
> >>  	if (vcpu->kvm->arch.use_cmma)
> >>  		kvm_s390_vcpu_unsetup_cmma(vcpu);
> >> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> >> +	    kvm_s390_pv_handle_cpu(vcpu))  
> > 
> > I was a bit confused by that function name... maybe
> > kvm_s390_pv_cpu_get_handle()?  
> 
> Sure
> 
> > 
> > Also, if this always returns 0 if the config option is off, you
> > probably don't need to check for that option?  
> 
> Hmm, if we decide to remove the config option altogether then it's not
> needed anyway and I think that's what Christian wants.

That would be fine with me as well (I have not yet thought about all
implications there, though.)

> 
> >   
> >> +		kvm_s390_pv_destroy_cpu(vcpu);
> >>  	free_page((unsigned long)(vcpu->arch.sie_block));
> >>  
> >>  	kvm_vcpu_uninit(vcpu);
> >> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>  {
> >>  	kvm_free_vcpus(kvm);
> >>  	sca_dispose(kvm);
> >> -	debug_unregister(kvm->arch.dbf);
> >>  	kvm_s390_gisa_destroy(kvm);
> >> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> >> +	    kvm_s390_pv_is_protected(kvm)) {
> >> +		kvm_s390_pv_destroy_vm(kvm);
> >> +		kvm_s390_pv_dealloc_vm(kvm);  
> > 
> > It seems the pv vm can be either destroyed via the ioctl above or in
> > the course of normal vm destruction. When is which way supposed to be
> > used? Also, it seems kvm_s390_pv_destroy_vm() can fail -- can that be a
> > problem in this code path?  
> 
> On a reboot we need to tear down the protected VM and boot from
> unprotected mode again. If the VM shuts down we go through this cleanup
> path. If it fails, the kernel will lose the memory that was allocated to
> start the VM.

Shouldn't you at least log a moan in that case? Hopefully, this happens
very rarely, but the dbf will be gone...

> 
> >   
> >> +	}
> >> +	debug_unregister(kvm->arch.dbf);
> >>  	free_page((unsigned long)kvm->arch.sie_page2);
> >>  	if (!kvm_is_ucontrol(kvm))
> >>  		gmap_remove(kvm->arch.gmap);  
> > 
> > (...)
> >   
> >> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> >> index 6d9448dbd052..0d61dcc51f0e 100644
> >> --- a/arch/s390/kvm/kvm-s390.h
> >> +++ b/arch/s390/kvm/kvm-s390.h
> >> @@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
> >>  	return kvm->arch.user_cpu_state_ctrl != 0;
> >>  }
> >>  
> >> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> >> +/* implemented in pv.c */
> >> +void kvm_s390_pv_unpin(struct kvm *kvm);
> >> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
> >> +int kvm_s390_pv_alloc_vm(struct kvm *kvm);
> >> +int kvm_s390_pv_create_vm(struct kvm *kvm);
> >> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
> >> +int kvm_s390_pv_destroy_vm(struct kvm *kvm);
> >> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
> >> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
> >> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> >> +		       unsigned long tweak);
> >> +int kvm_s390_pv_verify(struct kvm *kvm);
> >> +
> >> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
> >> +{
> >> +	return !!kvm->arch.pv.handle;
> >> +}
> >> +
> >> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm)  
> > 
> > This function name is less confusing than the one below, but maybe also
> > rename this to kvm_s390_pv_get_handle() for consistency?  
> 
> kvm_s390_pv_kvm_handle?

kvm_s390_pv_kvm_get_handle() would mirror the cpu function :) </bikeshed>

> 
> >   
> >> +{
> >> +	return kvm->arch.pv.handle;
> >> +}
> >> +
> >> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
> >> +{
> >> +	return vcpu->arch.pv.handle;
> >> +}
> >> +#else
> >> +static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
> >> +static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
> >> +static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
> >> +static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
> >> +static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
> >> +static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
> >> +static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
> >> +static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
> >> +					    u64 origin, u64 length) { return 0; }
> >> +static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
> >> +				     unsigned long size,  unsigned long tweak)
> >> +{ return 0; }
> >> +static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
> >> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
> >> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
> >> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
> >> +#endif
> >> +
> >>  /* implemented in interrupt.c */
> >>  int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
> >>  void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);  
> > 
> > (...)
> >   
> >> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
> >> +{
> >> +	int rc;
> >> +	struct uv_cb_csc uvcb = {
> >> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
> >> +		.header.len = sizeof(uvcb),
> >> +	};
> >> +
> >> +	/* EEXIST and ENOENT? */  
> > 
> > ?  
> 
> I was asking myself if EEXIST or ENOENT would be better error values
> than EINVAL.

EEXIST might be better, but I don't really like ENOENT.
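
I.e. the check at the top would then simply become:

	if (kvm_s390_pv_handle_cpu(vcpu))
		return -EEXIST;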

> 
> >   
> >> +	if (kvm_s390_pv_handle_cpu(vcpu))
> >> +		return -EINVAL;
> >> +
> >> +	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
> >> +						   get_order(uv_info.guest_cpu_stor_len));
> >> +	if (!vcpu->arch.pv.stor_base)
> >> +		return -ENOMEM;
> >> +
> >> +	/* Input */
> >> +	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
> >> +	uvcb.num = vcpu->arch.sie_block->icpua;
> >> +	uvcb.state_origin = (u64)vcpu->arch.sie_block;
> >> +	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
> >> +
> >> +	rc = uv_call(0, (u64)&uvcb);
> >> +	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
> >> +		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
> >> +		   uvcb.header.rrc);
> >> +
> >> +	/* Output */
> >> +	vcpu->arch.pv.handle = uvcb.cpu_handle;
> >> +	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
> >> +	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
> >> +	vcpu->arch.sie_block->sdf = 2;
> >> +	if (!rc)
> >> +		return 0;
> >> +
> >> +	kvm_s390_pv_destroy_cpu(vcpu);
> >> +	return -EINVAL;
> >> +}  
> > 
> > (...)
> > 
> > Only a quick readthrough, as this patch is longish.
> >   
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-11 16:25       ` Cornelia Huck
@ 2019-11-11 16:39         ` Janosch Frank
  2019-11-11 16:54           ` Cornelia Huck
  2019-11-13 10:05         ` Thomas Huth
  1 sibling, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-11 16:39 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 9206 bytes --]

On 11/11/19 5:25 PM, Cornelia Huck wrote:
> On Fri, 8 Nov 2019 08:36:35 +0100
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 11/7/19 5:29 PM, Cornelia Huck wrote:
>>> On Thu, 24 Oct 2019 07:40:26 -0400
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>>>> @@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
>>>>  	return r;
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>>>> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>>>> +{
>>>> +	int r = 0;
>>>> +	void __user *argp = (void __user *)cmd->data;
>>>> +
>>>> +	switch (cmd->cmd) {
>>>> +	case KVM_PV_VM_CREATE: {
>>>> +		r = kvm_s390_pv_alloc_vm(kvm);
>>>> +		if (r)
>>>> +			break;
>>>> +
>>>> +		mutex_lock(&kvm->lock);
>>>> +		kvm_s390_vcpu_block_all(kvm);
>>>> +		/* FMT 4 SIE needs esca */
>>>> +		r = sca_switch_to_extended(kvm);
> 
> Looking at this again: this function calls kvm_s390_vcpu_block_all()
> (which probably does not hurt), but then kvm_s390_vcpu_unblock_all()...
> don't we want to keep the block across pv_create_vm() as well?

Yeah

> 
> Also, can you maybe skip calling this function if we use the esca
> already?

Did I forget to include that in the patchset?
I extended sca_switch_to_extended() to just return in that case.
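
Roughly like this (relying on the existing use_esca flag; the rest of
the conversion stays untouched):

static int sca_switch_to_extended(struct kvm *kvm)
{
	/* already running on the extended SCA, nothing to convert */
	if (kvm->arch.use_esca)
		return 0;

	/* ... existing bsca -> esca conversion ... */
	return 0;
}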

> 
>>>> +		if (!r)
>>>> +			r = kvm_s390_pv_create_vm(kvm);
>>>> +		kvm_s390_vcpu_unblock_all(kvm);
>>>> +		mutex_unlock(&kvm->lock);
>>>> +		break;
>>>> +	}
>>>> +	case KVM_PV_VM_DESTROY: {
>>>> +		/* All VCPUs have to be destroyed before this call. */
>>>> +		mutex_lock(&kvm->lock);
>>>> +		kvm_s390_vcpu_block_all(kvm);
>>>> +		r = kvm_s390_pv_destroy_vm(kvm);
>>>> +		if (!r)
>>>> +			kvm_s390_pv_dealloc_vm(kvm);
>>>> +		kvm_s390_vcpu_unblock_all(kvm);
>>>> +		mutex_unlock(&kvm->lock);
>>>> +		break;
>>>> +	}  
>>>
>>> Would be helpful to have some code that shows when/how these are called
>>> - do you have any plans to post something soon?  
>>
>> Qemu patches will be in internal review soonish and afterwards I'll post
>> them upstream
> 
> Great, looking forward to this :)
> 
>>
>>>
>>> (...)
>>>   
>>>> @@ -2529,6 +2642,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>>  
>>>>  	if (vcpu->kvm->arch.use_cmma)
>>>>  		kvm_s390_vcpu_unsetup_cmma(vcpu);
>>>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>>>> +	    kvm_s390_pv_handle_cpu(vcpu))  
>>>
>>> I was a bit confused by that function name... maybe
>>> kvm_s390_pv_cpu_get_handle()?  
>>
>> Sure
>>
>>>
>>> Also, if this always returns 0 if the config option is off, you
>>> probably don't need to check for that option?  
>>
>> Hmm, if we decide to remove the config option altogether then it's not
>> needed anyway and I think that's what Christian wants.
> 
> That would be fine with me as well (I have not yet thought about all
> implications there, though.)
> 
>>
>>>   
>>>> +		kvm_s390_pv_destroy_cpu(vcpu);
>>>>  	free_page((unsigned long)(vcpu->arch.sie_block));
>>>>  
>>>>  	kvm_vcpu_uninit(vcpu);
>>>> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>>  {
>>>>  	kvm_free_vcpus(kvm);
>>>>  	sca_dispose(kvm);
>>>> -	debug_unregister(kvm->arch.dbf);
>>>>  	kvm_s390_gisa_destroy(kvm);
>>>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
>>>> +	    kvm_s390_pv_is_protected(kvm)) {
>>>> +		kvm_s390_pv_destroy_vm(kvm);
>>>> +		kvm_s390_pv_dealloc_vm(kvm);  
>>>
>>> It seems the pv vm can be either destroyed via the ioctl above or in
>>> the course of normal vm destruction. When is which way supposed to be
>>> used? Also, it seems kvm_s390_pv_destroy_vm() can fail -- can that be a
>>> problem in this code path?  
>>
>> On a reboot we need to tear down the protected VM and boot from
>> unprotected mode again. If the VM shuts down we go through this cleanup
>> path. If it fails the kernel will lose the memory that was allocated to
>> start the VM.
> 
> Shouldn't you at least log a moan in that case? Hopefully, this happens
> very rarely, but the dbf will be gone...

That's why I created the uv dbf :-)
Well, it shouldn't happen at all so maybe a WARN will be a good option
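Something along these lines, as a sketch (the message is made up and the
IS_ENABLED() guard from the patch is left out here for brevity):

	if (kvm_s390_pv_is_protected(kvm)) {
		/* teardown must not fail; if it does, the memory allocated for the VM is lost */
		WARN_ONCE(kvm_s390_pv_destroy_vm(kvm),
			  "protvirt: VM destroy failed during teardown");
		kvm_s390_pv_dealloc_vm(kvm);
	}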

> 
>>
>>>   
>>>> +	}
>>>> +	debug_unregister(kvm->arch.dbf);
>>>>  	free_page((unsigned long)kvm->arch.sie_page2);
>>>>  	if (!kvm_is_ucontrol(kvm))
>>>>  		gmap_remove(kvm->arch.gmap);  
>>>
>>> (...)
>>>   
>>>> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
>>>> index 6d9448dbd052..0d61dcc51f0e 100644
>>>> --- a/arch/s390/kvm/kvm-s390.h
>>>> +++ b/arch/s390/kvm/kvm-s390.h
>>>> @@ -196,6 +196,53 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
>>>>  	return kvm->arch.user_cpu_state_ctrl != 0;
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>>>> +/* implemented in pv.c */
>>>> +void kvm_s390_pv_unpin(struct kvm *kvm);
>>>> +void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
>>>> +int kvm_s390_pv_alloc_vm(struct kvm *kvm);
>>>> +int kvm_s390_pv_create_vm(struct kvm *kvm);
>>>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
>>>> +int kvm_s390_pv_destroy_vm(struct kvm *kvm);
>>>> +int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
>>>> +int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
>>>> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>>>> +		       unsigned long tweak);
>>>> +int kvm_s390_pv_verify(struct kvm *kvm);
>>>> +
>>>> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
>>>> +{
>>>> +	return !!kvm->arch.pv.handle;
>>>> +}
>>>> +
>>>> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm)  
>>>
>>> This function name is less confusing than the one below, but maybe also
>>> rename this to kvm_s390_pv_get_handle() for consistency?  
>>
>> kvm_s390_pv_kvm_handle?
> 
> kvm_s390_pv_kvm_get_handle() would mirror the cpu function :) </bikeshed>
> 
>>
>>>   
>>>> +{
>>>> +	return kvm->arch.pv.handle;
>>>> +}
>>>> +
>>>> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	return vcpu->arch.pv.handle;
>>>> +}
>>>> +#else
>>>> +static inline void kvm_s390_pv_unpin(struct kvm *kvm) {}
>>>> +static inline void kvm_s390_pv_dealloc_vm(struct kvm *kvm) {}
>>>> +static inline int kvm_s390_pv_alloc_vm(struct kvm *kvm) { return 0; }
>>>> +static inline int kvm_s390_pv_create_vm(struct kvm *kvm) { return 0; }
>>>> +static inline int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu) { return 0; }
>>>> +static inline int kvm_s390_pv_destroy_vm(struct kvm *kvm) { return 0; }
>>>> +static inline int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu) { return 0; }
>>>> +static inline int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
>>>> +					    u64 origin, u64 length) { return 0; }
>>>> +static inline int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr,
>>>> +				     unsigned long size,  unsigned long tweak)
>>>> +{ return 0; }
>>>> +static inline int kvm_s390_pv_verify(struct kvm *kvm) { return 0; }
>>>> +static inline bool kvm_s390_pv_is_protected(struct kvm *kvm) { return 0; }
>>>> +static inline u64 kvm_s390_pv_handle(struct kvm *kvm) { return 0; }
>>>> +static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu) { return 0; }
>>>> +#endif
>>>> +
>>>>  /* implemented in interrupt.c */
>>>>  int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
>>>>  void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);  
>>>
>>> (...)
>>>   
>>>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	int rc;
>>>> +	struct uv_cb_csc uvcb = {
>>>> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
>>>> +		.header.len = sizeof(uvcb),
>>>> +	};
>>>> +
>>>> +	/* EEXIST and ENOENT? */  
>>>
>>> ?  
>>
>> I was asking myself if EEXIST or ENOENT would be better error values
>> than EINVAL.
> 
> EEXIST might be better, but I don't really like ENOENT.
> 
>>
>>>   
>>>> +	if (kvm_s390_pv_handle_cpu(vcpu))
>>>> +		return -EINVAL;
>>>> +
>>>> +	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
>>>> +						   get_order(uv_info.guest_cpu_stor_len));
>>>> +	if (!vcpu->arch.pv.stor_base)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/* Input */
>>>> +	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
>>>> +	uvcb.num = vcpu->arch.sie_block->icpua;
>>>> +	uvcb.state_origin = (u64)vcpu->arch.sie_block;
>>>> +	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
>>>> +
>>>> +	rc = uv_call(0, (u64)&uvcb);
>>>> +	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
>>>> +		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
>>>> +		   uvcb.header.rrc);
>>>> +
>>>> +	/* Output */
>>>> +	vcpu->arch.pv.handle = uvcb.cpu_handle;
>>>> +	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
>>>> +	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
>>>> +	vcpu->arch.sie_block->sdf = 2;
>>>> +	if (!rc)
>>>> +		return 0;
>>>> +
>>>> +	kvm_s390_pv_destroy_cpu(vcpu);
>>>> +	return -EINVAL;
>>>> +}  
>>>
>>> (...)
>>>
>>> Only a quick readthrough, as this patch is longish.
>>>   
>>
>>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-10-24 11:40 ` [RFC 06/37] s390: UV: Add import and export to UV library Janosch Frank
                     ` (2 preceding siblings ...)
  2019-11-01 12:42   ` Christian Borntraeger
@ 2019-11-11 16:40   ` Cornelia Huck
  2019-11-11 16:56     ` Janosch Frank
  3 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-11 16:40 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:28 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> The convert to/from secure (or also "import/export") ultravisor calls
> are needed for page management, i.e. paging, of secure execution VMs.
> 
> Export encrypts a secure guest's page and makes it accessible to the
> host for paging.
> 
> Import makes a page accessible to a secure guest.
> On the first import of that page, the page will be cleared by the
> Ultravisor before it is given to the guest.
> 
> All following imports will decrypt an exported page and verify
> integrity before giving the page to the guest.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 0bfbafcca136..99cdd2034503 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -15,6 +15,7 @@
>  #include <linux/errno.h>
>  #include <linux/bug.h>
>  #include <asm/page.h>
> +#include <asm/gmap.h>
>  
>  #define UVC_RC_EXECUTED		0x0001
>  #define UVC_RC_INV_CMD		0x0002
> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>  	return rc ? -EINVAL : 0;
>  }
>  
> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +static inline int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +	if (!uv_call(0, (u64)&uvcb))
> +		return 0;
> +	return -EINVAL;

No possibility for other return codes here (e.g. -EFAULT)? (Asking
because you look at a rc in the control block in the reverse function.)

> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest
> + * (import). If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + *
> + * @handle: Ultravisor guest handle
> + * @gaddr: Guest 2 absolute address to be imported
> + */
> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	int cc;
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->se_handle,
> +		.gaddr = gaddr
> +	};
> +
> +	cc = uv_call(0, (u64)&uvcb);
> +
> +	if (!cc)
> +		return 0;
> +	if (uvcb.header.rc == 0x104)
> +		return -EEXIST;
> +	if (uvcb.header.rc == 0x10a)
> +		return -EFAULT;
> +	return -EINVAL;
> +}
> +
>  void setup_uv(void);
>  void adjust_to_uv_max(unsigned long *vmax);
>  #else
> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>  static inline void setup_uv(void) {}
>  static inline void adjust_to_uv_max(unsigned long *vmax) {}
>  static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>  #endif
>  
>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-11 16:39         ` Janosch Frank
@ 2019-11-11 16:54           ` Cornelia Huck
  0 siblings, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-11 16:54 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

[-- Attachment #1: Type: text/plain, Size: 3357 bytes --]

On Mon, 11 Nov 2019 17:39:15 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/11/19 5:25 PM, Cornelia Huck wrote:
> > On Fri, 8 Nov 2019 08:36:35 +0100
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> >> On 11/7/19 5:29 PM, Cornelia Huck wrote:  
> >>> On Thu, 24 Oct 2019 07:40:26 -0400
> >>> Janosch Frank <frankja@linux.ibm.com> wrote:  
> >   
> >>>> @@ -2157,6 +2164,96 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
> >>>>  	return r;
> >>>>  }
> >>>>  
> >>>> +#ifdef CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST
> >>>> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> >>>> +{
> >>>> +	int r = 0;
> >>>> +	void __user *argp = (void __user *)cmd->data;
> >>>> +
> >>>> +	switch (cmd->cmd) {
> >>>> +	case KVM_PV_VM_CREATE: {
> >>>> +		r = kvm_s390_pv_alloc_vm(kvm);
> >>>> +		if (r)
> >>>> +			break;
> >>>> +
> >>>> +		mutex_lock(&kvm->lock);
> >>>> +		kvm_s390_vcpu_block_all(kvm);
> >>>> +		/* FMT 4 SIE needs esca */
> >>>> +		r = sca_switch_to_extended(kvm);  
> > 
> > Looking at this again: this function calls kvm_s390_vcpu_block_all()
> > (which probably does not hurt), but then kvm_s390_vcpu_unblock_all()...
> > don't we want to keep the block across pv_create_vm() as well?  
> 
> Yeah
> 
> > 
> > Also, can you maybe skip calling this function if we use the esca
> > already?  
> 
> Did I forget to include that in the patchset?
> I extended sca_switch_to_extended() to just return in that case.

If you did, I likely missed it; way too much stuff to review :)

> 
> >   
> >>>> +		if (!r)
> >>>> +			r = kvm_s390_pv_create_vm(kvm);
> >>>> +		kvm_s390_vcpu_unblock_all(kvm);
> >>>> +		mutex_unlock(&kvm->lock);
> >>>> +		break;
> >>>> +	}

(...)

> >>>> @@ -2555,8 +2671,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>>>  {
> >>>>  	kvm_free_vcpus(kvm);
> >>>>  	sca_dispose(kvm);
> >>>> -	debug_unregister(kvm->arch.dbf);
> >>>>  	kvm_s390_gisa_destroy(kvm);
> >>>> +	if (IS_ENABLED(CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST) &&
> >>>> +	    kvm_s390_pv_is_protected(kvm)) {
> >>>> +		kvm_s390_pv_destroy_vm(kvm);
> >>>> +		kvm_s390_pv_dealloc_vm(kvm);    
> >>>
> >>> It seems the pv vm can be either destroyed via the ioctl above or in
> >>> the course of normal vm destruction. When is which way supposed to be
> >>> used? Also, it seems kvm_s390_pv_destroy_vm() can fail -- can that be a
> >>> problem in this code path?    
> >>
> >> On a reboot we need to tear down the protected VM and boot from
> >> unprotected mode again. If the VM shuts down we go through this cleanup
> >> path. If it fails the kernel will lose the memory that was allocated to
> >> start the VM.  
> > 
> > Shouldn't you at least log a moan in that case? Hopefully, this happens
> > very rarely, but the dbf will be gone...  
> 
> That's why I created the uv dbf :-)

Again, way too easy to get lost in these changes :)

> Well, it shouldn't happen at all so maybe a WARN will be a good option

Yeah, if this is one of these "should not happen" things, a WARN
sounds good.

> 
> >   
> >>  
> >>>     
> >>>> +	}
> >>>> +	debug_unregister(kvm->arch.dbf);
> >>>>  	free_page((unsigned long)kvm->arch.sie_page2);
> >>>>  	if (!kvm_is_ucontrol(kvm))
> >>>>  		gmap_remove(kvm->arch.gmap);    

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 06/37] s390: UV: Add import and export to UV library
  2019-11-11 16:40   ` Cornelia Huck
@ 2019-11-11 16:56     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-11 16:56 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 3806 bytes --]

On 11/11/19 5:40 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:28 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> The convert to/from secure (or also "import/export") ultravisor calls
>> are needed for page management, i.e. paging, of secure execution VMs.
>>
>> Export encrypts a secure guest's page and makes it accessible to the
>> host for paging.
>>
>> Import makes a page accessible to a secure guest.
>> On the first import of that page, the page will be cleared by the
>> Ultravisor before it is given to the guest.
>>
>> All following imports will decrypt an exported page and verify
>> integrity before giving the page to the guest.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 51 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 51 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index 0bfbafcca136..99cdd2034503 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -15,6 +15,7 @@
>>  #include <linux/errno.h>
>>  #include <linux/bug.h>
>>  #include <asm/page.h>
>> +#include <asm/gmap.h>
>>  
>>  #define UVC_RC_EXECUTED		0x0001
>>  #define UVC_RC_INV_CMD		0x0002
>> @@ -279,6 +280,54 @@ static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
>>  	return rc ? -EINVAL : 0;
>>  }
>>  
>> +/*
>> + * Requests the Ultravisor to encrypt a guest page and make it
>> + * accessible to the host for paging (export).
>> + *
>> + * @paddr: Absolute host address of page to be exported
>> + */
>> +static inline int uv_convert_from_secure(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.paddr = paddr
>> +	};
>> +	if (!uv_call(0, (u64)&uvcb))
>> +		return 0;
>> +	return -EINVAL;
> 
> No possibility for other return codes here (e.g. -EFAULT)? (Asking
> because you look at a rc in the control block in the reverse function.)

Notice the "paddr" variable?
We work on physical memory for this UV call; all defined error codes are
either input errors (not possible via the exception handlers), a KVM
management error, or an attack on the VM.

> 
>> +}
>> +
>> +/*
>> + * Requests the Ultravisor to make a page accessible to a guest
>> + * (import). If it's brought in the first time, it will be cleared. If
>> + * it has been exported before, it will be decrypted and integrity
>> + * checked.
>> + *
>> + * @handle: Ultravisor guest handle
>> + * @gaddr: Guest 2 absolute address to be imported
>> + */
>> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	int cc;
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->se_handle,
>> +		.gaddr = gaddr
>> +	};
>> +
>> +	cc = uv_call(0, (u64)&uvcb);
>> +
>> +	if (!cc)
>> +		return 0;
>> +	if (uvcb.header.rc == 0x104)
>> +		return -EEXIST;
>> +	if (uvcb.header.rc == 0x10a)
>> +		return -EFAULT;
>> +	return -EINVAL;
>> +}
>> +
>>  void setup_uv(void);
>>  void adjust_to_uv_max(unsigned long *vmax);
>>  #else
>> @@ -286,6 +335,8 @@ void adjust_to_uv_max(unsigned long *vmax);
>>  static inline void setup_uv(void) {}
>>  static inline void adjust_to_uv_max(unsigned long *vmax) {}
>>  static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret) { return 0; }
>> +static inline int uv_convert_from_secure(unsigned long paddr) { return 0; }
>> +static inline int uv_convert_to_secure(unsigned long handle, unsigned long gaddr) { return 0; }
>>  #endif
>>  
>>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 01/37] DOCUMENTATION: protvirt: Protected virtual machine introduction
  2019-11-04 14:18   ` Cornelia Huck
@ 2019-11-12 14:38     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-12 14:38 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 2332 bytes --]

On 11/4/19 3:18 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:23 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> Introduction to Protected VMs.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  Documentation/virtual/kvm/s390-pv.txt | 23 +++++++++++++++++++++++
>>  1 file changed, 23 insertions(+)
>>  create mode 100644 Documentation/virtual/kvm/s390-pv.txt
>>
>> diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
>> new file mode 100644
>> index 000000000000..86ed95f36759
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/s390-pv.txt
> 
> This should be under /virt/, I think. Also, maybe start out with RST
> already for new files?
> 
>> @@ -0,0 +1,23 @@
>> +Ultravisor and Protected VMs
>> +===========================
>> +
>> +Summary:
>> +
>> +Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's state
>> +like guest memory and guest registers anymore. Instead the PVMs are
> 
> s/Instead/Instead,/

Fixed

> 
>> +mostly managed by a new entity called Ultravisor (UV), which provides
>> +an API, so KVM and the PVM can request management actions.
> 
> Hm...
> 
> "The UV provides an API (both for guests and hypervisors), where PVMs
> and KVM can request management actions." ?

I applied your proposal, but removed the part in parentheses, as it is
obvious from the words that follow.

> 
>> +
>> +Each guest starts in the non-protected mode and then transitions into
> 
> "and then may make a request to transition into protected mode" ?

Sure

> 
>> +protected mode. On transition KVM registers the guest and its VCPUs
>> +with the Ultravisor and prepares everything for running it.
>> +
>> +The Ultravisor will secure and decrypt the guest's boot memory
>> +(i.e. kernel/initrd). It will safeguard state changes like VCPU
>> +starts/stops and injected interrupts while the guest is running.
>> +
>> +As access to the guest's state, like the SIE state description is
> 
> "such as the SIE state description," ?
> 
>> +normally needed to be able to run a VM, some changes have been made in
>> +SIE behavior and fields have different meaning for a PVM. SIE exits
>> +are minimized as much as possible to improve speed and reduce exposed
>> +guest state.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 02/37] s390/protvirt: introduce host side setup
  2019-11-04 14:26     ` Cornelia Huck
@ 2019-11-12 14:47       ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-12 14:47 UTC (permalink / raw)
  To: Cornelia Huck, Christian Borntraeger
  Cc: kvm, linux-s390, thuth, david, imbrenda, mihajlov, mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 3323 bytes --]

On 11/4/19 3:26 PM, Cornelia Huck wrote:
> On Fri, 1 Nov 2019 09:53:12 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 24.10.19 13:40, Janosch Frank wrote:
>>> From: Vasily Gorbik <gor@linux.ibm.com>
>>>
>>> Introduce KVM_S390_PROTECTED_VIRTUALIZATION_HOST kbuild option for
>>> protected virtual machines hosting support code.
>>>
>>> Add "prot_virt" command line option which controls if the kernel
>>> protected VMs support is enabled at runtime.
>>>
>>> Extend ultravisor info definitions and expose it via uv_info struct
>>> filled in during startup.
>>>
>>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>>> ---
>>>  .../admin-guide/kernel-parameters.txt         |  5 ++
>>>  arch/s390/boot/Makefile                       |  2 +-
>>>  arch/s390/boot/uv.c                           | 20 +++++++-
>>>  arch/s390/include/asm/uv.h                    | 46 ++++++++++++++++--
>>>  arch/s390/kernel/Makefile                     |  1 +
>>>  arch/s390/kernel/setup.c                      |  4 --
>>>  arch/s390/kernel/uv.c                         | 48 +++++++++++++++++++
>>>  arch/s390/kvm/Kconfig                         |  9 ++++
>>>  8 files changed, 126 insertions(+), 9 deletions(-)
>>>  create mode 100644 arch/s390/kernel/uv.c
> 
> (...)
> 
>>> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
>>> index d3db3d7ed077..652b36f0efca 100644
>>> --- a/arch/s390/kvm/Kconfig
>>> +++ b/arch/s390/kvm/Kconfig
>>> @@ -55,6 +55,15 @@ config KVM_S390_UCONTROL
>>>
>>>  	  If unsure, say N.
>>>
>>> +config KVM_S390_PROTECTED_VIRTUALIZATION_HOST
>>> +	bool "Protected guests execution support"
>>> +	depends on KVM
>>> +	---help---
>>> +	  Support hosting protected virtual machines isolated from the
>>> +	  hypervisor.
>>> +
>>> +	  If unsure, say Y.
>>> +
>>>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>>>  # the virtualization menu.
>>>  source "drivers/vhost/Kconfig"
>>>   
>>
>> As we have the prot_virt kernel parameter there is a way to fence this during runtime
>> Not sure if we really need a build time fence. We could get rid of
>> CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST and just use CONFIG_KVM instead,
>> assuming that in the long run all distros will enable that anyway. 
> 
> I still need to read through the rest of this patch set to have an
> informed opinion on that, which will probably take some more time.
> 
>> If other reviewers prefer to keep that extra option what about the following to the
>> help section:
>>
>> ----
>> Support hosting protected virtual machines in KVM. The state of these machines like
>> memory content or register content is protected from the host or host administrators.
>>
>> Enabling this option will enable extra code that talks to a new firmware instance
> 
> "...that allows the host kernel to talk..." ?

"allows a Linux hypervisor to talk..." ?

> 
>> called ultravisor that will take care of protecting the guest while also enabling
>> KVM to run this guest.
>>
>> This feature must be enable by the kernel command line option prot_virt.
> 
> s/enable by/enabled via/
> 
>>
>> 	  If unsure, say Y.
> 
> Looks better. I'm continuing to read the rest of this series before I
> say more, though :)
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-11 16:25       ` Cornelia Huck
  2019-11-11 16:39         ` Janosch Frank
@ 2019-11-13 10:05         ` Thomas Huth
  1 sibling, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-13 10:05 UTC (permalink / raw)
  To: Cornelia Huck, Janosch Frank
  Cc: kvm, linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, gor


[-- Attachment #1.1: Type: text/plain, Size: 749 bytes --]

On 11/11/2019 17.25, Cornelia Huck wrote:
> On Fri, 8 Nov 2019 08:36:35 +0100
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 11/7/19 5:29 PM, Cornelia Huck wrote:
[...]
>>>   
>>>> +int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	int rc;
>>>> +	struct uv_cb_csc uvcb = {
>>>> +		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
>>>> +		.header.len = sizeof(uvcb),
>>>> +	};
>>>> +
>>>> +	/* EEXIST and ENOENT? */  
>>>
>>> ?  
>>
>> I was asking myself if EEXIST or ENOENT would be better error values
>> than EINVAL.
> 
> EEXIST might be better, but I don't really like ENOENT.
> 
>>>   
>>>> +	if (kvm_s390_pv_handle_cpu(vcpu))
>>>> +		return -EINVAL;

FWIW, I'd also vote for EEXIST here.
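I.e. just flipping the error value in the quoted check, nothing else changes:

	/* a secure CPU for this vcpu already exists */
	if (kvm_s390_pv_handle_cpu(vcpu))
		return -EEXIST;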

 Thomas


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
                     ` (3 preceding siblings ...)
  2019-11-08 13:44   ` Thomas Huth
@ 2019-11-13 10:28   ` Thomas Huth
  2019-11-13 11:34     ` Janosch Frank
  2019-11-13 14:03     ` [PATCH] Fix unpack Janosch Frank
  2019-11-13 11:48   ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Cornelia Huck
  5 siblings, 2 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-13 10:28 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Let's add a KVM interface to create and destroy protected VMs.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
[...]
> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak)
> +{
> +	int i, rc = 0;
> +	struct uv_cb_unp uvcb = {
> +		.header.cmd = UVC_CMD_UNPACK_IMG,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = kvm_s390_pv_handle(kvm),
> +		.tweak[0] = tweak
> +	};
> +
> +	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
> +		return -EINVAL;

Also check for size == 0 ?

> +
> +

Remove one of the two empty lines, please.

> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
> +		 addr, size);
> +	for (i = 0; i < size / PAGE_SIZE; i++) {
> +		uvcb.gaddr = addr + i * PAGE_SIZE;
> +		uvcb.tweak[1] = i * PAGE_SIZE;
> +retry:
> +		rc = uv_call(0, (u64)&uvcb);
> +		if (!rc)
> +			continue;
> +		/* If not yet mapped fault and retry */
> +		if (uvcb.header.rc == 0x10a) {
> +			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> +					FAULT_FLAG_WRITE);
> +			if (rc)
> +				return rc;
> +			goto retry;
> +		}
> +		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> +			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> +		break;

A break at the end of the for-loop ... that's really not what I'd expect.

Could you please invert the logic here, i.e.:

    if (uvcb.header.rc != 0x10a) {
        VM_EVENT(...)
        break;
    }
    rc = gmap_fault(...)
    ...

I think you might even get rid of that ugly "goto", too, that way?

> +	}
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
> +		 uvcb.header.rc, uvcb.header.rrc);
> +	return rc;
> +}

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-11-13 10:28   ` Thomas Huth
@ 2019-11-13 11:34     ` Janosch Frank
  2019-11-13 14:03     ` [PATCH] Fix unpack Janosch Frank
  1 sibling, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-13 11:34 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 11/13/19 11:28 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> Let's add a KVM interface to create and destroy protected VMs.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
> [...]
>> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
>> +		       unsigned long tweak)
>> +{
>> +	int i, rc = 0;
>> +	struct uv_cb_unp uvcb = {
>> +		.header.cmd = UVC_CMD_UNPACK_IMG,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = kvm_s390_pv_handle(kvm),
>> +		.tweak[0] = tweak
>> +	};
>> +
>> +	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
>> +		return -EINVAL;
> 
> Also check for size == 0 ?

Yep

> 
>> +
>> +
> 
> Remove one of the two empty lines, please.

Yep

> 
>> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
>> +		 addr, size);
>> +	for (i = 0; i < size / PAGE_SIZE; i++) {
>> +		uvcb.gaddr = addr + i * PAGE_SIZE;
>> +		uvcb.tweak[1] = i * PAGE_SIZE;
>> +retry:
>> +		rc = uv_call(0, (u64)&uvcb);
>> +		if (!rc)
>> +			continue;
>> +		/* If not yet mapped fault and retry */
>> +		if (uvcb.header.rc == 0x10a) {
>> +			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
>> +					FAULT_FLAG_WRITE);
>> +			if (rc)
>> +				return rc;
>> +			goto retry;
>> +		}
>> +		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
>> +			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
>> +		break;
> 
> A break at the end of the for-loop ... that's really not what I'd expect.
> 
> Could you please invert the logic here, i.e.:
> 
>     if (uvcb.header.rc != 0x10a) {
>         VM_EVENT(...)
>         break;
>     }
>     rc = gmap_fault(...)
>     ...
> 
> I think you might even get rid of that ugly "goto", too, that way?

But without the goto we would increment i, no?
I'll try to find a solution, maybe using a while loop, but then we need to
manage incrementing i ourselves.

> 
>> +	}
>> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
>> +		 uvcb.header.rc, uvcb.header.rrc);
>> +	return rc;
>> +}
> 
>  Thomas
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling
  2019-10-24 11:40 ` [RFC 04/37] KVM: s390: protvirt: Add initial lifecycle handling Janosch Frank
                     ` (4 preceding siblings ...)
  2019-11-13 10:28   ` Thomas Huth
@ 2019-11-13 11:48   ` Cornelia Huck
  5 siblings, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-13 11:48 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:26 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> +/*
> + * Generic cmd executor for calls that only transport the cpu or guest
> + * handle and the command.
> + */
> +static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
> +{
> +	int rc;
> +	struct uv_cb_nodata uvcb = {
> +		.header.cmd = cmd,
> +		.header.len = sizeof(uvcb),
> +		.handle = handle,
> +	};
> +
> +	WARN(!handle, "No handle provided to Ultravisor call cmd %x\n", cmd);
> +	rc = uv_call(0, (u64)&uvcb);
> +	if (ret)
> +		*ret = *(u32 *)&uvcb.header.rc;
> +	return rc ? -EINVAL : 0;

Why go ahead with doing the uv call if it doesn't have a handle anyway?
Or why warn, if you already know it is going to fail? I assume this can
only happen if you have a logic error in the kvm code?

> +}
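
As a sketch, bailing out early instead of warning and then calling anyway
(whether -EINVAL is the right value here is an open question):

	/* a missing handle is a KVM logic error, no point in calling the UV */
	if (WARN_ON_ONCE(!handle))
		return -EINVAL;

	rc = uv_call(0, (u64)&uvcb);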

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-10-25  8:24   ` [RFC v2] " Janosch Frank
  2019-11-01 13:02     ` Christian Borntraeger
  2019-11-04 14:32     ` David Hildenbrand
@ 2019-11-13 12:23     ` Thomas Huth
  2019-11-13 15:54       ` Janosch Frank
  2 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-13 12:23 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 25/10/2019 10.24, Janosch Frank wrote:
> KSM will not work on secure pages, because when the kernel reads a
> secure page, it will be encrypted and hence no two pages will look the
> same.
> 
> Let's mark the guest pages as unmergeable when we transition to secure
> mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
[...]
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index edcdca97e85e..faecdf81abdb 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>  }
>  EXPORT_SYMBOL_GPL(s390_enable_sie);
>  
> +int gmap_mark_unmergeable(void)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +
> +

Please remove one of the two empty lines.

> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				MADV_UNMERGEABLE, &vma->vm_flags))
> +			return -ENOMEM;
> +	}
> +	mm->def_flags &= ~VM_MERGEABLE;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
> +
[...]

Apart from the cosmetic nit, the patch looks fine to me.

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 08/37] KVM: s390: add missing include in gmap.h
  2019-10-24 11:40 ` [RFC 08/37] KVM: s390: add missing include in gmap.h Janosch Frank
  2019-10-25  8:24   ` David Hildenbrand
@ 2019-11-13 12:27   ` Thomas Huth
  1 sibling, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-13 12:27 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> gmap.h references radix trees, but does not include linux/radix-tree.h
> itself. Sources that include gmap.h but not also radix-tree.h will
> therefore fail to compile.
> 
> This simple patch adds the include for linux/radix-tree.h in gmap.h so
> that users of gmap.h will be able to compile.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index eab6a2ec3599..99b3eedda26e 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -10,6 +10,7 @@
>  #define _ASM_S390_GMAP_H
>  
>  #include <linux/refcount.h>
> +#include <linux/radix-tree.h>
>  
>  /* Generic bits for GMAP notification on DAT table entry changes. */
>  #define GMAP_NOTIFY_SHADOW	0x2
> 

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 213+ messages in thread

* [PATCH] Fix unpack
  2019-11-13 10:28   ` Thomas Huth
  2019-11-13 11:34     ` Janosch Frank
@ 2019-11-13 14:03     ` Janosch Frank
  2019-11-13 14:19       ` Thomas Huth
  2019-11-13 14:36       ` Cornelia Huck
  1 sibling, 2 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-13 14:03 UTC (permalink / raw)
  To: kvm; +Cc: linux-s390, david, thuth

That should be easier to read :)

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/pv.c | 60 +++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 94cf16f40f25..fd73afb33b20 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -195,43 +195,53 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
 	return 0;
 }
 
-int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
-		       unsigned long tweak)
+static int unpack_one(struct kvm *kvm, unsigned long addr, u64 tweak[2])
 {
-	int i, rc = 0;
+	int rc;
 	struct uv_cb_unp uvcb = {
 		.header.cmd = UVC_CMD_UNPACK_IMG,
 		.header.len = sizeof(uvcb),
 		.guest_handle = kvm_s390_pv_handle(kvm),
-		.tweak[0] = tweak
+		.gaddr = addr,
+		.tweak[0] = tweak[0],
+		.tweak[1] = tweak[1],
 	};
 
-	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
-		return -EINVAL;
+	rc = uv_call(0, (u64)&uvcb);
+	if (!rc)
+		return rc;
+	if (uvcb.header.rc == 0x10a) {
+		/* If not yet mapped fault and retry */
+		rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
+				FAULT_FLAG_WRITE);
+		if (!rc)
+			return -EAGAIN;
+	}
+	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
+		 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
+	return rc;
+}
 
+int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
+		       unsigned long tweak)
+{
+	int rc = 0;
+	u64 tw[2] = {tweak, 0};
+
+	if (addr & ~PAGE_MASK || !size || size & ~PAGE_MASK)
+		return -EINVAL;
 
 	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
 		 addr, size);
-	for (i = 0; i < size / PAGE_SIZE; i++) {
-		uvcb.gaddr = addr + i * PAGE_SIZE;
-		uvcb.tweak[1] = i * PAGE_SIZE;
-retry:
-		rc = uv_call(0, (u64)&uvcb);
-		if (!rc)
+	while (tw[1] < size) {
+		rc = unpack_one(kvm, addr, tw);
+		if (rc == -EAGAIN)
 			continue;
-		/* If not yet mapped fault and retry */
-		if (uvcb.header.rc == 0x10a) {
-			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
-					FAULT_FLAG_WRITE);
-			if (rc)
-				return rc;
-			goto retry;
-		}
-		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
-			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
-		break;
+		if (rc)
+			break;
+		addr += PAGE_SIZE;
+		tw[1] += PAGE_SIZE;
 	}
-	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
-		 uvcb.header.rc, uvcb.header.rrc);
+	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished rc %x", rc);
 	return rc;
 }
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH] Fix unpack
  2019-11-13 14:03     ` [PATCH] Fix unpack Janosch Frank
@ 2019-11-13 14:19       ` Thomas Huth
  2019-11-13 14:36       ` Cornelia Huck
  1 sibling, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-13 14:19 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: linux-s390, david

On 13/11/2019 15.03, Janosch Frank wrote:
> That should be easier to read :)
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/pv.c | 60 +++++++++++++++++++++++++++-------------------
>  1 file changed, 35 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 94cf16f40f25..fd73afb33b20 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -195,43 +195,53 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
>  	return 0;
>  }
>  
> -int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> -		       unsigned long tweak)
> +static int unpack_one(struct kvm *kvm, unsigned long addr, u64 tweak[2])
>  {
> -	int i, rc = 0;
> +	int rc;
>  	struct uv_cb_unp uvcb = {
>  		.header.cmd = UVC_CMD_UNPACK_IMG,
>  		.header.len = sizeof(uvcb),
>  		.guest_handle = kvm_s390_pv_handle(kvm),
> -		.tweak[0] = tweak
> +		.gaddr = addr,
> +		.tweak[0] = tweak[0],
> +		.tweak[1] = tweak[1],
>  	};
>  
> -	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
> -		return -EINVAL;
> +	rc = uv_call(0, (u64)&uvcb);
> +	if (!rc)
> +		return rc;
> +	if (uvcb.header.rc == 0x10a) {
> +		/* If not yet mapped fault and retry */
> +		rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> +				FAULT_FLAG_WRITE);
> +		if (!rc)
> +			return -EAGAIN;
> +	}
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> +		 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> +	return rc;
> +}
>  
> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak)
> +{
> +	int rc = 0;
> +	u64 tw[2] = {tweak, 0};
> +
> +	if (addr & ~PAGE_MASK || !size || size & ~PAGE_MASK)
> +		return -EINVAL;
>  
>  	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
>  		 addr, size);
> -	for (i = 0; i < size / PAGE_SIZE; i++) {
> -		uvcb.gaddr = addr + i * PAGE_SIZE;
> -		uvcb.tweak[1] = i * PAGE_SIZE;
> -retry:
> -		rc = uv_call(0, (u64)&uvcb);
> -		if (!rc)
> +	while (tw[1] < size) {
> +		rc = unpack_one(kvm, addr, tw);
> +		if (rc == -EAGAIN)
>  			continue;
> -		/* If not yet mapped fault and retry */
> -		if (uvcb.header.rc == 0x10a) {
> -			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> -					FAULT_FLAG_WRITE);
> -			if (rc)
> -				return rc;
> -			goto retry;
> -		}
> -		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> -			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> -		break;
> +		if (rc)
> +			break;
> +		addr += PAGE_SIZE;
> +		tw[1] += PAGE_SIZE;
>  	}
> -	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
> -		 uvcb.header.rc, uvcb.header.rrc);
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished rc %x", rc);
>  	return rc;
>  }
> 

Yes, I think that will be better, thanks!

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH] Fix unpack
  2019-11-13 14:03     ` [PATCH] Fix unpack Janosch Frank
  2019-11-13 14:19       ` Thomas Huth
@ 2019-11-13 14:36       ` Cornelia Huck
  1 sibling, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-13 14:36 UTC (permalink / raw)
  To: Janosch Frank; +Cc: kvm, linux-s390, david, thuth

On Wed, 13 Nov 2019 09:03:06 -0500
Janosch Frank <frankja@linux.ibm.com> wrote:

You seem to have dropped some cc:s :(

> That should be easier to read :)
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/pv.c | 60 +++++++++++++++++++++++++++-------------------
>  1 file changed, 35 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 94cf16f40f25..fd73afb33b20 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -195,43 +195,53 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
>  	return 0;
>  }
>  
> -int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> -		       unsigned long tweak)
> +static int unpack_one(struct kvm *kvm, unsigned long addr, u64 tweak[2])
>  {
> -	int i, rc = 0;
> +	int rc;
>  	struct uv_cb_unp uvcb = {
>  		.header.cmd = UVC_CMD_UNPACK_IMG,
>  		.header.len = sizeof(uvcb),
>  		.guest_handle = kvm_s390_pv_handle(kvm),
> -		.tweak[0] = tweak
> +		.gaddr = addr,
> +		.tweak[0] = tweak[0],
> +		.tweak[1] = tweak[1],
>  	};
>  
> -	if (addr & ~PAGE_MASK || size & ~PAGE_MASK)
> -		return -EINVAL;
> +	rc = uv_call(0, (u64)&uvcb);
> +	if (!rc)
> +		return rc;
> +	if (uvcb.header.rc == 0x10a) {
> +		/* If not yet mapped fault and retry */
> +		rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> +				FAULT_FLAG_WRITE);
> +		if (!rc)
> +			return -EAGAIN;
> +	}
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> +		 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> +	return rc;
> +}
>  
> +int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
> +		       unsigned long tweak)
> +{
> +	int rc = 0;
> +	u64 tw[2] = {tweak, 0};
> +
> +	if (addr & ~PAGE_MASK || !size || size & ~PAGE_MASK)
> +		return -EINVAL;
>  
>  	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
>  		 addr, size);
> -	for (i = 0; i < size / PAGE_SIZE; i++) {
> -		uvcb.gaddr = addr + i * PAGE_SIZE;
> -		uvcb.tweak[1] = i * PAGE_SIZE;
> -retry:
> -		rc = uv_call(0, (u64)&uvcb);
> -		if (!rc)
> +	while (tw[1] < size) {
> +		rc = unpack_one(kvm, addr, tw);
> +		if (rc == -EAGAIN)
>  			continue;
> -		/* If not yet mapped fault and retry */
> -		if (uvcb.header.rc == 0x10a) {
> -			rc = gmap_fault(kvm->arch.gmap, uvcb.gaddr,
> -					FAULT_FLAG_WRITE);
> -			if (rc)
> -				return rc;
> -			goto retry;
> -		}
> -		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
> -			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
> -		break;
> +		if (rc)
> +			break;
> +		addr += PAGE_SIZE;
> +		tw[1] += PAGE_SIZE;
>  	}
> -	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x rrc %x",
> -		 uvcb.header.rc, uvcb.header.rrc);
> +	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished rc %x", rc);
>  	return rc;
>  }

Yes, this looks more readable.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 15/37] KVM: s390: protvirt: Add machine-check interruption injection controls
  2019-10-24 11:40 ` [RFC 15/37] KVM: s390: protvirt: Add machine-check interruption injection controls Janosch Frank
@ 2019-11-13 14:49   ` Thomas Huth
  2019-11-13 15:57     ` Michael Mueller
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-13 14:49 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> From: Michael Mueller <mimu@linux.ibm.com>
> 
> The following fields are added to the sie control block type 4:
>      - Machine Check Interruption Code (mcic)
>      - External Damage Code (edc)
>      - Failing Storage Address (faddr)
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h | 33 +++++++++++++++++++++++---------
>  1 file changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 63fc32d38aa9..0ab309b7bf4c 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -261,16 +261,31 @@ struct kvm_s390_sie_block {
>  #define HPID_VSIE	0x5
>  	__u8	hpid;			/* 0x00b8 */
>  	__u8	reservedb9[7];		/* 0x00b9 */
> -	__u32	eiparams;		/* 0x00c0 */
> -	__u16	extcpuaddr;		/* 0x00c4 */
> -	__u16	eic;			/* 0x00c6 */
> +	union {
> +		struct {
> +			__u32	eiparams;	/* 0x00c0 */
> +			__u16	extcpuaddr;	/* 0x00c4 */
> +			__u16	eic;		/* 0x00c6 */
> +		};
> +		__u64	mcic;			/* 0x00c0 */
> +	} __packed;
>  	__u32	reservedc8;		/* 0x00c8 */
> -	__u16	pgmilc;			/* 0x00cc */
> -	__u16	iprcc;			/* 0x00ce */
> -	__u32	dxc;			/* 0x00d0 */
> -	__u16	mcn;			/* 0x00d4 */
> -	__u8	perc;			/* 0x00d6 */
> -	__u8	peratmid;		/* 0x00d7 */
> +	union {
> +		struct {
> +			__u16	pgmilc;		/* 0x00cc */
> +			__u16	iprcc;		/* 0x00ce */
> +		};
> +		__u32	edc;			/* 0x00cc */
> +	} __packed;
> +	union {
> +		struct {
> +			__u32	dxc;		/* 0x00d0 */
> +			__u16	mcn;		/* 0x00d4 */
> +			__u8	perc;		/* 0x00d6 */
> +			__u8	peratmid;	/* 0x00d7 */
> +		};
> +		__u64	faddr;			/* 0x00d0 */
> +	} __packed;

Maybe drop the __packed keywords since the struct members are naturally
aligned anyway?
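E.g. for the first union, simply (assuming the layout stays identical because
every member is naturally aligned at offset 0xc0):

	union {
		struct {
			__u32	eiparams;	/* 0x00c0 */
			__u16	extcpuaddr;	/* 0x00c4 */
			__u16	eic;		/* 0x00c6 */
		};
		__u64	mcic;			/* 0x00c0 */
	};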

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC v2] KVM: s390: protvirt: Secure memory is not mergeable
  2019-11-13 12:23     ` Thomas Huth
@ 2019-11-13 15:54       ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-13 15:54 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor


[-- Attachment #1.1: Type: text/plain, Size: 1360 bytes --]

On 11/13/19 1:23 PM, Thomas Huth wrote:
> On 25/10/2019 10.24, Janosch Frank wrote:
>> KSM will not work on secure pages, because when the kernel reads a
>> secure page, it will be encrypted and hence no two pages will look the
>> same.
>>
>> Let's mark the guest pages as unmergeable when we transition to secure
>> mode.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
> [...]
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index edcdca97e85e..faecdf81abdb 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2548,6 +2548,23 @@ int s390_enable_sie(void)
>>  }
>>  EXPORT_SYMBOL_GPL(s390_enable_sie);
>>  
>> +int gmap_mark_unmergeable(void)
>> +{
>> +	struct mm_struct *mm = current->mm;
>> +	struct vm_area_struct *vma;
>> +
>> +
> 
> Please remove one of the two empty lines.

Already gone

> 
>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
>> +				MADV_UNMERGEABLE, &vma->vm_flags))
>> +			return -ENOMEM;
>> +	}
>> +	mm->def_flags &= ~VM_MERGEABLE;
>> +

Also this one was removed

>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
>> +
> [...]
> 
> Apart from the cosmetic nit, the patch looks fine to me.
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 
Thanks!



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 15/37] KVM: s390: protvirt: Add machine-check interruption injection controls
  2019-11-13 14:49   ` Thomas Huth
@ 2019-11-13 15:57     ` Michael Mueller
  0 siblings, 0 replies; 213+ messages in thread
From: Michael Mueller @ 2019-11-13 15:57 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, cohuck, gor



On 13.11.19 15:49, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> From: Michael Mueller <mimu@linux.ibm.com>
>>
>> The following fields are added to the sie control block type 4:
>>       - Machine Check Interruption Code (mcic)
>>       - External Damage Code (edc)
>>       - Failing Storage Address (faddr)
>>
>> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h | 33 +++++++++++++++++++++++---------
>>   1 file changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 63fc32d38aa9..0ab309b7bf4c 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -261,16 +261,31 @@ struct kvm_s390_sie_block {
>>   #define HPID_VSIE	0x5
>>   	__u8	hpid;			/* 0x00b8 */
>>   	__u8	reservedb9[7];		/* 0x00b9 */
>> -	__u32	eiparams;		/* 0x00c0 */
>> -	__u16	extcpuaddr;		/* 0x00c4 */
>> -	__u16	eic;			/* 0x00c6 */
>> +	union {
>> +		struct {
>> +			__u32	eiparams;	/* 0x00c0 */
>> +			__u16	extcpuaddr;	/* 0x00c4 */
>> +			__u16	eic;		/* 0x00c6 */
>> +		};
>> +		__u64	mcic;			/* 0x00c0 */
>> +	} __packed;
>>   	__u32	reservedc8;		/* 0x00c8 */
>> -	__u16	pgmilc;			/* 0x00cc */
>> -	__u16	iprcc;			/* 0x00ce */
>> -	__u32	dxc;			/* 0x00d0 */
>> -	__u16	mcn;			/* 0x00d4 */
>> -	__u8	perc;			/* 0x00d6 */
>> -	__u8	peratmid;		/* 0x00d7 */
>> +	union {
>> +		struct {
>> +			__u16	pgmilc;		/* 0x00cc */
>> +			__u16	iprcc;		/* 0x00ce */
>> +		};
>> +		__u32	edc;			/* 0x00cc */
>> +	} __packed;
>> +	union {
>> +		struct {
>> +			__u32	dxc;		/* 0x00d0 */
>> +			__u16	mcn;		/* 0x00d4 */
>> +			__u8	perc;		/* 0x00d6 */
>> +			__u8	peratmid;	/* 0x00d7 */
>> +		};
>> +		__u64	faddr;			/* 0x00d0 */
>> +	} __packed;
> 
> Maybe drop the __packed keywords since the struct members are naturally
> aligned anyway?
> 
>   Thomas
> 

Thanks, I will give it a try.

Michael

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls
  2019-10-24 11:40 ` [RFC 13/37] KVM: s390: protvirt: Add interruption injection controls Janosch Frank
  2019-10-30 15:53   ` David Hildenbrand
  2019-11-05 17:51   ` Cornelia Huck
@ 2019-11-14 11:48   ` Thomas Huth
  2 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-14 11:48 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> From: Michael Mueller <mimu@linux.ibm.com>
> 
> Define the interruption injection codes and the related fields in the
> sie control block for PVM interruption injection.
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 6cc3b73ca904..82443236d4cc 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -215,7 +215,15 @@ struct kvm_s390_sie_block {
>  	__u8	icptcode;		/* 0x0050 */
>  	__u8	icptstatus;		/* 0x0051 */
>  	__u16	ihcpu;			/* 0x0052 */
> -	__u8	reserved54[2];		/* 0x0054 */
> +	__u8	reserved54;		/* 0x0054 */
> +#define IICTL_CODE_NONE		 0x00
> +#define IICTL_CODE_MCHK		 0x01
> +#define IICTL_CODE_EXT		 0x02
> +#define IICTL_CODE_IO		 0x03
> +#define IICTL_CODE_RESTART	 0x04
> +#define IICTL_CODE_SPECIFICATION 0x10
> +#define IICTL_CODE_OPERAND	 0x11
> +	__u8	iictl;			/* 0x0055 */
>  	__u16	ipa;			/* 0x0056 */
>  	__u32	ipb;			/* 0x0058 */
>  	__u32	scaoh;			/* 0x005c */
> @@ -252,7 +260,8 @@ struct kvm_s390_sie_block {
>  #define HPID_KVM	0x4
>  #define HPID_VSIE	0x5
>  	__u8	hpid;			/* 0x00b8 */
> -	__u8	reservedb9[11];		/* 0x00b9 */
> +	__u8	reservedb9[7];		/* 0x00b9 */
> +	__u32	eiparams;		/* 0x00c0 */
>  	__u16	extcpuaddr;		/* 0x00c4 */
>  	__u16	eic;			/* 0x00c6 */
>  	__u32	reservedc8;		/* 0x00c8 */
> @@ -268,8 +277,16 @@ struct kvm_s390_sie_block {
>  	__u8	oai;			/* 0x00e2 */
>  	__u8	armid;			/* 0x00e3 */
>  	__u8	reservede4[4];		/* 0x00e4 */
> -	__u64	tecmc;			/* 0x00e8 */
> -	__u8	reservedf0[12];		/* 0x00f0 */
> +	union {
> +		__u64	tecmc;		/* 0x00e8 */

I have to admit that I always have to think twice where the compiler
might put the padding in this case. Maybe you could do that manually to
make it obvious and wrap it in a struct, too:

                struct {
			__u64	tecmc;		/* 0x00e8 */
			__u8	reservedf0[4];	/* 0x00f0 */
 		};

?

Just my 0.02 €, though.

 Thomas


> +		struct {
> +			__u16	subchannel_id;	/* 0x00e8 */
> +			__u16	subchannel_nr;	/* 0x00ea */
> +			__u32	io_int_parm;	/* 0x00ec */
> +			__u32	io_int_word;	/* 0x00f0 */
> +		};
> +	} __packed;
> +	__u8	reservedf4[8];		/* 0x00f4 */
>  #define CRYCB_FORMAT_MASK 0x00000003
>  #define CRYCB_FORMAT0 0x00000000
>  #define CRYCB_FORMAT1 0x00000001
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 14/37] KVM: s390: protvirt: Implement interruption injection
  2019-10-24 11:40 ` [RFC 14/37] KVM: s390: protvirt: Implement interruption injection Janosch Frank
  2019-11-04 10:29   ` David Hildenbrand
@ 2019-11-14 12:07   ` Thomas Huth
  1 sibling, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-14 12:07 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> From: Michael Mueller <mimu@linux.ibm.com>
> 
> The patch implements interruption injection for the following
> list of interruption types:
> 
>   - I/O
>     __deliver_io (III)
> 
>   - External
>     __deliver_cpu_timer (IEI)
>     __deliver_ckc (IEI)
>     __deliver_emergency_signal (IEI)
>     __deliver_external_call (IEI)
>     __deliver_service (IEI)
> 
>   - cpu restart
>     __deliver_restart (IRI)
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [interrupt masking]
> ---
[...]
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 165dea4c7f19..c919dfe4dfd3 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -324,8 +324,10 @@ static inline int gisa_tac_ipm_gisc(struct kvm_s390_gisa *gisa, u32 gisc)
>  
>  static inline unsigned long pending_irqs_no_gisa(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->kvm->arch.float_int.pending_irqs |
> -		vcpu->arch.local_int.pending_irqs;
> +	unsigned long pending = vcpu->kvm->arch.float_int.pending_irqs | vcpu->arch.local_int.pending_irqs;

The line is now pretty long, way more than 80 columns ... maybe keep it
on two lines?
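
E.g. (just illustrating the line break, not meant as a patch):

	unsigned long pending = vcpu->kvm->arch.float_int.pending_irqs |
				vcpu->arch.local_int.pending_irqs;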

> +
> +	pending &= ~vcpu->kvm->arch.float_int.masked_irqs;
> +	return pending;
>  }
[...]
> @@ -533,7 +549,6 @@ static int __must_check __deliver_pfault_init(struct kvm_vcpu *vcpu)
>  	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
>  					 KVM_S390_INT_PFAULT_INIT,
>  					 0, ext.ext_params2);
> -
>  	rc  = put_guest_lc(vcpu, EXT_IRQ_CP_SERVICE, (u16 *) __LC_EXT_INT_CODE);
>  	rc |= put_guest_lc(vcpu, PFAULT_INIT, (u16 *) __LC_EXT_CPU_ADDR);
>  	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,

I think you can drop this hunk.

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection
  2019-10-24 11:40 ` [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection Janosch Frank
@ 2019-11-14 13:09   ` Cornelia Huck
  2019-11-14 13:25     ` Claudio Imbrenda
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 13:09 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:33 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> Interrupt injection has changed a lot for protected guests, as KVM
> can't access the cpus' lowcores. New fields in the state description,
> like the interrupt injection control, and masked values safeguard the
> guest from KVM.
> 
> Let's add some documentation to the interrupt injection basics for
> protected guests.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  Documentation/virtual/kvm/s390-pv.txt | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
> index 86ed95f36759..e09f2dc5f164 100644
> --- a/Documentation/virtual/kvm/s390-pv.txt
> +++ b/Documentation/virtual/kvm/s390-pv.txt
> @@ -21,3 +21,30 @@ normally needed to be able to run a VM, some changes have been made in
>  SIE behavior and fields have different meaning for a PVM. SIE exits
>  are minimized as much as possible to improve speed and reduce exposed
>  guest state.
> +
> +
> +Interrupt injection:
> +
> +Interrupt injection is safeguarded by the Ultravisor and, as KVM lost
> +access to the VCPUs' lowcores, is handled via the format 4 state
> +description.
> +
> +Machine check, external, IO and restart interruptions each can be
> +injected on SIE entry via a bit in the interrupt injection control
> +field (offset 0x54). If the guest cpu is not enabled for the interrupt
> +at the time of injection, a validity interception is recognized. The
> +interrupt's data is transported via parts of the interception data
> +block.

"Data associated with the interrupt needs to be placed into the
respective fields in the interception data block to be injected into
the guest."

?

> +
> +Program and Service Call exceptions have another layer of
> +safeguarding, they are only injectable, when instructions have
> +intercepted into KVM and such an exception can be an emulation result.

I find this sentence hard to parse... not sure if I understand it
correctly.

"They can only be injected if the exception can be encountered during
emulation of instructions that had been intercepted into KVM."

?

> +
> +
> +Mask notification interceptions:
> +As a replacement for the lctl(g) and lpsw(e) interception, two new
> +interception codes have been introduced. One which tells us that CRs
> +0, 6 or 14 have been changed and therefore interrupt masking might
> +have changed. And one for PSW bit 13 changes. The CRs and the PSW in

Might be helpful to mention that this bit covers machine checks, which
do not get a separate bit in the control block :)

> +the state description only contain the mask bits and no further info
> +like the current instruction address.

"The CRs and the PSW in the state description only contain the bits
referring to interrupt masking; other fields like e.g. the current
instruction address are zero."

?

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection
  2019-11-14 13:09   ` Cornelia Huck
@ 2019-11-14 13:25     ` Claudio Imbrenda
  2019-11-14 13:47       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Claudio Imbrenda @ 2019-11-14 13:25 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, borntraeger,
	mihajlov, mimu, gor

On Thu, 14 Nov 2019 14:09:46 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 24 Oct 2019 07:40:33 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
> > Interrupt injection has changed a lot for protected guests, as KVM
> > can't access the cpus' lowcores. New fields in the state
> > description, like the interrupt injection control, and masked
> > values safeguard the guest from KVM.
> > 
> > Let's add some documentation to the interrupt injection basics for
> > protected guests.
> > 
> > Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> > ---
> >  Documentation/virtual/kvm/s390-pv.txt | 27
> > +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/s390-pv.txt
> > b/Documentation/virtual/kvm/s390-pv.txt index
> > 86ed95f36759..e09f2dc5f164 100644 ---
> > a/Documentation/virtual/kvm/s390-pv.txt +++
> > b/Documentation/virtual/kvm/s390-pv.txt @@ -21,3 +21,30 @@ normally
> > needed to be able to run a VM, some changes have been made in SIE
> > behavior and fields have different meaning for a PVM. SIE exits are
> > minimized as much as possible to improve speed and reduce exposed
> > guest state. +
> > +
> > +Interrupt injection:
> > +
> > +Interrupt injection is safeguarded by the Ultravisor and, as KVM
> > lost +access to the VCPUs' lowcores, is handled via the format 4
> > state +description.
> > +
> > +Machine check, external, IO and restart interruptions each can be
> > +injected on SIE entry via a bit in the interrupt injection control
> > +field (offset 0x54). If the guest cpu is not enabled for the
> > interrupt +at the time of injection, a validity interception is
> > recognized. The +interrupt's data is transported via parts of the
> > interception data +block.  
> 
> "Data associated with the interrupt needs to be placed into the
> respective fields in the interception data block to be injected into
> the guest."
> 
> ?

when a normal guest intercepts an exception, depending on the exception
type, the parameters are saved in the state description at specified
offsets, between 0xC0 and 0xF8

to perform interrupt injection for secure guests, the same fields are
used to specify the interrupt parameters that should be injected into
the guest
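
e.g. for an I/O interrupt that boils down to something like the
following (rough sketch only; field names as in the patch that adds
them, 'inti' being the usual kvm_s390_interrupt_info, the actual code
may well differ):

	struct kvm_s390_sie_block *scb = vcpu->arch.sie_block;

	/* tell SIE/UV which interruption class to inject */
	scb->iictl = IICTL_CODE_IO;
	/* the payload goes through the interception data fields */
	scb->subchannel_id = inti->io.subchannel_id;
	scb->subchannel_nr = inti->io.subchannel_nr;
	scb->io_int_parm = inti->io.io_int_parm;
	scb->io_int_word = inti->io.io_int_word;
	/* delivered by firmware on the next SIE entry, or a validity
	 * intercept if the guest cpu is not enabled for the interrupt */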

> > +
> > +Program and Service Call exceptions have another layer of
> > +safeguarding, they are only injectable, when instructions have
> > +intercepted into KVM and such an exception can be an emulation
> > result.  
> 
> I find this sentence hard to parse... not sure if I understand it
> correctly.
> 
> "They can only be injected if the exception can be encountered during
> emulation of instructions that had been intercepted into KVM."
 
yes

> 
> > +
> > +
> > +Mask notification interceptions:
> > +As a replacement for the lctl(g) and lpsw(e) interception, two new
> > +interception codes have been introduced. One which tells us that
> > CRs +0, 6 or 14 have been changed and therefore interrupt masking
> > might +have changed. And one for PSW bit 13 changes. The CRs and
> > the PSW in  
> 
> Might be helpful to mention that this bit covers machine checks, which
> do not get a separate bit in the control block :)
> 
> > +the state description only contain the mask bits and no further
> > info +like the current instruction address.  
> 
> "The CRs and the PSW in the state description only contain the bits
> referring to interrupt masking; other fields like e.g. the current
> instruction address are zero."

wait state is saved too

CC is write only, and is only inspected by hardware/firmware when
KVM/qemu is interpreting an instruction that expects a new CC to be set,
and then only the expected CCs are allowed (e.g. if an instruction only
allows CC 0 or 3, 2 cannot be specified)

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection
  2019-11-14 13:25     ` Claudio Imbrenda
@ 2019-11-14 13:47       ` Cornelia Huck
  2019-11-14 16:33         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 13:47 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, borntraeger,
	mihajlov, mimu, gor

On Thu, 14 Nov 2019 14:25:00 +0100
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> On Thu, 14 Nov 2019 14:09:46 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Thu, 24 Oct 2019 07:40:33 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> > > Interrupt injection has changed a lot for protected guests, as KVM
> > > can't access the cpus' lowcores. New fields in the state
> > > description, like the interrupt injection control, and masked
> > > values safeguard the guest from KVM.
> > > 
> > > Let's add some documentation to the interrupt injection basics for
> > > protected guests.
> > > 
> > > Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> > > ---
> > >  Documentation/virtual/kvm/s390-pv.txt | 27
> > > +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
> > > 
> > > diff --git a/Documentation/virtual/kvm/s390-pv.txt
> > > b/Documentation/virtual/kvm/s390-pv.txt index
> > > 86ed95f36759..e09f2dc5f164 100644 ---
> > > a/Documentation/virtual/kvm/s390-pv.txt +++
> > > b/Documentation/virtual/kvm/s390-pv.txt @@ -21,3 +21,30 @@ normally
> > > needed to be able to run a VM, some changes have been made in SIE
> > > behavior and fields have different meaning for a PVM. SIE exits are
> > > minimized as much as possible to improve speed and reduce exposed
> > > guest state. +
> > > +
> > > +Interrupt injection:
> > > +
> > > +Interrupt injection is safeguarded by the Ultravisor and, as KVM
> > > lost +access to the VCPUs' lowcores, is handled via the format 4
> > > state +description.
> > > +
> > > +Machine check, external, IO and restart interruptions each can be
> > > +injected on SIE entry via a bit in the interrupt injection control
> > > +field (offset 0x54). If the guest cpu is not enabled for the
> > > interrupt +at the time of injection, a validity interception is
> > > recognized. The +interrupt's data is transported via parts of the
> > > interception data +block.    
> > 
> > "Data associated with the interrupt needs to be placed into the
> > respective fields in the interception data block to be injected into
> > the guest."
> > 
> > ?  
> 
> when a normal guest intercepts an exception, depending on the exception
> type, the parameters are saved in the state description at specified
> offsets, between 0xC0 and 0xF8
> 
> to perform interrupt injection for secure guests, the same fields are
> used to specify the interrupt parameters that should be injected into
> the guest

Ok, maybe add that as well.

> 
> > > +
> > > +Program and Service Call exceptions have another layer of
> > > +safeguarding, they are only injectable, when instructions have
> > > +intercepted into KVM and such an exception can be an emulation
> > > result.    
> > 
> > I find this sentence hard to parse... not sure if I understand it
> > correctly.
> > 
> > "They can only be injected if the exception can be encountered during
> > emulation of instructions that had been intercepted into KVM."  
>  
> yes
> 
> >   
> > > +
> > > +
> > > +Mask notification interceptions:
> > > +As a replacement for the lctl(g) and lpsw(e) interception, two new
> > > +interception codes have been introduced. One which tells us that
> > > CRs +0, 6 or 14 have been changed and therefore interrupt masking
> > > might +have changed. And one for PSW bit 13 changes. The CRs and
> > > the PSW in    
> > 
> > Might be helpful to mention that this bit covers machine checks, which
> > do not get a separate bit in the control block :)
> >   
> > > +the state description only contain the mask bits and no further
> > > info +like the current instruction address.    
> > 
> > "The CRs and the PSW in the state description only contain the bits
> > referring to interrupt masking; other fields like e.g. the current
> > instruction address are zero."  
> 
> wait state is saved too
> 
> CC is write only, and is only inspected by hardware/firmware when
> KVM/qemu is interpreting an instruction that expects a new CC to be set,
> and then only the expected CCs are allowed (e.g. if an instruction only
> allows CC 0 or 3, 2 cannot be specified)

So I'm wondering how much of that should go into the document... maybe
just

"The CRs and the PSW in the state description contain less information
than for normal guests: most information that does not refer to
interrupt masking is not available to the hypervisor."

?

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 18/37] KVM: s390: protvirt: Handle spec exception loops
  2019-10-24 11:40 ` [RFC 18/37] KVM: s390: protvirt: Handle spec exception loops Janosch Frank
@ 2019-11-14 14:22   ` Thomas Huth
  0 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-14 14:22 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> SIE intercept code 8 is used only on exception loops for protected
> guests. That means we need to stop the guest when we see it.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/intercept.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index acc1710fc472..b013a9c88d43 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -231,6 +231,13 @@ static int handle_prog(struct kvm_vcpu *vcpu)
>  
>  	vcpu->stat.exit_program_interruption++;
>  
> +	/*
> +	 * Intercept 8 indicates a loop of specification exceptions
> +	 * for protected guests
> +	 */
> +	if (kvm_s390_pv_is_protected(vcpu->kvm))
> +		return -EOPNOTSUPP;
> +
>  	if (guestdbg_enabled(vcpu) && per_event(vcpu)) {
>  		rc = kvm_s390_handle_per_event(vcpu);
>  		if (rc)

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-10-24 11:40 ` [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling Janosch Frank
  2019-11-04 11:25   ` David Hildenbrand
@ 2019-11-14 14:44   ` Thomas Huth
  2019-11-14 15:56     ` Janosch Frank
  1 sibling, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-14 14:44 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Guest registers for protected guests are stored at offset 0x380.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  4 +++-
>  arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 0ab309b7bf4c..5deabf9734d9 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>  struct sie_page {
>  	struct kvm_s390_sie_block sie_block;
>  	struct mcck_volatile_info mcck_info;	/* 0x0200 */
> -	__u8 reserved218[1000];		/* 0x0218 */
> +	__u8 reserved218[360];		/* 0x0218 */
> +	__u64 pv_grregs[16];		/* 0x380 */
> +	__u8 reserved400[512];

Maybe add a "/* 0x400 */" comment to be consistent with the other lines?

>  	struct kvm_s390_itdb itdb;	/* 0x0600 */
>  	__u8 reserved700[2304];		/* 0x0700 */
>  };

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-05 14:18             ` David Hildenbrand
@ 2019-11-14 14:46               ` Thomas Huth
  0 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-14 14:46 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, Christian Borntraeger, kvm
  Cc: linux-s390, imbrenda, mihajlov, mimu, cohuck, gor

On 05/11/2019 15.18, David Hildenbrand wrote:
> On 05.11.19 15:11, Janosch Frank wrote:
>> On 11/5/19 2:55 PM, David Hildenbrand wrote:
>>> On 05.11.19 13:39, Janosch Frank wrote:
>>>> On 11/5/19 1:01 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 04.11.19 12:25, David Hildenbrand wrote:
>>>>>> On 24.10.19 13:40, Janosch Frank wrote:
>>>>>>> Guest registers for protected guests are stored at offset 0x380.
>>>>>>>
>>>>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>>>>> ---
>>>>>>>     arch/s390/include/asm/kvm_host.h |  4 +++-
>>>>>>>     arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>>>>>>     2 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/arch/s390/include/asm/kvm_host.h
>>>>>>> b/arch/s390/include/asm/kvm_host.h
>>>>>>> index 0ab309b7bf4c..5deabf9734d9 100644
>>>>>>> --- a/arch/s390/include/asm/kvm_host.h
>>>>>>> +++ b/arch/s390/include/asm/kvm_host.h
>>>>>>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>>>>>>     struct sie_page {
>>>>>>>         struct kvm_s390_sie_block sie_block;
>>>>>>>         struct mcck_volatile_info mcck_info;    /* 0x0200 */
>>>>>>> -    __u8 reserved218[1000];        /* 0x0218 */
>>>>>>> +    __u8 reserved218[360];        /* 0x0218 */
>>>>>>> +    __u64 pv_grregs[16];        /* 0x380 */
>>>>>>> +    __u8 reserved400[512];
>>>>>>>         struct kvm_s390_itdb itdb;    /* 0x0600 */
>>>>>>>         __u8 reserved700[2304];        /* 0x0700 */
>>>>>>>     };
>>>>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>>>>> index 490fde080107..97d3a81e5074 100644
>>>>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>>>>> @@ -3965,6 +3965,7 @@ static int vcpu_post_run(struct kvm_vcpu
>>>>>>> *vcpu, int exit_reason)
>>>>>>>     static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>>>     {
>>>>>>>         int rc, exit_reason;
>>>>>>> +    struct sie_page *sie_page = (struct sie_page
>>>>>>> *)vcpu->arch.sie_block;
>>>>>>>           /*
>>>>>>>          * We try to hold kvm->srcu during most of vcpu_run
>>>>>>> (except when run-
>>>>>>> @@ -3986,8 +3987,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>>>             guest_enter_irqoff();
>>>>>>>             __disable_cpu_timer_accounting(vcpu);
>>>>>>>             local_irq_enable();
>>>>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>>>>> +            memcpy(sie_page->pv_grregs,
>>>>>>> +                   vcpu->run->s.regs.gprs,
>>>>>>> +                   sizeof(sie_page->pv_grregs));
>>>>>>> +        }
>>>>>>>             exit_reason = sie64a(vcpu->arch.sie_block,
>>>>>>>                          vcpu->run->s.regs.gprs);
>>>>>>> +        if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>>>>>> +            memcpy(vcpu->run->s.regs.gprs,
>>>>>>> +                   sie_page->pv_grregs,
>>>>>>> +                   sizeof(sie_page->pv_grregs));
>>>>>>> +        }
>>>>>>
>>>>>> sie64a will load/save gprs 0-13 from to vcpu->run->s.regs.gprs.
>>>>>>
>>>>>> I would have assume that this is not required for prot virt,
>>>>>> because the HW has direct access via the sie block?
>>>>>
>>>>> Yes, that is correct. The load/save in sie64a is not necessary for
>>>>> pv guests.
>>>>>
>>>>>>
>>>>>>
>>>>>> 1. Would it make sense to have a specialized sie64a() (or a
>>>>>> parameter, e.g., if you pass in NULL in r3), that optimizes this
>>>>>> loading/saving? Eventually we can also optimize which host
>>>>>> registers to save/restore then.
>>>>>
>>>>> Having 2 kinds of sie64a seems not very nice for just saving a
>>>>> small number of cycles.
>>>>>
>>>>>>
>>>>>> 2. Avoid this copying here. We have to store the state to
>>>>>> vcpu->run->s.regs.gprs when returning to user space and restore
>>>>>> the state when coming from user space.
>>>>>
>>>>> I like this proposal better than the first one and
>>>
>>> It was actually an additional proposal :)
>>>
>>> 1. avoids unnecessary saving/loading/saving/restoring
>>> 2. avoids the two memcpy
>>>
>>>>>>
>>>>>> Also, we access the GPRS from interception handlers, there we
>>>>>> might use wrappers like
>>>>>>
>>>>>> kvm_s390_set_gprs()
>>>>>> kvm_s390_get_gprs()
>>>>>
>>>>> having register accessors might be useful anyway.
>>>>> But I would like to defer that to a later point in time to keep the
>>>>> changes in here
>>>>> minimal?
>>>>>
>>>>> We can add a "TODO" comment in here so that we do not forget about
>>>>> this
>>>>> for a future patch. Makes sense?
>>>
>>> While it makes sense, I guess one could come up with a patch for 2. in
>>> less than 30 minutes ... but yeah, whatever you prefer. ;)
>>>
>>
>> Just to get it fully right we'd need to:
>> a. Synchronize registers into/from vcpu run in sync_regs/store_regs
>> b. Sprinkle get/set_gpr(int nr) over most of the files in arch/s390/kvm
>>
>> That's your proposal?
> 
> Yes. Patch 1, factor out gprs access. Patch 2, avoid the memcpy by
> fixing the gprs access functions and removing the memcpys. (both as
> addons to this patch)
> 
> I guess that should be it ... but maybe we'll stumble over surprises :)

That's likely a good idea, but I think I agree with Christian that this
should rather be done in a later, separate patch. This patch set is
already big enough, so I'd prefer to keep it shorter for now and do
optimizations later.
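
(Just for the record, the accessors could presumably be as simple as
something like this - hypothetical names, completely untested, and
sync_regs/store_regs would still have to copy pv_grregs from/to
vcpu->run->s.regs.gprs for userspace:

static inline u64 kvm_s390_get_gpr(struct kvm_vcpu *vcpu, int nr)
{
	struct sie_page *sp = (struct sie_page *)vcpu->arch.sie_block;

	if (kvm_s390_pv_is_protected(vcpu->kvm))
		return sp->pv_grregs[nr];
	return vcpu->run->s.regs.gprs[nr];
}

static inline void kvm_s390_set_gpr(struct kvm_vcpu *vcpu, int nr, u64 val)
{
	struct sie_page *sp = (struct sie_page *)vcpu->arch.sie_block;

	if (kvm_s390_pv_is_protected(vcpu->kvm))
		sp->pv_grregs[nr] = val;
	else
		vcpu->run->s.regs.gprs[nr] = val;
}

... but that can certainly wait for a follow-up series.)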

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-10-24 11:40 ` [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation Janosch Frank
@ 2019-11-14 15:15   ` Cornelia Huck
  2019-11-14 15:20     ` Claudio Imbrenda
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 15:15 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:39 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> As guest memory is inaccessible and information about the guest's
> state is very limited, new ways for instruction emulation have been
> introduced.
> 
> With a bounce area for guest GRs and instruction data, guest state
> leaks can be limited by the Ultravisor. KVM now has to move
> instruction input and output through these areas.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  Documentation/virtual/kvm/s390-pv.txt | 47 +++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/s390-pv.txt b/Documentation/virtual/kvm/s390-pv.txt
> index e09f2dc5f164..cb08d78a7922 100644
> --- a/Documentation/virtual/kvm/s390-pv.txt
> +++ b/Documentation/virtual/kvm/s390-pv.txt
> @@ -48,3 +48,50 @@ interception codes have been introduced. One which tells us that CRs
>  have changed. And one for PSW bit 13 changes. The CRs and the PSW in
>  the state description only contain the mask bits and no further info
>  like the current instruction address.
> +
> +
> +Instruction emulation:
> +With the format 4 state description the SIE instruction already

s/description/description,/

> +interprets more instructions than it does with format 2. As it is not
> +able to interpret all instruction, the SIE and the UV safeguard KVM's

s/instruction/instructions/

> +emulation inputs and outputs.
> +
> +Guest GRs and most of the instruction data, like IO data structures

Hm, what 'IO data structures'?

> +are filtered. Instruction data is copied to and from the Secure
> +Instruction Data Area. Guest GRs are put into / retrieved from the
> +Interception-Data block.
> +
> +The Interception-Data block from the state description's offset 0x380
> +contains GRs 0 - 16. Only GR values needed to emulate an instruction
> +will be copied into this area.
> +
> +The Interception Parameters state description field still contains
> +the bytes of the instruction text but with pre-set register
> +values. I.e. each instruction always uses the same instruction text,
> +to not leak guest instruction text.
> +
> +The Secure Instruction Data Area contains instruction storage
> +data. Data for diag 500 is exempt from that and has to be moved
> +through shared buffers to KVM.

I find this paragraph a bit confusing. What does that imply for diag
500 interception? Data is still present in gprs 1-4?

(Also, why only diag 500? Because it is the 'reserved for kvm' diagnose
call?)

> +
> +When SIE intercepts an instruction, it will only allow data and
> +program interrupts for this instruction to be moved to the guest via
> +the two data areas discussed before. Other data is ignored or results
> +in validity interceptions.
> +
> +
> +Instruction emulation interceptions:
> +There are two types of SIE secure instruction intercepts. The normal
> +and the notification type. Normal secure instruction intercepts will
> +make the guest pending for instruction completion of the intercepted
> +instruction type, i.e. on SIE entry it is attempted to complete
> +emulation of the instruction with the data provided by KVM. That might
> +be a program exception or instruction completion.
> +
> +The notification type intercepts inform KVM about guest environment
> +changes due to guest instruction interpretation. Such an interception

'interpretation by SIE' ?

> +is recognized for the store prefix instruction and provides the new
> +lowcore location for mapping change notification arming. Any KVM data
> +in the data areas is ignored, program exceptions are not injected and
> +execution continues on next SIE entry, as if no intercept had
> +happened.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-11-14 15:15   ` Cornelia Huck
@ 2019-11-14 15:20     ` Claudio Imbrenda
  2019-11-14 15:41       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Claudio Imbrenda @ 2019-11-14 15:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, borntraeger,
	mihajlov, mimu, gor

On Thu, 14 Nov 2019 16:15:26 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 24 Oct 2019 07:40:39 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
> > As guest memory is inaccessible and information about the guest's
> > state is very limited, new ways for instruction emulation have been
> > introduced.
> > 
> > With a bounce area for guest GRs and instruction data, guest state
> > leaks can be limited by the Ultravisor. KVM now has to move
> > instruction input and output through these areas.
> > 
> > Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> > ---
> >  Documentation/virtual/kvm/s390-pv.txt | 47
> > +++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/s390-pv.txt
> > b/Documentation/virtual/kvm/s390-pv.txt index
> > e09f2dc5f164..cb08d78a7922 100644 ---
> > a/Documentation/virtual/kvm/s390-pv.txt +++
> > b/Documentation/virtual/kvm/s390-pv.txt @@ -48,3 +48,50 @@
> > interception codes have been introduced. One which tells us that
> > CRs have changed. And one for PSW bit 13 changes. The CRs and the
> > PSW in the state description only contain the mask bits and no
> > further info like the current instruction address. +
> > +
> > +Instruction emulation:
> > +With the format 4 state description the SIE instruction already  
> 
> s/description/description,/
> 
> > +interprets more instructions than it does with format 2. As it is
> > not +able to interpret all instruction, the SIE and the UV
> > safeguard KVM's  
> 
> s/instruction/instructions/
> 
> > +emulation inputs and outputs.
> > +
> > +Guest GRs and most of the instruction data, like IO data
> > structures  
> 
> Hm, what 'IO data structures'?

the various IRB and ORB of I/O instructions

> > +are filtered. Instruction data is copied to and from the Secure
> > +Instruction Data Area. Guest GRs are put into / retrieved from the
> > +Interception-Data block.
> > +
> > +The Interception-Data block from the state description's offset
> > 0x380 +contains GRs 0 - 16. Only GR values needed to emulate an
> > instruction +will be copied into this area.
> > +
> > +The Interception Parameters state description field still contains
> > the +the bytes of the instruction text but with pre-set register
> > +values. I.e. each instruction always uses the same instruction
> > text, +to not leak guest instruction text.
> > +
> > +The Secure Instruction Data Area contains instruction storage
> > +data. Data for diag 500 is exempt from that and has to be moved
> > +through shared buffers to KVM.  
> 
> I find this paragraph a bit confusing. What does that imply for diag
> 500 interception? Data is still present in gprs 1-4?

no register contents are leaked via the actual registers. registers are
only ever exposed through the state description.

> (Also, why only diag 500? Because it is the 'reserved for kvm'
> diagnose call?)
> 
> > +
> > +When SIE intercepts an instruction, it will only allow data and
> > +program interrupts for this instruction to be moved to the guest
> > via +the two data areas discussed before. Other data is ignored or
> > results +in validity interceptions.
> > +
> > +
> > +Instruction emulation interceptions:
> > +There are two types of SIE secure instruction intercepts. The
> > normal +and the notification type. Normal secure instruction
> > intercepts will +make the guest pending for instruction completion
> > of the intercepted +instruction type, i.e. on SIE entry it is
> > attempted to complete +emulation of the instruction with the data
> > provided by KVM. That might +be a program exception or instruction
> > completion. +
> > +The notification type intercepts inform KVM about guest environment
> > +changes due to guest instruction interpretation. Such an
> > interception  
> 
> 'interpretation by SIE' ?
> 
> > +is recognized for the store prefix instruction and provides the new
> > +lowcore location for mapping change notification arming. Any KVM
> > data +in the data areas is ignored, program exceptions are not
> > injected and +execution continues on next SIE entry, as if no
> > intercept had +happened.  
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer
  2019-10-24 11:40 ` [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer Janosch Frank
@ 2019-11-14 15:36   ` Thomas Huth
  2019-11-14 16:04     ` Janosch Frank
  2019-11-14 16:21     ` [PATCH] Fixup sida bouncing Janosch Frank
  0 siblings, 2 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-14 15:36 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Now that we can't access guest memory anymore, we have a dedicated
> sattelite block that's a bounce buffer for instruction data.

"satellite block that is ..."

> We re-use the memop interface to copy the instruction data to / from
> userspace. This lets us re-use a lot of QEMU code which used that
> interface to make logical guest memory accesses which are not possible
> anymore in protected mode anyway.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  5 ++++-
>  arch/s390/kvm/kvm-s390.c         | 31 +++++++++++++++++++++++++++++++
>  arch/s390/kvm/pv.c               |  9 +++++++++
>  3 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 5deabf9734d9..2a8a1e21e1c3 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -308,7 +308,10 @@ struct kvm_s390_sie_block {
>  #define CRYCB_FORMAT2 0x00000003
>  	__u32	crycbd;			/* 0x00fc */
>  	__u64	gcr[16];		/* 0x0100 */
> -	__u64	gbea;			/* 0x0180 */
> +	union {
> +		__u64	gbea;			/* 0x0180 */
> +		__u64	sidad;
> +	};
>  	__u8    reserved188[8];		/* 0x0188 */
>  	__u64   sdnxo;			/* 0x0190 */
>  	__u8    reserved198[8];		/* 0x0198 */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 97d3a81e5074..6747cb6cf062 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4416,6 +4416,13 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  	if (mop->size > MEM_OP_MAX_SIZE)
>  		return -E2BIG;
>  
> +	/* Protected guests move instruction data over the satellite
> +	 * block which has its own size limit
> +	 */
> +	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
> +	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
> +		return -E2BIG;
> +
>  	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
>  		tmpbuf = vmalloc(mop->size);
>  		if (!tmpbuf)
> @@ -4427,10 +4434,22 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  	switch (mop->op) {
>  	case KVM_S390_MEMOP_LOGICAL_READ:
>  		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
> +			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +				r = 0;
> +				break;

Please add a short comment to the code why this is required / ok.

> +			}
>  			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
>  					    mop->size, GACC_FETCH);
>  			break;
>  		}
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			r = 0;
> +			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
> +					 (mop->gaddr & ~PAGE_MASK),

That looks bogus. Couldn't userspace use mop->gaddr = 4095 and mop->size
= 4095 to read most of the page beyond the sidad page (assuming that it
is mapped, too)?
I think you have to take mop->gaddr into account in your new check at
the beginning of the function, too.

Or should the ioctl maybe even be restricted to mop->gaddr == 0 now? Is
there maybe also a way to validate that gaddr & PAGE_MASK really matches
the page that we have in sidad?
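
Something like this for the check at the top of the function (completely
untested, just to illustrate taking the offset into account) might be a
little safer:

	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
		u64 sida_size = ((vcpu->arch.sie_block->sidad & 0x0f) + 1)
				* PAGE_SIZE;

		/* offset within the satellite block + size must fit */
		if ((mop->gaddr & ~PAGE_MASK) + mop->size > sida_size)
			return -E2BIG;
	}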

> +					 mop->size))
> +				r = -EFAULT;
> +			break;
> +		}
>  		r = read_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
>  		if (r == 0) {
>  			if (copy_to_user(uaddr, tmpbuf, mop->size))
> @@ -4439,10 +4458,22 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  		break;
>  	case KVM_S390_MEMOP_LOGICAL_WRITE:
>  		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
> +			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +				r = 0;
> +				break;
> +			}
>  			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
>  					    mop->size, GACC_STORE);
>  			break;
>  		}
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			r = 0;
> +			if (copy_from_user((void *)vcpu->arch.sie_block->sidad +
> +					   (mop->gaddr & ~PAGE_MASK), uaddr,
> +					   mop->size))

dito, of course.

> +				r = -EFAULT;
> +			break;
> +		}
>  		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
>  			r = -EFAULT;
>  			break;

 Thomas

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 21/37] KVM: S390: protvirt: Instruction emulation
  2019-10-24 11:40 ` [RFC 21/37] KVM: S390: protvirt: Instruction emulation Janosch Frank
@ 2019-11-14 15:38   ` Cornelia Huck
  2019-11-14 16:00     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 15:38 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:43 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> We have two new SIE exit codes 104 for a secure instruction
> interception, on which the SIE needs hypervisor action to complete the
> instruction.
> 
> And 108 which is merely a notification and provides data for tracking
> and management, like for the lowcore we set notification bits for the
> lowcore pages.

What about the following:

"With protected virtualization, we have two new SIE exit codes:

- 104 indicates a secure instruction interception; the hypervisor needs
  to complete emulation of the instruction.
- 108 is merely a notification providing data for tracking and
  management in the hypervisor; for example, we set notification bits
  for the lowcore pages."

?

> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  2 ++
>  arch/s390/kvm/intercept.c        | 23 +++++++++++++++++++++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 2a8a1e21e1c3..a42dfe98128b 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -212,6 +212,8 @@ struct kvm_s390_sie_block {
>  #define ICPT_KSS	0x5c
>  #define ICPT_PV_MCHKR	0x60
>  #define ICPT_PV_INT_EN	0x64
> +#define ICPT_PV_INSTR	0x68
> +#define ICPT_PV_NOT	0x6c

Maybe ICPT_PV_NOTIF?

>  	__u8	icptcode;		/* 0x0050 */
>  	__u8	icptstatus;		/* 0x0051 */
>  	__u16	ihcpu;			/* 0x0052 */
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index b013a9c88d43..a1df8a43c88b 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -451,6 +451,23 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
>  	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
>  }
>  
> +static int handle_pv_spx(struct kvm_vcpu *vcpu)
> +{
> +	u32 pref = *(u32 *)vcpu->arch.sie_block->sidad;
> +
> +	kvm_s390_set_prefix(vcpu, pref);
> +	trace_kvm_s390_handle_prefix(vcpu, 1, pref);
> +	return 0;
> +}
> +
> +static int handle_pv_not(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.sie_block->ipa == 0xb210)
> +		return handle_pv_spx(vcpu);
> +
> +	return handle_instruction(vcpu);

Hm... if I understood it correctly, we are getting this one because the
SIE informs us about things that it handled itself (but which we
should be aware of). What can handle_instruction() do in this case?

> +}
> +
>  int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>  {
>  	int rc, per_rc = 0;
> @@ -505,6 +522,12 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>  		 */
>  		rc = 0;
>  	break;
> +	case ICPT_PV_INSTR:
> +		rc = handle_instruction(vcpu);
> +		break;
> +	case ICPT_PV_NOT:
> +		rc = handle_pv_not(vcpu);
> +		break;
>  	default:
>  		return -EOPNOTSUPP;
>  	}

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-11-14 15:20     ` Claudio Imbrenda
@ 2019-11-14 15:41       ` Cornelia Huck
  2019-11-14 15:55         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 15:41 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Janosch Frank, kvm, linux-s390, thuth, david, borntraeger,
	mihajlov, mimu, gor

On Thu, 14 Nov 2019 16:20:24 +0100
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> On Thu, 14 Nov 2019 16:15:26 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Thu, 24 Oct 2019 07:40:39 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> > > As guest memory is inaccessible and information about the guest's
> > > state is very limited, new ways for instruction emulation have been
> > > introduced.
> > > 
> > > With a bounce area for guest GRs and instruction data, guest state
> > > leaks can be limited by the Ultravisor. KVM now has to move
> > > instruction input and output through these areas.
> > > 
> > > Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> > > ---
> > >  Documentation/virtual/kvm/s390-pv.txt | 47
> > > +++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
> > > 
> > > diff --git a/Documentation/virtual/kvm/s390-pv.txt
> > > b/Documentation/virtual/kvm/s390-pv.txt index
> > > e09f2dc5f164..cb08d78a7922 100644 ---
> > > a/Documentation/virtual/kvm/s390-pv.txt +++
> > > b/Documentation/virtual/kvm/s390-pv.txt @@ -48,3 +48,50 @@
> > > interception codes have been introduced. One which tells us that
> > > CRs have changed. And one for PSW bit 13 changes. The CRs and the
> > > PSW in the state description only contain the mask bits and no
> > > further info like the current instruction address. +
> > > +
> > > +Instruction emulation:
> > > +With the format 4 state description the SIE instruction already    
> > 
> > s/description/description,/
> >   
> > > +interprets more instructions than it does with format 2. As it is
> > > not +able to interpret all instruction, the SIE and the UV
> > > safeguard KVM's    
> > 
> > s/instruction/instructions/
> >   
> > > +emulation inputs and outputs.
> > > +
> > > +Guest GRs and most of the instruction data, like IO data
> > > structures    
> > 
> > Hm, what 'IO data structures'?  
> 
> the various IRB and ORB of I/O instructions

Would be good to mention them as examples :)

> 
> > > +are filtered. Instruction data is copied to and from the Secure
> > > +Instruction Data Area. Guest GRs are put into / retrieved from the
> > > +Interception-Data block.
> > > +
> > > +The Interception-Data block from the state description's offset
> > > 0x380 +contains GRs 0 - 16. Only GR values needed to emulate an
> > > instruction +will be copied into this area.
> > > +
> > > +The Interception Parameters state description field still contains
> > > the +the bytes of the instruction text but with pre-set register
> > > +values. I.e. each instruction always uses the same instruction
> > > text, +to not leak guest instruction text.
> > > +
> > > +The Secure Instruction Data Area contains instruction storage
> > > +data. Data for diag 500 is exempt from that and has to be moved
> > > +through shared buffers to KVM.    
> > 
> > I find this paragraph a bit confusing. What does that imply for diag
> > 500 interception? Data is still present in gprs 1-4?  
> 
> no registers are leaked in the registers. registers are always only
> exposed through the state description.

So, what is so special about diag 500, then?

> 
> > (Also, why only diag 500? Because it is the 'reserved for kvm'
> > diagnose call?)
> >   
> > > +
> > > +When SIE intercepts an instruction, it will only allow data and
> > > +program interrupts for this instruction to be moved to the guest
> > > via +the two data areas discussed before. Other data is ignored or
> > > results +in validity interceptions.
> > > +
> > > +
> > > +Instruction emulation interceptions:
> > > +There are two types of SIE secure instruction intercepts. The
> > > normal +and the notification type. Normal secure instruction
> > > intercepts will +make the guest pending for instruction completion
> > > of the intercepted +instruction type, i.e. on SIE entry it is
> > > attempted to complete +emulation of the instruction with the data
> > > provided by KVM. That might +be a program exception or instruction
> > > completion. +
> > > +The notification type intercepts inform KVM about guest environment
> > > +changes due to guest instruction interpretation. Such an
> > > interception    
> > 
> > 'interpretation by SIE' ?
> >   
> > > +is recognized for the store prefix instruction and provides the new
> > > +lowcore location for mapping change notification arming. Any KVM
> > > data +in the data areas is ignored, program exceptions are not
> > > injected and +execution continues on next SIE entry, as if no
> > > intercept had +happened.    
> >   
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-11-14 15:41       ` Cornelia Huck
@ 2019-11-14 15:55         ` Janosch Frank
  2019-11-14 16:03           ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 15:55 UTC (permalink / raw)
  To: Cornelia Huck, Claudio Imbrenda
  Cc: kvm, linux-s390, thuth, david, borntraeger, mihajlov, mimu, gor



On 11/14/19 4:41 PM, Cornelia Huck wrote:
> On Thu, 14 Nov 2019 16:20:24 +0100
> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
>> On Thu, 14 Nov 2019 16:15:26 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> On Thu, 24 Oct 2019 07:40:39 -0400
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>>   
>>>> As guest memory is inaccessible and information about the guest's
>>>> state is very limited, new ways for instruction emulation have been
>>>> introduced.
>>>>
>>>> With a bounce area for guest GRs and instruction data, guest state
>>>> leaks can be limited by the Ultravisor. KVM now has to move
>>>> instruction input and output through these areas.
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>  Documentation/virtual/kvm/s390-pv.txt | 47
>>>> +++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/s390-pv.txt
>>>> b/Documentation/virtual/kvm/s390-pv.txt index
>>>> e09f2dc5f164..cb08d78a7922 100644 ---
>>>> a/Documentation/virtual/kvm/s390-pv.txt +++
>>>> b/Documentation/virtual/kvm/s390-pv.txt @@ -48,3 +48,50 @@
>>>> interception codes have been introduced. One which tells us that
>>>> CRs have changed. And one for PSW bit 13 changes. The CRs and the
>>>> PSW in the state description only contain the mask bits and no
>>>> further info like the current instruction address. +
>>>> +
>>>> +Instruction emulation:
>>>> +With the format 4 state description the SIE instruction already    
>>>
>>> s/description/description,/
>>>   
>>>> +interprets more instructions than it does with format 2. As it is
>>>> not +able to interpret all instruction, the SIE and the UV
>>>> safeguard KVM's    
>>>
>>> s/instruction/instructions/
>>>   
>>>> +emulation inputs and outputs.
>>>> +
>>>> +Guest GRs and most of the instruction data, like IO data
>>>> structures    
>>>
>>> Hm, what 'IO data structures'?  
>>
>> the various IRB and ORB of I/O instructions
> 
> Would be good to mention them as examples :)
> 
>>
>>>> +are filtered. Instruction data is copied to and from the Secure
>>>> +Instruction Data Area. Guest GRs are put into / retrieved from the
>>>> +Interception-Data block.
>>>> +
>>>> +The Interception-Data block from the state description's offset
>>>> 0x380 +contains GRs 0 - 16. Only GR values needed to emulate an
>>>> instruction +will be copied into this area.
>>>> +
>>>> +The Interception Parameters state description field still contains
>>>> the +the bytes of the instruction text but with pre-set register
>>>> +values. I.e. each instruction always uses the same instruction
>>>> text, +to not leak guest instruction text.
>>>> +
>>>> +The Secure Instruction Data Area contains instruction storage
>>>> +data. Data for diag 500 is exempt from that and has to be moved
>>>> +through shared buffers to KVM.    
>>>
>>> I find this paragraph a bit confusing. What does that imply for diag
>>> 500 interception? Data is still present in gprs 1-4?  
>>
>> no registers are leaked in the registers. registers are always only
>> exposed through the state description.
> 
> So, what is so special about diag 500, then?

That's mostly a confusion on my side.
The SIDAD is 4k max, so we can only move IO "management" data over it
like ORBs and stuff. My intention was to point out that the data which
is to be transferred (disk contents, etc.) can't go over the SIDAD but
needs to be in a shared page.

diag500 was mostly a notification mechanism without a lot of data, right?

> 
>>
>>> (Also, why only diag 500? Because it is the 'reserved for kvm'
>>> diagnose call?)
>>>   
>>>> +
>>>> +When SIE intercepts an instruction, it will only allow data and
>>>> +program interrupts for this instruction to be moved to the guest
>>>> via +the two data areas discussed before. Other data is ignored or
>>>> results +in validity interceptions.
>>>> +
>>>> +
>>>> +Instruction emulation interceptions:
>>>> +There are two types of SIE secure instruction intercepts. The
>>>> normal +and the notification type. Normal secure instruction
>>>> intercepts will +make the guest pending for instruction completion
>>>> of the intercepted +instruction type, i.e. on SIE entry it is
>>>> attempted to complete +emulation of the instruction with the data
>>>> provided by KVM. That might +be a program exception or instruction
>>>> completion. +
>>>> +The notification type intercepts inform KVM about guest environment
>>>> +changes due to guest instruction interpretation. Such an
>>>> interception    
>>>
>>> 'interpretation by SIE' ?
>>>   
>>>> +is recognized for the store prefix instruction and provides the new
>>>> +lowcore location for mapping change notification arming. Any KVM
>>>> data +in the data areas is ignored, program exceptions are not
>>>> injected and +execution continues on next SIE entry, as if no
>>>> intercept had +happened.    
>>>   
>>
> 




^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling
  2019-11-14 14:44   ` Thomas Huth
@ 2019-11-14 15:56     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 15:56 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/14/19 3:44 PM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> Guest registers for protected guests are stored at offset 0x380.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  4 +++-
>>  arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 0ab309b7bf4c..5deabf9734d9 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -336,7 +336,9 @@ struct kvm_s390_itdb {
>>  struct sie_page {
>>  	struct kvm_s390_sie_block sie_block;
>>  	struct mcck_volatile_info mcck_info;	/* 0x0200 */
>> -	__u8 reserved218[1000];		/* 0x0218 */
>> +	__u8 reserved218[360];		/* 0x0218 */
>> +	__u64 pv_grregs[16];		/* 0x380 */
>> +	__u8 reserved400[512];
> 
> Maybe add a "/* 0x400 */" comment to be consistent with the other lines?

Sure

> 
>>  	struct kvm_s390_itdb itdb;	/* 0x0600 */
>>  	__u8 reserved700[2304];		/* 0x0700 */
>>  };
> 
>  Thomas
> 




^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 21/37] KVM: S390: protvirt: Instruction emulation
  2019-11-14 15:38   ` Cornelia Huck
@ 2019-11-14 16:00     ` Janosch Frank
  2019-11-14 16:05       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 16:00 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor



On 11/14/19 4:38 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:43 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> We have two new SIE exit codes 104 for a secure instruction
>> interception, on which the SIE needs hypervisor action to complete the
>> instruction.
>>
>> And 108 which is merely a notification and provides data for tracking
>> and management, like for the lowcore we set notification bits for the
>> lowcore pages.
> 
> What about the following:
> 
> "With protected virtualization, we have two new SIE exit codes:
> 
> - 104 indicates a secure instruction interception; the hypervisor needs
>   to complete emulation of the instruction.
> - 108 is merely a notification providing data for tracking and
>   management in the hypervisor; for example, we set notification bits
>   for the lowcore pages."
> 
> ?
> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  2 ++
>>  arch/s390/kvm/intercept.c        | 23 +++++++++++++++++++++++
>>  2 files changed, 25 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 2a8a1e21e1c3..a42dfe98128b 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -212,6 +212,8 @@ struct kvm_s390_sie_block {
>>  #define ICPT_KSS	0x5c
>>  #define ICPT_PV_MCHKR	0x60
>>  #define ICPT_PV_INT_EN	0x64
>> +#define ICPT_PV_INSTR	0x68
>> +#define ICPT_PV_NOT	0x6c
> 
> Maybe ICPT_PV_NOTIF?

NOTF?

> 
>>  	__u8	icptcode;		/* 0x0050 */
>>  	__u8	icptstatus;		/* 0x0051 */
>>  	__u16	ihcpu;			/* 0x0052 */
>> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
>> index b013a9c88d43..a1df8a43c88b 100644
>> --- a/arch/s390/kvm/intercept.c
>> +++ b/arch/s390/kvm/intercept.c
>> @@ -451,6 +451,23 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
>>  	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
>>  }
>>  
>> +static int handle_pv_spx(struct kvm_vcpu *vcpu)
>> +{
>> +	u32 pref = *(u32 *)vcpu->arch.sie_block->sidad;
>> +
>> +	kvm_s390_set_prefix(vcpu, pref);
>> +	trace_kvm_s390_handle_prefix(vcpu, 1, pref);
>> +	return 0;
>> +}
>> +
>> +static int handle_pv_not(struct kvm_vcpu *vcpu)
>> +{
>> +	if (vcpu->arch.sie_block->ipa == 0xb210)
>> +		return handle_pv_spx(vcpu);
>> +
>> +	return handle_instruction(vcpu);
> 
> Hm... if I understood it correctly, we are getting this one because the
> SIE informs us about things that it handled itself (but which we
> should be aware of). What can handle_instruction() do in this case?

There used to be an instruction which I could just pipe through normal
instruction handling. But I can't really remember what it was, too many
firmware changes in that area since then.

I'll mark it as a TODO for thinking about it with some coffee.

> 
>> +}
>> +
>>  int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>>  {
>>  	int rc, per_rc = 0;
>> @@ -505,6 +522,12 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>>  		 */
>>  		rc = 0;
>>  	break;
>> +	case ICPT_PV_INSTR:
>> +		rc = handle_instruction(vcpu);
>> +		break;
>> +	case ICPT_PV_NOT:
>> +		rc = handle_pv_not(vcpu);
>> +		break;
>>  	default:
>>  		return -EOPNOTSUPP;
>>  	}
> 




^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-11-14 15:55         ` Janosch Frank
@ 2019-11-14 16:03           ` Cornelia Huck
  2019-11-14 16:18             ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 16:03 UTC (permalink / raw)
  To: Janosch Frank
  Cc: Claudio Imbrenda, kvm, linux-s390, thuth, david, borntraeger,
	mihajlov, mimu, gor


On Thu, 14 Nov 2019 16:55:46 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/14/19 4:41 PM, Cornelia Huck wrote:
> > On Thu, 14 Nov 2019 16:20:24 +0100
> > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> >   
> >> On Thu, 14 Nov 2019 16:15:26 +0100
> >> Cornelia Huck <cohuck@redhat.com> wrote:
> >>  
> >>> On Thu, 24 Oct 2019 07:40:39 -0400
> >>> Janosch Frank <frankja@linux.ibm.com> wrote:

> >>>> +The Secure Instruction Data Area contains instruction storage
> >>>> +data. Data for diag 500 is exempt from that and has to be moved
> >>>> +through shared buffers to KVM.      
> >>>
> >>> I find this paragraph a bit confusing. What does that imply for diag
> >>> 500 interception? Data is still present in gprs 1-4?    
> >>
> >> no registers are leaked in the registers. registers are always only
> >> exposed through the state description.  
> > 
> > So, what is so special about diag 500, then?  
> 
> That's mostly a confusion on my side.
> The SIDAD is 4k max, so we can only move IO "management" data over it
> like ORBs and stuff. My intention was to point out, that the data which
> is to be transferred (disk contents, etc.) can't go over the SIDAD but
> needs to be in a shared page.
> 
> diag500 was mostly a notification mechanism without a lot of data, right?

Yes; the main information in there are the schid identifying the
subchannel, the virtqueue number, and a cookie value, all of which fit
into the registers.

So this goes via the sidad as well?


* Re: [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer
  2019-11-14 15:36   ` Thomas Huth
@ 2019-11-14 16:04     ` Janosch Frank
  2019-11-14 16:21     ` [PATCH] Fixup sida bouncing Janosch Frank
  1 sibling, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 16:04 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/14/19 4:36 PM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> Now that we can't access guest memory anymore, we have a dedicated
>> sattelite block that's a bounce buffer for instruction data.
> 
> "satellite block that is ..."
> 
>> We re-use the memop interface to copy the instruction data to / from
>> userspace. This lets us re-use a lot of QEMU code which used that
>> interface to make logical guest memory accesses which are not possible
>> anymore in protected mode anyway.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  5 ++++-
>>  arch/s390/kvm/kvm-s390.c         | 31 +++++++++++++++++++++++++++++++
>>  arch/s390/kvm/pv.c               |  9 +++++++++
>>  3 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 5deabf9734d9..2a8a1e21e1c3 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -308,7 +308,10 @@ struct kvm_s390_sie_block {
>>  #define CRYCB_FORMAT2 0x00000003
>>  	__u32	crycbd;			/* 0x00fc */
>>  	__u64	gcr[16];		/* 0x0100 */
>> -	__u64	gbea;			/* 0x0180 */
>> +	union {
>> +		__u64	gbea;			/* 0x0180 */
>> +		__u64	sidad;
>> +	};
>>  	__u8    reserved188[8];		/* 0x0188 */
>>  	__u64   sdnxo;			/* 0x0190 */
>>  	__u8    reserved198[8];		/* 0x0198 */
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 97d3a81e5074..6747cb6cf062 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -4416,6 +4416,13 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  	if (mop->size > MEM_OP_MAX_SIZE)
>>  		return -E2BIG;
>>  
>> +	/* Protected guests move instruction data over the satellite
>> +	 * block which has its own size limit
>> +	 */
>> +	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
>> +	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
>> +		return -E2BIG;
>> +
>>  	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
>>  		tmpbuf = vmalloc(mop->size);
>>  		if (!tmpbuf)
>> @@ -4427,10 +4434,22 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  	switch (mop->op) {
>>  	case KVM_S390_MEMOP_LOGICAL_READ:
>>  		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
>> +			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +				r = 0;
>> +				break;
> 
> Please add a short comment to the code why this is required / ok.
> 
>> +			}
>>  			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
>>  					    mop->size, GACC_FETCH);
>>  			break;
>>  		}
>> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			r = 0;
>> +			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
>> +					 (mop->gaddr & ~PAGE_MASK),
> 
> That looks bogus. Couldn't userspace use mop->gaddr = 4095 and mop->size
> = 4095 to read most of the page beyond the sidad page (assuming that it
> is mapped, too)?
> I think you have to take mop->gaddr into account in your new check at
> the beginning of the function, too.

Ah, right, that needs some fixing.

> 
> Or should the ioctl maybe even be restricted to mop->gaddr == 0 now? Is
> there maybe also a way to validate that gaddr & PAGE_MASK really matches
> the page that we have in sidad?

There was one lonely usage of the ioctl where we still read from an
offset, either in IO or SCLP. Having 0 as a requirement would certainly
help, but I was a bit afraid of changing too many things in qemu.

> 
>> +					 mop->size))
>> +				r = -EFAULT;
>> +			break;
>> +		}
>>  		r = read_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
>>  		if (r == 0) {
>>  			if (copy_to_user(uaddr, tmpbuf, mop->size))
>> @@ -4439,10 +4458,22 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  		break;
>>  	case KVM_S390_MEMOP_LOGICAL_WRITE:
>>  		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
>> +			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +				r = 0;
>> +				break;
>> +			}
>>  			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
>>  					    mop->size, GACC_STORE);
>>  			break;
>>  		}
>> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			r = 0;
>> +			if (copy_from_user((void *)vcpu->arch.sie_block->sidad +
>> +					   (mop->gaddr & ~PAGE_MASK), uaddr,
>> +					   mop->size))
> 
> dito, of course.
> 
>> +				r = -EFAULT;
>> +			break;
>> +		}
>>  		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
>>  			r = -EFAULT;
>>  			break;
> 
>  Thomas
> 




* Re: [RFC 21/37] KVM: S390: protvirt: Instruction emulation
  2019-11-14 16:00     ` Janosch Frank
@ 2019-11-14 16:05       ` Cornelia Huck
  0 siblings, 0 replies; 213+ messages in thread
From: Cornelia Huck @ 2019-11-14 16:05 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


On Thu, 14 Nov 2019 17:00:41 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/14/19 4:38 PM, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:43 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> >> We have two new SIE exit codes 104 for a secure instruction
> >> interception, on which the SIE needs hypervisor action to complete the
> >> instruction.
> >>
> >> And 108 which is merely a notification and provides data for tracking
> >> and management, like for the lowcore we set notification bits for the
> >> lowcore pages.  
> > 
> > What about the following:
> > 
> > "With protected virtualization, we have two new SIE exit codes:
> > 
> > - 104 indicates a secure instruction interception; the hypervisor needs
> >   to complete emulation of the instruction.
> > - 108 is merely a notification providing data for tracking and
> >   management in the hypervisor; for example, we set notification bits
> >   for the lowcore pages."
> > 
> > ?
> >   
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> ---
> >>  arch/s390/include/asm/kvm_host.h |  2 ++
> >>  arch/s390/kvm/intercept.c        | 23 +++++++++++++++++++++++
> >>  2 files changed, 25 insertions(+)
> >>
> >> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> >> index 2a8a1e21e1c3..a42dfe98128b 100644
> >> --- a/arch/s390/include/asm/kvm_host.h
> >> +++ b/arch/s390/include/asm/kvm_host.h
> >> @@ -212,6 +212,8 @@ struct kvm_s390_sie_block {
> >>  #define ICPT_KSS	0x5c
> >>  #define ICPT_PV_MCHKR	0x60
> >>  #define ICPT_PV_INT_EN	0x64
> >> +#define ICPT_PV_INSTR	0x68
> >> +#define ICPT_PV_NOT	0x6c  
> > 
> > Maybe ICPT_PV_NOTIF?  
> 
> NOTF?

Sounds good.

> 
> >   
> >>  	__u8	icptcode;		/* 0x0050 */
> >>  	__u8	icptstatus;		/* 0x0051 */
> >>  	__u16	ihcpu;			/* 0x0052 */
> >> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> >> index b013a9c88d43..a1df8a43c88b 100644
> >> --- a/arch/s390/kvm/intercept.c
> >> +++ b/arch/s390/kvm/intercept.c
> >> @@ -451,6 +451,23 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
> >>  	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
> >>  }
> >>  
> >> +static int handle_pv_spx(struct kvm_vcpu *vcpu)
> >> +{
> >> +	u32 pref = *(u32 *)vcpu->arch.sie_block->sidad;
> >> +
> >> +	kvm_s390_set_prefix(vcpu, pref);
> >> +	trace_kvm_s390_handle_prefix(vcpu, 1, pref);
> >> +	return 0;
> >> +}
> >> +
> >> +static int handle_pv_not(struct kvm_vcpu *vcpu)
> >> +{
> >> +	if (vcpu->arch.sie_block->ipa == 0xb210)
> >> +		return handle_pv_spx(vcpu);
> >> +
> >> +	return handle_instruction(vcpu);  
> > 
> > Hm... if I understood it correctly, we are getting this one because the
> > SIE informs us about things that it handled itself (but which we
> > should be aware of). What can handle_instruction() do in this case?  
> 
> There used to be an instruction which I could just pipe through normal
> instruction handling. But I can't really remember what it was, too many
> firmware changes in that area since then.
> 
> I'll mark it as a TODO for thinking about it with some coffee.

ok :)

> 
> >   
> >> +}
> >> +
> >>  int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
> >>  {
> >>  	int rc, per_rc = 0;
> >> @@ -505,6 +522,12 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
> >>  		 */
> >>  		rc = 0;
> >>  	break;
> >> +	case ICPT_PV_INSTR:
> >> +		rc = handle_instruction(vcpu);
> >> +		break;
> >> +	case ICPT_PV_NOT:
> >> +		rc = handle_pv_not(vcpu);
> >> +		break;
> >>  	default:
> >>  		return -EOPNOTSUPP;
> >>  	}  
> >   
> 
> 



* Re: [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation
  2019-11-14 16:03           ` Cornelia Huck
@ 2019-11-14 16:18             ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 16:18 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Claudio Imbrenda, kvm, linux-s390, thuth, david, borntraeger,
	mihajlov, mimu, gor



On 11/14/19 5:03 PM, Cornelia Huck wrote:
> On Thu, 14 Nov 2019 16:55:46 +0100
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 11/14/19 4:41 PM, Cornelia Huck wrote:
>>> On Thu, 14 Nov 2019 16:20:24 +0100
>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>>>   
>>>> On Thu, 14 Nov 2019 16:15:26 +0100
>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>  
>>>>> On Thu, 24 Oct 2019 07:40:39 -0400
>>>>> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>>>>>> +The Secure Instruction Data Area contains instruction storage
>>>>>> +data. Data for diag 500 is exempt from that and has to be moved
>>>>>> +through shared buffers to KVM.      
>>>>>
>>>>> I find this paragraph a bit confusing. What does that imply for diag
>>>>> 500 interception? Data is still present in gprs 1-4?    
>>>>
>>>> no registers are leaked in the registers. registers are always only
>>>> exposed through the state description.  
>>>
>>> So, what is so special about diag 500, then?  
>>
>> That's mostly a confusion on my side.
>> The SIDAD is 4k max, so we can only move IO "management" data over it
>> like ORBs and stuff. My intention was to point out, that the data which
>> is to be transferred (disk contents, etc.) can't go over the SIDAD but
>> needs to be in a shared page.
>>
>> diag500 was mostly a notification mechanism without a lot of data, right?
> 
> Yes; the main information in there are the schid identifying the
> subchannel, the virtqueue number, and a cookie value, all of which fit
> into the registers.
> 
> So this goes via the sidad as well?
> 

Only referenced data goes over the SIDA; register values go into offset
0x380 of the SIE state description.

If an instruction has an address in a register, we will receive a bogus
address and the referenced data in the SIDA.

SCLP has a code and an address as register values.
We will get the code and a bogus address in the register area.
The SCCB will be in the SIDA.
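
To make that concrete, here is a rough, untested sketch of the host-side
pattern (the helper and function names are made up, the real code will
differ):

	/*
	 * The UV places the data referenced by the instruction (e.g. the
	 * SCCB of an SCLP call) into the SIDA, so the host reads and
	 * writes it there instead of translating the bogus guest address
	 * from the register area.
	 */
	static void *sida_addr_sketch(struct kvm_vcpu *vcpu)
	{
		return (void *)(vcpu->arch.sie_block->sidad & PAGE_MASK);
	}

	static int handle_pv_sclp_sketch(struct kvm_vcpu *vcpu)
	{
		void *sccb = sida_addr_sketch(vcpu);

		/* interpret the SCCB and put the response back in place */
		return 0;
	}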



* [PATCH] Fixup sida bouncing
  2019-11-14 15:36   ` Thomas Huth
  2019-11-14 16:04     ` Janosch Frank
@ 2019-11-14 16:21     ` Janosch Frank
  2019-11-15  8:19       ` Thomas Huth
  1 sibling, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 16:21 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, david, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0fa7c6d9ed0e..9820fde04887 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4432,13 +4432,21 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 	if (mop->size > MEM_OP_MAX_SIZE)
 		return -E2BIG;
 
-	/* Protected guests move instruction data over the satellite
+	/*
+	 * Protected guests move instruction data over the satellite
 	 * block which has its own size limit
 	 */
 	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
-	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
+	    mop->size > ((vcpu->arch.sie_block->sidad & 0xff) + 1) * PAGE_SIZE)
 		return -E2BIG;
 
+	/* We can currently only offset into the one SIDA page. */
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		mop->gaddr &= ~PAGE_MASK;
+		if (mop->gaddr + mop->size > PAGE_SIZE)
+			return -EINVAL;
+	}
+
 	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
 		tmpbuf = vmalloc(mop->size);
 		if (!tmpbuf)
@@ -4451,6 +4459,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 	case KVM_S390_MEMOP_LOGICAL_READ:
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
 			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+				/* We can always copy into the SIDA */
 				r = 0;
 				break;
 			}
@@ -4461,8 +4470,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
 			r = 0;
 			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
-					 (mop->gaddr & ~PAGE_MASK),
-					 mop->size))
+					 mop->gaddr, mop->size))
 				r = -EFAULT;
 			break;
 		}
@@ -4485,8 +4493,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
 			r = 0;
 			if (copy_from_user((void *)vcpu->arch.sie_block->sidad +
-					   (mop->gaddr & ~PAGE_MASK), uaddr,
-					   mop->size))
+					   mop->gaddr, uaddr, mop->size))
 				r = -EFAULT;
 			break;
 		}
-- 
2.20.1
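
For completeness, a tiny stand-alone illustration (userspace toy, not part
of the patch; sida_offset_ok() is just a model of the check above) of what
the new offset handling accepts and rejects:

	#include <stdio.h>

	#define PAGE_SIZE 4096UL
	#define PAGE_MASK (~(PAGE_SIZE - 1))

	/* Only the offset into the single SIDA page is used, and
	 * offset + size must stay within that page. */
	static int sida_offset_ok(unsigned long gaddr, unsigned long size)
	{
		gaddr &= ~PAGE_MASK;
		return gaddr + size <= PAGE_SIZE;
	}

	int main(void)
	{
		/* the case Thomas pointed out is rejected now */
		printf("%d\n", sida_offset_ok(4095, 4095));	/* 0 -> -EINVAL */
		/* a full page at offset 0 is still fine */
		printf("%d\n", sida_offset_ok(0, PAGE_SIZE));	/* 1 -> ok */
		return 0;
	}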


* Re: [RFC 11/37] DOCUMENTATION: protvirt: Interrupt injection
  2019-11-14 13:47       ` Cornelia Huck
@ 2019-11-14 16:33         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-14 16:33 UTC (permalink / raw)
  To: Cornelia Huck, Claudio Imbrenda
  Cc: kvm, linux-s390, thuth, david, borntraeger, mihajlov, mimu, gor



On 11/14/19 2:47 PM, Cornelia Huck wrote:
> On Thu, 14 Nov 2019 14:25:00 +0100
> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
>> On Thu, 14 Nov 2019 14:09:46 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> On Thu, 24 Oct 2019 07:40:33 -0400
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>>   
>>>> Interrupt injection has changed a lot for protected guests, as KVM
>>>> can't access the cpus' lowcores. New fields in the state
>>>> description, like the interrupt injection control, and masked
>>>> values safeguard the guest from KVM.
>>>>
>>>> Let's add some documentation to the interrupt injection basics for
>>>> protected guests.
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>  Documentation/virtual/kvm/s390-pv.txt | 27
>>>> +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/s390-pv.txt
>>>> b/Documentation/virtual/kvm/s390-pv.txt index
>>>> 86ed95f36759..e09f2dc5f164 100644 ---
>>>> a/Documentation/virtual/kvm/s390-pv.txt +++
>>>> b/Documentation/virtual/kvm/s390-pv.txt @@ -21,3 +21,30 @@ normally
>>>> needed to be able to run a VM, some changes have been made in SIE
>>>> behavior and fields have different meaning for a PVM. SIE exits are
>>>> minimized as much as possible to improve speed and reduce exposed
>>>> guest state. +
>>>> +
>>>> +Interrupt injection:
>>>> +
>>>> +Interrupt injection is safeguarded by the Ultravisor and, as KVM
>>>> lost +access to the VCPUs' lowcores, is handled via the format 4
>>>> state +description.
>>>> +
>>>> +Machine check, external, IO and restart interruptions each can be
>>>> +injected on SIE entry via a bit in the interrupt injection control
>>>> +field (offset 0x54). If the guest cpu is not enabled for the
>>>> interrupt +at the time of injection, a validity interception is
>>>> recognized. The +interrupt's data is transported via parts of the
>>>> interception data +block.    
>>>
>>> "Data associated with the interrupt needs to be placed into the
>>> respective fields in the interception data block to be injected into
>>> the guest."
>>>
>>> ?  
>>
>> when a normal guest intercepts an exception, depending on the exception
>> type, the parameters are saved in the state description at specified
>> offsets, between 0xC0 amd 0xF8
>>
>> to perform interrupt injection for secure guests, the same fields are
>> used to specify the interrupt parameters that should be injected into
>> the guest
> 
> Ok, maybe add that as well.
> 
>>
>>>> +
>>>> +Program and Service Call exceptions have another layer of
>>>> +safeguarding, they are only injectable, when instructions have
>>>> +intercepted into KVM and such an exception can be an emulation
>>>> result.    
>>>
>>> I find this sentence hard to parse... not sure if I understand it
>>> correctly.
>>>
>>> "They can only be injected if the exception can be encountered during
>>> emulation of instructions that had been intercepted into KVM."  
>>  
>> yes
>>
>>>   
>>>> +
>>>> +
>>>> +Mask notification interceptions:
>>>> +As a replacement for the lctl(g) and lpsw(e) interception, two new
>>>> +interception codes have been introduced. One which tells us that
>>>> CRs +0, 6 or 14 have been changed and therefore interrupt masking
>>>> might +have changed. And one for PSW bit 13 changes. The CRs and
>>>> the PSW in    
>>>
>>> Might be helpful to mention that this bit covers machine checks, which
>>> do not get a separate bit in the control block :)
>>>   
>>>> +the state description only contain the mask bits and no further
>>>> info +like the current instruction address.    
>>>
>>> "The CRs and the PSW in the state description only contain the bits
>>> referring to interrupt masking; other fields like e.g. the current
>>> instruction address are zero."  
>>
>> wait state is saved too
>>
>> CC is write only, and is only inspected by hardware/firmware when
>> KVM/qemu is interpreting an instruction that expects a new CC to be set,
>> and then only the expected CCs are allowed (e.g. if an instruction only
>> allows CC 0 or 3, 2 cannot be specified)
> 
> So I'm wondering how much of that should go into the document... maybe
> just
> 
> "The CRs and the PSW in the state description contain less information
> than for normal guests: most information that does not refer to
> interrupt masking is not available to the hypervisor."
> 
> ?
> 

I'm not liking that too much, and I'm also asking myself whether it makes
sense to fix documentation via mails. How about an etherpad?



* Re: [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area
  2019-10-24 11:40 ` [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area Janosch Frank
@ 2019-11-15  8:04   ` Thomas Huth
  2019-11-15 10:16     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15  8:04 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> STHYI data has to go through the bounce buffer.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/intercept.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index 510b1dee3320..37cb62bc261b 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -391,7 +391,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
>  		goto out;
>  	}
>  
> -	if (addr & ~PAGE_MASK)
> +	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (addr & ~PAGE_MASK))
>  		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>  
>  	sctns = (void *)get_zeroed_page(GFP_KERNEL);
> @@ -402,10 +402,15 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
>  
>  out:
>  	if (!cc) {
> -		r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
> -		if (r) {
> -			free_page((unsigned long)sctns);
> -			return kvm_s390_inject_prog_cond(vcpu, r);
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			memcpy((void *)vcpu->arch.sie_block->sidad, sctns,

sidad & PAGE_MASK, just to be sure?

> +			       PAGE_SIZE);
> +		} else {
> +			r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
> +			if (r) {
> +				free_page((unsigned long)sctns);
> +				return kvm_s390_inject_prog_cond(vcpu, r);
> +			}
>  		}
>  	}
>  
> 

With "& PAGE_MASK":

Reviewed-by: Thomas Huth <thuth@redhat.com>


* Re: [PATCH] Fixup sida bouncing
  2019-11-14 16:21     ` [PATCH] Fixup sida bouncing Janosch Frank
@ 2019-11-15  8:19       ` Thomas Huth
  2019-11-15  8:50         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15  8:19 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck

On 14/11/2019 17.21, Janosch Frank wrote:
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 0fa7c6d9ed0e..9820fde04887 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4432,13 +4432,21 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  	if (mop->size > MEM_OP_MAX_SIZE)
>  		return -E2BIG;
>  
> -	/* Protected guests move instruction data over the satellite
> +	/*
> +	 * Protected guests move instruction data over the satellite
>  	 * block which has its own size limit
>  	 */
>  	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
> -	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
> +	    mop->size > ((vcpu->arch.sie_block->sidad & 0xff) + 1) * PAGE_SIZE)
>  		return -E2BIG;
>  
> +	/* We can currently only offset into the one SIDA page. */
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		mop->gaddr &= ~PAGE_MASK;
> +		if (mop->gaddr + mop->size > PAGE_SIZE)
> +			return -EINVAL;
> +	}
> +
>  	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
>  		tmpbuf = vmalloc(mop->size);
>  		if (!tmpbuf)
> @@ -4451,6 +4459,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  	case KVM_S390_MEMOP_LOGICAL_READ:
>  		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
>  			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +				/* We can always copy into the SIDA */
>  				r = 0;
>  				break;
>  			}
> @@ -4461,8 +4470,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>  			r = 0;
>  			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
> -					 (mop->gaddr & ~PAGE_MASK),
> -					 mop->size))
> +					 mop->gaddr, mop->size))
>  				r = -EFAULT;
>  			break;
>  		}
> @@ -4485,8 +4493,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>  		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>  			r = 0;
>  			if (copy_from_user((void *)vcpu->arch.sie_block->sidad +
> -					   (mop->gaddr & ~PAGE_MASK), uaddr,
> -					   mop->size))
> +					   mop->gaddr, uaddr, mop->size))
>  				r = -EFAULT;
>  			break;
>  		}
> 

That looks better, indeed.

Still, is there a way you could also verify that gaddr references the
right page that is mirrored in the sidad?

 Thomas


* Re: [RFC 25/37] KVM: s390: protvirt: STSI handling
  2019-10-24 11:40 ` [RFC 25/37] KVM: s390: protvirt: STSI handling Janosch Frank
@ 2019-11-15  8:27   ` Thomas Huth
  0 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-15  8:27 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Save response to sidad and disable address checking for protected
> guests.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/priv.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index ed52ffa8d5d4..06c7e7a10825 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -872,7 +872,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
>  
>  	operand2 = kvm_s390_get_base_disp_s(vcpu, &ar);
>  
> -	if (operand2 & 0xfff)
> +	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (operand2 & 0xfff))
>  		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);

I'd prefer if you could put the calculation of operand2 also under the
!pv if-statement:

	if (!kvm_s390_pv_is_protected(vcpu->kvm)) {
		operand2 = kvm_s390_get_base_disp_s(vcpu, &ar);
		if (operand2 & 0xfff)
			return kvm_s390_inject_program_int(vcpu,
						    PGM_SPECIFICATION);
	}

... that makes it more obvious that operand2 is only valid in the !pv
case and you should get automatic compiler warnings if you use it otherwise.

>  	switch (fc) {
> @@ -893,8 +893,13 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
>  		handle_stsi_3_2_2(vcpu, (void *) mem);
>  		break;
>  	}
> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +		memcpy((void *)vcpu->arch.sie_block->sidad, (void *)mem,
> +		       PAGE_SIZE);
> +		rc = 0;
> +	} else
> +		rc = write_guest(vcpu, operand2, ar, (void *)mem, PAGE_SIZE);

Please also use braces for the else-branch (according to
Documentation/process/coding-style.rst).

 Thomas


* Re: [PATCH] Fixup sida bouncing
  2019-11-15  8:19       ` Thomas Huth
@ 2019-11-15  8:50         ` Janosch Frank
  2019-11-15  9:21           ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-15  8:50 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck



On 11/15/19 9:19 AM, Thomas Huth wrote:
> On 14/11/2019 17.21, Janosch Frank wrote:
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 19 +++++++++++++------
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 0fa7c6d9ed0e..9820fde04887 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -4432,13 +4432,21 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  	if (mop->size > MEM_OP_MAX_SIZE)
>>  		return -E2BIG;
>>  
>> -	/* Protected guests move instruction data over the satellite
>> +	/*
>> +	 * Protected guests move instruction data over the satellite
>>  	 * block which has its own size limit
>>  	 */
>>  	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
>> -	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
>> +	    mop->size > ((vcpu->arch.sie_block->sidad & 0xff) + 1) * PAGE_SIZE)
>>  		return -E2BIG;
>>  
>> +	/* We can currently only offset into the one SIDA page. */
>> +	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +		mop->gaddr &= ~PAGE_MASK;
>> +		if (mop->gaddr + mop->size > PAGE_SIZE)
>> +			return -EINVAL;
>> +	}
>> +
>>  	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
>>  		tmpbuf = vmalloc(mop->size);
>>  		if (!tmpbuf)
>> @@ -4451,6 +4459,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  	case KVM_S390_MEMOP_LOGICAL_READ:
>>  		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
>>  			if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +				/* We can always copy into the SIDA */
>>  				r = 0;
>>  				break;
>>  			}
>> @@ -4461,8 +4470,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>  			r = 0;
>>  			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
>> -					 (mop->gaddr & ~PAGE_MASK),
>> -					 mop->size))
>> +					 mop->gaddr, mop->size))
>>  				r = -EFAULT;
>>  			break;
>>  		}
>> @@ -4485,8 +4493,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
>>  		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>  			r = 0;
>>  			if (copy_from_user((void *)vcpu->arch.sie_block->sidad +
>> -					   (mop->gaddr & ~PAGE_MASK), uaddr,
>> -					   mop->size))
>> +					   mop->gaddr, uaddr, mop->size))
>>  				r = -EFAULT;
>>  			break;
>>  		}
>>
> 
> That looks better, indeed.
> 
> Still, is there a way you could also verify that gaddr references the
> right page that is mirrored in the sidad?
> 
>  Thomas
> 

I'm not completely sure if I understand your question correctly.
Checking that is not possible here without also looking at the
instruction bytecode and register contents which would make this patch
ridiculously large with no real benefit.



* Re: [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers
  2019-10-24 11:40 ` [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers Janosch Frank
@ 2019-11-15  9:02   ` Thomas Huth
  2019-11-15 10:01     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15  9:02 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> A lot of the registers are controlled by the Ultravisor and never
> visible to KVM. Also some registers are overlayed, like gbea is with
> sidad, which might leak data to userspace.
> 
> Hence we sync a minimal set of registers for both SIE formats and then
> check and sync format 2 registers if necessary.
> 
> Also we disable set/get one reg for the same reason. It's an old
> interface anyway.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 138 +++++++++++++++++++++++----------------
>  1 file changed, 82 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 17a78774c617..f623c64aeade 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2997,7 +2997,8 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
>  	/* make sure the new fpc will be lazily loaded */
>  	save_fpu_regs();
>  	current->thread.fpu.fpc = 0;
> -	vcpu->arch.sie_block->gbea = 1;
> +	if (!kvm_s390_pv_is_protected(vcpu->kvm))
> +		vcpu->arch.sie_block->gbea = 1;
>  	vcpu->arch.sie_block->pp = 0;
>  	vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
>  	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
> @@ -3367,6 +3368,10 @@ static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
>  			     (u64 __user *)reg->addr);
>  		break;
>  	case KVM_REG_S390_GBEA:
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			r = 0;
> +			break;
> +		}
>  		r = put_user(vcpu->arch.sie_block->gbea,
>  			     (u64 __user *)reg->addr);
>  		break;
> @@ -3420,6 +3425,10 @@ static int kvm_arch_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu,
>  			     (u64 __user *)reg->addr);
>  		break;
>  	case KVM_REG_S390_GBEA:
> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			r = 0;
> +			break;
> +		}

Wouldn't it be better to return EINVAL in this case? ... the callers
definitely do not get what they expected here...

 Thomas


* Re: [PATCH] Fixup sida bouncing
  2019-11-15  8:50         ` Janosch Frank
@ 2019-11-15  9:21           ` Thomas Huth
  0 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-15  9:21 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck

On 15/11/2019 09.50, Janosch Frank wrote:
> On 11/15/19 9:19 AM, Thomas Huth wrote:
[...]
>> Still, is there a way you could also verify that gaddr references the
>> right page that is mirrored in the sidad?
>>
>>  Thomas
>>
> 
> I'm not completely sure if I understand your question correctly.
> Checking that is not possible here without also looking at the
> instruction bytecode and register contents which would make this patch
> ridiculously large with no real benefit.

Yes, I was thinking about something like that. I mean, how can you be
sure that userspace really only wants to read the contents that are
referenced by the sidad? It could also try to read or write e.g. the
lowcore data in between (assuming that there are some code paths left
which are not aware of protected virtualization yet)?

Well, it does not have to be right now and in this patch, but I still
think that's something that should be added in the future if somehow
possible...

 Thomas


* Re: [RFC 29/37] KVM: s390: protvirt: Sync pv state
  2019-10-24 11:40 ` [RFC 29/37] KVM: s390: protvirt: Sync pv state Janosch Frank
@ 2019-11-15  9:36   ` Thomas Huth
  2019-11-15  9:59     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15  9:36 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Indicate via register sync if the VM is in secure mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/uapi/asm/kvm.h | 5 ++++-
>  arch/s390/kvm/kvm-s390.c         | 7 ++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 436ec7636927..b44c02426c2e 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -231,11 +231,13 @@ struct kvm_guest_debug_arch {
>  #define KVM_SYNC_GSCB   (1UL << 9)
>  #define KVM_SYNC_BPBC   (1UL << 10)
>  #define KVM_SYNC_ETOKEN (1UL << 11)
> +#define KVM_SYNC_PV	(1UL << 12)
>  
>  #define KVM_SYNC_S390_VALID_FIELDS \
>  	(KVM_SYNC_PREFIX | KVM_SYNC_GPRS | KVM_SYNC_ACRS | KVM_SYNC_CRS | \
>  	 KVM_SYNC_ARCH0 | KVM_SYNC_PFAULT | KVM_SYNC_VRS | KVM_SYNC_RICCB | \
> -	 KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN)
> +	 KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN | \
> +	 KVM_SYNC_PV)
>  
>  /* length and alignment of the sdnx as a power of two */
>  #define SDNXC 8
> @@ -261,6 +263,7 @@ struct kvm_sync_regs {
>  	__u8  reserved[512];	/* for future vector expansion */
>  	__u32 fpc;		/* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
>  	__u8 bpbc : 1;		/* bp mode */
> +	__u8 pv : 1;		/* pv mode */
>  	__u8 reserved2 : 7;

Don't you want to decrease the reserved2 bits to 6 ? ...

>  	__u8 padding1[51];	/* riccb needs to be 64byte aligned */

... otherwise you might mess up the alignment here!
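
To illustrate with a stand-alone toy (not kernel code; the struct just
mirrors the tail of kvm_sync_regs from the patch): with bpbc + pv + 6
reserved bits the bitfields still share one byte and riccb keeps its
offset; bump reserved2 back to 7 and the assert fires because everything
below shifts by a byte.

	#include <stddef.h>
	#include <stdint.h>

	struct sync_regs_tail {
		uint32_t fpc;
		uint8_t  bpbc : 1;
		uint8_t  pv : 1;
		uint8_t  reserved2 : 6;	/* was 7 before pv was added */
		uint8_t  padding1[51];	/* riccb needs to be 64byte aligned */
		uint8_t  riccb[64];
	};

	_Static_assert(offsetof(struct sync_regs_tail, riccb) -
		       offsetof(struct sync_regs_tail, fpc) == 56,
		       "bitfields must not grow beyond one byte");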

>  	__u8 riccb[64];		/* runtime instrumentation controls block */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f623c64aeade..500972a1f742 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2856,6 +2856,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  		vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
>  	if (test_kvm_facility(vcpu->kvm, 156))
>  		vcpu->run->kvm_valid_regs |= KVM_SYNC_ETOKEN;
> +	if (test_kvm_facility(vcpu->kvm, 161))
> +		vcpu->run->kvm_valid_regs |= KVM_SYNC_PV;
>  	/* fprs can be synchronized via vrs, even if the guest has no vx. With
>  	 * MACHINE_HAS_VX, (load|store)_fpu_regs() will work with vrs format.
>  	 */
> @@ -4136,6 +4138,7 @@ static void store_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  {
>  	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
>  	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
> +	kvm_run->s.regs.pv = 0;
>  	if (MACHINE_HAS_GS) {
>  		__ctl_set_bit(2, 4);
>  		if (vcpu->arch.gs_enabled)
> @@ -4172,8 +4175,10 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  	/* Restore will be done lazily at return */
>  	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
>  	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
> -	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
> +	if (likely(!kvm_s390_pv_handle_cpu(vcpu)))

Why change the if-statement now? Should this maybe rather be squashed
into the patch that introduced the if-statement?

>  		store_regs_fmt2(vcpu, kvm_run);
> +	else
> +		kvm_run->s.regs.pv = 1;
>  }
>  
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> 

 Thomas


* Re: [RFC 29/37] KVM: s390: protvirt: Sync pv state
  2019-11-15  9:36   ` Thomas Huth
@ 2019-11-15  9:59     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-15  9:59 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 10:36 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> Indicate via register sync if the VM is in secure mode.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/uapi/asm/kvm.h | 5 ++++-
>>  arch/s390/kvm/kvm-s390.c         | 7 ++++++-
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>> index 436ec7636927..b44c02426c2e 100644
>> --- a/arch/s390/include/uapi/asm/kvm.h
>> +++ b/arch/s390/include/uapi/asm/kvm.h
>> @@ -231,11 +231,13 @@ struct kvm_guest_debug_arch {
>>  #define KVM_SYNC_GSCB   (1UL << 9)
>>  #define KVM_SYNC_BPBC   (1UL << 10)
>>  #define KVM_SYNC_ETOKEN (1UL << 11)
>> +#define KVM_SYNC_PV	(1UL << 12)
>>  
>>  #define KVM_SYNC_S390_VALID_FIELDS \
>>  	(KVM_SYNC_PREFIX | KVM_SYNC_GPRS | KVM_SYNC_ACRS | KVM_SYNC_CRS | \
>>  	 KVM_SYNC_ARCH0 | KVM_SYNC_PFAULT | KVM_SYNC_VRS | KVM_SYNC_RICCB | \
>> -	 KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN)
>> +	 KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN | \
>> +	 KVM_SYNC_PV)
>>  
>>  /* length and alignment of the sdnx as a power of two */
>>  #define SDNXC 8
>> @@ -261,6 +263,7 @@ struct kvm_sync_regs {
>>  	__u8  reserved[512];	/* for future vector expansion */
>>  	__u32 fpc;		/* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
>>  	__u8 bpbc : 1;		/* bp mode */
>> +	__u8 pv : 1;		/* pv mode */
>>  	__u8 reserved2 : 7;
> 
> Don't you want to decrease the reserved2 bits to 6 ? ...

Oops

> 
>>  	__u8 padding1[51];	/* riccb needs to be 64byte aligned */
> 
> ... otherwise you might mess up the alignment here!
> 
>>  	__u8 riccb[64];		/* runtime instrumentation controls block */
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index f623c64aeade..500972a1f742 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -2856,6 +2856,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>  		vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
>>  	if (test_kvm_facility(vcpu->kvm, 156))
>>  		vcpu->run->kvm_valid_regs |= KVM_SYNC_ETOKEN;
>> +	if (test_kvm_facility(vcpu->kvm, 161))
>> +		vcpu->run->kvm_valid_regs |= KVM_SYNC_PV;
>>  	/* fprs can be synchronized via vrs, even if the guest has no vx. With
>>  	 * MACHINE_HAS_VX, (load|store)_fpu_regs() will work with vrs format.
>>  	 */
>> @@ -4136,6 +4138,7 @@ static void store_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  {
>>  	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
>>  	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
>> +	kvm_run->s.regs.pv = 0;
>>  	if (MACHINE_HAS_GS) {
>>  		__ctl_set_bit(2, 4);
>>  		if (vcpu->arch.gs_enabled)
>> @@ -4172,8 +4175,10 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  	/* Restore will be done lazily at return */
>>  	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
>>  	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
>> -	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
>> +	if (likely(!kvm_s390_pv_handle_cpu(vcpu)))
> 
> Why change the if-statement now? Should this maybe rather be squashed
> into the patch that introduced the if-statement?

That was part of a cleanup that should have been done in other patches.

> 
>>  		store_regs_fmt2(vcpu, kvm_run);
>> +	else
>> +		kvm_run->s.regs.pv = 1;
>>  }
>>  
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>
> 
>  Thomas
> 




* Re: [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers
  2019-11-15  9:02   ` Thomas Huth
@ 2019-11-15 10:01     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 10:01 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 10:02 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> A lot of the registers are controlled by the Ultravisor and never
>> visible to KVM. Also some registers are overlayed, like gbea is with
>> sidad, which might leak data to userspace.
>>
>> Hence we sync a minimal set of registers for both SIE formats and then
>> check and sync format 2 registers if necessary.
>>
>> Also we disable set/get one reg for the same reason. It's an old
>> interface anyway.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 138 +++++++++++++++++++++++----------------
>>  1 file changed, 82 insertions(+), 56 deletions(-)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 17a78774c617..f623c64aeade 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -2997,7 +2997,8 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
>>  	/* make sure the new fpc will be lazily loaded */
>>  	save_fpu_regs();
>>  	current->thread.fpu.fpc = 0;
>> -	vcpu->arch.sie_block->gbea = 1;
>> +	if (!kvm_s390_pv_is_protected(vcpu->kvm))
>> +		vcpu->arch.sie_block->gbea = 1;
>>  	vcpu->arch.sie_block->pp = 0;
>>  	vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
>>  	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
>> @@ -3367,6 +3368,10 @@ static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
>>  			     (u64 __user *)reg->addr);
>>  		break;
>>  	case KVM_REG_S390_GBEA:
>> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			r = 0;
>> +			break;
>> +		}
>>  		r = put_user(vcpu->arch.sie_block->gbea,
>>  			     (u64 __user *)reg->addr);
>>  		break;
>> @@ -3420,6 +3425,10 @@ static int kvm_arch_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu,
>>  			     (u64 __user *)reg->addr);
>>  		break;
>>  	case KVM_REG_S390_GBEA:
>> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			r = 0;
>> +			break;
>> +		}
> 
> Wouldn't it be better to return EINVAL in this case? ... the callers
> definitely do not get what they expected here...
> 
>  Thomas
> 

Hrm, new QEMUs will use cpu run anyway, and hence I guess returning
-EINVAL would make sense, as old QEMUs hopefully won't have pv support.
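
So something like this (sketch only, same for the set case):

	case KVM_REG_S390_GBEA:
		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
			r = -EINVAL;
			break;
		}
		r = put_user(vcpu->arch.sie_block->gbea,
			     (u64 __user *)reg->addr);
		break;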



* Re: [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  2019-10-24 11:40 ` [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Janosch Frank
@ 2019-11-15 10:04   ` Thomas Huth
  2019-11-15 10:20     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 10:04 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> If the host initialized the Ultravisor, we can set stfle bit 161
> (protected virtual IPL enhancements facility), which indicates, that
>> the IPL subcodes 8, 9 and 10 are valid. These subcodes are used by a
> normal guest to set/retrieve a IPIB of type 5 and transition into
> protected mode.
> 
> Once in protected mode, the VM will loose the facility bit, as each

So should the bit be cleared in the host code again? ... I don't see
this happening in this patch?

 Thomas


> boot into protected mode has to go through non-protected. There is no
> secure re-ipl with subcode 10 without a previous subcode 3.
> 
> In protected mode, there is no subcode 4 available, as the VM has no
> more access to its memory from non-protected mode. I.e. each IPL
> clears.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/diag.c     | 6 ++++++
>  arch/s390/kvm/kvm-s390.c | 5 +++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
> index 3fb54ec2cf3e..b951dbdcb6a0 100644
> --- a/arch/s390/kvm/diag.c
> +++ b/arch/s390/kvm/diag.c
> @@ -197,6 +197,12 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
>  	case 4:
>  		vcpu->run->s390_reset_flags = 0;
>  		break;
> +	case 8:
> +	case 9:
> +	case 10:
> +		if (!test_kvm_facility(vcpu->kvm, 161))
> +			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +		/* fall through */
>  	default:
>  		return -EOPNOTSUPP;
>  	}
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 500972a1f742..8947f1812b12 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2590,6 +2590,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (css_general_characteristics.aiv && test_facility(65))
>  		set_kvm_facility(kvm->arch.model.fac_mask, 65);
>  
> +	if (is_prot_virt_host()) {
> +		set_kvm_facility(kvm->arch.model.fac_mask, 161);
> +		set_kvm_facility(kvm->arch.model.fac_list, 161);
> +	}
> +
>  	kvm->arch.model.cpuid = kvm_s390_get_initial_cpuid();
>  	kvm->arch.model.ibc = sclp.ibc & 0x0fff;
>  
> 


* Re: [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1
  2019-10-24 11:40 ` [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1 Janosch Frank
@ 2019-11-15 10:07   ` Thomas Huth
  2019-11-15 11:39     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 10:07 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 25 +++++++++++++++++++++++++
>  arch/s390/kvm/diag.c       |  1 +
>  arch/s390/kvm/kvm-s390.c   | 20 ++++++++++++++++++++
>  arch/s390/kvm/kvm-s390.h   |  2 ++
>  arch/s390/kvm/pv.c         | 19 +++++++++++++++++++
>  include/uapi/linux/kvm.h   |  2 ++
>  6 files changed, 69 insertions(+)

Add at least a short patch description what this patch is all about?

 Thomas


* Re: [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area
  2019-11-15  8:04   ` Thomas Huth
@ 2019-11-15 10:16     ` Janosch Frank
  2019-11-15 10:21       ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 10:16 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 9:04 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> STHYI data has to go through the bounce buffer.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/intercept.c | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
>> index 510b1dee3320..37cb62bc261b 100644
>> --- a/arch/s390/kvm/intercept.c
>> +++ b/arch/s390/kvm/intercept.c
>> @@ -391,7 +391,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
>>  		goto out;
>>  	}
>>  
>> -	if (addr & ~PAGE_MASK)
>> +	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (addr & ~PAGE_MASK))
>>  		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>>  
>>  	sctns = (void *)get_zeroed_page(GFP_KERNEL);
>> @@ -402,10 +402,15 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
>>  
>>  out:
>>  	if (!cc) {
>> -		r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
>> -		if (r) {
>> -			free_page((unsigned long)sctns);
>> -			return kvm_s390_inject_prog_cond(vcpu, r);
>> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			memcpy((void *)vcpu->arch.sie_block->sidad, sctns,
> 
> sidad & PAGE_MASK, just to be sure?

How about a macro or just saving the pointer in an arch struct?

> 
>> +			       PAGE_SIZE);
>> +		} else {
>> +			r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
>> +			if (r) {
>> +				free_page((unsigned long)sctns);
>> +				return kvm_s390_inject_prog_cond(vcpu, r);
>> +			}
>>  		}
>>  	}
>>  
>>
> 
> With "& PAGE_MASK":
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 




* Re: [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  2019-11-15 10:04   ` Thomas Huth
@ 2019-11-15 10:20     ` Janosch Frank
  2019-11-15 10:27       ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 10:20 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 11:04 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> If the host initialized the Ultravisor, we can set stfle bit 161
>> (protected virtual IPL enhancements facility), which indicates, that
>>> the IPL subcodes 8, 9 and 10 are valid. These subcodes are used by a
>> normal guest to set/retrieve a IPIB of type 5 and transition into
>> protected mode.
>>
>> Once in protected mode, the VM will loose the facility bit, as each
> 
> So should the bit be cleared in the host code again? ... I don't see
> this happening in this patch?
> 
>  Thomas

No, KVM doesn't report stfle facilities in protected mode and we would
need to add it again in normal mode so just clearing it would be
pointless. In protected mode 8-10 do not intercept, so there's nothing
we need to do.

> 
> 
>> boot into protected mode has to go through non-protected. There is no
>> secure re-ipl with subcode 10 without a previous subcode 3.
>>
>> In protected mode, there is no subcode 4 available, as the VM has no
>> more access to its memory from non-protected mode. I.e. each IPL
>> clears.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/diag.c     | 6 ++++++
>>  arch/s390/kvm/kvm-s390.c | 5 +++++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
>> index 3fb54ec2cf3e..b951dbdcb6a0 100644
>> --- a/arch/s390/kvm/diag.c
>> +++ b/arch/s390/kvm/diag.c
>> @@ -197,6 +197,12 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
>>  	case 4:
>>  		vcpu->run->s390_reset_flags = 0;
>>  		break;
>> +	case 8:
>> +	case 9:
>> +	case 10:
>> +		if (!test_kvm_facility(vcpu->kvm, 161))
>> +			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>> +		/* fall through */
>>  	default:
>>  		return -EOPNOTSUPP;
>>  	}
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 500972a1f742..8947f1812b12 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -2590,6 +2590,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>  	if (css_general_characteristics.aiv && test_facility(65))
>>  		set_kvm_facility(kvm->arch.model.fac_mask, 65);
>>  
>> +	if (is_prot_virt_host()) {
>> +		set_kvm_facility(kvm->arch.model.fac_mask, 161);
>> +		set_kvm_facility(kvm->arch.model.fac_list, 161);
>> +	}
>> +
>>  	kvm->arch.model.cpuid = kvm_s390_get_initial_cpuid();
>>  	kvm->arch.model.ibc = sclp.ibc & 0x0fff;
>>  
>>
> 




* Re: [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area
  2019-11-15 10:16     ` Janosch Frank
@ 2019-11-15 10:21       ` Thomas Huth
  2019-11-15 12:17         ` [PATCH] SIDAD macro fixup Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 10:21 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 15/11/2019 11.16, Janosch Frank wrote:
> On 11/15/19 9:04 AM, Thomas Huth wrote:
>> On 24/10/2019 13.40, Janosch Frank wrote:
>>> STHYI data has to go through the bounce buffer.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>  arch/s390/kvm/intercept.c | 15 ++++++++++-----
>>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
>>> index 510b1dee3320..37cb62bc261b 100644
>>> --- a/arch/s390/kvm/intercept.c
>>> +++ b/arch/s390/kvm/intercept.c
>>> @@ -391,7 +391,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
>>>  		goto out;
>>>  	}
>>>  
>>> -	if (addr & ~PAGE_MASK)
>>> +	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (addr & ~PAGE_MASK))
>>>  		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>>>  
>>>  	sctns = (void *)get_zeroed_page(GFP_KERNEL);
>>> @@ -402,10 +402,15 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
>>>  
>>>  out:
>>>  	if (!cc) {
>>> -		r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
>>> -		if (r) {
>>> -			free_page((unsigned long)sctns);
>>> -			return kvm_s390_inject_prog_cond(vcpu, r);
>>> +		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
>>> +			memcpy((void *)vcpu->arch.sie_block->sidad, sctns,
>>
>> sidad & PAGE_MASK, just to be sure?
> 
> How about a macro or just saving the pointer in an arch struct?

Sounds fine, too. I think I'd personally slightly prefer a macro.
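
Something like this rough sketch is what I have in mind (the name and
where it lives are of course up to you):

	/* Mask off the low bits (page count) so callers always get the
	 * page-aligned origin of the SIDA. */
	#define SIDA_ORIGIN(sie_block) \
		((void *)((sie_block)->sidad & PAGE_MASK))

handle_sthyi(), handle_stsi() and the memop path could then all use it
instead of open-coding the masking.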

 Thomas


* Re: [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  2019-11-15 10:20     ` Janosch Frank
@ 2019-11-15 10:27       ` Thomas Huth
  2019-11-15 11:29         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 10:27 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 15/11/2019 11.20, Janosch Frank wrote:
> On 11/15/19 11:04 AM, Thomas Huth wrote:
>> On 24/10/2019 13.40, Janosch Frank wrote:
>>> If the host initialized the Ultravisor, we can set stfle bit 161
>>> (protected virtual IPL enhancements facility), which indicates, that
>>> the IPL subcodes 8, 9 and 10 are valid. These subcodes are used by a
>>> normal guest to set/retrieve a IPIB of type 5 and transition into
>>> protected mode.
>>>
>>> Once in protected mode, the VM will loose the facility bit, as each
>>
>> So should the bit be cleared in the host code again? ... I don't see
>> this happening in this patch?
>>
>>  Thomas
> 
> No, KVM doesn't report stfle facilities in protected mode and we would
> need to add it again in normal mode so just clearing it would be
> pointless. In protected mode 8-10 do not intercept, so there's nothing
> we need to do.

Ah, ok, that's what I've missed. Maybe replace "the VM will loose the
facility bit" with "the ultravisor will conceal the facility bit" ?

 Thomas


* Re: [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL
  2019-10-24 11:40 ` [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL Janosch Frank
@ 2019-11-15 10:47   ` Thomas Huth
  2019-11-15 13:06     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 10:47 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> With PV we need to do things for all reset types, not only initial...
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 53 ++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h |  6 +++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index d3fd3ad1d09b..d8ee3a98e961 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3472,6 +3472,53 @@ static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
> +				     unsigned long type)
> +{
> +	int rc;
> +	u32 ret;
> +
> +	switch (type) {
> +	case KVM_S390_VCPU_RESET_NORMAL:
> +		/*
> +		 * Only very little is reset, userspace handles the
> +		 * non-protected case.
> +		 */
> +		rc = 0;
> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
> +			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
> +					   UVC_CMD_CPU_RESET, &ret);
> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET NORMAL VCPU: cpu %d rc %x rrc %x",
> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
> +		}
> +		break;
> +	case KVM_S390_VCPU_RESET_INITIAL:
> +		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
> +			uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
> +				      UVC_CMD_CPU_RESET_INITIAL,
> +				      &ret);
> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET INITIAL VCPU: cpu %d rc %x rrc %x",
> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
> +		}
> +		break;
> +	case KVM_S390_VCPU_RESET_CLEAR:
> +		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
> +			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
> +					   UVC_CMD_CPU_RESET_CLEAR, &ret);
> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET CLEAR VCPU: cpu %d rc %x rrc %x",
> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
> +		}
> +		break;
> +	default:
> +		rc = -EINVAL;
> +		break;

(nit: you could drop the "break;" here)

> +	}
> +	return rc;
> +}
> +
> +
>  int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  {
>  	vcpu_load(vcpu);
> @@ -4633,8 +4680,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		break;
>  	}
>  	case KVM_S390_INITIAL_RESET:
> +		r = -EINVAL;
> +		if (kvm_s390_pv_is_protected(vcpu->kvm))
> +			break;

Wouldn't it be nicer to call

  kvm_arch_vcpu_ioctl_reset(vcpu, KVM_S390_VCPU_RESET_INITIAL)

in this case instead?

>  		r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
>  		break;
> +	case KVM_S390_VCPU_RESET:
> +		r = kvm_arch_vcpu_ioctl_reset(vcpu, arg);
> +		break;
>  	case KVM_SET_ONE_REG:
>  	case KVM_GET_ONE_REG: {
>  		struct kvm_one_reg reg;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f75a051a7705..2846ed5e5dd9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1496,6 +1496,12 @@ struct kvm_pv_cmd {
>  #define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc3, struct kvm_pv_cmd)
>  #define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc4, struct kvm_pv_cmd)
>  
> +#define KVM_S390_VCPU_RESET_NORMAL	0
> +#define KVM_S390_VCPU_RESET_INITIAL	1
> +#define KVM_S390_VCPU_RESET_CLEAR	2
> +
> +#define KVM_S390_VCPU_RESET    _IO(KVMIO,   0xd0)

Why not 0xc5 ?

 Thomas
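
A small aside on the VCPU_EVENT calls in the patch above: the 32-bit value returned through uv_cmd_nodata() is logged with rc taken from the upper halfword and rrc from the lower one. A standalone illustration of that split (the example value is made up):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t ret = 0x00010002;	/* made-up UV return word */

	/* same split as the VCPU_EVENT calls above */
	printf("rc %x rrc %x\n", ret >> 16, ret & 0x0000ffff);
	return 0;
}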


* Re: [RFC 27/37] KVM: s390: protvirt: SIGP handling
  2019-10-24 11:40 ` [RFC 27/37] KVM: s390: protvirt: SIGP handling Janosch Frank
  2019-10-30 18:29   ` David Hildenbrand
@ 2019-11-15 11:15   ` Thomas Huth
  1 sibling, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 11:15 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/intercept.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index 37cb62bc261b..a89738e4f761 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -72,7 +72,8 @@ static int handle_stop(struct kvm_vcpu *vcpu)
>  	if (!stop_pending)
>  		return 0;
>  
> -	if (flags & KVM_S390_STOP_FLAG_STORE_STATUS) {
> +	if (flags & KVM_S390_STOP_FLAG_STORE_STATUS &&
> +	    !kvm_s390_pv_is_protected(vcpu->kvm)) {
>  		rc = kvm_s390_vcpu_store_status(vcpu,
>  						KVM_S390_STORE_STATUS_NOADDR);
>  		if (rc)

Can this still happen at all that we get here with
KVM_S390_STOP_FLAG_STORE_STATUS in the protected case? I'd rather expect
that SIGP is completely handled by the UV already, so userspace should
have no need to inject a SIGP_STOP anymore? Or did I get that wrong?

Anyway, I guess it cannot hurt to add this check, so:

Reviewed-by: Thomas Huth <thuth@redhat.com>


* Re: [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing
  2019-10-24 11:40 ` [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing Janosch Frank
@ 2019-11-15 11:23   ` Thomas Huth
  2019-11-15 11:37     ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 11:23 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> The architecture states that we need to reset local IRQs for all CPU
> resets. Because the old reset interface did not support the normal CPU
> reset we never did that.
> 
> Now that we have a new interface, let's properly clear out local IRQs
> and let this commit be a reminder.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index ba6144fdb5d1..cc5feb67f145 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3485,6 +3485,8 @@ static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
>  		 * non-protected case.
>  		 */
>  		rc = 0;
> +		kvm_clear_async_pf_completion_queue(vcpu);
> +		kvm_s390_clear_local_irqs(vcpu);
>  		if (kvm_s390_pv_handle_cpu(vcpu)) {
>  			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>  					   UVC_CMD_CPU_RESET, &ret);
> 

I think you could squash this into patch 33/37 where you've introduced
the RESET_NORMAL (and adjust the patch description there).

 Thomas


* Re: [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state
  2019-10-24 11:40 ` [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state Janosch Frank
@ 2019-11-15 11:25   ` Thomas Huth
  2019-11-18 17:38   ` Cornelia Huck
  1 sibling, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 11:25 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24/10/2019 13.40, Janosch Frank wrote:
> Code 5 for the set cpu state UV call tells the UV to load a PSW from
> the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
> 0/1). Also it sets the cpu into operating state afterwards, so we can
> start it.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 1 +
>  arch/s390/kvm/kvm-s390.c   | 4 ++++
>  include/uapi/linux/kvm.h   | 1 +
>  3 files changed, 6 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 33b52ba306af..8d10ae731458 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -163,6 +163,7 @@ struct uv_cb_unp {
>  #define PV_CPU_STATE_OPR	1
>  #define PV_CPU_STATE_STP	2
>  #define PV_CPU_STATE_CHKSTP	3
> +#define PV_CPU_STATE_OPR_LOAD	5
>  
>  struct uv_cb_cpu_set_state {
>  	struct uv_cb_header header;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index cc5feb67f145..5cc9108c94e4 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4652,6 +4652,10 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
>  		r = kvm_s390_pv_destroy_cpu(vcpu);
>  		break;
>  	}
> +	case KVM_PV_VCPU_SET_IPL_PSW: {
> +		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
> +		break;
> +	}

Nit: No need for the curly braces here.

 Thomas


* Re: [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  2019-11-15 10:27       ` Thomas Huth
@ 2019-11-15 11:29         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 11:29 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 11:27 AM, Thomas Huth wrote:
> On 15/11/2019 11.20, Janosch Frank wrote:
>> On 11/15/19 11:04 AM, Thomas Huth wrote:
>>> On 24/10/2019 13.40, Janosch Frank wrote:
>>>> If the host initialized the Ultravisor, we can set stfle bit 161
>>>> (protected virtual IPL enhancements facility), which indicates that
>>>> the IPL subcodes 8, 9 and 10 are valid. These subcodes are used by a
>>>> normal guest to set/retrieve an IPIB of type 5 and transition into
>>>> protected mode.
>>>>
>>>> Once in protected mode, the VM will loose the facility bit, as each
>>>
>>> So should the bit be cleared in the host code again? ... I don't see
>>> this happening in this patch?
>>>
>>>  Thomas
>>
>> No, KVM doesn't report stfle facilities in protected mode and we would
>> need to add it again in normal mode so just clearing it would be
>> pointless. In protected mode 8-10 do not intercept, so there's nothing
>> we need to do.
> 
> Ah, ok, that's what I've missed. Maybe replace "the VM will loose the
> facility bit" with "the ultravisor will conceal the facility bit" ?
> 
>  Thomas
> 


Sure



* Re: [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing
  2019-11-15 11:23   ` Thomas Huth
@ 2019-11-15 11:37     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 11:37 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 12:23 PM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> The architecture states that we need to reset local IRQs for all CPU
>> resets. Because the old reset interface did not support the normal CPU
>> reset we never did that.
>>
>> Now that we have a new interface, let's properly clear out local IRQs
>> and let this commit be a reminder.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index ba6144fdb5d1..cc5feb67f145 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -3485,6 +3485,8 @@ static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
>>  		 * non-protected case.
>>  		 */
>>  		rc = 0;
>> +		kvm_clear_async_pf_completion_queue(vcpu);
>> +		kvm_s390_clear_local_irqs(vcpu);
>>  		if (kvm_s390_pv_handle_cpu(vcpu)) {
>>  			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>>  					   UVC_CMD_CPU_RESET, &ret);
>>
> 
> I think you could squash this into patch 33/37 where you've introduced
> the RESET_NORMAL (and adjust the patch description there).
> 
>  Thomas
> 

Yes, that hunk was singled out to have an item to discuss internally.
Since we have now established that it is needed, I can squash it.



* Re: [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1
  2019-11-15 10:07   ` Thomas Huth
@ 2019-11-15 11:39     ` Janosch Frank
  2019-11-15 13:30       ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 11:39 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 11:07 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 25 +++++++++++++++++++++++++
>>  arch/s390/kvm/diag.c       |  1 +
>>  arch/s390/kvm/kvm-s390.c   | 20 ++++++++++++++++++++
>>  arch/s390/kvm/kvm-s390.h   |  2 ++
>>  arch/s390/kvm/pv.c         | 19 +++++++++++++++++++
>>  include/uapi/linux/kvm.h   |  2 ++
>>  6 files changed, 69 insertions(+)
> 
> Add at least a short patch description what this patch is all about?
> 
>  Thomas
> 

I'm thinking about taking out the set cpu state changes and moving them
into a later patch.


How about:
diag 308 subcode 0 and 1 require KVM and Ultravisor interaction, since
the cpus have to be set into multiple reset states.

* All cpus need to be stopped
* The unshare all UVC needs to be executed
* The perform reset UVC needs to be executed
* The cpus need to be reset via the set cpu state UVC
* The issuing cpu needs to set state 5 via set cpu state



* [PATCH] SIDAD macro fixup
  2019-11-15 10:21       ` Thomas Huth
@ 2019-11-15 12:17         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 12:17 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, david, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck

Additionally I would need to use it in the other patches...

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 4 ++++
 arch/s390/kvm/kvm-s390.c         | 4 ++--
 arch/s390/kvm/pv.c               | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 2a8a1e21e1c3..81f6532531cb 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -122,6 +122,10 @@ struct mcck_volatile_info {
 	__u32 reserved;
 };
 
+#define SIDAD_SIZE_MASK		0xff
+#define sidad_origin(sie_block) \
+	(sie_block->sidad & PAGE_MASK)
+
 #define CPUSTAT_STOPPED    0x80000000
 #define CPUSTAT_WAIT       0x10000000
 #define CPUSTAT_ECALL_PEND 0x08000000
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0fa7c6d9ed0e..91a638cc1eba 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4436,7 +4436,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 	 * block which has its own size limit
 	 */
 	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
-	    mop->size > ((vcpu->arch.sie_block->sidad & 0x0f) + 1) * PAGE_SIZE)
+	    mop->size > ((vcpu->arch.sie_block->sidad & SIDAD_SIZE_MASK) + 1) * PAGE_SIZE)
 		return -E2BIG;
 
 	if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
@@ -4460,7 +4460,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 		}
 		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
 			r = 0;
-			if (copy_to_user(uaddr, (void *)vcpu->arch.sie_block->sidad +
+			if (copy_to_user(uaddr, (void *)sidad_origin(vcpu->arch.sie_block) +
 					 (mop->gaddr & ~PAGE_MASK),
 					 mop->size))
 				r = -EFAULT;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 764f8f9f5dff..661f03629265 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -119,7 +119,7 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 
 	free_pages(vcpu->arch.pv.stor_base,
 		   get_order(uv_info.guest_cpu_stor_len));
-	free_page(vcpu->arch.sie_block->sidad);
+	free_page(sidad_origin(vcpu->arch.sie_block));
 	/* Clear cpu and vm handle */
 	memset(&vcpu->arch.sie_block->reserved10, 0,
 	       sizeof(vcpu->arch.sie_block->reserved10));
-- 
2.20.1
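
To make the encoding behind the two macros more tangible: going by SIDAD_SIZE_MASK and the PAGE_MASK use above, sidad appears to carry the page-aligned buffer origin in its upper bits and the size (in pages, minus one) in its low byte. A standalone sketch of that split, with PAGE_SIZE/PAGE_MASK defined locally just for illustration, a made-up sidad value, and the macro taking the raw value rather than the sie block:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define SIDAD_SIZE_MASK	0xff
#define sidad_origin(sidad)	((sidad) & PAGE_MASK)

int main(void)
{
	unsigned long sidad = 0x52000UL | 0x01;	/* made up: origin 0x52000, two pages */

	printf("origin %#lx, size limit %lu bytes\n",
	       sidad_origin(sidad),
	       ((sidad & SIDAD_SIZE_MASK) + 1) * PAGE_SIZE);
	return 0;
}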


* Re: [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL
  2019-11-15 10:47   ` Thomas Huth
@ 2019-11-15 13:06     ` Janosch Frank
  2019-11-15 13:18       ` Thomas Huth
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 13:06 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 11:47 AM, Thomas Huth wrote:
> On 24/10/2019 13.40, Janosch Frank wrote:
>> With PV we need to do things for all reset types, not only initial...
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 53 ++++++++++++++++++++++++++++++++++++++++
>>  include/uapi/linux/kvm.h |  6 +++++
>>  2 files changed, 59 insertions(+)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index d3fd3ad1d09b..d8ee3a98e961 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -3472,6 +3472,53 @@ static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
>>  	return 0;
>>  }
>>  
>> +static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
>> +				     unsigned long type)
>> +{
>> +	int rc;
>> +	u32 ret;
>> +
>> +	switch (type) {
>> +	case KVM_S390_VCPU_RESET_NORMAL:
>> +		/*
>> +		 * Only very little is reset, userspace handles the
>> +		 * non-protected case.
>> +		 */
>> +		rc = 0;
>> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
>> +			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>> +					   UVC_CMD_CPU_RESET, &ret);
>> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET NORMAL VCPU: cpu %d rc %x rrc %x",
>> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>> +		}
>> +		break;
>> +	case KVM_S390_VCPU_RESET_INITIAL:
>> +		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
>> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
>> +			uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>> +				      UVC_CMD_CPU_RESET_INITIAL,
>> +				      &ret);
>> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET INITIAL VCPU: cpu %d rc %x rrc %x",
>> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>> +		}
>> +		break;
>> +	case KVM_S390_VCPU_RESET_CLEAR:
>> +		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
>> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
>> +			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>> +					   UVC_CMD_CPU_RESET_CLEAR, &ret);
>> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET CLEAR VCPU: cpu %d rc %x rrc %x",
>> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>> +		}
>> +		break;
>> +	default:
>> +		rc = -EINVAL;
>> +		break;
> 
> (nit: you could drop the "break;" here)
> 
>> +	}
>> +	return rc;
>> +}
>> +
>> +
>>  int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>  {
>>  	vcpu_load(vcpu);
>> @@ -4633,8 +4680,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>  		break;
>>  	}
>>  	case KVM_S390_INITIAL_RESET:
>> +		r = -EINVAL;
>> +		if (kvm_s390_pv_is_protected(vcpu->kvm))
>> +			break;
> 
> Wouldn't it be nicer to call
> 
>   kvm_arch_vcpu_ioctl_reset(vcpu, KVM_S390_VCPU_RESET_INITIAL)
> 
> in this case instead?

How about:
        case KVM_S390_INITIAL_RESET:
                arg = KVM_S390_VCPU_RESET_INITIAL;
        case KVM_S390_VCPU_RESET:
                r = kvm_arch_vcpu_ioctl_reset(vcpu, arg);
                break;



> 
>>  		r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
>>  		break;
>> +	case KVM_S390_VCPU_RESET:
>> +		r = kvm_arch_vcpu_ioctl_reset(vcpu, arg);
>> +		break;
>>  	case KVM_SET_ONE_REG:
>>  	case KVM_GET_ONE_REG: {
>>  		struct kvm_one_reg reg;
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index f75a051a7705..2846ed5e5dd9 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1496,6 +1496,12 @@ struct kvm_pv_cmd {
>>  #define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc3, struct kvm_pv_cmd)
>>  #define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc4, struct kvm_pv_cmd)
>>  
>> +#define KVM_S390_VCPU_RESET_NORMAL	0
>> +#define KVM_S390_VCPU_RESET_INITIAL	1
>> +#define KVM_S390_VCPU_RESET_CLEAR	2
>> +
>> +#define KVM_S390_VCPU_RESET    _IO(KVMIO,   0xd0)
> 
> Why not 0xc5 ?

Fixed

> 
>  Thomas
> 




* Re: [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL
  2019-11-15 13:06     ` Janosch Frank
@ 2019-11-15 13:18       ` Thomas Huth
  0 siblings, 0 replies; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 13:18 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 15/11/2019 14.06, Janosch Frank wrote:
> On 11/15/19 11:47 AM, Thomas Huth wrote:
>> On 24/10/2019 13.40, Janosch Frank wrote:
>>> With PV we need to do things for all reset types, not only initial...
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>  arch/s390/kvm/kvm-s390.c | 53 ++++++++++++++++++++++++++++++++++++++++
>>>  include/uapi/linux/kvm.h |  6 +++++
>>>  2 files changed, 59 insertions(+)
>>>
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index d3fd3ad1d09b..d8ee3a98e961 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -3472,6 +3472,53 @@ static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
>>>  	return 0;
>>>  }
>>>  
>>> +static int kvm_arch_vcpu_ioctl_reset(struct kvm_vcpu *vcpu,
>>> +				     unsigned long type)
>>> +{
>>> +	int rc;
>>> +	u32 ret;
>>> +
>>> +	switch (type) {
>>> +	case KVM_S390_VCPU_RESET_NORMAL:
>>> +		/*
>>> +		 * Only very little is reset, userspace handles the
>>> +		 * non-protected case.
>>> +		 */
>>> +		rc = 0;
>>> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
>>> +			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>>> +					   UVC_CMD_CPU_RESET, &ret);
>>> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET NORMAL VCPU: cpu %d rc %x rrc %x",
>>> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>>> +		}
>>> +		break;
>>> +	case KVM_S390_VCPU_RESET_INITIAL:
>>> +		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
>>> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
>>> +			uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>>> +				      UVC_CMD_CPU_RESET_INITIAL,
>>> +				      &ret);
>>> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET INITIAL VCPU: cpu %d rc %x rrc %x",
>>> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>>> +		}
>>> +		break;
>>> +	case KVM_S390_VCPU_RESET_CLEAR:
>>> +		rc = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
>>> +		if (kvm_s390_pv_handle_cpu(vcpu)) {
>>> +			rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
>>> +					   UVC_CMD_CPU_RESET_CLEAR, &ret);
>>> +			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET CLEAR VCPU: cpu %d rc %x rrc %x",
>>> +				   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>>> +		}
>>> +		break;
>>> +	default:
>>> +		rc = -EINVAL;
>>> +		break;
>>
>> (nit: you could drop the "break;" here)
>>
>>> +	}
>>> +	return rc;
>>> +}
>>> +
>>> +
>>>  int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>  {
>>>  	vcpu_load(vcpu);
>>> @@ -4633,8 +4680,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>>  		break;
>>>  	}
>>>  	case KVM_S390_INITIAL_RESET:
>>> +		r = -EINVAL;
>>> +		if (kvm_s390_pv_is_protected(vcpu->kvm))
>>> +			break;
>>
>> Wouldn't it be nicer to call
>>
>>   kvm_arch_vcpu_ioctl_reset(vcpu, KVM_S390_VCPU_RESET_INITIAL)
>>
>> in this case instead?
> 
> How about:
>         case KVM_S390_INITIAL_RESET:
>                 arg = KVM_S390_VCPU_RESET_INITIAL;
> 

Add a "/* fallthrough */" comment here...

>         case KVM_S390_VCPU_RESET:
>                 r = kvm_arch_vcpu_ioctl_reset(vcpu, arg);
>                 break;

... then this looks good, yes!

 Thomas
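
Putting both suggestions together, the dispatch in kvm_arch_vcpu_ioctl() could end up looking roughly like this (a sketch of the shape discussed here, not the final patch):

	case KVM_S390_INITIAL_RESET:
		arg = KVM_S390_VCPU_RESET_INITIAL;
		/* fallthrough */
	case KVM_S390_VCPU_RESET:
		r = kvm_arch_vcpu_ioctl_reset(vcpu, arg);
		break;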



* Re: [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1
  2019-11-15 11:39     ` Janosch Frank
@ 2019-11-15 13:30       ` Thomas Huth
  2019-11-15 14:08         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Thomas Huth @ 2019-11-15 13:30 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 15/11/2019 12.39, Janosch Frank wrote:
> On 11/15/19 11:07 AM, Thomas Huth wrote:
>> On 24/10/2019 13.40, Janosch Frank wrote:
>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>> ---
>>>  arch/s390/include/asm/uv.h | 25 +++++++++++++++++++++++++
>>>  arch/s390/kvm/diag.c       |  1 +
>>>  arch/s390/kvm/kvm-s390.c   | 20 ++++++++++++++++++++
>>>  arch/s390/kvm/kvm-s390.h   |  2 ++
>>>  arch/s390/kvm/pv.c         | 19 +++++++++++++++++++
>>>  include/uapi/linux/kvm.h   |  2 ++
>>>  6 files changed, 69 insertions(+)
>>
>> Add at least a short patch description what this patch is all about?
>>
>>  Thomas
>>
> 
> I'm thinking about taking out the set cpu state changes and move it into
> a later patch.
> 
> 
> How about:
> diag 308 subcode 0 and 1 require KVM and Ultravisor interaction, since
> the cpus have to be set into multiple reset states.
> 
> * All cpus need to be stopped
> * The unshare all UVC needs to be executed
> * The perform reset UVC needs to be executed
> * The cpus need to be reset via the set cpu state UVC
> * The issuing cpu needs to set state 5 via set cpu state

Could you put the UVC names into quotes? Like:

* The "unshare all" UVC needs to be executed

... I first had to read the sentence three times to really understand it.

 Thomas


* Re: [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1
  2019-11-15 13:30       ` Thomas Huth
@ 2019-11-15 14:08         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-15 14:08 UTC (permalink / raw)
  To: Thomas Huth, kvm
  Cc: linux-s390, david, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/15/19 2:30 PM, Thomas Huth wrote:
> On 15/11/2019 12.39, Janosch Frank wrote:
>> On 11/15/19 11:07 AM, Thomas Huth wrote:
>>> On 24/10/2019 13.40, Janosch Frank wrote:
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>  arch/s390/include/asm/uv.h | 25 +++++++++++++++++++++++++
>>>>  arch/s390/kvm/diag.c       |  1 +
>>>>  arch/s390/kvm/kvm-s390.c   | 20 ++++++++++++++++++++
>>>>  arch/s390/kvm/kvm-s390.h   |  2 ++
>>>>  arch/s390/kvm/pv.c         | 19 +++++++++++++++++++
>>>>  include/uapi/linux/kvm.h   |  2 ++
>>>>  6 files changed, 69 insertions(+)
>>>
>>> Add at least a short patch description what this patch is all about?
>>>
>>>  Thomas
>>>
>>
>> I'm thinking about taking out the set cpu state changes and move it into
>> a later patch.
>>
>>
>> How about:
>> diag 308 subcode 0 and 1 require KVM and Ultravisor interaction, since
>> the cpus have to be set into multiple reset states.
>>
>> * All cpus need to be stopped
>> * The unshare all UVC needs to be executed
>> * The perform reset UVC needs to be executed
>> * The cpus need to be reset via the set cpu state UVC
>> * The issuing cpu needs to set state 5 via set cpu state
> 
> Could you put the UVC names into quotes? Like:
> 
> * The "unshare all" UVC needs to be executed
> 
> ... I first had to read the sentence three times to really understand it.
> 
>  Thomas
> 

Sure, just did



* Re: [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-10-24 11:40 ` [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected Janosch Frank
@ 2019-11-18 16:39   ` Cornelia Huck
  2019-11-19  8:11     ` Janosch Frank
  2019-11-19 10:18   ` David Hildenbrand
  1 sibling, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-18 16:39 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:45 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

Add at least a short sentence here?

> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index eddc9508c1b1..17a78774c617 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3646,6 +3646,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>  		rc = gmap_mprotect_notify(vcpu->arch.gmap,
>  					  kvm_s390_get_prefix(vcpu),
>  					  PAGE_SIZE * 2, PROT_WRITE);
> +		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
> +						  kvm_s390_get_prefix(vcpu));
> +			WARN_ON_ONCE(rc && rc != -EEXIST);
> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
> +						  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
> +			WARN_ON_ONCE(rc && rc != -EEXIST);
> +			rc = 0;

So, what happens if we have an error other than -EEXIST (which
presumably means that we already protected it) here? The page is not
protected, and further accesses will get an error? (Another question:
what can actually go wrong here?)

> +		}
>  		if (rc) {
>  			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>  			return rc;


* Re: [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state
  2019-10-24 11:40 ` [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state Janosch Frank
  2019-11-15 11:25   ` Thomas Huth
@ 2019-11-18 17:38   ` Cornelia Huck
  2019-11-19  8:13     ` Janosch Frank
  1 sibling, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-18 17:38 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor

On Thu, 24 Oct 2019 07:40:58 -0400
Janosch Frank <frankja@linux.ibm.com> wrote:

> Code 5 for the set cpu state UV call tells the UV to load a PSW from
> the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
> 0/1). Also it sets the cpu into operating state afterwards, so we can
> start it.

I'm a bit confused by the patch description: Does this mean that the UV
does the transition to operating state? Does the hypervisor get a
notification for that?

> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 1 +
>  arch/s390/kvm/kvm-s390.c   | 4 ++++
>  include/uapi/linux/kvm.h   | 1 +
>  3 files changed, 6 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 33b52ba306af..8d10ae731458 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -163,6 +163,7 @@ struct uv_cb_unp {
>  #define PV_CPU_STATE_OPR	1
>  #define PV_CPU_STATE_STP	2
>  #define PV_CPU_STATE_CHKSTP	3
> +#define PV_CPU_STATE_OPR_LOAD	5
>  
>  struct uv_cb_cpu_set_state {
>  	struct uv_cb_header header;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index cc5feb67f145..5cc9108c94e4 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4652,6 +4652,10 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
>  		r = kvm_s390_pv_destroy_cpu(vcpu);
>  		break;
>  	}
> +	case KVM_PV_VCPU_SET_IPL_PSW: {
> +		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
> +		break;
> +	}
>  	default:
>  		r = -ENOTTY;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2846ed5e5dd9..973007d27d55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1483,6 +1483,7 @@ enum pv_cmd_id {
>  	KVM_PV_VM_UNSHARE,
>  	KVM_PV_VCPU_CREATE,
>  	KVM_PV_VCPU_DESTROY,
> +	KVM_PV_VCPU_SET_IPL_PSW,
>  };
>  
>  struct kvm_pv_cmd {


* Re: [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-11-18 16:39   ` Cornelia Huck
@ 2019-11-19  8:11     ` Janosch Frank
  2019-11-19  9:45       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-19  8:11 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor



On 11/18/19 5:39 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:45 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
> Add at least a short sentence here?

For protected VMs the lowcore not only needs to be mapped, but also
needs to be protected memory; if it is not, we'll get a validity
interception when trying to run it.

> 
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index eddc9508c1b1..17a78774c617 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -3646,6 +3646,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>>  		rc = gmap_mprotect_notify(vcpu->arch.gmap,
>>  					  kvm_s390_get_prefix(vcpu),
>>  					  PAGE_SIZE * 2, PROT_WRITE);
>> +		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
>> +						  kvm_s390_get_prefix(vcpu));
>> +			WARN_ON_ONCE(rc && rc != -EEXIST);
>> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
>> +						  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
>> +			WARN_ON_ONCE(rc && rc != -EEXIST);
>> +			rc = 0;
> 
> So, what happens if we have an error other than -EEXIST (which
> presumably means that we already protected it) here? The page is not
> protected, and further accesses will get an error? (Another question:
> what can actually go wrong here?)

If KVM or QEMU write into a lowcore, we will fail the integrity check on
import and this cpu will never run again.

In retrospect a warn_on_once might be the wrong error handling here.

> 
>> +		}
>>  		if (rc) {
>>  			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>>  			return rc;
> 
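
To illustrate the alternative Janosch hints at above, propagating a conversion failure instead of only warning could look roughly like this; whether the KVM_REQ_MMU_RELOAD fallback is the right reaction remains the open question (a sketch only):

		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
			rc = uv_convert_to_secure(vcpu->arch.gmap,
						  kvm_s390_get_prefix(vcpu));
			if (rc == -EEXIST)
				rc = 0;
			if (!rc)
				rc = uv_convert_to_secure(vcpu->arch.gmap,
							  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
			if (rc == -EEXIST)
				rc = 0;
		}
		if (rc) {
			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
			return rc;
		}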




* Re: [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state
  2019-11-18 17:38   ` Cornelia Huck
@ 2019-11-19  8:13     ` Janosch Frank
  2019-11-19 10:23       ` Cornelia Huck
  0 siblings, 1 reply; 213+ messages in thread
From: Janosch Frank @ 2019-11-19  8:13 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor



On 11/18/19 6:38 PM, Cornelia Huck wrote:
> On Thu, 24 Oct 2019 07:40:58 -0400
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> Code 5 for the set cpu state UV call tells the UV to load a PSW from
>> the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
>> 0/1). Also it sets the cpu into operating state afterwards, so we can
>> start it.
> 
> I'm a bit confused by the patch description: Does this mean that the UV
> does the transition to operating state? Does the hypervisor get a
> notification for that?

CMD 5 is defined as "load psw and set to operating".
Currently QEMU will still go out and do a "set to operating" after the
cmd 5 because our current infrastructure does it and it's basically a
nop, so I didn't want to put in the effort to remove it.

> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 1 +
>>  arch/s390/kvm/kvm-s390.c   | 4 ++++
>>  include/uapi/linux/kvm.h   | 1 +
>>  3 files changed, 6 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index 33b52ba306af..8d10ae731458 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -163,6 +163,7 @@ struct uv_cb_unp {
>>  #define PV_CPU_STATE_OPR	1
>>  #define PV_CPU_STATE_STP	2
>>  #define PV_CPU_STATE_CHKSTP	3
>> +#define PV_CPU_STATE_OPR_LOAD	5
>>  
>>  struct uv_cb_cpu_set_state {
>>  	struct uv_cb_header header;
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index cc5feb67f145..5cc9108c94e4 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -4652,6 +4652,10 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
>>  		r = kvm_s390_pv_destroy_cpu(vcpu);
>>  		break;
>>  	}
>> +	case KVM_PV_VCPU_SET_IPL_PSW: {
>> +		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
>> +		break;
>> +	}
>>  	default:
>>  		r = -ENOTTY;
>>  	}
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 2846ed5e5dd9..973007d27d55 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1483,6 +1483,7 @@ enum pv_cmd_id {
>>  	KVM_PV_VM_UNSHARE,
>>  	KVM_PV_VCPU_CREATE,
>>  	KVM_PV_VCPU_DESTROY,
>> +	KVM_PV_VCPU_SET_IPL_PSW,
>>  };
>>  
>>  struct kvm_pv_cmd {
> 




* Re: [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-11-19  8:11     ` Janosch Frank
@ 2019-11-19  9:45       ` Cornelia Huck
  2019-11-19 10:08         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-19  9:45 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


On Tue, 19 Nov 2019 09:11:11 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/18/19 5:39 PM, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:45 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> > 
> > Add at least a short sentence here?  
> 
> For protected VMs the lowcore not only needs to be mapped, but also
> needs to be protected memory; if it is not, we'll get a validity
> interception when trying to run it.

Much better, thanks!

> 
> >   
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> ---
> >>  arch/s390/kvm/kvm-s390.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> >> index eddc9508c1b1..17a78774c617 100644
> >> --- a/arch/s390/kvm/kvm-s390.c
> >> +++ b/arch/s390/kvm/kvm-s390.c
> >> @@ -3646,6 +3646,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
> >>  		rc = gmap_mprotect_notify(vcpu->arch.gmap,
> >>  					  kvm_s390_get_prefix(vcpu),
> >>  					  PAGE_SIZE * 2, PROT_WRITE);
> >> +		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
> >> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
> >> +						  kvm_s390_get_prefix(vcpu));
> >> +			WARN_ON_ONCE(rc && rc != -EEXIST);
> >> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
> >> +						  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
> >> +			WARN_ON_ONCE(rc && rc != -EEXIST);
> >> +			rc = 0;  
> > 
> > So, what happens if we have an error other than -EEXIST (which
> > presumably means that we already protected it) here? The page is not
> > protected, and further accesses will get an error? (Another question:
> > what can actually go wrong here?)  
> 
> If KVM or QEMU write into a lowcore, we will fail the integrity check on
> import and this cpu will never run again.

From the guest's POV, is that similar to a cpu going into xstop?
 
> In retrospect a warn_on_once might be the wrong error handling here.
> 
> >   
> >> +		}
> >>  		if (rc) {
> >>  			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> >>  			return rc;  
> >   
> 
> 



* Re: [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-11-19  9:45       ` Cornelia Huck
@ 2019-11-19 10:08         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-19 10:08 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor



On 11/19/19 10:45 AM, Cornelia Huck wrote:
> On Tue, 19 Nov 2019 09:11:11 +0100
[...]
>>>
>>> So, what happens if we have an error other than -EEXIST (which
>>> presumably means that we already protected it) here? The page is not
>>> protected, and further accesses will get an error? (Another question:
>>> what can actually go wrong here?)  
>>
>> If KVM or QEMU write into a lowcore, we will fail the integrity check on
>> import and this cpu will never run again.
> 
> From the guest's POV, is that similar to a cpu going into xstop?

Not really, I'm wondering what happens if we try to send a restart to
such a cpu.

>  
>> In retrospect a warn_on_once might be the wrong error handling here.
>>
>>>   
>>>> +		}
>>>>  		if (rc) {
>>>>  			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>>>>  			return rc;  
>>>   
>>
>>
> 




* Re: [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-10-24 11:40 ` [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected Janosch Frank
  2019-11-18 16:39   ` Cornelia Huck
@ 2019-11-19 10:18   ` David Hildenbrand
  2019-11-19 11:36     ` Janosch Frank
  1 sibling, 1 reply; 213+ messages in thread
From: David Hildenbrand @ 2019-11-19 10:18 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor

On 24.10.19 13:40, Janosch Frank wrote:
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>   arch/s390/kvm/kvm-s390.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index eddc9508c1b1..17a78774c617 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3646,6 +3646,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>   		rc = gmap_mprotect_notify(vcpu->arch.gmap,
>   					  kvm_s390_get_prefix(vcpu),
>   					  PAGE_SIZE * 2, PROT_WRITE);
> +		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
> +						  kvm_s390_get_prefix(vcpu));
> +			WARN_ON_ONCE(rc && rc != -EEXIST);
> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
> +						  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
> +			WARN_ON_ONCE(rc && rc != -EEXIST);
> +			rc = 0;
> +		}

... what if userspace reads the prefix pages just after these calls? 
validity? :/

>   		if (rc) {
>   			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>   			return rc;
> 

-- 

Thanks,

David / dhildenb


* Re: [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state
  2019-11-19  8:13     ` Janosch Frank
@ 2019-11-19 10:23       ` Cornelia Huck
  2019-11-19 11:40         ` Janosch Frank
  0 siblings, 1 reply; 213+ messages in thread
From: Cornelia Huck @ 2019-11-19 10:23 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor


On Tue, 19 Nov 2019 09:13:11 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/18/19 6:38 PM, Cornelia Huck wrote:
> > On Thu, 24 Oct 2019 07:40:58 -0400
> > Janosch Frank <frankja@linux.ibm.com> wrote:
> >   
> >> Code 5 for the set cpu state UV call tells the UV to load a PSW from
> >> the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
> >> 0/1). Also it sets the cpu into operating state afterwards, so we can
> >> start it.  
> > 
> > I'm a bit confused by the patch description: Does this mean that the UV
> > does the transition to operating state? Does the hypervisor get a
> > notification for that?  
> 
> CMD 5 is defined as "load psw and set to operating".
> Currently QEMU will still go out and do a "set to operating" after the
> cmd 5 because our current infrastructure does it and it's basically a
> nop, so I didn't want to put in the effort to remove it.

So, the "it" setting the cpu into operating state is QEMU, via the
mpstate interface, which triggers that call? Or is that implicit, but
it does not hurt to do it again (which would make more sense to me)?

Assuming the latter, what about the following description:

"KVM: s390: protvirt: support setting cpu state 5

Setting code 5 ("load psw and set to operating") in the set cpu state
UV call tells the UV to load a PSW either from the SE header (first
IPL) or from guest location 0x0 (diag 308 subcode 0/1). Subsequently,
the cpu is set into operating state by the UV.

Note that we can still instruct the UV to set the cpu into operating
state explicitly afterwards."

> 
> >   
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> ---
> >>  arch/s390/include/asm/uv.h | 1 +
> >>  arch/s390/kvm/kvm-s390.c   | 4 ++++
> >>  include/uapi/linux/kvm.h   | 1 +
> >>  3 files changed, 6 insertions(+)
> >>
> >> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> >> index 33b52ba306af..8d10ae731458 100644
> >> --- a/arch/s390/include/asm/uv.h
> >> +++ b/arch/s390/include/asm/uv.h
> >> @@ -163,6 +163,7 @@ struct uv_cb_unp {
> >>  #define PV_CPU_STATE_OPR	1
> >>  #define PV_CPU_STATE_STP	2
> >>  #define PV_CPU_STATE_CHKSTP	3
> >> +#define PV_CPU_STATE_OPR_LOAD	5
> >>  
> >>  struct uv_cb_cpu_set_state {
> >>  	struct uv_cb_header header;
> >> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> >> index cc5feb67f145..5cc9108c94e4 100644
> >> --- a/arch/s390/kvm/kvm-s390.c
> >> +++ b/arch/s390/kvm/kvm-s390.c
> >> @@ -4652,6 +4652,10 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
> >>  		r = kvm_s390_pv_destroy_cpu(vcpu);
> >>  		break;
> >>  	}
> >> +	case KVM_PV_VCPU_SET_IPL_PSW: {
> >> +		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);

Also maybe add a comment here that setting into oper state (again) can
be done separately?

> >> +		break;
> >> +	}
> >>  	default:
> >>  		r = -ENOTTY;
> >>  	}
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 2846ed5e5dd9..973007d27d55 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -1483,6 +1483,7 @@ enum pv_cmd_id {
> >>  	KVM_PV_VM_UNSHARE,
> >>  	KVM_PV_VCPU_CREATE,
> >>  	KVM_PV_VCPU_DESTROY,
> >> +	KVM_PV_VCPU_SET_IPL_PSW,
> >>  };
> >>  
> >>  struct kvm_pv_cmd {  
> >   
> 
> 



* Re: [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected
  2019-11-19 10:18   ` David Hildenbrand
@ 2019-11-19 11:36     ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-19 11:36 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, thuth, borntraeger, imbrenda, mihajlov, mimu, cohuck, gor



On 11/19/19 11:18 AM, David Hildenbrand wrote:
> On 24.10.19 13:40, Janosch Frank wrote:
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>   arch/s390/kvm/kvm-s390.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index eddc9508c1b1..17a78774c617 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -3646,6 +3646,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>>   		rc = gmap_mprotect_notify(vcpu->arch.gmap,
>>   					  kvm_s390_get_prefix(vcpu),
>>   					  PAGE_SIZE * 2, PROT_WRITE);
>> +		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
>> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
>> +						  kvm_s390_get_prefix(vcpu));
>> +			WARN_ON_ONCE(rc && rc != -EEXIST);
>> +			rc = uv_convert_to_secure(vcpu->arch.gmap,
>> +						  kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
>> +			WARN_ON_ONCE(rc && rc != -EEXIST);
>> +			rc = 0;
>> +		}
> 
> ... what if userspace reads the prefix pages just after these calls? 
> validity? :/

Currently yes, we're working with firmware to fix this.

> 
>>   		if (rc) {
>>   			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>>   			return rc;
>>
> 




* Re: [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state
  2019-11-19 10:23       ` Cornelia Huck
@ 2019-11-19 11:40         ` Janosch Frank
  0 siblings, 0 replies; 213+ messages in thread
From: Janosch Frank @ 2019-11-19 11:40 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-s390, thuth, david, borntraeger, imbrenda, mihajlov,
	mimu, gor



On 11/19/19 11:23 AM, Cornelia Huck wrote:
> On Tue, 19 Nov 2019 09:13:11 +0100
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 11/18/19 6:38 PM, Cornelia Huck wrote:
>>> On Thu, 24 Oct 2019 07:40:58 -0400
>>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>>   
>>>> Code 5 for the set cpu state UV call tells the UV to load a PSW from
>>>> the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
>>>> 0/1). Also it sets the cpu into operating state afterwards, so we can
>>>> start it.  
>>>
>>> I'm a bit confused by the patch description: Does this mean that the UV
>>> does the transition to operating state? Does the hypervisor get a
>>> notification for that?  
>>
>> CMD 5 is defined as "load psw and set to operating".
>> Currently QEMU will still go out and do a "set to operating" after the
>> cmd 5 because our current infrastructure does it and it's basically a
>> nop, so I didn't want to put in the effort to remove it.
> 
> So, the "it" setting the cpu into operating state is QEMU, via the
> mpstate interface, which triggers that call? Or is that implicit, but
> it does not hurt to do it again (which would make more sense to me)?

Qemu sets operating state via mpstate.
I could fence that via env->pv checks but that would also mean more code
and setting operating twice is just a nop.

> 
> Assuming the latter, what about the following description:
> 
> "KVM: s390: protvirt: support setting cpu state 5
> 
> Setting code 5 ("load psw and set to operating") in the set cpu state
> UV call tells the UV to load a PSW either from the SE header (first
> IPL) or from guest location 0x0 (diag 308 subcode 0/1). Subsequently,
> the cpu is set into operating state by the UV.
> 
> Note that we can still instruct the UV to set the cpu into operating
> state explicitly afterwards."

Sounds good

> 
>>
>>>   
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> ---
>>>>  arch/s390/include/asm/uv.h | 1 +
>>>>  arch/s390/kvm/kvm-s390.c   | 4 ++++
>>>>  include/uapi/linux/kvm.h   | 1 +
>>>>  3 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>>>> index 33b52ba306af..8d10ae731458 100644
>>>> --- a/arch/s390/include/asm/uv.h
>>>> +++ b/arch/s390/include/asm/uv.h
>>>> @@ -163,6 +163,7 @@ struct uv_cb_unp {
>>>>  #define PV_CPU_STATE_OPR	1
>>>>  #define PV_CPU_STATE_STP	2
>>>>  #define PV_CPU_STATE_CHKSTP	3
>>>> +#define PV_CPU_STATE_OPR_LOAD	5
>>>>  
>>>>  struct uv_cb_cpu_set_state {
>>>>  	struct uv_cb_header header;
>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>> index cc5feb67f145..5cc9108c94e4 100644
>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>> @@ -4652,6 +4652,10 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
>>>>  		r = kvm_s390_pv_destroy_cpu(vcpu);
>>>>  		break;
>>>>  	}
>>>> +	case KVM_PV_VCPU_SET_IPL_PSW: {
>>>> +		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
> 
> Also maybe add a comment here that setting into oper state (again) can
> be done separately?
> 
>>>> +		break;
>>>> +	}
>>>>  	default:
>>>>  		r = -ENOTTY;
>>>>  	}
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 2846ed5e5dd9..973007d27d55 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -1483,6 +1483,7 @@ enum pv_cmd_id {
>>>>  	KVM_PV_VM_UNSHARE,
>>>>  	KVM_PV_VCPU_CREATE,
>>>>  	KVM_PV_VCPU_DESTROY,
>>>> +	KVM_PV_VCPU_SET_IPL_PSW,
>>>>  };
>>>>  
>>>>  struct kvm_pv_cmd {  
>>>   
>>
>>
> 
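
For completeness, the case with the comment Cornelia asked for could read along these lines (the wording is a suggestion only, and the braces are dropped as Thomas noted earlier):

	case KVM_PV_VCPU_SET_IPL_PSW:
		/*
		 * The UV loads the PSW and sets the cpu to operating itself;
		 * setting the cpu to operating again afterwards (e.g. via
		 * the mpstate interface) is a harmless nop.
		 */
		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
		break;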




2019-11-13 14:49   ` Thomas Huth
2019-11-13 15:57     ` Michael Mueller
2019-10-24 11:40 ` [RFC 16/37] KVM: s390: protvirt: Implement machine-check interruption injection Janosch Frank
2019-11-05 18:11   ` Cornelia Huck
2019-10-24 11:40 ` [RFC 17/37] DOCUMENTATION: protvirt: Instruction emulation Janosch Frank
2019-11-14 15:15   ` Cornelia Huck
2019-11-14 15:20     ` Claudio Imbrenda
2019-11-14 15:41       ` Cornelia Huck
2019-11-14 15:55         ` Janosch Frank
2019-11-14 16:03           ` Cornelia Huck
2019-11-14 16:18             ` Janosch Frank
2019-10-24 11:40 ` [RFC 18/37] KVM: s390: protvirt: Handle spec exception loops Janosch Frank
2019-11-14 14:22   ` Thomas Huth
2019-10-24 11:40 ` [RFC 19/37] KVM: s390: protvirt: Add new gprs location handling Janosch Frank
2019-11-04 11:25   ` David Hildenbrand
2019-11-05 12:01     ` Christian Borntraeger
2019-11-05 12:39       ` Janosch Frank
2019-11-05 13:55         ` David Hildenbrand
2019-11-05 14:11           ` Janosch Frank
2019-11-05 14:18             ` David Hildenbrand
2019-11-14 14:46               ` Thomas Huth
2019-11-14 14:44   ` Thomas Huth
2019-11-14 15:56     ` Janosch Frank
2019-10-24 11:40 ` [RFC 20/37] KVM: S390: protvirt: Introduce instruction data area bounce buffer Janosch Frank
2019-11-14 15:36   ` Thomas Huth
2019-11-14 16:04     ` Janosch Frank
2019-11-14 16:21     ` [PATCH] Fixup sida bouncing Janosch Frank
2019-11-15  8:19       ` Thomas Huth
2019-11-15  8:50         ` Janosch Frank
2019-11-15  9:21           ` Thomas Huth
2019-10-24 11:40 ` [RFC 21/37] KVM: S390: protvirt: Instruction emulation Janosch Frank
2019-11-14 15:38   ` Cornelia Huck
2019-11-14 16:00     ` Janosch Frank
2019-11-14 16:05       ` Cornelia Huck
2019-10-24 11:40 ` [RFC 22/37] KVM: s390: protvirt: Add SCLP handling Janosch Frank
2019-10-24 11:40 ` [RFC 23/37] KVM: s390: protvirt: Make sure prefix is always protected Janosch Frank
2019-11-18 16:39   ` Cornelia Huck
2019-11-19  8:11     ` Janosch Frank
2019-11-19  9:45       ` Cornelia Huck
2019-11-19 10:08         ` Janosch Frank
2019-11-19 10:18   ` David Hildenbrand
2019-11-19 11:36     ` Janosch Frank
2019-10-24 11:40 ` [RFC 24/37] KVM: s390: protvirt: Write sthyi data to instruction data area Janosch Frank
2019-11-15  8:04   ` Thomas Huth
2019-11-15 10:16     ` Janosch Frank
2019-11-15 10:21       ` Thomas Huth
2019-11-15 12:17         ` [PATCH] SIDAD macro fixup Janosch Frank
2019-10-24 11:40 ` [RFC 25/37] KVM: s390: protvirt: STSI handling Janosch Frank
2019-11-15  8:27   ` Thomas Huth
2019-10-24 11:40 ` [RFC 26/37] KVM: s390: protvirt: Only sync fmt4 registers Janosch Frank
2019-11-15  9:02   ` Thomas Huth
2019-11-15 10:01     ` Janosch Frank
2019-10-24 11:40 ` [RFC 27/37] KVM: s390: protvirt: SIGP handling Janosch Frank
2019-10-30 18:29   ` David Hildenbrand
2019-11-15 11:15   ` Thomas Huth
2019-10-24 11:40 ` [RFC 28/37] KVM: s390: protvirt: Add program exception injection Janosch Frank
2019-10-24 11:40 ` [RFC 29/37] KVM: s390: protvirt: Sync pv state Janosch Frank
2019-11-15  9:36   ` Thomas Huth
2019-11-15  9:59     ` Janosch Frank
2019-10-24 11:40 ` [RFC 30/37] DOCUMENTATION: protvirt: Diag 308 IPL Janosch Frank
2019-11-06 16:48   ` Cornelia Huck
2019-11-06 17:05     ` Janosch Frank
2019-11-06 17:37       ` Cornelia Huck
2019-11-06 21:02         ` Janosch Frank
2019-11-07  8:53           ` Cornelia Huck
2019-11-07  8:59             ` Janosch Frank
2019-10-24 11:40 ` [RFC 31/37] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Janosch Frank
2019-11-15 10:04   ` Thomas Huth
2019-11-15 10:20     ` Janosch Frank
2019-11-15 10:27       ` Thomas Huth
2019-11-15 11:29         ` Janosch Frank
2019-10-24 11:40 ` [RFC 32/37] KVM: s390: protvirt: UV calls diag308 0, 1 Janosch Frank
2019-11-15 10:07   ` Thomas Huth
2019-11-15 11:39     ` Janosch Frank
2019-11-15 13:30       ` Thomas Huth
2019-11-15 14:08         ` Janosch Frank
2019-10-24 11:40 ` [RFC 33/37] KVM: s390: Introduce VCPU reset IOCTL Janosch Frank
2019-11-15 10:47   ` Thomas Huth
2019-11-15 13:06     ` Janosch Frank
2019-11-15 13:18       ` Thomas Huth
2019-10-24 11:40 ` [RFC 34/37] KVM: s390: protvirt: Report CPU state to Ultravisor Janosch Frank
2019-10-24 11:40 ` [RFC 35/37] KVM: s390: Fix cpu reset local IRQ clearing Janosch Frank
2019-11-15 11:23   ` Thomas Huth
2019-11-15 11:37     ` Janosch Frank
2019-10-24 11:40 ` [RFC 36/37] KVM: s390: protvirt: Support cmd 5 operation state Janosch Frank
2019-11-15 11:25   ` Thomas Huth
2019-11-18 17:38   ` Cornelia Huck
2019-11-19  8:13     ` Janosch Frank
2019-11-19 10:23       ` Cornelia Huck
2019-11-19 11:40         ` Janosch Frank
2019-10-24 11:40 ` [RFC 37/37] KVM: s390: protvirt: Add UV debug trace Janosch Frank
