All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next)
@ 2017-06-28 17:30 Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 1/9] KVM: s390: CMMA tracking, ESSA emulation, migration mode Christian Borntraeger
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390

Paolo, Radim,

the first chunk of 4.13 related changes for s390.

There are two conflicts
1. wit the kvm-arm tree due to 2387149eade2 ("KVM: improve arch vcpu
request defining") which will be resolved in the kvm tree, I assume
2. with the s390 tree due to the s390-wide cleanup  a75259825401
("s390: rename struct psw_bits members"). This is trivial. I guess
this must be resolved by Linus.

It also contains a merge of a topic branch for machine check handling.
This branch is also merged in the s390 tree.

The following changes since commit 865279c53ca9d88718d974bb014b2c6ce259ac75:

  tools/kvm_stat: display guest list in pid/guest selection screens (2017-06-08 18:24:49 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  tags/kvm-s390-next-4.13-1

for you to fetch changes up to d52cd2076eb2ace9fe95dbf795f6d93587453914:

  KVM: s390: Inject machine check into the nested guest (2017-06-28 12:42:33 +0200)

----------------------------------------------------------------
KVM: s390: fixes and features for 4.13

- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups
- fixes

----------------------------------------------------------------
Christian Borntraeger (2):
      KVM: s390: implement instruction execution protection for emulated     ifetch
      Merge tag 'nmiforkvm' of git://git.kernel.org/.../kvms390/linux into kernelorgnext

Claudio Imbrenda (2):
      KVM: s390: CMMA tracking, ESSA emulation, migration mode
      KVM: s390: ioctls to get and set guest storage attributes

Martin Schwidefsky (1):
      KVM: s390: avoid packed attribute

QingFeng Hao (4):
      s390/nmi: s390: New low level handling for machine check happening in guest
      KVM: s390: Backup the guest's machine check info
      KVM: s390: Inject machine check into the guest
      KVM: s390: Inject machine check into the nested guest

Yi Min Zhao (1):
      KVM: S390: add new group for flic

 Documentation/virtual/kvm/api.txt               | 135 +++++++++
 Documentation/virtual/kvm/devices/s390_flic.txt |  15 +
 Documentation/virtual/kvm/devices/vm.txt        |  33 +++
 arch/s390/include/asm/ctl_reg.h                 |   4 +-
 arch/s390/include/asm/kvm_host.h                |  44 ++-
 arch/s390/include/asm/nmi.h                     |  13 +
 arch/s390/include/asm/processor.h               |   2 +
 arch/s390/include/uapi/asm/kvm.h                |  12 +
 arch/s390/kernel/asm-offsets.c                  |   3 +
 arch/s390/kernel/entry.S                        |  13 +-
 arch/s390/kernel/nmi.c                          |  84 +++++-
 arch/s390/kvm/gaccess.c                         |  43 ++-
 arch/s390/kvm/interrupt.c                       |  91 +++++-
 arch/s390/kvm/kvm-s390.c                        | 372 +++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.h                        |   2 +
 arch/s390/kvm/priv.c                            | 103 ++++++-
 arch/s390/kvm/vsie.c                            |  25 +-
 include/uapi/linux/kvm.h                        |  33 +++
 18 files changed, 981 insertions(+), 46 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [GIT PULL 1/9] KVM: s390: CMMA tracking, ESSA emulation, migration mode
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 2/9] KVM: s390: ioctls to get and set guest storage attributes Christian Borntraeger
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, Claudio Imbrenda

From: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>

* Add a migration state bitmap to keep track of which pages have dirty
  CMMA information.
* Disable CMMA by default, so we can track if it's used or not. Enable
  it on first use like we do for storage keys (unless we are doing a
  migration).
* Creates a VM attribute to enter and leave migration mode.
* In migration mode, CMMA is disabled in the SIE block, so ESSA is
  always interpreted and emulated in software.
* Free the migration state on VM destroy.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virtual/kvm/devices/vm.txt |  33 +++++++
 arch/s390/include/asm/kvm_host.h         |   9 ++
 arch/s390/include/uapi/asm/kvm.h         |   6 ++
 arch/s390/kvm/kvm-s390.c                 | 159 ++++++++++++++++++++++++++++++-
 arch/s390/kvm/priv.c                     | 103 +++++++++++++++++++-
 5 files changed, 304 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt
index 575ccb0..903fc92 100644
--- a/Documentation/virtual/kvm/devices/vm.txt
+++ b/Documentation/virtual/kvm/devices/vm.txt
@@ -222,3 +222,36 @@ Allows user space to disable dea key wrapping, clearing the wrapping key.
 
 Parameters: none
 Returns:    0
+
+5. GROUP: KVM_S390_VM_MIGRATION
+Architectures: s390
+
+5.1. ATTRIBUTE: KVM_S390_VM_MIGRATION_STOP (w/o)
+
+Allows userspace to stop migration mode, needed for PGSTE migration.
+Setting this attribute when migration mode is not active will have no
+effects.
+
+Parameters: none
+Returns:    0
+
+5.2. ATTRIBUTE: KVM_S390_VM_MIGRATION_START (w/o)
+
+Allows userspace to start migration mode, needed for PGSTE migration.
+Setting this attribute when migration mode is already active will have
+no effects.
+
+Parameters: none
+Returns:    -ENOMEM if there is not enough free memory to start migration mode
+	    -EINVAL if the state of the VM is invalid (e.g. no memory defined)
+	    0 in case of success.
+
+5.3. ATTRIBUTE: KVM_S390_VM_MIGRATION_STATUS (r/o)
+
+Allows userspace to query the status of migration mode.
+
+Parameters: address of a buffer in user space to store the data (u64) to;
+	    the data itself is either 0 if migration mode is disabled or 1
+	    if it is enabled
+Returns:    -EFAULT if the given address is not accessible from kernel space
+	    0 in case of success.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 426614a..a8cafed 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -45,6 +45,8 @@
 #define KVM_REQ_ENABLE_IBS         8
 #define KVM_REQ_DISABLE_IBS        9
 #define KVM_REQ_ICPT_OPEREXC       10
+#define KVM_REQ_START_MIGRATION   11
+#define KVM_REQ_STOP_MIGRATION    12
 
 #define SIGP_CTRL_C		0x80
 #define SIGP_CTRL_SCN_MASK	0x3f
@@ -691,6 +693,12 @@ struct kvm_s390_vsie {
 	struct page *pages[KVM_MAX_VCPUS];
 };
 
+struct kvm_s390_migration_state {
+	unsigned long bitmap_size;	/* in bits (number of guest pages) */
+	atomic64_t dirty_pages;		/* number of dirty pages */
+	unsigned long *pgste_bitmap;
+};
+
 struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -718,6 +726,7 @@ struct kvm_arch{
 	struct kvm_s390_crypto crypto;
 	struct kvm_s390_vsie vsie;
 	u64 epoch;
+	struct kvm_s390_migration_state *migration_state;
 	/* subset of available cpu features enabled by user space */
 	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
 };
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 3dd2a1d..d6879a9 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -70,6 +70,7 @@ struct kvm_s390_io_adapter_req {
 #define KVM_S390_VM_TOD			1
 #define KVM_S390_VM_CRYPTO		2
 #define KVM_S390_VM_CPU_MODEL		3
+#define KVM_S390_VM_MIGRATION		4
 
 /* kvm attributes for mem_ctrl */
 #define KVM_S390_VM_MEM_ENABLE_CMMA	0
@@ -151,6 +152,11 @@ struct kvm_s390_vm_cpu_subfunc {
 #define KVM_S390_VM_CRYPTO_DISABLE_AES_KW	2
 #define KVM_S390_VM_CRYPTO_DISABLE_DEA_KW	3
 
+/* kvm attributes for migration mode */
+#define KVM_S390_VM_MIGRATION_STOP	0
+#define KVM_S390_VM_MIGRATION_START	1
+#define KVM_S390_VM_MIGRATION_STATUS	2
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 	/* general purpose regs for s390 */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 689ac48..c2b3914 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -31,6 +31,7 @@
 #include <linux/bitmap.h>
 #include <linux/sched/signal.h>
 
+#include <linux/string.h>
 #include <asm/asm-offsets.h>
 #include <asm/lowcore.h>
 #include <asm/stp.h>
@@ -750,6 +751,129 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr)
 	return 0;
 }
 
+static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
+{
+	int cx;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(cx, vcpu, kvm)
+		kvm_s390_sync_request(req, vcpu);
+}
+
+/*
+ * Must be called with kvm->srcu held to avoid races on memslots, and with
+ * kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
+ */
+static int kvm_s390_vm_start_migration(struct kvm *kvm)
+{
+	struct kvm_s390_migration_state *mgs;
+	struct kvm_memory_slot *ms;
+	/* should be the only one */
+	struct kvm_memslots *slots;
+	unsigned long ram_pages;
+	int slotnr;
+
+	/* migration mode already enabled */
+	if (kvm->arch.migration_state)
+		return 0;
+
+	slots = kvm_memslots(kvm);
+	if (!slots || !slots->used_slots)
+		return -EINVAL;
+
+	mgs = kzalloc(sizeof(*mgs), GFP_KERNEL);
+	if (!mgs)
+		return -ENOMEM;
+	kvm->arch.migration_state = mgs;
+
+	if (kvm->arch.use_cmma) {
+		/*
+		 * Get the last slot. They should be sorted by base_gfn, so the
+		 * last slot is also the one at the end of the address space.
+		 * We have verified above that at least one slot is present.
+		 */
+		ms = slots->memslots + slots->used_slots - 1;
+		/* round up so we only use full longs */
+		ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
+		/* allocate enough bytes to store all the bits */
+		mgs->pgste_bitmap = vmalloc(ram_pages / 8);
+		if (!mgs->pgste_bitmap) {
+			kfree(mgs);
+			kvm->arch.migration_state = NULL;
+			return -ENOMEM;
+		}
+
+		mgs->bitmap_size = ram_pages;
+		atomic64_set(&mgs->dirty_pages, ram_pages);
+		/* mark all the pages in active slots as dirty */
+		for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
+			ms = slots->memslots + slotnr;
+			bitmap_set(mgs->pgste_bitmap, ms->base_gfn, ms->npages);
+		}
+
+		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
+	}
+	return 0;
+}
+
+/*
+ * Must be called with kvm->lock to avoid races with ourselves and
+ * kvm_s390_vm_start_migration.
+ */
+static int kvm_s390_vm_stop_migration(struct kvm *kvm)
+{
+	struct kvm_s390_migration_state *mgs;
+
+	/* migration mode already disabled */
+	if (!kvm->arch.migration_state)
+		return 0;
+	mgs = kvm->arch.migration_state;
+	kvm->arch.migration_state = NULL;
+
+	if (kvm->arch.use_cmma) {
+		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
+		vfree(mgs->pgste_bitmap);
+	}
+	kfree(mgs);
+	return 0;
+}
+
+static int kvm_s390_vm_set_migration(struct kvm *kvm,
+				     struct kvm_device_attr *attr)
+{
+	int idx, res = -ENXIO;
+
+	mutex_lock(&kvm->lock);
+	switch (attr->attr) {
+	case KVM_S390_VM_MIGRATION_START:
+		idx = srcu_read_lock(&kvm->srcu);
+		res = kvm_s390_vm_start_migration(kvm);
+		srcu_read_unlock(&kvm->srcu, idx);
+		break;
+	case KVM_S390_VM_MIGRATION_STOP:
+		res = kvm_s390_vm_stop_migration(kvm);
+		break;
+	default:
+		break;
+	}
+	mutex_unlock(&kvm->lock);
+
+	return res;
+}
+
+static int kvm_s390_vm_get_migration(struct kvm *kvm,
+				     struct kvm_device_attr *attr)
+{
+	u64 mig = (kvm->arch.migration_state != NULL);
+
+	if (attr->attr != KVM_S390_VM_MIGRATION_STATUS)
+		return -ENXIO;
+
+	if (copy_to_user((void __user *)attr->addr, &mig, sizeof(mig)))
+		return -EFAULT;
+	return 0;
+}
+
 static int kvm_s390_set_tod_high(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	u8 gtod_high;
@@ -1090,6 +1214,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 	case KVM_S390_VM_CRYPTO:
 		ret = kvm_s390_vm_set_crypto(kvm, attr);
 		break;
+	case KVM_S390_VM_MIGRATION:
+		ret = kvm_s390_vm_set_migration(kvm, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1112,6 +1239,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 	case KVM_S390_VM_CPU_MODEL:
 		ret = kvm_s390_get_cpu_model(kvm, attr);
 		break;
+	case KVM_S390_VM_MIGRATION:
+		ret = kvm_s390_vm_get_migration(kvm, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1179,6 +1309,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 			break;
 		}
 		break;
+	case KVM_S390_VM_MIGRATION:
+		ret = 0;
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1633,6 +1766,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_s390_destroy_adapters(kvm);
 	kvm_s390_clear_float_irqs(kvm);
 	kvm_s390_vsie_destroy(kvm);
+	if (kvm->arch.migration_state) {
+		vfree(kvm->arch.migration_state->pgste_bitmap);
+		kfree(kvm->arch.migration_state);
+	}
 	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
 }
 
@@ -1977,7 +2114,6 @@ int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.sie_block->cbrlo)
 		return -ENOMEM;
 
-	vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
 	vcpu->arch.sie_block->ecb2 &= ~ECB2_PFMFI;
 	return 0;
 }
@@ -2489,6 +2625,27 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 		goto retry;
 	}
 
+	if (kvm_check_request(KVM_REQ_START_MIGRATION, vcpu)) {
+		/*
+		 * Disable CMMA virtualization; we will emulate the ESSA
+		 * instruction manually, in order to provide additional
+		 * functionalities needed for live migration.
+		 */
+		vcpu->arch.sie_block->ecb2 &= ~ECB2_CMMA;
+		goto retry;
+	}
+
+	if (kvm_check_request(KVM_REQ_STOP_MIGRATION, vcpu)) {
+		/*
+		 * Re-enable CMMA virtualization if CMMA is available and
+		 * was used.
+		 */
+		if ((vcpu->kvm->arch.use_cmma) &&
+		    (vcpu->kvm->mm->context.use_cmma))
+			vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
+		goto retry;
+	}
+
 	/* nothing to do, just clear the request */
 	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index c03106c..a226c45 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -24,6 +24,7 @@
 #include <asm/ebcdic.h>
 #include <asm/sysinfo.h>
 #include <asm/pgtable.h>
+#include <asm/page-states.h>
 #include <asm/pgalloc.h>
 #include <asm/gmap.h>
 #include <asm/io.h>
@@ -949,13 +950,72 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
+{
+	struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
+	int r1, r2, nappended, entries;
+	unsigned long gfn, hva, res, pgstev, ptev;
+	unsigned long *cbrlo;
+
+	/*
+	 * We don't need to set SD.FPF.SK to 1 here, because if we have a
+	 * machine check here we either handle it or crash
+	 */
+
+	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
+	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
+	hva = gfn_to_hva(vcpu->kvm, gfn);
+	entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3;
+
+	if (kvm_is_error_hva(hva))
+		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+
+	nappended = pgste_perform_essa(vcpu->kvm->mm, hva, orc, &ptev, &pgstev);
+	if (nappended < 0) {
+		res = orc ? 0x10 : 0;
+		vcpu->run->s.regs.gprs[r1] = res; /* Exception Indication */
+		return 0;
+	}
+	res = (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22;
+	/*
+	 * Set the block-content state part of the result. 0 means resident, so
+	 * nothing to do if the page is valid. 2 is for preserved pages
+	 * (non-present and non-zero), and 3 for zero pages (non-present and
+	 * zero).
+	 */
+	if (ptev & _PAGE_INVALID) {
+		res |= 2;
+		if (pgstev & _PGSTE_GPS_ZERO)
+			res |= 1;
+	}
+	vcpu->run->s.regs.gprs[r1] = res;
+	/*
+	 * It is possible that all the normal 511 slots were full, in which case
+	 * we will now write in the 512th slot, which is reserved for host use.
+	 * In both cases we let the normal essa handling code process all the
+	 * slots, including the reserved one, if needed.
+	 */
+	if (nappended > 0) {
+		cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo & PAGE_MASK);
+		cbrlo[entries] = gfn << PAGE_SHIFT;
+	}
+
+	if (orc) {
+		/* increment only if we are really flipping the bit to 1 */
+		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
+			atomic64_inc(&ms->dirty_pages);
+	}
+
+	return nappended;
+}
+
 static int handle_essa(struct kvm_vcpu *vcpu)
 {
 	/* entries expected to be 1FF */
 	int entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3;
 	unsigned long *cbrlo;
 	struct gmap *gmap;
-	int i;
+	int i, orc;
 
 	VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries);
 	gmap = vcpu->arch.gmap;
@@ -965,12 +1025,45 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
-
-	if (((vcpu->arch.sie_block->ipb & 0xf0000000) >> 28) > 6)
+	/* Check for invalid operation request code */
+	orc = (vcpu->arch.sie_block->ipb & 0xf0000000) >> 28;
+	if (orc > ESSA_MAX)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	/* Retry the ESSA instruction */
-	kvm_s390_retry_instr(vcpu);
+	if (likely(!vcpu->kvm->arch.migration_state)) {
+		/*
+		 * CMMA is enabled in the KVM settings, but is disabled in
+		 * the SIE block and in the mm_context, and we are not doing
+		 * a migration. Enable CMMA in the mm_context.
+		 * Since we need to take a write lock to write to the context
+		 * to avoid races with storage keys handling, we check if the
+		 * value really needs to be written to; if the value is
+		 * already correct, we do nothing and avoid the lock.
+		 */
+		if (vcpu->kvm->mm->context.use_cmma == 0) {
+			down_write(&vcpu->kvm->mm->mmap_sem);
+			vcpu->kvm->mm->context.use_cmma = 1;
+			up_write(&vcpu->kvm->mm->mmap_sem);
+		}
+		/*
+		 * If we are here, we are supposed to have CMMA enabled in
+		 * the SIE block. Enabling CMMA works on a per-CPU basis,
+		 * while the context use_cmma flag is per process.
+		 * It's possible that the context flag is enabled and the
+		 * SIE flag is not, so we set the flag always; if it was
+		 * already set, nothing changes, otherwise we enable it
+		 * on this CPU too.
+		 */
+		vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
+		/* Retry the ESSA instruction */
+		kvm_s390_retry_instr(vcpu);
+	} else {
+		/* Account for the possible extra cbrl entry */
+		i = do_essa(vcpu, orc);
+		if (i < 0)
+			return i;
+		entries += i;
+	}
 	vcpu->arch.sie_block->cbrlo &= PAGE_MASK;	/* reset nceo */
 	cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo);
 	down_read(&gmap->mm->mmap_sem);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 2/9] KVM: s390: ioctls to get and set guest storage attributes
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 1/9] KVM: s390: CMMA tracking, ESSA emulation, migration mode Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 3/9] KVM: s390: implement instruction execution protection for emulated ifetch Christian Borntraeger
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, Claudio Imbrenda

From: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>

* Add the struct used in the ioctls to get and set CMMA attributes.
* Add the two functions needed to get and set the CMMA attributes for
  guest pages.
* Add the two ioctls that use the aforementioned functions.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virtual/kvm/api.txt | 135 +++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c          | 202 +++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/kvm.h          |  33 +++++++
 3 files changed, 369 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4029943..912b7df 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3255,6 +3255,141 @@ Otherwise, if the MCE is a corrected error, KVM will just
 store it in the corresponding bank (provided this bank is
 not holding a previously reported uncorrected error).
 
+4.107 KVM_S390_GET_CMMA_BITS
+
+Capability: KVM_CAP_S390_CMMA_MIGRATION
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_cmma_log (in, out)
+Returns: 0 on success, a negative value on error
+
+This ioctl is used to get the values of the CMMA bits on the s390
+architecture. It is meant to be used in two scenarios:
+- During live migration to save the CMMA values. Live migration needs
+  to be enabled via the KVM_REQ_START_MIGRATION VM property.
+- To non-destructively peek at the CMMA values, with the flag
+  KVM_S390_CMMA_PEEK set.
+
+The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
+values are written to a buffer whose location is indicated via the "values"
+member in the kvm_s390_cmma_log struct.  The values in the input struct are
+also updated as needed.
+Each CMMA value takes up one byte.
+
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
+start_gfn is the number of the first guest frame whose CMMA values are
+to be retrieved,
+
+count is the length of the buffer in bytes,
+
+values points to the buffer where the result will be written to.
+
+If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
+KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
+other ioctls.
+
+The result is written in the buffer pointed to by the field values, and
+the values of the input parameter are updated as follows.
+
+Depending on the flags, different actions are performed. The only
+supported flag so far is KVM_S390_CMMA_PEEK.
+
+The default behaviour if KVM_S390_CMMA_PEEK is not set is:
+start_gfn will indicate the first page frame whose CMMA bits were dirty.
+It is not necessarily the same as the one passed as input, as clean pages
+are skipped.
+
+count will indicate the number of bytes actually written in the buffer.
+It can (and very often will) be smaller than the input value, since the
+buffer is only filled until 16 bytes of clean values are found (which
+are then not copied in the buffer). Since a CMMA migration block needs
+the base address and the length, for a total of 16 bytes, we will send
+back some clean data if there is some dirty data afterwards, as long as
+the size of the clean data does not exceed the size of the header. This
+allows to minimize the amount of data to be saved or transferred over
+the network at the expense of more roundtrips to userspace. The next
+invocation of the ioctl will skip over all the clean values, saving
+potentially more than just the 16 bytes we found.
+
+If KVM_S390_CMMA_PEEK is set:
+the existing storage attributes are read even when not in migration
+mode, and no other action is performed;
+
+the output start_gfn will be equal to the input start_gfn,
+
+the output count will be equal to the input count, except if the end of
+memory has been reached.
+
+In both cases:
+the field "remaining" will indicate the total number of dirty CMMA values
+still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
+not enabled.
+
+mask is unused.
+
+values points to the userspace buffer where the result will be stored.
+
+This ioctl can fail with -ENOMEM if not enough memory can be allocated to
+complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
+KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
+-EFAULT if the userspace address is invalid or if no page table is
+present for the addresses (e.g. when using hugepages).
+
+4.108 KVM_S390_SET_CMMA_BITS
+
+Capability: KVM_CAP_S390_CMMA_MIGRATION
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_cmma_log (in)
+Returns: 0 on success, a negative value on error
+
+This ioctl is used to set the values of the CMMA bits on the s390
+architecture. It is meant to be used during live migration to restore
+the CMMA values, but there are no restrictions on its use.
+The ioctl takes parameters via the kvm_s390_cmma_values struct.
+Each CMMA value takes up one byte.
+
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
+start_gfn indicates the starting guest frame number,
+
+count indicates how many values are to be considered in the buffer,
+
+flags is not used and must be 0.
+
+mask indicates which PGSTE bits are to be considered.
+
+remaining is not used.
+
+values points to the buffer in userspace where to store the values.
+
+This ioctl can fail with -ENOMEM if not enough memory can be allocated to
+complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
+the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
+if the flags field was not 0, with -EFAULT if the userspace address is
+invalid, if invalid pages are written to (e.g. after the end of memory)
+or if no page table is present for the addresses (e.g. when using
+hugepages).
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c2b3914..e100a7f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -30,8 +30,8 @@
 #include <linux/vmalloc.h>
 #include <linux/bitmap.h>
 #include <linux/sched/signal.h>
-
 #include <linux/string.h>
+
 #include <asm/asm-offsets.h>
 #include <asm/lowcore.h>
 #include <asm/stp.h>
@@ -387,6 +387,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_SKEYS:
 	case KVM_CAP_S390_IRQ_STATE:
 	case KVM_CAP_S390_USER_INSTR0:
+	case KVM_CAP_S390_CMMA_MIGRATION:
 	case KVM_CAP_S390_AIS:
 		r = 1;
 		break;
@@ -1419,6 +1420,182 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	return r;
 }
 
+/*
+ * Base address and length must be sent at the start of each block, therefore
+ * it's cheaper to send some clean data, as long as it's less than the size of
+ * two longs.
+ */
+#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
+/* for consistency */
+#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
+
+/*
+ * This function searches for the next page with dirty CMMA attributes, and
+ * saves the attributes in the buffer up to either the end of the buffer or
+ * until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits is found;
+ * no trailing clean bytes are saved.
+ * In case no dirty bits were found, or if CMMA was not enabled or used, the
+ * output buffer will indicate 0 as length.
+ */
+static int kvm_s390_get_cmma_bits(struct kvm *kvm,
+				  struct kvm_s390_cmma_log *args)
+{
+	struct kvm_s390_migration_state *s = kvm->arch.migration_state;
+	unsigned long bufsize, hva, pgstev, i, next, cur;
+	int srcu_idx, peek, r = 0, rr;
+	u8 *res;
+
+	cur = args->start_gfn;
+	i = next = pgstev = 0;
+
+	if (unlikely(!kvm->arch.use_cmma))
+		return -ENXIO;
+	/* Invalid/unsupported flags were specified */
+	if (args->flags & ~KVM_S390_CMMA_PEEK)
+		return -EINVAL;
+	/* Migration mode query, and we are not doing a migration */
+	peek = !!(args->flags & KVM_S390_CMMA_PEEK);
+	if (!peek && !s)
+		return -EINVAL;
+	/* CMMA is disabled or was not used, or the buffer has length zero */
+	bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
+	if (!bufsize || !kvm->mm->context.use_cmma) {
+		memset(args, 0, sizeof(*args));
+		return 0;
+	}
+
+	if (!peek) {
+		/* We are not peeking, and there are no dirty pages */
+		if (!atomic64_read(&s->dirty_pages)) {
+			memset(args, 0, sizeof(*args));
+			return 0;
+		}
+		cur = find_next_bit(s->pgste_bitmap, s->bitmap_size,
+				    args->start_gfn);
+		if (cur >= s->bitmap_size)	/* nothing found, loop back */
+			cur = find_next_bit(s->pgste_bitmap, s->bitmap_size, 0);
+		if (cur >= s->bitmap_size) {	/* again! (very unlikely) */
+			memset(args, 0, sizeof(*args));
+			return 0;
+		}
+		next = find_next_bit(s->pgste_bitmap, s->bitmap_size, cur + 1);
+	}
+
+	res = vmalloc(bufsize);
+	if (!res)
+		return -ENOMEM;
+
+	args->start_gfn = cur;
+
+	down_read(&kvm->mm->mmap_sem);
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	while (i < bufsize) {
+		hva = gfn_to_hva(kvm, cur);
+		if (kvm_is_error_hva(hva)) {
+			r = -EFAULT;
+			break;
+		}
+		/* decrement only if we actually flipped the bit to 0 */
+		if (!peek && test_and_clear_bit(cur, s->pgste_bitmap))
+			atomic64_dec(&s->dirty_pages);
+		r = get_pgste(kvm->mm, hva, &pgstev);
+		if (r < 0)
+			pgstev = 0;
+		/* save the value */
+		res[i++] = (pgstev >> 24) & 0x3;
+		/*
+		 * if the next bit is too far away, stop.
+		 * if we reached the previous "next", find the next one
+		 */
+		if (!peek) {
+			if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
+				break;
+			if (cur == next)
+				next = find_next_bit(s->pgste_bitmap,
+						     s->bitmap_size, cur + 1);
+		/* reached the end of the bitmap or of the buffer, stop */
+			if ((next >= s->bitmap_size) ||
+			    (next >= args->start_gfn + bufsize))
+				break;
+		}
+		cur++;
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	up_read(&kvm->mm->mmap_sem);
+	args->count = i;
+	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
+
+	rr = copy_to_user((void __user *)args->values, res, args->count);
+	if (rr)
+		r = -EFAULT;
+
+	vfree(res);
+	return r;
+}
+
+/*
+ * This function sets the CMMA attributes for the given pages. If the input
+ * buffer has zero length, no action is taken, otherwise the attributes are
+ * set and the mm->context.use_cmma flag is set.
+ */
+static int kvm_s390_set_cmma_bits(struct kvm *kvm,
+				  const struct kvm_s390_cmma_log *args)
+{
+	unsigned long hva, mask, pgstev, i;
+	uint8_t *bits;
+	int srcu_idx, r = 0;
+
+	mask = args->mask;
+
+	if (!kvm->arch.use_cmma)
+		return -ENXIO;
+	/* invalid/unsupported flags */
+	if (args->flags != 0)
+		return -EINVAL;
+	/* Enforce sane limit on memory allocation */
+	if (args->count > KVM_S390_CMMA_SIZE_MAX)
+		return -EINVAL;
+	/* Nothing to do */
+	if (args->count == 0)
+		return 0;
+
+	bits = vmalloc(sizeof(*bits) * args->count);
+	if (!bits)
+		return -ENOMEM;
+
+	r = copy_from_user(bits, (void __user *)args->values, args->count);
+	if (r) {
+		r = -EFAULT;
+		goto out;
+	}
+
+	down_read(&kvm->mm->mmap_sem);
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	for (i = 0; i < args->count; i++) {
+		hva = gfn_to_hva(kvm, args->start_gfn + i);
+		if (kvm_is_error_hva(hva)) {
+			r = -EFAULT;
+			break;
+		}
+
+		pgstev = bits[i];
+		pgstev = pgstev << 24;
+		mask &= _PGSTE_GPS_USAGE_MASK;
+		set_pgste_bits(kvm->mm, hva, mask, pgstev);
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	up_read(&kvm->mm->mmap_sem);
+
+	if (!kvm->mm->context.use_cmma) {
+		down_write(&kvm->mm->mmap_sem);
+		kvm->mm->context.use_cmma = 1;
+		up_write(&kvm->mm->mmap_sem);
+	}
+out:
+	vfree(bits);
+	return r;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -1497,6 +1674,29 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_s390_set_skeys(kvm, &args);
 		break;
 	}
+	case KVM_S390_GET_CMMA_BITS: {
+		struct kvm_s390_cmma_log args;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+		r = kvm_s390_get_cmma_bits(kvm, &args);
+		if (!r) {
+			r = copy_to_user(argp, &args, sizeof(args));
+			if (r)
+				r = -EFAULT;
+		}
+		break;
+	}
+	case KVM_S390_SET_CMMA_BITS: {
+		struct kvm_s390_cmma_log args;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+		r = kvm_s390_set_cmma_bits(kvm, &args);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 577429a..2b8dc1c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -155,6 +155,35 @@ struct kvm_s390_skeys {
 	__u32 reserved[9];
 };
 
+#define KVM_S390_CMMA_PEEK (1 << 0)
+
+/**
+ * kvm_s390_cmma_log - Used for CMMA migration.
+ *
+ * Used both for input and output.
+ *
+ * @start_gfn: Guest page number to start from.
+ * @count: Size of the result buffer.
+ * @flags: Control operation mode via KVM_S390_CMMA_* flags
+ * @remaining: Used with KVM_S390_GET_CMMA_BITS. Indicates how many dirty
+ *             pages are still remaining.
+ * @mask: Used with KVM_S390_SET_CMMA_BITS. Bitmap of bits to actually set
+ *        in the PGSTE.
+ * @values: Pointer to the values buffer.
+ *
+ * Used in KVM_S390_{G,S}ET_CMMA_BITS ioctls.
+ */
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
 struct kvm_hyperv_exit {
 #define KVM_EXIT_HYPERV_SYNIC          1
 #define KVM_EXIT_HYPERV_HCALL          2
@@ -895,6 +924,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SPAPR_TCE_VFIO 142
 #define KVM_CAP_X86_GUEST_MWAIT 143
 #define KVM_CAP_ARM_USER_IRQ 144
+#define KVM_CAP_S390_CMMA_MIGRATION 145
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1318,6 +1348,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_GET_IRQ_STATE	  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
 /* Available with KVM_CAP_X86_SMM */
 #define KVM_SMI                   _IO(KVMIO,   0xb7)
+/* Available with KVM_CAP_S390_CMMA_MIGRATION */
+#define KVM_S390_GET_CMMA_BITS      _IOW(KVMIO, 0xb8, struct kvm_s390_cmma_log)
+#define KVM_S390_SET_CMMA_BITS      _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 3/9] KVM: s390: implement instruction execution protection for emulated ifetch
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 1/9] KVM: s390: CMMA tracking, ESSA emulation, migration mode Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 2/9] KVM: s390: ioctls to get and set guest storage attributes Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 4/9] KVM: S390: add new group for flic Christian Borntraeger
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390

While currently only used to fetch the original instruction on failure
for getting the instruction length code, we should make the page table
walking code future proof.

Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/ctl_reg.h |  4 +++-
 arch/s390/kvm/gaccess.c         | 39 +++++++++++++++++++++++++++++----------
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/ctl_reg.h b/arch/s390/include/asm/ctl_reg.h
index d0441ad..e508dff 100644
--- a/arch/s390/include/asm/ctl_reg.h
+++ b/arch/s390/include/asm/ctl_reg.h
@@ -59,7 +59,9 @@ union ctlreg0 {
 		unsigned long lap  : 1; /* Low-address-protection control */
 		unsigned long	   : 4;
 		unsigned long edat : 1; /* Enhanced-DAT-enablement control */
-		unsigned long	   : 4;
+		unsigned long	   : 2;
+		unsigned long iep  : 1; /* Instruction-Execution-Protection */
+		unsigned long	   : 1;
 		unsigned long afp  : 1; /* AFP-register control */
 		unsigned long vx   : 1; /* Vector enablement control */
 		unsigned long	   : 7;
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 9da243d..6fda095 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -89,7 +89,7 @@ struct region3_table_entry_fc1 {
 	unsigned long f  : 1; /* Fetch-Protection Bit */
 	unsigned long fc : 1; /* Format-Control */
 	unsigned long p  : 1; /* DAT-Protection Bit */
-	unsigned long co : 1; /* Change-Recording Override */
+	unsigned long iep: 1; /* Instruction-Execution-Protection */
 	unsigned long	 : 2;
 	unsigned long i  : 1; /* Region-Invalid Bit */
 	unsigned long cr : 1; /* Common-Region Bit */
@@ -131,7 +131,7 @@ struct segment_entry_fc1 {
 	unsigned long f  : 1; /* Fetch-Protection Bit */
 	unsigned long fc : 1; /* Format-Control */
 	unsigned long p  : 1; /* DAT-Protection Bit */
-	unsigned long co : 1; /* Change-Recording Override */
+	unsigned long iep: 1; /* Instruction-Execution-Protection */
 	unsigned long	 : 2;
 	unsigned long i  : 1; /* Segment-Invalid Bit */
 	unsigned long cs : 1; /* Common-Segment Bit */
@@ -168,7 +168,8 @@ union page_table_entry {
 		unsigned long z  : 1; /* Zero Bit */
 		unsigned long i  : 1; /* Page-Invalid Bit */
 		unsigned long p  : 1; /* DAT-Protection Bit */
-		unsigned long	 : 9;
+		unsigned long iep: 1; /* Instruction-Execution-Protection */
+		unsigned long	 : 8;
 	};
 };
 
@@ -485,6 +486,7 @@ enum prot_type {
 	PROT_TYPE_KEYC = 1,
 	PROT_TYPE_ALC  = 2,
 	PROT_TYPE_DAT  = 3,
+	PROT_TYPE_IEP  = 4,
 };
 
 static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
@@ -500,6 +502,9 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
 	switch (code) {
 	case PGM_PROTECTION:
 		switch (prot) {
+		case PROT_TYPE_IEP:
+			tec->b61 = 1;
+			/* FALL THROUGH */
 		case PROT_TYPE_LA:
 			tec->b56 = 1;
 			break;
@@ -591,6 +596,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
  * @gpa: points to where guest physical (absolute) address should be stored
  * @asce: effective asce
  * @mode: indicates the access mode to be used
+ * @prot: returns the type for protection exceptions
  *
  * Translate a guest virtual address into a guest absolute address by means
  * of dynamic address translation as specified by the architecture.
@@ -606,19 +612,21 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
  */
 static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 				     unsigned long *gpa, const union asce asce,
-				     enum gacc_mode mode)
+				     enum gacc_mode mode, enum prot_type *prot)
 {
 	union vaddress vaddr = {.addr = gva};
 	union raddress raddr = {.addr = gva};
 	union page_table_entry pte;
 	int dat_protection = 0;
+	int iep_protection = 0;
 	union ctlreg0 ctlreg0;
 	unsigned long ptr;
-	int edat1, edat2;
+	int edat1, edat2, iep;
 
 	ctlreg0.val = vcpu->arch.sie_block->gcr[0];
 	edat1 = ctlreg0.edat && test_kvm_facility(vcpu->kvm, 8);
 	edat2 = edat1 && test_kvm_facility(vcpu->kvm, 78);
+	iep = ctlreg0.iep && test_kvm_facility(vcpu->kvm, 130);
 	if (asce.r)
 		goto real_address;
 	ptr = asce.origin * 4096;
@@ -702,6 +710,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 			return PGM_TRANSLATION_SPEC;
 		if (rtte.fc && edat2) {
 			dat_protection |= rtte.fc1.p;
+			iep_protection = rtte.fc1.iep;
 			raddr.rfaa = rtte.fc1.rfaa;
 			goto absolute_address;
 		}
@@ -729,6 +738,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 			return PGM_TRANSLATION_SPEC;
 		if (ste.fc && edat1) {
 			dat_protection |= ste.fc1.p;
+			iep_protection = ste.fc1.iep;
 			raddr.sfaa = ste.fc1.sfaa;
 			goto absolute_address;
 		}
@@ -745,12 +755,19 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	if (pte.z)
 		return PGM_TRANSLATION_SPEC;
 	dat_protection |= pte.p;
+	iep_protection = pte.iep;
 	raddr.pfra = pte.pfra;
 real_address:
 	raddr.addr = kvm_s390_real_to_abs(vcpu, raddr.addr);
 absolute_address:
-	if (mode == GACC_STORE && dat_protection)
+	if (mode == GACC_STORE && dat_protection) {
+		*prot = PROT_TYPE_DAT;
 		return PGM_PROTECTION;
+	}
+	if (mode == GACC_IFETCH && iep_protection && iep) {
+		*prot = PROT_TYPE_IEP;
+		return PGM_PROTECTION;
+	}
 	if (kvm_is_error_gpa(vcpu->kvm, raddr.addr))
 		return PGM_ADDRESSING;
 	*gpa = raddr.addr;
@@ -782,6 +799,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 {
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
 	int lap_enabled, rc = 0;
+	enum prot_type prot;
 
 	lap_enabled = low_address_protection_enabled(vcpu, asce);
 	while (nr_pages) {
@@ -791,7 +809,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 					 PROT_TYPE_LA);
 		ga &= PAGE_MASK;
 		if (psw_bits(*psw).t) {
-			rc = guest_translate(vcpu, ga, pages, asce, mode);
+			rc = guest_translate(vcpu, ga, pages, asce, mode, &prot);
 			if (rc < 0)
 				return rc;
 		} else {
@@ -800,7 +818,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 				rc = PGM_ADDRESSING;
 		}
 		if (rc)
-			return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_DAT);
+			return trans_exc(vcpu, rc, ga, ar, mode, prot);
 		ga += PAGE_SIZE;
 		pages++;
 		nr_pages--;
@@ -886,6 +904,7 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
 			    unsigned long *gpa, enum gacc_mode mode)
 {
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
+	enum prot_type prot;
 	union asce asce;
 	int rc;
 
@@ -900,9 +919,9 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
 	}
 
 	if (psw_bits(*psw).t && !asce.r) {	/* Use DAT? */
-		rc = guest_translate(vcpu, gva, gpa, asce, mode);
+		rc = guest_translate(vcpu, gva, gpa, asce, mode, &prot);
 		if (rc > 0)
-			return trans_exc(vcpu, rc, gva, 0, mode, PROT_TYPE_DAT);
+			return trans_exc(vcpu, rc, gva, 0, mode, prot);
 	} else {
 		*gpa = kvm_s390_real_to_abs(vcpu, gva);
 		if (kvm_is_error_gpa(vcpu->kvm, *gpa))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 4/9] KVM: S390: add new group for flic
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (2 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 3/9] KVM: s390: implement instruction execution protection for emulated ifetch Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 5/9] KVM: s390: avoid packed attribute Christian Borntraeger
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, Yi Min Zhao

From: Yi Min Zhao <zyimin@linux.vnet.ibm.com>

In some cases, userspace needs to get or set all ais states for example
migration. So we introduce a new group KVM_DEV_FLIC_AISM_ALL to provide
interfaces to get or set the adapter-interruption-suppression mode for
all ISCs. The corresponding documentation is updated.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virtual/kvm/devices/s390_flic.txt | 15 ++++++++
 arch/s390/include/uapi/asm/kvm.h                |  6 ++++
 arch/s390/kvm/interrupt.c                       | 48 +++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
index c2518ce..2f1cbf1 100644
--- a/Documentation/virtual/kvm/devices/s390_flic.txt
+++ b/Documentation/virtual/kvm/devices/s390_flic.txt
@@ -16,6 +16,7 @@ FLIC provides support to
 - register and modify adapter interrupt sources (KVM_DEV_FLIC_ADAPTER_*)
 - modify AIS (adapter-interruption-suppression) mode state (KVM_DEV_FLIC_AISM)
 - inject adapter interrupts on a specified adapter (KVM_DEV_FLIC_AIRQ_INJECT)
+- get/set all AIS mode states (KVM_DEV_FLIC_AISM_ALL)
 
 Groups:
   KVM_DEV_FLIC_ENQUEUE
@@ -136,6 +137,20 @@ struct kvm_s390_ais_req {
     an isc according to the adapter-interruption-suppression mode on condition
     that the AIS capability is enabled.
 
+  KVM_DEV_FLIC_AISM_ALL
+    Gets or sets the adapter-interruption-suppression mode for all ISCs. Takes
+    a kvm_s390_ais_all describing:
+
+struct kvm_s390_ais_all {
+       __u8 simm; /* Single-Interruption-Mode mask */
+       __u8 nimm; /* No-Interruption-Mode mask *
+};
+
+    simm contains Single-Interruption-Mode mask for all ISCs, nimm contains
+    No-Interruption-Mode mask for all ISCs. Each bit in simm and nimm corresponds
+    to an ISC (MSB0 bit 0 to ISC 0 and so on). The combination of simm bit and
+    nimm bit presents AIS mode for a ISC.
+
 Note: The KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR device ioctls executed on
 FLIC with an unknown group or attribute gives the error code EINVAL (instead of
 ENXIO, as specified in the API documentation). It is not possible to conclude
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index d6879a9..69d09c3 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -28,6 +28,7 @@
 #define KVM_DEV_FLIC_CLEAR_IO_IRQ	8
 #define KVM_DEV_FLIC_AISM		9
 #define KVM_DEV_FLIC_AIRQ_INJECT	10
+#define KVM_DEV_FLIC_AISM_ALL		11
 /*
  * We can have up to 4*64k pending subchannels + 8 adapter interrupts,
  * as well as up  to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts.
@@ -53,6 +54,11 @@ struct kvm_s390_ais_req {
 	__u16 mode;
 };
 
+struct kvm_s390_ais_all {
+	__u8 simm;
+	__u8 nimm;
+};
+
 #define KVM_S390_IO_ADAPTER_MASK 1
 #define KVM_S390_IO_ADAPTER_MAP 2
 #define KVM_S390_IO_ADAPTER_UNMAP 3
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index caf15c8a..72f3aaf 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -1876,6 +1876,28 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
 	return ret < 0 ? ret : n;
 }
 
+static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+	struct kvm_s390_ais_all ais;
+
+	if (attr->attr < sizeof(ais))
+		return -EINVAL;
+
+	if (!test_kvm_facility(kvm, 72))
+		return -ENOTSUPP;
+
+	mutex_lock(&fi->ais_lock);
+	ais.simm = fi->simm;
+	ais.nimm = fi->nimm;
+	mutex_unlock(&fi->ais_lock);
+
+	if (copy_to_user((void __user *)attr->addr, &ais, sizeof(ais)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r;
@@ -1885,6 +1907,9 @@ static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 		r = get_all_floating_irqs(dev->kvm, (u8 __user *) attr->addr,
 					  attr->attr);
 		break;
+	case KVM_DEV_FLIC_AISM_ALL:
+		r = flic_ais_mode_get_all(dev->kvm, attr);
+		break;
 	default:
 		r = -EINVAL;
 	}
@@ -2235,6 +2260,25 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
 	return kvm_s390_inject_airq(kvm, adapter);
 }
 
+static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+	struct kvm_s390_ais_all ais;
+
+	if (!test_kvm_facility(kvm, 72))
+		return -ENOTSUPP;
+
+	if (copy_from_user(&ais, (void __user *)attr->addr, sizeof(ais)))
+		return -EFAULT;
+
+	mutex_lock(&fi->ais_lock);
+	fi->simm = ais.simm;
+	fi->nimm = ais.nimm;
+	mutex_unlock(&fi->ais_lock);
+
+	return 0;
+}
+
 static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r = 0;
@@ -2277,6 +2321,9 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	case KVM_DEV_FLIC_AIRQ_INJECT:
 		r = flic_inject_airq(dev->kvm, attr);
 		break;
+	case KVM_DEV_FLIC_AISM_ALL:
+		r = flic_ais_mode_set_all(dev->kvm, attr);
+		break;
 	default:
 		r = -EINVAL;
 	}
@@ -2298,6 +2345,7 @@ static int flic_has_attr(struct kvm_device *dev,
 	case KVM_DEV_FLIC_CLEAR_IO_IRQ:
 	case KVM_DEV_FLIC_AISM:
 	case KVM_DEV_FLIC_AIRQ_INJECT:
+	case KVM_DEV_FLIC_AISM_ALL:
 		return 0;
 	}
 	return -ENXIO;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 5/9] KVM: s390: avoid packed attribute
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (3 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 4/9] KVM: S390: add new group for flic Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 6/9] s390/nmi: s390: New low level handling for machine check happening in guest Christian Borntraeger
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390,
	Martin Schwidefsky

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

For naturally aligned and sized data structures avoid superfluous
packed and aligned attributes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 18 +++++++++---------
 arch/s390/kvm/gaccess.c          |  4 ++--
 arch/s390/kvm/vsie.c             |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a8cafed..72bad67 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -58,7 +58,7 @@ union bsca_sigp_ctrl {
 		__u8 r : 1;
 		__u8 scn : 6;
 	};
-} __packed;
+};
 
 union esca_sigp_ctrl {
 	__u16 value;
@@ -67,14 +67,14 @@ union esca_sigp_ctrl {
 		__u8 reserved: 7;
 		__u8 scn;
 	};
-} __packed;
+};
 
 struct esca_entry {
 	union esca_sigp_ctrl sigp_ctrl;
 	__u16   reserved1[3];
 	__u64   sda;
 	__u64   reserved2[6];
-} __packed;
+};
 
 struct bsca_entry {
 	__u8	reserved0;
@@ -82,7 +82,7 @@ struct bsca_entry {
 	__u16	reserved[3];
 	__u64	sda;
 	__u64	reserved2[2];
-} __attribute__((packed));
+};
 
 union ipte_control {
 	unsigned long val;
@@ -99,7 +99,7 @@ struct bsca_block {
 	__u64	mcn;
 	__u64	reserved2;
 	struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
-} __attribute__((packed));
+};
 
 struct esca_block {
 	union ipte_control ipte_control;
@@ -107,7 +107,7 @@ struct esca_block {
 	__u64   mcn[4];
 	__u64   reserved2[20];
 	struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
-} __packed;
+};
 
 #define CPUSTAT_STOPPED    0x80000000
 #define CPUSTAT_WAIT       0x10000000
@@ -262,14 +262,14 @@ struct kvm_s390_sie_block {
 
 struct kvm_s390_itdb {
 	__u8	data[256];
-} __packed;
+};
 
 struct sie_page {
 	struct kvm_s390_sie_block sie_block;
 	__u8 reserved200[1024];		/* 0x0200 */
 	struct kvm_s390_itdb itdb;	/* 0x0600 */
 	__u8 reserved700[2304];		/* 0x0700 */
-} __packed;
+};
 
 struct kvm_vcpu_stat {
 	u64 exit_userspace;
@@ -683,7 +683,7 @@ struct sie_page2 {
 	__u64 fac_list[S390_ARCH_FAC_LIST_SIZE_U64];	/* 0x0000 */
 	struct kvm_s390_crypto_cb crycb;		/* 0x0800 */
 	u8 reserved900[0x1000 - 0x900];			/* 0x0900 */
-} __packed;
+};
 
 struct kvm_s390_vsie {
 	struct mutex mutex;
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 6fda095..17e3a4e 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -242,7 +242,7 @@ struct ale {
 	unsigned long asteo  : 25; /* ASN-Second-Table-Entry Origin */
 	unsigned long        : 6;
 	unsigned long astesn : 32; /* ASTE Sequence Number */
-} __packed;
+};
 
 struct aste {
 	unsigned long i      : 1; /* ASX-Invalid Bit */
@@ -258,7 +258,7 @@ struct aste {
 	unsigned long ald    : 32;
 	unsigned long astesn : 32;
 	/* .. more fields there */
-} __packed;
+};
 
 int ipte_lock_held(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 4719ecb..e947657 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -35,7 +35,7 @@ struct vsie_page {
 	__u8 reserved[0x0700 - 0x0218];		/* 0x0218 */
 	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
 	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
-} __packed;
+};
 
 /* trigger a validity icpt for the given scb */
 static int set_validity_icpt(struct kvm_s390_sie_block *scb,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 6/9] s390/nmi: s390: New low level handling for machine check happening in guest
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (4 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 5/9] KVM: s390: avoid packed attribute Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info Christian Borntraeger
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, QingFeng Hao

From: QingFeng Hao <haoqf@linux.vnet.ibm.com>

Add the logic to check if the machine check happens when the guest is
running. If yes, set the exit reason -EINTR in the machine check's
interrupt handler. Refactor s390_do_machine_check to avoid panicing
the host for some kinds of machine checks which happen
when guest is running.
Reinject the instruction processing damage's machine checks including
Delayed Access Exception instead of damaging the host if it happens
in the guest because it could be caused by improper update on TLB entry
or other software case and impacts the guest only.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/nmi.h       |  7 ++++++
 arch/s390/include/asm/processor.h |  2 ++
 arch/s390/kernel/asm-offsets.c    |  3 +++
 arch/s390/kernel/entry.S          | 13 +++++++++-
 arch/s390/kernel/nmi.c            | 50 +++++++++++++++++++++++++++++++--------
 5 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/nmi.h b/arch/s390/include/asm/nmi.h
index e3e8895..13623b9 100644
--- a/arch/s390/include/asm/nmi.h
+++ b/arch/s390/include/asm/nmi.h
@@ -14,7 +14,14 @@
 #include <linux/const.h>
 #include <linux/types.h>
 
+#define MCIC_SUBCLASS_MASK	(1ULL<<63 | 1ULL<<62 | 1ULL<<61 | \
+				1ULL<<59 | 1ULL<<58 | 1ULL<<56 | \
+				1ULL<<55 | 1ULL<<54 | 1ULL<<53 | \
+				1ULL<<52 | 1ULL<<47 | 1ULL<<46 | \
+				1ULL<<45 | 1ULL<<44)
 #define MCCK_CODE_SYSTEM_DAMAGE		_BITUL(63)
+#define MCCK_CODE_EXT_DAMAGE		_BITUL(63 - 5)
+#define MCCK_CODE_CP			_BITUL(63 - 9)
 #define MCCK_CODE_CPU_TIMER_VALID	_BITUL(63 - 46)
 #define MCCK_CODE_PSW_MWP_VALID		_BITUL(63 - 20)
 #define MCCK_CODE_PSW_IA_VALID		_BITUL(63 - 23)
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 60d395f..5b1b247 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -20,6 +20,7 @@
 #define CIF_FPU			4	/* restore FPU registers */
 #define CIF_IGNORE_IRQ		5	/* ignore interrupt (for udelay) */
 #define CIF_ENABLED_WAIT	6	/* in enabled wait state */
+#define CIF_MCCK_GUEST		7	/* machine check happening in guest */
 
 #define _CIF_MCCK_PENDING	_BITUL(CIF_MCCK_PENDING)
 #define _CIF_ASCE_PRIMARY	_BITUL(CIF_ASCE_PRIMARY)
@@ -28,6 +29,7 @@
 #define _CIF_FPU		_BITUL(CIF_FPU)
 #define _CIF_IGNORE_IRQ		_BITUL(CIF_IGNORE_IRQ)
 #define _CIF_ENABLED_WAIT	_BITUL(CIF_ENABLED_WAIT)
+#define _CIF_MCCK_GUEST		_BITUL(CIF_MCCK_GUEST)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index 6bb2963..b65c414 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -58,6 +58,9 @@ int main(void)
 	OFFSET(__SF_BACKCHAIN, stack_frame, back_chain);
 	OFFSET(__SF_GPRS, stack_frame, gprs);
 	OFFSET(__SF_EMPTY, stack_frame, empty1);
+	OFFSET(__SF_SIE_CONTROL, stack_frame, empty1[0]);
+	OFFSET(__SF_SIE_SAVEAREA, stack_frame, empty1[1]);
+	OFFSET(__SF_SIE_REASON, stack_frame, empty1[2]);
 	BLANK();
 	/* timeval/timezone offsets for use by vdso */
 	OFFSET(__VDSO_UPD_COUNT, vdso_data, tb_update_count);
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index a5f5d3b..9b48196 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -225,6 +225,7 @@ ENTRY(sie64a)
 	jnz	.Lsie_skip
 	TSTMSK	__LC_CPU_FLAGS,_CIF_FPU
 	jo	.Lsie_skip			# exit if fp/vx regs changed
+.Lsie_entry:
 	sie	0(%r14)
 .Lsie_skip:
 	ni	__SIE_PROG0C+3(%r14),0xfe	# no longer in SIE
@@ -1102,7 +1103,13 @@ cleanup_critical:
 	.quad	.Lsie_done
 
 .Lcleanup_sie:
-	lg	%r9,__SF_EMPTY(%r15)		# get control block pointer
+	cghi    %r11,__LC_SAVE_AREA_ASYNC 	#Is this in normal interrupt?
+	je      1f
+	slg     %r9,BASED(.Lsie_crit_mcck_start)
+	clg     %r9,BASED(.Lsie_crit_mcck_length)
+	jh      1f
+	oi      __LC_CPU_FLAGS+7, _CIF_MCCK_GUEST
+1:	lg	%r9,__SF_EMPTY(%r15)		# get control block pointer
 	ni	__SIE_PROG0C+3(%r9),0xfe	# no longer in SIE
 	lctlg	%c1,%c1,__LC_USER_ASCE		# load primary asce
 	larl	%r9,sie_exit			# skip forward to sie_exit
@@ -1274,6 +1281,10 @@ cleanup_critical:
 	.quad	.Lsie_gmap
 .Lsie_critical_length:
 	.quad	.Lsie_done - .Lsie_gmap
+.Lsie_crit_mcck_start:
+	.quad   .Lsie_entry
+.Lsie_crit_mcck_length:
+	.quad   .Lsie_skip - .Lsie_entry
 #endif
 
 	.section .rodata, "a"
diff --git a/arch/s390/kernel/nmi.c b/arch/s390/kernel/nmi.c
index 9855895..958cc33 100644
--- a/arch/s390/kernel/nmi.c
+++ b/arch/s390/kernel/nmi.c
@@ -25,6 +25,7 @@
 #include <asm/crw.h>
 #include <asm/switch_to.h>
 #include <asm/ctl_reg.h>
+#include <asm/asm-offsets.h>
 
 struct mcck_struct {
 	unsigned int kill_task : 1;
@@ -280,6 +281,8 @@ static int notrace s390_validate_registers(union mci mci, int umode)
 #define ED_STP_ISLAND	6	/* External damage STP island check */
 #define ED_STP_SYNC	7	/* External damage STP sync check */
 
+#define MCCK_CODE_NO_GUEST	(MCCK_CODE_CP | MCCK_CODE_EXT_DAMAGE)
+
 /*
  * machine check handler.
  */
@@ -291,6 +294,7 @@ void notrace s390_do_machine_check(struct pt_regs *regs)
 	struct mcck_struct *mcck;
 	unsigned long long tmp;
 	union mci mci;
+	unsigned long mcck_dam_code;
 
 	nmi_enter();
 	inc_irq_stat(NMI_NMI);
@@ -301,7 +305,13 @@ void notrace s390_do_machine_check(struct pt_regs *regs)
 		/* System damage -> stopping machine */
 		s390_handle_damage();
 	}
-	if (mci.pd) {
+
+	/*
+	 * Reinject the instruction processing damages' machine checks
+	 * including Delayed Access Exception into the guest
+	 * instead of damaging the host if they happen in the guest.
+	 */
+	if (mci.pd && !test_cpu_flag(CIF_MCCK_GUEST)) {
 		if (mci.b) {
 			/* Processing backup -> verify if we can survive this */
 			u64 z_mcic, o_mcic, t_mcic;
@@ -358,15 +368,22 @@ void notrace s390_do_machine_check(struct pt_regs *regs)
 		if (mcck->stp_queue)
 			set_cpu_flag(CIF_MCCK_PENDING);
 	}
-	if (mci.se)
-		/* Storage error uncorrected */
-		s390_handle_damage();
-	if (mci.ke)
-		/* Storage key-error uncorrected */
-		s390_handle_damage();
-	if (mci.ds && mci.fa)
-		/* Storage degradation */
-		s390_handle_damage();
+
+	/*
+	 * Reinject storage related machine checks into the guest if they
+	 * happen when the guest is running.
+	 */
+	if (!test_cpu_flag(CIF_MCCK_GUEST)) {
+		if (mci.se)
+			/* Storage error uncorrected */
+			s390_handle_damage();
+		if (mci.ke)
+			/* Storage key-error uncorrected */
+			s390_handle_damage();
+		if (mci.ds && mci.fa)
+			/* Storage degradation */
+			s390_handle_damage();
+	}
 	if (mci.cp) {
 		/* Channel report word pending */
 		mcck->channel_report = 1;
@@ -377,6 +394,19 @@ void notrace s390_do_machine_check(struct pt_regs *regs)
 		mcck->warning = 1;
 		set_cpu_flag(CIF_MCCK_PENDING);
 	}
+
+	/*
+	 * If there are only Channel Report Pending and External Damage
+	 * machine checks, they will not be reinjected into the guest
+	 * because they refer to host conditions only.
+	 */
+	mcck_dam_code = (mci.val & MCIC_SUBCLASS_MASK);
+	if (test_cpu_flag(CIF_MCCK_GUEST) &&
+	(mcck_dam_code & MCCK_CODE_NO_GUEST) != mcck_dam_code) {
+		/* Set exit reason code for host's later handling */
+		*((long *)(regs->gprs[15] + __SF_SIE_REASON)) = -EINTR;
+	}
+	clear_cpu_flag(CIF_MCCK_GUEST);
 	nmi_exit();
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (5 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 6/9] s390/nmi: s390: New low level handling for machine check happening in guest Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 18:20   ` David Hildenbrand
  2017-06-28 17:30 ` [GIT PULL 8/9] KVM: s390: Inject machine check into the guest Christian Borntraeger
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, QingFeng Hao

From: QingFeng Hao <haoqf@linux.vnet.ibm.com>

When a machine check happens in the guest, related mcck info (mcic,
external damage code, ...) is stored in the vcpu's lowcore on the host.
Then the machine check handler's low-level part is executed, followed
by the high-level part.

If the high-level part's execution is interrupted by a new machine check
happening on the same vcpu on the host, the mcck info in the lowcore is
overwritten with the new machine check's data.

If the high-level part's execution is scheduled to a different cpu,
the mcck info in the lowcore is uncertain.

Therefore, for both cases, the further reinjection to the guest will use
the wrong data.
Let's backup the mcck info in the lowcore to the sie page
for further reinjection, so that the right data will be used.

Add new member into struct sie_page to store related machine check's
info of mcic, failing storage address and external damage code.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 17 ++++++++++++++++-
 arch/s390/kernel/nmi.c           | 34 ++++++++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c         |  1 +
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 426614a..c6e1d5f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -107,6 +107,20 @@ struct esca_block {
 	struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
 } __packed;
 
+/*
+ * This struct is used to store some machine check info from lowcore
+ * for machine checks that happen while the guest is running.
+ * This info in host's lowcore might be overwritten by a second machine
+ * check from host when host is in the machine check's high-level handling.
+ * The size is 24 bytes.
+ */
+struct mcck_volatile_info {
+	__u64 mcic;
+	__u64 failing_storage_address;
+	__u32 ext_damage_code;
+	__u32 reserved;
+};
+
 #define CPUSTAT_STOPPED    0x80000000
 #define CPUSTAT_WAIT       0x10000000
 #define CPUSTAT_ECALL_PEND 0x08000000
@@ -264,7 +278,8 @@ struct kvm_s390_itdb {
 
 struct sie_page {
 	struct kvm_s390_sie_block sie_block;
-	__u8 reserved200[1024];		/* 0x0200 */
+	struct mcck_volatile_info mcck_info;	/* 0x0200 */
+	__u8 reserved218[1000];		/* 0x0218 */
 	struct kvm_s390_itdb itdb;	/* 0x0600 */
 	__u8 reserved700[2304];		/* 0x0700 */
 } __packed;
diff --git a/arch/s390/kernel/nmi.c b/arch/s390/kernel/nmi.c
index 958cc33..31d03a8 100644
--- a/arch/s390/kernel/nmi.c
+++ b/arch/s390/kernel/nmi.c
@@ -26,6 +26,7 @@
 #include <asm/switch_to.h>
 #include <asm/ctl_reg.h>
 #include <asm/asm-offsets.h>
+#include <linux/kvm_host.h>
 
 struct mcck_struct {
 	unsigned int kill_task : 1;
@@ -275,6 +276,31 @@ static int notrace s390_validate_registers(union mci mci, int umode)
 	return kill_task;
 }
 
+/*
+ * Backup the guest's machine check info to its description block
+ */
+static void notrace s390_backup_mcck_info(struct pt_regs *regs)
+{
+	struct mcck_volatile_info *mcck_backup;
+	struct sie_page *sie_page;
+
+	/* r14 contains the sie block, which was set in sie64a */
+	struct kvm_s390_sie_block *sie_block =
+			(struct kvm_s390_sie_block *) regs->gprs[14];
+
+	if (sie_block == NULL)
+		/* Something's seriously wrong, stop system. */
+		s390_handle_damage();
+
+	sie_page = container_of(sie_block, struct sie_page, sie_block);
+	mcck_backup = &sie_page->mcck_info;
+	mcck_backup->mcic = S390_lowcore.mcck_interruption_code &
+				~(MCCK_CODE_CP | MCCK_CODE_EXT_DAMAGE);
+	mcck_backup->ext_damage_code = S390_lowcore.external_damage_code;
+	mcck_backup->failing_storage_address
+			= S390_lowcore.failing_storage_address;
+}
+
 #define MAX_IPD_COUNT	29
 #define MAX_IPD_TIME	(5 * 60 * USEC_PER_SEC) /* 5 minutes */
 
@@ -355,6 +381,14 @@ void notrace s390_do_machine_check(struct pt_regs *regs)
 		mcck->mcck_code = mci.val;
 		set_cpu_flag(CIF_MCCK_PENDING);
 	}
+
+	/*
+	 * Backup the machine check's info if it happens when the guest
+	 * is running.
+	 */
+	if (test_cpu_flag(CIF_MCCK_GUEST))
+		s390_backup_mcck_info(regs);
+
 	if (mci.cd) {
 		/* Timing facility damage */
 		s390_handle_damage();
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 689ac48..0457e03 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2069,6 +2069,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 	if (!vcpu)
 		goto out;
 
+	BUILD_BUG_ON(sizeof(struct sie_page) != 4096);
 	sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL);
 	if (!sie_page)
 		goto out_free_cpu;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 8/9] KVM: s390: Inject machine check into the guest
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (6 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 17:30 ` [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest Christian Borntraeger
  2017-06-28 20:39 ` [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Paolo Bonzini
  9 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, QingFeng Hao

From: QingFeng Hao <haoqf@linux.vnet.ibm.com>

If the exit flag of SIE indicates that a machine check has happened
during guest's running and needs to be injected, inject it to the guest
accordingly.
But some machine checks, e.g. Channel Report Pending (CRW), refer to
host conditions only (the guest's channel devices are not managed by
the kernel directly) and are therefore not injected into the guest.
External Damage (ED) is also not reinjected into the guest because ETR
conditions are gone in Linux and STP conditions are not enabled in the
guest, and ED contains only these 8 ETR and STP conditions.
In general, instruction-processing damage, system recovery,
storage error, service-processor damage and channel subsystem damage
will be reinjected into the guest, and the remain (System damage,
timing-facility damage, warning, ED and CRW) will be handled on the host.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/nmi.h |  6 ++++++
 arch/s390/kvm/interrupt.c   | 43 ++++++++++++++++++++++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.c    | 12 ++++++++++++
 arch/s390/kvm/kvm-s390.h    |  2 ++
 4 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/nmi.h b/arch/s390/include/asm/nmi.h
index 13623b9..9d91cf3 100644
--- a/arch/s390/include/asm/nmi.h
+++ b/arch/s390/include/asm/nmi.h
@@ -26,6 +26,12 @@
 #define MCCK_CODE_PSW_MWP_VALID		_BITUL(63 - 20)
 #define MCCK_CODE_PSW_IA_VALID		_BITUL(63 - 23)
 
+#define MCCK_CR14_CR_PENDING_SUB_MASK	(1 << 28)
+#define MCCK_CR14_RECOVERY_SUB_MASK	(1 << 27)
+#define MCCK_CR14_DEGRAD_SUB_MASK	(1 << 26)
+#define MCCK_CR14_EXT_DAMAGE_SUB_MASK	(1 << 25)
+#define MCCK_CR14_WARN_SUB_MASK		(1 << 24)
+
 #ifndef __ASSEMBLY__
 
 union mci {
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 72f3aaf..f2c78fc 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -251,8 +251,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
 		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
 	if (psw_mchk_disabled(vcpu))
 		active_mask &= ~IRQ_PEND_MCHK_MASK;
+	/*
+	 * Check both floating and local interrupt's cr14 because
+	 * bit IRQ_PEND_MCHK_REP could be set in both cases.
+	 */
 	if (!(vcpu->arch.sie_block->gcr[14] &
-	      vcpu->kvm->arch.float_int.mchk.cr14))
+	   (vcpu->kvm->arch.float_int.mchk.cr14 |
+	   vcpu->arch.local_int.irq.mchk.cr14)))
 		__clear_bit(IRQ_PEND_MCHK_REP, &active_mask);
 
 	/*
@@ -2463,6 +2468,42 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	return ret;
 }
 
+/*
+ * Inject the machine check to the guest.
+ */
+void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
+				     struct mcck_volatile_info *mcck_info)
+{
+	struct kvm_s390_interrupt_info inti;
+	struct kvm_s390_irq irq;
+	struct kvm_s390_mchk_info *mchk;
+	union mci mci;
+	__u64 cr14 = 0;         /* upper bits are not used */
+
+	mci.val = mcck_info->mcic;
+	if (mci.sr)
+		cr14 |= MCCK_CR14_RECOVERY_SUB_MASK;
+	if (mci.dg)
+		cr14 |= MCCK_CR14_DEGRAD_SUB_MASK;
+	if (mci.w)
+		cr14 |= MCCK_CR14_WARN_SUB_MASK;
+
+	mchk = mci.ck ? &inti.mchk : &irq.u.mchk;
+	mchk->cr14 = cr14;
+	mchk->mcic = mcck_info->mcic;
+	mchk->ext_damage_code = mcck_info->ext_damage_code;
+	mchk->failing_storage_address = mcck_info->failing_storage_address;
+	if (mci.ck) {
+		/* Inject the floating machine check */
+		inti.type = KVM_S390_MCHK;
+		WARN_ON_ONCE(__inject_vm(vcpu->kvm, &inti));
+	} else {
+		/* Inject the machine check to specified vcpu */
+		irq.type = KVM_S390_MCHK;
+		WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &irq));
+	}
+}
+
 int kvm_set_routing_entry(struct kvm *kvm,
 			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 90434760..a0f6b59 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3041,6 +3041,9 @@ static int vcpu_post_run_fault_in_sie(struct kvm_vcpu *vcpu)
 
 static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 {
+	struct mcck_volatile_info *mcck_info;
+	struct sie_page *sie_page;
+
 	VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
 		   vcpu->arch.sie_block->icptcode);
 	trace_kvm_s390_sie_exit(vcpu, vcpu->arch.sie_block->icptcode);
@@ -3051,6 +3054,15 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 	vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
 	vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
 
+	if (exit_reason == -EINTR) {
+		VCPU_EVENT(vcpu, 3, "%s", "machine check");
+		sie_page = container_of(vcpu->arch.sie_block,
+					struct sie_page, sie_block);
+		mcck_info = &sie_page->mcck_info;
+		kvm_s390_reinject_machine_check(vcpu, mcck_info);
+		return 0;
+	}
+
 	if (vcpu->arch.sie_block->icptcode > 0) {
 		int rc = kvm_handle_sie_intercept(vcpu);
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 55f5c84..6fedc8b 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -397,4 +397,6 @@ static inline int kvm_s390_use_sca_entries(void)
 	 */
 	return sclp.has_sigpif;
 }
+void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
+				     struct mcck_volatile_info *mcck_info);
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (7 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 8/9] KVM: s390: Inject machine check into the guest Christian Borntraeger
@ 2017-06-28 17:30 ` Christian Borntraeger
  2017-06-28 18:06   ` David Hildenbrand
  2017-06-28 20:39 ` [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Paolo Bonzini
  9 siblings, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: KVM, Christian Borntraeger, Cornelia Huck, linux-s390, QingFeng Hao

From: QingFeng Hao <haoqf@linux.vnet.ibm.com>

With vsie feature enabled, kvm can support nested guests (guest-3).
So inject machine check to the guest-2 if it happens when the nested
guest is running. And guest-2 will detect the machine check belongs
to guest-3 and reinject it into guest-3.
The host (guest-1) tries to inject the machine check to the picked
destination vcpu if it's a floating machine check.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/vsie.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index e947657..715c19c 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -26,13 +26,18 @@
 
 struct vsie_page {
 	struct kvm_s390_sie_block scb_s;	/* 0x0000 */
+	/*
+	 * the backup info for machine check. ensure it's at
+	 * the same offset as that in struct sie_page!
+	 */
+	struct mcck_volatile_info mcck_info;    /* 0x0200 */
 	/* the pinned originial scb */
-	struct kvm_s390_sie_block *scb_o;	/* 0x0200 */
+	struct kvm_s390_sie_block *scb_o;	/* 0x0218 */
 	/* the shadow gmap in use by the vsie_page */
-	struct gmap *gmap;			/* 0x0208 */
+	struct gmap *gmap;			/* 0x0220 */
 	/* address of the last reported fault to guest2 */
-	unsigned long fault_addr;		/* 0x0210 */
-	__u8 reserved[0x0700 - 0x0218];		/* 0x0218 */
+	unsigned long fault_addr;		/* 0x0228 */
+	__u8 reserved[0x0700 - 0x0230];		/* 0x0230 */
 	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
 	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
 };
@@ -801,6 +806,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 {
 	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
 	struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
+	struct mcck_volatile_info *mcck_info;
+	struct sie_page *sie_page;
 	int rc;
 
 	handle_last_fault(vcpu, vsie_page);
@@ -822,6 +829,14 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	local_irq_enable();
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
+	if (rc == -EINTR) {
+		VCPU_EVENT(vcpu, 3, "%s", "machine check");
+		sie_page = container_of(scb_s, struct sie_page, sie_block);
+		mcck_info = &sie_page->mcck_info;
+		kvm_s390_reinject_machine_check(vcpu, mcck_info);
+		return 0;
+	}
+
 	if (rc > 0)
 		rc = 0; /* we could still have an icpt */
 	else if (rc == -EFAULT)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 17:30 ` [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest Christian Borntraeger
@ 2017-06-28 18:06   ` David Hildenbrand
  2017-06-28 18:59     ` Christian Borntraeger
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-06-28 18:06 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 28.06.2017 19:30, Christian Borntraeger wrote:
> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> 
> With vsie feature enabled, kvm can support nested guests (guest-3).
> So inject machine check to the guest-2 if it happens when the nested
> guest is running. And guest-2 will detect the machine check belongs
> to guest-3 and reinject it into guest-3.
> The host (guest-1) tries to inject the machine check to the picked
> destination vcpu if it's a floating machine check.

The subject is confusing. We don't inject anything into the nested guest
here. We just catch machine checks during vsie and inject it into the
ordinary kvm guest.

Are there any SIE specific things to consider here, that may have to be
translated?

> 
> Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/vsie.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index e947657..715c19c 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -26,13 +26,18 @@
>  
>  struct vsie_page {
>  	struct kvm_s390_sie_block scb_s;	/* 0x0000 */
> +	/*
> +	 * the backup info for machine check. ensure it's at
> +	 * the same offset as that in struct sie_page!
> +	 */
> +	struct mcck_volatile_info mcck_info;    /* 0x0200 */
>  	/* the pinned originial scb */
> -	struct kvm_s390_sie_block *scb_o;	/* 0x0200 */
> +	struct kvm_s390_sie_block *scb_o;	/* 0x0218 */
>  	/* the shadow gmap in use by the vsie_page */
> -	struct gmap *gmap;			/* 0x0208 */
> +	struct gmap *gmap;			/* 0x0220 */
>  	/* address of the last reported fault to guest2 */
> -	unsigned long fault_addr;		/* 0x0210 */
> -	__u8 reserved[0x0700 - 0x0218];		/* 0x0218 */
> +	unsigned long fault_addr;		/* 0x0228 */
> +	__u8 reserved[0x0700 - 0x0230];		/* 0x0230 */
>  	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
>  	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
>  };
> @@ -801,6 +806,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  {
>  	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
>  	struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
> +	struct mcck_volatile_info *mcck_info;
> +	struct sie_page *sie_page;
>  	int rc;
>  
>  	handle_last_fault(vcpu, vsie_page);
> @@ -822,6 +829,14 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  	local_irq_enable();
>  	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> +	if (rc == -EINTR) {
> +		VCPU_EVENT(vcpu, 3, "%s", "machine check");

We directly have a pointer to vsie_page, why do we need to go back from
scb_s?

Am I missing anything important?



> +		sie_page = container_of(scb_s, struct sie_page, sie_block);
> +		mcck_info = &sie_page->mcck_info;
> +		kvm_s390_reinject_machine_check(vcpu, mcck_info);

This could be a simple

kvm_s390_reinject_machine_check(vcpu, &vsie_page->mcck_info);

no?

> +		return 0;
> +	}
> +
>  	if (rc > 0)
>  		rc = 0; /* we could still have an icpt */
>  	else if (rc == -EFAULT)
> 


-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
  2017-06-28 17:30 ` [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info Christian Borntraeger
@ 2017-06-28 18:20   ` David Hildenbrand
  2017-06-28 18:37     ` Christian Borntraeger
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-06-28 18:20 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 28.06.2017 19:30, Christian Borntraeger wrote:
> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> 
> When a machine check happens in the guest, related mcck info (mcic,
> external damage code, ...) is stored in the vcpu's lowcore on the host.
> Then the machine check handler's low-level part is executed, followed
> by the high-level part.
> 
> If the high-level part's execution is interrupted by a new machine check
> happening on the same vcpu on the host, the mcck info in the lowcore is
> overwritten with the new machine check's data.
> 
> If the high-level part's execution is scheduled to a different cpu,
> the mcck info in the lowcore is uncertain.
> 
> Therefore, for both cases, the further reinjection to the guest will use
> the wrong data.
> Let's backup the mcck info in the lowcore to the sie page
> for further reinjection, so that the right data will be used.
> 
> Add new member into struct sie_page to store related machine check's
> info of mcic, failing storage address and external damage code.
> 


When this happens while the guest is running, there will be some
registers written into the low core save area (gprs, cr etc.) during the
machine check. Are these always host registers? Or can these be guest
registers?

Also, do the "valid" flags always refer to guest or host bits?

If they can be guest bits, I think we would have to do more translation.
And most likely treat vSIE special.

Or will something like that always lead to a host crash and real machine
errors will still take the host down, and not the guest?


-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
  2017-06-28 18:20   ` David Hildenbrand
@ 2017-06-28 18:37     ` Christian Borntraeger
  2017-06-28 18:56       ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 18:37 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 06/28/2017 08:20 PM, David Hildenbrand wrote:
> On 28.06.2017 19:30, Christian Borntraeger wrote:
>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>
>> When a machine check happens in the guest, related mcck info (mcic,
>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>> Then the machine check handler's low-level part is executed, followed
>> by the high-level part.
>>
>> If the high-level part's execution is interrupted by a new machine check
>> happening on the same vcpu on the host, the mcck info in the lowcore is
>> overwritten with the new machine check's data.
>>
>> If the high-level part's execution is scheduled to a different cpu,
>> the mcck info in the lowcore is uncertain.
>>
>> Therefore, for both cases, the further reinjection to the guest will use
>> the wrong data.
>> Let's backup the mcck info in the lowcore to the sie page
>> for further reinjection, so that the right data will be used.
>>
>> Add new member into struct sie_page to store related machine check's
>> info of mcic, failing storage address and external damage code.
>>
> 
> 
> When this happens while the guest is running, there will be some
> registers written into the low core save area (gprs, cr etc.) during the
> machine check. Are these always host registers? Or can these be guest
> registers?

Always host registers (the machine will exit SIE before presenting the machine
check).
> 
> Also, do the "valid" flags always refer to guest or host bits?

The host bits.
> 
> If they can be guest bits, I think we would have to do more translation.
> And most likely treat vSIE special.
> 
> Or will something like that always lead to a host crash and real machine
> errors will still take the host down, and not the guest?
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
  2017-06-28 18:37     ` Christian Borntraeger
@ 2017-06-28 18:56       ` David Hildenbrand
  2017-06-28 19:14           ` Christian Borntraeger
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-06-28 18:56 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 28.06.2017 20:37, Christian Borntraeger wrote:
> On 06/28/2017 08:20 PM, David Hildenbrand wrote:
>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>
>>> When a machine check happens in the guest, related mcck info (mcic,
>>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>>> Then the machine check handler's low-level part is executed, followed
>>> by the high-level part.
>>>
>>> If the high-level part's execution is interrupted by a new machine check
>>> happening on the same vcpu on the host, the mcck info in the lowcore is
>>> overwritten with the new machine check's data.
>>>
>>> If the high-level part's execution is scheduled to a different cpu,
>>> the mcck info in the lowcore is uncertain.
>>>
>>> Therefore, for both cases, the further reinjection to the guest will use
>>> the wrong data.
>>> Let's backup the mcck info in the lowcore to the sie page
>>> for further reinjection, so that the right data will be used.
>>>
>>> Add new member into struct sie_page to store related machine check's
>>> info of mcic, failing storage address and external damage code.
>>>
>>
>>
>> When this happens while the guest is running, there will be some
>> registers written into the low core save area (gprs, cr etc.) during the
>> machine check. Are these always host registers? Or can these be guest
>> registers?
> 
> Always host registers (the machine will exit SIE before presenting the machine
> check).
>>
>> Also, do the "valid" flags always refer to guest or host bits?
> 
> The host bits.

Okay, then why is mcic extracted from the real machine check and passed
on to the guest (almost unmodified)?

Wouldn't all guest registers than have to be always valid?

Also, if the guest does not have vector registers enabled, but the host
does, and there is a machine check, the guest would suddenly have the
vector registers valid flag indicated, which is strange.

For ordinary crw machine checks, we properly take care of that (QEMU
build_channel_report_mcic()).


-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 18:06   ` David Hildenbrand
@ 2017-06-28 18:59     ` Christian Borntraeger
  2017-06-28 19:08       ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 18:59 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 06/28/2017 08:06 PM, David Hildenbrand wrote:
> On 28.06.2017 19:30, Christian Borntraeger wrote:
>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>
>> With vsie feature enabled, kvm can support nested guests (guest-3).
>> So inject machine check to the guest-2 if it happens when the nested
>> guest is running. And guest-2 will detect the machine check belongs
>> to guest-3 and reinject it into guest-3.
>> The host (guest-1) tries to inject the machine check to the picked
>> destination vcpu if it's a floating machine check.
> 
> The subject is confusing. We don't inject anything into the nested guest
> here. We just catch machine checks during vsie and inject it into the
> ordinary kvm guest.

Agreed, it is confusing and maybe a leftover from an early rework due to internal
feedback. We inject in guest-2 and rely on guest-2 to reinject to guest-3 would
be a better wording.

 


> Are there any SIE specific things to consider here, that may have to be
> translated?

As HW exits SIE before delivering the machine check, the SIE control block 
contains all saved guest3 registers and the host (guest1) registers contain
the lazy registers (as we have already restored them) just like a normal exit.
> 
>>
>> Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kvm/vsie.c | 23 +++++++++++++++++++----
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>> index e947657..715c19c 100644
>> --- a/arch/s390/kvm/vsie.c
>> +++ b/arch/s390/kvm/vsie.c
>> @@ -26,13 +26,18 @@
>>  
>>  struct vsie_page {
>>  	struct kvm_s390_sie_block scb_s;	/* 0x0000 */
>> +	/*
>> +	 * the backup info for machine check. ensure it's at
>> +	 * the same offset as that in struct sie_page!
>> +	 */
>> +	struct mcck_volatile_info mcck_info;    /* 0x0200 */
>>  	/* the pinned originial scb */
>> -	struct kvm_s390_sie_block *scb_o;	/* 0x0200 */
>> +	struct kvm_s390_sie_block *scb_o;	/* 0x0218 */
>>  	/* the shadow gmap in use by the vsie_page */
>> -	struct gmap *gmap;			/* 0x0208 */
>> +	struct gmap *gmap;			/* 0x0220 */
>>  	/* address of the last reported fault to guest2 */
>> -	unsigned long fault_addr;		/* 0x0210 */
>> -	__u8 reserved[0x0700 - 0x0218];		/* 0x0218 */
>> +	unsigned long fault_addr;		/* 0x0228 */
>> +	__u8 reserved[0x0700 - 0x0230];		/* 0x0230 */
>>  	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
>>  	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
>>  };
>> @@ -801,6 +806,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>  {
>>  	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
>>  	struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
>> +	struct mcck_volatile_info *mcck_info;
>> +	struct sie_page *sie_page;
>>  	int rc;
>>  
>>  	handle_last_fault(vcpu, vsie_page);
>> @@ -822,6 +829,14 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>  	local_irq_enable();
>>  	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
>>  
>> +	if (rc == -EINTR) {
>> +		VCPU_EVENT(vcpu, 3, "%s", "machine check");
> 
> We directly have a pointer to vsie_page, why do we need to go back from
> scb_s?

> 
> Am I missing anything important?
> 
> 
> 
>> +		sie_page = container_of(scb_s, struct sie_page, sie_block);
>> +		mcck_info = &sie_page->mcck_info;
>> +		kvm_s390_reinject_machine_check(vcpu, mcck_info);
> 
> This could be a simple
> 
> kvm_s390_reinject_machine_check(vcpu, &vsie_page->mcck_info);
> 
> no?

Yes that would be simpler, I guess. Looks like  is just a "do it like the low
level handler". The code in nmi.c has to go back from the sie control block into
SIE page. 

Do you want a respin of this patch?

> 
>> +		return 0;
>> +	}
>> +
>>  	if (rc > 0)
>>  		rc = 0; /* we could still have an icpt */
>>  	else if (rc == -EFAULT)
>>
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 18:59     ` Christian Borntraeger
@ 2017-06-28 19:08       ` David Hildenbrand
  2017-06-28 19:16         ` Christian Borntraeger
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-06-28 19:08 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 28.06.2017 20:59, Christian Borntraeger wrote:
> On 06/28/2017 08:06 PM, David Hildenbrand wrote:
>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>
>>> With vsie feature enabled, kvm can support nested guests (guest-3).
>>> So inject machine check to the guest-2 if it happens when the nested
>>> guest is running. And guest-2 will detect the machine check belongs
>>> to guest-3 and reinject it into guest-3.
>>> The host (guest-1) tries to inject the machine check to the picked
>>> destination vcpu if it's a floating machine check.
>>
>> The subject is confusing. We don't inject anything into the nested guest
>> here. We just catch machine checks during vsie and inject it into the
>> ordinary kvm guest.
> 
> Agreed, it is confusing and maybe a leftover from an early rework due to internal
> feedback. We inject in guest-2 and rely on guest-2 to reinject to guest-3 would
> be a better wording.
> 
>  
> 
> 
>> Are there any SIE specific things to consider here, that may have to be
>> translated?
> 
> As HW exits SIE before delivering the machine check, the SIE control block 
> contains all saved guest3 registers and the host (guest1) registers contain
> the lazy registers (as we have already restored them) just like a normal exit.

As mentioned in the other mail, e.g. vector register validity should
only be set if vector registers are enabled for the nested guest
(execution control enabled).

>>
>>> +		sie_page = container_of(scb_s, struct sie_page, sie_block);
>>> +		mcck_info = &sie_page->mcck_info;
>>> +		kvm_s390_reinject_machine_check(vcpu, mcck_info);
>>
>> This could be a simple
>>
>> kvm_s390_reinject_machine_check(vcpu, &vsie_page->mcck_info);
>>
>> no?
> 
> Yes that would be simpler, I guess. Looks like  is just a "do it like the low
> level handler". The code in nmi.c has to go back from the sie control block into
> SIE page. 
> 
> Do you want a respin of this patch?

You can also send a fixup if you don't have to respin.


-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
  2017-06-28 18:56       ` David Hildenbrand
@ 2017-06-28 19:14           ` Christian Borntraeger
  0 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 19:14 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 06/28/2017 08:56 PM, David Hildenbrand wrote:
> On 28.06.2017 20:37, Christian Borntraeger wrote:
>> On 06/28/2017 08:20 PM, David Hildenbrand wrote:
>>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>>
>>>> When a machine check happens in the guest, related mcck info (mcic,
>>>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>>>> Then the machine check handler's low-level part is executed, followed
>>>> by the high-level part.
>>>>
>>>> If the high-level part's execution is interrupted by a new machine check
>>>> happening on the same vcpu on the host, the mcck info in the lowcore is
>>>> overwritten with the new machine check's data.
>>>>
>>>> If the high-level part's execution is scheduled to a different cpu,
>>>> the mcck info in the lowcore is uncertain.
>>>>
>>>> Therefore, for both cases, the further reinjection to the guest will use
>>>> the wrong data.
>>>> Let's backup the mcck info in the lowcore to the sie page
>>>> for further reinjection, so that the right data will be used.
>>>>
>>>> Add new member into struct sie_page to store related machine check's
>>>> info of mcic, failing storage address and external damage code.
>>>>
>>>
>>>
>>> When this happens while the guest is running, there will be some
>>> registers written into the low core save area (gprs, cr etc.) during the
>>> machine check. Are these always host registers? Or can these be guest
>>> registers?
>>
>> Always host registers (the machine will exit SIE before presenting the machine
>> check).
>>>
>>> Also, do the "valid" flags always refer to guest or host bits?
>>
>> The host bits.
> 
> Okay, then why is mcic extracted from the real machine check and passed
> on to the guest (almost unmodified)?
> 
> Wouldn't all guest registers than have to be always valid?

No. Several guest registers are shared with host registers and reloaded lazily.
For example if the floating point registers are invalid, they will be invalid 
for the host (but the registers contain the guest values).
From a KVM perspective we can always turn off valid bits in case of doubts 
(overindicate errors). So the current variant is certainly not perfect, but it
at least will not claim correct registers when they are incorrect.
We can certainly improve the handling in a future patch. (e.g. turn on validity
for control registers), I will have a look.

> 
> Also, if the guest does not have vector registers enabled, but the host
> does, and there is a machine check, the guest would suddenly have the
> vector registers valid flag indicated, which is strange.

Yes, this should be masked away. Will provide a fixup patch.
> 
> For ordinary crw machine checks, we properly take care of that (QEMU
> build_channel_report_mcic()).
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
@ 2017-06-28 19:14           ` Christian Borntraeger
  0 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 19:14 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 06/28/2017 08:56 PM, David Hildenbrand wrote:
> On 28.06.2017 20:37, Christian Borntraeger wrote:
>> On 06/28/2017 08:20 PM, David Hildenbrand wrote:
>>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>>
>>>> When a machine check happens in the guest, related mcck info (mcic,
>>>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>>>> Then the machine check handler's low-level part is executed, followed
>>>> by the high-level part.
>>>>
>>>> If the high-level part's execution is interrupted by a new machine check
>>>> happening on the same vcpu on the host, the mcck info in the lowcore is
>>>> overwritten with the new machine check's data.
>>>>
>>>> If the high-level part's execution is scheduled to a different cpu,
>>>> the mcck info in the lowcore is uncertain.
>>>>
>>>> Therefore, for both cases, the further reinjection to the guest will use
>>>> the wrong data.
>>>> Let's backup the mcck info in the lowcore to the sie page
>>>> for further reinjection, so that the right data will be used.
>>>>
>>>> Add new member into struct sie_page to store related machine check's
>>>> info of mcic, failing storage address and external damage code.
>>>>
>>>
>>>
>>> When this happens while the guest is running, there will be some
>>> registers written into the low core save area (gprs, cr etc.) during the
>>> machine check. Are these always host registers? Or can these be guest
>>> registers?
>>
>> Always host registers (the machine will exit SIE before presenting the machine
>> check).
>>>
>>> Also, do the "valid" flags always refer to guest or host bits?
>>
>> The host bits.
> 
> Okay, then why is mcic extracted from the real machine check and passed
> on to the guest (almost unmodified)?
> 
> Wouldn't all guest registers than have to be always valid?

No. Several guest registers are shared with host registers and reloaded lazily.
For example if the floating point registers are invalid, they will be invalid 
for the host (but the registers contain the guest values).
>From a KVM perspective we can always turn off valid bits in case of doubts 
(overindicate errors). So the current variant is certainly not perfect, but it
at least will not claim correct registers when they are incorrect.
We can certainly improve the handling in a future patch. (e.g. turn on validity
for control registers), I will have a look.

> 
> Also, if the guest does not have vector registers enabled, but the host
> does, and there is a machine check, the guest would suddenly have the
> vector registers valid flag indicated, which is strange.

Yes, this should be masked away. Will provide a fixup patch.
> 
> For ordinary crw machine checks, we properly take care of that (QEMU
> build_channel_report_mcic()).
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 19:08       ` David Hildenbrand
@ 2017-06-28 19:16         ` Christian Borntraeger
  2017-06-28 19:17           ` David Hildenbrand
  2017-06-29 14:57           ` Christian Borntraeger
  0 siblings, 2 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-28 19:16 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 06/28/2017 09:08 PM, David Hildenbrand wrote:
> On 28.06.2017 20:59, Christian Borntraeger wrote:
>> On 06/28/2017 08:06 PM, David Hildenbrand wrote:
>>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>>
>>>> With vsie feature enabled, kvm can support nested guests (guest-3).
>>>> So inject machine check to the guest-2 if it happens when the nested
>>>> guest is running. And guest-2 will detect the machine check belongs
>>>> to guest-3 and reinject it into guest-3.
>>>> The host (guest-1) tries to inject the machine check to the picked
>>>> destination vcpu if it's a floating machine check.
>>>
>>> The subject is confusing. We don't inject anything into the nested guest
>>> here. We just catch machine checks during vsie and inject it into the
>>> ordinary kvm guest.
>>
>> Agreed, it is confusing and maybe a leftover from an early rework due to internal
>> feedback. We inject in guest-2 and rely on guest-2 to reinject to guest-3 would
>> be a better wording.
>>
>>  
>>
>>
>>> Are there any SIE specific things to consider here, that may have to be
>>> translated?
>>
>> As HW exits SIE before delivering the machine check, the SIE control block 
>> contains all saved guest3 registers and the host (guest1) registers contain
>> the lazy registers (as we have already restored them) just like a normal exit.
> 
> As mentioned in the other mail, e.g. vector register validity should
> only be set if vector registers are enabled for the nested guest
> (execution control enabled).

Yes, but sInce we inject in the base guest (non-nested) we have to check the base guest
execution control. 

> 
>>>
>>>> +		sie_page = container_of(scb_s, struct sie_page, sie_block);
>>>> +		mcck_info = &sie_page->mcck_info;
>>>> +		kvm_s390_reinject_machine_check(vcpu, mcck_info);
>>>
>>> This could be a simple
>>>
>>> kvm_s390_reinject_machine_check(vcpu, &vsie_page->mcck_info);
>>>
>>> no?
>>
>> Yes that would be simpler, I guess. Looks like  is just a "do it like the low
>> level handler". The code in nmi.c has to go back from the sie control block into
>> SIE page. 
>>
>> Do you want a respin of this patch?
> 
> You can also send a fixup if you don't have to respin.
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 19:16         ` Christian Borntraeger
@ 2017-06-28 19:17           ` David Hildenbrand
  2017-06-29 14:57           ` Christian Borntraeger
  1 sibling, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2017-06-28 19:17 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao


>> As mentioned in the other mail, e.g. vector register validity should
>> only be set if vector registers are enabled for the nested guest
>> (execution control enabled).
> 
> Yes, but sInce we inject in the base guest (non-nested) we have to check the base guest
> execution control. 
> 

Right, confusion due to nesting levels strikes again :)

-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info
  2017-06-28 19:14           ` Christian Borntraeger
  (?)
@ 2017-06-28 19:46           ` David Hildenbrand
  -1 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2017-06-28 19:46 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

> No. Several guest registers are shared with host registers and reloaded lazily.
> For example if the floating point registers are invalid, they will be invalid 
> for the host (but the registers contain the guest values).
> From a KVM perspective we can always turn off valid bits in case of doubts 
> (overindicate errors). So the current variant is certainly not perfect, but it
> at least will not claim correct registers when they are incorrect.
> We can certainly improve the handling in a future patch. (e.g. turn on validity
> for control registers), I will have a look.
> 

As far as I understand, that would then mean, enabling all validity
flags for registers that are not shared. Makes sense.


-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next)
  2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
                   ` (8 preceding siblings ...)
  2017-06-28 17:30 ` [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest Christian Borntraeger
@ 2017-06-28 20:39 ` Paolo Bonzini
  9 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2017-06-28 20:39 UTC (permalink / raw)
  To: Christian Borntraeger, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390



On 28/06/2017 19:30, Christian Borntraeger wrote:
> Paolo, Radim,
> 
> the first chunk of 4.13 related changes for s390.
> 
> There are two conflicts
> 1. wit the kvm-arm tree due to 2387149eade2 ("KVM: improve arch vcpu
> request defining") which will be resolved in the kvm tree, I assume
> 2. with the s390 tree due to the s390-wide cleanup  a75259825401
> ("s390: rename struct psw_bits members"). This is trivial. I guess
> this must be resolved by Linus.
> 
> It also contains a merge of a topic branch for machine check handling.
> This branch is also merged in the s390 tree.
> 
> The following changes since commit 865279c53ca9d88718d974bb014b2c6ce259ac75:
> 
>   tools/kvm_stat: display guest list in pid/guest selection screens (2017-06-08 18:24:49 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  tags/kvm-s390-next-4.13-1
> 
> for you to fetch changes up to d52cd2076eb2ace9fe95dbf795f6d93587453914:
> 
>   KVM: s390: Inject machine check into the nested guest (2017-06-28 12:42:33 +0200)
> 
> ----------------------------------------------------------------
> KVM: s390: fixes and features for 4.13
> 
> - initial machine check forwarding
> - migration support for the CMMA page hinting information
> - cleanups
> - fixes
> 
> ----------------------------------------------------------------
> Christian Borntraeger (2):
>       KVM: s390: implement instruction execution protection for emulated     ifetch
>       Merge tag 'nmiforkvm' of git://git.kernel.org/.../kvms390/linux into kernelorgnext
> 
> Claudio Imbrenda (2):
>       KVM: s390: CMMA tracking, ESSA emulation, migration mode
>       KVM: s390: ioctls to get and set guest storage attributes
> 
> Martin Schwidefsky (1):
>       KVM: s390: avoid packed attribute
> 
> QingFeng Hao (4):
>       s390/nmi: s390: New low level handling for machine check happening in guest
>       KVM: s390: Backup the guest's machine check info
>       KVM: s390: Inject machine check into the guest
>       KVM: s390: Inject machine check into the nested guest
> 
> Yi Min Zhao (1):
>       KVM: S390: add new group for flic
> 
>  Documentation/virtual/kvm/api.txt               | 135 +++++++++
>  Documentation/virtual/kvm/devices/s390_flic.txt |  15 +
>  Documentation/virtual/kvm/devices/vm.txt        |  33 +++
>  arch/s390/include/asm/ctl_reg.h                 |   4 +-
>  arch/s390/include/asm/kvm_host.h                |  44 ++-
>  arch/s390/include/asm/nmi.h                     |  13 +
>  arch/s390/include/asm/processor.h               |   2 +
>  arch/s390/include/uapi/asm/kvm.h                |  12 +
>  arch/s390/kernel/asm-offsets.c                  |   3 +
>  arch/s390/kernel/entry.S                        |  13 +-
>  arch/s390/kernel/nmi.c                          |  84 +++++-
>  arch/s390/kvm/gaccess.c                         |  43 ++-
>  arch/s390/kvm/interrupt.c                       |  91 +++++-
>  arch/s390/kvm/kvm-s390.c                        | 372 +++++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h                        |   2 +
>  arch/s390/kvm/priv.c                            | 103 ++++++-
>  arch/s390/kvm/vsie.c                            |  25 +-
>  include/uapi/linux/kvm.h                        |  33 +++
>  18 files changed, 981 insertions(+), 46 deletions(-)
> 

Pulled, thanks (and thanks David for the reviews!).

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-28 19:16         ` Christian Borntraeger
  2017-06-28 19:17           ` David Hildenbrand
@ 2017-06-29 14:57           ` Christian Borntraeger
  2017-06-30 16:30             ` David Hildenbrand
  1 sibling, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2017-06-29 14:57 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 06/28/2017 09:16 PM, Christian Borntraeger wrote:
> On 06/28/2017 09:08 PM, David Hildenbrand wrote:
>> On 28.06.2017 20:59, Christian Borntraeger wrote:
>>> On 06/28/2017 08:06 PM, David Hildenbrand wrote:
>>>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>>>
>>>>> With vsie feature enabled, kvm can support nested guests (guest-3).
>>>>> So inject machine check to the guest-2 if it happens when the nested
>>>>> guest is running. And guest-2 will detect the machine check belongs
>>>>> to guest-3 and reinject it into guest-3.
>>>>> The host (guest-1) tries to inject the machine check to the picked
>>>>> destination vcpu if it's a floating machine check.
>>>>
>>>> The subject is confusing. We don't inject anything into the nested guest
>>>> here. We just catch machine checks during vsie and inject it into the
>>>> ordinary kvm guest.
>>>
>>> Agreed, it is confusing and maybe a leftover from an early rework due to internal
>>> feedback. We inject in guest-2 and rely on guest-2 to reinject to guest-3 would
>>> be a better wording.
>>>
>>>  
>>>
>>>
>>>> Are there any SIE specific things to consider here, that may have to be
>>>> translated?
>>>
>>> As HW exits SIE before delivering the machine check, the SIE control block 
>>> contains all saved guest3 registers and the host (guest1) registers contain
>>> the lazy registers (as we have already restored them) just like a normal exit.
>>
>> As mentioned in the other mail, e.g. vector register validity should
>> only be set if vector registers are enabled for the nested guest
>> (execution control enabled).
> 
> Yes, but sInce we inject in the base guest (non-nested) we have to check the base guest
> execution control. 

While we drag the vector validity along, it looks like __write_machine_check is already
doing the right thing for vector and guarded storage before writing the mcic to the
guest, no?

[...]
        if (!rc && mci.vr && ext_sa_addr && test_kvm_facility(vcpu->kvm, 129)) {
                if (write_guest_abs(vcpu, ext_sa_addr, vcpu->run->s.regs.vrs,
                                    512))
                        mci.vr = 0;
        } else {
                mci.vr = 0;
        }
        if (!rc && mci.gs && ext_sa_addr && test_kvm_facility(vcpu->kvm, 133)
            && (lc == 11 || lc == 12)) {
                if (write_guest_abs(vcpu, ext_sa_addr + 1024,
                                    &vcpu->run->s.regs.gscb, 32))
                        mci.gs = 0;
        } else {
                mci.gs = 0;
        }
[...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest
  2017-06-29 14:57           ` Christian Borntraeger
@ 2017-06-30 16:30             ` David Hildenbrand
  0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2017-06-30 16:30 UTC (permalink / raw)
  To: Christian Borntraeger, Paolo Bonzini, Radim Krčmář
  Cc: KVM, Cornelia Huck, linux-s390, QingFeng Hao

On 29.06.2017 16:57, Christian Borntraeger wrote:
> On 06/28/2017 09:16 PM, Christian Borntraeger wrote:
>> On 06/28/2017 09:08 PM, David Hildenbrand wrote:
>>> On 28.06.2017 20:59, Christian Borntraeger wrote:
>>>> On 06/28/2017 08:06 PM, David Hildenbrand wrote:
>>>>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>>>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>>>>
>>>>>> With vsie feature enabled, kvm can support nested guests (guest-3).
>>>>>> So inject machine check to the guest-2 if it happens when the nested
>>>>>> guest is running. And guest-2 will detect the machine check belongs
>>>>>> to guest-3 and reinject it into guest-3.
>>>>>> The host (guest-1) tries to inject the machine check to the picked
>>>>>> destination vcpu if it's a floating machine check.
>>>>>
>>>>> The subject is confusing. We don't inject anything into the nested guest
>>>>> here. We just catch machine checks during vsie and inject it into the
>>>>> ordinary kvm guest.
>>>>
>>>> Agreed, it is confusing and maybe a leftover from an early rework due to internal
>>>> feedback. We inject in guest-2 and rely on guest-2 to reinject to guest-3 would
>>>> be a better wording.
>>>>
>>>>  
>>>>
>>>>
>>>>> Are there any SIE specific things to consider here, that may have to be
>>>>> translated?
>>>>
>>>> As HW exits SIE before delivering the machine check, the SIE control block 
>>>> contains all saved guest3 registers and the host (guest1) registers contain
>>>> the lazy registers (as we have already restored them) just like a normal exit.
>>>
>>> As mentioned in the other mail, e.g. vector register validity should
>>> only be set if vector registers are enabled for the nested guest
>>> (execution control enabled).
>>
>> Yes, but sInce we inject in the base guest (non-nested) we have to check the base guest
>> execution control. 
> 
> While we drag the vector validity along, it looks like __write_machine_check is already
> doing the right thing for vector and guarded storage before writing the mcic to the
> guest, no?

(sorry for the delay, I'm currently visiting a rural part of Slovakia)

Yes, perfect. So the only thing that could actually happen is an under
indication, which can be easily added later.

Thanks!


-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-06-30 16:30 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28 17:30 [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 1/9] KVM: s390: CMMA tracking, ESSA emulation, migration mode Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 2/9] KVM: s390: ioctls to get and set guest storage attributes Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 3/9] KVM: s390: implement instruction execution protection for emulated ifetch Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 4/9] KVM: S390: add new group for flic Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 5/9] KVM: s390: avoid packed attribute Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 6/9] s390/nmi: s390: New low level handling for machine check happening in guest Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 7/9] KVM: s390: Backup the guest's machine check info Christian Borntraeger
2017-06-28 18:20   ` David Hildenbrand
2017-06-28 18:37     ` Christian Borntraeger
2017-06-28 18:56       ` David Hildenbrand
2017-06-28 19:14         ` Christian Borntraeger
2017-06-28 19:14           ` Christian Borntraeger
2017-06-28 19:46           ` David Hildenbrand
2017-06-28 17:30 ` [GIT PULL 8/9] KVM: s390: Inject machine check into the guest Christian Borntraeger
2017-06-28 17:30 ` [GIT PULL 9/9] KVM: s390: Inject machine check into the nested guest Christian Borntraeger
2017-06-28 18:06   ` David Hildenbrand
2017-06-28 18:59     ` Christian Borntraeger
2017-06-28 19:08       ` David Hildenbrand
2017-06-28 19:16         ` Christian Borntraeger
2017-06-28 19:17           ` David Hildenbrand
2017-06-29 14:57           ` Christian Borntraeger
2017-06-30 16:30             ` David Hildenbrand
2017-06-28 20:39 ` [GIT PULL 0/9] KVM: s390: fixes and features for 4.13 (via kvm/next) Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.