* [PATCHv5 0/7] aync page fault support for s390 (plus flic)
@ 2013-10-08 14:54 Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 1/7] KVM: s390: add and extend interrupt information data structs Christian Borntraeger
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

Gleb, Paolo,

here is a set of patches containing the floating interrupt controller,
with the async page fault patches on top of it. (It took a bit
longer than expected.)

Several changes since v4 of the async patches, mostly to make live
migration possible:
- Please have a look at patch 5. This one is new: We need a way to
  let all workers finish and inject the completion interrupt.
  Otherwise guest processes will stall forever. (There is no wake-all
  in the existing guest interface.)
- pfault is moved into flic. The reason is simple: the completion
  interrupts are floating, therefore we must serialize the flushing of
  pfault and the creation of the completion interrupts with migrating
  all floating interrupts.
- prepare pfault context for migration
- bugfixes
Everything else should look similar to what you already know.

Changes since v3 of flic
- new device number
- no anonymous union
- fixed sparse warning
- get rid of KVM_S390_INT_MAX, which was a leftover from an earlier version
This is what Alex approved (with known todos/cleanups for future patches)



Let me know if you still have concerns; otherwise, feel free to pull
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/flic_pfault

Christian

Dominik Dingel (5):
  KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
  KVM: async_pf: Provide additional direct page notification
  KVM: async_pf: Allow to wait for outstanding work
  KVM: async_pf:  Async page fault support on s390
  KVM: async_pf: Exploit one reg interface for pfault

Jens Freimann (2):
  KVM: s390: add and extend interrupt information data structs
  KVM: s390: add floating irq controller

 Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
 arch/s390/include/asm/kvm_host.h                |  57 ++--
 arch/s390/include/asm/pgtable.h                 |   2 +
 arch/s390/include/asm/processor.h               |   1 +
 arch/s390/include/uapi/asm/kvm.h                |  11 +
 arch/s390/kvm/Kconfig                           |   2 +
 arch/s390/kvm/Makefile                          |   2 +-
 arch/s390/kvm/diag.c                            |  84 ++++++
 arch/s390/kvm/interrupt.c                       | 364 ++++++++++++++++++++----
 arch/s390/kvm/kvm-s390.c                        | 134 ++++++++-
 arch/s390/kvm/kvm-s390.h                        |   4 +
 arch/s390/kvm/sigp.c                            |   7 +
 arch/s390/kvm/trace.h                           |  46 +++
 arch/s390/mm/fault.c                            |  26 +-
 arch/x86/kvm/mmu.c                              |   2 +-
 arch/x86/kvm/x86.c                              |   8 +-
 include/linux/kvm_host.h                        |   5 +-
 include/uapi/linux/kvm.h                        |  66 +++++
 virt/kvm/Kconfig                                |   4 +
 virt/kvm/async_pf.c                             |  28 +-
 virt/kvm/kvm_main.c                             |   5 +
 21 files changed, 782 insertions(+), 112 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt

-- 
1.8.3.1


* [Patchv5 1/7] KVM: s390: add and extend interrupt information data structs
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
@ 2013-10-08 14:54 ` Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 2/7] KVM: s390: add floating irq controller Christian Borntraeger
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Jens Freimann, Christian Borntraeger

From: Jens Freimann <jfrei@linux.vnet.ibm.com>

With the currently available struct kvm_s390_interrupt it is not possible to
inject every kind of interrupt defined in the z/Architecture. Add
additional interruption parameters to the structures and move them to kvm.h.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 34 +---------------------
 include/uapi/linux/kvm.h         | 63 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 33 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 6a0e27b..78b6918 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/hrtimer.h>
 #include <linux/interrupt.h>
 #include <linux/kvm_host.h>
+#include <linux/kvm.h>
 #include <asm/debug.h>
 #include <asm/cpu.h>
 
@@ -162,18 +163,6 @@ struct kvm_vcpu_stat {
 	u32 diagnose_9c;
 };
 
-struct kvm_s390_io_info {
-	__u16        subchannel_id;            /* 0x0b8 */
-	__u16        subchannel_nr;            /* 0x0ba */
-	__u32        io_int_parm;              /* 0x0bc */
-	__u32        io_int_word;              /* 0x0c0 */
-};
-
-struct kvm_s390_ext_info {
-	__u32 ext_params;
-	__u64 ext_params2;
-};
-
 #define PGM_OPERATION            0x01
 #define PGM_PRIVILEGED_OP	 0x02
 #define PGM_EXECUTE              0x03
@@ -182,27 +171,6 @@ struct kvm_s390_ext_info {
 #define PGM_SPECIFICATION        0x06
 #define PGM_DATA                 0x07
 
-struct kvm_s390_pgm_info {
-	__u16 code;
-};
-
-struct kvm_s390_prefix_info {
-	__u32 address;
-};
-
-struct kvm_s390_extcall_info {
-	__u16 code;
-};
-
-struct kvm_s390_emerg_info {
-	__u16 code;
-};
-
-struct kvm_s390_mchk_info {
-	__u64 cr14;
-	__u64 mcic;
-};
-
 struct kvm_s390_interrupt_info {
 	struct list_head list;
 	u64	type;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 99c2533..450fae8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -434,6 +434,69 @@ struct kvm_s390_interrupt {
 	__u64 parm64;
 };
 
+struct kvm_s390_io_info {
+	__u16 subchannel_id;
+	__u16 subchannel_nr;
+	__u32 io_int_parm;
+	__u32 io_int_word;
+};
+
+struct kvm_s390_ext_info {
+	__u32 ext_params;
+	__u32 pad;
+	__u64 ext_params2;
+};
+
+struct kvm_s390_pgm_info {
+	__u64 trans_exc_code;
+	__u64 mon_code;
+	__u64 per_address;
+	__u32 data_exc_code;
+	__u16 code;
+	__u16 mon_class_nr;
+	__u8 per_code;
+	__u8 per_atmid;
+	__u8 exc_access_id;
+	__u8 per_access_id;
+	__u8 op_access_id;
+	__u8 pad[3];
+};
+
+struct kvm_s390_prefix_info {
+	__u32 address;
+};
+
+struct kvm_s390_extcall_info {
+	__u16 code;
+};
+
+struct kvm_s390_emerg_info {
+	__u16 code;
+};
+
+struct kvm_s390_mchk_info {
+	__u64 cr14;
+	__u64 mcic;
+	__u64 failing_storage_address;
+	__u32 ext_damage_code;
+	__u32 pad;
+	__u8 fixed_logout[16];
+};
+
+struct kvm_s390_irq {
+	__u64 type;
+	union {
+		struct kvm_s390_io_info io;
+		struct kvm_s390_ext_info ext;
+		struct kvm_s390_pgm_info pgm;
+		struct kvm_s390_emerg_info emerg;
+		struct kvm_s390_extcall_info extcall;
+		struct kvm_s390_prefix_info prefix;
+		struct kvm_s390_mchk_info mchk;
+		char reserved[64];
+	} u;
+};
+
 /* for KVM_SET_GUEST_DEBUG */
 
 #define KVM_GUESTDBG_ENABLE		0x00000001
-- 
1.8.3.1
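
The layout of the new struct kvm_s390_irq can be checked in user space.
The following is only a sketch: the struct definitions are copied from the
hunk above with <stdint.h> types standing in for __u8/__u16/__u32/__u64,
and make_ext_irq() is a hypothetical helper that is not part of the patch.
The explicit pad members presumably exist so that the layout does not
depend on compiler-inserted padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Local copies of the structures this patch adds to
 * include/uapi/linux/kvm.h (fixed-width types substituted). */
struct kvm_s390_io_info {
	uint16_t subchannel_id;
	uint16_t subchannel_nr;
	uint32_t io_int_parm;
	uint32_t io_int_word;
};

struct kvm_s390_ext_info {
	uint32_t ext_params;
	uint32_t pad;
	uint64_t ext_params2;
};

struct kvm_s390_pgm_info {
	uint64_t trans_exc_code;
	uint64_t mon_code;
	uint64_t per_address;
	uint32_t data_exc_code;
	uint16_t code;
	uint16_t mon_class_nr;
	uint8_t per_code;
	uint8_t per_atmid;
	uint8_t exc_access_id;
	uint8_t per_access_id;
	uint8_t op_access_id;
	uint8_t pad[3];
};

struct kvm_s390_prefix_info { uint32_t address; };
struct kvm_s390_extcall_info { uint16_t code; };
struct kvm_s390_emerg_info { uint16_t code; };

struct kvm_s390_mchk_info {
	uint64_t cr14;
	uint64_t mcic;
	uint64_t failing_storage_address;
	uint32_t ext_damage_code;
	uint32_t pad;
	uint8_t fixed_logout[16];
};

struct kvm_s390_irq {
	uint64_t type;
	union {
		struct kvm_s390_io_info io;
		struct kvm_s390_ext_info ext;
		struct kvm_s390_pgm_info pgm;
		struct kvm_s390_emerg_info emerg;
		struct kvm_s390_extcall_info extcall;
		struct kvm_s390_prefix_info prefix;
		struct kvm_s390_mchk_info mchk;
		char reserved[64];
	} u;
};

/* Hypothetical helper: build an external-class interrupt description.
 * The caller supplies the type value; no KVM constants are assumed. */
static struct kvm_s390_irq make_ext_irq(uint64_t type, uint32_t parm,
					uint64_t parm64)
{
	struct kvm_s390_irq irq;

	memset(&irq, 0, sizeof(irq));	/* also clears pad and reserved bytes */
	irq.type = type;
	irq.u.ext.ext_params = parm;
	irq.u.ext.ext_params2 = parm64;
	return irq;
}
```

Since every member is smaller than the 64-byte reserved array, the union
fixes the payload size, which leaves room to extend the per-type structs
later without changing sizeof(struct kvm_s390_irq).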


* [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 1/7] KVM: s390: add and extend interrupt information data structs Christian Borntraeger
@ 2013-10-08 14:54 ` Christian Borntraeger
  2013-10-13  8:39   ` Gleb Natapov
  2013-10-08 14:54 ` [Patchv5 3/7] KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault Christian Borntraeger
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Jens Freimann, Christian Borntraeger

From: Jens Freimann <jfrei@linux.vnet.ibm.com>

This patch adds a floating irq controller as a kvm_device.
It will be necessary for migration of floating interrupts as well
as for hardening the reset code by allowing user space to explicitly
remove all pending floating interrupts.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
 arch/s390/include/asm/kvm_host.h                |   1 +
 arch/s390/include/uapi/asm/kvm.h                |   5 +
 arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
 arch/s390/kvm/kvm-s390.c                        |   1 +
 include/linux/kvm_host.h                        |   1 +
 include/uapi/linux/kvm.h                        |   1 +
 virt/kvm/kvm_main.c                             |   5 +
 8 files changed, 295 insertions(+), 51 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt

diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
new file mode 100644
index 0000000..06aef31
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/s390_flic.txt
@@ -0,0 +1,36 @@
+FLIC (floating interrupt controller)
+====================================
+
+FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
+machine check interruptions. All interrupts are stored in a per-vm list of
+pending interrupts. FLIC performs operations on this list.
+
+Only one FLIC instance may be instantiated.
+
+FLIC provides support to
+- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
+- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
+
+Groups:
+  KVM_DEV_FLIC_ENQUEUE
+    Adds one interrupt to the list of pending floating interrupts. Interrupts
+    are taken from this list for injection into the guest. attr contains
+    a struct kvm_s390_irq which contains all data relevant for
+    interrupt injection.
+    The format of the data structure kvm_s390_irq as it is copied from userspace
+    is defined in usr/include/linux/kvm.h.
+    For historic reasons list members are stored in a different data structure, i.e.
+    we need to copy the relevant data into a struct kvm_s390_interrupt_info
+    which can then be added to the list.
+
+  KVM_DEV_FLIC_DEQUEUE
+    Takes one element off the pending interrupts list and copies it into userspace.
+    Dequeued interrupts are not injected into the guest.
+    attr->addr contains the userspace address of a struct kvm_s390_irq.
+    List elements are stored in the format of struct kvm_s390_interrupt_info
+    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
+    (usr/include/linux/kvm.h)
+
+  KVM_DEV_FLIC_CLEAR_IRQS
+    Simply deletes all elements from the list of currently pending floating interrupts.
+    No interrupts are injected into the guest.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 78b6918..2d09c1d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -237,6 +237,7 @@ struct kvm_arch{
 	struct sca_block *sca;
 	debug_info_t *dbf;
 	struct kvm_s390_float_interrupt float_int;
+	struct kvm_device *flic;
 	struct gmap *gmap;
 	int css_support;
 };
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index d25da59..33d52b8 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -16,6 +16,11 @@
 
 #define __KVM_S390
 
+/* Device control API: s390-specific devices */
+#define KVM_DEV_FLIC_DEQUEUE 1
+#define KVM_DEV_FLIC_ENQUEUE 2
+#define KVM_DEV_FLIC_CLEAR_IRQS 3
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 	/* general purpose regs for s390 */
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index e7323cd..66478a0 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -659,53 +659,85 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
 	return inti;
 }
 
-int kvm_s390_inject_vm(struct kvm *kvm,
-		       struct kvm_s390_interrupt *s390int)
+static void __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
 {
 	struct kvm_s390_local_interrupt *li;
 	struct kvm_s390_float_interrupt *fi;
-	struct kvm_s390_interrupt_info *inti, *iter;
+	struct kvm_s390_interrupt_info *iter;
 	int sigcpu;
 
+	mutex_lock(&kvm->lock);
+	fi = &kvm->arch.float_int;
+	spin_lock(&fi->lock);
+	if (!is_ioint(inti->type)) {
+		list_add_tail(&inti->list, &fi->list);
+	} else {
+		u64 isc_bits = int_word_to_isc_bits(inti->io.io_int_word);
+
+		/* Keep I/O interrupts sorted in isc order. */
+		list_for_each_entry(iter, &fi->list, list) {
+			if (!is_ioint(iter->type))
+				continue;
+			if (int_word_to_isc_bits(iter->io.io_int_word) <= isc_bits)
+				continue;
+			break;
+		}
+		list_add_tail(&inti->list, &iter->list);
+	}
+	atomic_set(&fi->active, 1);
+	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
+	if (sigcpu == KVM_MAX_VCPUS) {
+		do {
+			sigcpu = fi->next_rr_cpu++;
+			if (sigcpu == KVM_MAX_VCPUS)
+				sigcpu = fi->next_rr_cpu = 0;
+		} while (fi->local_int[sigcpu] == NULL);
+	}
+	li = fi->local_int[sigcpu];
+	spin_lock_bh(&li->lock);
+	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
+	if (waitqueue_active(li->wq))
+		wake_up_interruptible(li->wq);
+	spin_unlock_bh(&li->lock);
+	spin_unlock(&fi->lock);
+	mutex_unlock(&kvm->lock);
+}
+
+int kvm_s390_inject_vm(struct kvm *kvm,
+		       struct kvm_s390_interrupt *s390int)
+{
+	struct kvm_s390_interrupt_info *inti;
+
 	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
 	if (!inti)
 		return -ENOMEM;
 
-	switch (s390int->type) {
+	inti->type = s390int->type;
+	switch (inti->type) {
 	case KVM_S390_INT_VIRTIO:
 		VM_EVENT(kvm, 5, "inject: virtio parm:%x,parm64:%llx",
 			 s390int->parm, s390int->parm64);
-		inti->type = s390int->type;
 		inti->ext.ext_params = s390int->parm;
 		inti->ext.ext_params2 = s390int->parm64;
 		break;
 	case KVM_S390_INT_SERVICE:
 		VM_EVENT(kvm, 5, "inject: sclp parm:%x", s390int->parm);
-		inti->type = s390int->type;
 		inti->ext.ext_params = s390int->parm;
 		break;
-	case KVM_S390_PROGRAM_INT:
-	case KVM_S390_SIGP_STOP:
-	case KVM_S390_INT_EXTERNAL_CALL:
-	case KVM_S390_INT_EMERGENCY:
-		kfree(inti);
-		return -EINVAL;
 	case KVM_S390_MCHK:
 		VM_EVENT(kvm, 5, "inject: machine check parm64:%llx",
 			 s390int->parm64);
-		inti->type = s390int->type;
 		inti->mchk.cr14 = s390int->parm; /* upper bits are not used */
 		inti->mchk.mcic = s390int->parm64;
 		break;
 	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
-		if (s390int->type & IOINT_AI_MASK)
+		if (inti->type & IOINT_AI_MASK)
 			VM_EVENT(kvm, 5, "%s", "inject: I/O (AI)");
 		else
 			VM_EVENT(kvm, 5, "inject: I/O css %x ss %x schid %04x",
 				 s390int->type & IOINT_CSSID_MASK,
 				 s390int->type & IOINT_SSID_MASK,
 				 s390int->type & IOINT_SCHID_MASK);
-		inti->type = s390int->type;
 		inti->io.subchannel_id = s390int->parm >> 16;
 		inti->io.subchannel_nr = s390int->parm & 0x0000ffffu;
 		inti->io.io_int_parm = s390int->parm64 >> 32;
@@ -718,42 +750,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 	trace_kvm_s390_inject_vm(s390int->type, s390int->parm, s390int->parm64,
 				 2);
 
-	mutex_lock(&kvm->lock);
-	fi = &kvm->arch.float_int;
-	spin_lock(&fi->lock);
-	if (!is_ioint(inti->type))
-		list_add_tail(&inti->list, &fi->list);
-	else {
-		u64 isc_bits = int_word_to_isc_bits(inti->io.io_int_word);
-
-		/* Keep I/O interrupts sorted in isc order. */
-		list_for_each_entry(iter, &fi->list, list) {
-			if (!is_ioint(iter->type))
-				continue;
-			if (int_word_to_isc_bits(iter->io.io_int_word)
-			    <= isc_bits)
-				continue;
-			break;
-		}
-		list_add_tail(&inti->list, &iter->list);
-	}
-	atomic_set(&fi->active, 1);
-	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
-	if (sigcpu == KVM_MAX_VCPUS) {
-		do {
-			sigcpu = fi->next_rr_cpu++;
-			if (sigcpu == KVM_MAX_VCPUS)
-				sigcpu = fi->next_rr_cpu = 0;
-		} while (fi->local_int[sigcpu] == NULL);
-	}
-	li = fi->local_int[sigcpu];
-	spin_lock_bh(&li->lock);
-	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
-	if (waitqueue_active(li->wq))
-		wake_up_interruptible(li->wq);
-	spin_unlock_bh(&li->lock);
-	spin_unlock(&fi->lock);
-	mutex_unlock(&kvm->lock);
+	__inject_vm(kvm, inti);
 	return 0;
 }
 
@@ -841,3 +838,200 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
 	mutex_unlock(&vcpu->kvm->lock);
 	return 0;
 }
+
+static void clear_floating_interrupts(struct kvm *kvm)
+{
+	struct kvm_s390_float_interrupt *fi;
+	struct kvm_s390_interrupt_info	*n, *inti = NULL;
+
+	mutex_lock(&kvm->lock);
+	fi = &kvm->arch.float_int;
+	spin_lock(&fi->lock);
+	list_for_each_entry_safe(inti, n, &fi->list, list) {
+		list_del(&inti->list);
+		kfree(inti);
+	}
+	atomic_set(&fi->active, 0);
+	spin_unlock(&fi->lock);
+	mutex_unlock(&kvm->lock);
+}
+
+static inline int copy_irq_to_user(struct kvm_s390_interrupt_info *inti,
+				   u64 addr)
+{
+	struct kvm_s390_irq __user *uptr = (struct kvm_s390_irq __user *) addr;
+	void __user *target;
+	void *source;
+	u64 size;
+	int r = 0;
+
+	switch (inti->type) {
+	case KVM_S390_INT_VIRTIO:
+	case KVM_S390_INT_SERVICE:
+		source = &inti->ext;
+		target = &uptr->u.ext;
+		size = sizeof(inti->ext);
+		break;
+	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
+		source = &inti->io;
+		target = &uptr->u.io;
+		size = sizeof(inti->io);
+		break;
+	case KVM_S390_MCHK:
+		source = &inti->mchk;
+		target = &uptr->u.mchk;
+		size = sizeof(inti->mchk);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	r = put_user(inti->type, (u64 __user *) &uptr->type);
+	if (copy_to_user(target, source, size))
+		r = -EFAULT;
+
+	return r;
+}
+
+static int dequeue_floating_irq(struct kvm *kvm, __u64 addr)
+{
+	struct kvm_s390_interrupt_info *inti;
+	struct kvm_s390_float_interrupt *fi;
+	int r = 0;
+
+
+	mutex_lock(&kvm->lock);
+	fi = &kvm->arch.float_int;
+	spin_lock(&fi->lock);
+	if (list_empty(&fi->list)) {
+		spin_unlock(&fi->lock);
+		mutex_unlock(&kvm->lock);
+		return -ENODATA;
+	}
+	inti = list_first_entry(&fi->list, struct kvm_s390_interrupt_info, list);
+	list_del(&inti->list);
+	spin_unlock(&fi->lock);
+	mutex_unlock(&kvm->lock);
+
+	r = copy_irq_to_user(inti, addr);
+
+	kfree(inti);
+	return r;
+}
+
+static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
+{
+	int r;
+
+	switch (attr->group) {
+	case KVM_DEV_FLIC_DEQUEUE:
+		r = dequeue_floating_irq(dev->kvm, attr->addr);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	return r;
+}
+
+static inline int copy_irq_from_user(struct kvm_s390_interrupt_info *inti,
+				     u64 addr)
+{
+	struct kvm_s390_irq __user *uptr = (struct kvm_s390_irq __user *) addr;
+	void *target = NULL;
+	void __user *source;
+	u64 size;
+	int r = 0;
+
+	if (get_user(inti->type, (u64 __user *)addr))
+		return -EFAULT;
+	switch (inti->type) {
+	case KVM_S390_INT_VIRTIO:
+	case KVM_S390_INT_SERVICE:
+		target = (void *) &inti->ext;
+		source = &uptr->u.ext;
+		size = sizeof(inti->ext);
+		break;
+	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
+		target = (void *) &inti->io;
+		source = &uptr->u.io;
+		size = sizeof(inti->io);
+		break;
+	case KVM_S390_MCHK:
+		target = (void *) &inti->mchk;
+		source = &uptr->u.mchk;
+		size = sizeof(inti->mchk);
+		break;
+	default:
+		r = -EINVAL;
+		return r;
+	}
+
+	if (copy_from_user(target, source, size))
+		r = -EFAULT;
+
+	return r;
+}
+
+static int enqueue_floating_irq(struct kvm_device *dev,
+				 struct kvm_device_attr *attr)
+{
+	struct kvm_s390_interrupt_info *inti = NULL;
+	int r = 0;
+
+	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
+	if (!inti)
+		return -ENOMEM;
+
+	r = copy_irq_from_user(inti, attr->addr);
+	if (r) {
+		kfree(inti);
+		return r;
+	}
+	__inject_vm(dev->kvm, inti);
+
+	return r;
+}
+
+static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
+{
+	int r = 0;
+
+	switch (attr->group) {
+	case KVM_DEV_FLIC_ENQUEUE:
+		r = enqueue_floating_irq(dev, attr);
+		break;
+	case KVM_DEV_FLIC_CLEAR_IRQS:
+		r = 0;
+		clear_floating_interrupts(dev->kvm);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	return r;
+}
+
+static int flic_create(struct kvm_device *dev, u32 type)
+{
+	if (!dev)
+		return -EINVAL;
+	if (dev->kvm->arch.flic)
+		return -EINVAL;
+	dev->kvm->arch.flic = dev;
+	return 0;
+}
+
+static void flic_destroy(struct kvm_device *dev)
+{
+	dev->kvm->arch.flic = NULL;
+}
+
+/* s390 floating irq controller (flic) */
+struct kvm_device_ops kvm_flic_ops = {
+	.name = "kvm-flic",
+	.get_attr = flic_get_attr,
+	.set_attr = flic_set_attr,
+	.create = flic_create,
+	.destroy = flic_destroy,
+};
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1e4e7b9..30e2c9a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -157,6 +157,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_S390_CSS_SUPPORT:
 	case KVM_CAP_IOEVENTFD:
+	case KVM_CAP_DEVICE_CTRL:
 		r = 1;
 		break;
 	case KVM_CAP_NR_VCPUS:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7c961e1..2077dd0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1065,6 +1065,7 @@ struct kvm_device *kvm_device_from_filp(struct file *filp);
 
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
+extern struct kvm_device_ops kvm_flic_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 450fae8..fa59f1a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -906,6 +906,7 @@ struct kvm_device_attr {
 #define KVM_DEV_TYPE_FSL_MPIC_20	1
 #define KVM_DEV_TYPE_FSL_MPIC_42	2
 #define KVM_DEV_TYPE_XICS		3
+#define KVM_DEV_TYPE_FLIC		5
 
 /*
  * ioctls for VM fds
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d469114..dd2cc28 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2270,6 +2270,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
 		ops = &kvm_xics_ops;
 		break;
 #endif
+#ifdef CONFIG_S390
+	case KVM_DEV_TYPE_FLIC:
+		ops = &kvm_flic_ops;
+		break;
+#endif
 	default:
 		return -ENODEV;
 	}
-- 
1.8.3.1
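
The ordering rule that __inject_vm() applies to the pending list can be
modelled in user space. This is a minimal sketch, not the kernel code:
struct pending_irq and all function names are hypothetical, is_io stands in
for is_ioint(inti->type), and key stands in for the value returned by
int_word_to_isc_bits(). Non-I/O interrupts go to the tail; an I/O interrupt
is placed before the first I/O entry with a larger key, or at the tail if
there is none.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical model of one entry on the floating interrupt list.
 * The list is circular with a sentinel head, like struct list_head. */
struct pending_irq {
	struct pending_irq *prev, *next;
	bool is_io;		/* models is_ioint(inti->type) */
	unsigned long key;	/* models int_word_to_isc_bits(...) */
	int id;			/* only used to observe the order */
};

static void pending_init(struct pending_irq *head)
{
	head->prev = head->next = head;
}

/* Insert item directly before pos; with pos == head this appends. */
static void insert_before(struct pending_irq *item, struct pending_irq *pos)
{
	item->prev = pos->prev;
	item->next = pos;
	pos->prev->next = item;
	pos->prev = item;
}

static void pending_enqueue(struct pending_irq *head, struct pending_irq *item)
{
	struct pending_irq *iter = head;

	if (item->is_io) {
		/* Keep I/O entries sorted by key; skip non-I/O entries. */
		for (iter = head->next; iter != head; iter = iter->next) {
			if (!iter->is_io)
				continue;
			if (iter->key <= item->key)
				continue;
			break;
		}
	}
	/* iter is either the first I/O entry with a larger key or the head. */
	insert_before(item, iter);
}

/* Copy the ids in list order into out[] and return how many were copied. */
static int pending_ids(const struct pending_irq *head, int *out, int max)
{
	const struct pending_irq *iter;
	int n = 0;

	for (iter = head->next; iter != head && n < max; iter = iter->next)
		out[n++] = iter->id;
	return n;
}
```

Entries with an equal key keep their arrival order, because the scan only
stops at a strictly larger key.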


* [Patchv5 3/7] KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 1/7] KVM: s390: add and extend interrupt information data structs Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 2/7] KVM: s390: add floating irq controller Christian Borntraeger
@ 2013-10-08 14:54 ` Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 4/7] KVM: async_pf: Provide additional direct page notification Christian Borntraeger
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Dominik Dingel, Christian Borntraeger

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

In case of a fault retry, exit sie64() with the gmap_fault indication set
for the running thread. This makes it possible to handle async page faults
without the need for mm notifiers.

Based on a patch from Martin Schwidefsky.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/pgtable.h   |  2 ++
 arch/s390/include/asm/processor.h |  1 +
 arch/s390/kvm/kvm-s390.c          | 28 +++++++++++++++++++++++-----
 arch/s390/mm/fault.c              | 26 ++++++++++++++++++++++----
 4 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 9b60a36..18495d9 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -765,6 +765,7 @@ static inline void pgste_set_pte(pte_t *ptep, pte_t entry)
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @crst_list: list of all crst tables used in the guest address space
+ * @pfault_enabled: defines if pfaults are applicable for the guest
  */
 struct gmap {
 	struct list_head list;
@@ -773,6 +774,7 @@ struct gmap {
 	unsigned long asce;
 	void *private;
 	struct list_head crst_list;
+	bool pfault_enabled;
 };
 
 /**
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 0eb3750..bea6968 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -79,6 +79,7 @@ struct thread_struct {
         unsigned long ksp;              /* kernel stack pointer             */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
+	unsigned int gmap_pfault;	/* signal of a pending guest pfault */
 	struct per_regs per_user;	/* User specified PER registers */
 	struct per_event per_event;	/* Cause of the last PER trap */
 	unsigned long per_flags;	/* Flags to control debug behavior */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 30e2c9a..785e36e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -255,6 +255,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		if (!kvm->arch.gmap)
 			goto out_nogmap;
 		kvm->arch.gmap->private = kvm;
+		kvm->arch.gmap->pfault_enabled = 0;
 	}
 
 	kvm->arch.css_support = 0;
@@ -690,6 +691,17 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static long kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
+{
+	long rc;
+	hva_t fault = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
+	struct mm_struct *mm = current->mm;
+	down_read(&mm->mmap_sem);
+	rc = get_user_pages(current, mm, fault, 1, 1, 0, NULL, NULL);
+	up_read(&mm->mmap_sem);
+	return rc;
+}
+
 static int vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
 	int rc, cpuflags;
@@ -719,7 +731,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu)
 
 static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 {
-	int rc;
+	int rc = -1;
 
 	VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
 		   vcpu->arch.sie_block->icptcode);
@@ -730,13 +742,19 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 	} else {
 		if (kvm_is_ucontrol(vcpu->kvm)) {
 			rc = SIE_INTERCEPT_UCONTROL;
-		} else {
-			VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
-			trace_kvm_s390_sie_fault(vcpu);
-			rc = kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+		} else if (current->thread.gmap_pfault) {
+			current->thread.gmap_pfault = 0;
+			if (kvm_arch_fault_in_sync(vcpu) >= 0)
+				rc = 0;
 		}
 	}
 
+	if (rc == -1) {
+		VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
+		trace_kvm_s390_sie_fault(vcpu);
+		rc = kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+	}
+
 	memcpy(&vcpu->run->s.regs.gprs[14], &vcpu->arch.sie_block->gg14, 16);
 
 	if (rc == 0) {
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index fc66792..7505552 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -50,6 +50,7 @@
 #define VM_FAULT_BADMAP		0x020000
 #define VM_FAULT_BADACCESS	0x040000
 #define VM_FAULT_SIGNAL		0x080000
+#define VM_FAULT_PFAULT		0x100000
 
 static unsigned long store_indication __read_mostly;
 
@@ -232,6 +233,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
 			return;
 		}
 	case VM_FAULT_BADCONTEXT:
+	case VM_FAULT_PFAULT:
 		do_no_context(regs);
 		break;
 	case VM_FAULT_SIGNAL:
@@ -269,6 +271,9 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
  */
 static inline int do_exception(struct pt_regs *regs, int access)
 {
+#ifdef CONFIG_PGSTE
+	struct gmap *gmap;
+#endif
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
@@ -309,9 +314,10 @@ static inline int do_exception(struct pt_regs *regs, int access)
 	down_read(&mm->mmap_sem);
 
 #ifdef CONFIG_PGSTE
-	if ((current->flags & PF_VCPU) && S390_lowcore.gmap) {
-		address = __gmap_fault(address,
-				     (struct gmap *) S390_lowcore.gmap);
+	gmap = (struct gmap *)
+		((current->flags & PF_VCPU) ? S390_lowcore.gmap : 0);
+	if (gmap) {
+		address = __gmap_fault(address, gmap);
 		if (address == -EFAULT) {
 			fault = VM_FAULT_BADMAP;
 			goto out_up;
@@ -320,6 +326,8 @@ static inline int do_exception(struct pt_regs *regs, int access)
 			fault = VM_FAULT_OOM;
 			goto out_up;
 		}
+		if (gmap->pfault_enabled)
+			flags |= FAULT_FLAG_RETRY_NOWAIT;
 	}
 #endif
 
@@ -376,9 +384,19 @@ retry:
 				      regs, address);
 		}
 		if (fault & VM_FAULT_RETRY) {
+#ifdef CONFIG_PGSTE
+			if (gmap && (flags & FAULT_FLAG_RETRY_NOWAIT)) {
+				/* FAULT_FLAG_RETRY_NOWAIT has been set,
+				 * mmap_sem has not been released */
+				current->thread.gmap_pfault = 1;
+				fault = VM_FAULT_PFAULT;
+				goto out_up;
+			}
+#endif
 			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
 			 * of starvation. */
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			flags &= ~(FAULT_FLAG_ALLOW_RETRY |
+				   FAULT_FLAG_RETRY_NOWAIT);
 			flags |= FAULT_FLAG_TRIED;
 			down_read(&mm->mmap_sem);
 			goto retry;
-- 
1.8.3.1
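
The hand-off between do_exception() and vcpu_post_run() in this patch can
be summarized by a small user-space model. This is a simplified sketch under
stated assumptions: struct thread_model, the enum and both functions are
hypothetical names, the boolean parameters replace the real kernel state,
and only the fault path (no valid SIE exit) is modelled.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical result codes for the fault path of vcpu_post_run(). */
enum post_run_result {
	RESUME_GUEST,		/* rc == 0: page was faulted in, re-enter SIE */
	EXIT_UCONTROL,		/* user-controlled VM: leave to user space */
	INJECT_ADDRESSING	/* rc stayed -1: inject PGM_ADDRESSING */
};

/* Models the one field this patch adds to struct thread_struct. */
struct thread_model {
	unsigned int gmap_pfault;
};

/* Models the VM_FAULT_RETRY branch in do_exception(): with a gmap and
 * pfault_enabled, FAULT_FLAG_RETRY_NOWAIT is set, so a retry request is
 * turned into a pending-pfault indication instead of a blocking retry.
 * Returns true when VM_FAULT_PFAULT would be reported. */
static bool model_guest_fault(struct thread_model *t, bool has_gmap,
			      bool pfault_enabled, bool mm_wants_retry)
{
	bool nowait = has_gmap && pfault_enabled;

	if (mm_wants_retry && nowait) {
		t->gmap_pfault = 1;
		return true;
	}
	return false;
}

/* Models the fault branch of vcpu_post_run(); fault_in_rc stands in for
 * the return value of kvm_arch_fault_in_sync(). */
static enum post_run_result model_post_run_fault(struct thread_model *t,
						 bool ucontrol,
						 long fault_in_rc)
{
	int rc = -1;

	if (ucontrol)
		return EXIT_UCONTROL;
	if (t->gmap_pfault) {
		t->gmap_pfault = 0;
		if (fault_in_rc >= 0)
			rc = 0;
	}
	return rc == 0 ? RESUME_GUEST : INJECT_ADDRESSING;
}
```

In this model the addressing exception is injected only when no pfault
indication is pending or when faulting the page in synchronously fails,
which mirrors the `rc == -1` check added to vcpu_post_run().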


* [Patchv5 4/7] KVM: async_pf: Provide additional direct page notification
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
                   ` (2 preceding siblings ...)
  2013-10-08 14:54 ` [Patchv5 3/7] KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault Christian Borntraeger
@ 2013-10-08 14:54 ` Christian Borntraeger
  2013-10-08 14:54 ` [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work Christian Borntraeger
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Dominik Dingel, Christian Borntraeger

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

By setting a Kconfig option, the architecture can control when
guest notifications will be presented by the apf backend.
The default batch mechanism works as before: the vcpu thread
should pull in this information. The new direct mechanism
pushes the information directly to the guest.
This way s390 can use an already existing architecture interface.

The vcpu thread should still call check_completion to clean up leftovers,
which leaves most of the common code untouched.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/x86/kvm/mmu.c       |  2 +-
 include/linux/kvm_host.h |  2 +-
 virt/kvm/Kconfig         |  4 ++++
 virt/kvm/async_pf.c      | 22 +++++++++++++++++++---
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cf95cfe..9862e36 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3324,7 +3324,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
 	arch.direct_map = vcpu->arch.mmu.direct_map;
 	arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
 
-	return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+	return kvm_setup_async_pf(vcpu, gva, gfn_to_hva(vcpu->kvm, gfn), &arch);
 }
 
 static bool can_do_async_pf(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2077dd0..b4e8666 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -194,7 +194,7 @@ struct kvm_async_pf {
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
 		       struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 779262f..0774495 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -22,6 +22,10 @@ config KVM_MMIO
 config KVM_ASYNC_PF
        bool
 
+# Toggle to switch between direct notification and batch job
+config KVM_ASYNC_PF_SYNC
+       bool
+
 config HAVE_KVM_MSI
        bool
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index b197950..8f57d63 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -28,6 +28,21 @@
 #include "async_pf.h"
 #include <trace/events/kvm.h>
 
+static inline void kvm_async_page_present_sync(struct kvm_vcpu *vcpu,
+					       struct kvm_async_pf *work)
+{
+#ifdef CONFIG_KVM_ASYNC_PF_SYNC
+	kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+static inline void kvm_async_page_present_async(struct kvm_vcpu *vcpu,
+						struct kvm_async_pf *work)
+{
+#ifndef CONFIG_KVM_ASYNC_PF_SYNC
+	kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+
 static struct kmem_cache *async_pf_cache;
 
 int kvm_async_pf_init(void)
@@ -70,6 +85,7 @@ static void async_pf_execute(struct work_struct *work)
 	down_read(&mm->mmap_sem);
 	get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
 	up_read(&mm->mmap_sem);
+	kvm_async_page_present_sync(vcpu, apf);
 	unuse_mm(mm);
 
 	spin_lock(&vcpu->async_pf.lock);
@@ -135,7 +151,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 
 		if (work->page)
 			kvm_arch_async_page_ready(vcpu, work);
-		kvm_arch_async_page_present(vcpu, work);
+		kvm_async_page_present_async(vcpu, work);
 
 		list_del(&work->queue);
 		vcpu->async_pf.queued--;
@@ -145,7 +161,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 	}
 }
 
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
 		       struct kvm_arch_async_pf *arch)
 {
 	struct kvm_async_pf *work;
@@ -166,7 +182,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 	work->page = NULL;
 	work->vcpu = vcpu;
 	work->gva = gva;
-	work->addr = gfn_to_hva(vcpu->kvm, gfn);
+	work->addr = hva;
 	work->arch = *arch;
 	work->mm = current->mm;
 	atomic_inc(&work->mm->mm_count);
-- 
1.8.3.1

* [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
                   ` (3 preceding siblings ...)
  2013-10-08 14:54 ` [Patchv5 4/7] KVM: async_pf: Provide additional direct page notification Christian Borntraeger
@ 2013-10-08 14:54 ` Christian Borntraeger
  2013-10-13  8:48   ` Gleb Natapov
  2013-10-08 14:54 ` [Patchv5 6/7] KVM: async_pf: Async page fault support on s390 Christian Borntraeger
  2013-10-08 14:55 ` [Patchv5 7/7] KVM: async_pf: Exploit one reg interface for pfault Christian Borntraeger
  6 siblings, 1 reply; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Dominik Dingel, Christian Borntraeger

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

kvm_clear_async_pf_completion_queue gets an additional flag to either cancel
outstanding work or wait for outstanding work to finish. x86 currently
cancels all work.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
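The difference between the two modes of the new flag can be sketched in
user space as below. This is a simplified model under stated assumptions:
the demo_* names are hypothetical, and "running an item to completion"
stands in for what flush_work() waits for in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_PENDING, DEMO_DONE, DEMO_CANCELLED };

struct demo_item {
	enum demo_state state;
};

/*
 * Empty the queue. With flush == true every pending item is run to
 * completion (so its completion would still be signalled); with
 * flush == false pending items are dropped.
 * Returns the number of items completed by this call.
 */
static size_t demo_clear_queue(struct demo_item *items, size_t n, bool flush)
{
	size_t completed = 0;

	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (items[i].state != DEMO_PENDING)
			continue;	/* already finished or cancelled */
		if (flush) {
			items[i].state = DEMO_DONE;
			completed++;
		} else {
			items[i].state = DEMO_CANCELLED;
		}
	}
	return completed;
}
```

In this model only the flush variant guarantees that a guest waiting on a
token eventually gets its completion, which is the property migration needs.
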
 arch/x86/kvm/x86.c       | 8 ++++----
 include/linux/kvm_host.h | 2 +-
 virt/kvm/async_pf.c      | 6 +++++-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 187f824..00a4262 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -539,7 +539,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	kvm_x86_ops->set_cr0(vcpu, cr0);
 
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
-		kvm_clear_async_pf_completion_queue(vcpu);
+		kvm_clear_async_pf_completion_queue(vcpu, false);
 		kvm_async_pf_hash_reset(vcpu);
 	}
 
@@ -1911,7 +1911,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	vcpu->arch.apf.msr_val = data;
 
 	if (!(data & KVM_ASYNC_PF_ENABLED)) {
-		kvm_clear_async_pf_completion_queue(vcpu);
+		kvm_clear_async_pf_completion_queue(vcpu, false);
 		kvm_async_pf_hash_reset(vcpu);
 		return 0;
 	}
@@ -6742,7 +6742,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	kvmclock_reset(vcpu);
 
-	kvm_clear_async_pf_completion_queue(vcpu);
+	kvm_clear_async_pf_completion_queue(vcpu, false);
 	kvm_async_pf_hash_reset(vcpu);
 	vcpu->arch.apf.halted = false;
 
@@ -7015,7 +7015,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
 	 * Unpin any mmu pages first.
 	 */
 	kvm_for_each_vcpu(i, vcpu, kvm) {
-		kvm_clear_async_pf_completion_queue(vcpu);
+		kvm_clear_async_pf_completion_queue(vcpu, false);
 		kvm_unload_vcpu_mmu(vcpu);
 	}
 	kvm_for_each_vcpu(i, vcpu, kvm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b4e8666..223fcf3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -192,7 +192,7 @@ struct kvm_async_pf {
 	struct page *page;
 };
 
-void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu, bool flush);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
 		       struct kvm_arch_async_pf *arch);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 8f57d63..3e13a73 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -107,7 +107,7 @@ static void async_pf_execute(struct work_struct *work)
 	kvm_put_kvm(vcpu->kvm);
 }
 
-void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu, bool flush)
 {
 	/* cancel outstanding work queue item */
 	while (!list_empty(&vcpu->async_pf.queue)) {
@@ -115,6 +115,10 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 			list_entry(vcpu->async_pf.queue.next,
 				   typeof(*work), queue);
 		list_del(&work->queue);
+		if (flush) {
+			flush_work(&work->work);
+			continue;
+		}
 		if (cancel_work_sync(&work->work)) {
 			mmdrop(work->mm);
 			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
-- 
1.8.3.1

* [Patchv5 6/7] KVM: async_pf: Async page fault support on s390
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
                   ` (4 preceding siblings ...)
  2013-10-08 14:54 ` [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work Christian Borntraeger
@ 2013-10-08 14:54 ` Christian Borntraeger
  2013-10-13  9:15   ` Gleb Natapov
  2013-10-13  9:30   ` Gleb Natapov
  2013-10-08 14:55 ` [Patchv5 7/7] KVM: async_pf: Exploit one reg interface for pfault Christian Borntraeger
  6 siblings, 2 replies; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:54 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Dominik Dingel, Christian Borntraeger

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

This patch enables async page faults for s390 KVM guests.
It provides a userspace API to enable, disable and disable_wait this feature.
With disable and disable_wait, userspace can first disable the feature
asynchronously, continue with the live migration, and later enforce that the
feature is off by waiting for outstanding work to finish.
It also includes the diagnose code that the guest calls to enable async page
faults.

The async page faults will use an already existing guest interface for this
purpose, as described in "CP Programming Services (SC24-6084)".

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
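The mask handling behind the handshake can be sketched in user space as
below. This is a reduced model under stated assumptions: the demo_* names
are hypothetical, and only the token, PSW-mask and external-interrupt
conditions are modelled; the real check in this patch also looks at
pending interrupts, a control register bit and the per-VM enable switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models KVM_S390_PFAULT_TOKEN_INVALID (all bits set). */
#define DEMO_TOKEN_INVALID UINT64_MAX

struct demo_pfault {
	uint64_t token;		/* guest address of the token, or invalid */
	uint64_t select;	/* PSW mask bits to look at */
	uint64_t compare;	/* required value of the selected bits */
};

/* Handshake-time check: compare may only use bits that select covers. */
static bool demo_masks_valid(uint64_t select, uint64_t compare)
{
	return (compare & select) == compare;
}

/* Fault-time check: may this fault be handled asynchronously? */
static bool demo_can_use_pfault(const struct demo_pfault *p,
				uint64_t psw_mask, bool ext_enabled)
{
	assert(p != NULL);
	if (p->token == DEMO_TOKEN_INVALID)
		return false;	/* guest never completed the handshake */
	if (!ext_enabled)
		return false;	/* the init interrupt could not be delivered */
	return (psw_mask & p->select) == p->compare;
}
```

If any condition fails, the fault is simply resolved synchronously, so a
false result is never an error.
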
 arch/s390/include/asm/kvm_host.h | 22 +++++++++++
 arch/s390/include/uapi/asm/kvm.h |  9 +++--
 arch/s390/kvm/Kconfig            |  2 +
 arch/s390/kvm/Makefile           |  2 +-
 arch/s390/kvm/diag.c             | 84 +++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++++++++++++++----
 arch/s390/kvm/kvm-s390.c         | 85 +++++++++++++++++++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.h         |  4 ++
 arch/s390/kvm/sigp.c             |  7 ++++
 arch/s390/kvm/trace.h            | 46 ++++++++++++++++++++++
 include/uapi/linux/kvm.h         |  2 +
 11 files changed, 318 insertions(+), 13 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 2d09c1d..151ea01 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -224,6 +224,10 @@ struct kvm_vcpu_arch {
 		u64		stidp_data;
 	};
 	struct gmap *gmap;
+#define KVM_S390_PFAULT_TOKEN_INVALID	(-1UL)
+	unsigned long pfault_token;
+	unsigned long pfault_select;
+	unsigned long pfault_compare;
 };
 
 struct kvm_vm_stat {
@@ -250,6 +254,24 @@ static inline bool kvm_is_error_hva(unsigned long addr)
 	return IS_ERR_VALUE(addr);
 }
 
+#define ASYNC_PF_PER_VCPU	64
+struct kvm_vcpu;
+struct kvm_async_pf;
+struct kvm_arch_async_pf {
+	unsigned long pfault_token;
+};
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work);
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 #endif
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 33d52b8..1e8fced 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -17,9 +17,12 @@
 #define __KVM_S390
 
 /* Device control API: s390-specific devices */
-#define KVM_DEV_FLIC_DEQUEUE 1
-#define KVM_DEV_FLIC_ENQUEUE 2
-#define KVM_DEV_FLIC_CLEAR_IRQS 3
+#define KVM_DEV_FLIC_DEQUEUE		1
+#define KVM_DEV_FLIC_ENQUEUE		2
+#define KVM_DEV_FLIC_CLEAR_IRQS		3
+#define KVM_DEV_FLIC_APF_ENABLE		4
+#define KVM_DEV_FLIC_APF_DISABLE	5
+#define KVM_DEV_FLIC_APF_DISABLE_WAIT	6
 
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 70b46ea..c8bacbc 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -23,6 +23,8 @@ config KVM
 	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_EVENTFD
+	select KVM_ASYNC_PF
+	select KVM_ASYNC_PF_SYNC
 	---help---
 	  Support hosting paravirtualized guest machines using the SIE
 	  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 40b4c64..a47d2c3 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -7,7 +7,7 @@
 # as published by the Free Software Foundation.
 
 KVM := ../../../virt/kvm
-common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o
+common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/async_pf.o
 
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 78d967f..e50aadf 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -17,6 +17,7 @@
 #include "kvm-s390.h"
 #include "trace.h"
 #include "trace-s390.h"
+#include "gaccess.h"
 
 static int diag_release_pages(struct kvm_vcpu *vcpu)
 {
@@ -46,6 +47,87 @@ static int diag_release_pages(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
+{
+	struct prs_parm {
+		u16 code;
+		u16 subcode;
+		u16 parm_len;
+		u16 parm_version;
+		u64 token_addr;
+		u64 select_mask;
+		u64 compare_mask;
+		u64 zarch;
+	};
+	struct prs_parm parm;
+	int rc;
+	u16 rx = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
+	u16 ry = (vcpu->arch.sie_block->ipa & 0x0f);
+	unsigned long hva_token = KVM_HVA_ERR_BAD;
+
+	if (vcpu->run->s.regs.gprs[rx] & 7)
+		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+	if (copy_from_guest(vcpu, &parm, vcpu->run->s.regs.gprs[rx], sizeof(parm)))
+		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+	if (parm.parm_version != 2 || parm.parm_len < 5 || parm.code != 0x258)
+		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+
+	switch (parm.subcode) {
+	case 0: /* TOKEN */
+		if (vcpu->arch.pfault_token != KVM_S390_PFAULT_TOKEN_INVALID) {
+			/*
+			 * pfault handshake already done: leave the token
+			 * unchanged and set the return value to 8
+			 */
+			vcpu->run->s.regs.gprs[ry] = 8;
+			return 0;
+		}
+
+		if ((parm.compare_mask & parm.select_mask) != parm.compare_mask ||
+		    parm.token_addr & 7 || parm.zarch != 0x8000000000000000ULL)
+			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+
+		hva_token = gfn_to_hva(vcpu->kvm, gpa_to_gfn(parm.token_addr));
+		if (kvm_is_error_hva(hva_token))
+			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+
+		vcpu->arch.pfault_token = parm.token_addr;
+		vcpu->arch.pfault_select = parm.select_mask;
+		vcpu->arch.pfault_compare = parm.compare_mask;
+		vcpu->run->s.regs.gprs[ry] = 0;
+		rc = 0;
+		break;
+	case 1: /*
+		 * CANCEL
+		 * The specification allows already pending tokens to survive
+		 * the cancel. To reduce code complexity, we therefore assume
+		 * that all outstanding tokens are already pending.
+		 */
+		if (parm.token_addr || parm.select_mask || parm.compare_mask ||
+		    parm.zarch)
+			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+
+		vcpu->run->s.regs.gprs[ry] = 0;
+		/*
+		 * If pfault handling was not established or was already
+		 * canceled, present the matching return value (4) to the
+		 * guest.
+		 */
+		if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID)
+			vcpu->run->s.regs.gprs[ry] = 4;
+		else
+			vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
+
+		rc = 0;
+		break;
+	default:
+		rc = -EOPNOTSUPP;
+		break;
+	}
+
+	return rc;
+}
+
 static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
 {
 	VCPU_EVENT(vcpu, 5, "%s", "diag time slice end");
@@ -150,6 +232,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
 		return __diag_time_slice_end(vcpu);
 	case 0x9c:
 		return __diag_time_slice_end_directed(vcpu);
+	case 0x258:
+		return __diag_page_ref_service(vcpu);
 	case 0x308:
 		return __diag_ipl_functions(vcpu);
 	case 0x500:
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 66478a0..18e39d4 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -31,7 +31,7 @@ static int is_ioint(u64 type)
 	return ((type & 0xfffe0000u) != 0xfffe0000u);
 }
 
-static int psw_extint_disabled(struct kvm_vcpu *vcpu)
+int psw_extint_disabled(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT);
 }
@@ -78,11 +78,8 @@ static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
 			return 1;
 		return 0;
 	case KVM_S390_INT_SERVICE:
-		if (psw_extint_disabled(vcpu))
-			return 0;
-		if (vcpu->arch.sie_block->gcr[0] & 0x200ul)
-			return 1;
-		return 0;
+	case KVM_S390_INT_PFAULT_INIT:
+	case KVM_S390_INT_PFAULT_DONE:
 	case KVM_S390_INT_VIRTIO:
 		if (psw_extint_disabled(vcpu))
 			return 0;
@@ -150,6 +147,8 @@ static void __set_intercept_indicator(struct kvm_vcpu *vcpu,
 	case KVM_S390_INT_EXTERNAL_CALL:
 	case KVM_S390_INT_EMERGENCY:
 	case KVM_S390_INT_SERVICE:
+	case KVM_S390_INT_PFAULT_INIT:
+	case KVM_S390_INT_PFAULT_DONE:
 	case KVM_S390_INT_VIRTIO:
 		if (psw_extint_disabled(vcpu))
 			__set_cpuflag(vcpu, CPUSTAT_EXT_INT);
@@ -223,6 +222,30 @@ static void __do_deliver_interrupt(struct kvm_vcpu *vcpu,
 		rc |= put_guest(vcpu, inti->ext.ext_params,
 				(u32 __user *)__LC_EXT_PARAMS);
 		break;
+	case KVM_S390_INT_PFAULT_INIT:
+		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, inti->type, 0,
+						 inti->ext.ext_params2);
+		rc  = put_guest(vcpu, 0x2603, (u16 __user *) __LC_EXT_INT_CODE);
+		rc |= put_guest(vcpu, 0x0600, (u16 __user *) __LC_EXT_CPU_ADDR);
+		rc |= copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
+				      __LC_EXT_NEW_PSW, sizeof(psw_t));
+		rc |= put_guest(vcpu, inti->ext.ext_params2,
+				(u64 __user *) __LC_EXT_PARAMS2);
+		break;
+	case KVM_S390_INT_PFAULT_DONE:
+		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, inti->type, 0,
+						 inti->ext.ext_params2);
+		rc  = put_guest(vcpu, 0x2603, (u16 __user *) __LC_EXT_INT_CODE);
+		rc |= put_guest(vcpu, 0x0680, (u16 __user *) __LC_EXT_CPU_ADDR);
+		rc |= copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
+				      __LC_EXT_NEW_PSW, sizeof(psw_t));
+		rc |= put_guest(vcpu, inti->ext.ext_params2,
+				(u64 __user *) __LC_EXT_PARAMS2);
+		break;
 	case KVM_S390_INT_VIRTIO:
 		VCPU_EVENT(vcpu, 4, "interrupt: virtio parm:%x,parm64:%llx",
 			   inti->ext.ext_params, inti->ext.ext_params2);
@@ -357,7 +380,7 @@ static int __try_deliver_ckc_interrupt(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-static int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
+int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 	struct kvm_s390_float_interrupt *fi = vcpu->arch.local_int.float_int;
@@ -724,6 +747,10 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 		VM_EVENT(kvm, 5, "inject: sclp parm:%x", s390int->parm);
 		inti->ext.ext_params = s390int->parm;
 		break;
+	case KVM_S390_INT_PFAULT_DONE:
+		inti->type = s390int->type;
+		inti->ext.ext_params2 = s390int->parm64;
+		break;
 	case KVM_S390_MCHK:
 		VM_EVENT(kvm, 5, "inject: machine check parm64:%llx",
 			 s390int->parm64);
@@ -811,6 +838,10 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
 		inti->type = s390int->type;
 		inti->mchk.mcic = s390int->parm64;
 		break;
+	case KVM_S390_INT_PFAULT_INIT:
+		inti->type = s390int->type;
+		inti->ext.ext_params2 = s390int->parm64;
+		break;
 	case KVM_S390_INT_VIRTIO:
 	case KVM_S390_INT_SERVICE:
 	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
@@ -866,6 +897,8 @@ static inline int copy_irq_to_user(struct kvm_s390_interrupt_info *inti,
 	int r = 0;
 
 	switch (inti->type) {
+	case KVM_S390_INT_PFAULT_INIT:
+	case KVM_S390_INT_PFAULT_DONE:
 	case KVM_S390_INT_VIRTIO:
 	case KVM_S390_INT_SERVICE:
 		source = &inti->ext;
@@ -946,6 +979,8 @@ static inline int copy_irq_from_user(struct kvm_s390_interrupt_info *inti,
 	if (get_user(inti->type, (u64 __user *)addr))
 		return -EFAULT;
 	switch (inti->type) {
+	case KVM_S390_INT_PFAULT_INIT:
+	case KVM_S390_INT_PFAULT_DONE:
 	case KVM_S390_INT_VIRTIO:
 	case KVM_S390_INT_SERVICE:
 		target = (void *) &inti->ext;
@@ -996,6 +1031,8 @@ static int enqueue_floating_irq(struct kvm_device *dev,
 static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r = 0;
+	unsigned int i;
+	struct kvm_vcpu *vcpu;
 
 	switch (attr->group) {
 	case KVM_DEV_FLIC_ENQUEUE:
@@ -1005,6 +1042,23 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 		r = 0;
 		clear_floating_interrupts(dev->kvm);
 		break;
+	case KVM_DEV_FLIC_APF_ENABLE:
+		dev->kvm->arch.gmap->pfault_enabled = 1;
+		break;
+	case KVM_DEV_FLIC_APF_DISABLE:
+		dev->kvm->arch.gmap->pfault_enabled = 0;
+		break;
+	case KVM_DEV_FLIC_APF_DISABLE_WAIT:
+		dev->kvm->arch.gmap->pfault_enabled = 0;
+		/*
+		 * Make sure no async faults are in transition when
+		 * clearing the queues. So we don't need to worry
+		 * about late coming workers.
+		 */
+		synchronize_srcu(&dev->kvm->srcu);
+		kvm_for_each_vcpu(i, vcpu, dev->kvm)
+			kvm_clear_async_pf_completion_queue(vcpu, true);
+		break;
 	default:
 		r = -EINVAL;
 	}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 785e36e..c4f92f6 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -152,6 +152,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 #ifdef CONFIG_KVM_S390_UCONTROL
 	case KVM_CAP_S390_UCONTROL:
 #endif
+	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_SYNC_REGS:
 	case KVM_CAP_ONE_REG:
 	case KVM_CAP_ENABLE_CAP:
@@ -273,6 +274,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
 	trace_kvm_s390_destroy_vcpu(vcpu->vcpu_id);
+	kvm_clear_async_pf_completion_queue(vcpu, false);
 	if (!kvm_is_ucontrol(vcpu->kvm)) {
 		clear_bit(63 - vcpu->vcpu_id,
 			  (unsigned long *) &vcpu->kvm->arch.sca->mcn);
@@ -322,6 +324,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 /* Section: vcpu related */
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
+	kvm_clear_async_pf_completion_queue(vcpu, false);
 	if (kvm_is_ucontrol(vcpu->kvm)) {
 		vcpu->arch.gmap = gmap_alloc(current->mm);
 		if (!vcpu->arch.gmap)
@@ -379,6 +383,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_fpregs.fpc = 0;
 	asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc));
 	vcpu->arch.sie_block->gbea = 1;
+	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
 	atomic_set_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
 }
 
@@ -702,10 +707,84 @@ static long kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
 	return rc;
 }
 
+static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
+				      unsigned long token)
+{
+	struct kvm_s390_interrupt inti;
+	inti.parm64 = token;
+
+	if (start_token) {
+		inti.type = KVM_S390_INT_PFAULT_INIT;
+		WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &inti));
+	} else {
+		inti.type = KVM_S390_INT_PFAULT_DONE;
+		WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
+	}
+}
+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work)
+{
+	trace_kvm_s390_pfault_init(vcpu, work->arch.pfault_token);
+	__kvm_inject_pfault_token(vcpu, true, work->arch.pfault_token);
+}
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work)
+{
+	trace_kvm_s390_pfault_done(vcpu, work->arch.pfault_token);
+	__kvm_inject_pfault_token(vcpu, false, work->arch.pfault_token);
+}
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work)
+{
+	/* s390 will always inject the page directly */
+}
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * s390 will always inject the page directly,
+	 * but we still want check_async_completion to cleanup
+	 */
+	return true;
+}
+
+static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
+{
+	hva_t hva;
+	struct kvm_arch_async_pf arch;
+	int rc;
+
+	if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID)
+		return 0;
+	if ((vcpu->arch.sie_block->gpsw.mask & vcpu->arch.pfault_select) !=
+	    vcpu->arch.pfault_compare)
+		return 0;
+	if (psw_extint_disabled(vcpu))
+		return 0;
+	if (kvm_cpu_has_interrupt(vcpu))
+		return 0;
+	if (!(vcpu->arch.sie_block->gcr[0] & 0x200ul))
+		return 0;
+	if (!vcpu->arch.gmap->pfault_enabled)
+		return 0;
+
+	hva = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
+	if (copy_from_guest(vcpu, &arch.pfault_token, vcpu->arch.pfault_token, 8))
+		return 0;
+
+	rc = kvm_setup_async_pf(vcpu, current->thread.gmap_addr, hva, &arch);
+	return rc;
+}
+
 static int vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
 	int rc, cpuflags;
 
+	kvm_check_async_pf_completion(vcpu);
+
 	memcpy(&vcpu->arch.sie_block->gg14, &vcpu->run->s.regs.gprs[14], 16);
 
 	if (need_resched())
@@ -743,9 +822,11 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 		if (kvm_is_ucontrol(vcpu->kvm)) {
 			rc = SIE_INTERCEPT_UCONTROL;
 		} else if (current->thread.gmap_pfault) {
+			trace_kvm_s390_major_guest_pfault(vcpu);
 			current->thread.gmap_pfault = 0;
-			if (kvm_arch_fault_in_sync(vcpu) >= 0)
-				rc = 0;
+			if (kvm_arch_setup_async_pf(vcpu) ||
+			    (kvm_arch_fault_in_sync(vcpu) >= 0))
+				rc = 0;
 		}
 	}
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index b44912a..c2bf916 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -159,4 +159,8 @@ void exit_sie_sync(struct kvm_vcpu *vcpu);
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
 
+/* implemented in interrupt.c */
+int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
+int psw_extint_disabled(struct kvm_vcpu *vcpu);
+
 #endif
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index bec398c..8fcc5c6 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -186,6 +186,13 @@ int kvm_s390_inject_sigp_stop(struct kvm_vcpu *vcpu, int action)
 static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
 {
 	int rc;
+	unsigned int i;
+	struct kvm_vcpu *vcpu_to_set;
+
+	kvm_for_each_vcpu(i, vcpu_to_set, vcpu->kvm) {
+		vcpu_to_set->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
+		kvm_clear_async_pf_completion_queue(vcpu_to_set, false);
+	}
 
 	switch (parameter & 0xff) {
 	case 0:
diff --git a/arch/s390/kvm/trace.h b/arch/s390/kvm/trace.h
index c2f582bb1c..e4816a5 100644
--- a/arch/s390/kvm/trace.h
+++ b/arch/s390/kvm/trace.h
@@ -29,6 +29,52 @@
 	TP_printk("%02d[%016lx-%016lx]: " p_str, __entry->id,		\
 		  __entry->pswmask, __entry->pswaddr, p_args)
 
+TRACE_EVENT(kvm_s390_major_guest_pfault,
+	    TP_PROTO(VCPU_PROTO_COMMON),
+	    TP_ARGS(VCPU_ARGS_COMMON),
+
+	    TP_STRUCT__entry(
+		    VCPU_FIELD_COMMON
+		    ),
+
+	    TP_fast_assign(
+		    VCPU_ASSIGN_COMMON
+		    ),
+	    VCPU_TP_PRINTK("%s", "major fault, maybe applicable for pfault")
+	);
+
+TRACE_EVENT(kvm_s390_pfault_init,
+	    TP_PROTO(VCPU_PROTO_COMMON, long pfault_token),
+	    TP_ARGS(VCPU_ARGS_COMMON, pfault_token),
+
+	    TP_STRUCT__entry(
+		    VCPU_FIELD_COMMON
+		    __field(long, pfault_token)
+		    ),
+
+	    TP_fast_assign(
+		    VCPU_ASSIGN_COMMON
+		    __entry->pfault_token = pfault_token;
+		    ),
+	    VCPU_TP_PRINTK("init pfault token %ld", __entry->pfault_token)
+	);
+
+TRACE_EVENT(kvm_s390_pfault_done,
+	    TP_PROTO(VCPU_PROTO_COMMON, long pfault_token),
+	    TP_ARGS(VCPU_ARGS_COMMON, pfault_token),
+
+	    TP_STRUCT__entry(
+		    VCPU_FIELD_COMMON
+		    __field(long, pfault_token)
+		    ),
+
+	    TP_fast_assign(
+		    VCPU_ASSIGN_COMMON
+		    __entry->pfault_token = pfault_token;
+		    ),
+	    VCPU_TP_PRINTK("done pfault token %ld", __entry->pfault_token)
+	);
+
 /*
  * Tracepoints for SIE entry and exit.
  */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index fa59f1a..5c7cfc0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -413,6 +413,8 @@ struct kvm_s390_psw {
 #define KVM_S390_PROGRAM_INT		0xfffe0001u
 #define KVM_S390_SIGP_SET_PREFIX	0xfffe0002u
 #define KVM_S390_RESTART		0xfffe0003u
+#define KVM_S390_INT_PFAULT_INIT	0xfffe0004u
+#define KVM_S390_INT_PFAULT_DONE	0xfffe0005u
 #define KVM_S390_MCHK			0xfffe1000u
 #define KVM_S390_INT_VIRTIO		0xffff2603u
 #define KVM_S390_INT_SERVICE		0xffff2401u
-- 
1.8.3.1

* [Patchv5 7/7] KVM: async_pf: Exploit one reg interface for pfault
  2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
                   ` (5 preceding siblings ...)
  2013-10-08 14:54 ` [Patchv5 6/7] KVM: async_pf: Async page fault support on s390 Christian Borntraeger
@ 2013-10-08 14:55 ` Christian Borntraeger
  6 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-08 14:55 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Dominik Dingel, Christian Borntraeger

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

To enable pfault after live migration we need to expose pfault_token,
pfault_select and pfault_compare to userspace as ONE_REG registers, so
that QEMU can transfer them between the source and the target.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
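How userspace might carry the three values across a migration can be
sketched as below. This is a user-space model, not QEMU code: the demo_*
helpers are hypothetical, and the DEMO_REG_* base values are assumptions
meant to mirror KVM_REG_S390 and KVM_REG_SIZE_U64 from the uapi headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror KVM_REG_S390 and KVM_REG_SIZE_U64. */
#define DEMO_REG_S390		0x5000000000000000ULL
#define DEMO_REG_SIZE_U64	0x0030000000000000ULL
#define DEMO_REG_PFTOKEN	(DEMO_REG_S390 | DEMO_REG_SIZE_U64 | 0x5)
#define DEMO_REG_PFCOMPARE	(DEMO_REG_S390 | DEMO_REG_SIZE_U64 | 0x6)
#define DEMO_REG_PFSELECT	(DEMO_REG_S390 | DEMO_REG_SIZE_U64 | 0x7)

struct demo_vcpu {
	uint64_t pfault_token;
	uint64_t pfault_compare;
	uint64_t pfault_select;
};

/* Map a register id to its backing field; NULL for unknown ids. */
static uint64_t *demo_reg_field(struct demo_vcpu *v, uint64_t id)
{
	assert(v != NULL);
	switch (id) {
	case DEMO_REG_PFTOKEN:
		return &v->pfault_token;
	case DEMO_REG_PFCOMPARE:
		return &v->pfault_compare;
	case DEMO_REG_PFSELECT:
		return &v->pfault_select;
	default:
		return NULL;
	}
}

/* Copy all pfault registers from source to target; 0 on success. */
static int demo_migrate_pfault(struct demo_vcpu *src, struct demo_vcpu *dst)
{
	static const uint64_t ids[] = {
		DEMO_REG_PFTOKEN, DEMO_REG_PFCOMPARE, DEMO_REG_PFSELECT,
	};

	for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
		uint64_t *from = demo_reg_field(src, ids[i]);
		uint64_t *to = demo_reg_field(dst, ids[i]);

		if (!from || !to)
			return -1;
		*to = *from;
	}
	return 0;
}
```

All three values have to travel together: a token without its select and
compare masks would not let the target decide which faults qualify.
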
 arch/s390/include/uapi/asm/kvm.h |  3 +++
 arch/s390/kvm/kvm-s390.c         | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 1e8fced..8876b00 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -65,4 +65,7 @@ struct kvm_sync_regs {
 #define KVM_REG_S390_EPOCHDIFF	(KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x2)
 #define KVM_REG_S390_CPU_TIMER  (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x3)
 #define KVM_REG_S390_CLOCK_COMP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x4)
+#define KVM_REG_S390_PFTOKEN	(KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x5)
+#define KVM_REG_S390_PFCOMPARE	(KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x6)
+#define KVM_REG_S390_PFSELECT	(KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x7)
 #endif
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c4f92f6..e65c193 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -552,6 +552,18 @@ static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
 		r = put_user(vcpu->arch.sie_block->ckc,
 			     (u64 __user *)reg->addr);
 		break;
+	case KVM_REG_S390_PFTOKEN:
+		r = put_user(vcpu->arch.pfault_token,
+			     (u64 __user *)reg->addr);
+		break;
+	case KVM_REG_S390_PFCOMPARE:
+		r = put_user(vcpu->arch.pfault_compare,
+			     (u64 __user *)reg->addr);
+		break;
+	case KVM_REG_S390_PFSELECT:
+		r = put_user(vcpu->arch.pfault_select,
+			     (u64 __user *)reg->addr);
+		break;
 	default:
 		break;
 	}
@@ -581,6 +593,18 @@ static int kvm_arch_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu,
 		r = get_user(vcpu->arch.sie_block->ckc,
 			     (u64 __user *)reg->addr);
 		break;
+	case KVM_REG_S390_PFTOKEN:
+		r = get_user(vcpu->arch.pfault_token,
+			     (u64 __user *)reg->addr);
+		break;
+	case KVM_REG_S390_PFCOMPARE:
+		r = get_user(vcpu->arch.pfault_compare,
+			     (u64 __user *)reg->addr);
+		break;
+	case KVM_REG_S390_PFSELECT:
+		r = get_user(vcpu->arch.pfault_select,
+			     (u64 __user *)reg->addr);
+		break;
 	default:
 		break;
 	}
-- 
1.8.3.1

* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-08 14:54 ` [Patchv5 2/7] KVM: s390: add floating irq controller Christian Borntraeger
@ 2013-10-13  8:39   ` Gleb Natapov
  2013-10-14  7:58     ` Christian Borntraeger
  2013-10-14  8:28     ` Jens Freimann
  0 siblings, 2 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-13  8:39 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Jens Freimann

On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> 
> This patch adds a floating irq controller as a kvm_device.
> It will be necessary for migration of floating interrupts as well
> as for hardening the reset code by allowing user space to explicitly
> remove all pending floating interrupts.
> 
> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
>  arch/s390/include/asm/kvm_host.h                |   1 +
>  arch/s390/include/uapi/asm/kvm.h                |   5 +
>  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
>  arch/s390/kvm/kvm-s390.c                        |   1 +
>  include/linux/kvm_host.h                        |   1 +
>  include/uapi/linux/kvm.h                        |   1 +
>  virt/kvm/kvm_main.c                             |   5 +
>  8 files changed, 295 insertions(+), 51 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> 
> diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> new file mode 100644
> index 0000000..06aef31
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> @@ -0,0 +1,36 @@
> +FLIC (floating interrupt controller)
> +====================================
> +
> +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> +machine check interruptions. All interrupts are stored in a per-vm list of
> +pending interrupts. FLIC performs operations on this list.
> +
> +Only one FLIC instance may be instantiated.
> +
> +FLIC provides support to
> +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> +
> +Groups:
> +  KVM_DEV_FLIC_ENQUEUE
> +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> +    are taken from this list for injection into the guest. attr contains
> +    a struct kvm_s390_irq which contains all data relevant for
> +    interrupt injection.
> +    The format of the data structure kvm_s390_irq as it is copied from userspace
> +    is defined in usr/include/linux/kvm.h.
> +    For historic reasons list members are stored in a different data structure, i.e.
> +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> +    which can then be added to the list.
> +
> +  KVM_DEV_FLIC_DEQUEUE
> +    Takes one element off the pending interrupts list and copies it into userspace.
> +    Dequeued interrupts are not injected into the guest.
> +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> +    List elements are stored in the format of struct kvm_s390_interrupt_info
> +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> +    (usr/include/linux/kvm.h)
> +
Can an interrupt be dequeued on real HW as well? When will this interface be
used?

> +  KVM_DEV_FLIC_CLEAR_IRQS
> +    Simply deletes all elements from the list of currently pending floating interrupts.
> +    No interrupts are injected into the guest.
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 78b6918..2d09c1d 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -237,6 +237,7 @@ struct kvm_arch{
>  	struct sca_block *sca;
>  	debug_info_t *dbf;
>  	struct kvm_s390_float_interrupt float_int;
> +	struct kvm_device *flic;
>  	struct gmap *gmap;
>  	int css_support;
>  };
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index d25da59..33d52b8 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -16,6 +16,11 @@
>  
>  #define __KVM_S390
>  
> +/* Device control API: s390-specific devices */
> +#define KVM_DEV_FLIC_DEQUEUE 1
> +#define KVM_DEV_FLIC_ENQUEUE 2
> +#define KVM_DEV_FLIC_CLEAR_IRQS 3
> +
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
>  	/* general purpose regs for s390 */
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index e7323cd..66478a0 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -659,53 +659,85 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
>  	return inti;
>  }
>  
> -int kvm_s390_inject_vm(struct kvm *kvm,
> -		       struct kvm_s390_interrupt *s390int)
> +static void __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
>  {
>  	struct kvm_s390_local_interrupt *li;
>  	struct kvm_s390_float_interrupt *fi;
> -	struct kvm_s390_interrupt_info *inti, *iter;
> +	struct kvm_s390_interrupt_info *iter;
>  	int sigcpu;
>  
> +	mutex_lock(&kvm->lock);
> +	fi = &kvm->arch.float_int;
> +	spin_lock(&fi->lock);
> +	if (!is_ioint(inti->type)) {
> +		list_add_tail(&inti->list, &fi->list);
> +	} else {
> +		u64 isc_bits = int_word_to_isc_bits(inti->io.io_int_word);
> +
> +		/* Keep I/O interrupts sorted in isc order. */
> +		list_for_each_entry(iter, &fi->list, list) {
> +			if (!is_ioint(iter->type))
> +				continue;
> +			if (int_word_to_isc_bits(iter->io.io_int_word) <= isc_bits)
> +				continue;
> +			break;
> +		}
> +		list_add_tail(&inti->list, &iter->list);
> +	}
> +	atomic_set(&fi->active, 1);
> +	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
> +	if (sigcpu == KVM_MAX_VCPUS) {
> +		do {
> +			sigcpu = fi->next_rr_cpu++;
> +			if (sigcpu == KVM_MAX_VCPUS)
> +				sigcpu = fi->next_rr_cpu = 0;
> +		} while (fi->local_int[sigcpu] == NULL);
> +	}
> +	li = fi->local_int[sigcpu];
> +	spin_lock_bh(&li->lock);
> +	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
> +	if (waitqueue_active(li->wq))
> +		wake_up_interruptible(li->wq);
> +	spin_unlock_bh(&li->lock);
> +	spin_unlock(&fi->lock);
> +	mutex_unlock(&kvm->lock);
> +}
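
For readers following the ordering rule in __inject_vm(), here is a minimal
userspace sketch of the same insertion logic. It assumes a fixed-size array
instead of the kernel list, and struct pending, struct pending_list and
insert_pending() are hypothetical names that are not part of the patch.
Non-I/O entries are appended at the tail; an I/O entry is placed in front of
the first queued I/O entry with a larger isc value, or at the tail if there
is none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for struct kvm_s390_interrupt_info. */
struct pending {
	int is_io;          /* models is_ioint(type) */
	unsigned long isc;  /* models int_word_to_isc_bits(io_int_word) */
	int id;             /* only used to tell entries apart */
};

/* Hypothetical stand-in for the per-VM list of floating interrupts. */
struct pending_list {
	struct pending e[MAX_PENDING];
	size_t n;
};

/* Returns 0 on success, -1 if the list is full. */
static int insert_pending(struct pending_list *l, struct pending p)
{
	size_t pos;

	assert(l != NULL);
	if (l->n == MAX_PENDING)
		return -1;

	pos = l->n; /* default: append at the tail */
	if (p.is_io) {
		/* Stop at the first I/O entry with a larger isc value. */
		for (pos = 0; pos < l->n; pos++) {
			if (!l->e[pos].is_io)
				continue;
			if (l->e[pos].isc <= p.isc)
				continue;
			break;
		}
	}

	/* Shift the tail up by one slot and insert in front of pos. */
	memmove(&l->e[pos + 1], &l->e[pos], (l->n - pos) * sizeof(l->e[0]));
	l->e[pos] = p;
	l->n++;
	return 0;
}
```

Note that entries with an equal isc value keep their arrival order, because
the scan skips over entries that compare less than or equal.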
> +
> +int kvm_s390_inject_vm(struct kvm *kvm,
> +		       struct kvm_s390_interrupt *s390int)
> +{
> +	struct kvm_s390_interrupt_info *inti;
> +
>  	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
>  	if (!inti)
>  		return -ENOMEM;
>  
> -	switch (s390int->type) {
> +	inti->type = s390int->type;
> +	switch (inti->type) {
>  	case KVM_S390_INT_VIRTIO:
>  		VM_EVENT(kvm, 5, "inject: virtio parm:%x,parm64:%llx",
>  			 s390int->parm, s390int->parm64);
> -		inti->type = s390int->type;
>  		inti->ext.ext_params = s390int->parm;
>  		inti->ext.ext_params2 = s390int->parm64;
>  		break;
>  	case KVM_S390_INT_SERVICE:
>  		VM_EVENT(kvm, 5, "inject: sclp parm:%x", s390int->parm);
> -		inti->type = s390int->type;
>  		inti->ext.ext_params = s390int->parm;
>  		break;
> -	case KVM_S390_PROGRAM_INT:
> -	case KVM_S390_SIGP_STOP:
> -	case KVM_S390_INT_EXTERNAL_CALL:
> -	case KVM_S390_INT_EMERGENCY:
> -		kfree(inti);
> -		return -EINVAL;
>  	case KVM_S390_MCHK:
>  		VM_EVENT(kvm, 5, "inject: machine check parm64:%llx",
>  			 s390int->parm64);
> -		inti->type = s390int->type;
>  		inti->mchk.cr14 = s390int->parm; /* upper bits are not used */
>  		inti->mchk.mcic = s390int->parm64;
>  		break;
>  	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> -		if (s390int->type & IOINT_AI_MASK)
> +		if (inti->type & IOINT_AI_MASK)
>  			VM_EVENT(kvm, 5, "%s", "inject: I/O (AI)");
>  		else
>  			VM_EVENT(kvm, 5, "inject: I/O css %x ss %x schid %04x",
>  				 s390int->type & IOINT_CSSID_MASK,
>  				 s390int->type & IOINT_SSID_MASK,
>  				 s390int->type & IOINT_SCHID_MASK);
> -		inti->type = s390int->type;
>  		inti->io.subchannel_id = s390int->parm >> 16;
>  		inti->io.subchannel_nr = s390int->parm & 0x0000ffffu;
>  		inti->io.io_int_parm = s390int->parm64 >> 32;
> @@ -718,42 +750,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
>  	trace_kvm_s390_inject_vm(s390int->type, s390int->parm, s390int->parm64,
>  				 2);
>  
> -	mutex_lock(&kvm->lock);
> -	fi = &kvm->arch.float_int;
> -	spin_lock(&fi->lock);
> -	if (!is_ioint(inti->type))
> -		list_add_tail(&inti->list, &fi->list);
> -	else {
> -		u64 isc_bits = int_word_to_isc_bits(inti->io.io_int_word);
> -
> -		/* Keep I/O interrupts sorted in isc order. */
> -		list_for_each_entry(iter, &fi->list, list) {
> -			if (!is_ioint(iter->type))
> -				continue;
> -			if (int_word_to_isc_bits(iter->io.io_int_word)
> -			    <= isc_bits)
> -				continue;
> -			break;
> -		}
> -		list_add_tail(&inti->list, &iter->list);
> -	}
> -	atomic_set(&fi->active, 1);
> -	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
> -	if (sigcpu == KVM_MAX_VCPUS) {
> -		do {
> -			sigcpu = fi->next_rr_cpu++;
> -			if (sigcpu == KVM_MAX_VCPUS)
> -				sigcpu = fi->next_rr_cpu = 0;
> -		} while (fi->local_int[sigcpu] == NULL);
> -	}
> -	li = fi->local_int[sigcpu];
> -	spin_lock_bh(&li->lock);
> -	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
> -	if (waitqueue_active(li->wq))
> -		wake_up_interruptible(li->wq);
> -	spin_unlock_bh(&li->lock);
> -	spin_unlock(&fi->lock);
> -	mutex_unlock(&kvm->lock);
> +	__inject_vm(kvm, inti);
>  	return 0;
>  }
>  
> @@ -841,3 +838,200 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
>  	mutex_unlock(&vcpu->kvm->lock);
>  	return 0;
>  }
> +
> +static void clear_floating_interrupts(struct kvm *kvm)
> +{
> +	struct kvm_s390_float_interrupt *fi;
> +	struct kvm_s390_interrupt_info	*n, *inti = NULL;
> +
> +	mutex_lock(&kvm->lock);
> +	fi = &kvm->arch.float_int;
> +	spin_lock(&fi->lock);
> +	list_for_each_entry_safe(inti, n, &fi->list, list) {
> +		list_del(&inti->list);
> +		kfree(inti);
> +	}
> +	atomic_set(&fi->active, 0);
> +	spin_unlock(&fi->lock);
> +	mutex_unlock(&kvm->lock);
> +}
> +
> +static inline int copy_irq_to_user(struct kvm_s390_interrupt_info *inti,
> +				   u64 addr)
> +{
> +	struct kvm_s390_irq __user *uptr = (struct kvm_s390_irq __user *) addr;
> +	void __user *target;
> +	void *source;
> +	u64 size;
> +	int r = 0;
> +
> +	switch (inti->type) {
> +	case KVM_S390_INT_VIRTIO:
> +	case KVM_S390_INT_SERVICE:
> +		source = &inti->ext;
> +		target = &uptr->u.ext;
> +		size = sizeof(inti->ext);
> +		break;
> +	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> +		source = &inti->io;
> +		target = &uptr->u.io;
> +		size = sizeof(inti->io);
> +		break;
> +	case KVM_S390_MCHK:
> +		source = &inti->mchk;
> +		target = &uptr->u.mchk;
> +		size = sizeof(inti->mchk);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	r = put_user(inti->type, (u64 __user *) &uptr->type);
> +	if (copy_to_user(target, source, size))
> +		r = -EFAULT;
> +
> +	return r;
> +}
> +
> +static int dequeue_floating_irq(struct kvm *kvm, __u64 addr)
> +{
> +	struct kvm_s390_interrupt_info *inti;
> +	struct kvm_s390_float_interrupt *fi;
> +	int r = 0;
> +
> +
> +	mutex_lock(&kvm->lock);
> +	fi = &kvm->arch.float_int;
> +	spin_lock(&fi->lock);
> +	if (list_empty(&fi->list)) {
> +		spin_unlock(&fi->lock);
> +		mutex_unlock(&kvm->lock);
> +		return -ENODATA;
> +	}
> +	inti = list_first_entry(&fi->list, struct kvm_s390_interrupt_info, list);
> +	list_del(&inti->list);
> +	spin_unlock(&fi->lock);
> +	mutex_unlock(&kvm->lock);
> +
> +	r = copy_irq_to_user(inti, addr);
> +
> +	kfree(inti);
> +	return r;
> +}
> +
> +static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
> +{
> +	int r;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_FLIC_DEQUEUE:
> +		r = dequeue_floating_irq(dev->kvm, attr->addr);
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	return r;
> +}
> +
> +static inline int copy_irq_from_user(struct kvm_s390_interrupt_info *inti,
> +				     u64 addr)
> +{
> +	struct kvm_s390_irq __user *uptr = (struct kvm_s390_irq __user *) addr;
> +	void *target = NULL;
> +	void __user *source;
> +	u64 size;
> +	int r = 0;
> +
> +	if (get_user(inti->type, (u64 __user *)addr))
> +		return -EFAULT;
> +	switch (inti->type) {
> +	case KVM_S390_INT_VIRTIO:
> +	case KVM_S390_INT_SERVICE:
> +		target = (void *) &inti->ext;
> +		source = &uptr->u.ext;
> +		size = sizeof(inti->ext);
> +		break;
> +	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> +		target = (void *) &inti->io;
> +		source = &uptr->u.io;
> +		size = sizeof(inti->io);
> +		break;
> +	case KVM_S390_MCHK:
> +		target = (void *) &inti->mchk;
> +		source = &uptr->u.mchk;
> +		size = sizeof(inti->mchk);
> +		break;
> +	default:
> +		r = -EINVAL;
> +		return r;
> +	}
> +
> +	if (copy_from_user(target, source, size))
> +		r = -EFAULT;
> +
> +	return r;
> +}
> +
> +static int enqueue_floating_irq(struct kvm_device *dev,
> +				 struct kvm_device_attr *attr)
> +{
> +	struct kvm_s390_interrupt_info *inti = NULL;
> +	int r = 0;
> +
> +	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
> +	if (!inti)
> +		return -ENOMEM;
> +
> +	r = copy_irq_from_user(inti, attr->addr);
> +	if (r) {
> +		kfree(inti);
> +		return r;
> +	}
> +	__inject_vm(dev->kvm, inti);
> +
> +	return r;
> +}
> +
> +static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
> +{
> +	int r = 0;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_FLIC_ENQUEUE:
> +		r = enqueue_floating_irq(dev, attr);
> +		break;
> +	case KVM_DEV_FLIC_CLEAR_IRQS:
> +		r = 0;
> +		clear_floating_interrupts(dev->kvm);
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	return r;
> +}
> +
> +static int flic_create(struct kvm_device *dev, u32 type)
> +{
> +	if (!dev)
> +		return -EINVAL;
> +	if (dev->kvm->arch.flic)
> +		return -EINVAL;
> +	dev->kvm->arch.flic = dev;
> +	return 0;
> +}
> +
> +static void flic_destroy(struct kvm_device *dev)
> +{
> +	dev->kvm->arch.flic = NULL;
You need to call kfree(dev) here. There is a patch that moves this free
into common code, but it has not been merged yet.
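
To illustrate the point, here is a minimal userspace model, with malloc/free
standing in for the kernel allocator. struct vm and struct device are
hypothetical stand-ins for struct kvm and struct kvm_device, and the model
allocates the device inside the create helper purely to stay self-contained
(in the kernel the allocation happens in common code before ->create is
called). The only part that matters is that destroy both clears the
back-pointer and releases the device.

```c
#include <assert.h>
#include <stdlib.h>

struct device;

/* Hypothetical stand-in for struct kvm. */
struct vm {
	struct device *flic;
};

/* Hypothetical stand-in for struct kvm_device. */
struct device {
	struct vm *vm;
};

/* Models flic_create(): only one instance per VM is allowed. */
static int model_flic_create(struct vm *vm, struct device **out)
{
	struct device *dev;

	if (vm->flic)
		return -1;
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return -1;
	dev->vm = vm;
	vm->flic = dev;
	*out = dev;
	return 0;
}

/* Models flic_destroy() with the missing free added. */
static void model_flic_destroy(struct device *dev)
{
	assert(dev != NULL);
	dev->vm->flic = NULL;
	free(dev); /* stands in for kfree(dev) */
}
```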

> +}
> +
> +/* s390 floating irq controller (flic) */
> +struct kvm_device_ops kvm_flic_ops = {
> +	.name = "kvm-flic",
> +	.get_attr = flic_get_attr,
> +	.set_attr = flic_set_attr,
> +	.create = flic_create,
> +	.destroy = flic_destroy,
> +};
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 1e4e7b9..30e2c9a 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -157,6 +157,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_S390_CSS_SUPPORT:
>  	case KVM_CAP_IOEVENTFD:
> +	case KVM_CAP_DEVICE_CTRL:
>  		r = 1;
>  		break;
>  	case KVM_CAP_NR_VCPUS:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7c961e1..2077dd0 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1065,6 +1065,7 @@ struct kvm_device *kvm_device_from_filp(struct file *filp);
>  
>  extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
> +extern struct kvm_device_ops kvm_flic_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 450fae8..fa59f1a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -906,6 +906,7 @@ struct kvm_device_attr {
>  #define KVM_DEV_TYPE_FSL_MPIC_20	1
>  #define KVM_DEV_TYPE_FSL_MPIC_42	2
>  #define KVM_DEV_TYPE_XICS		3
> +#define KVM_DEV_TYPE_FLIC		5
>  
>  /*
>   * ioctls for VM fds
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d469114..dd2cc28 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2270,6 +2270,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
>  		ops = &kvm_xics_ops;
>  		break;
>  #endif
> +#ifdef CONFIG_S390
> +	case KVM_DEV_TYPE_FLIC:
> +		ops = &kvm_flic_ops;
> +		break;
> +#endif
>  	default:
>  		return -ENODEV;
>  	}
> -- 
> 1.8.3.1

--
			Gleb.


* Re: [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work
  2013-10-08 14:54 ` [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work Christian Borntraeger
@ 2013-10-13  8:48   ` Gleb Natapov
  2013-10-13  9:08     ` Gleb Natapov
  0 siblings, 1 reply; 20+ messages in thread
From: Gleb Natapov @ 2013-10-13  8:48 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Dominik Dingel

On Tue, Oct 08, 2013 at 04:54:58PM +0200, Christian Borntraeger wrote:
> From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> 
> kvm_clear_async_pf_completion_queue gets an additional flag to either cancel outstanding
> work or wait for outstanding work to be finished. x86 currently cancels all work.
>
I do not see why x86 would not cancel all work in the future, so the
flag seems to be always true on s390 and always false on x86, which
means that it is better to make it a compile-time option, same as
KVM_ASYNC_PF_SYNC. Actually we can reuse KVM_ASYNC_PF_SYNC in
kvm_clear_async_pf_completion_queue() instead of adding another one.
 
> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/x86/kvm/x86.c       | 8 ++++----
>  include/linux/kvm_host.h | 2 +-
>  virt/kvm/async_pf.c      | 6 +++++-
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 187f824..00a4262 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -539,7 +539,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>  	kvm_x86_ops->set_cr0(vcpu, cr0);
>  
>  	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
> -		kvm_clear_async_pf_completion_queue(vcpu);
> +		kvm_clear_async_pf_completion_queue(vcpu, false);
>  		kvm_async_pf_hash_reset(vcpu);
>  	}
>  
> @@ -1911,7 +1911,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  	vcpu->arch.apf.msr_val = data;
>  
>  	if (!(data & KVM_ASYNC_PF_ENABLED)) {
> -		kvm_clear_async_pf_completion_queue(vcpu);
> +		kvm_clear_async_pf_completion_queue(vcpu, false);
>  		kvm_async_pf_hash_reset(vcpu);
>  		return 0;
>  	}
> @@ -6742,7 +6742,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	kvmclock_reset(vcpu);
>  
> -	kvm_clear_async_pf_completion_queue(vcpu);
> +	kvm_clear_async_pf_completion_queue(vcpu, false);
>  	kvm_async_pf_hash_reset(vcpu);
>  	vcpu->arch.apf.halted = false;
>  
> @@ -7015,7 +7015,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
>  	 * Unpin any mmu pages first.
>  	 */
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> -		kvm_clear_async_pf_completion_queue(vcpu);
> +		kvm_clear_async_pf_completion_queue(vcpu, false);
>  		kvm_unload_vcpu_mmu(vcpu);
>  	}
>  	kvm_for_each_vcpu(i, vcpu, kvm)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b4e8666..223fcf3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -192,7 +192,7 @@ struct kvm_async_pf {
>  	struct page *page;
>  };
>  
> -void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
> +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu, bool flush);
>  void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>  int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
>  		       struct kvm_arch_async_pf *arch);
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 8f57d63..3e13a73 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -107,7 +107,7 @@ static void async_pf_execute(struct work_struct *work)
>  	kvm_put_kvm(vcpu->kvm);
>  }
>  
> -void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu, bool flush)
>  {
>  	/* cancel outstanding work queue item */
>  	while (!list_empty(&vcpu->async_pf.queue)) {
> @@ -115,6 +115,10 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>  			list_entry(vcpu->async_pf.queue.next,
>  				   typeof(*work), queue);
>  		list_del(&work->queue);
> +		if (flush) {
> +			flush_work(&work->work);
> +			continue;
> +		}
>  		if (cancel_work_sync(&work->work)) {
>  			mmdrop(work->mm);
>  			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
> -- 
> 1.8.3.1

--
			Gleb.


* Re: [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work
  2013-10-13  8:48   ` Gleb Natapov
@ 2013-10-13  9:08     ` Gleb Natapov
  0 siblings, 0 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-13  9:08 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Dominik Dingel

On Sun, Oct 13, 2013 at 11:48:03AM +0300, Gleb Natapov wrote:
> On Tue, Oct 08, 2013 at 04:54:58PM +0200, Christian Borntraeger wrote:
> > From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > 
> > kvm_clear_async_pf_completion_queue gets an additional flag to either cancel outstanding
> > work or wait for outstanding work to be finished. x86 currently cancels all work.
> >
> I do not see why x86 would not cancel all work in the future, so the
> flag seems to be always true on s390 and always false on x86, which
> means that it is better to make it a compile-time option, same as
> KVM_ASYNC_PF_SYNC. Actually we can reuse KVM_ASYNC_PF_SYNC in
> kvm_clear_async_pf_completion_queue() instead of adding another one.
>  
Spoke too soon. I see that s390 uses both true and false, mostly false.
Let's add another function, kvm_drain_async_pf_completion_queue(), instead
of a new parameter.
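
A rough userspace model of the split, to show the intended difference
between the two entry points. Everything here is an assumption for
illustration: struct work_queue and the three helpers are hypothetical, and
the completed counter merely stands in for completion interrupts that reach
the guest. Clearing cancels whatever has not run yet, while draining lets
every queued item finish, which is what flush_work() would give us.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK 8

/* Hypothetical stand-in for the per-vcpu async_pf queue. */
struct work_queue {
	int pending[MAX_WORK]; /* 1 = queued but not yet executed */
	size_t n;
	int completed;         /* models delivered completion interrupts */
};

/* Queues one work item; returns -1 if the queue is full. */
static int queue_work_item(struct work_queue *q)
{
	assert(q != NULL);
	if (q->n == MAX_WORK)
		return -1;
	q->pending[q->n++] = 1;
	return 0;
}

/* Existing behaviour: cancel whatever has not run yet. */
static void clear_completion_queue(struct work_queue *q)
{
	assert(q != NULL);
	while (q->n > 0)
		q->pending[--q->n] = 0; /* models cancel_work_sync() */
}

/* Suggested sibling: let every queued item finish first. */
static void drain_completion_queue(struct work_queue *q)
{
	assert(q != NULL);
	while (q->n > 0) {
		q->n--;
		if (q->pending[q->n]) {
			q->completed++;       /* models flush_work() */
			q->pending[q->n] = 0;
		}
	}
}
```

With two functions the x86 call sites stay untouched, and only the s390
migration path needs to call the draining variant.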

> > Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  arch/x86/kvm/x86.c       | 8 ++++----
> >  include/linux/kvm_host.h | 2 +-
> >  virt/kvm/async_pf.c      | 6 +++++-
> >  3 files changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 187f824..00a4262 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -539,7 +539,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> >  	kvm_x86_ops->set_cr0(vcpu, cr0);
> >  
> >  	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
> > -		kvm_clear_async_pf_completion_queue(vcpu);
> > +		kvm_clear_async_pf_completion_queue(vcpu, false);
> >  		kvm_async_pf_hash_reset(vcpu);
> >  	}
> >  
> > @@ -1911,7 +1911,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
> >  	vcpu->arch.apf.msr_val = data;
> >  
> >  	if (!(data & KVM_ASYNC_PF_ENABLED)) {
> > -		kvm_clear_async_pf_completion_queue(vcpu);
> > +		kvm_clear_async_pf_completion_queue(vcpu, false);
> >  		kvm_async_pf_hash_reset(vcpu);
> >  		return 0;
> >  	}
> > @@ -6742,7 +6742,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> >  
> >  	kvmclock_reset(vcpu);
> >  
> > -	kvm_clear_async_pf_completion_queue(vcpu);
> > +	kvm_clear_async_pf_completion_queue(vcpu, false);
> >  	kvm_async_pf_hash_reset(vcpu);
> >  	vcpu->arch.apf.halted = false;
> >  
> > @@ -7015,7 +7015,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
> >  	 * Unpin any mmu pages first.
> >  	 */
> >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> > -		kvm_clear_async_pf_completion_queue(vcpu);
> > +		kvm_clear_async_pf_completion_queue(vcpu, false);
> >  		kvm_unload_vcpu_mmu(vcpu);
> >  	}
> >  	kvm_for_each_vcpu(i, vcpu, kvm)
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index b4e8666..223fcf3 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -192,7 +192,7 @@ struct kvm_async_pf {
> >  	struct page *page;
> >  };
> >  
> > -void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
> > +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu, bool flush);
> >  void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
> >  int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
> >  		       struct kvm_arch_async_pf *arch);
> > diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> > index 8f57d63..3e13a73 100644
> > --- a/virt/kvm/async_pf.c
> > +++ b/virt/kvm/async_pf.c
> > @@ -107,7 +107,7 @@ static void async_pf_execute(struct work_struct *work)
> >  	kvm_put_kvm(vcpu->kvm);
> >  }
> >  
> > -void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> > +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu, bool flush)
> >  {
> >  	/* cancel outstanding work queue item */
> >  	while (!list_empty(&vcpu->async_pf.queue)) {
> > @@ -115,6 +115,10 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> >  			list_entry(vcpu->async_pf.queue.next,
> >  				   typeof(*work), queue);
> >  		list_del(&work->queue);
> > +		if (flush) {
> > +			flush_work(&work->work);
> > +			continue;
> > +		}
> >  		if (cancel_work_sync(&work->work)) {
> >  			mmdrop(work->mm);
> >  			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
> > -- 
> > 1.8.3.1
> 
> --
> 			Gleb.

--
			Gleb.


* Re: [Patchv5 6/7] KVM: async_pf:  Async page fault support on s390
  2013-10-08 14:54 ` [Patchv5 6/7] KVM: async_pf: Async page fault support on s390 Christian Borntraeger
@ 2013-10-13  9:15   ` Gleb Natapov
  2013-10-13  9:30   ` Gleb Natapov
  1 sibling, 0 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-13  9:15 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Dominik Dingel

On Tue, Oct 08, 2013 at 04:54:59PM +0200, Christian Borntraeger wrote:
> From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> 
> This patch enables async page faults for s390 kvm guests.
> It provides the userspace API to enable, disable and disable_wait this feature.
> By providing disable and disable_wait, userspace can first disable the feature
> asynchronously, then continue the live migration and later enforce that the
> feature is off by waiting on it.
> It also includes the diagnose code, called by the guest to enable async page faults.
> 
> The async page faults will use an already existing guest interface for this
> purpose, as described in "CP Programming Services (SC24-6084)".
> 
> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h | 22 +++++++++++
>  arch/s390/include/uapi/asm/kvm.h |  9 +++--
>  arch/s390/kvm/Kconfig            |  2 +
>  arch/s390/kvm/Makefile           |  2 +-
>  arch/s390/kvm/diag.c             | 84 +++++++++++++++++++++++++++++++++++++++
>  arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++++++++++++++----
>  arch/s390/kvm/kvm-s390.c         | 85 +++++++++++++++++++++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h         |  4 ++
>  arch/s390/kvm/sigp.c             |  7 ++++
>  arch/s390/kvm/trace.h            | 46 ++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  2 +
>  11 files changed, 318 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 2d09c1d..151ea01 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -224,6 +224,10 @@ struct kvm_vcpu_arch {
>  		u64		stidp_data;
>  	};
>  	struct gmap *gmap;
> +#define KVM_S390_PFAULT_TOKEN_INVALID	(-1UL)
> +	unsigned long pfault_token;
> +	unsigned long pfault_select;
> +	unsigned long pfault_compare;
>  };
>  
>  struct kvm_vm_stat {
> @@ -250,6 +254,24 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>  	return IS_ERR_VALUE(addr);
>  }
>  
> +#define ASYNC_PF_PER_VCPU	64
> +struct kvm_vcpu;
> +struct kvm_async_pf;
> +struct kvm_arch_async_pf {
> +	unsigned long pfault_token;
> +};
> +
> +bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
> +
> +void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
> +			       struct kvm_async_pf *work);
> +
> +void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> +				     struct kvm_async_pf *work);
> +
> +void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
> +				 struct kvm_async_pf *work);
> +
>  extern int sie64a(struct kvm_s390_sie_block *, u64 *);
>  extern char sie_exit;
>  #endif
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 33d52b8..1e8fced 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -17,9 +17,12 @@
>  #define __KVM_S390
>  
>  /* Device control API: s390-specific devices */
> -#define KVM_DEV_FLIC_DEQUEUE 1
> -#define KVM_DEV_FLIC_ENQUEUE 2
> -#define KVM_DEV_FLIC_CLEAR_IRQS 3
> +#define KVM_DEV_FLIC_DEQUEUE		1
> +#define KVM_DEV_FLIC_ENQUEUE		2
> +#define KVM_DEV_FLIC_CLEAR_IRQS		3
> +#define KVM_DEV_FLIC_APF_ENABLE		4
> +#define KVM_DEV_FLIC_APF_DISABLE	5
> +#define KVM_DEV_FLIC_APF_DISABLE_WAIT	6
>  
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> index 70b46ea..c8bacbc 100644
> --- a/arch/s390/kvm/Kconfig
> +++ b/arch/s390/kvm/Kconfig
> @@ -23,6 +23,8 @@ config KVM
>  	select ANON_INODES
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>  	select HAVE_KVM_EVENTFD
> +	select KVM_ASYNC_PF
> +	select KVM_ASYNC_PF_SYNC
>  	---help---
>  	  Support hosting paravirtualized guest machines using the SIE
>  	  virtualization capability on the mainframe. This should work
> diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
> index 40b4c64..a47d2c3 100644
> --- a/arch/s390/kvm/Makefile
> +++ b/arch/s390/kvm/Makefile
> @@ -7,7 +7,7 @@
>  # as published by the Free Software Foundation.
>  
>  KVM := ../../../virt/kvm
> -common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o
> +common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o  $(KVM)/async_pf.o
>  
>  ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
>  
> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
> index 78d967f..e50aadf 100644
> --- a/arch/s390/kvm/diag.c
> +++ b/arch/s390/kvm/diag.c
> @@ -17,6 +17,7 @@
>  #include "kvm-s390.h"
>  #include "trace.h"
>  #include "trace-s390.h"
> +#include "gaccess.h"
>  
>  static int diag_release_pages(struct kvm_vcpu *vcpu)
>  {
> @@ -46,6 +47,87 @@ static int diag_release_pages(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
> +{
> +	struct prs_parm {
> +		u16 code;
> +		u16 subcode;
> +		u16 parm_len;
> +		u16 parm_version;
> +		u64 token_addr;
> +		u64 select_mask;
> +		u64 compare_mask;
> +		u64 zarch;
> +	};
> +	struct prs_parm parm;
> +	int rc;
> +	u16 rx = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
> +	u16 ry = (vcpu->arch.sie_block->ipa & 0x0f);
> +	unsigned long hva_token = KVM_HVA_ERR_BAD;
> +
> +	if (vcpu->run->s.regs.gprs[rx] & 7)
> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +	if (copy_from_guest(vcpu, &parm, vcpu->run->s.regs.gprs[rx], sizeof(parm)))
> +		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> +	if (parm.parm_version != 2 || parm.parm_len < 5 || parm.code != 0x258)
> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +
> +	switch (parm.subcode) {
> +	case 0: /* TOKEN */
> +		if (vcpu->arch.pfault_token != KVM_S390_PFAULT_TOKEN_INVALID) {
> +			/*
> +			 * The pfault handshake is already done and the
> +			 * token will not be changed. Set return value 8.
> +			 */
> +			vcpu->run->s.regs.gprs[ry] = 8;
> +			return 0;
> +		}
> +
> +		if ((parm.compare_mask & parm.select_mask) != parm.compare_mask ||
> +		    parm.token_addr & 7 || parm.zarch != 0x8000000000000000ULL)
> +			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +
> +		hva_token = gfn_to_hva(vcpu->kvm, gpa_to_gfn(parm.token_addr));
> +		if (kvm_is_error_hva(hva_token))
> +			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> +
> +		vcpu->arch.pfault_token = parm.token_addr;
> +		vcpu->arch.pfault_select = parm.select_mask;
> +		vcpu->arch.pfault_compare = parm.compare_mask;
> +		vcpu->run->s.regs.gprs[ry] = 0;
> +		rc = 0;
> +		break;
> +	case 1: /*
> +		 * CANCEL
> +		 * The specification allows already pending tokens to survive
> +		 * the cancel. To reduce code complexity, we assume that all
> +		 * outstanding tokens are already pending.
> +		 */
> +		if (parm.token_addr || parm.select_mask || parm.compare_mask ||
> +		    parm.zarch)
> +			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +
> +		vcpu->run->s.regs.gprs[ry] = 0;
> +		/*
> +		 * If pfault handling was not established or was already
> +		 * canceled, report that to the guest with return
> +		 * value 4.
> +		 */
> +		if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID)
> +			vcpu->run->s.regs.gprs[ry] = 4;
> +		else
> +			vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
> +
> +		rc = 0;
> +		break;
> +	default:
> +		rc = -EOPNOTSUPP;
> +		break;
> +	}
> +
> +	return rc;
> +}
> +
>  static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
>  {
>  	VCPU_EVENT(vcpu, 5, "%s", "diag time slice end");
> @@ -150,6 +232,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>  		return __diag_time_slice_end(vcpu);
>  	case 0x9c:
>  		return __diag_time_slice_end_directed(vcpu);
> +	case 0x258:
> +		return __diag_page_ref_service(vcpu);
>  	case 0x308:
>  		return __diag_ipl_functions(vcpu);
>  	case 0x500:
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 66478a0..18e39d4 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -31,7 +31,7 @@ static int is_ioint(u64 type)
>  	return ((type & 0xfffe0000u) != 0xfffe0000u);
>  }
>  
> -static int psw_extint_disabled(struct kvm_vcpu *vcpu)
> +int psw_extint_disabled(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT);
>  }
> @@ -78,11 +78,8 @@ static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
>  			return 1;
>  		return 0;
>  	case KVM_S390_INT_SERVICE:
> -		if (psw_extint_disabled(vcpu))
> -			return 0;
> -		if (vcpu->arch.sie_block->gcr[0] & 0x200ul)
> -			return 1;
> -		return 0;
> +	case KVM_S390_INT_PFAULT_INIT:
> +	case KVM_S390_INT_PFAULT_DONE:
>  	case KVM_S390_INT_VIRTIO:
>  		if (psw_extint_disabled(vcpu))
>  			return 0;
> @@ -150,6 +147,8 @@ static void __set_intercept_indicator(struct kvm_vcpu *vcpu,
>  	case KVM_S390_INT_EXTERNAL_CALL:
>  	case KVM_S390_INT_EMERGENCY:
>  	case KVM_S390_INT_SERVICE:
> +	case KVM_S390_INT_PFAULT_INIT:
> +	case KVM_S390_INT_PFAULT_DONE:
>  	case KVM_S390_INT_VIRTIO:
>  		if (psw_extint_disabled(vcpu))
>  			__set_cpuflag(vcpu, CPUSTAT_EXT_INT);
> @@ -223,6 +222,30 @@ static void __do_deliver_interrupt(struct kvm_vcpu *vcpu,
>  		rc |= put_guest(vcpu, inti->ext.ext_params,
>  				(u32 __user *)__LC_EXT_PARAMS);
>  		break;
> +	case KVM_S390_INT_PFAULT_INIT:
> +		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, inti->type, 0,
> +						 inti->ext.ext_params2);
> +		rc  = put_guest(vcpu, 0x2603, (u16 __user *) __LC_EXT_INT_CODE);
> +		rc |= put_guest(vcpu, 0x0600, (u16 __user *) __LC_EXT_CPU_ADDR);
> +		rc |= copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
> +				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +		rc |= copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
> +				      __LC_EXT_NEW_PSW, sizeof(psw_t));
> +		rc |= put_guest(vcpu, inti->ext.ext_params2,
> +				(u64 __user *) __LC_EXT_PARAMS2);
> +		break;
> +	case KVM_S390_INT_PFAULT_DONE:
> +		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, inti->type, 0,
> +						 inti->ext.ext_params2);
> +		rc  = put_guest(vcpu, 0x2603, (u16 __user *) __LC_EXT_INT_CODE);
> +		rc |= put_guest(vcpu, 0x0680, (u16 __user *) __LC_EXT_CPU_ADDR);
> +		rc |= copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
> +				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +		rc |= copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
> +				      __LC_EXT_NEW_PSW, sizeof(psw_t));
> +		rc |= put_guest(vcpu, inti->ext.ext_params2,
> +				(u64 __user *) __LC_EXT_PARAMS2);
> +		break;
>  	case KVM_S390_INT_VIRTIO:
>  		VCPU_EVENT(vcpu, 4, "interrupt: virtio parm:%x,parm64:%llx",
>  			   inti->ext.ext_params, inti->ext.ext_params2);
> @@ -357,7 +380,7 @@ static int __try_deliver_ckc_interrupt(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> -static int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
> +int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
>  	struct kvm_s390_float_interrupt *fi = vcpu->arch.local_int.float_int;
> @@ -724,6 +747,10 @@ int kvm_s390_inject_vm(struct kvm *kvm,
>  		VM_EVENT(kvm, 5, "inject: sclp parm:%x", s390int->parm);
>  		inti->ext.ext_params = s390int->parm;
>  		break;
> +	case KVM_S390_INT_PFAULT_DONE:
> +		inti->type = s390int->type;
> +		inti->ext.ext_params2 = s390int->parm64;
> +		break;
>  	case KVM_S390_MCHK:
>  		VM_EVENT(kvm, 5, "inject: machine check parm64:%llx",
>  			 s390int->parm64);
> @@ -811,6 +838,10 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
>  		inti->type = s390int->type;
>  		inti->mchk.mcic = s390int->parm64;
>  		break;
> +	case KVM_S390_INT_PFAULT_INIT:
> +		inti->type = s390int->type;
> +		inti->ext.ext_params2 = s390int->parm64;
> +		break;
>  	case KVM_S390_INT_VIRTIO:
>  	case KVM_S390_INT_SERVICE:
>  	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> @@ -866,6 +897,8 @@ static inline int copy_irq_to_user(struct kvm_s390_interrupt_info *inti,
>  	int r = 0;
>  
>  	switch (inti->type) {
> +	case KVM_S390_INT_PFAULT_INIT:
> +	case KVM_S390_INT_PFAULT_DONE:
>  	case KVM_S390_INT_VIRTIO:
>  	case KVM_S390_INT_SERVICE:
>  		source = &inti->ext;
> @@ -946,6 +979,8 @@ static inline int copy_irq_from_user(struct kvm_s390_interrupt_info *inti,
>  	if (get_user(inti->type, (u64 __user *)addr))
>  		return -EFAULT;
>  	switch (inti->type) {
> +	case KVM_S390_INT_PFAULT_INIT:
> +	case KVM_S390_INT_PFAULT_DONE:
>  	case KVM_S390_INT_VIRTIO:
>  	case KVM_S390_INT_SERVICE:
>  		target = (void *) &inti->ext;
> @@ -996,6 +1031,8 @@ static int enqueue_floating_irq(struct kvm_device *dev,
>  static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  {
>  	int r = 0;
> +	unsigned int i;
> +	struct kvm_vcpu *vcpu;
>  
>  	switch (attr->group) {
>  	case KVM_DEV_FLIC_ENQUEUE:
> @@ -1005,6 +1042,23 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  		r = 0;
>  		clear_floating_interrupts(dev->kvm);
>  		break;
> +	case KVM_DEV_FLIC_APF_ENABLE:
> +		dev->kvm->arch.gmap->pfault_enabled = 1;
> +		break;
> +	case KVM_DEV_FLIC_APF_DISABLE:
> +		dev->kvm->arch.gmap->pfault_enabled = 0;
> +		break;
> +	case KVM_DEV_FLIC_APF_DISABLE_WAIT:
> +		dev->kvm->arch.gmap->pfault_enabled = 0;
> +		/*
> +		 * Make sure no async faults are in transition when
> +		 * clearing the queues. So we don't need to worry
> +		 * about late coming workers.
> +		 */
> +		synchronize_srcu(&dev->kvm->srcu);
> +		kvm_for_each_vcpu(i, vcpu, dev->kvm)
> +			kvm_clear_async_pf_completion_queue(vcpu, true);
> +		break;
>  	default:
>  		r = -EINVAL;
>  	}
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 785e36e..c4f92f6 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -152,6 +152,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  #ifdef CONFIG_KVM_S390_UCONTROL
>  	case KVM_CAP_S390_UCONTROL:
>  #endif
> +	case KVM_CAP_ASYNC_PF:
>  	case KVM_CAP_SYNC_REGS:
>  	case KVM_CAP_ONE_REG:
>  	case KVM_CAP_ENABLE_CAP:
> @@ -273,6 +274,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
>  	trace_kvm_s390_destroy_vcpu(vcpu->vcpu_id);
> +	kvm_clear_async_pf_completion_queue(vcpu, false);
>  	if (!kvm_is_ucontrol(vcpu->kvm)) {
>  		clear_bit(63 - vcpu->vcpu_id,
>  			  (unsigned long *) &vcpu->kvm->arch.sca->mcn);
> @@ -322,6 +324,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  /* Section: vcpu related */
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  {
> +	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
> +	kvm_clear_async_pf_completion_queue(vcpu, false);
>  	if (kvm_is_ucontrol(vcpu->kvm)) {
>  		vcpu->arch.gmap = gmap_alloc(current->mm);
>  		if (!vcpu->arch.gmap)
> @@ -379,6 +383,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.guest_fpregs.fpc = 0;
>  	asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc));
>  	vcpu->arch.sie_block->gbea = 1;
> +	vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
>  	atomic_set_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
>  }
>  
> @@ -702,10 +707,84 @@ static long kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
>  	return rc;
>  }
>  
> +static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
> +				      unsigned long token)
> +{
> +	struct kvm_s390_interrupt inti;
> +	inti.parm64 = token;
> +
> +	if (start_token) {
> +		inti.type = KVM_S390_INT_PFAULT_INIT;
> +		WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &inti));
> +	} else {
> +		inti.type = KVM_S390_INT_PFAULT_DONE;
> +		WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
> +	}
> +}
> +
> +void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> +				     struct kvm_async_pf *work)
> +{
> +	trace_kvm_s390_pfault_init(vcpu, work->arch.pfault_token);
> +	__kvm_inject_pfault_token(vcpu, true, work->arch.pfault_token);
> +}
> +
> +void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
> +				 struct kvm_async_pf *work)
> +{
> +	trace_kvm_s390_pfault_done(vcpu, work->arch.pfault_token);
> +	__kvm_inject_pfault_token(vcpu, false, work->arch.pfault_token);
> +}
> +
> +void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
> +			       struct kvm_async_pf *work)
> +{
> +	/* s390 will always inject the page directly */
> +}
> +
> +bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * s390 will always inject the page directly,
> +	 * but we still want check_async_completion to cleanup
> +	 */
> +	return true;
> +}
> +
> +static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
> +{
> +	hva_t hva;
> +	struct kvm_arch_async_pf arch;
> +	int rc;
> +
> +	if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID)
> +		return 0;
> +	if ((vcpu->arch.sie_block->gpsw.mask & vcpu->arch.pfault_select) !=
> +	    vcpu->arch.pfault_compare)
> +		return 0;
> +	if (psw_extint_disabled(vcpu))
> +		return 0;
> +	if (kvm_cpu_has_interrupt(vcpu))
> +		return 0;
> +	if (!(vcpu->arch.sie_block->gcr[0] & 0x200ul))
> +		return 0;
> +	if (!vcpu->arch.gmap->pfault_enabled)
> +		return 0;
> +
> +	hva = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
> +	if (copy_from_guest(vcpu, &arch.pfault_token, vcpu->arch.pfault_token, 8))
> +		return 0;
> +
> +	rc = kvm_setup_async_pf(vcpu, current->thread.gmap_addr, hva, &arch);
> +	return rc;
> +}
> +
>  static int vcpu_pre_run(struct kvm_vcpu *vcpu)
>  {
>  	int rc, cpuflags;
>  
> +	kvm_check_async_pf_completion(vcpu);
> +
We need this here only to put_page() the page that was GUPed in
async_pf_execute(). There is a patch[1] that makes this GUP not
take a reference to the page. I think this will simplify the s390
implementation: there will be no need to put async_pf work into
vcpu->async_pf.done, and this call will not be needed either.

[1] https://lkml.org/lkml/2013/10/10/282
>  	memcpy(&vcpu->arch.sie_block->gg14, &vcpu->run->s.regs.gprs[14], 16);
>  
>  	if (need_resched())
> @@ -743,9 +822,11 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>  		if (kvm_is_ucontrol(vcpu->kvm)) {
>  			rc = SIE_INTERCEPT_UCONTROL;
>  		} else if (current->thread.gmap_pfault) {
> +			trace_kvm_s390_major_guest_pfault(vcpu);
>  			current->thread.gmap_pfault = 0;
> -			if (kvm_arch_fault_in_sync(vcpu) >= 0)
> -				rc = 0;
> +			if (kvm_arch_setup_async_pf(vcpu) ||
> +			    (kvm_arch_fault_in_sync(vcpu) >= 0))
> +					rc = 0;
>  		}
>  	}
>  
> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> index b44912a..c2bf916 100644
> --- a/arch/s390/kvm/kvm-s390.h
> +++ b/arch/s390/kvm/kvm-s390.h
> @@ -159,4 +159,8 @@ void exit_sie_sync(struct kvm_vcpu *vcpu);
>  /* implemented in diag.c */
>  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
>  
> +/* implemented in interrupt.c */
> +int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
> +int psw_extint_disabled(struct kvm_vcpu *vcpu);
> +
>  #endif
> diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
> index bec398c..8fcc5c6 100644
> --- a/arch/s390/kvm/sigp.c
> +++ b/arch/s390/kvm/sigp.c
> @@ -186,6 +186,13 @@ int kvm_s390_inject_sigp_stop(struct kvm_vcpu *vcpu, int action)
>  static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
>  {
>  	int rc;
> +	unsigned int i;
> +	struct kvm_vcpu *vcpu_to_set;
> +
> +	kvm_for_each_vcpu(i, vcpu_to_set, vcpu->kvm) {
> +		vcpu_to_set->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
> +		kvm_clear_async_pf_completion_queue(vcpu, false);
> +	}
>  
>  	switch (parameter & 0xff) {
>  	case 0:
> diff --git a/arch/s390/kvm/trace.h b/arch/s390/kvm/trace.h
> index c2f582bb1c..e4816a5 100644
> --- a/arch/s390/kvm/trace.h
> +++ b/arch/s390/kvm/trace.h
> @@ -29,6 +29,52 @@
>  	TP_printk("%02d[%016lx-%016lx]: " p_str, __entry->id,		\
>  		  __entry->pswmask, __entry->pswaddr, p_args)
>  
> +TRACE_EVENT(kvm_s390_major_guest_pfault,
> +	    TP_PROTO(VCPU_PROTO_COMMON),
> +	    TP_ARGS(VCPU_ARGS_COMMON),
> +
> +	    TP_STRUCT__entry(
> +		    VCPU_FIELD_COMMON
> +		    ),
> +
> +	    TP_fast_assign(
> +		    VCPU_ASSIGN_COMMON
> +		    ),
> +	    VCPU_TP_PRINTK("%s", "major fault, maybe applicable for pfault")
> +	);
> +
> +TRACE_EVENT(kvm_s390_pfault_init,
> +	    TP_PROTO(VCPU_PROTO_COMMON, long pfault_token),
> +	    TP_ARGS(VCPU_ARGS_COMMON, pfault_token),
> +
> +	    TP_STRUCT__entry(
> +		    VCPU_FIELD_COMMON
> +		    __field(long, pfault_token)
> +		    ),
> +
> +	    TP_fast_assign(
> +		    VCPU_ASSIGN_COMMON
> +		    __entry->pfault_token = pfault_token;
> +		    ),
> +	    VCPU_TP_PRINTK("init pfault token %ld", __entry->pfault_token)
> +	);
> +
> +TRACE_EVENT(kvm_s390_pfault_done,
> +	    TP_PROTO(VCPU_PROTO_COMMON, long pfault_token),
> +	    TP_ARGS(VCPU_ARGS_COMMON, pfault_token),
> +
> +	    TP_STRUCT__entry(
> +		    VCPU_FIELD_COMMON
> +		    __field(long, pfault_token)
> +		    ),
> +
> +	    TP_fast_assign(
> +		    VCPU_ASSIGN_COMMON
> +		    __entry->pfault_token = pfault_token;
> +		    ),
> +	    VCPU_TP_PRINTK("done pfault token %ld", __entry->pfault_token)
> +	);
> +
>  /*
>   * Tracepoints for SIE entry and exit.
>   */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index fa59f1a..5c7cfc0 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -413,6 +413,8 @@ struct kvm_s390_psw {
>  #define KVM_S390_PROGRAM_INT		0xfffe0001u
>  #define KVM_S390_SIGP_SET_PREFIX	0xfffe0002u
>  #define KVM_S390_RESTART		0xfffe0003u
> +#define KVM_S390_INT_PFAULT_INIT	0xfffe0004u
> +#define KVM_S390_INT_PFAULT_DONE	0xfffe0005u
>  #define KVM_S390_MCHK			0xfffe1000u
>  #define KVM_S390_INT_VIRTIO		0xffff2603u
>  #define KVM_S390_INT_SERVICE		0xffff2401u
> -- 
> 1.8.3.1

--
			Gleb.


* Re: [Patchv5 6/7] KVM: async_pf:  Async page fault support on s390
  2013-10-08 14:54 ` [Patchv5 6/7] KVM: async_pf: Async page fault support on s390 Christian Borntraeger
  2013-10-13  9:15   ` Gleb Natapov
@ 2013-10-13  9:30   ` Gleb Natapov
  1 sibling, 0 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-13  9:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Dominik Dingel

On Tue, Oct 08, 2013 at 04:54:59PM +0200, Christian Borntraeger wrote:
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 33d52b8..1e8fced 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -17,9 +17,12 @@
>  #define __KVM_S390
>  
>  /* Device control API: s390-specific devices */
> -#define KVM_DEV_FLIC_DEQUEUE 1
> -#define KVM_DEV_FLIC_ENQUEUE 2
> -#define KVM_DEV_FLIC_CLEAR_IRQS 3
Those were introduced in patch 2. Fix spaces there please.

> +#define KVM_DEV_FLIC_DEQUEUE		1
> +#define KVM_DEV_FLIC_ENQUEUE		2
> +#define KVM_DEV_FLIC_CLEAR_IRQS		3
> +#define KVM_DEV_FLIC_APF_ENABLE		4
> +#define KVM_DEV_FLIC_APF_DISABLE	5
> +#define KVM_DEV_FLIC_APF_DISABLE_WAIT	6
Those need to be documented in
Documentation/virtual/kvm/devices/s390_flic.txt

--
			Gleb.


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-13  8:39   ` Gleb Natapov
@ 2013-10-14  7:58     ` Christian Borntraeger
  2013-10-14 10:21       ` Gleb Natapov
  2013-10-14  8:28     ` Jens Freimann
  1 sibling, 1 reply; 20+ messages in thread
From: Christian Borntraeger @ 2013-10-14  7:58 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Jens Freimann

On 13/10/13 10:39, Gleb Natapov wrote:
> On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
>> From: Jens Freimann <jfrei@linux.vnet.ibm.com>
>>
>> This patch adds a floating irq controller as a kvm_device.
>> It will be necessary for migration of floating interrupts as well
>> as for hardening the reset code by allowing user space to explicitly
>> remove all pending floating interrupts.
>>
>> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
>> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
>>  arch/s390/include/asm/kvm_host.h                |   1 +
>>  arch/s390/include/uapi/asm/kvm.h                |   5 +
>>  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
>>  arch/s390/kvm/kvm-s390.c                        |   1 +
>>  include/linux/kvm_host.h                        |   1 +
>>  include/uapi/linux/kvm.h                        |   1 +
>>  virt/kvm/kvm_main.c                             |   5 +
>>  8 files changed, 295 insertions(+), 51 deletions(-)
>>  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
>>
>> diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
>> new file mode 100644
>> index 0000000..06aef31
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
>> @@ -0,0 +1,36 @@
>> +FLIC (floating interrupt controller)
>> +====================================
>> +
>> +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
>> +machine check interruptions. All interrupts are stored in a per-vm list of
>> +pending interrupts. FLIC performs operations on this list.
>> +
>> +Only one FLIC instance may be instantiated.
>> +
>> +FLIC provides support to
>> +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
>> +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
>> +
>> +Groups:
>> +  KVM_DEV_FLIC_ENQUEUE
>> +    Adds one interrupt to the list of pending floating interrupts. Interrupts
>> +    are taken from this list for injection into the guest. attr contains
>> +    a struct kvm_s390_irq which contains all data relevant for
>> +    interrupt injection.
>> +    The format of the data structure kvm_s390_irq as it is copied from userspace
>> +    is defined in usr/include/linux/kvm.h.
>> +    For historic reasons list members are stored in a different data structure, i.e.
>> +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
>> +    which can then be added to the list.
>> +
>> +  KVM_DEV_FLIC_DEQUEUE
>> +    Takes one element off the pending interrupts list and copies it into userspace.
>> +    Dequeued interrupts are not injected into the guest.
>> +    attr->addr contains the userspace address of a struct kvm_s390_irq.
>> +    List elements are stored in the format of struct kvm_s390_interrupt_info
>> +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
>> +    (usr/include/linux/kvm.h)
>> +
> Can interrupt be dequeued on real HW also? When this interface will be
> used?

This is used for migration. (I will send the qemu patches soon.)

The thing is that we don't have classic interrupt lines from a software perspective. We have
external interrupts, I/O interrupts, machine check interrupts, program interrupts, restart
interrupts, and supervisor call interrupts. Several interrupts are cpu local (restart, supervisor
call, program check interrupts). This is simple, because only one interrupt can be pending
at a CPU.

There are several types of external interrupts. Some are cpu local (after a sigp --> IPI)
others are floating (pending on all CPUs).

All I/O interrupts are floating. The issue is that each classic I/O interrupt has a 12
byte chunk of per-interrupt payload. (There is an additional interrupt response block that has
to be queried by the guest with TSCH.)
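
For illustration, the 12 byte payload can be pictured as the four fields that
kvm_s390_inject_vm() fills in for an I/O interrupt. This is only a sketch:
struct io_payload and io_payload_from_parms() are hypothetical names, and the
io_int_word assignment is an assumption, because that line falls outside the
quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative layout only. Field names follow the inti->io.* accesses
 * in the patch; the struct name itself is hypothetical.
 */
struct io_payload {
	uint16_t subchannel_id;
	uint16_t subchannel_nr;
	uint32_t io_int_parm;
	uint32_t io_int_word;
};

/* 2 + 2 + 4 + 4 bytes, no padding needed: the 12 byte chunk. */
_Static_assert(sizeof(struct io_payload) == 12, "payload must be 12 bytes");

/*
 * Split the legacy parm/parm64 pair the way the quoted
 * kvm_s390_inject_vm() hunk does. The io_int_word line is assumed
 * to take the low 32 bits of parm64.
 */
static struct io_payload io_payload_from_parms(uint32_t parm, uint64_t parm64)
{
	struct io_payload p;

	p.subchannel_id = (uint16_t)(parm >> 16);
	p.subchannel_nr = (uint16_t)(parm & 0x0000ffffu);
	p.io_int_parm = (uint32_t)(parm64 >> 32);
	p.io_int_word = (uint32_t)(parm64 & 0x00000000ffffffffull);
	return p;
}
```

With parm = 0x00010002 and parm64 = 0xdeadbeef12345678 this yields subchannel
id 1, subchannel number 2, interruption parameter 0xdeadbeef and interruption
word 0x12345678.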

Since we can have up to 256k devices per guest, we could in theory have up to 256k classic
interrupts with different payload pending (plus machine checks, plus other floating external
interrupts).
We don't want to always dump this big queue, therefore we decided to keep these in a list.
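
A minimal user-space model of that list, as a sketch only: the types and
function names below are hypothetical stand-ins, not the KVM structures, and
locking and vcpu wakeup are left out. It mirrors the ordering rule of
__inject_vm() in this patch (non-I/O entries go to the tail, I/O entries stay
sorted by their isc key) and the take-the-first-element behaviour of
KVM_DEV_FLIC_DEQUEUE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_s390_interrupt_info. */
struct pending_irq {
	struct pending_irq *prev, *next;
	int is_io;         /* models is_ioint(type) */
	uint64_t isc_bits; /* opaque sort key, models int_word_to_isc_bits() */
	uint64_t payload;  /* stands in for the per-interrupt data */
};

/* Circular doubly linked list with a sentinel, like a kernel list_head. */
struct float_list {
	struct pending_irq head;
};

static void float_list_init(struct float_list *fl)
{
	fl->head.prev = fl->head.next = &fl->head;
}

/*
 * Add one pending interrupt. Non-I/O interrupts are appended; an I/O
 * interrupt is inserted before the first I/O entry with a larger key,
 * so equal keys keep their arrival order.
 */
static int float_list_enqueue(struct float_list *fl, int is_io,
			      uint64_t isc_bits, uint64_t payload)
{
	struct pending_irq *n = calloc(1, sizeof(*n));
	struct pending_irq *pos = &fl->head; /* default: tail */

	if (!n)
		return -1;
	n->is_io = is_io;
	n->isc_bits = isc_bits;
	n->payload = payload;

	if (is_io) {
		for (pos = fl->head.next; pos != &fl->head; pos = pos->next)
			if (pos->is_io && pos->isc_bits > isc_bits)
				break;
	}
	/* Link n in front of pos (in front of the sentinel means tail). */
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
	return 0;
}

/* Take the first element off the list; -1 if nothing is pending. */
static int float_list_dequeue(struct float_list *fl, struct pending_irq *out)
{
	struct pending_irq *first = fl->head.next;

	if (first == &fl->head)
		return -1;
	first->prev->next = first->next;
	first->next->prev = first->prev;
	*out = *first;
	out->prev = out->next = NULL;
	free(first);
	return 0;
}
```

For migration, user space would call the dequeue operation in a loop until it
reports an empty list, transfer the entries, and enqueue them again on the
target.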

I think Jens will give some introduction about s390 architecture at the KVM forum this year.


PS: There are some upcoming changes that will use adapter interrupts for virtio (I/O interrupts
with limited payload that do interrupt coalescing for all pending devices). Still, we need to
be able to handle the classic interrupts as well for ccw configuration handling.


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-13  8:39   ` Gleb Natapov
  2013-10-14  7:58     ` Christian Borntraeger
@ 2013-10-14  8:28     ` Jens Freimann
  2013-10-14  9:07       ` Gleb Natapov
  1 sibling, 1 reply; 20+ messages in thread
From: Jens Freimann @ 2013-10-14  8:28 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christian Borntraeger, Paolo Bonzini, Cornelia Huck,
	Heiko Carstens, Martin Schwidefsky, KVM, linux-s390

On Sun, Oct 13, 2013 at 11:39:55AM +0300, Gleb Natapov wrote:
> On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> > From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > 
> > This patch adds a floating irq controller as a kvm_device.
> > It will be necessary for migration of floating interrupts as well
> > as for hardening the reset code by allowing user space to explicitly
> > remove all pending floating interrupts.
> > 
> > Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
> >  arch/s390/include/asm/kvm_host.h                |   1 +
> >  arch/s390/include/uapi/asm/kvm.h                |   5 +
> >  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
> >  arch/s390/kvm/kvm-s390.c                        |   1 +
> >  include/linux/kvm_host.h                        |   1 +
> >  include/uapi/linux/kvm.h                        |   1 +
> >  virt/kvm/kvm_main.c                             |   5 +
> >  8 files changed, 295 insertions(+), 51 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> > 
> > diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> > new file mode 100644
> > index 0000000..06aef31
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> > @@ -0,0 +1,36 @@
> > +FLIC (floating interrupt controller)
> > +====================================
> > +
> > +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> > +machine check interruptions. All interrupts are stored in a per-vm list of
> > +pending interrupts. FLIC performs operations on this list.
> > +
> > +Only one FLIC instance may be instantiated.
> > +
> > +FLIC provides support to
> > +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> > +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> > +
> > +Groups:
> > +  KVM_DEV_FLIC_ENQUEUE
> > +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> > +    are taken from this list for injection into the guest. attr contains
> > +    a struct kvm_s390_irq which contains all data relevant for
> > +    interrupt injection.
> > +    The format of the data structure kvm_s390_irq as it is copied from userspace
> > +    is defined in usr/include/linux/kvm.h.
> > +    For historic reasons list members are stored in a different data structure, i.e.
> > +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> > +    which can then be added to the list.
> > +
> > +  KVM_DEV_FLIC_DEQUEUE
> > +    Takes one element off the pending interrupts list and copies it into userspace.
> > +    Dequeued interrupts are not injected into the guest.
> > +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> > +    List elements are stored in the format of struct kvm_s390_interrupt_info
> > +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> > +    (usr/include/linux/kvm.h)
> > +
> Can interrupt be dequeued on real HW also? When this interface will be
> used?

We will use it for migration. (See Christian's mail.)
 
> > +  KVM_DEV_FLIC_CLEAR_IRQS
> > +    Simply deletes all elements from the list of currently pending floating interrupts.
> > +    No interrupts are injected into the guest.
> > diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> > index 78b6918..2d09c1d 100644
> > --- a/arch/s390/include/asm/kvm_host.h
> > +++ b/arch/s390/include/asm/kvm_host.h
> > @@ -237,6 +237,7 @@ struct kvm_arch{
> >  	struct sca_block *sca;
> >  	debug_info_t *dbf;
> >  	struct kvm_s390_float_interrupt float_int;
> > +	struct kvm_device *flic;
> >  	struct gmap *gmap;
> >  	int css_support;
> >  };
> > diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> > index d25da59..33d52b8 100644
> > --- a/arch/s390/include/uapi/asm/kvm.h
> > +++ b/arch/s390/include/uapi/asm/kvm.h
> > @@ -16,6 +16,11 @@
> >  
> >  #define __KVM_S390
> >  
> > +/* Device control API: s390-specific devices */
> > +#define KVM_DEV_FLIC_DEQUEUE 1
> > +#define KVM_DEV_FLIC_ENQUEUE 2
> > +#define KVM_DEV_FLIC_CLEAR_IRQS 3
> > +
> >  /* for KVM_GET_REGS and KVM_SET_REGS */
> >  struct kvm_regs {
> >  	/* general purpose regs for s390 */
> > diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> > index e7323cd..66478a0 100644
> > --- a/arch/s390/kvm/interrupt.c
> > +++ b/arch/s390/kvm/interrupt.c
> > @@ -659,53 +659,85 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
> >  	return inti;
> >  }
> >  
> > -int kvm_s390_inject_vm(struct kvm *kvm,
> > -		       struct kvm_s390_interrupt *s390int)
> > +static void __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
> >  {
> >  	struct kvm_s390_local_interrupt *li;
> >  	struct kvm_s390_float_interrupt *fi;
> > -	struct kvm_s390_interrupt_info *inti, *iter;
> > +	struct kvm_s390_interrupt_info *iter;
> >  	int sigcpu;
> >  
> > +	mutex_lock(&kvm->lock);
> > +	fi = &kvm->arch.float_int;
> > +	spin_lock(&fi->lock);
> > +	if (!is_ioint(inti->type)) {
> > +		list_add_tail(&inti->list, &fi->list);
> > +	} else {
> > +		u64 isc_bits = int_word_to_isc_bits(inti->io.io_int_word);
> > +
> > +		/* Keep I/O interrupts sorted in isc order. */
> > +		list_for_each_entry(iter, &fi->list, list) {
> > +			if (!is_ioint(iter->type))
> > +				continue;
> > +			if (int_word_to_isc_bits(iter->io.io_int_word) <= isc_bits)
> > +				continue;
> > +			break;
> > +		}
> > +		list_add_tail(&inti->list, &iter->list);
> > +	}
> > +	atomic_set(&fi->active, 1);
> > +	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
> > +	if (sigcpu == KVM_MAX_VCPUS) {
> > +		do {
> > +			sigcpu = fi->next_rr_cpu++;
> > +			if (sigcpu == KVM_MAX_VCPUS)
> > +				sigcpu = fi->next_rr_cpu = 0;
> > +		} while (fi->local_int[sigcpu] == NULL);
> > +	}
> > +	li = fi->local_int[sigcpu];
> > +	spin_lock_bh(&li->lock);
> > +	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
> > +	if (waitqueue_active(li->wq))
> > +		wake_up_interruptible(li->wq);
> > +	spin_unlock_bh(&li->lock);
> > +	spin_unlock(&fi->lock);
> > +	mutex_unlock(&kvm->lock);
> > +}
> > +
> > +int kvm_s390_inject_vm(struct kvm *kvm,
> > +		       struct kvm_s390_interrupt *s390int)
> > +{
> > +	struct kvm_s390_interrupt_info *inti;
> > +
> >  	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
> >  	if (!inti)
> >  		return -ENOMEM;
> >  
> > -	switch (s390int->type) {
> > +	inti->type = s390int->type;
> > +	switch (inti->type) {
> >  	case KVM_S390_INT_VIRTIO:
> >  		VM_EVENT(kvm, 5, "inject: virtio parm:%x,parm64:%llx",
> >  			 s390int->parm, s390int->parm64);
> > -		inti->type = s390int->type;
> >  		inti->ext.ext_params = s390int->parm;
> >  		inti->ext.ext_params2 = s390int->parm64;
> >  		break;
> >  	case KVM_S390_INT_SERVICE:
> >  		VM_EVENT(kvm, 5, "inject: sclp parm:%x", s390int->parm);
> > -		inti->type = s390int->type;
> >  		inti->ext.ext_params = s390int->parm;
> >  		break;
> > -	case KVM_S390_PROGRAM_INT:
> > -	case KVM_S390_SIGP_STOP:
> > -	case KVM_S390_INT_EXTERNAL_CALL:
> > -	case KVM_S390_INT_EMERGENCY:
> > -		kfree(inti);
> > -		return -EINVAL;
> >  	case KVM_S390_MCHK:
> >  		VM_EVENT(kvm, 5, "inject: machine check parm64:%llx",
> >  			 s390int->parm64);
> > -		inti->type = s390int->type;
> >  		inti->mchk.cr14 = s390int->parm; /* upper bits are not used */
> >  		inti->mchk.mcic = s390int->parm64;
> >  		break;
> >  	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> > -		if (s390int->type & IOINT_AI_MASK)
> > +		if (inti->type & IOINT_AI_MASK)
> >  			VM_EVENT(kvm, 5, "%s", "inject: I/O (AI)");
> >  		else
> >  			VM_EVENT(kvm, 5, "inject: I/O css %x ss %x schid %04x",
> >  				 s390int->type & IOINT_CSSID_MASK,
> >  				 s390int->type & IOINT_SSID_MASK,
> >  				 s390int->type & IOINT_SCHID_MASK);
> > -		inti->type = s390int->type;
> >  		inti->io.subchannel_id = s390int->parm >> 16;
> >  		inti->io.subchannel_nr = s390int->parm & 0x0000ffffu;
> >  		inti->io.io_int_parm = s390int->parm64 >> 32;
> > @@ -718,42 +750,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
> >  	trace_kvm_s390_inject_vm(s390int->type, s390int->parm, s390int->parm64,
> >  				 2);
> >  
> > -	mutex_lock(&kvm->lock);
> > -	fi = &kvm->arch.float_int;
> > -	spin_lock(&fi->lock);
> > -	if (!is_ioint(inti->type))
> > -		list_add_tail(&inti->list, &fi->list);
> > -	else {
> > -		u64 isc_bits = int_word_to_isc_bits(inti->io.io_int_word);
> > -
> > -		/* Keep I/O interrupts sorted in isc order. */
> > -		list_for_each_entry(iter, &fi->list, list) {
> > -			if (!is_ioint(iter->type))
> > -				continue;
> > -			if (int_word_to_isc_bits(iter->io.io_int_word)
> > -			    <= isc_bits)
> > -				continue;
> > -			break;
> > -		}
> > -		list_add_tail(&inti->list, &iter->list);
> > -	}
> > -	atomic_set(&fi->active, 1);
> > -	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
> > -	if (sigcpu == KVM_MAX_VCPUS) {
> > -		do {
> > -			sigcpu = fi->next_rr_cpu++;
> > -			if (sigcpu == KVM_MAX_VCPUS)
> > -				sigcpu = fi->next_rr_cpu = 0;
> > -		} while (fi->local_int[sigcpu] == NULL);
> > -	}
> > -	li = fi->local_int[sigcpu];
> > -	spin_lock_bh(&li->lock);
> > -	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
> > -	if (waitqueue_active(li->wq))
> > -		wake_up_interruptible(li->wq);
> > -	spin_unlock_bh(&li->lock);
> > -	spin_unlock(&fi->lock);
> > -	mutex_unlock(&kvm->lock);
> > +	__inject_vm(kvm, inti);
> >  	return 0;
> >  }
> >  
> > @@ -841,3 +838,200 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
> >  	mutex_unlock(&vcpu->kvm->lock);
> >  	return 0;
> >  }
> > +
> > +static void clear_floating_interrupts(struct kvm *kvm)
> > +{
> > +	struct kvm_s390_float_interrupt *fi;
> > +	struct kvm_s390_interrupt_info	*n, *inti = NULL;
> > +
> > +	mutex_lock(&kvm->lock);
> > +	fi = &kvm->arch.float_int;
> > +	spin_lock(&fi->lock);
> > +	list_for_each_entry_safe(inti, n, &fi->list, list) {
> > +		list_del(&inti->list);
> > +		kfree(inti);
> > +	}
> > +	atomic_set(&fi->active, 0);
> > +	spin_unlock(&fi->lock);
> > +	mutex_unlock(&kvm->lock);
> > +}
> > +
> > +static inline int copy_irq_to_user(struct kvm_s390_interrupt_info *inti,
> > +				   u64 addr)
> > +{
> > +	struct kvm_s390_irq __user *uptr = (struct kvm_s390_irq __user *) addr;
> > +	void __user *target;
> > +	void *source;
> > +	u64 size;
> > +	int r = 0;
> > +
> > +	switch (inti->type) {
> > +	case KVM_S390_INT_VIRTIO:
> > +	case KVM_S390_INT_SERVICE:
> > +		source = &inti->ext;
> > +		target = &uptr->u.ext;
> > +		size = sizeof(inti->ext);
> > +		break;
> > +	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> > +		source = &inti->io;
> > +		target = &uptr->u.io;
> > +		size = sizeof(inti->io);
> > +		break;
> > +	case KVM_S390_MCHK:
> > +		source = &inti->mchk;
> > +		target = &uptr->u.mchk;
> > +		size = sizeof(inti->mchk);
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	r = put_user(inti->type, (u64 __user *) &uptr->type);
> > +	if (copy_to_user(target, source, size))
> > +		r = -EFAULT;
> > +
> > +	return r;
> > +}
> > +
> > +static int dequeue_floating_irq(struct kvm *kvm, __u64 addr)
> > +{
> > +	struct kvm_s390_interrupt_info *inti;
> > +	struct kvm_s390_float_interrupt *fi;
> > +	int r = 0;
> > +
> > +
> > +	mutex_lock(&kvm->lock);
> > +	fi = &kvm->arch.float_int;
> > +	spin_lock(&fi->lock);
> > +	if (list_empty(&fi->list)) {
> > +		spin_unlock(&fi->lock);
> > +		mutex_unlock(&kvm->lock);
> > +		return -ENODATA;
> > +	}
> > +	inti = list_first_entry(&fi->list, struct kvm_s390_interrupt_info, list);
> > +	list_del(&inti->list);
> > +	spin_unlock(&fi->lock);
> > +	mutex_unlock(&kvm->lock);
> > +
> > +	r = copy_irq_to_user(inti, addr);
> > +
> > +	kfree(inti);
> > +	return r;
> > +}
> > +
> > +static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
> > +{
> > +	int r;
> > +
> > +	switch (attr->group) {
> > +	case KVM_DEV_FLIC_DEQUEUE:
> > +		r = dequeue_floating_irq(dev->kvm, attr->addr);
> > +		break;
> > +	default:
> > +		r = -EINVAL;
> > +	}
> > +
> > +	return r;
> > +}
> > +
> > +static inline int copy_irq_from_user(struct kvm_s390_interrupt_info *inti,
> > +				     u64 addr)
> > +{
> > +	struct kvm_s390_irq __user *uptr = (struct kvm_s390_irq __user *) addr;
> > +	void *target = NULL;
> > +	void __user *source;
> > +	u64 size;
> > +	int r = 0;
> > +
> > +	if (get_user(inti->type, (u64 __user *)addr))
> > +		return -EFAULT;
> > +	switch (inti->type) {
> > +	case KVM_S390_INT_VIRTIO:
> > +	case KVM_S390_INT_SERVICE:
> > +		target = (void *) &inti->ext;
> > +		source = &uptr->u.ext;
> > +		size = sizeof(inti->ext);
> > +		break;
> > +	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
> > +		target = (void *) &inti->io;
> > +		source = &uptr->u.io;
> > +		size = sizeof(inti->io);
> > +		break;
> > +	case KVM_S390_MCHK:
> > +		target = (void *) &inti->mchk;
> > +		source = &uptr->u.mchk;
> > +		size = sizeof(inti->mchk);
> > +		break;
> > +	default:
> > +		r = -EINVAL;
> > +		return r;
> > +	}
> > +
> > +	if (copy_from_user(target, source, size))
> > +		r = -EFAULT;
> > +
> > +	return r;
> > +}
> > +
> > +static int enqueue_floating_irq(struct kvm_device *dev,
> > +				 struct kvm_device_attr *attr)
> > +{
> > +	struct kvm_s390_interrupt_info *inti = NULL;
> > +	int r = 0;
> > +
> > +	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
> > +	if (!inti)
> > +		return -ENOMEM;
> > +
> > +	r = copy_irq_from_user(inti, attr->addr);
> > +	if (r) {
> > +		kfree(inti);
> > +		return r;
> > +	}
> > +	__inject_vm(dev->kvm, inti);
> > +
> > +	return r;
> > +}
> > +
> > +static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
> > +{
> > +	int r = 0;
> > +
> > +	switch (attr->group) {
> > +	case KVM_DEV_FLIC_ENQUEUE:
> > +		r = enqueue_floating_irq(dev, attr);
> > +		break;
> > +	case KVM_DEV_FLIC_CLEAR_IRQS:
> > +		r = 0;
> > +		clear_floating_interrupts(dev->kvm);
> > +		break;
> > +	default:
> > +		r = -EINVAL;
> > +	}
> > +
> > +	return r;
> > +}
> > +
> > +static int flic_create(struct kvm_device *dev, u32 type)
> > +{
> > +	if (!dev)
> > +		return -EINVAL;
> > +	if (dev->kvm->arch.flic)
> > +		return -EINVAL;
> > +	dev->kvm->arch.flic = dev;
> > +	return 0;
> > +}
> > +
> > +static void flic_destroy(struct kvm_device *dev)
> > +{
> > +	dev->kvm->arch.flic = NULL;
> You need to call kfree(dev) here. There is a patch that moves this free
> to common code, but it is not in yet.

Ok, I wasn't aware of this. Will fix.

regards
Jens
 


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-14  8:28     ` Jens Freimann
@ 2013-10-14  9:07       ` Gleb Natapov
  2013-10-14 11:13         ` Jens Freimann
  0 siblings, 1 reply; 20+ messages in thread
From: Gleb Natapov @ 2013-10-14  9:07 UTC (permalink / raw)
  To: Jens Freimann
  Cc: Christian Borntraeger, Paolo Bonzini, Cornelia Huck,
	Heiko Carstens, Martin Schwidefsky, KVM, linux-s390

On Mon, Oct 14, 2013 at 10:28:57AM +0200, Jens Freimann wrote:
> On Sun, Oct 13, 2013 at 11:39:55AM +0300, Gleb Natapov wrote:
> > On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> > > From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > 
> > > This patch adds a floating irq controller as a kvm_device.
> > > It will be necessary for migration of floating interrupts as well
> > > as for hardening the reset code by allowing user space to explicitly
> > > remove all pending floating interrupts.
> > > 
> > > Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > ---
> > >  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
> > >  arch/s390/include/asm/kvm_host.h                |   1 +
> > >  arch/s390/include/uapi/asm/kvm.h                |   5 +
> > >  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
> > >  arch/s390/kvm/kvm-s390.c                        |   1 +
> > >  include/linux/kvm_host.h                        |   1 +
> > >  include/uapi/linux/kvm.h                        |   1 +
> > >  virt/kvm/kvm_main.c                             |   5 +
> > >  8 files changed, 295 insertions(+), 51 deletions(-)
> > >  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> > > 
> > > diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > new file mode 100644
> > > index 0000000..06aef31
> > > --- /dev/null
> > > +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > @@ -0,0 +1,36 @@
> > > +FLIC (floating interrupt controller)
> > > +====================================
> > > +
> > > +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> > > +machine check interruptions. All interrupts are stored in a per-vm list of
> > > +pending interrupts. FLIC performs operations on this list.
> > > +
> > > +Only one FLIC instance may be instantiated.
> > > +
> > > +FLIC provides support to
> > > +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> > > +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> > > +
> > > +Groups:
> > > +  KVM_DEV_FLIC_ENQUEUE
> > > +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> > > +    are taken from this list for injection into the guest. attr contains
> > > +    a struct kvm_s390_irq which contains all data relevant for
> > > +    interrupt injection.
> > > +    The format of the data structure kvm_s390_irq as it is copied from userspace
> > > +    is defined in usr/include/linux/kvm.h.
> > > +    For historic reasons list members are stored in a different data structure, i.e.
> > > +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> > > +    which can then be added to the list.
> > > +
> > > +  KVM_DEV_FLIC_DEQUEUE
> > > +    Takes one element off the pending interrupts list and copies it into userspace.
> > > +    Dequeued interrupts are not injected into the guest.
> > > +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> > > +    List elements are stored in the format of struct kvm_s390_interrupt_info
> > > +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> > > +    (usr/include/linux/kvm.h)
> > > +
> > > Can interrupts be dequeued on real HW as well? When will this interface
> > > be used?
> > 
> > We will use it for migration. (See Christian's mail)
> >  
For migration you do not need dequeue semantics, though. Dequeuing does
not hurt in the case of migration, but what if we want to add a QEMU
monitor command that inspects the interrupt queue (like the ones we have
for inspecting the processor's register state)? The destructive nature
of the command would prevent us from doing so. We need a non-destructive
way to inspect the state, no?

--
			Gleb.


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-14  7:58     ` Christian Borntraeger
@ 2013-10-14 10:21       ` Gleb Natapov
  0 siblings, 0 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-14 10:21 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Cornelia Huck, Heiko Carstens, Martin Schwidefsky,
	KVM, linux-s390, Jens Freimann

On Mon, Oct 14, 2013 at 09:58:47AM +0200, Christian Borntraeger wrote:
> On 13/10/13 10:39, Gleb Natapov wrote:
> > On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> >> From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> >>
> >> This patch adds a floating irq controller as a kvm_device.
> >> It will be necessary for migration of floating interrupts as well
> >> as for hardening the reset code by allowing user space to explicitly
> >> remove all pending floating interrupts.
> >>
> >> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> >> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >> ---
> >>  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
> >>  arch/s390/include/asm/kvm_host.h                |   1 +
> >>  arch/s390/include/uapi/asm/kvm.h                |   5 +
> >>  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
> >>  arch/s390/kvm/kvm-s390.c                        |   1 +
> >>  include/linux/kvm_host.h                        |   1 +
> >>  include/uapi/linux/kvm.h                        |   1 +
> >>  virt/kvm/kvm_main.c                             |   5 +
> >>  8 files changed, 295 insertions(+), 51 deletions(-)
> >>  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> >>
> >> diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> >> new file mode 100644
> >> index 0000000..06aef31
> >> --- /dev/null
> >> +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> >> @@ -0,0 +1,36 @@
> >> +FLIC (floating interrupt controller)
> >> +====================================
> >> +
> >> +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> >> +machine check interruptions. All interrupts are stored in a per-vm list of
> >> +pending interrupts. FLIC performs operations on this list.
> >> +
> >> +Only one FLIC instance may be instantiated.
> >> +
> >> +FLIC provides support to
> >> +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> >> +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> >> +
> >> +Groups:
> >> +  KVM_DEV_FLIC_ENQUEUE
> >> +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> >> +    are taken from this list for injection into the guest. attr contains
> >> +    a struct kvm_s390_irq which contains all data relevant for
> >> +    interrupt injection.
> >> +    The format of the data structure kvm_s390_irq as it is copied from userspace
> >> +    is defined in usr/include/linux/kvm.h.
> >> +    For historic reasons list members are stored in a different data structure, i.e.
> >> +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> >> +    which can then be added to the list.
> >> +
> >> +  KVM_DEV_FLIC_DEQUEUE
> >> +    Takes one element off the pending interrupts list and copies it into userspace.
> >> +    Dequeued interrupts are not injected into the guest.
> >> +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> >> +    List elements are stored in the format of struct kvm_s390_interrupt_info
> >> +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> >> +    (usr/include/linux/kvm.h)
> >> +
> > > Can interrupts be dequeued on real HW as well? When will this interface
> > > be used?
> 
> This is used for migration. (Will send the qemu patches soon.)
> 
> The thing is that we don't have classic interrupt lines from a software perspective. We have
> external interrupts, I/O interrupts, machine check interrupts, program interrupts, restart
> interrupts, supervisor call interrupts. Several interrupts are cpu local (restart, supervisor
> call, program check interrupts). This is simple, because only one interrupt can be pending
> at a CPU.
> 
> There are several types of external interrupts. Some are cpu local (after a sigp --> IPI),
> others are floating (pending on all CPUs).
> 
> All I/O interrupts are floating. The thing is that each classic I/O interrupt has a 12
> byte chunk of per-interrupt payload. (There is an additional interrupt response block that has
> to be queried by the guest with TSCH.)
> 
> Since we can have up to 256k devices per guest, we could in theory have up to 256k classic
> interrupts with different payloads pending (plus machine checks, plus other floating external
> interrupts).
> We don't want to always dump this big queue, therefore we decided to keep these in a list.
> 
But you need to limit the queue anyway, otherwise userspace can allocate
quite a bit of kernel memory by filling the queue, no? It is strange
to have a destructive interface here because it makes queue inspection
impossible (at least without stopping the guest, dequeuing everything and
queuing it back again). What about an interface where userspace provides
an array to store the queue elements, and if the array is not big enough
an appropriate error is returned, so userspace can retry with a bigger one?
Using a list internally is OK as long as its length is limited somehow.

--
			Gleb.
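
The array-based interface suggested above could look roughly like the
following user-space sketch. It is only an illustration under stated
assumptions: the names irq_record, enqueue_irq, get_all_irqs, snapshot_irqs
and QUEUE_MAX are hypothetical, and the use of -ENOMEM and -ENOSPC as error
codes is an assumption, not something the patch defines.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_s390_irq; only the type is modelled. */
struct irq_record {
	unsigned long long type;
};

/* Hypothetical pending list with a hard cap, so it cannot grow unbounded. */
#define QUEUE_MAX 8
static struct irq_record queue[QUEUE_MAX];
static size_t queue_len;

/* Add one record; returns 0 on success or -ENOSPC once the cap is reached. */
static int enqueue_irq(unsigned long long type)
{
	if (queue_len == QUEUE_MAX)
		return -ENOSPC;
	queue[queue_len++].type = type;
	return 0;
}

/*
 * Non-destructive read: copy all pending records into buf.
 * Returns the number of records copied, or -ENOMEM if buf is too small.
 * The queue itself is never modified.
 */
static long get_all_irqs(struct irq_record *buf, size_t capacity)
{
	if (capacity < queue_len)
		return -ENOMEM;
	memcpy(buf, queue, queue_len * sizeof(*buf));
	return (long)queue_len;
}

/*
 * Caller side: start with a small buffer and double it whenever the
 * read reports that the buffer was too small. On success *out points
 * to a malloc'ed array that the caller must free.
 */
static long snapshot_irqs(struct irq_record **out)
{
	size_t capacity = 1;

	for (;;) {
		struct irq_record *buf = malloc(capacity * sizeof(*buf));
		long n;

		if (!buf)
			return -ENOMEM;	/* allocation failure, give up */
		n = get_all_irqs(buf, capacity);
		if (n >= 0) {
			*out = buf;
			return n;
		}
		free(buf);
		capacity *= 2;	/* buffer too small, retry with a bigger one */
	}
}
```

Because get_all_irqs() only copies, a monitor command could call
snapshot_irqs() repeatedly without disturbing the guest, and the cap in
enqueue_irq() bounds how much memory the list can consume.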


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-14  9:07       ` Gleb Natapov
@ 2013-10-14 11:13         ` Jens Freimann
  2013-10-14 11:31           ` Gleb Natapov
  2013-10-14 13:35           ` Gleb Natapov
  0 siblings, 2 replies; 20+ messages in thread
From: Jens Freimann @ 2013-10-14 11:13 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christian Borntraeger, Paolo Bonzini, Cornelia Huck,
	Heiko Carstens, Martin Schwidefsky, KVM, linux-s390

On Mon, Oct 14, 2013 at 12:07:24PM +0300, Gleb Natapov wrote:
> On Mon, Oct 14, 2013 at 10:28:57AM +0200, Jens Freimann wrote:
> > On Sun, Oct 13, 2013 at 11:39:55AM +0300, Gleb Natapov wrote:
> > > On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> > > > From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > > 
> > > > This patch adds a floating irq controller as a kvm_device.
> > > > It will be necessary for migration of floating interrupts as well
> > > > as for hardening the reset code by allowing user space to explicitly
> > > > remove all pending floating interrupts.
> > > > 
> > > > Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > > Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > > ---
> > > >  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
> > > >  arch/s390/include/asm/kvm_host.h                |   1 +
> > > >  arch/s390/include/uapi/asm/kvm.h                |   5 +
> > > >  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
> > > >  arch/s390/kvm/kvm-s390.c                        |   1 +
> > > >  include/linux/kvm_host.h                        |   1 +
> > > >  include/uapi/linux/kvm.h                        |   1 +
> > > >  virt/kvm/kvm_main.c                             |   5 +
> > > >  8 files changed, 295 insertions(+), 51 deletions(-)
> > > >  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> > > > 
> > > > diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > > new file mode 100644
> > > > index 0000000..06aef31
> > > > --- /dev/null
> > > > +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > > @@ -0,0 +1,36 @@
> > > > +FLIC (floating interrupt controller)
> > > > +====================================
> > > > +
> > > > +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> > > > +machine check interruptions. All interrupts are stored in a per-vm list of
> > > > +pending interrupts. FLIC performs operations on this list.
> > > > +
> > > > +Only one FLIC instance may be instantiated.
> > > > +
> > > > +FLIC provides support to
> > > > +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> > > > +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> > > > +
> > > > +Groups:
> > > > +  KVM_DEV_FLIC_ENQUEUE
> > > > +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> > > > +    are taken from this list for injection into the guest. attr contains
> > > > +    a struct kvm_s390_irq which contains all data relevant for
> > > > +    interrupt injection.
> > > > +    The format of the data structure kvm_s390_irq as it is copied from userspace
> > > > +    is defined in usr/include/linux/kvm.h.
> > > > +    For historic reasons list members are stored in a different data structure, i.e.
> > > > +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> > > > +    which can then be added to the list.
> > > > +
> > > > +  KVM_DEV_FLIC_DEQUEUE
> > > > +    Takes one element off the pending interrupts list and copies it into userspace.
> > > > +    Dequeued interrupts are not injected into the guest.
> > > > +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> > > > +    List elements are stored in the format of struct kvm_s390_interrupt_info
> > > > +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> > > > +    (usr/include/linux/kvm.h)
> > > > +
> > > > Can interrupts be dequeued on real HW as well? When will this interface
> > > > be used?
> > > 
> > > We will use it for migration. (See Christian's mail)
> > >  
> > For migration you do not need dequeue semantics, though. Dequeuing does
> > not hurt in the case of migration, but what if we want to add a QEMU
> > monitor command that inspects the interrupt queue (like the ones we have
> > for inspecting the processor's register state)? The destructive nature
> > of the command would prevent us from doing so. We need a non-destructive
> > way to inspect the state, no?

Inspection is a requirement that we didn't have in mind when we designed 
this. But yes, it should be non-destructive in that case.

Christian and I agree that we should defer these patches for now. It would
be good if we could discuss this interface next week at the KVM Forum.


regards
Jens

> 
> --
> 			Gleb.
> 


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-14 11:13         ` Jens Freimann
@ 2013-10-14 11:31           ` Gleb Natapov
  2013-10-14 13:35           ` Gleb Natapov
  1 sibling, 0 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-14 11:31 UTC (permalink / raw)
  To: Jens Freimann
  Cc: Christian Borntraeger, Paolo Bonzini, Cornelia Huck,
	Heiko Carstens, Martin Schwidefsky, KVM, linux-s390

On Mon, Oct 14, 2013 at 01:13:30PM +0200, Jens Freimann wrote:
> On Mon, Oct 14, 2013 at 12:07:24PM +0300, Gleb Natapov wrote:
> > On Mon, Oct 14, 2013 at 10:28:57AM +0200, Jens Freimann wrote:
> > > On Sun, Oct 13, 2013 at 11:39:55AM +0300, Gleb Natapov wrote:
> > > > On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> > > > > From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > > > 
> > > > > This patch adds a floating irq controller as a kvm_device.
> > > > > It will be necessary for migration of floating interrupts as well
> > > > > as for hardening the reset code by allowing user space to explicitly
> > > > > remove all pending floating interrupts.
> > > > > 
> > > > > Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > > > Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > > > ---
> > > > >  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
> > > > >  arch/s390/include/asm/kvm_host.h                |   1 +
> > > > >  arch/s390/include/uapi/asm/kvm.h                |   5 +
> > > > >  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
> > > > >  arch/s390/kvm/kvm-s390.c                        |   1 +
> > > > >  include/linux/kvm_host.h                        |   1 +
> > > > >  include/uapi/linux/kvm.h                        |   1 +
> > > > >  virt/kvm/kvm_main.c                             |   5 +
> > > > >  8 files changed, 295 insertions(+), 51 deletions(-)
> > > > >  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> > > > > 
> > > > > diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > > > new file mode 100644
> > > > > index 0000000..06aef31
> > > > > --- /dev/null
> > > > > +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > > > @@ -0,0 +1,36 @@
> > > > > +FLIC (floating interrupt controller)
> > > > > +====================================
> > > > > +
> > > > > +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> > > > > +machine check interruptions. All interrupts are stored in a per-vm list of
> > > > > +pending interrupts. FLIC performs operations on this list.
> > > > > +
> > > > > +Only one FLIC instance may be instantiated.
> > > > > +
> > > > > +FLIC provides support to
> > > > > +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> > > > > +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> > > > > +
> > > > > +Groups:
> > > > > +  KVM_DEV_FLIC_ENQUEUE
> > > > > +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> > > > > +    are taken from this list for injection into the guest. attr contains
> > > > > +    a struct kvm_s390_irq which contains all data relevant for
> > > > > +    interrupt injection.
> > > > > +    The format of the data structure kvm_s390_irq as it is copied from userspace
> > > > > +    is defined in usr/include/linux/kvm.h.
> > > > > +    For historic reasons list members are stored in a different data structure, i.e.
> > > > > +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> > > > > +    which can then be added to the list.
> > > > > +
> > > > > +  KVM_DEV_FLIC_DEQUEUE
> > > > > +    Takes one element off the pending interrupts list and copies it into userspace.
> > > > > +    Dequeued interrupts are not injected into the guest.
> > > > > +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> > > > > +    List elements are stored in the format of struct kvm_s390_interrupt_info
> > > > > +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> > > > > +    (usr/include/linux/kvm.h)
> > > > > +
> > > > > Can interrupts be dequeued on real HW as well? When will this interface
> > > > > be used?
> > > > 
> > > > We will use it for migration. (See Christian's mail)
> > > >  
> > > For migration you do not need dequeue semantics, though. Dequeuing does
> > > not hurt in the case of migration, but what if we want to add a QEMU
> > > monitor command that inspects the interrupt queue (like the ones we have
> > > for inspecting the processor's register state)? The destructive nature
> > > of the command would prevent us from doing so. We need a non-destructive
> > > way to inspect the state, no?
> 
> Inspection is a requirement that we didn't have in mind when we designed 
> this. But yes, it should be non-destructive in that case.
> 
> Christian and I agree that we should defer these patches for now. It would
> be good if we could discuss this interface next week at the KVM Forum.
> 
Of course.

--
			Gleb.


* Re: [Patchv5 2/7] KVM: s390: add floating irq controller
  2013-10-14 11:13         ` Jens Freimann
  2013-10-14 11:31           ` Gleb Natapov
@ 2013-10-14 13:35           ` Gleb Natapov
  1 sibling, 0 replies; 20+ messages in thread
From: Gleb Natapov @ 2013-10-14 13:35 UTC (permalink / raw)
  To: Jens Freimann
  Cc: Christian Borntraeger, Paolo Bonzini, Cornelia Huck,
	Heiko Carstens, Martin Schwidefsky, KVM, linux-s390

On Mon, Oct 14, 2013 at 01:13:30PM +0200, Jens Freimann wrote:
> On Mon, Oct 14, 2013 at 12:07:24PM +0300, Gleb Natapov wrote:
> > On Mon, Oct 14, 2013 at 10:28:57AM +0200, Jens Freimann wrote:
> > > On Sun, Oct 13, 2013 at 11:39:55AM +0300, Gleb Natapov wrote:
> > > > On Tue, Oct 08, 2013 at 04:54:55PM +0200, Christian Borntraeger wrote:
> > > > > From: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > > > 
> > > > > This patch adds a floating irq controller as a kvm_device.
> > > > > It will be necessary for migration of floating interrupts as well
> > > > > as for hardening the reset code by allowing user space to explicitly
> > > > > remove all pending floating interrupts.
> > > > > 
> > > > > Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
> > > > > Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > > > ---
> > > > >  Documentation/virtual/kvm/devices/s390_flic.txt |  36 +++
> > > > >  arch/s390/include/asm/kvm_host.h                |   1 +
> > > > >  arch/s390/include/uapi/asm/kvm.h                |   5 +
> > > > >  arch/s390/kvm/interrupt.c                       | 296 ++++++++++++++++++++----
> > > > >  arch/s390/kvm/kvm-s390.c                        |   1 +
> > > > >  include/linux/kvm_host.h                        |   1 +
> > > > >  include/uapi/linux/kvm.h                        |   1 +
> > > > >  virt/kvm/kvm_main.c                             |   5 +
> > > > >  8 files changed, 295 insertions(+), 51 deletions(-)
> > > > >  create mode 100644 Documentation/virtual/kvm/devices/s390_flic.txt
> > > > > 
> > > > > diff --git a/Documentation/virtual/kvm/devices/s390_flic.txt b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > > > new file mode 100644
> > > > > index 0000000..06aef31
> > > > > --- /dev/null
> > > > > +++ b/Documentation/virtual/kvm/devices/s390_flic.txt
> > > > > @@ -0,0 +1,36 @@
> > > > > +FLIC (floating interrupt controller)
> > > > > +====================================
> > > > > +
> > > > > +FLIC handles floating (non per-cpu) interrupts, i.e.  I/O, service and some
> > > > > +machine check interruptions. All interrupts are stored in a per-vm list of
> > > > > +pending interrupts. FLIC performs operations on this list.
> > > > > +
> > > > > +Only one FLIC instance may be instantiated.
> > > > > +
> > > > > +FLIC provides support to
> > > > > +- add/delete interrupts (KVM_DEV_FLIC_ENQUEUE and _DEQUEUE)
> > > > > +- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
> > > > > +
> > > > > +Groups:
> > > > > +  KVM_DEV_FLIC_ENQUEUE
> > > > > +    Adds one interrupt to the list of pending floating interrupts. Interrupts
> > > > > +    are taken from this list for injection into the guest. attr contains
> > > > > +    a struct kvm_s390_irq which contains all data relevant for
> > > > > +    interrupt injection.
> > > > > +    The format of the data structure kvm_s390_irq as it is copied from userspace
> > > > > +    is defined in usr/include/linux/kvm.h.
> > > > > +    For historic reasons list members are stored in a different data structure, i.e.
> > > > > +    we need to copy the relevant data into a struct kvm_s390_interrupt_info
> > > > > +    which can then be added to the list.
> > > > > +
> > > > > +  KVM_DEV_FLIC_DEQUEUE
> > > > > +    Takes one element off the pending interrupts list and copies it into userspace.
> > > > > +    Dequeued interrupts are not injected into the guest.
> > > > > +    attr->addr contains the userspace address of a struct kvm_s390_irq.
> > > > > +    List elements are stored in the format of struct kvm_s390_interrupt_info
> > > > > +    (arch/s390/include/asm/kvm_host.h) and are copied into a struct kvm_s390_irq
> > > > > +    (usr/include/linux/kvm.h)
> > > > > +
> > > > > Can interrupts be dequeued on real HW as well? When will this interface
> > > > > be used?
> > > > 
> > > > We will use it for migration. (See Christian's mail)
> > > >  
> > > For migration you do not need dequeue semantics, though. Dequeuing does
> > > not hurt in the case of migration, but what if we want to add a QEMU
> > > monitor command that inspects the interrupt queue (like the ones we have
> > > for inspecting the processor's register state)? The destructive nature
> > > of the command would prevent us from doing so. We need a non-destructive
> > > way to inspect the state, no?
> 
> Inspection is a requirement that we didn't have in mind when we designed 
> this. But yes, it should be non-destructive in that case.
> 
BTW, a non-destructive interface is better for migration too. What if
migration fails and the source needs to be restarted?

--
			Gleb.
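
The failed-migration case raised above can be illustrated with a small
sketch of copy-then-clear ordering. Everything here is a hypothetical model:
pending_list, snapshot, clear_all and migrate are made-up names, and the
transfer_ok flag merely stands in for whether the state reached the
destination. Only the clear-all step has a counterpart in the patch
(KVM_DEV_FLIC_CLEAR_IRQS).

```c
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 4

/* Hypothetical model of a VM's pending floating interrupts. */
struct pending_list {
	unsigned long long type[PENDING_MAX];
	size_t len;
};

/* Non-destructive copy of the source list; the source is left untouched. */
static struct pending_list snapshot(const struct pending_list *src)
{
	return *src;
}

/* Drop everything, in the spirit of a clear-all operation. */
static void clear_all(struct pending_list *src)
{
	src->len = 0;
}

/*
 * Migrate with copy-then-clear ordering. transfer_ok models the outcome
 * of sending the snapshot to the destination (an assumption of this
 * sketch). Returns true if the migration completed.
 */
static bool migrate(struct pending_list *src, struct pending_list *dst,
		    bool transfer_ok)
{
	struct pending_list copy = snapshot(src);

	if (!transfer_ok)
		return false;	/* source still holds every interrupt */
	*dst = copy;
	clear_all(src);		/* only discard state once it is safe */
	return true;
}
```

With a destructive dequeue, the failure branch would instead have to
re-enqueue every element before the source could be restarted.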


Thread overview: 20+ messages
2013-10-08 14:54 [PATCHv5 0/7] aync page fault support for s390 (plus flic) Christian Borntraeger
2013-10-08 14:54 ` [Patchv5 1/7] KVM: s390: add and extend interrupt information data structs Christian Borntraeger
2013-10-08 14:54 ` [Patchv5 2/7] KVM: s390: add floating irq controller Christian Borntraeger
2013-10-13  8:39   ` Gleb Natapov
2013-10-14  7:58     ` Christian Borntraeger
2013-10-14 10:21       ` Gleb Natapov
2013-10-14  8:28     ` Jens Freimann
2013-10-14  9:07       ` Gleb Natapov
2013-10-14 11:13         ` Jens Freimann
2013-10-14 11:31           ` Gleb Natapov
2013-10-14 13:35           ` Gleb Natapov
2013-10-08 14:54 ` [Patchv5 3/7] KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault Christian Borntraeger
2013-10-08 14:54 ` [Patchv5 4/7] KVM: async_pf: Provide additional direct page notification Christian Borntraeger
2013-10-08 14:54 ` [Patchv5 5/7] KVM: async_pf: Allow to wait for outstanding work Christian Borntraeger
2013-10-13  8:48   ` Gleb Natapov
2013-10-13  9:08     ` Gleb Natapov
2013-10-08 14:54 ` [Patchv5 6/7] KVM: async_pf: Async page fault support on s390 Christian Borntraeger
2013-10-13  9:15   ` Gleb Natapov
2013-10-13  9:30   ` Gleb Natapov
2013-10-08 14:55 ` [Patchv5 7/7] KVM: async_pf: Exploit one reg interface for pfault Christian Borntraeger
