[PATCH v5 0/2] kvm: x86: hyperv: guest->host event signaling via eventfd

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v5 0/2] kvm: x86: hyperv: guest->host event signaling via eventfd
@ 2017-12-12 16:07 Roman Kagan
  2017-12-12 16:07 ` [PATCH v5 1/2] kvm: x86: factor out kvm.arch.hyperv (de)init Roman Kagan
  2017-12-12 16:07 ` [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
  0 siblings, 2 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-12 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov,
	David Hildenbrand

Make it possible for guests using Hyper-V emulation to do guest->host
notification via EVENT_SIGNAL hypercall without a user exit.

v4 -> v5:
 - fix block comment formatting

v3 -> v4:
 - switch to kvm_vcpu_read_guest and take srcu_read_lock around it
 - rework and document the interpretation of the hypercall parameter
 - merge !fast version into kvm_hvcall_signal_event for brevity

v2 -> v3:
 - expand docs on allowed values and return codes
 - fix uninitialized return
 - style fixes

v1 -> v2:
 - make data types consistent
 - get by without the recently dropped struct hv_input_signal_event
 - fix subject prefixes

Roman Kagan (2):
  kvm: x86: factor out kvm.arch.hyperv (de)init
  kvm: x86: hyperv: guest->host event signaling via eventfd

 Documentation/virtual/kvm/api.txt |  31 ++++++++++
 arch/x86/include/asm/kvm_host.h   |   2 +
 arch/x86/kvm/hyperv.h             |   4 ++
 include/uapi/linux/kvm.h          |  13 +++++
 arch/x86/kvm/hyperv.c             | 119 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                |  13 ++++-
 6 files changed, 180 insertions(+), 2 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v5 1/2] kvm: x86: factor out kvm.arch.hyperv (de)init
  2017-12-12 16:07 [PATCH v5 0/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
@ 2017-12-12 16:07 ` Roman Kagan
  2017-12-12 16:07 ` [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
  1 sibling, 0 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-12 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov,
	David Hildenbrand

Move kvm.arch.hyperv initialization and cleanup to separate functions.

For now only a mutex is inited in the former, and the latter is empty;
more stuff will go in there in a followup patch.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
---
v3 -> v4 -> v5:
 no change

v2 -> v3:
 - style fixes

v1 -> v2:
 - fix subject prefix

 arch/x86/kvm/hyperv.h | 3 +++
 arch/x86/kvm/hyperv.c | 9 +++++++++
 arch/x86/kvm/x86.c    | 3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index e637631a9574..cc2468244ca2 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -88,4 +88,7 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu);
 void kvm_hv_setup_tsc_page(struct kvm *kvm,
 			   struct pvclock_vcpu_time_info *hv_clock);
 
+void kvm_hv_init_vm(struct kvm *kvm);
+void kvm_hv_destroy_vm(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index dc97f2544b6f..015fb06c7522 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1301,3 +1301,12 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	kvm_hv_hypercall_set_result(vcpu, ret);
 	return 1;
 }
+
+void kvm_hv_init_vm(struct kvm *kvm)
+{
+	mutex_init(&kvm->arch.hyperv.hv_lock);
+}
+
+void kvm_hv_destroy_vm(struct kvm *kvm)
+{
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index faf843c9b916..d17cf7900138 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8148,7 +8148,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
-	mutex_init(&kvm->arch.hyperv.hv_lock);
 	spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
 
 	kvm->arch.kvmclock_offset = -ktime_get_boot_ns();
@@ -8157,6 +8156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
+	kvm_hv_init_vm(kvm);
 	kvm_page_track_init(kvm);
 	kvm_mmu_init_vm(kvm);
 
@@ -8291,6 +8291,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
+	kvm_hv_destroy_vm(kvm);
 }
 
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:07 [PATCH v5 0/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
  2017-12-12 16:07 ` [PATCH v5 1/2] kvm: x86: factor out kvm.arch.hyperv (de)init Roman Kagan
@ 2017-12-12 16:07 ` Roman Kagan
  2017-12-12 16:22   ` Paolo Bonzini
                     ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-12 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov,
	David Hildenbrand

In Hyper-V, the fast guest->host notification mechanism is the
SIGNAL_EVENT hypercall, with a single parameter of the connection ID to
signal.

Currently this hypercall incurs a user exit and requires the userspace
to decode the parameters and trigger the notification of the potentially
different I/O context.

To avoid the costly user exit, process this hypercall and signal the
corresponding eventfd in KVM, similar to ioeventfd.  The association
between the connection id and the eventfd is established via the newly
introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an
(srcu-protected) IDR.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
---
v4 -> v5:
 - fix block comment formatting

v3 -> v4:
 - switch to kvm_vcpu_read_guest and take srcu_read_lock around it
 - rework and document the interpretation of the hypercall parameter
 - merge !fast version into kvm_hvcall_signal_event for brevity

v2 -> v3:
 - expand docs on allowed values and return codes
 - fix uninitialized return
 - style fixes

v1 -> v2:
 - make data types consistent
 - get by without the recently dropped struct hv_input_signal_event
 - fix subject prefix

 Documentation/virtual/kvm/api.txt |  31 +++++++++++
 arch/x86/include/asm/kvm_host.h   |   2 +
 arch/x86/kvm/hyperv.h             |   1 +
 include/uapi/linux/kvm.h          |  13 +++++
 arch/x86/kvm/hyperv.c             | 110 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                |  10 ++++
 6 files changed, 166 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 57d3ee9e4bde..3a959373fd97 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3403,6 +3403,37 @@ invalid, if invalid pages are written to (e.g. after the end of memory)
 or if no page table is present for the addresses (e.g. when using
 hugepages).
 
+4.109 KVM_HYPERV_EVENTFD
+
+Capability: KVM_CAP_HYPERV_EVENTFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_hyperv_eventfd (in)
+
+This ioctl (un)registers an eventfd to receive notifications from the guest on
+the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
+causing a user exit.
+
+struct kvm_hyperv_eventfd {
+	__u32 conn_id;
+	__s32 fd;
+	__u32 flags;
+	__u32 padding[3];
+};
+
+The conn_id field should fit within 24 bits:
+
+#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
+
+The acceptable values for the flags field are:
+
+#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
+
+Returns: 0 on success,
+	-EINVAL if conn_id or flags is outside the allowed range
+	-ENOENT on deassign if the conn_id isn't registered
+	-EEXIST on assign if the conn_id is already registered
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 516798431328..6a9914752a84 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -752,6 +752,8 @@ struct kvm_hv {
 	u64 hv_crash_ctl;
 
 	HV_REFERENCE_TSC_PAGE tsc_ref;
+
+	struct idr conn_to_evt;
 };
 
 enum kvm_irqchip_mode {
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index cc2468244ca2..837465d69c6d 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -90,5 +90,6 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
+int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
 
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 496e59a2738b..7a871e7fb5df 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HYPERV_SYNIC2 148
 #define KVM_CAP_HYPERV_VP_INDEX 149
 #define KVM_CAP_S390_AIS_MIGRATION 150
+#define KVM_CAP_HYPERV_EVENTFD 151
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1359,6 +1360,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_GET_CMMA_BITS      _IOWR(KVMIO, 0xb8, struct kvm_s390_cmma_log)
 #define KVM_S390_SET_CMMA_BITS      _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
 
+#define KVM_HYPERV_EVENTFD	_IOW(KVMIO,  0xba, struct kvm_hyperv_eventfd)
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
@@ -1419,4 +1422,14 @@ struct kvm_assigned_msix_entry {
 #define KVM_ARM_DEV_EL1_PTIMER		(1 << 1)
 #define KVM_ARM_DEV_PMU			(1 << 2)
 
+struct kvm_hyperv_eventfd {
+	__u32 conn_id;
+	__s32 fd;
+	__u32 flags;
+	__u32 padding[3];
+};
+
+#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
+#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 015fb06c7522..c1541dccf14d 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -29,6 +29,7 @@
 #include <linux/kvm_host.h>
 #include <linux/highmem.h>
 #include <linux/sched/cputime.h>
+#include <linux/eventfd.h>
 
 #include <asm/apicdef.h>
 #include <trace/events/kvm.h>
@@ -1226,6 +1227,50 @@ static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
+{
+	u16 ret;
+	u32 conn_id, flag_no;
+	int idx;
+	struct eventfd_ctx *eventfd;
+
+	if (unlikely(!fast)) {
+		gpa_t gpa = param;
+
+		if ((gpa & (__alignof__(param) - 1)) ||
+		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
+			return HV_STATUS_INVALID_ALIGNMENT;
+
+		idx = srcu_read_lock(&vcpu->kvm->srcu);
+		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+		if (ret < 0)
+			return HV_STATUS_INSUFFICIENT_MEMORY;
+	}
+
+	/*
+	 * the signaled event number is made up of a 24bit "connection id" and
+	 * a 16bit "flag number"; on the hypervisor side it's only their sum
+	 * that matters
+	 */
+	conn_id = param;
+	flag_no = param >> 32;
+	if ((conn_id & ~KVM_HYPERV_CONN_ID_MASK) || (flag_no & 0xffff0000))
+		return HV_STATUS_INVALID_CONNECTION_ID;
+	conn_id += flag_no;
+	if (conn_id & ~KVM_HYPERV_CONN_ID_MASK)
+		return HV_STATUS_INVALID_CONNECTION_ID;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	eventfd = idr_find(&vcpu->kvm->arch.hyperv.conn_to_evt, conn_id);
+	if (eventfd)
+		eventfd_signal(eventfd, 1);
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+	return eventfd ? HV_STATUS_SUCCESS : HV_STATUS_INVALID_CONNECTION_ID;
+}
+
 int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	u64 param, ingpa, outgpa, ret;
@@ -1276,8 +1321,12 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	case HVCALL_NOTIFY_LONG_SPIN_WAIT:
 		kvm_vcpu_on_spin(vcpu, true);
 		break;
-	case HVCALL_POST_MESSAGE:
 	case HVCALL_SIGNAL_EVENT:
+		res = kvm_hvcall_signal_event(vcpu, fast, ingpa);
+		if (res != HV_STATUS_INVALID_CONNECTION_ID)
+			break;
+		/* maybe userspace knows this conn_id: fall through */
+	case HVCALL_POST_MESSAGE:
 		/* don't bother userspace if it has no way to handle it */
 		if (!vcpu_to_synic(vcpu)->active) {
 			res = HV_STATUS_INVALID_HYPERCALL_CODE;
@@ -1305,8 +1354,67 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 void kvm_hv_init_vm(struct kvm *kvm)
 {
 	mutex_init(&kvm->arch.hyperv.hv_lock);
+	idr_init(&kvm->arch.hyperv.conn_to_evt);
 }
 
 void kvm_hv_destroy_vm(struct kvm *kvm)
 {
+	struct eventfd_ctx *eventfd;
+	int i;
+
+	idr_for_each_entry(&kvm->arch.hyperv.conn_to_evt, eventfd, i)
+		eventfd_ctx_put(eventfd);
+	idr_destroy(&kvm->arch.hyperv.conn_to_evt);
+}
+
+static int kvm_hv_eventfd_assign(struct kvm *kvm, u32 conn_id, int fd)
+{
+	struct kvm_hv *hv = &kvm->arch.hyperv;
+	struct eventfd_ctx *eventfd;
+	int ret;
+
+	eventfd = eventfd_ctx_fdget(fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	mutex_lock(&hv->hv_lock);
+	ret = idr_alloc(&hv->conn_to_evt, eventfd, conn_id, conn_id + 1,
+			GFP_KERNEL);
+	mutex_unlock(&hv->hv_lock);
+
+	if (ret >= 0)
+		return 0;
+
+	if (ret == -ENOSPC)
+		ret = -EEXIST;
+	eventfd_ctx_put(eventfd);
+	return ret;
+}
+
+static int kvm_hv_eventfd_deassign(struct kvm *kvm, u32 conn_id)
+{
+	struct kvm_hv *hv = &kvm->arch.hyperv;
+	struct eventfd_ctx *eventfd;
+
+	mutex_lock(&hv->hv_lock);
+	eventfd = idr_remove(&hv->conn_to_evt, conn_id);
+	mutex_unlock(&hv->hv_lock);
+
+	if (!eventfd)
+		return -ENOENT;
+
+	synchronize_srcu(&kvm->srcu);
+	eventfd_ctx_put(eventfd);
+	return 0;
+}
+
+int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args)
+{
+	if ((args->flags & ~KVM_HYPERV_EVENTFD_DEASSIGN) ||
+	    (args->conn_id & ~KVM_HYPERV_CONN_ID_MASK))
+		return -EINVAL;
+
+	if (args->flags == KVM_HYPERV_EVENTFD_DEASSIGN)
+		return kvm_hv_eventfd_deassign(kvm, args->conn_id);
+	return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d17cf7900138..1c43d262da14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2701,6 +2701,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_SYNIC:
 	case KVM_CAP_HYPERV_SYNIC2:
 	case KVM_CAP_HYPERV_VP_INDEX:
+	case KVM_CAP_HYPERV_EVENTFD:
 	case KVM_CAP_PCI_SEGMENT:
 	case KVM_CAP_DEBUGREGS:
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
@@ -4295,6 +4296,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_enable_cap(kvm, &cap);
 		break;
 	}
+	case KVM_HYPERV_EVENTFD: {
+		struct kvm_hyperv_eventfd hvevfd;
+
+		r = -EFAULT;
+		if (copy_from_user(&hvevfd, argp, sizeof(hvevfd)))
+			goto out;
+		r = kvm_vm_ioctl_hv_eventfd(kvm, &hvevfd);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:07 ` [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
@ 2017-12-12 16:22   ` Paolo Bonzini
  2017-12-12 16:29     ` David Hildenbrand
  2017-12-12 17:03     ` Roman Kagan
  2017-12-13  9:51   ` David Hildenbrand
  2017-12-13  9:55   ` David Hildenbrand
  2 siblings, 2 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-12-12 16:22 UTC (permalink / raw)
  To: Roman Kagan, kvm, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov,
	David Hildenbrand

On 12/12/2017 17:07, Roman Kagan wrote:
> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +

The lock/unlock is not needed (vcpu_enter_guest -> vmx_handle_exit ->
handle_vmcall -> kvm_emulate_hypercall -> kvm_hv_hypercall ->
kvm_hvcall_signal_event).  I'll drop it.

Thanks,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:22   ` Paolo Bonzini
@ 2017-12-12 16:29     ` David Hildenbrand
  2017-12-12 18:18       ` Roman Kagan
  2017-12-12 17:03     ` Roman Kagan
  1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-12 16:29 UTC (permalink / raw)
  To: Paolo Bonzini, Roman Kagan, kvm, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 12.12.2017 17:22, Paolo Bonzini wrote:
> On 12/12/2017 17:07, Roman Kagan wrote:
>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> +
> 
> The lock/unlock is not needed (vcpu_enter_guest -> vmx_handle_exit ->
> handle_vmcall -> kvm_emulate_hypercall -> kvm_hv_hypercall ->
> kvm_hvcall_signal_event).  I'll drop it.
> 
> Thanks,
> 

Feel free to add my

Reviewed-by: David Hildenbrand <david@redhat.com>

(although it would be good to have another look at the hyperv specific
conn_id handling)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:22   ` Paolo Bonzini
  2017-12-12 16:29     ` David Hildenbrand
@ 2017-12-12 17:03     ` Roman Kagan
  1 sibling, 0 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-12 17:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov,
	David Hildenbrand

On Tue, Dec 12, 2017 at 05:22:20PM +0100, Paolo Bonzini wrote:
> On 12/12/2017 17:07, Roman Kagan wrote:
> > +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> > +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > +
> 
> The lock/unlock is not needed (vcpu_enter_guest -> vmx_handle_exit ->
> handle_vmcall -> kvm_emulate_hypercall -> kvm_hv_hypercall ->
> kvm_hvcall_signal_event).  I'll drop it.

Oh crap, indeed.  I'll respin soonish.

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:29     ` David Hildenbrand
@ 2017-12-12 18:18       ` Roman Kagan
  2017-12-12 18:20         ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roman Kagan @ 2017-12-12 18:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Paolo Bonzini, kvm, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Tue, Dec 12, 2017 at 05:29:02PM +0100, David Hildenbrand wrote:
> On 12.12.2017 17:22, Paolo Bonzini wrote:
> > On 12/12/2017 17:07, Roman Kagan wrote:
> >> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> >> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> >> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >> +
> > 
> > The lock/unlock is not needed (vcpu_enter_guest -> vmx_handle_exit ->
> > handle_vmcall -> kvm_emulate_hypercall -> kvm_hv_hypercall ->
> > kvm_hvcall_signal_event).  I'll drop it.
> > 
> > Thanks,
> > 
> 
> Feel free to add my
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> (although it would be good to have another look at the hyperv specific
> conn_id handling)

Are you talking about this weird flag_no part of the hypercall parameter
which we add to the conn_id?

Yes this is something I'm not quite comfortable with: we've never seen
it being anything but 0, and Linux guest drivers ignore its existence
altogether (and there seem to be no fields in the vmbus protocol
structures to let the guest know of the allowed numbers there).

So I may be misinterpreting the spec; and it may be prudent just to
reject all flag_no != 0 hypercalls until we actually start seeing ones,
and then research how to handle them correctly.  What do you think?

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 18:18       ` Roman Kagan
@ 2017-12-12 18:20         ` David Hildenbrand
  2017-12-13  8:41           ` Roman Kagan
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-12 18:20 UTC (permalink / raw)
  To: Roman Kagan, Paolo Bonzini, kvm, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 12.12.2017 19:18, Roman Kagan wrote:
> On Tue, Dec 12, 2017 at 05:29:02PM +0100, David Hildenbrand wrote:
>> On 12.12.2017 17:22, Paolo Bonzini wrote:
>>> On 12/12/2017 17:07, Roman Kagan wrote:
>>>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
>>>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
>>>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>>> +
>>>
>>> The lock/unlock is not needed (vcpu_enter_guest -> vmx_handle_exit ->
>>> handle_vmcall -> kvm_emulate_hypercall -> kvm_hv_hypercall ->
>>> kvm_hvcall_signal_event).  I'll drop it.
>>>
>>> Thanks,
>>>
>>
>> Feel free to add my
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>
>> (although it would be good to have another look at the hyperv specific
>> conn_id handling)
> 
> Are you talking about this weird flag_no part of the hypercall parameter
> which we add to the conn_id?
> 
> Yes this is something I'm not quite comfortable with: we've never seen
> it being anything but 0, and Linux guest drivers ignore its existence
> altogether (and there seem to be no fields in the vmbus protocol
> structures to let the guest know of the allowed numbers there).
> 
> So I may be misinterpreting the spec; and it may be prudent just to
> reject all flag_no != 0 hypercalls until we actually start seeing ones,
> and then research how to handle them correctly.  What do you think?

Could be done, but if it isn't too much trouble, let's try to get it
right from the beginning.

Can you point me at the spec so I can verify?

> 
> Thanks,
> Roman.
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 18:20         ` David Hildenbrand
@ 2017-12-13  8:41           ` Roman Kagan
  2017-12-13  9:35             ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roman Kagan @ 2017-12-13  8:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Paolo Bonzini, kvm, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Tue, Dec 12, 2017 at 07:20:52PM +0100, David Hildenbrand wrote:
> On 12.12.2017 19:18, Roman Kagan wrote:
> > On Tue, Dec 12, 2017 at 05:29:02PM +0100, David Hildenbrand wrote:
> >> On 12.12.2017 17:22, Paolo Bonzini wrote:
> >>> On 12/12/2017 17:07, Roman Kagan wrote:
> >> (although it would be good to have another look at the hyperv specific
> >> conn_id handling)
> > 
> > Are you talking about this weird flag_no part of the hypercall parameter
> > which we add to the conn_id?
> > 
> > Yes this is something I'm not quite comfortable with: we've never seen
> > it being anything but 0, and Linux guest drivers ignore its existence
> > altogether (and there seem to be no fields in the vmbus protocol
> > structures to let the guest know of the allowed numbers there).
> > 
> > So I may be misinterpreting the spec; and it may be prudent just to
> > reject all flag_no != 0 hypercalls until we actually start seeing ones,
> > and then research how to handle them correctly.  What do you think?
> 
> Could be done, but if it isn't too much trouble, let's try to get it
> right from the beginning.
> 
> Can you point me at the spec so I can verify?

Sure.  The Hyper-V spec is

https://github.com/Microsoft/Virtualization-Documentation/raw/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0b.pdf

Hope you can read Microsoftish better than me.  The relevant part is
chapter 11, "Inter-Partition Communication"; the hypercall itself is
described in "11.11.2 HvSignalEvent".

However, the VMBus guest drivers in Linux use only a limited subset of
those.  In particular, the concept of ports seems completely unused, the
connections appear pre-established, and the parent partition informs the
guest of the connection to signal for a specific channel via a u32 field
in the channel offer message.
(See drivers/hv/channel_mgmt.c:vmbus_onoffer for how the connection_id
from the offer is recorded, and drivers/hv/connection.c:vmbus_set_event
for how it's used for signaling.)

So I'm tempted to think this extra flag number is just the usual
overdesign, and I'd better require it to be zero.  This is certainly the
case for all the VMBus devices currently supported by the Linux guest
drivers, so we should be good with it too.

Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13  8:41           ` Roman Kagan
@ 2017-12-13  9:35             ` David Hildenbrand
  2017-12-13 10:00               ` Roman Kagan
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13  9:35 UTC (permalink / raw)
  To: Roman Kagan, Paolo Bonzini, kvm, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 13.12.2017 09:41, Roman Kagan wrote:
> On Tue, Dec 12, 2017 at 07:20:52PM +0100, David Hildenbrand wrote:
>> On 12.12.2017 19:18, Roman Kagan wrote:
>>> On Tue, Dec 12, 2017 at 05:29:02PM +0100, David Hildenbrand wrote:
>>>> On 12.12.2017 17:22, Paolo Bonzini wrote:
>>>>> On 12/12/2017 17:07, Roman Kagan wrote:
>>>> (although it would be good to have another look at the hyperv specific
>>>> conn_id handling)
>>>
>>> Are you talking about this weird flag_no part of the hypercall parameter
>>> which we add to the conn_id?
>>>
>>> Yes this is something I'm not quite comfortable with: we've never seen
>>> it being anything but 0, and Linux guest drivers ignore its existence
>>> altogether (and there seem to be no fields in the vmbus protocol
>>> structures to let the guest know of the allowed numbers there).
>>>
>>> So I may be misinterpreting the spec; and it may be prudent just to
>>> reject all flag_no != 0 hypercalls until we actually start seeing ones,
>>> and then research how to handle them correctly.  What do you think?
>>
>> Could be done, but if it isn't too much trouble, let's try to get it
>> right from the beginning.
>>
>> Can you point me at the spec so I can verify?
> 
> Sure.  The Hyper-V spec is
> 
> https://github.com/Microsoft/Virtualization-Documentation/raw/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0b.pdf
> 
> Hope you can read Microsoftish better than me.  The relevant part is
> chapter 11, "Inter-Partition Communication"; the hypercall itself is
> described in "11.11.2 HvSignalEvent".
> 
> However, the VMBus guest drivers in Linux use only a limited subset of
> those.  In particular, the concept of ports seems completely unused, the
> connections appear pre-established, and the parent partition informs the
> guest of the connection to signal for a specific channel via a u32 field
> in the channel offer message.
> (See drivers/hv/channel_mgmt.c:vmbus_onoffer for how the connection_id
> from the offer is recorded, and drivers/hv/connection.c:vmbus_set_event
> for how it's used for signaling.)
> 
> So I'm tempted to think this extra flag number is just the usual
> overdesign, and I'd better require it to be zero.  This is certainly the
> case for all the VMBus devices currently supported by the Linux guest
> drivers, so we should be good with it too.
> 
> Roman.
> 

General question, what about Microsoft Windows guests?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:07 ` [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
  2017-12-12 16:22   ` Paolo Bonzini
@ 2017-12-13  9:51   ` David Hildenbrand
  2017-12-13 11:04     ` Roman Kagan
  2017-12-13  9:55   ` David Hildenbrand
  2 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13  9:51 UTC (permalink / raw)
  To: Roman Kagan, kvm, Paolo Bonzini, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov


>  
> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> +{
> +	u16 ret;
> +	u32 conn_id, flag_no;
> +	int idx;
> +	struct eventfd_ctx *eventfd;
> +
> +	if (unlikely(!fast)) {
> +		gpa_t gpa = param;
> +
> +		if ((gpa & (__alignof__(param) - 1)) ||
> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> +			return HV_STATUS_INVALID_ALIGNMENT;
> +
> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +
> +		if (ret < 0)
> +			return HV_STATUS_INSUFFICIENT_MEMORY;
> +	}
> +
> +	/*
> +	 * the signaled event number is made up of a 24bit "connection id" and
> +	 * a 16bit "flag number"; on the hypervisor side it's only their sum
> +	 * that matters
> +	 */
> +	conn_id = param;
> +	flag_no = param >> 32;
> +	if ((conn_id & ~KVM_HYPERV_CONN_ID_MASK) || (flag_no & 0xffff0000))

1. You seem to check for RsvdZ here (flag_no & 0xffff0000),
  "HV_STATUS_INVALID_CONNECTION_ID: The specified connection ID is
   invalid."
   -> Shouldn't the reserved field be simply ignored?

2. KVM_HYPERV_CONN_ID_MASK. "ConnectionId (4 bytes)".

Why should it only be 3bytes?

(I am pretty sure I am missing something in the document here :) )


> +		return HV_STATUS_INVALID_CONNECTION_ID;
> +	conn_id += flag_no;

"FlagNumber specifies the relative index of the event flag that the
caller wants to set within the target"

I am not sure if we should simply change the connection id here. This
seems to be an additional parameter to be passed to the one connection id.


> +	if (conn_id & ~KVM_HYPERV_CONN_ID_MASK)
> +		return HV_STATUS_INVALID_CONNECTION_ID;
> +
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	eventfd = idr_find(&vcpu->kvm->arch.hyperv.conn_to_evt, conn_id);
> +	if (eventfd)
> +		eventfd_signal(eventfd, 1);
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +
> +	return eventfd ? HV_STATUS_SUCCESS : HV_STATUS_INVALID_CONNECTION_ID;
> +}
> +


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-12 16:07 ` [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
  2017-12-12 16:22   ` Paolo Bonzini
  2017-12-13  9:51   ` David Hildenbrand
@ 2017-12-13  9:55   ` David Hildenbrand
  2017-12-13 11:44     ` Roman Kagan
  2 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13  9:55 UTC (permalink / raw)
  To: Roman Kagan, kvm, Paolo Bonzini, Radim Krčmář
  Cc: Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov


> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> +{
> +	u16 ret;
> +	u32 conn_id, flag_no;
> +	int idx;
> +	struct eventfd_ctx *eventfd;
> +
> +	if (unlikely(!fast)) {
> +		gpa_t gpa = param;
> +
> +		if ((gpa & (__alignof__(param) - 1)) ||
> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> +			return HV_STATUS_INVALID_ALIGNMENT;
> +
> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +
> +		if (ret < 0)
> +			return HV_STATUS_INSUFFICIENT_MEMORY;

"... event flags do not require any buffer allocation or queuing within
the hypervisor, so HvSignalEvent will never fail due to insufficient
resources."


> +	}
> +
> +	/*
> +	 * the signaled event number is made up of a 24bit "connection id" and
> +	 * a 16bit "flag number"; on the hypervisor side it's only their sum
> +	 * that matters

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13  9:35             ` David Hildenbrand
@ 2017-12-13 10:00               ` Roman Kagan
  0 siblings, 0 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-13 10:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Paolo Bonzini, kvm, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Wed, Dec 13, 2017 at 10:35:40AM +0100, David Hildenbrand wrote:
> On 13.12.2017 09:41, Roman Kagan wrote:
> > On Tue, Dec 12, 2017 at 07:20:52PM +0100, David Hildenbrand wrote:
> >> On 12.12.2017 19:18, Roman Kagan wrote:
> >>> On Tue, Dec 12, 2017 at 05:29:02PM +0100, David Hildenbrand wrote:
> >>>> On 12.12.2017 17:22, Paolo Bonzini wrote:
> >>>>> On 12/12/2017 17:07, Roman Kagan wrote:
> >>>> (although it would be good to have another look at the hyperv specific
> >>>> conn_id handling)
> >>>
> >>> Are you talking about this weird flag_no part of the hypercall parameter
> >>> which we add to the conn_id?
> >>>
> >>> Yes this is something I'm not quite comfortable with: we've never seen
> >>> it being anything but 0, and Linux guest drivers ignore its existence
> >>> altogether (and there seem to be no fields in the vmbus protocol
> >>> structures to let the guest know of the allowed numbers there).
> >>>
> >>> So I may be misinterpreting the spec; and it may be prudent just to
> >>> reject all flag_no != 0 hypercalls until we actually start seeing ones,
> >>> and then research how to handle them correctly.  What do you think?
> >>
> >> Could be done, but if it isn't too much trouble, let's try to get it
> >> right from the beginning.
> >>
> >> Can you point me at the spec so I can verify?
> > 
> > Sure.  The Hyper-V spec is
> > 
> > https://github.com/Microsoft/Virtualization-Documentation/raw/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0b.pdf
> > 
> > Hope you can read Microsoftish better than me.  The relevant part is
> > chapter 11, "Inter-Partition Communication"; the hypercall itself is
> > described in "11.11.2 HvSignalEvent".
> > 
> > However, the VMBus guest drivers in Linux use only a limited subset of
> > those.  In particular, the concept of ports seems completely unused, the
> > connections appear pre-established, and the parent partition informs the
> > guest of the connection to signal for a specific channel via a u32 field
> > in the channel offer message.
> > (See drivers/hv/channel_mgmt.c:vmbus_onoffer for how the connection_id
> > from the offer is recorded, and drivers/hv/connection.c:vmbus_set_event
> > for how it's used for signaling.)
> > 
> > So I'm tempted to think this extra flag number is just the usual
> > overdesign, and I'd better require it to be zero.  This is certainly the
> > case for all the VMBus devices currently supported by the Linux guest
> > drivers, so we should be good with it too.
> > 
> > Roman.
> > 
> 
> General question, what about Microsoft Windows guests?

Well, we haven't tried intercepting this hypercall in Windows guests on
the real Hyper-V to see if they ever use non-zero flag number.

On our implementation of VMBus in QEMU, based on the protocol deduced
from the Linux guest drivers, the Windows guests abide by the same
rules.  At least it was enough to get VMBus scsi, network, and util
devices working.

Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13  9:51   ` David Hildenbrand
@ 2017-12-13 11:04     ` Roman Kagan
  2017-12-13 11:14       ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roman Kagan @ 2017-12-13 11:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Wed, Dec 13, 2017 at 10:51:06AM +0100, David Hildenbrand wrote:
> > +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> > +{
> > +	u16 ret;
> > +	u32 conn_id, flag_no;
> > +	int idx;
> > +	struct eventfd_ctx *eventfd;
> > +
> > +	if (unlikely(!fast)) {
> > +		gpa_t gpa = param;
> > +
> > +		if ((gpa & (__alignof__(param) - 1)) ||
> > +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> > +			return HV_STATUS_INVALID_ALIGNMENT;
> > +
> > +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> > +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > +
> > +		if (ret < 0)
> > +			return HV_STATUS_INSUFFICIENT_MEMORY;
> > +	}
> > +
> > +	/*
> > +	 * the signaled event number is made up of a 24bit "connection id" and
> > +	 * a 16bit "flag number"; on the hypervisor side it's only their sum
> > +	 * that matters
> > +	 */
> > +	conn_id = param;
> > +	flag_no = param >> 32;
> > +	if ((conn_id & ~KVM_HYPERV_CONN_ID_MASK) || (flag_no & 0xffff0000))
> 
> 1. You seem to check for RsvdZ here (flag_no & 0xffff0000),
>   "HV_STATUS_INVALID_CONNECTION_ID: The specified connection ID is
>    invalid."
>    -> Shouldn't the reserved field be simply ignored?

It's "reserved zero", so I think it's correct to reject non-zero values.

> 2. KVM_HYPERV_CONN_ID_MASK. "ConnectionId (4 bytes)".
> 
> Why should it only be 3bytes?
> 
> (I am pretty sure I am missing something in the document here :) )


"11.10.7 Connections

Connections are identified by 32-bit IDs. The high 8 bits are reserved
and must be zero.[...]"

> > +		return HV_STATUS_INVALID_CONNECTION_ID;
> > +	conn_id += flag_no;
> 
> "FlagNumber specifies the relative index of the event flag that the
> caller wants to set within the target"
> 
> I am not sure if we should simply change the connection id here. This
> seems to be an additional parameter to be passed to the one connection id.

Right.  I think I misinterpreted this part back when I implemented it in
QEMU (first submission was this summer).  We lived happily with that
code, because the FlagNuber was always zero, and then I copied that over
to this KVM patch.

I think requiring it to be zero is a better choice; I've prepared v7 to
that end.

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13 11:04     ` Roman Kagan
@ 2017-12-13 11:14       ` David Hildenbrand
  2017-12-13 12:16         ` Roman Kagan
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13 11:14 UTC (permalink / raw)
  To: Roman Kagan, kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 13.12.2017 12:04, Roman Kagan wrote:
> On Wed, Dec 13, 2017 at 10:51:06AM +0100, David Hildenbrand wrote:
>>> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
>>> +{
>>> +	u16 ret;
>>> +	u32 conn_id, flag_no;
>>> +	int idx;
>>> +	struct eventfd_ctx *eventfd;
>>> +
>>> +	if (unlikely(!fast)) {
>>> +		gpa_t gpa = param;
>>> +
>>> +		if ((gpa & (__alignof__(param) - 1)) ||
>>> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
>>> +			return HV_STATUS_INVALID_ALIGNMENT;
>>> +
>>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
>>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
>>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>> +
>>> +		if (ret < 0)
>>> +			return HV_STATUS_INSUFFICIENT_MEMORY;
>>> +	}
>>> +
>>> +	/*
>>> +	 * the signaled event number is made up of a 24bit "connection id" and
>>> +	 * a 16bit "flag number"; on the hypervisor side it's only their sum
>>> +	 * that matters
>>> +	 */
>>> +	conn_id = param;
>>> +	flag_no = param >> 32;
>>> +	if ((conn_id & ~KVM_HYPERV_CONN_ID_MASK) || (flag_no & 0xffff0000))
>>
>> 1. You seem to check for RsvdZ here (flag_no & 0xffff0000),
>>   "HV_STATUS_INVALID_CONNECTION_ID: The specified connection ID is
>>    invalid."
>>    -> Shouldn't the reserved field be simply ignored?
> 
> It's "reserved zero", so I think it's correct to reject non-zero values.

If it's not documented, guess it is ignored? At least
"HV_STATUS_INVALID_CONNECTION_ID" is wrong, as this is not the
connection id.

> 
>> 2. KVM_HYPERV_CONN_ID_MASK. "ConnectionId (4 bytes)".
>>
>> Why should it only be 3bytes?
>>
>> (I am pretty sure I am missing something in the document here :) )
> 
> 
> "11.10.7 Connections
> 
> Connections are identified by 32-bit IDs. The high 8 bits are reserved
> and must be zero.[...]"

This makes then sense indeed!

> 
>>> +		return HV_STATUS_INVALID_CONNECTION_ID;
>>> +	conn_id += flag_no;
>>
>> "FlagNumber specifies the relative index of the event flag that the
>> caller wants to set within the target"
>>
>> I am not sure if we should simply change the connection id here. This
>> seems to be an additional parameter to be passed to the one connection id.
> 
> Right.  I think I misinterpreted this part back when I implemented it in
> QEMU (first submission was this summer).  We lived happily with that
> code, because the FlagNuber was always zero, and then I copied that over
> to this KVM patch.
> 
> I think requiring it to be zero is a better choice; I've prepared v7 to
> that end.

But which return value to use? HV_STATUS_INVALID_PARAMETER?

> 
> Thanks,
> Roman.
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13  9:55   ` David Hildenbrand
@ 2017-12-13 11:44     ` Roman Kagan
  2017-12-13 11:46       ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roman Kagan @ 2017-12-13 11:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Wed, Dec 13, 2017 at 10:55:02AM +0100, David Hildenbrand wrote:
> 
> > +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> > +{
> > +	u16 ret;
> > +	u32 conn_id, flag_no;
> > +	int idx;
> > +	struct eventfd_ctx *eventfd;
> > +
> > +	if (unlikely(!fast)) {
> > +		gpa_t gpa = param;
> > +
> > +		if ((gpa & (__alignof__(param) - 1)) ||
> > +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> > +			return HV_STATUS_INVALID_ALIGNMENT;
> > +
> > +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> > +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > +
> > +		if (ret < 0)
> > +			return HV_STATUS_INSUFFICIENT_MEMORY;
> 
> "... event flags do not require any buffer allocation or queuing within
> the hypervisor, so HvSignalEvent will never fail due to insufficient
> resources."

Ouch.  I wonder what else to map -EFAULT to then...  Looks like
HV_STATUS_INVALID_ALIGNMENT is the most appropriate (as in "The input or
output GPA pointer is not within the bounds of the GPA space.")

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13 11:44     ` Roman Kagan
@ 2017-12-13 11:46       ` David Hildenbrand
  2017-12-13 11:51         ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13 11:46 UTC (permalink / raw)
  To: Roman Kagan, kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 13.12.2017 12:44, Roman Kagan wrote:
> On Wed, Dec 13, 2017 at 10:55:02AM +0100, David Hildenbrand wrote:
>>
>>> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
>>> +{
>>> +	u16 ret;
>>> +	u32 conn_id, flag_no;
>>> +	int idx;
>>> +	struct eventfd_ctx *eventfd;
>>> +
>>> +	if (unlikely(!fast)) {
>>> +		gpa_t gpa = param;
>>> +
>>> +		if ((gpa & (__alignof__(param) - 1)) ||
>>> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
>>> +			return HV_STATUS_INVALID_ALIGNMENT;
>>> +
>>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
>>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
>>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>> +
>>> +		if (ret < 0)
>>> +			return HV_STATUS_INSUFFICIENT_MEMORY;
>>
>> "... event flags do not require any buffer allocation or queuing within
>> the hypervisor, so HvSignalEvent will never fail due to insufficient
>> resources."
> 
> Ouch.  I wonder what else to map -EFAULT to then...  Looks like
> HV_STATUS_INVALID_ALIGNMENT is the most appropriate (as in "The input or
> output GPA pointer is not within the bounds of the GPA space.")
> 
> Thanks,
> Roman.
> 

It is likely happen rearelyn, but usually if we are out of memory we
should report to user space. (chances that we won't be able to continue
for longer either way are very high)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13 11:46       ` David Hildenbrand
@ 2017-12-13 11:51         ` David Hildenbrand
  2017-12-13 11:57           ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13 11:51 UTC (permalink / raw)
  To: Roman Kagan, kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 13.12.2017 12:46, David Hildenbrand wrote:
> On 13.12.2017 12:44, Roman Kagan wrote:
>> On Wed, Dec 13, 2017 at 10:55:02AM +0100, David Hildenbrand wrote:
>>>
>>>> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
>>>> +{
>>>> +	u16 ret;
>>>> +	u32 conn_id, flag_no;
>>>> +	int idx;
>>>> +	struct eventfd_ctx *eventfd;
>>>> +
>>>> +	if (unlikely(!fast)) {
>>>> +		gpa_t gpa = param;
>>>> +
>>>> +		if ((gpa & (__alignof__(param) - 1)) ||
>>>> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
>>>> +			return HV_STATUS_INVALID_ALIGNMENT;
>>>> +
>>>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
>>>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
>>>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>>> +
>>>> +		if (ret < 0)
>>>> +			return HV_STATUS_INSUFFICIENT_MEMORY;
>>>
>>> "... event flags do not require any buffer allocation or queuing within
>>> the hypervisor, so HvSignalEvent will never fail due to insufficient
>>> resources."
>>
>> Ouch.  I wonder what else to map -EFAULT to then...  Looks like
>> HV_STATUS_INVALID_ALIGNMENT is the most appropriate (as in "The input or
>> output GPA pointer is not within the bounds of the GPA space.")
>>
>> Thanks,
>> Roman.
>>
> 
> It is likely happen rearelyn, but usually if we are out of memory we
> should report to user space. (chances that we won't be able to continue
> for longer either way are very high)
> 
(I implied -ENOMEM is translated to -EFAULT) it can of course happen if
the address is wrong.

Also wonder how to deal with that. Could it be that we even have to
trigger a page fault in the guest?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13 11:51         ` David Hildenbrand
@ 2017-12-13 11:57           ` David Hildenbrand
  2017-12-13 12:58             ` Roman Kagan
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2017-12-13 11:57 UTC (permalink / raw)
  To: Roman Kagan, kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On 13.12.2017 12:51, David Hildenbrand wrote:
> On 13.12.2017 12:46, David Hildenbrand wrote:
>> On 13.12.2017 12:44, Roman Kagan wrote:
>>> On Wed, Dec 13, 2017 at 10:55:02AM +0100, David Hildenbrand wrote:
>>>>
>>>>> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
>>>>> +{
>>>>> +	u16 ret;
>>>>> +	u32 conn_id, flag_no;
>>>>> +	int idx;
>>>>> +	struct eventfd_ctx *eventfd;
>>>>> +
>>>>> +	if (unlikely(!fast)) {
>>>>> +		gpa_t gpa = param;
>>>>> +
>>>>> +		if ((gpa & (__alignof__(param) - 1)) ||
>>>>> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
>>>>> +			return HV_STATUS_INVALID_ALIGNMENT;
>>>>> +
>>>>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
>>>>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
>>>>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>>>> +
>>>>> +		if (ret < 0)
>>>>> +			return HV_STATUS_INSUFFICIENT_MEMORY;
>>>>
>>>> "... event flags do not require any buffer allocation or queuing within
>>>> the hypervisor, so HvSignalEvent will never fail due to insufficient
>>>> resources."
>>>
>>> Ouch.  I wonder what else to map -EFAULT to then...  Looks like
>>> HV_STATUS_INVALID_ALIGNMENT is the most appropriate (as in "The input or
>>> output GPA pointer is not within the bounds of the GPA space.")
>>>
>>> Thanks,
>>> Roman.
>>>
>>
>> It is likely happen rearelyn, but usually if we are out of memory we
>> should report to user space. (chances that we won't be able to continue
>> for longer either way are very high)
>>
> (I implied -ENOMEM is translated to -EFAULT) it can of course happen if
> the address is wrong.
> 
> Also wonder how to deal with that. Could it be that we even have to
> trigger a page fault in the guest?
> 


The hypervisor will validate that the calling partition can read from
the input page before executing the
requested hypercall. This validation consists of two checks: the
specified GPA is mapped and the GPA is
marked readable. If either of these tests fails, the hypervisor
generates a memory intercept message.

So memory intercept message is the right thing to do I guess?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13 11:14       ` David Hildenbrand
@ 2017-12-13 12:16         ` Roman Kagan
  0 siblings, 0 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-13 12:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Wed, Dec 13, 2017 at 12:14:48PM +0100, David Hildenbrand wrote:
> On 13.12.2017 12:04, Roman Kagan wrote:
> > On Wed, Dec 13, 2017 at 10:51:06AM +0100, David Hildenbrand wrote:
> >>> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> >>> +{
> >>> +	u16 ret;
> >>> +	u32 conn_id, flag_no;
> >>> +	int idx;
> >>> +	struct eventfd_ctx *eventfd;
> >>> +
> >>> +	if (unlikely(!fast)) {
> >>> +		gpa_t gpa = param;
> >>> +
> >>> +		if ((gpa & (__alignof__(param) - 1)) ||
> >>> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> >>> +			return HV_STATUS_INVALID_ALIGNMENT;
> >>> +
> >>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> >>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> >>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >>> +
> >>> +		if (ret < 0)
> >>> +			return HV_STATUS_INSUFFICIENT_MEMORY;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * the signaled event number is made up of a 24bit "connection id" and
> >>> +	 * a 16bit "flag number"; on the hypervisor side it's only their sum
> >>> +	 * that matters
> >>> +	 */
> >>> +	conn_id = param;
> >>> +	flag_no = param >> 32;
> >>> +	if ((conn_id & ~KVM_HYPERV_CONN_ID_MASK) || (flag_no & 0xffff0000))
> >>
> >> 1. You seem to check for RsvdZ here (flag_no & 0xffff0000),
> >>   "HV_STATUS_INVALID_CONNECTION_ID: The specified connection ID is
> >>    invalid."
> >>    -> Shouldn't the reserved field be simply ignored?
> > 
> > It's "reserved zero", so I think it's correct to reject non-zero values.
> 
> If it's not documented, guess it is ignored?

It is (sort of) documented, per

"1.2 Reserved Values
[...]
Zero value (documented as RsvdZ in diagrams and ReservedZ in code
segments) – For maximum forward compatibility, clients should zero the
value within this field."

> At least "HV_STATUS_INVALID_CONNECTION_ID" is wrong, as this is not
> the connection id.

I (ab)used this return code here to fall through to userspace handling
of this hypercall.  But it may indeed make more sense to return
HV_STATUS_INVALID_HYPERCALL_INPUT, as in "A reserved bit in the
specified hypercall input value is non-zero."

> >>> +		return HV_STATUS_INVALID_CONNECTION_ID;
> >>> +	conn_id += flag_no;
> >>
> >> "FlagNumber specifies the relative index of the event flag that the
> >> caller wants to set within the target"
> >>
> >> I am not sure if we should simply change the connection id here. This
> >> seems to be an additional parameter to be passed to the one connection id.
> > 
> > Right.  I think I misinterpreted this part back when I implemented it in
> > QEMU (first submission was this summer).  We lived happily with that
> > code, because the FlagNuber was always zero, and then I copied that over
> > to this KVM patch.
> > 
> > I think requiring it to be zero is a better choice; I've prepared v7 to
> > that end.
> 
> But which return value to use? HV_STATUS_INVALID_PARAMETER?

I think we're better off resorting to the userspace for handling it (at
least to report the situation).  With the current patch this will be
HV_STATUS_INVALID_CONNECTION_ID, but I can use something else or define
a special code for that if desired.

Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd
  2017-12-13 11:57           ` David Hildenbrand
@ 2017-12-13 12:58             ` Roman Kagan
  0 siblings, 0 replies; 21+ messages in thread
From: Roman Kagan @ 2017-12-13 12:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, Paolo Bonzini, Radim Krčmář,
	Denis V. Lunev, Konrad Rzeszutek Wilk, Vitaly Kuznetsov

On Wed, Dec 13, 2017 at 12:57:11PM +0100, David Hildenbrand wrote:
> On 13.12.2017 12:51, David Hildenbrand wrote:
> > On 13.12.2017 12:46, David Hildenbrand wrote:
> >> On 13.12.2017 12:44, Roman Kagan wrote:
> >>> On Wed, Dec 13, 2017 at 10:55:02AM +0100, David Hildenbrand wrote:
> >>>>
> >>>>> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> >>>>> +{
> >>>>> +	u16 ret;
> >>>>> +	u32 conn_id, flag_no;
> >>>>> +	int idx;
> >>>>> +	struct eventfd_ctx *eventfd;
> >>>>> +
> >>>>> +	if (unlikely(!fast)) {
> >>>>> +		gpa_t gpa = param;
> >>>>> +
> >>>>> +		if ((gpa & (__alignof__(param) - 1)) ||
> >>>>> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> >>>>> +			return HV_STATUS_INVALID_ALIGNMENT;
> >>>>> +
> >>>>> +		idx = srcu_read_lock(&vcpu->kvm->srcu);
> >>>>> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> >>>>> +		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >>>>> +
> >>>>> +		if (ret < 0)
> >>>>> +			return HV_STATUS_INSUFFICIENT_MEMORY;
> >>>>
> >>>> "... event flags do not require any buffer allocation or queuing within
> >>>> the hypervisor, so HvSignalEvent will never fail due to insufficient
> >>>> resources."
> >>>
> >>> Ouch.  I wonder what else to map -EFAULT to then...  Looks like
> >>> HV_STATUS_INVALID_ALIGNMENT is the most appropriate (as in "The input or
> >>> output GPA pointer is not within the bounds of the GPA space.")
> >>>
> >>> Thanks,
> >>> Roman.
> >>>
> >>
> >> It is likely happen rearelyn, but usually if we are out of memory we
> >> should report to user space. (chances that we won't be able to continue
> >> for longer either way are very high)
> >>
> > (I implied -ENOMEM is translated to -EFAULT) it can of course happen if
> > the address is wrong.
> > 
> > Also wonder how to deal with that. Could it be that we even have to
> > trigger a page fault in the guest?
> > 
> 
> 
> The hypervisor will validate that the calling partition can read from
> the input page before executing the
> requested hypercall. This validation consists of two checks: the
> specified GPA is mapped and the GPA is
> marked readable. If either of these tests fails, the hypervisor
> generates a memory intercept message.
> 
> So memory intercept message is the right thing to do I guess?

The intercept messages are sent to the parent partition (whose role is
played by the hypervisor in our case) and that one is supposed to decide
what to do with it.

Triggering a pagefault in the guest doesn't look correct because the
checks you mention are related to the GPA and not to the guest virtual
address.

So I think the only option we have is to return an error code, and we
need to decide which one.  We also can fall through to the userspace,
but I can't see why the userspace would have better chances to read that
GPA.

Roman.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-12-13 12:59 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-12 16:07 [PATCH v5 0/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
2017-12-12 16:07 ` [PATCH v5 1/2] kvm: x86: factor out kvm.arch.hyperv (de)init Roman Kagan
2017-12-12 16:07 ` [PATCH v5 2/2] kvm: x86: hyperv: guest->host event signaling via eventfd Roman Kagan
2017-12-12 16:22   ` Paolo Bonzini
2017-12-12 16:29     ` David Hildenbrand
2017-12-12 18:18       ` Roman Kagan
2017-12-12 18:20         ` David Hildenbrand
2017-12-13  8:41           ` Roman Kagan
2017-12-13  9:35             ` David Hildenbrand
2017-12-13 10:00               ` Roman Kagan
2017-12-12 17:03     ` Roman Kagan
2017-12-13  9:51   ` David Hildenbrand
2017-12-13 11:04     ` Roman Kagan
2017-12-13 11:14       ` David Hildenbrand
2017-12-13 12:16         ` Roman Kagan
2017-12-13  9:55   ` David Hildenbrand
2017-12-13 11:44     ` Roman Kagan
2017-12-13 11:46       ` David Hildenbrand
2017-12-13 11:51         ` David Hildenbrand
2017-12-13 11:57           ` David Hildenbrand
2017-12-13 12:58             ` Roman Kagan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.