All of lore.kernel.org
 help / color / mirror / Atom feed
* [KVM PATCH v4 0/3] iosignalfd
@ 2009-05-26 19:15 Gregory Haskins
  2009-05-26 19:15 ` [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Gregory Haskins @ 2009-05-26 19:15 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davide, mtosatti

(Applies to kvm.git/master:74dfca0a)

This is v4 of the series.  For more details, please see the header to
patch 3/3.

This series has been tested against the kvm-eventfd unit test, and
appears to be functioning properly.  You can download this test here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

This series is ready to be considered for inclusion, pending any further
review comments.

[
   Changelog:

      v4:
	   *) Fixed a bug in the original 2/4 where the PIT failure case
              would potentially leave the io_bus components registered.
	   *) Condensed the v3 2/4 and 3/4 into one patch (2/2) since
              the patches became interdependent with the fix described above
	   *) Rebased to kvm.git/master:74dfca0a

      v3:
	   *) fixed patch 2/4 to handle error cases instead of BUG_ON
	   *) implemented same HAVE_EVENTFD protection mechanism as
              irqfd to prevent compilation errors on unsupported arches
	   *) completed testing
	   *) rebased to kvm.git/master:7391a6d5

      v2:
           *) added optional data-matching capability (via cookie field)
           *) changed name from iofd to iosignalfd
           *) added io_bus unregister function
           *) implemented deassign feature

      v1:
           *) original release (integrated into irqfd v7 series as "iofd")
]


---

Gregory Haskins (3):
      kvm: add iosignalfd support
      kvm: make io_bus interface more robust
      eventfd: export eventfd interfaces for module use


 arch/x86/kvm/i8254.c      |   34 +++++++--
 arch/x86/kvm/i8259.c      |    9 ++-
 arch/x86/kvm/x86.c        |    1 
 fs/eventfd.c              |    3 +
 include/linux/kvm.h       |   15 ++++
 include/linux/kvm_host.h  |   18 ++++-
 virt/kvm/coalesced_mmio.c |    8 ++
 virt/kvm/eventfd.c        |  162 +++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/ioapic.c         |    9 ++-
 virt/kvm/kvm_main.c       |   60 ++++++++++++++---
 10 files changed, 290 insertions(+), 29 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use
  2009-05-26 19:15 [KVM PATCH v4 0/3] iosignalfd Gregory Haskins
@ 2009-05-26 19:15 ` Gregory Haskins
  2009-05-27  8:57   ` Avi Kivity
  2009-05-26 19:15 ` [KVM PATCH v4 2/3] kvm: make io_bus interface more robust Gregory Haskins
  2009-05-26 19:15 ` [KVM PATCH v4 3/3] kvm: add iosignalfd support Gregory Haskins
  2 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-26 19:15 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davide, mtosatti

We want to use eventfd from KVM which can be compiled as a module, so
export the interfaces.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 fs/eventfd.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..3f0e197 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/module.h>
 
 struct eventfd_ctx {
 	wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
 	return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [KVM PATCH v4 2/3] kvm: make io_bus interface more robust
  2009-05-26 19:15 [KVM PATCH v4 0/3] iosignalfd Gregory Haskins
  2009-05-26 19:15 ` [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins
@ 2009-05-26 19:15 ` Gregory Haskins
  2009-05-27  8:54   ` Avi Kivity
  2009-05-26 19:15 ` [KVM PATCH v4 3/3] kvm: add iosignalfd support Gregory Haskins
  2 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-26 19:15 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davide, mtosatti

Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it
fails.  We want to create dynamic MMIO/PIO entries driven from userspace later
in the series, so we need to enhance the code to be more robust with the
following changes:

   1) Add a return value to the registration function
   2) Fix up all the callsites to check the return code, handle any
      failures, and percolate the error up to the caller.
   3) Refactor io_bus to allow "holes" in the array so device hotplug
      can add/remove devices arbitrarily.
   4) Add an unregister function

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 arch/x86/kvm/i8254.c      |   34 ++++++++++++++++++++++---------
 arch/x86/kvm/i8259.c      |    9 +++++++-
 include/linux/kvm_host.h  |    8 +++++--
 virt/kvm/coalesced_mmio.c |    8 ++++++-
 virt/kvm/ioapic.c         |    9 ++++++--
 virt/kvm/kvm_main.c       |   49 +++++++++++++++++++++++++++++++++++++++------
 6 files changed, 92 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 584e3d3..6cf84d4 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -564,36 +564,40 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 {
 	struct kvm_pit *pit;
 	struct kvm_kpit_state *pit_state;
+	int ret;
 
 	pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
 	if (!pit)
 		return NULL;
 
 	pit->irq_source_id = kvm_request_irq_source_id(kvm);
-	if (pit->irq_source_id < 0) {
-		kfree(pit);
-		return NULL;
-	}
-
-	mutex_init(&pit->pit_state.lock);
-	mutex_lock(&pit->pit_state.lock);
-	spin_lock_init(&pit->pit_state.inject_lock);
+	if (pit->irq_source_id < 0)
+		goto fail;
 
 	/* Initialize PIO device */
 	pit->dev.read = pit_ioport_read;
 	pit->dev.write = pit_ioport_write;
 	pit->dev.in_range = pit_in_range;
 	pit->dev.private = pit;
-	kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+	if (ret < 0)
+		goto fail;
 
 	if (flags & KVM_PIT_SPEAKER_DUMMY) {
 		pit->speaker_dev.read = speaker_ioport_read;
 		pit->speaker_dev.write = speaker_ioport_write;
 		pit->speaker_dev.in_range = speaker_in_range;
 		pit->speaker_dev.private = pit;
-		kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+		ret = kvm_io_bus_register_dev(&kvm->pio_bus,
+					      &pit->speaker_dev);
+		if (ret < 0)
+			goto fail;
 	}
 
+	mutex_init(&pit->pit_state.lock);
+	mutex_lock(&pit->pit_state.lock);
+	spin_lock_init(&pit->pit_state.inject_lock);
+
 	kvm->arch.vpit = pit;
 	pit->kvm = kvm;
 
@@ -613,6 +617,16 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 
 	return pit;
+
+fail:
+	kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->speaker_dev);
+	kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->dev);
+
+	if (pit->irq_source_id >= 0)
+		kvm_free_irq_source_id(kvm, pit->irq_source_id);
+
+	kfree(pit);
+	return NULL;
 }
 
 void kvm_free_pit(struct kvm *kvm)
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1ccb50c..0caf7d4 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -519,6 +519,8 @@ static void pic_irq_request(void *opaque, int level)
 struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 {
 	struct kvm_pic *s;
+	int ret;
+
 	s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL);
 	if (!s)
 		return NULL;
@@ -538,6 +540,11 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 	s->dev.write = picdev_write;
 	s->dev.in_range = picdev_in_range;
 	s->dev.private = s;
-	kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+	if (ret < 0) {
+		kfree(s);
+		return NULL;
+	}
+
 	return s;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 28bd112..7dcae4b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -52,7 +52,7 @@ extern struct kmem_cache *kvm_vcpu_cache;
  * in one place.
  */
 struct kvm_io_bus {
-	int                   dev_count;
+	spinlock_t lock;
 #define NR_IOBUS_DEVS 6
 	struct kvm_io_device *devs[NR_IOBUS_DEVS];
 };
@@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
 void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 					  gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
-			     struct kvm_io_device *dev);
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+			    struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
+			    struct kvm_io_device *dev);
 
 struct kvm_vcpu {
 	struct kvm *kvm;
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5ae620d..ede9087 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -86,6 +86,7 @@ static void coalesced_mmio_destructor(struct kvm_io_device *this)
 int kvm_coalesced_mmio_init(struct kvm *kvm)
 {
 	struct kvm_coalesced_mmio_dev *dev;
+	int ret;
 
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
@@ -96,9 +97,12 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	dev->dev.private  = dev;
 	dev->kvm = kvm;
 	kvm->coalesced_mmio_dev = dev;
-	kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
 
-	return 0;
+	ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+	if (ret < 0)
+		kfree(dev);
+
+	return ret;
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 1eddae9..9be89f5 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -317,6 +317,7 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
 int kvm_ioapic_init(struct kvm *kvm)
 {
 	struct kvm_ioapic *ioapic;
+	int ret;
 
 	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL);
 	if (!ioapic)
@@ -328,7 +329,11 @@ int kvm_ioapic_init(struct kvm *kvm)
 	ioapic->dev.in_range = ioapic_in_range;
 	ioapic->dev.private = ioapic;
 	ioapic->kvm = kvm;
-	kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
-	return 0;
+
+	ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
+	if (ret < 0)
+		kfree(ioapic);
+
+	return ret;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de042cb..4c36ac8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2435,16 +2435,18 @@ static struct notifier_block kvm_reboot_notifier = {
 void kvm_io_bus_init(struct kvm_io_bus *bus)
 {
 	memset(bus, 0, sizeof(*bus));
+	spin_lock_init(&bus->lock);
 }
 
 void kvm_io_bus_destroy(struct kvm_io_bus *bus)
 {
 	int i;
 
-	for (i = 0; i < bus->dev_count; i++) {
+	for (i = 0; i < NR_IOBUS_DEVS; i++) {
 		struct kvm_io_device *pos = bus->devs[i];
 
-		kvm_iodevice_destructor(pos);
+		if (pos)
+			kvm_iodevice_destructor(pos);
 	}
 }
 
@@ -2453,21 +2455,54 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 {
 	int i;
 
-	for (i = 0; i < bus->dev_count; i++) {
+	for (i = 0; i < NR_IOBUS_DEVS; i++) {
 		struct kvm_io_device *pos = bus->devs[i];
 
-		if (pos->in_range(pos, addr, len, is_write))
+		if (pos && pos->in_range(pos, addr, len, is_write))
 			return pos;
 	}
 
 	return NULL;
 }
 
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
 {
-	BUG_ON(bus->dev_count > (NR_IOBUS_DEVS-1));
+	int i;
+
+	spin_lock(&bus->lock);
+
+	for (i = 0; i < NR_IOBUS_DEVS; i++) {
+		if (bus->devs[i])
+			continue;
+
+		bus->devs[i] = dev;
+		spin_unlock(&bus->lock);
+		return 0;
+	}
+
+	spin_unlock(&bus->lock);
+
+	return -ENOSPC;
+}
+
+int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
+{
+	int i;
+
+	spin_lock(&bus->lock);
+
+	for (i = 0; i < NR_IOBUS_DEVS; i++) {
+
+		if (bus->devs[i] == dev) {
+			bus->devs[i] = NULL;
+			spin_unlock(&bus->lock);
+			return 0;
+		}
+	}
+
+	spin_unlock(&bus->lock);
 
-	bus->devs[bus->dev_count++] = dev;
+	return -ENOENT;
 }
 
 static struct notifier_block kvm_cpu_notifier = {


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-26 19:15 [KVM PATCH v4 0/3] iosignalfd Gregory Haskins
  2009-05-26 19:15 ` [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins
  2009-05-26 19:15 ` [KVM PATCH v4 2/3] kvm: make io_bus interface more robust Gregory Haskins
@ 2009-05-26 19:15 ` Gregory Haskins
  2009-05-27  9:03   ` Avi Kivity
  2 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-26 19:15 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davide, mtosatti

iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any arbitrary
IO address with a corresponding eventfd and then pass the eventfd to a
specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible.  All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
iosignalfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110000 iops, 9.09us rtt
iosignalfd-mmio: 200100 iops, 5.00us rtt
iosignalfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio:      153139 iops, 6.53us rtt
iosignalfd-hc: 412585 iops, 2.37us rtt

these are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 arch/x86/kvm/x86.c       |    1 
 include/linux/kvm.h      |   15 ++++
 include/linux/kvm_host.h |   10 ++-
 virt/kvm/eventfd.c       |  162 ++++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |   11 +++
 5 files changed, 195 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d44dd5..31c86a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1089,6 +1089,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_IRQ_INJECT_STATUS:
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_IRQFD:
+	case KVM_CAP_IOSIGNALFD:
 	case KVM_CAP_PIT2:
 		r = 1;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 65df6f5..b63475e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -300,6 +300,19 @@ struct kvm_guest_debug {
 	struct kvm_guest_debug_arch arch;
 };
 
+#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1 << 0)
+#define KVM_IOSIGNALFD_FLAG_PIO       (1 << 1)
+#define KVM_IOSIGNALFD_FLAG_COOKIE    (1 << 2)
+
+struct kvm_iosignalfd {
+	__u64 cookie;
+	__u64 addr;
+	__u32 len;
+	__u32 fd;
+	__u32 flags;
+	__u8  pad[12];
+};
+
 #define KVM_TRC_SHIFT           16
 /*
  * kvm trace categories
@@ -430,6 +443,7 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_PIT
 #define KVM_CAP_PIT2 33
 #endif
+#define KVM_CAP_IOSIGNALFD 34
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -537,6 +551,7 @@ struct kvm_irqfd {
 #define KVM_DEASSIGN_DEV_IRQ       _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD                  _IOW(KVMIO, 0x76, struct kvm_irqfd)
 #define KVM_CREATE_PIT2		   _IOW(KVMIO, 0x77, struct kvm_pit_config)
+#define KVM_IOSIGNALFD             _IOW(KVMIO, 0x78, struct kvm_iosignalfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7dcae4b..5b2be86 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -138,6 +138,7 @@ struct kvm {
 	struct kvm_io_bus pio_bus;
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 	struct list_head irqfds;
+	struct list_head iosignalfds;
 #endif
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
@@ -535,19 +536,24 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
-void kvm_irqfd_init(struct kvm *kvm);
+void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+int kvm_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args);
 
 #else
 
-static inline void kvm_irqfd_init(struct kvm *kvm) {}
+static inline void kvm_eventfd_init(struct kvm *kvm) {}
 static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 {
 	return -EINVAL;
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+static inline int kvm_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args)
+{
+	return -EINVAL;
+}
 
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c63ff6a..1112349 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -21,12 +21,16 @@
  */
 
 #include <linux/kvm_host.h>
+#include <linux/kvm.h>
 #include <linux/workqueue.h>
 #include <linux/syscalls.h>
 #include <linux/wait.h>
 #include <linux/poll.h>
 #include <linux/file.h>
 #include <linux/list.h>
+#include <linux/eventfd.h>
+
+#include "iodev.h"
 
 /*
  * --------------------------------------------------------------------
@@ -207,9 +211,10 @@ kvm_deassign_irqfd(struct kvm *kvm, int fd, int gsi)
 }
 
 void
-kvm_irqfd_init(struct kvm *kvm)
+kvm_eventfd_init(struct kvm *kvm)
 {
 	INIT_LIST_HEAD(&kvm->irqfds);
+	INIT_LIST_HEAD(&kvm->iosignalfds);
 }
 
 int
@@ -232,3 +237,158 @@ kvm_irqfd_release(struct kvm *kvm)
 		irqfd_release(irqfd);
 	}
 }
+
+/*
+ * --------------------------------------------------------------------
+ * iosignalfd: translate a PIO/MMIO memory write to an eventfd signal.
+ *
+ * userspace can register a PIO/MMIO address with an eventfd for recieving
+ * notification when the memory has been touched.
+ * --------------------------------------------------------------------
+ */
+
+struct _iosignalfd {
+	u64                  cookie;
+	u64                  addr;
+	size_t               length;
+	struct file         *file;
+	struct list_head     list;
+	struct kvm_io_device dev;
+};
+
+static int
+iosignalfd_in_range(struct kvm_io_device *this, gpa_t addr, int len,
+		    int is_write)
+{
+	struct _iosignalfd *p = (struct _iosignalfd *)this->private;
+
+	return ((addr >= p->addr && (addr < p->addr + p->length)));
+}
+
+/* writes trigger an event */
+static void
+iosignalfd_write(struct kvm_io_device *this, gpa_t addr, int len,
+		 const void *val)
+{
+	struct _iosignalfd *iosignalfd = (struct _iosignalfd *)this->private;
+
+	eventfd_signal(iosignalfd->file, 1);
+}
+
+/* reads return all zeros */
+static void
+iosignalfd_read(struct kvm_io_device *this, gpa_t addr, int len, void *val)
+{
+	memset(val, 0, len);
+}
+
+static void
+_iosignalfd_destructor(struct _iosignalfd *iosignalfd)
+{
+	fput(iosignalfd->file);
+	list_del(&iosignalfd->list);
+
+	kfree(iosignalfd);
+}
+
+static void
+iosignalfd_destructor(struct kvm_io_device *this)
+{
+	struct _iosignalfd *iosignalfd = (struct _iosignalfd *)this->private;
+
+	_iosignalfd_destructor(iosignalfd);
+}
+
+static int
+kvm_assign_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args)
+{
+	int                 pio = args->flags & KVM_IOSIGNALFD_FLAG_PIO;
+	struct kvm_io_bus  *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus;
+	struct _iosignalfd *iosignalfd;
+	struct file        *file;
+	int                 ret;
+
+	file = eventfd_fget(args->fd);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		printk(KERN_ERR "iosignalfd: failed to get %d eventfd: %d\n",
+		       args->fd, ret);
+		return ret;
+	}
+
+	iosignalfd = kzalloc(sizeof(*iosignalfd), GFP_KERNEL);
+	if (!iosignalfd) {
+		printk(KERN_ERR "iosignalfd: memory pressure\n");
+		fput(file);
+		return -ENOMEM;
+	}
+
+	iosignalfd->dev.read       = iosignalfd_read;
+	iosignalfd->dev.write      = iosignalfd_write;
+	iosignalfd->dev.in_range   = iosignalfd_in_range;
+	iosignalfd->dev.destructor = iosignalfd_destructor;
+	iosignalfd->dev.private    = iosignalfd;
+
+	iosignalfd->cookie         = args->cookie;
+	iosignalfd->addr           = args->addr;
+	iosignalfd->length         = args->len;
+	iosignalfd->file           = file;
+	INIT_LIST_HEAD(&iosignalfd->list);
+
+	ret = kvm_io_bus_register_dev(bus, &iosignalfd->dev);
+	if (ret < 0) {
+		printk(KERN_ERR "iosignalfd: failed to register IODEV: %d\n",
+		       ret);
+		goto fail;
+	}
+
+	mutex_lock(&kvm->lock);
+	list_add_tail(&iosignalfd->list, &kvm->iosignalfds);
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+
+fail:
+	/*
+	 * This doesn't take a lock, but the failure case will never result
+	 * in the list being populated anyway
+	 */
+	_iosignalfd_destructor(iosignalfd);
+
+	return ret;
+}
+
+static int
+kvm_deassign_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args)
+{
+	int                   pio = args->flags & KVM_IOSIGNALFD_FLAG_PIO;
+	struct kvm_io_bus    *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus;
+	struct _iosignalfd   *iosignalfd, *tmp;
+
+	mutex_lock(&kvm->lock);
+
+	list_for_each_entry_safe(iosignalfd, tmp, &kvm->iosignalfds, list) {
+		if (iosignalfd->addr != args->addr)
+			continue;
+
+		if ((args->flags & KVM_IOSIGNALFD_FLAG_COOKIE) &&
+		    (iosignalfd->cookie != args->cookie))
+			continue;
+
+		kvm_io_bus_unregister_dev(bus, &iosignalfd->dev);
+		_iosignalfd_destructor(iosignalfd);
+	}
+
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+}
+
+int
+kvm_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args)
+{
+	if (args->flags & KVM_IOSIGNALFD_FLAG_DEASSIGN)
+		return kvm_deassign_iosignalfd(kvm, args);
+
+	return kvm_assign_iosignalfd(kvm, args);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4c36ac8..79fdc08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -983,7 +983,7 @@ static struct kvm *kvm_create_vm(void)
 	atomic_inc(&kvm->mm->mm_count);
 	spin_lock_init(&kvm->mmu_lock);
 	kvm_io_bus_init(&kvm->pio_bus);
-	kvm_irqfd_init(kvm);
+	kvm_eventfd_init(kvm);
 	mutex_init(&kvm->lock);
 	kvm_io_bus_init(&kvm->mmio_bus);
 	init_rwsem(&kvm->slots_lock);
@@ -2221,6 +2221,15 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
 		break;
 	}
+	case KVM_IOSIGNALFD: {
+		struct kvm_iosignalfd entry;
+
+		r = -EFAULT;
+		if (copy_from_user(&entry, argp, sizeof entry))
+			goto out;
+		r = kvm_iosignalfd(kvm, &entry);
+		break;
+	}
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 2/3] kvm: make io_bus interface more robust
  2009-05-26 19:15 ` [KVM PATCH v4 2/3] kvm: make io_bus interface more robust Gregory Haskins
@ 2009-05-27  8:54   ` Avi Kivity
  2009-05-27 11:26     ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-05-27  8:54 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davide, mtosatti

Gregory Haskins wrote:
> Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it
> fails.  We want to create dynamic MMIO/PIO entries driven from userspace later
> in the series, so we need to enhance the code to be more robust with the
> following changes:
>
>    1) Add a return value to the registration function
>    2) Fix up all the callsites to check the return code, handle any
>       failures, and percolate the error up to the caller.
>    3) Refactor io_bus to allow "holes" in the array so device hotplug
>       can add/remove devices arbitrarily.
>    4) Add an unregister function
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
>
>  arch/x86/kvm/i8254.c      |   34 ++++++++++++++++++++++---------
>  arch/x86/kvm/i8259.c      |    9 +++++++-
>  include/linux/kvm_host.h  |    8 +++++--
>  virt/kvm/coalesced_mmio.c |    8 ++++++-
>  virt/kvm/ioapic.c         |    9 ++++++--
>  virt/kvm/kvm_main.c       |   49 +++++++++++++++++++++++++++++++++++++++------
>  6 files changed, 92 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index 584e3d3..6cf84d4 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -564,36 +564,40 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
>  {
>  	struct kvm_pit *pit;
>  	struct kvm_kpit_state *pit_state;
> +	int ret;
>  
>  	pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
>  	if (!pit)
>  		return NULL;
>  
>  	pit->irq_source_id = kvm_request_irq_source_id(kvm);
> -	if (pit->irq_source_id < 0) {
> -		kfree(pit);
> -		return NULL;
> -	}
> -
> -	mutex_init(&pit->pit_state.lock);
> -	mutex_lock(&pit->pit_state.lock);
> -	spin_lock_init(&pit->pit_state.inject_lock);
> +	if (pit->irq_source_id < 0)
> +		goto fail;
>  
>  	/* Initialize PIO device */
>  	pit->dev.read = pit_ioport_read;
>  	pit->dev.write = pit_ioport_write;
>  	pit->dev.in_range = pit_in_range;
>  	pit->dev.private = pit;
> -	kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
> +	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
> +	if (ret < 0)
> +		goto fail;
>  
>  	if (flags & KVM_PIT_SPEAKER_DUMMY) {
>  		pit->speaker_dev.read = speaker_ioport_read;
>  		pit->speaker_dev.write = speaker_ioport_write;
>  		pit->speaker_dev.in_range = speaker_in_range;
>  		pit->speaker_dev.private = pit;
> -		kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
> +		ret = kvm_io_bus_register_dev(&kvm->pio_bus,
> +					      &pit->speaker_dev);
> +		if (ret < 0)
> +			goto fail;
>  	}
>  
> +	mutex_init(&pit->pit_state.lock);
> +	mutex_lock(&pit->pit_state.lock);
> +	spin_lock_init(&pit->pit_state.inject_lock);
> +
>   

You are registering the PIT before it is initialized.  That exposes a 
race.  The original code also did that, but at least the pit lock was 
held while this was done.

>  	kvm->arch.vpit = pit;
>  	pit->kvm = kvm;
>  
> @@ -613,6 +617,16 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
>  	kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>  
>  	return pit;
> +
> +fail:
> +	kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->speaker_dev);
>   

There's an option now to avoid speaker_dev, so this needs to be conditional.

> +	kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->dev);
> +
> +	if (pit->irq_source_id >= 0)
> +		kvm_free_irq_source_id(kvm, pit->irq_source_id);
> +
> +	kfree(pit);
> +	return NULL;
>  }
>  
> @@ -52,7 +52,7 @@ extern struct kmem_cache *kvm_vcpu_cache;
>   * in one place.
>   */
>  struct kvm_io_bus {
> -	int                   dev_count;
> +	spinlock_t lock;
>  #define NR_IOBUS_DEVS 6
>  	struct kvm_io_device *devs[NR_IOBUS_DEVS];
>  };
> @@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
>  void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>  struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
>  					  gpa_t addr, int len, int is_write);
> -void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
> -			     struct kvm_io_device *dev);
> +int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
> +			    struct kvm_io_device *dev);
> +int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
> +			    struct kvm_io_device *dev);
>  
>   

unregister() should return void.  There's really nothing you can do to 
recover from a failure.

>  
> @@ -2453,21 +2455,54 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
>  {
>  	int i;
>  
> -	for (i = 0; i < bus->dev_count; i++) {
> +	for (i = 0; i < NR_IOBUS_DEVS; i++) {
>  		struct kvm_io_device *pos = bus->devs[i];
>  
> -		if (pos->in_range(pos, addr, len, is_write))
> +		if (pos && pos->in_range(pos, addr, len, is_write))
>  			return pos;
>  	}
>   

Let's keep dev_count, and just move things around on unregister.

 


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use
  2009-05-26 19:15 ` [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins
@ 2009-05-27  8:57   ` Avi Kivity
  2009-05-27 11:53     ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-05-27  8:57 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davide, mtosatti

Gregory Haskins wrote:
> We want to use eventfd from KVM which can be compiled as a module, so
> export the interfaces.
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com

Since Davide has already acked this patch, please include this in the 
signoff section so we don't lose the paperwork.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-26 19:15 ` [KVM PATCH v4 3/3] kvm: add iosignalfd support Gregory Haskins
@ 2009-05-27  9:03   ` Avi Kivity
  2009-05-27 11:47     ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-05-27  9:03 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davide, mtosatti

Gregory Haskins wrote:
> iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
> signal when written to by a guest.  Host userspace can register any arbitrary
> IO address with a corresponding eventfd and then pass the eventfd to a
> specific end-point of interest for handling.
>
> Normal IO requires a blocking round-trip since the operation may cause
> side-effects in the emulated model or may return data to the caller.
> Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
> "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
> device model synchronously before returning control back to the vcpu.
>
> However, there is a subclass of IO which acts purely as a trigger for
> other IO (such as to kick off an out-of-band DMA request, etc).  For these
> patterns, the synchronous call is particularly expensive since we really
> only want to simply get our notification transmitted asychronously and
> return as quickly as possible.  All the sychronous infrastructure to ensure
> proper data-dependencies are met in the normal IO case are just unecessary
> overhead for signalling.  This adds additional computational load on the
> system, as well as latency to the signalling path.
>
> Therefore, we provide a mechanism for registration of an in-kernel trigger
> point that allows the VCPU to only require a very brief, lightweight
> exit just long enough to signal an eventfd.  This also means that any
> clients compatible with the eventfd interface (which includes userspace
> and kernelspace equally well) can now register to be notified. The end
> result should be a more flexible and higher performance notification API
> for the backend KVM hypervisor and perhipheral components.
>
> To test this theory, we built a test-harness called "doorbell".  This
> module has a function called "doorbell_ring()" which simply increments a
> counter for each time the doorbell is signaled.  It supports signalling
> from either an eventfd, or an ioctl().
>
> We then wired up two paths to the doorbell: One via QEMU via a registered
> io region and through the doorbell ioctl().  The other is direct via
> iosignalfd.
>
> You can download this test harness here:
>
> ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
>
> The measured results are as follows:
>
> qemu-mmio:       110000 iops, 9.09us rtt
> iosignalfd-mmio: 200100 iops, 5.00us rtt
> iosignalfd-pio:  367300 iops, 2.72us rtt
>
> I didn't measure qemu-pio, because I have to figure out how to register a
> PIO region with qemu's device model, and I got lazy.  However, for now we
> can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
> and -350ns for HC, we get:
>
> qemu-pio:      153139 iops, 6.53us rtt
> iosignalfd-hc: 412585 iops, 2.37us rtt
>
> these are just for fun, for now, until I can gather more data.
>
> Here is a graph for your convenience:
>
> http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
>
> The conclusion to draw is that we save about 4us by skipping the userspace
> hop.
>
> +/* writes trigger an event */
> +static void
> +iosignalfd_write(struct kvm_io_device *this, gpa_t addr, int len,
> +		 const void *val)
> +{
> +	struct _iosignalfd *iosignalfd = (struct _iosignalfd *)this->private;
> +
> +	eventfd_signal(iosignalfd->file, 1);
> +}
>   

I much prefer including kvm_io_device inside _iosignalfd and using 
container_of() instead of ->private.  But that is of course unrelated to 
this patch and is not a requirement.

> +
> +static int
> +kvm_assign_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args)
> +{
> +	int                 pio = args->flags & KVM_IOSIGNALFD_FLAG_PIO;
> +	struct kvm_io_bus  *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus;
> +	struct _iosignalfd *iosignalfd;
> +	struct file        *file;
> +	int                 ret;
> +
> +	file = eventfd_fget(args->fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		printk(KERN_ERR "iosignalfd: failed to get %d eventfd: %d\n",
> +		       args->fd, ret);
>   

drop the printk, we don't want to let users spam dmesg.

> +		return ret;
> +	}
> +
> +	iosignalfd = kzalloc(sizeof(*iosignalfd), GFP_KERNEL);
> +	if (!iosignalfd) {
> +		printk(KERN_ERR "iosignalfd: memory pressure\n");
>   

here too.

> +	ret = kvm_io_bus_register_dev(bus, &iosignalfd->dev);
> +	if (ret < 0) {
> +		printk(KERN_ERR "iosignalfd: failed to register IODEV: %d\n",
> +		       ret);
>   

and here etc.

What happens if you register to iosignalfds for the same address but 
with different cookies (a very practical scenario)?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 2/3] kvm: make io_bus interface more robust
  2009-05-27  8:54   ` Avi Kivity
@ 2009-05-27 11:26     ` Gregory Haskins
  0 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2009-05-27 11:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 5748 bytes --]

[Restoring Davide's proper email.  I had a typo in the original v4
announcement.  Sorry Davide]

Avi Kivity wrote:
> Gregory Haskins wrote:
>> Today kvm_io_bus_regsiter_dev() returns void and will internally
>> BUG_ON if it
>> fails.  We want to create dynamic MMIO/PIO entries driven from
>> userspace later
>> in the series, so we need to enhance the code to be more robust with the
>> following changes:
>>
>>    1) Add a return value to the registration function
>>    2) Fix up all the callsites to check the return code, handle any
>>       failures, and percolate the error up to the caller.
>>    3) Refactor io_bus to allow "holes" in the array so device hotplug
>>       can add/remove devices arbitrarily.
>>    4) Add an unregister function
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> ---
>>
>>  arch/x86/kvm/i8254.c      |   34 ++++++++++++++++++++++---------
>>  arch/x86/kvm/i8259.c      |    9 +++++++-
>>  include/linux/kvm_host.h  |    8 +++++--
>>  virt/kvm/coalesced_mmio.c |    8 ++++++-
>>  virt/kvm/ioapic.c         |    9 ++++++--
>>  virt/kvm/kvm_main.c       |   49
>> +++++++++++++++++++++++++++++++++++++++------
>>  6 files changed, 92 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> index 584e3d3..6cf84d4 100644
>> --- a/arch/x86/kvm/i8254.c
>> +++ b/arch/x86/kvm/i8254.c
>> @@ -564,36 +564,40 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm,
>> u32 flags)
>>  {
>>      struct kvm_pit *pit;
>>      struct kvm_kpit_state *pit_state;
>> +    int ret;
>>  
>>      pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
>>      if (!pit)
>>          return NULL;
>>  
>>      pit->irq_source_id = kvm_request_irq_source_id(kvm);
>> -    if (pit->irq_source_id < 0) {
>> -        kfree(pit);
>> -        return NULL;
>> -    }
>> -
>> -    mutex_init(&pit->pit_state.lock);
>> -    mutex_lock(&pit->pit_state.lock);
>> -    spin_lock_init(&pit->pit_state.inject_lock);
>> +    if (pit->irq_source_id < 0)
>> +        goto fail;
>>  
>>      /* Initialize PIO device */
>>      pit->dev.read = pit_ioport_read;
>>      pit->dev.write = pit_ioport_write;
>>      pit->dev.in_range = pit_in_range;
>>      pit->dev.private = pit;
>> -    kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
>> +    ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
>> +    if (ret < 0)
>> +        goto fail;
>>  
>>      if (flags & KVM_PIT_SPEAKER_DUMMY) {
>>          pit->speaker_dev.read = speaker_ioport_read;
>>          pit->speaker_dev.write = speaker_ioport_write;
>>          pit->speaker_dev.in_range = speaker_in_range;
>>          pit->speaker_dev.private = pit;
>> -        kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
>> +        ret = kvm_io_bus_register_dev(&kvm->pio_bus,
>> +                          &pit->speaker_dev);
>> +        if (ret < 0)
>> +            goto fail;
>>      }
>>  
>> +    mutex_init(&pit->pit_state.lock);
>> +    mutex_lock(&pit->pit_state.lock);
>> +    spin_lock_init(&pit->pit_state.inject_lock);
>> +
>>   
>
> You are registering the PIT before it is initialized.  That exposes a
> race.  The original code also did that, but at least the pit lock was
> held while this was done.
Doh!  Will fix.

>
>>      kvm->arch.vpit = pit;
>>      pit->kvm = kvm;
>>  
>> @@ -613,6 +617,16 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm,
>> u32 flags)
>>      kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>>  
>>      return pit;
>> +
>> +fail:
>> +    kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->speaker_dev);
>>   
>
> There's an option now to avoid speaker_dev, so this needs to be
> conditional.

I was intentionally simple here based on the fact that the unregister
can silently/harmlessly fail.  (Another scenario is that we never tried
to register the speaker_dev before we hit the fail: path.

>
>> +    kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->dev);
>> +
>> +    if (pit->irq_source_id >= 0)
>> +        kvm_free_irq_source_id(kvm, pit->irq_source_id);
>> +
>> +    kfree(pit);
>> +    return NULL;
>>  }
>>  
>> @@ -52,7 +52,7 @@ extern struct kmem_cache *kvm_vcpu_cache;
>>   * in one place.
>>   */
>>  struct kvm_io_bus {
>> -    int                   dev_count;
>> +    spinlock_t lock;
>>  #define NR_IOBUS_DEVS 6
>>      struct kvm_io_device *devs[NR_IOBUS_DEVS];
>>  };
>> @@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
>>  void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>  struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
>>                        gpa_t addr, int len, int is_write);
>> -void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
>> -                 struct kvm_io_device *dev);
>> +int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
>> +                struct kvm_io_device *dev);
>> +int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
>> +                struct kvm_io_device *dev);
>>  
>>   
>
> unregister() should return void.  There's really nothing you can do to
> recover from a failure.

Yeah, good point.

>
>>  
>> @@ -2453,21 +2455,54 @@ struct kvm_io_device
>> *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
>>  {
>>      int i;
>>  
>> -    for (i = 0; i < bus->dev_count; i++) {
>> +    for (i = 0; i < NR_IOBUS_DEVS; i++) {
>>          struct kvm_io_device *pos = bus->devs[i];
>>  
>> -        if (pos->in_range(pos, addr, len, is_write))
>> +        if (pos && pos->in_range(pos, addr, len, is_write))
>>              return pos;
>>      }
>>   
>
> Let's keep dev_count, and just move things around on unregister.

Ok

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27  9:03   ` Avi Kivity
@ 2009-05-27 11:47     ` Gregory Haskins
  2009-05-27 12:11       ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-27 11:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 6596 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>> iosignalfd is a mechanism to register PIO/MMIO regions to trigger an
>> eventfd
>> signal when written to by a guest.  Host userspace can register any
>> arbitrary
>> IO address with a corresponding eventfd and then pass the eventfd to a
>> specific end-point of interest for handling.
>>
>> Normal IO requires a blocking round-trip since the operation may cause
>> side-effects in the emulated model or may return data to the caller.
>> Therefore, an IO in KVM traps from the guest to the host, causes a
>> VMX/SVM
>> "heavy-weight" exit back to userspace, and is ultimately serviced by
>> qemu's
>> device model synchronously before returning control back to the vcpu.
>>
>> However, there is a subclass of IO which acts purely as a trigger for
>> other IO (such as to kick off an out-of-band DMA request, etc).  For
>> these
>> patterns, the synchronous call is particularly expensive since we really
>> only want to simply get our notification transmitted asychronously and
>> return as quickly as possible.  All the sychronous infrastructure to
>> ensure
>> proper data-dependencies are met in the normal IO case are just
>> unecessary
>> overhead for signalling.  This adds additional computational load on the
>> system, as well as latency to the signalling path.
>>
>> Therefore, we provide a mechanism for registration of an in-kernel
>> trigger
>> point that allows the VCPU to only require a very brief, lightweight
>> exit just long enough to signal an eventfd.  This also means that any
>> clients compatible with the eventfd interface (which includes userspace
>> and kernelspace equally well) can now register to be notified. The end
>> result should be a more flexible and higher performance notification API
>> for the backend KVM hypervisor and perhipheral components.
>>
>> To test this theory, we built a test-harness called "doorbell".  This
>> module has a function called "doorbell_ring()" which simply increments a
>> counter for each time the doorbell is signaled.  It supports signalling
>> from either an eventfd, or an ioctl().
>>
>> We then wired up two paths to the doorbell: One via QEMU via a
>> registered
>> io region and through the doorbell ioctl().  The other is direct via
>> iosignalfd.
>>
>> You can download this test harness here:
>>
>> ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
>>
>> The measured results are as follows:
>>
>> qemu-mmio:       110000 iops, 9.09us rtt
>> iosignalfd-mmio: 200100 iops, 5.00us rtt
>> iosignalfd-pio:  367300 iops, 2.72us rtt
>>
>> I didn't measure qemu-pio, because I have to figure out how to
>> register a
>> PIO region with qemu's device model, and I got lazy.  However, for
>> now we
>> can extrapolate based on the data from the NULLIO runs of +2.56us for
>> MMIO,
>> and -350ns for HC, we get:
>>
>> qemu-pio:      153139 iops, 6.53us rtt
>> iosignalfd-hc: 412585 iops, 2.37us rtt
>>
>> these are just for fun, for now, until I can gather more data.
>>
>> Here is a graph for your convenience:
>>
>> http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
>>
>> The conclusion to draw is that we save about 4us by skipping the
>> userspace
>> hop.
>>
>> +/* writes trigger an event */
>> +static void
>> +iosignalfd_write(struct kvm_io_device *this, gpa_t addr, int len,
>> +         const void *val)
>> +{
>> +    struct _iosignalfd *iosignalfd = (struct _iosignalfd
>> *)this->private;
>> +
>> +    eventfd_signal(iosignalfd->file, 1);
>> +}
>>   
>
> I much prefer including kvm_io_device inside _iosignalfd and using
> container_of() instead of ->private.  But that is of course unrelated
> to this patch and is not a requirement.
>
>> +
>> +static int
>> +kvm_assign_iosignalfd(struct kvm *kvm, struct kvm_iosignalfd *args)
>> +{
>> +    int                 pio = args->flags & KVM_IOSIGNALFD_FLAG_PIO;
>> +    struct kvm_io_bus  *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus;
>> +    struct _iosignalfd *iosignalfd;
>> +    struct file        *file;
>> +    int                 ret;
>> +
>> +    file = eventfd_fget(args->fd);
>> +    if (IS_ERR(file)) {
>> +        ret = PTR_ERR(file);
>> +        printk(KERN_ERR "iosignalfd: failed to get %d eventfd: %d\n",
>> +               args->fd, ret);
>>   
>
> drop the printk, we don't want to let users spam dmesg.
>
>> +        return ret;
>> +    }
>> +
>> +    iosignalfd = kzalloc(sizeof(*iosignalfd), GFP_KERNEL);
>> +    if (!iosignalfd) {
>> +        printk(KERN_ERR "iosignalfd: memory pressure\n");
>>   
>
> here too.
>
>> +    ret = kvm_io_bus_register_dev(bus, &iosignalfd->dev);
>> +    if (ret < 0) {
>> +        printk(KERN_ERR "iosignalfd: failed to register IODEV: %d\n",
>> +               ret);
>>   
>
> and here etc.

Ack on the printk removals.


>
> What happens if you register to iosignalfds for the same address but
> with different cookies (a very practical scenario)?

This is really only supported at the iosignal interface level.  Today,
you can do this and the registration will succeed, but at run-time an
IO-exit will stop at the first in_range() hit it finds.  Therefore, you
will only get service on the first/lowest registered range.

I knew this was a limitation of the current io_bus, but I put the
feature into iosignalfd anyway so that the user/kern interface was
robust enough to support the notion should we ever need it (and can thus
patch io_bus at that time).  Perhaps that is short-sighted because
userspace would never know its ranges weren't really registered properly.

I guess its simple enough to have io_bus check all devices for a match
instead of stopping on the first.  Should I just make a patch to fix
this, or should I fix iosignalfd to check for in_range matches and fail
if it finds overlap?  (We could then add a CAP_OVERLAP_IO bit in the
future if we finally fix the io_bus capability).  I am inclined to lean
towards option 2, since its not known whether this will ever be useful,
and io_bus scanning is in a hot-path.

Thinking about it some more, I wonder if we should just get rid of the
notion of overlap to begin with.  Its a slippery slope (should we also
return to userspace after scanning and matching io_bus to see if it has
any overlap too?).  I am not sure if it would ever be used (real
hardware doesn't have multiple devices at the same address), and we can
always have multiple end-points mux from one iosignalfd if we really
need that.  Thoughts?

-Greg




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use
  2009-05-27  8:57   ` Avi Kivity
@ 2009-05-27 11:53     ` Gregory Haskins
  0 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2009-05-27 11:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 396 bytes --]

[Fixing Davide's email address]

Avi Kivity wrote:
> Gregory Haskins wrote:
>> We want to use eventfd from KVM which can be compiled as a module, so
>> export the interfaces.
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com
>
> Since Davide has already acked this patch, please include this in the
> signoff section so we don't lose the paperwork.
>

Will do.

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 11:47     ` Gregory Haskins
@ 2009-05-27 12:11       ` Avi Kivity
  2009-05-27 12:54         ` Gregory Haskins
  2009-05-27 17:25         ` Mark McLoughlin
  0 siblings, 2 replies; 22+ messages in thread
From: Avi Kivity @ 2009-05-27 12:11 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: kvm, linux-kernel, Davide Libenzi, mtosatti, Mark McLoughlin

Gregory Haskins wrote:
>
>> What happens if you register to iosignalfds for the same address but
>> with different cookies (a very practical scenario)?
>>     
>
> This is really only supported at the iosignal interface level.  Today,
> you can do this and the registration will succeed, but at run-time an
> IO-exit will stop at the first in_range() hit it finds.  Therefore, you
> will only get service on the first/lowest registered range.
>
> I knew this was a limitation of the current io_bus, but I put the
> feature into iosignalfd anyway so that the user/kern interface was
> robust enough to support the notion should we ever need it (and can thus
> patch io_bus at that time).  Perhaps that is short-sighted because
> userspace would never know its ranges weren't really registered properly.
>
> I guess its simple enough to have io_bus check all devices for a match
> instead of stopping on the first.  Should I just make a patch to fix
> this, or should I fix iosignalfd to check for in_range matches and fail
> if it finds overlap?  (We could then add a CAP_OVERLAP_IO bit in the
> future if we finally fix the io_bus capability).  I am inclined to lean
> towards option 2, since its not known whether this will ever be useful,
> and io_bus scanning is in a hot-path.
>
> Thinking about it some more, I wonder if we should just get rid of the
> notion of overlap to begin with.  Its a slippery slope (should we also
> return to userspace after scanning and matching io_bus to see if it has
> any overlap too?).  I am not sure if it would ever be used (real
> hardware doesn't have multiple devices at the same address), and we can
> always have multiple end-points mux from one iosignalfd if we really
> need that.  Thoughts?
>   

Multiple cookies on the same address are required by virtio.  You can't 
mux since the data doesn't go anywhere.

Virtio can survive by checking all rings on a notify, and we can later 
add a mechanism that has a distinct address for each ring, but let's see 
if we can cope with multiple cookies.  Mark?

You could search existing iosignalfds for the same address and re-use 
the same iodevice.  I don't want to search the entire list since that 
precludes tricks like using hashtables or sorting the list by frequency 
of access.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 12:11       ` Avi Kivity
@ 2009-05-27 12:54         ` Gregory Haskins
  2009-05-27 17:25         ` Mark McLoughlin
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2009-05-27 12:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, Davide Libenzi, mtosatti, Mark McLoughlin

[-- Attachment #1: Type: text/plain, Size: 3557 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>>
>>> What happens if you register to iosignalfds for the same address but
>>> with different cookies (a very practical scenario)?
>>>     
>>
>> This is really only supported at the iosignal interface level.  Today,
>> you can do this and the registration will succeed, but at run-time an
>> IO-exit will stop at the first in_range() hit it finds.  Therefore, you
>> will only get service on the first/lowest registered range.
>>
>> I knew this was a limitation of the current io_bus, but I put the
>> feature into iosignalfd anyway so that the user/kern interface was
>> robust enough to support the notion should we ever need it (and can thus
>> patch io_bus at that time).  Perhaps that is short-sighted because
>> userspace would never know its ranges weren't really registered
>> properly.
>>
>> I guess its simple enough to have io_bus check all devices for a match
>> instead of stopping on the first.  Should I just make a patch to fix
>> this, or should I fix iosignalfd to check for in_range matches and fail
>> if it finds overlap?  (We could then add a CAP_OVERLAP_IO bit in the
>> future if we finally fix the io_bus capability).  I am inclined to lean
>> towards option 2, since its not known whether this will ever be useful,
>> and io_bus scanning is in a hot-path.
>>
>> Thinking about it some more, I wonder if we should just get rid of the
>> notion of overlap to begin with.  Its a slippery slope (should we also
>> return to userspace after scanning and matching io_bus to see if it has
>> any overlap too?).  I am not sure if it would ever be used (real
>> hardware doesn't have multiple devices at the same address), and we can
>> always have multiple end-points mux from one iosignalfd if we really
>> need that.  Thoughts?
>>   
>
> Multiple cookies on the same address are required by virtio.  You
> can't mux since the data doesn't go anywhere.

Hmm..well, I might not be understanding properly, but I think we are
still ok.  IIUC, the concept is that we can register multiple
iosignalfds to trigger when a single range of [MM|P]IO is touched.  I.e.
one iowrite() triggers multiple eventfd_signal()s to go out.  You could
do this directly by having io_bus support multiple matches for
in_range().  You could also use a mux concept where one registration
fans out to multiple iosignalfds (either like you suggest below, or by
having one iosignalfd mux/relay to the others...I like your idea below
better, btw).

Or am I missing something?

>
>
> Virtio can survive by checking all rings on a notify, and we can later
> add a mechanism that has a distinct address for each ring, but let's
> see if we can cope with multiple cookies.  Mark?

I am confused by this.  I can totally see the use case for one
iosignalfd (w/ one address) for all rings (in a device), and one
iosignalfd per ring (each with a unique address).  But when would we
want to have one address serve multiple rings each with their own
notification?  Just curious.

>
>
> You could search existing iosignalfds for the same address and re-use
> the same iodevice.  I don't want to search the entire list since that
> precludes tricks like using hashtables or sorting the list by
> frequency of access.
>
Yeah, I like this idea best.  I can basically have my own "in_range"
mechanism inside the _iosignalfd structure.  I only register one range
with io_bus, but then I may have multiple targets within that.  I will
do this for v5.

-Greg




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 12:11       ` Avi Kivity
  2009-05-27 12:54         ` Gregory Haskins
@ 2009-05-27 17:25         ` Mark McLoughlin
  2009-05-27 17:40           ` Gregory Haskins
  1 sibling, 1 reply; 22+ messages in thread
From: Mark McLoughlin @ 2009-05-27 17:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gregory Haskins, kvm, linux-kernel, Davide Libenzi, mtosatti

On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote:

> Multiple cookies on the same address are required by virtio.  You can't 
> mux since the data doesn't go anywhere.
> 
> Virtio can survive by checking all rings on a notify, and we can later 
> add a mechanism that has a distinct address for each ring, but let's see 
> if we can cope with multiple cookies.  Mark?

Trying to catch up, but you're talking about replacing virtio-pci
QUEUE_NOTIFY handling with iosignalfd ?

For a perfect replacement, what you really need is to be able to
register multiple cookies per address range, but only have them trigger
if the written data matches a provided value.

If the data is lost, virtio has no way of knowing which queue is being
notified, so we either end up with per-device, rather than per-queue,
notifications (probably not too bad for net, at least) or a different
notify address per queue (limiting the number of queues per device).

Cheers,
Mark.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 17:25         ` Mark McLoughlin
@ 2009-05-27 17:40           ` Gregory Haskins
  2009-05-27 17:48             ` Mark McLoughlin
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-27 17:40 UTC (permalink / raw)
  To: Mark McLoughlin; +Cc: Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 1547 bytes --]

Mark McLoughlin wrote:
> On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote:
>
>   
>> Multiple cookies on the same address are required by virtio.  You can't 
>> mux since the data doesn't go anywhere.
>>
>> Virtio can survive by checking all rings on a notify, and we can later 
>> add a mechanism that has a distinct address for each ring, but let's see 
>> if we can cope with multiple cookies.  Mark?
>>     
>
> Trying to catch up, but you're talking about replacing virtio-pci
> QUEUE_NOTIFY handling with iosignalfd ?
>
> For a perfect replacement, what you really need is to be able to
> register multiple cookies per address range, but only have them trigger
> if the written data matches a provided value.
>   

Hmm..thats an interesting idea.  To date, the "cookie" has really been
for identifying the proper range selected for deassignment.  I never
thought of using it as an actual trigger value at run-time.

> If the data is lost, virtio has no way of knowing which queue is being
> notified, so we either end up with per-device, rather than per-queue,
> notifications (probably not too bad for net, at least) or a different
> notify address per queue (limiting the number of queues per device).
>   

The addr-per-queue is how I was envisioning it, but the trigger value
concept hadn't occurred to me.  I could make this an option during
assignment (e.g. "COOKIE" flag means only trigger on writes of the
provided cookie, otherwise trigger on any write).  Sound good?

Thanks Mark,
-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 17:40           ` Gregory Haskins
@ 2009-05-27 17:48             ` Mark McLoughlin
  2009-05-27 20:45               ` Gregory Haskins
  2009-06-03 22:04               ` Gregory Haskins
  0 siblings, 2 replies; 22+ messages in thread
From: Mark McLoughlin @ 2009-05-27 17:48 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

On Wed, 2009-05-27 at 13:40 -0400, Gregory Haskins wrote:
> Mark McLoughlin wrote:
> > On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote:
> >
> >   
> >> Multiple cookies on the same address are required by virtio.  You can't 
> >> mux since the data doesn't go anywhere.
> >>
> >> Virtio can survive by checking all rings on a notify, and we can later 
> >> add a mechanism that has a distinct address for each ring, but let's see 
> >> if we can cope with multiple cookies.  Mark?
> >>     
> >
> > Trying to catch up, but you're talking about replacing virtio-pci
> > QUEUE_NOTIFY handling with iosignalfd ?
> >
> > For a perfect replacement, what you really need is to be able to
> > register multiple cookies per address range, but only have them trigger
> > if the written data matches a provided value.
> >   
> 
> Hmm..thats an interesting idea.  To date, the "cookie" has really been
> for identifying the proper range selected for deassignment.  I never
> thought of using it as an actual trigger value at run-time.
> 
> > If the data is lost, virtio has no way of knowing which queue is being
> > notified, so we either end up with per-device, rather than per-queue,
> > notifications (probably not too bad for net, at least) or a different
> > notify address per queue (limiting the number of queues per device).
> >   
> 
> The addr-per-queue is how I was envisioning it, but the trigger value
> concept hadn't occurred to me.  I could make this an option during
> assignment (e.g. "COOKIE" flag means only trigger on writes of the
> provided cookie, otherwise trigger on any write).  Sound good?

Ah, I'd been thinking of the trigger data being provided separately to
the cookie.

The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
to identify a queue - it's just going to continue using a per-device
queue number. So, if the cookie was also the trigger, we'd need an
eventfd per device.

And if this was a device where the guest writes similar values to
multiple addresses, you'd need an eventfd per address.

Cheers,
Mark.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 17:48             ` Mark McLoughlin
@ 2009-05-27 20:45               ` Gregory Haskins
  2009-05-28  9:09                 ` Mark McLoughlin
  2009-06-03 22:04               ` Gregory Haskins
  1 sibling, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-27 20:45 UTC (permalink / raw)
  To: Mark McLoughlin
  Cc: Gregory Haskins, Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 3081 bytes --]

Mark McLoughlin wrote:
> On Wed, 2009-05-27 at 13:40 -0400, Gregory Haskins wrote:
>   
>> Mark McLoughlin wrote:
>>     
>>> On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote:
>>>
>>>   
>>>       
>>>> Multiple cookies on the same address are required by virtio.  You can't 
>>>> mux since the data doesn't go anywhere.
>>>>
>>>> Virtio can survive by checking all rings on a notify, and we can later 
>>>> add a mechanism that has a distinct address for each ring, but let's see 
>>>> if we can cope with multiple cookies.  Mark?
>>>>     
>>>>         
>>> Trying to catch up, but you're talking about replacing virtio-pci
>>> QUEUE_NOTIFY handling with iosignalfd ?
>>>
>>> For a perfect replacement, what you really need is to be able to
>>> register multiple cookies per address range, but only have them trigger
>>> if the written data matches a provided value.
>>>   
>>>       
>> Hmm..thats an interesting idea.  To date, the "cookie" has really been
>> for identifying the proper range selected for deassignment.  I never
>> thought of using it as an actual trigger value at run-time.
>>
>>     
>>> If the data is lost, virtio has no way of knowing which queue is being
>>> notified, so we either end up with per-device, rather than per-queue,
>>> notifications (probably not too bad for net, at least) or a different
>>> notify address per queue (limiting the number of queues per device).
>>>   
>>>       
>> The addr-per-queue is how I was envisioning it, but the trigger value
>> concept hadn't occurred to me.  I could make this an option during
>> assignment (e.g. "COOKIE" flag means only trigger on writes of the
>> provided cookie, otherwise trigger on any write).  Sound good?
>>     
>
> Ah, I'd been thinking of the trigger data being provided separately to
> the cookie.
>   

Yeah, that shouldn't be a problem.  Its probably cleaner, too.  Perhaps
I can simplify things and make the trigger width fixed (say, always
32bit) just so I dont have to support 8,16,32 and 64 variants.

> The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
> to identify a queue - it's just going to continue using a per-device
> queue number. 

Actually, I was originally thinking this would be exposed as a virtio
FEATURE bit anyway, so there were no backwards-compat constraints.  That
said, we can possibly make it work in a backwards compat way, too. 
IIRC, today virtio does a PIO cycle to a specific register with the
queue-id when it wants to signal guest->host, right?  What is the width
of the write?


> So, if the cookie was also the trigger, we'd need an
> eventfd per device.
>   

I'm having trouble parsing this one.  The cookie namespace is controlled
by the userspace component that owns the corresponding IO address, so
there's no reason you can't make "queue-id = 0" use cookie = 0, or
whatever.  That said, I still think a separation of the cookie and
trigger as suggested above is a good idea, so its probably moot to
discuss this point further.

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 20:45               ` Gregory Haskins
@ 2009-05-28  9:09                 ` Mark McLoughlin
  2009-05-28 12:12                   ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Mark McLoughlin @ 2009-05-28  9:09 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Gregory Haskins, Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

On Wed, 2009-05-27 at 16:45 -0400, Gregory Haskins wrote:
> Mark McLoughlin wrote:
> > The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
> > to identify a queue - it's just going to continue using a per-device
> > queue number. 
> 
> Actually, I was originally thinking this would be exposed as a virtio
> FEATURE bit anyway, so there were no backwards-compat constraints.  That
> said, we can possibly make it work in a backwards compat way, too. 
> IIRC, today virtio does a PIO cycle to a specific register with the
> queue-id when it wants to signal guest->host, right?  What is the width
> of the write?

It's a 16-bit write.

/* A 16-bit r/w queue notifier */
#define VIRTIO_PCI_QUEUE_NOTIFY         16

> > So, if the cookie was also the trigger, we'd need an
> > eventfd per device.
> >   
> 
> I'm having trouble parsing this one.  The cookie namespace is controlled
> by the userspace component that owns the corresponding IO address, so
> there's no reason you can't make "queue-id = 0" use cookie = 0, or
> whatever.  That said, I still think a separation of the cookie and
> trigger as suggested above is a good idea, so its probably moot to
> discuss this point further.

Ah, my mistake - I thought the cookie was returned to userspace when the
eventfd was signalled, but no ... userspace only gets an event counter
value and the cookie is used during de-assignment to distinguish between
iosignalfds.

Okay, so suppose you do assign multiple times at a given address -
you're presumably going to use a different eventfd for each assignment?
If so, can't we match using both the address and eventfd at
de-assignment and drop the cookie from the interface altogether?

i.e. to replace the virtio queue notify with this, we'd:

  1) create an eventfd per queue

  2) assign each of those eventfds to the QUEUE_NOTIFY address

  3) have only one of the eventfds be triggered, depending on what 
     value is written by the guest

  4) de-assign using the address/eventfd pair to distinguish between 
     assignments

Cheers,
Mark.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-28  9:09                 ` Mark McLoughlin
@ 2009-05-28 12:12                   ` Gregory Haskins
  2009-05-31  9:11                     ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-05-28 12:12 UTC (permalink / raw)
  To: Mark McLoughlin
  Cc: Gregory Haskins, Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]

Mark McLoughlin wrote:
> On Wed, 2009-05-27 at 16:45 -0400, Gregory Haskins wrote:
>   
>> Mark McLoughlin wrote:
>>     
>>> The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
>>> to identify a queue - it's just going to continue using a per-device
>>> queue number. 
>>>       
>> Actually, I was originally thinking this would be exposed as a virtio
>> FEATURE bit anyway, so there were no backwards-compat constraints.  That
>> said, we can possibly make it work in a backwards compat way, too. 
>> IIRC, today virtio does a PIO cycle to a specific register with the
>> queue-id when it wants to signal guest->host, right?  What is the width
>> of the write?
>>     
>
> It's a 16-bit write.
>
> /* A 16-bit r/w queue notifier */
> #define VIRTIO_PCI_QUEUE_NOTIFY         16
>   

(Thanks)

>   
>>> So, if the cookie was also the trigger, we'd need an
>>> eventfd per device.
>>>   
>>>       
>> I'm having trouble parsing this one.  The cookie namespace is controlled
>> by the userspace component that owns the corresponding IO address, so
>> there's no reason you can't make "queue-id = 0" use cookie = 0, or
>> whatever.  That said, I still think a separation of the cookie and
>> trigger as suggested above is a good idea, so its probably moot to
>> discuss this point further.
>>     
>
> Ah, my mistake - I thought the cookie was returned to userspace when the
> eventfd was signalled, but no ... userspace only gets an event counter
> value and the cookie is used during de-assignment to distinguish between
> iosignalfds.
>
> Okay, so suppose you do assign multiple times at a given address -
> you're presumably going to use a different eventfd for each assignment?
> If so, can't we match using both the address and eventfd at
> de-assignment and drop the cookie from the interface altogether?
>   

This is closer to how the original series worked, but Avi asked for a
data-match token and thus the cookie was born.  I think the rationale is
that we can't predict whether the same eventfd will be registered more
than once, and thus we need a way to further qualify it.  However, to
your point, I cannot think of a valid use case for having the same fd
registered to the same address more than once, so perhaps your fd/addr
tuple is sufficient and we can drop the cookie (or, really, rename it to
"trigger" ;)

Avi?

Regards,
-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-28 12:12                   ` Gregory Haskins
@ 2009-05-31  9:11                     ` Avi Kivity
  2009-06-01 12:14                       ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-05-31  9:11 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Mark McLoughlin, Gregory Haskins, kvm, linux-kernel,
	Davide Libenzi, mtosatti

Gregory Haskins wrote:
> This is closer to how the original series worked, but Avi asked for a
> data-match token and thus the cookie was born.  I think the rationale is
> that we can't predict whether the same eventfd will be registered more
> than once, and thus we need a way to further qualify it.  However, to
> your point, I cannot think of a valid use case for having the same fd
> registered to the same address more than once, so perhaps your fd/addr
> tuple is sufficient and we can drop the cookie (or, really, rename it to
> "trigger" ;)
>
> Avi?
>   

This is just how virtio works.  To kick ring N of device X, it writes N 
to a port specific to X.

If we lose N, then we don't know which ring was kicked and have to check 
them all.

May we can rename cookie to data_match to make it explicit.  If the data 
doesn't match, the eventfd isn't kicked.

(Mark, same as we have arbitrary ring->MSI mappings (allowing one MSI to 
notify multiple rings), perhaps we should have the same capability for 
the other direction?  So the guest could kick mulitple rings with one 
write, or just one ring, according to personal preference.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-31  9:11                     ` Avi Kivity
@ 2009-06-01 12:14                       ` Gregory Haskins
  0 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2009-06-01 12:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Mark McLoughlin, Gregory Haskins, kvm, linux-kernel,
	Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 2129 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>> This is closer to how the original series worked, but Avi asked for a
>> data-match token and thus the cookie was born.  I think the rationale is
>> that we can't predict whether the same eventfd will be registered more
>> than once, and thus we need a way to further qualify it.  However, to
>> your point, I cannot think of a valid use case for having the same fd
>> registered to the same address more than once, so perhaps your fd/addr
>> tuple is sufficient and we can drop the cookie (or, really, rename it to
>> "trigger" ;)
>>
>> Avi?
>>   
>
> This is just how virtio works.  To kick ring N of device X, it writes
> N to a port specific to X.
>
> If we lose N, then we don't know which ring was kicked and have to
> check them all.
>
> May we can rename cookie to data_match to make it explicit.  If the
> data doesn't match, the eventfd isn't kicked.
>
> (Mark, same as we have arbitrary ring->MSI mappings (allowing one MSI
> to notify multiple rings), perhaps we should have the same capability
> for the other direction?  So the guest could kick mulitple rings with
> one write, or just one ring, according to personal preference.
>
I think we are all on the same page here, actually (more or less).  I
think the confusion was my own when you initially asked for a data-match
token, and I gave you "cookie".  "cookie" as I implemented it was really
only used to match up the eventfd during de-assign, not the actual
signal event.  I think what you are talking about here is the same as
what Mark and I have been calling "trigger".  I agree that we need this
interface to be able to properly sort something like virtio, and I will
be including this in the next release.

OTOH, what I was proposing above is that my misguided attempt at
"cookie" for deassign is redundant with simply looking at the
eventfd/addr tuple.  We can simply key off of those items to match up
the iosignalfd to close, and therefore lets get rid of the cookie field.

Sorry for the confusion.  Let me know if you still think this isn't right.

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-05-27 17:48             ` Mark McLoughlin
  2009-05-27 20:45               ` Gregory Haskins
@ 2009-06-03 22:04               ` Gregory Haskins
  2009-06-04 13:20                 ` Mark McLoughlin
  1 sibling, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2009-06-03 22:04 UTC (permalink / raw)
  To: Mark McLoughlin
  Cc: Gregory Haskins, Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

[-- Attachment #1: Type: text/plain, Size: 4683 bytes --]

Mark McLoughlin wrote:
> On Wed, 2009-05-27 at 13:40 -0400, Gregory Haskins wrote:
>   
>> Mark McLoughlin wrote:
>>     
>>> On Wed, 2009-05-27 at 15:11 +0300, Avi Kivity wrote:
>>>
>>>   
>>>       
>>>> Multiple cookies on the same address are required by virtio.  You can't 
>>>> mux since the data doesn't go anywhere.
>>>>
>>>> Virtio can survive by checking all rings on a notify, and we can later 
>>>> add a mechanism that has a distinct address for each ring, but let's see 
>>>> if we can cope with multiple cookies.  Mark?
>>>>     
>>>>         
>>> Trying to catch up, but you're talking about replacing virtio-pci
>>> QUEUE_NOTIFY handling with iosignalfd ?
>>>
>>> For a perfect replacement, what you really need is to be able to
>>> register multiple cookies per address range, but only have them trigger
>>> if the written data matches a provided value.
>>>   
>>>       
>> Hmm..thats an interesting idea.  To date, the "cookie" has really been
>> for identifying the proper range selected for deassignment.  I never
>> thought of using it as an actual trigger value at run-time.
>>
>>     
>>> If the data is lost, virtio has no way of knowing which queue is being
>>> notified, so we either end up with per-device, rather than per-queue,
>>> notifications (probably not too bad for net, at least) or a different
>>> notify address per queue (limiting the number of queues per device).
>>>   
>>>       
>> The addr-per-queue is how I was envisioning it, but the trigger value
>> concept hadn't occurred to me.  I could make this an option during
>> assignment (e.g. "COOKIE" flag means only trigger on writes of the
>> provided cookie, otherwise trigger on any write).  Sound good?
>>     
>
> Ah, I'd been thinking of the trigger data being provided separately to
> the cookie.
>
> The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie
> to identify a queue - it's just going to continue using a per-device
> queue number. So, if the cookie was also the trigger, we'd need an
> eventfd per device.
>
> And if this was a device where the guest writes similar values to
> multiple addresses, you'd need an eventfd per address.
>
>   

Hi Mark,
  So with the v5 release of iosignalfd, we now have the notion of a
"trigger", the API of which is as follows:

-----------------------
/*!
 * \brief Assign an eventfd to an IO port (PIO or MMIO)
 *
 * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
 * address range.  Any guest writes to the specified range will generate
 * an eventfd signal.
 *
 * A data-match pointer can be optionally provided in "trigger" and only
 * writes which match this value exactly will generate an event.  The length
 * of the trigger is established by the length of the overall IO range, and
 * therefore must be in a natural byte-width for the IO routines of your
 * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64).
 *
 * \param kvm Pointer to the current kvm_context
 * \param addr The IO address
 * \param len The length of the IO region at the address
 * \param fd The eventfd file-descriptor
 * \param trigger A optional pointer providing data-match token
 * \param flags FLAG_PIO: PIO, else MMIO
 */
int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long addr, size_t len,
              int fd, void *trigger, int flags);
-----------------

in the kvm-eventfd test harness, I create three unique eventfd handles,
and do the following:


-------------------

    unsigned char matchA = 0xa5, matchB = 0x42;

    kvm_assign_iosignalfd(kvm_context, addr, 1, fd[0], NULL, 0);
    kvm_assign_iosignalfd(kvm_context, addr, 1, fd[1], &matchA, 0);
    kvm_assign_iosignalfd(kvm_context, addr, 1, fd[2], &matchB, 0);

-------------------

In otherwords, I register a "NULL" trigger (wildcarded) on the first
fd.  The second has a data-match trigger of 0xa5, and the third has
0x42.  All three of these eventfd's map to the same mmio address with a
width of 1 byte.

I also fork a task which selects all three fds, and will print out the
eventfd "count" value when tripped.

Then, in the guest, I do:

----------------------

    iowrite8(0, iosignalfd_mmio);
    iowrite8(0xa5, iosignalfd_mmio);
    iowrite8(0x42, iosignalfd_mmio);

-------------------

The result of which is:

IOSIGNALFD 0: event triggered with val 3
IOSIGNALFD 1: event triggered with val 1
IOSIGNALFD 2: event triggered with val 1

on the host, which is my expected outcome.  Let me know if you do not
think this is sufficient to implement a solution to your virtio-pci design.

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
  2009-06-03 22:04               ` Gregory Haskins
@ 2009-06-04 13:20                 ` Mark McLoughlin
  0 siblings, 0 replies; 22+ messages in thread
From: Mark McLoughlin @ 2009-06-04 13:20 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Gregory Haskins, Avi Kivity, kvm, linux-kernel, Davide Libenzi, mtosatti

Hi Greg,

On Wed, 2009-06-03 at 18:04 -0400, Gregory Haskins wrote:
> Hi Mark,
>   So with the v5 release of iosignalfd, we now have the notion of a
> "trigger", the API of which is as follows:
> 
> -----------------------
> /*!
>  * \brief Assign an eventfd to an IO port (PIO or MMIO)
>  *
>  * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
>  * address range.  Any guest writes to the specified range will generate
>  * an eventfd signal.
>  *
>  * A data-match pointer can be optionally provided in "trigger" and only
>  * writes which match this value exactly will generate an event.  The length
>  * of the trigger is established by the length of the overall IO range, and
>  * therefore must be in a natural byte-width for the IO routines of your
>  * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64).

This looks like it'll work fine for virtio-pci.

Thanks,
Mark.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-06-04 13:20 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-26 19:15 [KVM PATCH v4 0/3] iosignalfd Gregory Haskins
2009-05-26 19:15 ` [KVM PATCH v4 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins
2009-05-27  8:57   ` Avi Kivity
2009-05-27 11:53     ` Gregory Haskins
2009-05-26 19:15 ` [KVM PATCH v4 2/3] kvm: make io_bus interface more robust Gregory Haskins
2009-05-27  8:54   ` Avi Kivity
2009-05-27 11:26     ` Gregory Haskins
2009-05-26 19:15 ` [KVM PATCH v4 3/3] kvm: add iosignalfd support Gregory Haskins
2009-05-27  9:03   ` Avi Kivity
2009-05-27 11:47     ` Gregory Haskins
2009-05-27 12:11       ` Avi Kivity
2009-05-27 12:54         ` Gregory Haskins
2009-05-27 17:25         ` Mark McLoughlin
2009-05-27 17:40           ` Gregory Haskins
2009-05-27 17:48             ` Mark McLoughlin
2009-05-27 20:45               ` Gregory Haskins
2009-05-28  9:09                 ` Mark McLoughlin
2009-05-28 12:12                   ` Gregory Haskins
2009-05-31  9:11                     ` Avi Kivity
2009-06-01 12:14                       ` Gregory Haskins
2009-06-03 22:04               ` Gregory Haskins
2009-06-04 13:20                 ` Mark McLoughlin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.