linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] virtio PCI driver
@ 2007-11-08  2:46 Anthony Liguori
  2007-11-08  2:46 ` [PATCH 1/3] Export vring functions for modules to use Anthony Liguori
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08  2:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anthony Liguori, Rusty Russell, virtualization, kvm-devel

This patch series implements a PCI driver for virtio.  This allows virtio
devices (like block and network) to be used in QEMU/KVM.  I'll post a very
early KVM userspace backend in kvm-devel for those who are interested.

This series depends on the two virtio fixes I've posted and Rusty's config_ops
refactoring.  I've tested these patches on top of Rusty's experimental virtio
tree.


* [PATCH 1/3] Export vring functions for modules to use
  2007-11-08  2:46 [PATCH 0/3] virtio PCI driver Anthony Liguori
@ 2007-11-08  2:46 ` Anthony Liguori
  2007-11-08  2:46   ` [PATCH 2/3] Put the virtio under the virtualization menu Anthony Liguori
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08  2:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anthony Liguori, Rusty Russell, virtualization, kvm-devel

This is needed for the virtio PCI device to be compiled as a module.
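
As a rough illustration (not part of the patch), a loadable transport module
would use these exports roughly as follows; the five-argument
vring_new_virtqueue() signature is assumed from Rusty's experimental tree:

	#include <linux/module.h>
	#include <linux/virtio.h>
	#include <linux/virtio_ring.h>

	/* hypothetical out-of-tree transport module, only meant to show why
	 * the vring entry points need EXPORT_SYMBOL_GPL */
	static bool my_callback(struct virtqueue *vq)
	{
		/* consume used buffers here */
		return true;
	}

	static void my_notify(struct virtqueue *vq)
	{
		/* kick the host; transport specific */
	}

	static struct virtqueue *my_setup_vq(struct virtio_device *vdev,
					     unsigned int num, void *ring)
	{
		/* this call only links from a module if the symbol is exported */
		return vring_new_virtqueue(num, vdev, ring, my_notify, my_callback);
	}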

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0e1bf05..3f28b47 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -260,6 +260,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 	return IRQ_HANDLED;
 }
 
+EXPORT_SYMBOL_GPL(vring_interrupt);
+
 static struct virtqueue_ops vring_vq_ops = {
 	.add_buf = vring_add_buf,
 	.get_buf = vring_get_buf,
@@ -306,8 +308,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
 	return &vq->vq;
 }
 
+EXPORT_SYMBOL_GPL(vring_new_virtqueue);
+
 void vring_del_virtqueue(struct virtqueue *vq)
 {
 	kfree(to_vvq(vq));
 }
 
+EXPORT_SYMBOL_GPL(vring_del_virtqueue);
+


* [PATCH 2/3] Put the virtio under the virtualization menu
  2007-11-08  2:46 ` [PATCH 1/3] Export vring functions for modules to use Anthony Liguori
@ 2007-11-08  2:46   ` Anthony Liguori
  2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
  2007-11-08  6:49     ` [kvm-devel] [PATCH 2/3] Put the virtio under the virtualization menu Avi Kivity
  0 siblings, 2 replies; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08  2:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anthony Liguori, Rusty Russell, virtualization, kvm-devel

This patch moves virtio under the virtualization menu and changes the virtio
devices so that they no longer claim to be lguest-only.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/drivers/Kconfig b/drivers/Kconfig
index f4076d9..d945ffc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -93,6 +93,4 @@ source "drivers/auxdisplay/Kconfig"
 source "drivers/kvm/Kconfig"
 
 source "drivers/uio/Kconfig"
-
-source "drivers/virtio/Kconfig"
 endmenu
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 4d0119e..be4b224 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -429,6 +429,7 @@ config VIRTIO_BLK
 	tristate "Virtio block driver (EXPERIMENTAL)"
 	depends on EXPERIMENTAL && VIRTIO
 	---help---
-	  This is the virtual block driver for lguest.  Say Y or M.
+	  This is the virtual block driver for virtio.  It can be used with
+          lguest or QEMU based VMMs (like KVM or Xen).  Say Y or M.
 
 endif # BLK_DEV
diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 6569206..ac4bcdf 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -50,5 +50,6 @@ config KVM_AMD
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/lguest/Kconfig
+source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 86b8641..e66aec4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3107,6 +3107,7 @@ config VIRTIO_NET
 	tristate "Virtio network driver (EXPERIMENTAL)"
 	depends on EXPERIMENTAL && VIRTIO
 	---help---
-	  This is the virtual network driver for lguest.  Say Y or M.
+	  This is the virtual network driver for virtio.  It can be used with
+          lguest or QEMU based VMMs (like KVM or Xen).  Say Y or M.
 
 endif # NETDEVICES


* [PATCH 3/3] virtio PCI device
  2007-11-08  2:46   ` [PATCH 2/3] Put the virtio under the virtualization menu Anthony Liguori
@ 2007-11-08  2:46     ` Anthony Liguori
  2007-11-08  6:12       ` [kvm-devel] " Avi Kivity
                         ` (3 more replies)
  2007-11-08  6:49     ` [kvm-devel] [PATCH 2/3] Put the virtio under the virtualization menu Avi Kivity
  1 sibling, 4 replies; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08  2:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anthony Liguori, Rusty Russell, virtualization, kvm-devel

This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 9e33fc4..c81e0f3 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -6,3 +6,20 @@ config VIRTIO
 config VIRTIO_RING
 	bool
 	depends on VIRTIO
+
+config VIRTIO_PCI
+	tristate "PCI driver for virtio devices (EXPERIMENTAL)"
+	depends on PCI && EXPERIMENTAL
+	select VIRTIO
+	select VIRTIO_RING
+	---help---
+	  This driver provides support for virtio based paravirtual device
+	  drivers over PCI.  This requires that your VMM has appropriate PCI
+	  virtio backends.  Most QEMU based VMMs (like KVM or Xen) should
+	  support these devices.
+
+	  Currently, the ABI is not considered stable so there is no guarantee
+	  that this version of the driver will work with your VMM.
+
+	  If unsure, say M.
+          
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index f70e409..cc84999 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
+obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
new file mode 100644
index 0000000..85ae096
--- /dev/null
+++ b/drivers/virtio/virtio_pci.c
@@ -0,0 +1,469 @@
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+#include <linux/virtio_pci.h>
+#include <linux/highmem.h>
+#include <linux/spinlock.h>
+
+MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
+MODULE_DESCRIPTION("virtio-pci");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+/* Our device structure */
+struct virtio_pci_device
+{
+	/* the virtio device */
+	struct virtio_device vdev;
+	/* the PCI device */
+	struct pci_dev *pci_dev;
+	/* the IO mapping for the PCI config space */
+	void *ioaddr;
+
+	spinlock_t lock;
+	struct list_head virtqueues;
+};
+
+struct virtio_pci_vq_info
+{
+	/* the number of entries in the queue */
+	int num;
+	/* the number of pages the device needs for the ring queue */
+	int n_pages;
+	/* the index of the queue */
+	int queue_index;
+	/* the struct page of the ring queue */
+	struct page *pages;
+	/* the virtual address of the ring queue */
+	void *queue;
+	/* a pointer to the virtqueue */
+	struct virtqueue *vq;
+	/* the node pointer */
+	struct list_head node;
+};
+
+/* We have to enumerate here all virtio PCI devices. */
+static struct pci_device_id virtio_pci_id_table[] = {
+	{ 0x5002, 0x2258, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
+	{ 0 },
+};
+
+MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
+
+/* A PCI device has its own struct device and so does a virtio device, so
+ * we create a place for the virtio devices to show up in sysfs.  I think it
+ * would make more sense for virtio to not insist on having its own device. */
+static struct device virtio_pci_root = {
+	.parent		= NULL,
+	.bus_id		= "virtio-pci",
+};
+
+/* Unique numbering for devices under the virtio-pci root */
+static unsigned int dev_index;
+
+/* Convert a generic virtio device to our structure */
+static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
+{
+	return container_of(vdev, struct virtio_pci_device, vdev);
+}
+
+/* virtio config->feature() implementation */
+static bool vp_feature(struct virtio_device *vdev, unsigned bit)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	u32 mask;
+
+	/* Since this function is supposed to have the side effect of
+	 * enabling a queried feature, we simulate that by doing a read
+	 * from the host feature bitmask and then writing to the guest
+	 * feature bitmask */
+	mask = ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
+	if (mask & (1 << bit)) {
+		mask |= (1 << bit);
+		iowrite32(mask, vp_dev->ioaddr + VIRTIO_PCI_GUEST_FEATURES);
+	}
+
+	return !!(mask & (1 << bit));
+}
+
+/* virtio config->get() implementation */
+static void vp_get(struct virtio_device *vdev, unsigned offset,
+		   void *buf, unsigned len)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	void *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+
+	/* We translate appropriately sized get requests into more natural
+	 * IO operations.  These functions also take care of endianness
+	 * conversion. */
+	switch (len) {
+	case 1: {
+		u8 val;
+		val = ioread8(ioaddr);
+		memcpy(buf, &val, sizeof(val));
+		break;
+	}
+	case 2: {
+		u16 val;
+		val = ioread16(ioaddr);
+		memcpy(buf, &val, sizeof(val));
+		break;
+	}
+	case 4: {
+		u32 val;
+		val = ioread32(ioaddr);
+		memcpy(buf, &val, sizeof(val));
+		break;
+	}
+	case 8: {
+		u64 val;
+		val = (u64)ioread32(ioaddr) << 32;
+		val |= ioread32(ioaddr + 4);
+		memcpy(buf, &val, sizeof(val));
+		break;
+	}
+
+	/* for strange accesses of an odd size, we do not perform any
+	 * endianness conversion. */
+	default:
+		ioread8_rep(ioaddr, buf, len);
+		break;
+	}
+}
+
+/* the config->set() implementation.  it's symmetric to the config->get()
+ * implementation */
+static void vp_set(struct virtio_device *vdev, unsigned offset,
+		   const void *buf, unsigned len)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	void *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+
+	switch (len) {
+	case 1: {
+		u8 val;
+		memcpy(&val, buf, sizeof(val));
+		iowrite8(val, ioaddr);
+		break;
+	}
+	case 2: {
+		u16 val;
+		memcpy(&val, buf, sizeof(val));
+		iowrite16(val, ioaddr);
+		break;
+	}
+	case 4: {
+		u32 val;
+		memcpy(&val, buf, sizeof(val));
+		iowrite32(val, ioaddr);
+		break;
+	}
+	case 8: {
+		u64 val;
+		memcpy(&val, buf, sizeof(val));
+		iowrite32(val >> 32, ioaddr);
+		iowrite32(val, ioaddr + 4);
+		break;
+	}
+	default:
+		iowrite8_rep(ioaddr, buf, len);
+		break;
+	}
+}
+
+/* config->{get,set}_status() implementations */
+static u8 vp_get_status(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	return ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+}
+
+static void vp_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	return iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+}
+
+/* the notify function used when creating a virt queue */
+static void vp_notify(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	/* we write the queue's selector into the notification register to
+	 * signal the other end */
+	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+}
+
+/* A small wrapper to also acknowledge the interrupt when it's handled.
+ * I really need an EIO hook for the vring so I can ack the interrupt once we
+ * know that we'll be handling the IRQ but before we invoke the callback since
+ * the callback may notify the host which results in the host attempting to
+ * raise an interrupt that we would then mask once we acknowledged the
+ * interrupt. */
+static irqreturn_t vp_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	u8 isr;
+
+	/* reading the ISR has the effect of also clearing it so it's very
+	 * important to save off the value. */
+	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+
+	/* It's definitely not us if the ISR was not high */
+	if (!isr)
+		return IRQ_NONE;
+
+	spin_lock(&vp_dev->lock);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock(&vp_dev->lock);
+
+	return ret;
+}
+
+/* the config->find_vq() implementation */
+static struct virtqueue *vp_find_vq(struct virtio_device *vdev, unsigned index,
+				    bool (*callback)(struct virtqueue *vq))
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info;
+	struct virtqueue *vq;
+	int err;
+	u16 num;
+
+	/* Select the queue we're interested in */
+	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+
+	/* Check if queue is either not available or already active. */
+	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);
+	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
+		return ERR_PTR(-ENOENT);
+
+	/* allocate and fill out our structure that represents an active
+	 * queue */
+	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	info->queue_index = index;
+	info->num = num;
+
+	/* determine the memory needed for the queue and provide the memory
+	 * location to the host */
+	info->n_pages = DIV_ROUND_UP(vring_size(num), PAGE_SIZE);
+	info->pages = alloc_pages(GFP_KERNEL | __GFP_ZERO,
+				  get_order(info->n_pages));
+	if (info->pages == NULL) {
+		err = -ENOMEM;
+		goto out_info;
+	}
+
+	/* FIXME: is this sufficient for info->n_pages > 1? */
+	info->queue = kmap(info->pages);
+	if (info->queue == NULL) {
+		err = -ENOMEM;
+		goto out_alloc_pages;
+	}
+
+	/* activate the queue */
+	iowrite32(page_to_pfn(info->pages),
+		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+		  
+	/* create the vring */
+	vq = vring_new_virtqueue(info->num, vdev, info->queue,
+				 vp_notify, callback);
+	if (!vq) {
+		err = -ENOMEM;
+		goto out_activate_queue;
+	}
+
+	vq->priv = info;
+	info->vq = vq;
+
+	spin_lock(&vp_dev->lock);
+	list_add(&info->node, &vp_dev->virtqueues);
+	spin_unlock(&vp_dev->lock);
+
+	return vq;
+
+out_activate_queue:
+	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+	kunmap(info->queue);
+out_alloc_pages:
+	__free_pages(info->pages, get_order(info->n_pages));
+out_info:
+	kfree(info);
+	return ERR_PTR(err);
+}
+
+/* the config->del_vq() implementation */
+static void vp_del_vq(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	spin_lock(&vp_dev->lock);
+	list_del(&info->node);
+	spin_unlock(&vp_dev->lock);
+
+	vring_del_virtqueue(vq);
+
+	/* Select and deactivate the queue */
+	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+
+	kunmap(info->queue);
+	__free_pages(info->pages, get_order(info->n_pages));
+	kfree(info);
+}
+
+static struct virtio_config_ops virtio_pci_config_ops = {
+	.feature	= vp_feature,
+	.get		= vp_get,
+	.set		= vp_set,
+	.get_status	= vp_get_status,
+	.set_status	= vp_set_status,
+	.find_vq	= vp_find_vq,
+	.del_vq		= vp_del_vq,
+};
+
+/* the PCI probing function */
+static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
+				      const struct pci_device_id *id)
+{
+	struct virtio_pci_device *vp_dev;
+	int err;
+
+	/* allocate our structure and fill it out */
+	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
+	if (vp_dev == NULL)
+		return -ENOMEM;
+
+	vp_dev->pci_dev = pci_dev;
+	vp_dev->vdev.dev.parent = &virtio_pci_root;
+	vp_dev->vdev.index = dev_index++;
+	vp_dev->vdev.config = &virtio_pci_config_ops;
+	INIT_LIST_HEAD(&vp_dev->virtqueues);
+	spin_lock_init(&vp_dev->lock);
+
+	/* enable the device */
+	err = pci_enable_device(pci_dev);
+	if (err)
+		goto out;
+
+	err = pci_request_regions(pci_dev, "virtio-pci");
+	if (err)
+		goto out_enable_device;
+
+	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
+	if (vp_dev->ioaddr == NULL)
+		goto out_req_regions;
+
+	pci_set_drvdata(pci_dev, vp_dev);
+
+	/* we use the subsystem vendor/device id as the virtio vendor/device
+	 * id.  this allows us to use the same PCI vendor/device id for all
+	 * virtio devices and to identify the particular virtio driver by
+	 * the subsystem ids */
+	vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
+	vp_dev->vdev.id.device = pci_dev->subsystem_device;
+
+	/* register a handler for the queue with the PCI device's interrupt */
+	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
+			  pci_name(vp_dev->pci_dev), vp_dev);
+	if (err)
+		goto out_set_drvdata;
+
+	/* finally register the virtio device */
+	err = register_virtio_device(&vp_dev->vdev);
+	if (err)
+		goto out_req_irq;
+
+	return 0;
+
+out_req_irq:
+	free_irq(pci_dev->irq, vp_dev);
+out_set_drvdata:
+	pci_set_drvdata(pci_dev, NULL);
+	pci_iounmap(pci_dev, vp_dev->ioaddr);
+out_req_regions:
+	pci_release_regions(pci_dev);
+out_enable_device:
+	pci_disable_device(pci_dev);
+out:
+	kfree(vp_dev);
+	return err;
+}
+
+static void __devexit virtio_pci_remove(struct pci_dev *pci_dev)
+{
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+
+	free_irq(pci_dev->irq, vp_dev);
+	pci_set_drvdata(pci_dev, NULL);
+	pci_iounmap(pci_dev, vp_dev->ioaddr);
+	pci_release_regions(pci_dev);
+	pci_disable_device(pci_dev);
+	kfree(vp_dev);
+}
+
+#ifdef CONFIG_PM
+static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
+{
+	pci_save_state(pci_dev);
+	pci_set_power_state(pci_dev, PCI_D3hot);
+	return 0;
+}
+
+static int virtio_pci_resume(struct pci_dev *pci_dev)
+{
+	pci_restore_state(pci_dev);
+	pci_set_power_state(pci_dev, PCI_D0);
+	return 0;
+}
+#endif
+
+static struct pci_driver virtio_pci_driver = {
+	.name		= "virtio-pci",
+	.id_table	= virtio_pci_id_table,
+	.probe		= virtio_pci_probe,
+	.remove		= virtio_pci_remove,
+#ifdef CONFIG_PM
+	.suspend	= virtio_pci_suspend,
+	.resume		= virtio_pci_resume,
+#endif
+};
+
+static int __init virtio_pci_init(void)
+{
+	int err;
+
+	err = device_register(&virtio_pci_root);
+	if (err)
+		return err;
+
+	err = pci_register_driver(&virtio_pci_driver);
+	if (err)
+		device_unregister(&virtio_pci_root);
+
+	return err;
+}
+
+module_init(virtio_pci_init);
+
+static void __exit virtio_pci_exit(void)
+{
+	pci_unregister_driver(&virtio_pci_driver);
+	device_unregister(&virtio_pci_root);
+}
+
+module_exit(virtio_pci_exit);
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
new file mode 100644
index 0000000..b1a1568
--- /dev/null
+++ b/include/linux/virtio_pci.h
@@ -0,0 +1,36 @@
+#ifndef _LINUX_VIRTIO_PCI_H
+#define _LINUX_VIRTIO_PCI_H
+
+#include <linux/virtio_config.h>
+
+/* A 32-bit r/o bitmask of the features supported by the host */
+#define VIRTIO_PCI_HOST_FEATURES	0
+
+/* A 32-bit r/w bitmask of features activated by the guest */
+#define VIRTIO_PCI_GUEST_FEATURES	4
+
+/* A 32-bit r/w PFN for the currently selected queue */
+#define VIRTIO_PCI_QUEUE_PFN		8
+
+/* A 16-bit r/o queue size for the currently selected queue */
+#define VIRTIO_PCI_QUEUE_NUM		12
+
+/* A 16-bit r/w queue selector */
+#define VIRTIO_PCI_QUEUE_SEL		14
+
+/* A 16-bit r/w queue notifier */
+#define VIRTIO_PCI_QUEUE_NOTIFY		16
+
+/* An 8-bit device status register.  */
+#define VIRTIO_PCI_STATUS		18
+
+/* An 8-bit r/o interrupt status register.  Reading the value will return the
+ * current contents of the ISR and will also clear it.  This is effectively
+ * a read-and-acknowledge. */
+#define VIRTIO_PCI_ISR			19
+
+/* The remaining space is defined by each driver as the per-driver
+ * configuration space */
+#define VIRTIO_PCI_CONFIG		20
+
+#endif


* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
@ 2007-11-08  6:12       ` Avi Kivity
  2007-11-08 13:54         ` Anthony Liguori
  2007-11-08 17:46       ` Arnd Bergmann
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-08  6:12 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Anthony Liguori wrote:
> This is a PCI device that implements a transport for virtio.  It allows virtio
> devices to be used by QEMU based VMMs like KVM or Xen.
>
>   

Didn't see support for dma. I think that with Amit's pvdma patches you
can support dma-capable devices as well without too much fuss.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [kvm-devel] [PATCH 2/3] Put the virtio under the virtualization menu
  2007-11-08  2:46   ` [PATCH 2/3] Put the virtio under the virtualization menu Anthony Liguori
  2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
@ 2007-11-08  6:49     ` Avi Kivity
  1 sibling, 0 replies; 33+ messages in thread
From: Avi Kivity @ 2007-11-08  6:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Anthony Liguori wrote:
> This patch moves virtio under the virtualization menu and changes virtio
> devices to not claim to only be for lguest.
>   

Perhaps the virt menu needs to be split into a host-side support menu
and guest-side support menu.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08  6:12       ` [kvm-devel] " Avi Kivity
@ 2007-11-08 13:54         ` Anthony Liguori
  2007-11-08 14:37           ` Avi Kivity
  2007-11-08 23:43           ` Dor Laor
  0 siblings, 2 replies; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08 13:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Avi Kivity wrote:
> Anthony Liguori wrote:
>   
>> This is a PCI device that implements a transport for virtio.  It allows virtio
>> devices to be used by QEMU based VMMs like KVM or Xen.
>>
>>   
>>     
>
> Didn't see support for dma.

Not sure what you're expecting there.  Using dma_ops in virtio_ring?

>  I think that with Amit's pvdma patches you
> can support dma-capable devices as well without too much fuss.
>   

What is the use case you're thinking of?  A semi-paravirt driver that 
does dma directly to a device?

Regards,

Anthony Liguori




* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08 13:54         ` Anthony Liguori
@ 2007-11-08 14:37           ` Avi Kivity
  2007-11-08 15:06             ` Anthony Liguori
  2007-11-08 23:43           ` Dor Laor
  1 sibling, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-08 14:37 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>  
>>> This is a PCI device that implements a transport for virtio.  It 
>>> allows virtio
>>> devices to be used by QEMU based VMMs like KVM or Xen.
>>>
>>>       
>>
>> Didn't see support for dma.
>
> Not sure what you're expecting there.  Using dma_ops in virtio_ring?
>

If a pci device is capable of dma (or issuing interrupts), it will be 
useless with pv pci.


>>  I think that with Amit's pvdma patches you
>> can support dma-capable devices as well without too much fuss.
>>   
>
> What is the use case you're thinking of?  A semi-paravirt driver that 
> does dma directly to a device?

No, an unmodified driver that, by using clever tricks with dma_ops, can 
do dma directly to guest memory.  See Amit's patches.

In fact, why do a virtio transport at all?  It can be done either with 
trap'n'emulate, or by directly mapping the device mmio space into the guest.


(what use case are you considering? devices without interrupts and dma? 
pci door stoppers?)

-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08 14:37           ` Avi Kivity
@ 2007-11-08 15:06             ` Anthony Liguori
  2007-11-08 15:13               ` Avi Kivity
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08 15:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Avi Kivity wrote:
> If a pci device is capable of dma (or issuing interrupts), it will be 
> useless with pv pci.

Hrm, I think we may be talking about different things.  Are you thinking 
that the driver I posted allows you to do PCI pass-through over virtio?  
That's not what it is.

The driver I posted is a virtio implementation that uses a PCI device.  
This lets you use virtio-blk and virtio-net under KVM.  The alternative 
to this virtio PCI device would be a virtio transport built with 
hypercalls like lguest has.  I chose a PCI device because it ensured 
that each virtio device showed up like a normal PCI device.

Am I misunderstanding what you're asking about?

Regards,

Anthony Liguori

>
>>>  I think that with Amit's pvdma patches you
>>> can support dma-capable devices as well without too much fuss.
>>>   
>>
>> What is the use case you're thinking of?  A semi-paravirt driver that 
>> does dma directly to a device?
>
> No, an unmodified driver that, by using clever tricks with dma_ops, 
> can do dma directly to guest memory.  See Amit's patches.
>
> In fact, why do a virtio transport at all?  It can be done either with 
> trap'n'emulate, or by directly mapping the device mmio space into the 
> guest.
>
>
> (what use case are you considering? devices without interrupts and 
> dma? pci door stoppers?)
>



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08 15:06             ` Anthony Liguori
@ 2007-11-08 15:13               ` Avi Kivity
  0 siblings, 0 replies; 33+ messages in thread
From: Avi Kivity @ 2007-11-08 15:13 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Anthony Liguori wrote:
> Avi Kivity wrote:
>> If a pci device is capable of dma (or issuing interrupts), it will be 
>> useless with pv pci.
>
> Hrm, I think we may be talking about different things.  Are you 
> thinking that the driver I posted allows you to do PCI pass-through 
> over virtio?  That's not what it is.
>
> The driver I posted is a virtio implementation that uses a PCI 
> device.  This lets you use virtio-blk and virtio-net under KVM.  The 
> alternative to this virtio PCI device would be a virtio transport 
> built with hypercalls like lguest has.  I choose a PCI device because 
> it ensured that each virtio device showed up like a normal PCI device.
>
> Am I misunderstanding what you're asking about?
>

No, I completely misunderstood the patch.  Should review complete 
patches rather than random hunks.

Sorry for the noise.

-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
  2007-11-08  6:12       ` [kvm-devel] " Avi Kivity
@ 2007-11-08 17:46       ` Arnd Bergmann
  2007-11-08 19:04         ` Anthony Liguori
  2007-11-09  0:39       ` Dor Laor
  2007-11-20 15:01       ` Avi Kivity
  3 siblings, 1 reply; 33+ messages in thread
From: Arnd Bergmann @ 2007-11-08 17:46 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, Rusty Russell, virtualization, Anthony Liguori

On Thursday 08 November 2007, Anthony Liguori wrote:
> +/* A PCI device has it's own struct device and so does a virtio device so
> + * we create a place for the virtio devices to show up in sysfs.  I think it
> + * would make more sense for virtio to not insist on having it's own device. */
> +static struct device virtio_pci_root = {
> +       .parent         = NULL,
> +       .bus_id         = "virtio-pci",
> +};
> +
> +/* Unique numbering for devices under the kvm root */
> +static unsigned int dev_index;
> +

...

> +/* the PCI probing function */
> +static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
> +                                     const struct pci_device_id *id)
> +{
> +       struct virtio_pci_device *vp_dev;
> +       int err;
> +
> +       /* allocate our structure and fill it out */
> +       vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
> +       if (vp_dev == NULL)
> +               return -ENOMEM;
> +
> +       vp_dev->pci_dev = pci_dev;
> +       vp_dev->vdev.dev.parent = &virtio_pci_root;

If you use 

	vp_dev->vdev.dev.parent = &pci_dev->dev;

Then there is no need for the special kvm root device, and the actual
virtio device shows up in a more logical place, under where it is
really (virtually) attached.

	Arnd <><


* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08 17:46       ` Arnd Bergmann
@ 2007-11-08 19:04         ` Anthony Liguori
  2007-11-09 11:03           ` Arnd Bergmann
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-08 19:04 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: kvm-devel, linux-kernel, Rusty Russell, virtualization

Arnd Bergmann wrote:
> On Thursday 08 November 2007, Anthony Liguori wrote:
>   
>> +/* A PCI device has it's own struct device and so does a virtio device so
>> + * we create a place for the virtio devices to show up in sysfs.  I think it
>> + * would make more sense for virtio to not insist on having it's own device. */
>> +static struct device virtio_pci_root = {
>> +       .parent         = NULL,
>> +       .bus_id         = "virtio-pci",
>> +};
>> +
>> +/* Unique numbering for devices under the kvm root */
>> +static unsigned int dev_index;
>> +
>>     
>
> ...
>
>   
>> +/* the PCI probing function */
>> +static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
>> +                                     const struct pci_device_id *id)
>> +{
>> +       struct virtio_pci_device *vp_dev;
>> +       int err;
>> +
>> +       /* allocate our structure and fill it out */
>> +       vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
>> +       if (vp_dev == NULL)
>> +               return -ENOMEM;
>> +
>> +       vp_dev->pci_dev = pci_dev;
>> +       vp_dev->vdev.dev.parent = &virtio_pci_root;
>>     
>
> If you use 
>
> 	vp_dev->vdev.dev.parent = &pci_dev->dev;
>
> Then there is no need for the special kvm root device, and the actual
> virtio device shows up in a more logical place, under where it is
> really (virtually) attached.
>   

They already show up underneath the PCI bus.  The issue is that there 
are two separate 'struct device's for each virtio device.  There's the 
PCI device (that's part of the pci_dev structure) and then there's the 
virtio_device one.  I thought that setting the dev.parent of the 
virtio_device struct device would result in having two separate entries 
under the PCI bus directory which would be pretty confusing :-)

Regards,

Anthony Liguori

> 	Arnd <><
>   



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08 13:54         ` Anthony Liguori
  2007-11-08 14:37           ` Avi Kivity
@ 2007-11-08 23:43           ` Dor Laor
  1 sibling, 0 replies; 33+ messages in thread
From: Dor Laor @ 2007-11-08 23:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Rusty Russell, virtualization, linux-kernel, kvm-devel

Anthony Liguori wrote:
> Avi Kivity wrote:
>   
>> Anthony Liguori wrote:
>>   
>>     
>>> This is a PCI device that implements a transport for virtio.  It allows virtio
>>> devices to be used by QEMU based VMMs like KVM or Xen.
>>>
>>>   
>>>     
>>>       
>> Didn't see support for dma.
>>     
>
> Not sure what you're expecting there.  Using dma_ops in virtio_ring?
>
>   
>>  I think that with Amit's pvdma patches you
>> can support dma-capable devices as well without too much fuss.
>>   
>>     
>
> What is the use case you're thinking of?  A semi-paravirt driver that 
> does dma directly to a device?
>
> Regards,
>
> Anthony Liguori
>
>   
You would also lose performance, since pv-dma will trigger an exit for
each virtio IO, while virtio kicks the hypervisor only after several IOs
have been queued.



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
  2007-11-08  6:12       ` [kvm-devel] " Avi Kivity
  2007-11-08 17:46       ` Arnd Bergmann
@ 2007-11-09  0:39       ` Dor Laor
  2007-11-09  2:17         ` Anthony Liguori
  2007-11-20 15:01       ` Avi Kivity
  3 siblings, 1 reply; 33+ messages in thread
From: Dor Laor @ 2007-11-09  0:39 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Anthony Liguori wrote:
> This is a PCI device that implements a transport for virtio.  It allows virtio
> devices to be used by QEMU based VMMs like KVM or Xen.
>
> ....
>   
While it's a little premature, we can start thinking of irq path
improvements.
The current patch acks a private isr, and afterwards the apic eoi will also
be hit since it's a level-triggered irq.  This means 2 vmexits per irq.
We can start with regular pci irqs and move to msi afterwards.
Some other ugly hack options [we're better off using msi]:
    - Read the eoi directly from the apic and save the first private isr ack
    - Convert the specific irq line to edge-triggered and don't share it
What do you guys think?
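
For illustration only, a rough sketch of what the msi variant of the probe
path could look like (pci_enable_msi()/pci_disable_msi() are the stock PCI
calls; whether the host side would expose msi at all is an open question
here):

	/* sketch: prefer msi so the ISR read/ack round trip goes away */
	err = pci_enable_msi(pci_dev);
	if (!err) {
		/* msi is edge style and unshared, so the handler would not
		 * need to read VIRTIO_PCI_ISR just to see whether it was us */
		err = request_irq(pci_dev->irq, vp_interrupt, 0,
				  pci_name(pci_dev), vp_dev);
		if (err)
			pci_disable_msi(pci_dev);
	} else {
		/* fall back to the shared, level-triggered INTx path */
		err = request_irq(pci_dev->irq, vp_interrupt, IRQF_SHARED,
				  pci_name(pci_dev), vp_dev);
	}
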
> +/* A small wrapper to also acknowledge the interrupt when it's handled.
> + * I really need an EIO hook for the vring so I can ack the interrupt once we
> + * know that we'll be handling the IRQ but before we invoke the callback since
> + * the callback may notify the host which results in the host attempting to
> + * raise an interrupt that we would then mask once we acknowledged the
> + * interrupt. */
> +static irqreturn_t vp_interrupt(int irq, void *opaque)
> +{
> +	struct virtio_pci_device *vp_dev = opaque;
> +	struct virtio_pci_vq_info *info;
> +	irqreturn_t ret = IRQ_NONE;
> +	u8 isr;
> +
> +	/* reading the ISR has the effect of also clearing it so it's very
> +	 * important to save off the value. */
> +	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
> +
> +	/* It's definitely not us if the ISR was not high */
> +	if (!isr)
> +		return IRQ_NONE;
> +
> +	spin_lock(&vp_dev->lock);
> +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> +		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
> +			ret = IRQ_HANDLED;
> +	}
> +	spin_unlock(&vp_dev->lock);
> +
> +	return ret;
> +}
>   



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-09  0:39       ` Dor Laor
@ 2007-11-09  2:17         ` Anthony Liguori
  0 siblings, 0 replies; 33+ messages in thread
From: Anthony Liguori @ 2007-11-09  2:17 UTC (permalink / raw)
  To: dor.laor; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Dor Laor wrote:
> Anthony Liguori wrote:
>> This is a PCI device that implements a transport for virtio.  It 
>> allows virtio
>> devices to be used by QEMU based VMMs like KVM or Xen.
>>
>> ....
>>   
> While it's a little premature, we can start thinking of irq path 
> improvements.
> The current patch acks a private isr and afterwards apic eoi will also 
> be hit since its
> a level trig irq. This means 2 vmexits per irq.
> We can start with regular pci irqs and move afterwards to msi.
> Some other ugly hack options [we're better use msi]:
>    - Read the eoi directly from apic and save the first private isr ack

I must admit that I don't know a whole lot about interrupt delivery.
If we can avoid the private ISR ack then that would certainly be a good
thing to do!  I think that would involve adding another bit to the
virtqueues to indicate whether or not there is work to be handled.  It's
really just moving the ISR to shared memory so that there's no penalty
for accessing it.
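
A minimal sketch of that idea (the shared page layout and the ->shared
field are made up here; nothing like it exists in the posted patch):

	/* hypothetical: host sets a bit in guest-visible memory instead of
	 * the ISR register */
	struct virtio_pci_shared {
		unsigned long pending;	/* host -> guest: some vq has work */
	};

	static irqreturn_t vp_interrupt_shm(int irq, void *opaque)
	{
		struct virtio_pci_device *vp_dev = opaque;

		/* no ioread8() exit: test and clear a flag in guest memory */
		if (!test_and_clear_bit(0, &vp_dev->shared->pending))
			return IRQ_NONE;

		/* ... then walk the virtqueue list just as vp_interrupt() does ... */
		return IRQ_HANDLED;
	}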

Regards,

Anthony Liguori

>    - Convert the specific irq line to edge triggered and dont share it
> What do you guys think?
>> +/* A small wrapper to also acknowledge the interrupt when it's handled.
>> + * I really need an EIO hook for the vring so I can ack the 
>> interrupt once we
>> + * know that we'll be handling the IRQ but before we invoke the 
>> callback since
>> + * the callback may notify the host which results in the host 
>> attempting to
>> + * raise an interrupt that we would then mask once we acknowledged the
>> + * interrupt. */
>> +static irqreturn_t vp_interrupt(int irq, void *opaque)
>> +{
>> +    struct virtio_pci_device *vp_dev = opaque;
>> +    struct virtio_pci_vq_info *info;
>> +    irqreturn_t ret = IRQ_NONE;
>> +    u8 isr;
>> +
>> +    /* reading the ISR has the effect of also clearing it so it's very
>> +     * important to save off the value. */
>> +    isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
>> +
>> +    /* It's definitely not us if the ISR was not high */
>> +    if (!isr)
>> +        return IRQ_NONE;
>> +
>> +    spin_lock(&vp_dev->lock);
>> +    list_for_each_entry(info, &vp_dev->virtqueues, node) {
>> +        if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
>> +            ret = IRQ_HANDLED;
>> +    }
>> +    spin_unlock(&vp_dev->lock);
>> +
>> +    return ret;
>> +}
>>   
>



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08 19:04         ` Anthony Liguori
@ 2007-11-09 11:03           ` Arnd Bergmann
  0 siblings, 0 replies; 33+ messages in thread
From: Arnd Bergmann @ 2007-11-09 11:03 UTC (permalink / raw)
  To: virtualization; +Cc: Anthony Liguori, kvm-devel, virtualization, linux-kernel

On Thursday 08 November 2007, Anthony Liguori wrote:
> 
> They already show up underneath of the PCI bus.  The issue is that there 
> are two separate 'struct device's for each virtio device.  There's the 
> PCI device (that's part of the pci_dev structure) and then there's the 
> virtio_device one.  I thought that setting the dev.parent of the 
> virtio_device struct device would result in having two separate entries 
> under the PCI bus directory which would be pretty confusing 

But that's what a device tree means. Think about a USB disk drive: The drive
shows up as a child of the USB controller, which in turn is a child of
the PCI bridge. Note that I did not suggest having the virtio parent set to
the parent of the PCI device, but to the PCI device itself.

I find it more confusing to have a device just hanging off the root when
it is actually handled by the PCI subsystem.

	Arnd <><


* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
                         ` (2 preceding siblings ...)
  2007-11-09  0:39       ` Dor Laor
@ 2007-11-20 15:01       ` Avi Kivity
  2007-11-20 15:43         ` Anthony Liguori
  3 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-20 15:01 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Anthony Liguori wrote:
> This is a PCI device that implements a transport for virtio.  It allows virtio
> devices to be used by QEMU based VMMs like KVM or Xen.
>
> +
> +/* the notify function used when creating a virt queue */
> +static void vp_notify(struct virtqueue *vq)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> +	struct virtio_pci_vq_info *info = vq->priv;
> +
> +	/* we write the queue's selector into the notification register to
> +	 * signal the other end */
> +	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
> +}
>   

This means we can't kick multiple queues with one exit.

I'd also like to see a hypercall-capable version of this (but that can 
wait).

> +
> +/* A small wrapper to also acknowledge the interrupt when it's handled.
> + * I really need an EIO hook for the vring so I can ack the interrupt once we
> + * know that we'll be handling the IRQ but before we invoke the callback since
> + * the callback may notify the host which results in the host attempting to
> + * raise an interrupt that we would then mask once we acknowledged the
> + * interrupt. */
> +static irqreturn_t vp_interrupt(int irq, void *opaque)
> +{
> +	struct virtio_pci_device *vp_dev = opaque;
> +	struct virtio_pci_vq_info *info;
> +	irqreturn_t ret = IRQ_NONE;
> +	u8 isr;
> +
> +	/* reading the ISR has the effect of also clearing it so it's very
> +	 * important to save off the value. */
> +	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
>   

Can this be implemented via shared memory? We're exiting now on every 
interrupt.


> +	return ret;
> +}
> +
> +/* the config->find_vq() implementation */
> +static struct virtqueue *vp_find_vq(struct virtio_device *vdev, unsigned index,
> +				    bool (*callback)(struct virtqueue *vq))
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	struct virtio_pci_vq_info *info;
> +	struct virtqueue *vq;
> +	int err;
> +	u16 num;
> +
> +	/* Select the queue we're interested in */
> +	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>   

I would really like to see this implemented as pci config space, with no 
tricks like multiplexing several virtqueues on one register. Something 
like the PCI BARs where you have all the register numbers allocated 
statically to queues.

> +
> +	/* Check if queue is either not available or already active. */
> +	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);
> +	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
> +		return ERR_PTR(-ENOENT);
> +
> +	/* allocate and fill out our structure the represents an active
> +	 * queue */
> +	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
> +	if (!info)
> +		return ERR_PTR(-ENOMEM);
> +
> +	info->queue_index = index;
> +	info->num = num;
> +
> +	/* determine the memory needed for the queue and provide the memory
> +	 * location to the host */
> +	info->n_pages = DIV_ROUND_UP(vring_size(num), PAGE_SIZE);
> +	info->pages = alloc_pages(GFP_KERNEL | __GFP_ZERO,
> +				  get_order(info->n_pages));
> +	if (info->pages == NULL) {
> +		err = -ENOMEM;
> +		goto out_info;
> +	}
> +
> +	/* FIXME: is this sufficient for info->n_pages > 1? */
> +	info->queue = kmap(info->pages);
> +	if (info->queue == NULL) {
> +		err = -ENOMEM;
> +		goto out_alloc_pages;
> +	}
> +
> +	/* activate the queue */
> +	iowrite32(page_to_pfn(info->pages),
> +		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +		  
> +	/* create the vring */
> +	vq = vring_new_virtqueue(info->num, vdev, info->queue,
> +				 vp_notify, callback);
> +	if (!vq) {
> +		err = -ENOMEM;
> +		goto out_activate_queue;
> +	}
> +
> +	vq->priv = info;
> +	info->vq = vq;
> +
> +	spin_lock(&vp_dev->lock);
> +	list_add(&info->node, &vp_dev->virtqueues);
> +	spin_unlock(&vp_dev->lock);
> +
>   

Is this run only on init? If so the lock isn't needed.


-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-20 15:01       ` Avi Kivity
@ 2007-11-20 15:43         ` Anthony Liguori
  2007-11-20 16:12           ` Avi Kivity
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-20 15:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel, Rusty Russell, virtualization, kvm-devel

Avi Kivity wrote:
> Anthony Liguori wrote:
>> This is a PCI device that implements a transport for virtio.  It 
>> allows virtio
>> devices to be used by QEMU based VMMs like KVM or Xen.
>>
>> +
>> +/* the notify function used when creating a virt queue */
>> +static void vp_notify(struct virtqueue *vq)
>> +{
>> +    struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>> +    struct virtio_pci_vq_info *info = vq->priv;
>> +
>> +    /* we write the queue's selector into the notification register to
>> +     * signal the other end */
>> +    iowrite16(info->queue_index, vp_dev->ioaddr + 
>> VIRTIO_PCI_QUEUE_NOTIFY);
>> +}
>>   
>
> This means we can't kick multiple queues with one exit.

There is no interface in virtio currently to batch multiple queue 
notifications so the only way one could do this AFAICT is to use a timer 
to delay the notifications.  Were you thinking of something else?
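
(Purely as an illustration of the kind of ABI change that would be needed,
and not anything in the posted patch: the notify register could take a
bitmask of queue indexes instead of a single selector, e.g.

	/* hypothetical: kick queues 0..15 with one pio exit */
	iowrite16(pending_vq_mask, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);

which would let a batched virtio kick interface map onto one exit.)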

> I'd also like to see a hypercall-capable version of this (but that can 
> wait).

That can be a different device.

>> +
>> +/* A small wrapper to also acknowledge the interrupt when it's handled.
>> + * I really need an EIO hook for the vring so I can ack the 
>> interrupt once we
>> + * know that we'll be handling the IRQ but before we invoke the 
>> callback since
>> + * the callback may notify the host which results in the host 
>> attempting to
>> + * raise an interrupt that we would then mask once we acknowledged the
>> + * interrupt. */
>> +static irqreturn_t vp_interrupt(int irq, void *opaque)
>> +{
>> +    struct virtio_pci_device *vp_dev = opaque;
>> +    struct virtio_pci_vq_info *info;
>> +    irqreturn_t ret = IRQ_NONE;
>> +    u8 isr;
>> +
>> +    /* reading the ISR has the effect of also clearing it so it's very
>> +     * important to save off the value. */
>> +    isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
>>   
>
> Can this be implemented via shared memory? We're exiting now on every 
> interrupt.

I don't think so.  A vmexit is required to lower the IRQ line.  It may 
be possible to do something clever like set a shared memory value that's 
checked on every vmexit.  I think it's very unlikely that it's worth it 
though.

>
>> +    return ret;
>> +}
>> +
>> +/* the config->find_vq() implementation */
>> +static struct virtqueue *vp_find_vq(struct virtio_device *vdev, 
>> unsigned index,
>> +                    bool (*callback)(struct virtqueue *vq))
>> +{
>> +    struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> +    struct virtio_pci_vq_info *info;
>> +    struct virtqueue *vq;
>> +    int err;
>> +    u16 num;
>> +
>> +    /* Select the queue we're interested in */
>> +    iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>   
>
> I would really like to see this implemented as pci config space, with 
> no tricks like multiplexing several virtqueues on one register. 
> Something like the PCI BARs where you have all the register numbers 
> allocated statically to queues.

My first implementation did that.  I switched to using a selector 
because it reduces the amount of PCI config space used and does not 
limit the number of queues defined by the ABI as much.
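
To make the trade-off concrete: the posted layout multiplexes the queue
registers through a selector, while a static layout (hypothetical, only
meant to mirror the suggestion above) would burn a register pair per queue:

	/* posted: registers are reused, VIRTIO_PCI_QUEUE_SEL picks the queue */
	#define VIRTIO_PCI_QUEUE_PFN		8
	#define VIRTIO_PCI_QUEUE_NUM		12
	#define VIRTIO_PCI_QUEUE_SEL		14

	/* hypothetical static layout: space grows with the maximum queue
	 * count baked into the ABI */
	#define VIRTIO_PCI_STATIC_QUEUE_PFN(q)	(8 + (q) * 8)
	#define VIRTIO_PCI_STATIC_QUEUE_NUM(q)	(12 + (q) * 8)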

>> +
>> +    /* Check if queue is either not available or already active. */
>> +    num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);
>> +    if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
>> +        return ERR_PTR(-ENOENT);
>> +
>> +    /* allocate and fill out our structure the represents an active
>> +     * queue */
>> +    info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
>> +    if (!info)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    info->queue_index = index;
>> +    info->num = num;
>> +
>> +    /* determine the memory needed for the queue and provide the memory
>> +     * location to the host */
>> +    info->n_pages = DIV_ROUND_UP(vring_size(num), PAGE_SIZE);
>> +    info->pages = alloc_pages(GFP_KERNEL | __GFP_ZERO,
>> +                  get_order(info->n_pages));
>> +    if (info->pages == NULL) {
>> +        err = -ENOMEM;
>> +        goto out_info;
>> +    }
>> +
>> +    /* FIXME: is this sufficient for info->n_pages > 1? */
>> +    info->queue = kmap(info->pages);
>> +    if (info->queue == NULL) {
>> +        err = -ENOMEM;
>> +        goto out_alloc_pages;
>> +    }
>> +
>> +    /* activate the queue */
>> +    iowrite32(page_to_pfn(info->pages),
>> +          vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>> +          +    /* create the vring */
>> +    vq = vring_new_virtqueue(info->num, vdev, info->queue,
>> +                 vp_notify, callback);
>> +    if (!vq) {
>> +        err = -ENOMEM;
>> +        goto out_activate_queue;
>> +    }
>> +
>> +    vq->priv = info;
>> +    info->vq = vq;
>> +
>> +    spin_lock(&vp_dev->lock);
>> +    list_add(&info->node, &vp_dev->virtqueues);
>> +    spin_unlock(&vp_dev->lock);
>> +
>>   
>
> Is this run only on init? If so the lock isn't needed.

Yes, it's also not strictly needed on cleanup, I think.  I left it there
though for clarity.  I can remove it.

Regards,

Anthony Liguori



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-20 15:43         ` Anthony Liguori
@ 2007-11-20 16:12           ` Avi Kivity
  2007-11-20 22:16             ` Anthony Liguori
  0 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-20 16:12 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Rusty Russell, virtualization, linux-kernel, kvm-devel

Anthony Liguori wrote:
> Avi Kivity wrote:
>   
>> Anthony Liguori wrote:
>>     
>>> This is a PCI device that implements a transport for virtio.  It 
>>> allows virtio
>>> devices to be used by QEMU based VMMs like KVM or Xen.
>>>
>>> +
>>> +/* the notify function used when creating a virt queue */
>>> +static void vp_notify(struct virtqueue *vq)
>>> +{
>>> +    struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>>> +    struct virtio_pci_vq_info *info = vq->priv;
>>> +
>>> +    /* we write the queue's selector into the notification register to
>>> +     * signal the other end */
>>> +    iowrite16(info->queue_index, vp_dev->ioaddr + 
>>> VIRTIO_PCI_QUEUE_NOTIFY);
>>> +}
>>>   
>>>       
>> This means we can't kick multiple queues with one exit.
>>     
>
> There is no interface in virtio currently to batch multiple queue 
> notifications so the only way one could do this AFAICT is to use a timer 
> to delay the notifications.  Were you thinking of something else?
>
>   

No.  We can change virtio though, so let's have a flexible ABI.

>> I'd also like to see a hypercall-capable version of this (but that can 
>> wait).
>>     
>
> That can be a different device.
>   

That means the user has to select which device to expose.  With feature 
bits, the hypervisor advertises both pio and hypercalls, the guest picks 
whatever it wants.
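
(A guest-side sketch of that, reusing the existing vp_feature() path; the
feature bit number, the hypercall notify helper and the ->notify field are
all invented for illustration:

	/* hypothetical feature bit: host also accepts hypercall kicks */
	#define VIRTIO_PCI_F_NOTIFY_HYPERCALL	30

	if (vp_feature(&vp_dev->vdev, VIRTIO_PCI_F_NOTIFY_HYPERCALL))
		vp_dev->notify = vp_notify_hypercall;
	else
		vp_dev->notify = vp_notify;	/* pio, as in the posted patch */

so a single device could offer both and let the guest choose.)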

>   
>>> +
>>> +/* A small wrapper to also acknowledge the interrupt when it's handled.
>>> + * I really need an EIO hook for the vring so I can ack the 
>>> interrupt once we
>>> + * know that we'll be handling the IRQ but before we invoke the 
>>> callback since
>>> + * the callback may notify the host which results in the host 
>>> attempting to
>>> + * raise an interrupt that we would then mask once we acknowledged the
>>> + * interrupt. */
>>> +static irqreturn_t vp_interrupt(int irq, void *opaque)
>>> +{
>>> +    struct virtio_pci_device *vp_dev = opaque;
>>> +    struct virtio_pci_vq_info *info;
>>> +    irqreturn_t ret = IRQ_NONE;
>>> +    u8 isr;
>>> +
>>> +    /* reading the ISR has the effect of also clearing it so it's very
>>> +     * important to save off the value. */
>>> +    isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
>>>   
>>>       
>> Can this be implemented via shared memory? We're exiting now on every 
>> interrupt.
>>     
>
> I don't think so.  A vmexit is required to lower the IRQ line.  It may 
> be possible to do something clever like set a shared memory value that's 
> checked on every vmexit.  I think it's very unlikely that it's worth it 
> though.
>   

Why so unlikely?  Not all workloads will have good batching.


>   
>>> +    return ret;
>>> +}
>>> +
>>> +/* the config->find_vq() implementation */
>>> +static struct virtqueue *vp_find_vq(struct virtio_device *vdev, 
>>> unsigned index,
>>> +                    bool (*callback)(struct virtqueue *vq))
>>> +{
>>> +    struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>> +    struct virtio_pci_vq_info *info;
>>> +    struct virtqueue *vq;
>>> +    int err;
>>> +    u16 num;
>>> +
>>> +    /* Select the queue we're interested in */
>>> +    iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>>   
>>>       
>> I would really like to see this implemented as pci config space, with 
>> no tricks like multiplexing several virtqueues on one register. 
>> Something like the PCI BARs where you have all the register numbers 
>> allocated statically to queues.
>>     
>
> My first implementation did that.  I switched to using a selector 
> because it reduces the amount of PCI config space used and does not 
> limit the number of queues defined by the ABI as much.
>   

But... it's tricky, and it's nonstandard.  With pci config, you can do 
live migration by shipping the pci config space to the other side.  With 
the special iospace, you need to encode/decode it.

Not much of an argument, I know.


wrt. number of queues, 8 queues will consume 32 bytes of pci space if 
all you store is the ring pfn.


-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-20 16:12           ` Avi Kivity
@ 2007-11-20 22:16             ` Anthony Liguori
  2007-11-21  7:13               ` Avi Kivity
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-20 22:16 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Rusty Russell, virtualization, linux-kernel, kvm-devel,
	Eric Van Hensbergen

Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>  
>>> Anthony Liguori wrote:
>>>    
>>>> This is a PCI device that implements a transport for virtio.  It 
>>>> allows virtio
>>>> devices to be used by QEMU based VMMs like KVM or Xen.
>>>>
>>>> +
>>>> +/* the notify function used when creating a virt queue */
>>>> +static void vp_notify(struct virtqueue *vq)
>>>> +{
>>>> +    struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>>>> +    struct virtio_pci_vq_info *info = vq->priv;
>>>> +
>>>> +    /* we write the queue's selector into the notification 
>>>> register to
>>>> +     * signal the other end */
>>>> +    iowrite16(info->queue_index, vp_dev->ioaddr + 
>>>> VIRTIO_PCI_QUEUE_NOTIFY);
>>>> +}
>>>>         
>>> This means we can't kick multiple queues with one exit.
>>>     
>>
>> There is no interface in virtio currently to batch multiple queue 
>> notifications so the only way one could do this AFAICT is to use a 
>> timer to delay the notifications.  Were you thinking of something else?
>>
>>   
>
> No.  We can change virtio though, so let's have a flexible ABI.

Well please propose the virtio API first and then I'll adjust the PCI 
ABI.  I don't want to build things into the ABI that we never actually 
end up using in virtio :-)

>>> I'd also like to see a hypercall-capable version of this (but that 
>>> can wait).
>>>     
>>
>> That can be a different device.
>>   
>
> That means the user has to select which device to expose.  With 
> feature bits, the hypervisor advertises both pio and hypercalls, the 
> guest picks whatever it wants.

I was thinking more along the lines that a hypercall-based device would 
certainly be implemented in-kernel whereas the current device is 
naturally implemented in userspace.  We can simply use a different 
device for in-kernel drivers than for userspace drivers.  There's no 
point at all in doing a hypercall based userspace device IMHO.

>> I don't think so.  A vmexit is required to lower the IRQ line.  It 
>> may be possible to do something clever like set a shared memory value 
>> that's checked on every vmexit.  I think it's very unlikely that it's 
>> worth it though.
>>   
>
> Why so unlikely?  Not all workloads will have good batching.

It's pretty invasive.  I think a more paravirt device that expected an 
edge triggered interrupt would be a better solution for those types of 
devices.
 
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +/* the config->find_vq() implementation */
>>>> +static struct virtqueue *vp_find_vq(struct virtio_device *vdev, 
>>>> unsigned index,
>>>> +                    bool (*callback)(struct virtqueue *vq))
>>>> +{
>>>> +    struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>> +    struct virtio_pci_vq_info *info;
>>>> +    struct virtqueue *vq;
>>>> +    int err;
>>>> +    u16 num;
>>>> +
>>>> +    /* Select the queue we're interested in */
>>>> +    iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>>>         
>>> I would really like to see this implemented as pci config space, 
>>> with no tricks like multiplexing several virtqueues on one register. 
>>> Something like the PCI BARs where you have all the register numbers 
>>> allocated statically to queues.
>>>     
>>
>> My first implementation did that.  I switched to using a selector 
>> because it reduces the amount of PCI config space used and does not 
>> limit the number of queues defined by the ABI as much.
>>   
>
> But... it's tricky, and it's nonstandard.  With pci config, you can do 
> live migration by shipping the pci config space to the other side.  
> With the special iospace, you need to encode/decode it.

None of the PCI devices currently work like that in QEMU.  It would be 
very hard to make a device that worked this way because the order 
in which values are written matters a whole lot.  For instance, if you 
wrote the status register before the queue information, the driver could 
get into a funky state.

We'll still need save/restore routines for virtio devices.  I don't 
really see this as a problem since we do this for every other device.

> Not much of an argument, I know.
>
>
> wrt. number of queues, 8 queues will consume 32 bytes of pci space if 
> all you store is the ring pfn.

You also at least need a num argument which takes you to 48 or 64 
depending on whether you care about strange formatting.  8 queues may 
not be enough either.  Eric and I have discussed whether the 9p virtio 
device should support multiple mounts per-virtio device and if so, 
whether each one should have its own queue.  Any device that supports 
this sort of multiplexing will very quickly start using a lot of queues.
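
To make the space math concrete, here is a rough sketch of a fully static 
layout; the struct and field names are invented for illustration and are 
not part of any proposed ABI:

	/* hypothetical static config-space layout, one block per queue */
	struct virtio_pci_queue_regs {
		__le32 queue_pfn;	/* page frame number of the ring */
		__le16 queue_num;	/* number of ring entries */
		__le16 reserved;	/* padding for "strange formatting" */
	};
	/* 6 bytes/queue packed (48 bytes for 8 queues), 8 bytes padded (64) */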

I think most types of hardware have some notion of a selector or mode.  
Take a look at the LSI adapter or even VGA.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-20 22:16             ` Anthony Liguori
@ 2007-11-21  7:13               ` Avi Kivity
  2007-11-21 18:22                 ` Zachary Amsden
  2007-11-23 16:51                 ` Anthony Liguori
  0 siblings, 2 replies; 33+ messages in thread
From: Avi Kivity @ 2007-11-21  7:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm-devel, virtualization, linux-kernel, Eric Van Hensbergen

Anthony Liguori wrote:
> Avi Kivity wrote:
>   
>> Anthony Liguori wrote:
>>     
>>> Avi Kivity wrote:
>>>  
>>>       
>>>> Anthony Liguori wrote:
>>>>    
>>>>         
>>>>> This is a PCI device that implements a transport for virtio.  It 
>>>>> allows virtio
>>>>> devices to be used by QEMU based VMMs like KVM or Xen.
>>>>>
>>>>> +
>>>>> +/* the notify function used when creating a virt queue */
>>>>> +static void vp_notify(struct virtqueue *vq)
>>>>> +{
>>>>> +    struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>>>>> +    struct virtio_pci_vq_info *info = vq->priv;
>>>>> +
>>>>> +    /* we write the queue's selector into the notification 
>>>>> register to
>>>>> +     * signal the other end */
>>>>> +    iowrite16(info->queue_index, vp_dev->ioaddr + 
>>>>> VIRTIO_PCI_QUEUE_NOTIFY);
>>>>> +}
>>>>>         
>>>>>           
>>>> This means we can't kick multiple queues with one exit.
>>>>     
>>>>         
>>> There is no interface in virtio currently to batch multiple queue 
>>> notifications so the only way one could do this AFAICT is to use a 
>>> timer to delay the notifications.  Were you thinking of something else?
>>>
>>>   
>>>       
>> No.  We can change virtio though, so let's have a flexible ABI.
>>     
>
> Well please propose the virtio API first and then I'll adjust the PCI 
> ABI.  I don't want to build things into the ABI that we never actually 
> end up using in virtio :-)
>
>   

Move ->kick() to virtio_driver.

I believe Xen networking uses the same event channel for both rx and tx, 
so in effect they're using this model.  Long time since I looked though,
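
One way to read that, as a sketch only (the names below are invented, and 
whether the hook belongs in virtio_config_ops or somewhere else is an open 
question):

	/* hypothetical: one kick per device instead of one per queue */
	void (*kick)(struct virtio_device *vdev);

	/* a PCI transport could then notify the host with a single exit and
	 * let it scan all of the device's queues; VIRTIO_PCI_KICK_ALL is an
	 * invented selector value, not existing ABI */
	static void vp_kick(struct virtio_device *vdev)
	{
		struct virtio_pci_device *vp_dev = to_vp_device(vdev);

		iowrite16(VIRTIO_PCI_KICK_ALL,
			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
	}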

>>>> I'd also like to see a hypercall-capable version of this (but that 
>>>> can wait).
>>>>     
>>>>         
>>> That can be a different device.
>>>   
>>>       
>> That means the user has to select which device to expose.  With 
>> feature bits, the hypervisor advertises both pio and hypercalls, the 
>> guest picks whatever it wants.
>>     
>
> I was thinking more along the lines that a hypercall-based device would 
> certainly be implemented in-kernel whereas the current device is 
> naturally implemented in userspace.  We can simply use a different 
> device for in-kernel drivers than for userspace drivers.  

Where the device is implemented is an implementation detail that should 
be hidden from the guest, isn't that one of the strengths of 
virtualization?  Two examples: a file-based block device implemented in 
qemu gives you fancy file formats with encryption and compression, while 
the same device implemented in the kernel gives you a low-overhead path 
directly to a zillion-disk SAN volume.  Or a user-level network device 
capable of running with the slirp stack and no permissions vs. the 
kernel device running copyless most of the time and using a dma engine 
for the rest but requiring you to be good friends with the admin.

The user should expect zero reconfigurations moving a VM from one model 
to the other.

> There's no 
> point at all in doing a hypercall based userspace device IMHO.
>   

We abstract this away by having a "channel signalled" API (both in the 
kernel for kernel devices and as a kvm.h exit reason / libkvm callback).

Again, somewhat like Xen's event channels, though asymmetric.

>>> I don't think so.  A vmexit is required to lower the IRQ line.  It 
>>> may be possible to do something clever like set a shared memory value 
>>> that's checked on every vmexit.  I think it's very unlikely that it's 
>>> worth it though.
>>>   
>>>       
>> Why so unlikely?  Not all workloads will have good batching.
>>     
>
> It's pretty invasive.  I think a more paravirt device that expected an 
> edge triggered interrupt would be a better solution for those types of 
> devices.
>   

I was thinking it could be useful mostly in the context of a paravirt 
irqchip, where we can lower the cost of level-triggered interrupts.

>>>>> +
>>>>> +    /* Select the queue we're interested in */
>>>>> +    iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>>>>         
>>>>>           
>>>> I would really like to see this implemented as pci config space, 
>>>> with no tricks like multiplexing several virtqueues on one register. 
>>>> Something like the PCI BARs where you have all the register numbers 
>>>> allocated statically to queues.
>>>>     
>>>>         
>>> My first implementation did that.  I switched to using a selector 
>>> because it reduces the amount of PCI config space used and does not 
>>> limit the number of queues defined by the ABI as much.
>>>   
>>>       
>> But... it's tricky, and it's nonstandard.  With pci config, you can do 
>> live migration by shipping the pci config space to the other side.  
>> With the special iospace, you need to encode/decode it.
>>     
>
> None of the PCI devices currently work like that in QEMU.  It would be 
> very hard to make a device that worked this way because the order 
> in which values are written matters a whole lot.  For instance, if you 
> wrote the status register before the queue information, the driver could 
> get into a funky state.
>   

I assume you're talking about restore?  Isn't that atomic?

> We'll still need save/restore routines for virtio devices.  I don't 
> really see this as a problem since we do this for every other device.
>
>   

Yeah.

>> Not much of an argument, I know.
>>
>>
>> wrt. number of queues, 8 queues will consume 32 bytes of pci space if 
>> all you store is the ring pfn.
>>     
>
> You also at least need a num argument which takes you to 48 or 64 
> depending on whether you care about strange formatting.  8 queues may 
> not be enough either.  Eric and I have discussed whether the 9p virtio 
> device should support multiple mounts per-virtio device and if so, 
> whether each one should have its own queue.  Any device that supports 
> this sort of multiplexing will very quickly start using a lot of queues.
>   

Make it appear as a pci function?  (though my feeling is that multiple 
mounts should be different devices; we can then hotplug mountpoints).

> I think most types of hardware have some notion of a selector or mode.  
> Take a look at the LSI adapter or even VGA.
>
>   

True.  They aren't fun to use, though.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-21  7:13               ` Avi Kivity
@ 2007-11-21 18:22                 ` Zachary Amsden
  2007-11-22  7:32                   ` Avi Kivity
  2007-11-23 16:51                 ` Anthony Liguori
  1 sibling, 1 reply; 33+ messages in thread
From: Zachary Amsden @ 2007-11-21 18:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, kvm-devel, virtualization, linux-kernel,
	Eric Van Hensbergen

On Wed, 2007-11-21 at 09:13 +0200, Avi Kivity wrote:

> Where the device is implemented is an implementation detail that should 
> be hidden from the guest, isn't that one of the strengths of 
> virtualization?  Two examples: a file-based block device implemented in 
> qemu gives you fancy file formats with encryption and compression, while 
> the same device implemented in the kernel gives you a low-overhead path 
> directly to a zillion-disk SAN volume.  Or a user-level network device 
> capable of running with the slirp stack and no permissions vs. the 
> kernel device running copyless most of the time and using a dma engine 
> for the rest but requiring you to be good friends with the admin.
> 
> The user should expect zero reconfigurations moving a VM from one model 
> to the other.

I think that is pretty insightful, and indeed, is probably the only
reason we would ever consider using a virtio based driver.

But is this really a virtualization problem, and is virtio the right
place to solve it?  Doesn't I/O hotplug with multipathing or NIC teaming
provide the same infrastructure in a way that is useful in more than
just a virtualization context?

Zach


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-21 18:22                 ` Zachary Amsden
@ 2007-11-22  7:32                   ` Avi Kivity
  0 siblings, 0 replies; 33+ messages in thread
From: Avi Kivity @ 2007-11-22  7:32 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Anthony Liguori, kvm-devel, virtualization, linux-kernel,
	Eric Van Hensbergen

Zachary Amsden wrote:
> On Wed, 2007-11-21 at 09:13 +0200, Avi Kivity wrote:
>
>   
>> Where the device is implemented is an implementation detail that should 
>> be hidden from the guest, isn't that one of the strengths of 
>> virtualization?  Two examples: a file-based block device implemented in 
>> qemu gives you fancy file formats with encryption and compression, while 
>> the same device implemented in the kernel gives you a low-overhead path 
>> directly to a zillion-disk SAN volume.  Or a user-level network device 
>> capable of running with the slirp stack and no permissions vs. the 
>> kernel device running copyless most of the time and using a dma engine 
>> for the rest but requiring you to be good friends with the admin.
>>
>> The user should expect zero reconfigurations moving a VM from one model 
>> to the other.
>>     
>
> I think that is pretty insightful, and indeed, is probably the only
> reason we would ever consider using a virtio based driver.
>
> But is this really a virtualization problem, and is virtio the right
> place to solve it?  Doesn't I/O hotplug with multipathing or NIC teaming
> provide the same infrastructure in a way that is useful in more than
> just a virtualization context?
>   

With the aid of a dictionary I was able to understand about half the 
words in the last sentence.  Moving from device to device using 
hotplug+multipath is complex to configure, available on only some 
guests, uses rarely-exercised paths in the guest OS, and only works for 
a few types of devices (network and block).  Having host independence in 
the device means you can change the device implementation for, say, a 
display driver (consider, for example, a vmgl+virtio driver, which can 
be implemented in userspace or tunneled via virtio-over-tcp to some 
remote display without going through userspace, without the guest 
knowing about it).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-21  7:13               ` Avi Kivity
  2007-11-21 18:22                 ` Zachary Amsden
@ 2007-11-23 16:51                 ` Anthony Liguori
  2007-11-23 17:47                   ` Avi Kivity
  1 sibling, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-23 16:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel, virtualization, linux-kernel, Eric Van Hensbergen,
	Rusty Russell, lguest

Avi Kivity wrote:
> Anthony Liguori wrote:
>> Well please propose the virtio API first and then I'll adjust the PCI 
>> ABI.  I don't want to build things into the ABI that we never 
>> actually end up using in virtio :-)
>>
>>   
>
> Move ->kick() to virtio_driver.

Then on each kick, all queues have to be checked for processing?  What 
devices do you expect this would help?

> I believe Xen networking uses the same event channel for both rx and 
> tx, so in effect they're using this model.  Long time since I looked 
> though,

I would have to look, but since rx/tx are rather independent actions, 
I'm not sure that you would really save that much.  You still end up 
doing the same number of kicks unless I'm missing something.

>> I was thinking more along the lines that a hypercall-based device 
>> would certainly be implemented in-kernel whereas the current device 
>> is naturally implemented in userspace.  We can simply use a different 
>> device for in-kernel drivers than for userspace drivers.  
>
> Where the device is implemented is an implementation detail that 
> should be hidden from the guest, isn't that one of the strengths of 
> virtualization?  Two examples: a file-based block device implemented 
> in qemu gives you fancy file formats with encryption and compression, 
> while the same device implemented in the kernel gives you a 
> low-overhead path directly to a zillion-disk SAN volume.  Or a 
> user-level network device capable of running with the slirp stack and 
> no permissions vs. the kernel device running copyless most of the time 
> and using a dma engine for the rest but requiring you to be good 
> friends with the admin.
>
> The user should expect zero reconfigurations moving a VM from one 
> model to the other.

I'm wary of introducing the notion of hypercalls to this device because 
it makes the device VMM specific.  Maybe we could have the device 
provide an option ROM that was treated as the device "BIOS" that we 
could use for kicking and interrupt acking?  Any idea of how that would 
map to Windows?  Are there real PCI devices that use the option ROM 
space to provide what's essentially firmware?  Unfortunately, I don't 
think an option ROM BIOS would map well to other architectures.

>> None of the PCI devices currently work like that in QEMU.  It would 
>> be very hard to make a device that worked this way because since the 
>> order in which values are written matter a whole lot.  For instance, 
>> if you wrote the status register before the queue information, the 
>> driver could get into a funky state.
>>   
>
> I assume you're talking about restore?  Isn't that atomic?

If you're doing restore by passing the PCI config blob to a registered 
routine, then sure, but that doesn't seem much better to me than just 
having the device generate that blob in the first place (which is what 
we have today).  I was assuming that you would want to use the existing 
PIO/MMIO handlers to do restore by rewriting the config as if the guest 
were doing it.

>>> Not much of an argument, I know.
>>>
>>>
>>> wrt. number of queues, 8 queues will consume 32 bytes of pci space 
>>> if all you store is the ring pfn.
>>>     
>>
>> You also at least need a num argument which takes you to 48 or 64 
>> depending on whether you care about strange formatting.  8 queues may 
>> not be enough either.  Eric and I have discussed whether the 9p 
>> virtio device should support multiple mounts per-virtio device and if 
>> so, whether each one should have its own queue.  Any device that 
>> supports this sort of multiplexing will very quickly start using a 
>> lot of queues.
>>   
>
> Make it appear as a pci function?  (though my feeling is that multiple 
> mounts should be different devices; we can then hotplug mountpoints).

We may run out of PCI slots though :-/

>> I think most types of hardware have some notion of a selector or 
>> mode.  Take a look at the LSI adapter or even VGA.
>>
>>   
>
> True.  They aren't fun to use, though.

I don't think they're really any worse :-)

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-23 16:51                 ` Anthony Liguori
@ 2007-11-23 17:47                   ` Avi Kivity
  2007-11-26 19:18                     ` Anthony Liguori
  0 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-23 17:47 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Eric Van Hensbergen, lguest, kvm-devel, linux-kernel, virtualization

Anthony Liguori wrote:
> Avi Kivity wrote:
>   
>> Anthony Liguori wrote:
>>     
>>> Well please propose the virtio API first and then I'll adjust the PCI 
>>> ABI.  I don't want to build things into the ABI that we never 
>>> actually end up using in virtio :-)
>>>
>>>   
>>>       
>> Move ->kick() to virtio_driver.
>>     
>
> Then on each kick, all queues have to be checked for processing?  What 
> devices do you expect this would help?
>
>   

Networking.

>> I believe Xen networking uses the same event channel for both rx and 
>> tx, so in effect they're using this model.  Long time since I looked 
>> though,
>>     
>
> I would have to look, but since rx/tx are rather independent actions, 
> I'm not sure that you would really save that much.  You still end up 
> doing the same number of kicks unless I'm missing something.
>
>   

rx and tx are closely related. You rarely have one without the other.

In fact, a tuned implementation should have zero kicks or interrupts 
for bulk transfers. The rx interrupt on the host will process new tx 
descriptors and fill the guest's rx queue; the guest's transmit function 
can also check the receive queue. I don't know if that's achievable for 
Linux guests currently, but we should aim to make it possible.
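
A rough sketch of the idea from the guest side (illustrative only; the 
helpers marked hypothetical below do not exist in the current driver):

	/* whenever we touch the tx ring, also service the rx ring, so steady
	 * bidirectional traffic needs neither kicks nor interrupts */
	static int xmit_and_poll(struct virtnet_info *vi, struct sk_buff *skb)
	{
		free_old_xmit_skbs(vi);		/* reclaim completed tx buffers */
		drain_rx_ring(vi);		/* hypothetical: process rx here too */

		add_skb_to_tx_ring(vi, skb);	/* hypothetical helper */
		if (!host_polling_tx(vi))	/* hypothetical "host is watching" check */
			vi->svq->vq_ops->kick(vi->svq);
		return 0;
	}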

Another point is that virtio still has a lot of leading zeros in its 
mileage counter. We need to keep things flexible and learn from others 
as much as possible, especially when talking about the ABI.

> I'm wary of introducing the notion of hypercalls to this device because 
> it makes the device VMM specific.  Maybe we could have the device 
> provide an option ROM that was treated as the device "BIOS" that we 
> could use for kicking and interrupt acking?  Any idea of how that would 
> map to Windows?  Are there real PCI devices that use the option ROM 
> space to provide what's essentially firmware?  Unfortunately, I don't 
> think an option ROM BIOS would map well to other architectures.
>
>   

The BIOS wouldn't work even on x86 because it isn't mapped to the guest 
address space (at least not consistently), and doesn't know the guest's 
programming model (16, 32, or 64-bits? segmented or flat?)

Xen uses a hypercall page to abstract these details out. However, I'm 
not proposing that. Simply indicate that we support hypercalls, and use 
some layer below to actually send them. It is the responsibility of this 
layer to detect if hypercalls are present and how to call them.

Hey, I think the best place for it is in paravirt_ops. We can even patch 
the hypercall instruction inline, and the driver doesn't need to know 
about it.

>>> None of the PCI devices currently work like that in QEMU.  It would 
>>> be very hard to make a device that worked this way because the 
>>> order in which values are written matters a whole lot.  For instance, 
>>> if you wrote the status register before the queue information, the 
>>> driver could get into a funky state.
>>>   
>>>       
>> I assume you're talking about restore?  Isn't that atomic?
>>     
>
> If you're doing restore by passing the PCI config blob to a registered 
> routine, then sure, but that doesn't seem much better to me than just 
> having the device generate that blob in the first place (which is what 
> we have today).  I was assuming that you would want to use the existing 
> PIO/MMIO handlers to do restore by rewriting the config as if the guest 
> were doing it.
>
>   

Sure some complexity is unavoidable. But flat is simpler than indirect.

>>>> Not much of an argument, I know.
>>>>
>>>>
>>>> wrt. number of queues, 8 queues will consume 32 bytes of pci space 
>>>> if all you store is the ring pfn.
>>>>     
>>>>         
>>> You also at least need a num argument which takes you to 48 or 64 
>>> depending on whether you care about strange formatting.  8 queues may 
>>> not be enough either.  Eric and I have discussed whether the 9p 
>>> virtio device should support multiple mounts per-virtio device and if 
>>> so, whether each one should have its own queue.  Any device that 
>>> supports this sort of multiplexing will very quickly start using a 
>>> lot of queues.
>>>   
>>>       
>> Make it appear as a pci function?  (though my feeling is that multiple 
>> mounts should be different devices; we can then hotplug mountpoints).
>>     
>
> We may run out of PCI slots though :-/
>   

Then we can start selling virtio extension chassis.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-23 17:47                   ` Avi Kivity
@ 2007-11-26 19:18                     ` Anthony Liguori
  2007-11-27  9:02                       ` Avi Kivity
  0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2007-11-26 19:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Eric Van Hensbergen, lguest, kvm-devel, linux-kernel, virtualization

Avi Kivity wrote:
> rx and tx are closely related. You rarely have one without the other.
>
> In fact, a tuned implementation should have zero kicks or interrupts 
> for bulk transfers. The rx interrupt on the host will process new tx 
> descriptors and fill the guest's rx queue; the guest's transmit 
> function can also check the receive queue. I don't know if that's 
> achievable for Linux guests currently, but we should aim to make it 
> possible.

ATM, the net driver does a pretty good job of disabling kicks/interrupts 
unless they are needed.  Checking for rx on tx and vice versa is a good 
idea and could further help there.  I'll give it a try this week.

> Another point is that virtio still has a lot of leading zeros in its 
> mileage counter. We need to keep things flexible and learn from others 
> as much as possible, especially when talking about the ABI.

Yes, after thinking about it over holiday, I agree that we should at 
least introduce a virtio-pci feature bitmask.  I'm not inclined to 
attempt to define a hypercall ABI or anything like that right now but 
having the feature bitmask will at least make it possible to do such a 
thing in the future.

>> I'm wary of introducing the notion of hypercalls to this device 
>> because it makes the device VMM specific.  Maybe we could have the 
>> device provide an option ROM that was treated as the device "BIOS" 
>> that we could use for kicking and interrupt acking?  Any idea of how 
>> that would map to Windows?  Are there real PCI devices that use the 
>> option ROM space to provide what's essentially firmware?  
>> Unfortunately, I don't think an option ROM BIOS would map well to 
>> other architectures.
>>
>>   
>
> The BIOS wouldn't work even on x86 because it isn't mapped to the 
> guest address space (at least not consistently), and doesn't know the 
> guest's programming model (16, 32, or 64-bits? segmented or flat?)
>
> Xen uses a hypercall page to abstract these details out. However, I'm 
> not proposing that. Simply indicate that we support hypercalls, and 
> use some layer below to actually send them. It is the responsibility 
> of this layer to detect if hypercalls are present and how to call them.
>
> Hey, I think the best place for it is in paravirt_ops. We can even 
> patch the hypercall instruction inline, and the driver doesn't need to 
> know about it.

Yes, paravirt_ops is attractive for abstracting the hypercall calling 
mechanism but it's still necessary to figure out how hypercalls would be 
identified.  I think it would be necessary to define a virtio specific 
hypercall space and use the virtio device ID to claim subspaces.

For instance, the hypercall number could be (virtio_devid << 16) | (call 
number).  How that translates into a hypercall would then be part of the 
paravirt_ops abstraction.  In KVM, we may have a single virtio hypercall 
where we pass the virtio hypercall number as one of the arguments or 
something like that.
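
As a purely illustrative sketch of that encoding (none of these names are 
real; virtio_hypercall() stands in for whatever the paravirt layer ends up 
exposing):

	/* top 16 bits select the virtio device, low 16 bits the call number */
	#define VIRTIO_HC_NR(devid, call)	(((devid) << 16) | ((call) & 0xffff))
	#define VIRTIO_HC_KICK			1	/* invented call number */

	static inline void example_kick(u32 devid, unsigned long queue_index)
	{
		/* hypothetical wrapper provided by paravirt_ops */
		virtio_hypercall(VIRTIO_HC_NR(devid, VIRTIO_HC_KICK), queue_index);
	}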

>>>>> Not much of an argument, I know.
>>>>>
>>>>>
>>>>> wrt. number of queues, 8 queues will consume 32 bytes of pci space 
>>>>> if all you store is the ring pfn.
>>>>>             
>>>> You also at least need a num argument which takes you to 48 or 64 
>>>> depending on whether you care about strange formatting.  8 queues 
>>>> may not be enough either.  Eric and I have discussed whether the 9p 
>>>> virtio device should support multiple mounts per-virtio device and 
>>>> if so, whether each one should have its own queue.  Any device 
>>>> that supports this sort of multiplexing will very quickly start 
>>>> using a lot of queues.
>>>>         
>>> Make it appear as a pci function?  (though my feeling is that 
>>> multiple mounts should be different devices; we can then hotplug 
>>> mountpoints).
>>>     
>>
>> We may run out of PCI slots though :-/
>>   
>
> Then we can start selling virtio extension chassis.

:-)  Do you know if there is a hard limit on the number of devices on a 
PCI bus?  My concern was that it was limited by something stupid like an 
8-bit identifier.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-26 19:18                     ` Anthony Liguori
@ 2007-11-27  9:02                       ` Avi Kivity
  2007-11-27  9:09                         ` Carsten Otte
  2007-11-27  9:25                         ` Arnd Bergmann
  0 siblings, 2 replies; 33+ messages in thread
From: Avi Kivity @ 2007-11-27  9:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Eric Van Hensbergen, lguest, kvm-devel, linux-kernel, virtualization

Anthony Liguori wrote:
>> Another point is that virtio still has a lot of leading zeros in its 
>> mileage counter. We need to keep things flexible and learn from 
>> others as much as possible, especially when talking about the ABI.
>
> Yes, after thinking about it over holiday, I agree that we should at 
> least introduce a virtio-pci feature bitmask.  I'm not inclined to 
> attempt to define a hypercall ABI or anything like that right now but 
> having the feature bitmask will at least make it possible to do such a 
> thing in the future.

No, definitely not define a hypercall ABI.  The feature bit should say 
"this device understands a hypervisor-specific way of kicking.  consult 
your hypervisor manual and cpuid bits for further details.  should you 
not be satisfied with this method, port io is still available".

>
>>> I'm wary of introducing the notion of hypercalls to this device 
>>> because it makes the device VMM specific.  Maybe we could have the 
>>> device provide an option ROM that was treated as the device "BIOS" 
>>> that we could use for kicking and interrupt acking?  Any idea of how 
>>> that would map to Windows?  Are there real PCI devices that use the 
>>> option ROM space to provide what's essentially firmware?  
>>> Unfortunately, I don't think an option ROM BIOS would map well to 
>>> other architectures.
>>>
>>>   
>>
>> The BIOS wouldn't work even on x86 because it isn't mapped to the 
>> guest address space (at least not consistently), and doesn't know the 
>> guest's programming model (16, 32, or 64-bits? segmented or flat?)
>>
>> Xen uses a hypercall page to abstract these details out. However, I'm 
>> not proposing that. Simply indicate that we support hypercalls, and 
>> use some layer below to actually send them. It is the responsibility 
>> of this layer to detect if hypercalls are present and how to call them.
>>
>> Hey, I think the best place for it is in paravirt_ops. We can even 
>> patch the hypercall instruction inline, and the driver doesn't need 
>> to know about it.
>
> Yes, paravirt_ops is attractive for abstracting the hypercall calling 
> mechanism but it's still necessary to figure out how hypercalls would 
> be identified.  I think it would be necessary to define a virtio 
> specific hypercall space and use the virtio device ID to claim subspaces.
>
> For instance, the hypercall number could be (virtio_devid << 16) | 
> (call number).  How that translates into a hypercall would then be 
> part of the paravirt_ops abstraction.  In KVM, we may have a single 
> virtio hypercall where we pass the virtio hypercall number as one of 
> the arguments or something like that.

If we don't call it a hypercall, but a virtio kick operation, we don't 
need to worry about the hypercall number or ABI.  It's just a function 
that takes an argument that's implemented differently by every 
hypervisor.  The default implementation can be a pio operation.
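
Something shaped roughly like this, as a sketch only (none of these names 
exist today; the default is essentially what vp_notify() already does):

	/* one hook the transport calls; each hypervisor may override it */
	struct virtio_kick_ops {
		void (*kick)(struct virtio_device *vdev, unsigned int queue_index);
	};

	static void pio_kick(struct virtio_device *vdev, unsigned int queue_index)
	{
		struct virtio_pci_device *vp_dev = to_vp_device(vdev);

		iowrite16(queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
	}

	static struct virtio_kick_ops default_kick_ops = {
		.kick = pio_kick,	/* plain port I/O unless something better is found */
	};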

>>>> Make it appear as a pci function?  (though my feeling is that 
>>>> multiple mounts should be different devices; we can then hotplug 
>>>> mountpoints).
>>>>     
>>>
>>> We may run out of PCI slots though :-/
>>>   
>>
>> Then we can start selling virtio extension chassis.
>
> :-)  Do you know if there is a hard limit on the number of devices on 
> a PCI bus?  My concern was that it was limited by something stupid 
> like an 8-bit identifier.

IIRC pci slots are 8-bit, but you can have multiple buses, so 
effectively 16 bits of device address space (discounting functions which 
are likely not hot-pluggable).


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-27  9:02                       ` Avi Kivity
@ 2007-11-27  9:09                         ` Carsten Otte
  2007-11-27  9:27                           ` Avi Kivity
  2007-11-27  9:25                         ` Arnd Bergmann
  1 sibling, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2007-11-27  9:09 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Eric Van Hensbergen, kvm-devel, lguest,
	linux-kernel, virtualization

Avi Kivity wrote:
> No, definitely not define a hypercall ABI.  The feature bit should say 
> "this device understands a hypervisor-specific way of kicking.  consult 
> your hypervisor manual and cpuid bits for further details.  should you 
> not be satisfied with this method, port io is still available".
...unless you're lucky enough to be on s390 where pio is not available.
I don't see why we'd have two different ways to talk to a virtio 
device. I think we should use a hypercall for interrupt injection, 
without support for grumpy old soldered pci features other than 
HPA-style Lguest PCI bus organization. There are no devices that we 
want to be backward compatible with.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-27  9:02                       ` Avi Kivity
  2007-11-27  9:09                         ` Carsten Otte
@ 2007-11-27  9:25                         ` Arnd Bergmann
  1 sibling, 0 replies; 33+ messages in thread
From: Arnd Bergmann @ 2007-11-27  9:25 UTC (permalink / raw)
  To: kvm-devel
  Cc: Anthony Liguori, Eric Van Hensbergen, kvm-devel, lguest,
	linux-kernel, virtualization

On Tuesday 27 November 2007, Avi Kivity wrote:
> > :-)  Do you know if there is a hard limit on the number of devices on 
> > a PCI bus?  My concern was that it was limited by something stupid 
> > like an 8-bit identifier.
> 
> IIRC pci slots are 8-bit, but you can have multiple buses, so 
> effectively 16 bits of device address space (discounting functions which 
> are likely not hot-pluggable).

You have an 8 bit bus number and an 8 bit device/function number.
The function number is 3 bits, so if you want to use only function 0
for everything, you are limited to a little under 8192 (2^(8+5)) devices
per PCI domain. PC style hardware cannot easily address multiple PCI
domains, but I think you can have them if you assume that the guest is
using mmconfig.

For using multiple buses, the easiest way could be to have every
device/function on bus 0 be a bridge by itself, so you end up with a
flat number space for the actual devices,

$ lspci -t
 [0000:00]-+-00.0-[0000:01]--+-00.0
           |                 +-01.0
           |                 +-02.0
           |                 + ...
           |                 \-3f.0
           +-00.1-[0000:02]--+-00.0
           |                 +-01.0
           |                 +-02.0
           |                 + ...
           |                 \-3f.0
           + ...
           |
           +-3f.6-[0000:ff]--+-00.0
                             +-01.0
                             +-02.0
                             + ...
                             \-3f.0

	Arnd <><

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-27  9:09                         ` Carsten Otte
@ 2007-11-27  9:27                           ` Avi Kivity
  2007-11-27 10:12                             ` Carsten Otte
  0 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-27  9:27 UTC (permalink / raw)
  To: carsteno
  Cc: Eric Van Hensbergen, kvm-devel, lguest, linux-kernel, virtualization

Carsten Otte wrote:
> Avi Kivity wrote:
>   
>> No, definitely not define a hypercall ABI.  The feature bit should say 
>> "this device understands a hypervisor-specific way of kicking.  consult 
>> your hypervisor manual and cpuid bits for further details.  should you 
>> not be satisfied with this method, port io is still available".
>>     
> ...unless you're lucky enough to be on s390 where pio is not available.
> I don't see why we'd have two different ways to talk to a virtio 
> device. I think we should use a hypercall for interrupt injection, 
> without support for grumpy old soldered pci features other than 
> HPA-style Lguest PCI bus organization. There are no devices that we 
> want to be backward compatible with.
>   

pio is useful for qemu, for example, and as a fallback against changing 
hypervisor calling conventions.  As Anthony points out, it makes a 
qemu-implemented device instantly available to Xen at no extra charge.

My wording was inappropriate for s390, though.  The politically correct 
version reads "this device understands a hypervisor-specific way of 
kicking. consult your hypervisor manual and platform-specific way of 
querying hypervisor information for further details. should you not be 
satisfied with this method, the standard method of kicking virtio 
devices on your platform is still available".

On s390, I imagine that "the standard method" is the fabled diag 
instruction (which, with the proper arguments, will cook your steak to 
the exact shade of medium-rare you desire).  So you will never need to 
set the "hypervisor-specific way of kicking" bit, as your standard 
method is already optimal.

Unfortunately, we have to care for platform differences, subarch 
differences (vmx/svm), hypervisor differences (with virtio), and guest 
differences (Linux/Windows/pvLinux, 32/64).  Much care is needed when 
designing the ABI here.

[actually thinking a bit, this is specific to the virtio pci binding; 
s390 will never see any of it]

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-27  9:27                           ` Avi Kivity
@ 2007-11-27 10:12                             ` Carsten Otte
  2007-11-27 10:19                               ` Avi Kivity
  0 siblings, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2007-11-27 10:12 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno, Eric Van Hensbergen, kvm-devel, lguest, linux-kernel,
	virtualization

Avi Kivity wrote:
> Unfortunately, we have to care for platform differences, subarch 
> differences (vmx/svm), hypervisor differences (with virtio), and guest 
> differences (Linux/Windows/pvLinux, 32/64).  Much care is needed when 
> designing the ABI here.
Yea, I agree.

> [actually thinking a bit, this is specific to the virtio pci binding; 
> s390 will never see any of it]
You remember that we've lost the big debate around virtio in Tucson? 
We intend to bind our virtio devices to PCI too, so that they look the 
same in Linux userland across architectures.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-27 10:12                             ` Carsten Otte
@ 2007-11-27 10:19                               ` Avi Kivity
  2007-11-27 10:28                                 ` Carsten Otte
  0 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2007-11-27 10:19 UTC (permalink / raw)
  To: carsteno
  Cc: Eric Van Hensbergen, kvm-devel, lguest, linux-kernel, virtualization

Carsten Otte wrote:
>
>> [actually thinking a bit, this is specific to the virtio pci binding; 
>> s390 will never see any of it]
> You remember that we've lost the big debate around virtio in Tucson? 

I was in the embedded BOF.

> We intend to bind our virtio devices to PCI too, so that they look the 
> same in Linux userland across architectures.

Ouch.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [kvm-devel] [PATCH 3/3] virtio PCI device
  2007-11-27 10:19                               ` Avi Kivity
@ 2007-11-27 10:28                                 ` Carsten Otte
  0 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2007-11-27 10:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno, Eric Van Hensbergen, kvm-devel, lguest, linux-kernel,
	virtualization

Avi Kivity wrote:
>> We intend to bind our virtio devices to PCI too, so that they look the 
>> same in Linux userland across architectures.
> 
> Ouch.
That was my initial opinion too, but HPA has come up with a lean and 
clean PCI binding for lguest. I think we should seriously consider 
using that over the current qemu device emulation based thing.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2007-11-27 10:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-08  2:46 [PATCH 0/3] virtio PCI driver Anthony Liguori
2007-11-08  2:46 ` [PATCH 1/3] Export vring functions for modules to use Anthony Liguori
2007-11-08  2:46   ` [PATCH 2/3] Put the virtio under the virtualization menu Anthony Liguori
2007-11-08  2:46     ` [PATCH 3/3] virtio PCI device Anthony Liguori
2007-11-08  6:12       ` [kvm-devel] " Avi Kivity
2007-11-08 13:54         ` Anthony Liguori
2007-11-08 14:37           ` Avi Kivity
2007-11-08 15:06             ` Anthony Liguori
2007-11-08 15:13               ` Avi Kivity
2007-11-08 23:43           ` Dor Laor
2007-11-08 17:46       ` Arnd Bergmann
2007-11-08 19:04         ` Anthony Liguori
2007-11-09 11:03           ` Arnd Bergmann
2007-11-09  0:39       ` Dor Laor
2007-11-09  2:17         ` Anthony Liguori
2007-11-20 15:01       ` Avi Kivity
2007-11-20 15:43         ` Anthony Liguori
2007-11-20 16:12           ` Avi Kivity
2007-11-20 22:16             ` Anthony Liguori
2007-11-21  7:13               ` Avi Kivity
2007-11-21 18:22                 ` Zachary Amsden
2007-11-22  7:32                   ` Avi Kivity
2007-11-23 16:51                 ` Anthony Liguori
2007-11-23 17:47                   ` Avi Kivity
2007-11-26 19:18                     ` Anthony Liguori
2007-11-27  9:02                       ` Avi Kivity
2007-11-27  9:09                         ` Carsten Otte
2007-11-27  9:27                           ` Avi Kivity
2007-11-27 10:12                             ` Carsten Otte
2007-11-27 10:19                               ` Avi Kivity
2007-11-27 10:28                                 ` Carsten Otte
2007-11-27  9:25                         ` Arnd Bergmann
2007-11-08  6:49     ` [kvm-devel] [PATCH 2/3] Put the virtio under the virtualization menu Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).