linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] virtio mmio specification enhancement
@ 2020-02-10  9:05 Zha Bin
  2020-02-10  9:05 ` [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue Zha Bin
                   ` (5 more replies)
  0 siblings, 6 replies; 40+ messages in thread
From: Zha Bin @ 2020-02-10  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mst, jasowang, slp, virtio-dev, qemu-devel, gerry, zhabin,
	jing2.liu, chao.p.peng

In cloud native environment, we need a lightweight and secure system. It
should benefit from the speed of containers and the security of VM, which
is classified as secure containers. The traditional solution of cloud VM
is Qemu. In fact we don't need to pay for the legacy devices. Currently,
more userspace VMMs, e.g. Qemu, Firecracker, Cloud Hypervisor and Alibaba
Cloud VMM which is called Dragonball, began to pay attention to a
lightweight solution.

The lightweight VMM is suitable to cloud native infrastructure which is
designed for creating secure sandbox to address the requirements of
multi-tenant. Meanwhile, with faster startup time and lower memory
overhead, it makes possible to launch thousands of microVMs on the same
machine. This VMM minimizes the emulation devices and uses virtio-mmio to
get a more lightweight transport layer. The virtio-mmio devices have less
code than virtio-pci, which can decrease boot time and increase deploy
density by customizing kernel such as setting pci=off. From another point
of view, the minimal device can reduce the attack surface.

We have compared the number of files and the lines of code between
virtio-mmio and virio-pci.

				Virtio-PCI	    Virtio-MMIO	
	number of files(Linux)	    161			1
	lines of code(Linux)	    78237		538
	number of files(Qemu)	    24			1
	lines of code(Qemu)	    8952		421

But the current standard virtio-mmio spec has some limitations which is
only support legacy interrupt and will cause performance penalties.

To address such limitation, we proposed to update virtio-mmio spec with
two new feature bits to support MSI interrupt and enhancing notification
mechanism[1], which can achieve the same performance as virtio-pci devices
with only around 600 lines of code.

Here are the performance gain of MSI interrupt in virtio-mmio. Each case is
repeated three times.

        netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024

                Virtio-PCI    Virtio-MMIO   Virtio-MMIO(MSI)
        trans/s     9536        6939            9500
        trans/s     9734        7029            9749
        trans/s     9894        7095            9318

With the virtio spec proposal[1], other VMMs (e.g. Qemu) can also make use
of the new features to get a enhancing performance.

[1] https://lkml.org/lkml/2020/1/21/31

Change Log:
v1->v2
* Change version update to feature bit
* Add mask/unmask support
* Add two MSI sharing/non-sharing modes
* Create generic irq domain for all architectures

Liu Jiang (5):
  virtio-mmio: add notify feature for per-queue
  virtio-mmio: refactor common functionality
  virtio-mmio: create a generic MSI irq domain
  virtio-mmio: add MSI interrupt feature support
  x86: virtio-mmio: support virtio-mmio with MSI for x86

 arch/x86/kernel/apic/msi.c          |  11 +-
 drivers/base/platform-msi.c         |   4 +-
 drivers/virtio/Kconfig              |   9 +
 drivers/virtio/virtio_mmio.c        | 351 ++++++++++++++++++++++++++++++++----
 drivers/virtio/virtio_mmio_common.h |  39 ++++
 drivers/virtio/virtio_mmio_msi.h    | 175 ++++++++++++++++++
 include/linux/msi.h                 |   1 +
 include/uapi/linux/virtio_config.h  |  13 +-
 include/uapi/linux/virtio_mmio.h    |  31 ++++
 9 files changed, 596 insertions(+), 38 deletions(-)
 create mode 100644 drivers/virtio/virtio_mmio_common.h
 create mode 100644 drivers/virtio/virtio_mmio_msi.h

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
@ 2020-02-10  9:05 ` Zha Bin
  2020-02-11 10:50   ` Michael S. Tsirkin
  2020-02-11 11:33   ` Michael S. Tsirkin
  2020-02-10  9:05 ` [PATCH v2 2/5] virtio-mmio: refactor common functionality Zha Bin
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Zha Bin @ 2020-02-10  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mst, jasowang, slp, virtio-dev, qemu-devel, gerry, zhabin,
	jing2.liu, chao.p.peng

From: Liu Jiang <gerry@linux.alibaba.com>

The standard virtio-mmio devices use notification register to signal
backend. This will cause vmexits and slow down the performance when we
passthrough the virtio-mmio devices to guest virtual machines.
We proposed to update virtio over MMIO spec to add the per-queue
notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
configure notify location for each queue.

[1] https://lkml.org/lkml/2020/1/21/31

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 drivers/virtio/virtio_mmio.c       | 37 +++++++++++++++++++++++++++++++++++--
 include/uapi/linux/virtio_config.h |  8 +++++++-
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 97d5725..1733ab97 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -90,6 +90,9 @@ struct virtio_mmio_device {
 	/* a list of queues so we can dispatch IRQs */
 	spinlock_t lock;
 	struct list_head virtqueues;
+
+	unsigned short notify_base;
+	unsigned short notify_multiplier;
 };
 
 struct virtio_mmio_vq_info {
@@ -98,6 +101,9 @@ struct virtio_mmio_vq_info {
 
 	/* the list node for the virtqueues list */
 	struct list_head node;
+
+	/* Notify Address*/
+	unsigned int notify_addr;
 };
 
 
@@ -119,13 +125,23 @@ static u64 vm_get_features(struct virtio_device *vdev)
 	return features;
 }
 
+static void vm_transport_features(struct virtio_device *vdev, u64 features)
+{
+	if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
+		__virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
+}
+
 static int vm_finalize_features(struct virtio_device *vdev)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+	u64 features = vdev->features;
 
 	/* Give virtio_ring a chance to accept features. */
 	vring_transport_features(vdev);
 
+	/* Give virtio_mmio a chance to accept features. */
+	vm_transport_features(vdev, features);
+
 	/* Make sure there is are no mixed devices */
 	if (vm_dev->version == 2 &&
 			!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
@@ -272,10 +288,13 @@ static void vm_reset(struct virtio_device *vdev)
 static bool vm_notify(struct virtqueue *vq)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
+	struct virtio_mmio_vq_info *info = vq->priv;
 
-	/* We write the queue's selector into the notification register to
+	/* We write the queue's selector into the Notify Address to
 	 * signal the other end */
-	writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+	if (info)
+		writel(vq->index, vm_dev->base + info->notify_addr);
+
 	return true;
 }
 
@@ -434,6 +453,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 	vq->priv = info;
 	info->vq = vq;
 
+	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION))
+		info->notify_addr = vm_dev->notify_base +
+				vm_dev->notify_multiplier * vq->index;
+	else
+		info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
+
 	spin_lock_irqsave(&vm_dev->lock, flags);
 	list_add(&info->node, &vm_dev->virtqueues);
 	spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -471,6 +496,14 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		return irq;
 	}
 
+	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
+		unsigned int notify = readl(vm_dev->base +
+				VIRTIO_MMIO_QUEUE_NOTIFY);
+
+		vm_dev->notify_base = notify & 0xffff;
+		vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
+	}
+
 	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
 			dev_name(&vdev->dev), vm_dev);
 	if (err)
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index ff8e7dc..5d93c01 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		38
+#define VIRTIO_TRANSPORT_F_END		40
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -88,4 +88,10 @@
  * Does the device support Single Root I/O Virtualization?
  */
 #define VIRTIO_F_SR_IOV			37
+
+/*
+ * This feature indicates the enhanced notification support on MMIO transport
+ * layer.
+ */
+#define VIRTIO_F_MMIO_NOTIFICATION	39
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/5] virtio-mmio: refactor common functionality
  2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
  2020-02-10  9:05 ` [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue Zha Bin
@ 2020-02-10  9:05 ` Zha Bin
  2020-02-11 11:19   ` Michael S. Tsirkin
  2020-02-10  9:05 ` [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain Zha Bin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 40+ messages in thread
From: Zha Bin @ 2020-02-10  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mst, jasowang, slp, virtio-dev, qemu-devel, gerry, zhabin,
	jing2.liu, chao.p.peng

From: Liu Jiang <gerry@linux.alibaba.com>

Common functionality is refactored into virtio_mmio_common.h
in order to MSI support in later patch set.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 drivers/virtio/virtio_mmio.c        | 21 +--------------------
 drivers/virtio/virtio_mmio_common.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 20 deletions(-)
 create mode 100644 drivers/virtio/virtio_mmio_common.h

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 1733ab97..41e1c93 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -61,13 +61,12 @@
 #include <linux/io.h>
 #include <linux/list.h>
 #include <linux/module.h>
-#include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
-#include <linux/virtio.h>
 #include <linux/virtio_config.h>
 #include <uapi/linux/virtio_mmio.h>
 #include <linux/virtio_ring.h>
+#include "virtio_mmio_common.h"
 
 
 
@@ -77,24 +76,6 @@
 
 
 
-#define to_virtio_mmio_device(_plat_dev) \
-	container_of(_plat_dev, struct virtio_mmio_device, vdev)
-
-struct virtio_mmio_device {
-	struct virtio_device vdev;
-	struct platform_device *pdev;
-
-	void __iomem *base;
-	unsigned long version;
-
-	/* a list of queues so we can dispatch IRQs */
-	spinlock_t lock;
-	struct list_head virtqueues;
-
-	unsigned short notify_base;
-	unsigned short notify_multiplier;
-};
-
 struct virtio_mmio_vq_info {
 	/* the actual virtqueue */
 	struct virtqueue *vq;
diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
new file mode 100644
index 0000000..90cb304
--- /dev/null
+++ b/drivers/virtio/virtio_mmio_common.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
+#define _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
+/*
+ * Virtio MMIO driver - common functionality for all device versions
+ *
+ * This module allows virtio devices to be used over a memory-mapped device.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/virtio.h>
+
+#define to_virtio_mmio_device(_plat_dev) \
+	container_of(_plat_dev, struct virtio_mmio_device, vdev)
+
+struct virtio_mmio_device {
+	struct virtio_device vdev;
+	struct platform_device *pdev;
+
+	void __iomem *base;
+	unsigned long version;
+
+	/* a list of queues so we can dispatch IRQs */
+	spinlock_t lock;
+	struct list_head virtqueues;
+
+	unsigned short notify_base;
+	unsigned short notify_multiplier;
+};
+
+#endif
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain
  2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
  2020-02-10  9:05 ` [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue Zha Bin
  2020-02-10  9:05 ` [PATCH v2 2/5] virtio-mmio: refactor common functionality Zha Bin
@ 2020-02-10  9:05 ` Zha Bin
  2020-02-11 11:16   ` Michael S. Tsirkin
  2020-02-12  7:40   ` Michael S. Tsirkin
  2020-02-10  9:05 ` [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support Zha Bin
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Zha Bin @ 2020-02-10  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mst, jasowang, slp, virtio-dev, qemu-devel, gerry, zhabin,
	jing2.liu, chao.p.peng

From: Liu Jiang <gerry@linux.alibaba.com>

Create a generic irq domain for all architectures which
supports virtio-mmio. The device offering VIRTIO_F_MMIO_MSI
feature bit can use this irq domain.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 drivers/base/platform-msi.c      |  4 +-
 drivers/virtio/Kconfig           |  9 ++++
 drivers/virtio/virtio_mmio_msi.h | 93 ++++++++++++++++++++++++++++++++++++++++
 include/linux/msi.h              |  1 +
 4 files changed, 105 insertions(+), 2 deletions(-)
 create mode 100644 drivers/virtio/virtio_mmio_msi.h

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 8da314b..45752f1 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -31,12 +31,11 @@ struct platform_msi_priv_data {
 /* The devid allocator */
 static DEFINE_IDA(platform_msi_devid_ida);
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 /*
  * Convert an msi_desc to a globaly unique identifier (per-device
  * devid + msi_desc position in the msi_list).
  */
-static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
+irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
 {
 	u32 devid;
 
@@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
 	return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
 }
 
+#ifdef GENERIC_MSI_DOMAIN_OPS
 static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
 	arg->desc = desc;
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 078615c..551a9f7 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -84,6 +84,15 @@ config VIRTIO_MMIO
 
  	 If unsure, say N.
 
+config VIRTIO_MMIO_MSI
+	bool "Memory-mapped virtio device MSI"
+	depends on VIRTIO_MMIO && GENERIC_MSI_IRQ_DOMAIN && GENERIC_MSI_IRQ
+	help
+	 This allows device drivers to support msi interrupt handling for
+	 virtio-mmio devices. It can improve performance greatly.
+
+	 If unsure, say N.
+
 config VIRTIO_MMIO_CMDLINE_DEVICES
 	bool "Memory mapped virtio devices parameter parsing"
 	depends on VIRTIO_MMIO
diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
new file mode 100644
index 0000000..27cb2af
--- /dev/null
+++ b/drivers/virtio/virtio_mmio_msi.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
+#define _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
+
+#ifdef CONFIG_VIRTIO_MMIO_MSI
+
+#include <linux/msi.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/platform_device.h>
+
+static irq_hw_number_t mmio_msi_hwirq;
+static struct irq_domain *mmio_msi_domain;
+
+struct irq_domain *__weak arch_msi_root_irq_domain(void)
+{
+	return NULL;
+}
+
+void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+}
+
+static void mmio_msi_mask_irq(struct irq_data *data)
+{
+}
+
+static void mmio_msi_unmask_irq(struct irq_data *data)
+{
+}
+
+static struct irq_chip mmio_msi_controller = {
+	.name			= "VIRTIO-MMIO-MSI",
+	.irq_mask		= mmio_msi_mask_irq,
+	.irq_unmask		= mmio_msi_unmask_irq,
+	.irq_ack		= irq_chip_ack_parent,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= irq_msi_compose_msg,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int mmio_msi_prepare(struct irq_domain *domain, struct device *dev,
+				int nvec, msi_alloc_info_t *arg)
+{
+	memset(arg, 0, sizeof(*arg));
+	return 0;
+}
+
+static void mmio_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+	mmio_msi_hwirq = platform_msi_calc_hwirq(desc);
+}
+
+static irq_hw_number_t mmio_msi_get_hwirq(struct msi_domain_info *info,
+					      msi_alloc_info_t *arg)
+{
+	return mmio_msi_hwirq;
+}
+
+static struct msi_domain_ops mmio_msi_domain_ops = {
+	.msi_prepare	= mmio_msi_prepare,
+	.set_desc	= mmio_msi_set_desc,
+	.get_hwirq	= mmio_msi_get_hwirq,
+};
+
+static struct msi_domain_info mmio_msi_domain_info = {
+	.flags          = MSI_FLAG_USE_DEF_DOM_OPS |
+			  MSI_FLAG_USE_DEF_CHIP_OPS |
+			  MSI_FLAG_ACTIVATE_EARLY,
+	.ops            = &mmio_msi_domain_ops,
+	.chip           = &mmio_msi_controller,
+	.handler        = handle_edge_irq,
+	.handler_name   = "edge",
+};
+
+static inline void mmio_msi_create_irq_domain(void)
+{
+	struct fwnode_handle *fn;
+	struct irq_domain *parent = arch_msi_root_irq_domain();
+
+	fn = irq_domain_alloc_named_fwnode("VIRTIO-MMIO-MSI");
+	if (fn && parent) {
+		mmio_msi_domain =
+			platform_msi_create_irq_domain(fn,
+				&mmio_msi_domain_info, parent);
+		irq_domain_free_fwnode(fn);
+	}
+}
+#else
+static inline void mmio_msi_create_irq_domain(void) {}
+#endif
+
+#endif
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8ad679e..ee5f566 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
 void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
 			      unsigned int nvec);
 void *platform_msi_get_host_data(struct irq_domain *domain);
+irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
 
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
                   ` (2 preceding siblings ...)
  2020-02-10  9:05 ` [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain Zha Bin
@ 2020-02-10  9:05 ` Zha Bin
  2020-02-11  3:17   ` Jason Wang
  2020-02-11 11:11   ` Michael S. Tsirkin
  2020-02-10  9:05 ` [PATCH v2 5/5] x86: virtio-mmio: support virtio-mmio with MSI for x86 Zha Bin
  2020-02-10 11:44 ` [PATCH v2 0/5] virtio mmio specification enhancement Michael S. Tsirkin
  5 siblings, 2 replies; 40+ messages in thread
From: Zha Bin @ 2020-02-10  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mst, jasowang, slp, virtio-dev, qemu-devel, gerry, zhabin,
	jing2.liu, chao.p.peng

From: Liu Jiang <gerry@linux.alibaba.com>

Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
virtio over mmio devices as a lightweight machine model for modern
cloud. The standard virtio over MMIO transport layer only supports one
legacy interrupt, which is much heavier than virtio over PCI transport
layer using MSI. Legacy interrupt has long work path and causes specific
VMExits in following cases, which would considerably slow down the
performance:

1) read interrupt status register
2) update interrupt status register
3) write IOAPIC EOI register

We proposed to add MSI support for virtio over MMIO via new feature
bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.

With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
uses msi_sharing[1] to indicate the event and vector mapping.
Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
Bit 1 is 1: device uses sharing mode and dynamic mapping.

[1] https://lkml.org/lkml/2020/1/21/31

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 drivers/virtio/virtio_mmio.c        | 299 ++++++++++++++++++++++++++++++++++--
 drivers/virtio/virtio_mmio_common.h |   8 +
 drivers/virtio/virtio_mmio_msi.h    |  82 ++++++++++
 include/uapi/linux/virtio_config.h  |   7 +-
 include/uapi/linux/virtio_mmio.h    |  31 ++++
 5 files changed, 411 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 41e1c93..b24ce21 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -67,8 +67,7 @@
 #include <uapi/linux/virtio_mmio.h>
 #include <linux/virtio_ring.h>
 #include "virtio_mmio_common.h"
-
-
+#include "virtio_mmio_msi.h"
 
 /* The alignment to use between consumer and producer parts of vring.
  * Currently hardcoded to the page size. */
@@ -85,9 +84,13 @@ struct virtio_mmio_vq_info {
 
 	/* Notify Address*/
 	unsigned int notify_addr;
-};
 
+	/* MSI vector (or none) */
+	unsigned int msi_vector;
+};
 
+static void vm_free_msi_irqs(struct virtio_device *vdev);
+static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);
 
 /* Configuration interface */
 
@@ -110,6 +113,9 @@ static void vm_transport_features(struct virtio_device *vdev, u64 features)
 {
 	if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
 		__virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
+
+	if (features & BIT_ULL(VIRTIO_F_MMIO_MSI))
+		__virtio_set_bit(vdev, VIRTIO_F_MMIO_MSI);
 }
 
 static int vm_finalize_features(struct virtio_device *vdev)
@@ -307,7 +313,33 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
 	return ret;
 }
 
+static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
+{
+	struct virtio_mmio_device *vm_dev = opaque;
+	struct virtio_mmio_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vm_dev->lock, flags);
+	list_for_each_entry(info, &vm_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock_irqrestore(&vm_dev->lock, flags);
+
+	return ret;
+}
+
+
+/* Handle a configuration change */
+static irqreturn_t vm_config_changed(int irq, void *opaque)
+{
+	struct virtio_mmio_device *vm_dev = opaque;
+
+	virtio_config_changed(&vm_dev->vdev);
 
+	return IRQ_HANDLED;
+}
 
 static void vm_del_vq(struct virtqueue *vq)
 {
@@ -316,6 +348,15 @@ static void vm_del_vq(struct virtqueue *vq)
 	unsigned long flags;
 	unsigned int index = vq->index;
 
+	if (vm_dev->msi_enabled && !vm_dev->msi_share) {
+		if (info->msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR) {
+			int irq = mmio_msi_irq_vector(&vq->vdev->dev,
+					info->msi_vector);
+
+			free_irq(irq, vq);
+		}
+	}
+
 	spin_lock_irqsave(&vm_dev->lock, flags);
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -334,20 +375,56 @@ static void vm_del_vq(struct virtqueue *vq)
 	kfree(info);
 }
 
-static void vm_del_vqs(struct virtio_device *vdev)
+static void vm_free_irqs(struct virtio_device *vdev)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	if (vm_dev->msi_enabled)
+		vm_free_msi_irqs(vdev);
+	else
+		free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
+}
+
+static void vm_del_vqs(struct virtio_device *vdev)
+{
 	struct virtqueue *vq, *n;
 
 	list_for_each_entry_safe(vq, n, &vdev->vqs, list)
 		vm_del_vq(vq);
 
-	free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
+	vm_free_irqs(vdev);
+}
+
+static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
+					int enable)
+{
+	u32 state;
+
+	state = readl(vm_dev->base + VIRTIO_MMIO_MSI_STATE);
+	if (enable && (state & VIRTIO_MMIO_MSI_ENABLED_MASK))
+		return;
+
+	writel(VIRTIO_MMIO_MSI_CMD_ENABLE,
+		vm_dev->base + VIRTIO_MMIO_MSI_COMMAND);
+}
+
+static void mmio_msi_config_vector(struct virtio_mmio_device *vm_dev, u32 vec)
+{
+	writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
+	writel(VIRTIO_MMIO_MSI_CMD_MAP_CONFIG, vm_dev->base +
+			VIRTIO_MMIO_MSI_COMMAND);
+}
+
+static void mmio_msi_queue_vector(struct virtio_mmio_device *vm_dev, u32 vec)
+{
+	writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
+	writel(VIRTIO_MMIO_MSI_CMD_MAP_QUEUE, vm_dev->base +
+			VIRTIO_MMIO_MSI_COMMAND);
 }
 
 static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 				  void (*callback)(struct virtqueue *vq),
-				  const char *name, bool ctx)
+				  const char *name, bool ctx, u32 msi_vector)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 	struct virtio_mmio_vq_info *info;
@@ -440,6 +517,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 	else
 		info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
 
+	info->msi_vector = msi_vector;
+	/* Set queue event and vector mapping for MSI share mode. */
+	if (vm_dev->msi_share && msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR)
+		mmio_msi_queue_vector(vm_dev, msi_vector);
+
 	spin_lock_irqsave(&vm_dev->lock, flags);
 	list_add(&info->node, &vm_dev->virtqueues);
 	spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -461,12 +543,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 	return ERR_PTR(err);
 }
 
-static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
-		       struct virtqueue *vqs[],
-		       vq_callback_t *callbacks[],
-		       const char * const names[],
-		       const bool *ctx,
-		       struct irq_affinity *desc)
+static int vm_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
+			struct virtqueue *vqs[],
+			vq_callback_t *callbacks[],
+			const char * const names[],
+			const bool *ctx)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 	int irq = platform_get_irq(vm_dev->pdev, 0);
@@ -487,8 +568,6 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 
 	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
 			dev_name(&vdev->dev), vm_dev);
-	if (err)
-		return err;
 
 	for (i = 0; i < nvqs; ++i) {
 		if (!names[i]) {
@@ -497,14 +576,202 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		}
 
 		vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
-				     ctx ? ctx[i] : false);
+				     ctx ? ctx[i] : false,
+				     VIRTIO_MMIO_MSI_NO_VECTOR);
 		if (IS_ERR(vqs[i])) {
 			vm_del_vqs(vdev);
 			return PTR_ERR(vqs[i]);
 		}
 	}
 
+	return err;
+}
+
+static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
+			struct virtqueue *vqs[], vq_callback_t *callbacks[],
+			const char * const names[], bool per_vq_vectors,
+			const bool *ctx, struct irq_affinity *desc)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+	int i, err, allocated_vectors, nvectors;
+	u32 msi_vec;
+	u32 max_vec_num = readl(vm_dev->base + VIRTIO_MMIO_MSI_VEC_NUM);
+
+	/* For MSI non-sharing, the max vector number MUST greater than nvqs.
+	 * Otherwise, go back to legacy interrupt.
+	 */
+	if (per_vq_vectors && max_vec_num < (nvqs + 1))
+		return -EINVAL;
+
+	if (per_vq_vectors) {
+		nvectors = 1;
+		for (i = 0; i < nvqs; ++i)
+			if (callbacks[i])
+				++nvectors;
+	} else {
+		nvectors = 2;
+	}
+
+	vm_dev->msi_share = !per_vq_vectors;
+
+	/* Allocate nvqs irqs for queues and one irq for configuration */
+	err = vm_request_msi_vectors(vdev, nvectors);
+	if (err != 0)
+		return err;
+
+	allocated_vectors = vm_dev->msi_used_vectors;
+	for (i = 0; i < nvqs; i++) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		}
+		if (!callbacks[i])
+			msi_vec = VIRTIO_MMIO_MSI_NO_VECTOR;
+		else if (per_vq_vectors)
+			msi_vec = allocated_vectors++;
+		else
+			msi_vec = 1;
+		vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
+				ctx ? ctx[i] : false, msi_vec);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto error_find;
+		}
+
+		if (!per_vq_vectors ||
+				msi_vec == VIRTIO_MMIO_MSI_NO_VECTOR)
+			continue;
+
+		/* allocate per-vq irq if available and necessary */
+		snprintf(vm_dev->vm_vq_names[msi_vec],
+			sizeof(*vm_dev->vm_vq_names),
+			"%s-%s",
+			dev_name(&vm_dev->vdev.dev), names[i]);
+		err = request_irq(mmio_msi_irq_vector(&vqs[i]->vdev->dev,
+					msi_vec),
+				vring_interrupt, 0,
+				vm_dev->vm_vq_names[msi_vec], vqs[i]);
+
+		if (err)
+			goto error_find;
+	}
+
 	return 0;
+
+error_find:
+	vm_del_vqs(vdev);
+	return err;
+}
+
+static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
+		       struct virtqueue *vqs[],
+		       vq_callback_t *callbacks[],
+		       const char * const names[],
+		       const bool *ctx,
+		       struct irq_affinity *desc)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+	int err;
+
+	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
+		unsigned int notify = readl(vm_dev->base +
+				VIRTIO_MMIO_QUEUE_NOTIFY);
+
+		vm_dev->notify_base = notify & 0xffff;
+		vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
+	}
+
+	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_MSI)) {
+		bool dyn_mapping = !!(readl(vm_dev->base +
+					VIRTIO_MMIO_MSI_STATE) &
+				VIRTIO_MMIO_MSI_SHARING_MASK);
+		if (!dyn_mapping)
+			err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
+				names, true, ctx, desc);
+		else
+			err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
+				names, false, ctx, desc);
+		if (!err)
+			return 0;
+	}
+
+	return vm_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
+}
+
+static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+	unsigned int v;
+	int irq, err;
+
+	if (vm_dev->msi_enabled)
+		return -EINVAL;
+
+	vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
+					GFP_KERNEL);
+	if (!vm_dev->vm_vq_names)
+		return -ENOMEM;
+
+	mmio_get_msi_domain(vdev);
+	err = mmio_msi_domain_alloc_irqs(&vdev->dev, nirqs);
+	if (err) {
+		kfree(vm_dev->vm_vq_names);
+		vm_dev->vm_vq_names = NULL;
+		return err;
+	}
+
+	mmio_msi_set_enable(vm_dev, 1);
+	vm_dev->msi_enabled = true;
+
+	v = vm_dev->msi_used_vectors;
+	/* The first MSI vector is used for configuration change event. */
+	snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
+			"%s-config", dev_name(&vdev->dev));
+	irq = mmio_msi_irq_vector(&vdev->dev, v);
+	err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[v],
+			vm_dev);
+	if (err)
+		goto error_request_irq;
+
+	/* Set the configuration event mapping. */
+	if (vm_dev->msi_share)
+		mmio_msi_config_vector(vm_dev, v);
+
+	++vm_dev->msi_used_vectors;
+
+	if (vm_dev->msi_share) {
+		v = vm_dev->msi_used_vectors;
+		snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
+			 "%s-virtqueues", dev_name(&vm_dev->vdev.dev));
+		err = request_irq(mmio_msi_irq_vector(&vdev->dev, v),
+				  vm_vring_interrupt, 0, vm_dev->vm_vq_names[v],
+				  vm_dev);
+		if (err)
+			goto error_request_irq;
+		++vm_dev->msi_used_vectors;
+	}
+
+	return 0;
+
+error_request_irq:
+	vm_free_msi_irqs(vdev);
+
+	return err;
+}
+
+static void vm_free_msi_irqs(struct virtio_device *vdev)
+{
+	int i;
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	mmio_msi_set_enable(vm_dev, 0);
+	for (i = 0; i < vm_dev->msi_used_vectors; i++)
+		free_irq(mmio_msi_irq_vector(&vdev->dev, i), vm_dev);
+	mmio_msi_domain_free_irqs(&vdev->dev);
+	kfree(vm_dev->vm_vq_names);
+	vm_dev->vm_vq_names = NULL;
+	vm_dev->msi_enabled = false;
+	vm_dev->msi_used_vectors = 0;
 }
 
 static const char *vm_bus_name(struct virtio_device *vdev)
@@ -609,6 +876,8 @@ static int virtio_mmio_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, vm_dev);
 
+	mmio_msi_create_irq_domain();
+
 	rc = register_virtio_device(&vm_dev->vdev);
 	if (rc)
 		put_device(&vm_dev->vdev.dev);
diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
index 90cb304..ccf6320 100644
--- a/drivers/virtio/virtio_mmio_common.h
+++ b/drivers/virtio/virtio_mmio_common.h
@@ -26,6 +26,14 @@ struct virtio_mmio_device {
 
 	unsigned short notify_base;
 	unsigned short notify_multiplier;
+
+	/* Name strings for interrupts. This size should be enough. */
+	char (*vm_vq_names)[256];
+
+	/* used vectors */
+	unsigned int msi_used_vectors;
+	bool msi_share;
+	bool msi_enabled;
 };
 
 #endif
diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
index 27cb2af..10038cb 100644
--- a/drivers/virtio/virtio_mmio_msi.h
+++ b/drivers/virtio/virtio_mmio_msi.h
@@ -8,6 +8,7 @@
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
 #include <linux/platform_device.h>
+#include "virtio_mmio_common.h"
 
 static irq_hw_number_t mmio_msi_hwirq;
 static struct irq_domain *mmio_msi_domain;
@@ -21,12 +22,41 @@ void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 }
 
+static void __iomem *vm_dev_base(struct msi_desc *desc)
+{
+	if (desc) {
+		struct device *dev = desc->dev;
+		struct virtio_device *vdev = dev_to_virtio(dev);
+		struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+		return vm_dev->base;
+	}
+
+	return NULL;
+}
+
+static void mmio_msi_set_mask_bit(struct irq_data *data, u32 flag)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	void __iomem *base = vm_dev_base(desc);
+	unsigned int offset = data->irq - desc->irq;
+
+	if (base) {
+		u32 op = flag ? VIRTIO_MMIO_MSI_CMD_MASK :
+			VIRTIO_MMIO_MSI_CMD_UNMASK;
+		writel(offset, base + VIRTIO_MMIO_MSI_VEC_SEL);
+		writel(op, base + VIRTIO_MMIO_MSI_COMMAND);
+	}
+}
+
 static void mmio_msi_mask_irq(struct irq_data *data)
 {
+	mmio_msi_set_mask_bit(data, 1);
 }
 
 static void mmio_msi_unmask_irq(struct irq_data *data)
 {
+	mmio_msi_set_mask_bit(data, 0);
 }
 
 static struct irq_chip mmio_msi_controller = {
@@ -86,8 +116,60 @@ static inline void mmio_msi_create_irq_domain(void)
 		irq_domain_free_fwnode(fn);
 	}
 }
+
+static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
+{
+	void __iomem *base = vm_dev_base(desc);
+
+	if (base) {
+		writel(desc->platform.msi_index, base +
+				VIRTIO_MMIO_MSI_VEC_SEL);
+		writel(msg->address_lo, base + VIRTIO_MMIO_MSI_ADDRESS_LOW);
+		writel(msg->address_hi, base + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
+		writel(msg->data, base + VIRTIO_MMIO_MSI_DATA);
+		writel(VIRTIO_MMIO_MSI_CMD_CONFIGURE,
+			base + VIRTIO_MMIO_MSI_COMMAND);
+	}
+}
+
+static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
+				unsigned int nvec)
+{
+	return platform_msi_domain_alloc_irqs(dev, nvec,
+			mmio_write_msi_msg);
+}
+
+static inline void mmio_msi_domain_free_irqs(struct device *dev)
+{
+	return platform_msi_domain_free_irqs(dev);
+}
+
+static inline void mmio_get_msi_domain(struct virtio_device *vdev)
+{
+	if (!vdev->dev.msi_domain)
+		vdev->dev.msi_domain = mmio_msi_domain;
+}
+
+static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
+{
+	struct msi_desc *entry = first_msi_entry(dev);
+
+	return entry->irq + nr;
+}
+
 #else
 static inline void mmio_msi_create_irq_domain(void) {}
+static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
+{
+	return -EINVAL;
+}
+static inline void mmio_get_msi_domain(struct virtio_device *vdev) {}
+static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
+				unsigned int nvec)
+{
+	return -EINVAL;
+}
+static inline void mmio_msi_domain_free_irqs(struct device *dev) {}
 #endif
 
 #endif
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 5d93c01..5eb3900 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		40
+#define VIRTIO_TRANSPORT_F_END		41
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -94,4 +94,9 @@
  * layer.
  */
 #define VIRTIO_F_MMIO_NOTIFICATION	39
+
+/*
+ * This feature indicates the MSI support on MMIO layer.
+ */
+#define VIRTIO_F_MMIO_MSI		40
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
index c4b0968..777cb0e 100644
--- a/include/uapi/linux/virtio_mmio.h
+++ b/include/uapi/linux/virtio_mmio.h
@@ -122,6 +122,21 @@
 #define VIRTIO_MMIO_QUEUE_USED_LOW	0x0a0
 #define VIRTIO_MMIO_QUEUE_USED_HIGH	0x0a4
 
+/* MSI max vector number that device supports - Read Only */
+#define VIRTIO_MMIO_MSI_VEC_NUM		0x0c0
+/* MSI state register  - Read Only */
+#define VIRTIO_MMIO_MSI_STATE		0x0c4
+/* MSI command register - Write Only */
+#define VIRTIO_MMIO_MSI_COMMAND		0x0c8
+/* MSI vector selector  - Write Only */
+#define VIRTIO_MMIO_MSI_VEC_SEL		0x0d0
+/* MSI low 32 bit address, 64 bits in two halves */
+#define VIRTIO_MMIO_MSI_ADDRESS_LOW	0x0d4
+/* MSI high 32 bit address, 64 bits in two halves */
+#define VIRTIO_MMIO_MSI_ADDRESS_HIGH	0x0d8
+/* MSI 32 bit data */
+#define VIRTIO_MMIO_MSI_DATA		0x0dc
+
 /* Configuration atomicity value */
 #define VIRTIO_MMIO_CONFIG_GENERATION	0x0fc
 
@@ -130,6 +145,22 @@
 #define VIRTIO_MMIO_CONFIG		0x100
 
 
+/* MSI commands */
+#define VIRTIO_MMIO_MSI_CMD_ENABLE	0x1
+#define VIRTIO_MMIO_MSI_CMD_DISABLE	0x2
+#define VIRTIO_MMIO_MSI_CMD_CONFIGURE	0x3
+#define VIRTIO_MMIO_MSI_CMD_MASK	0x4
+#define VIRTIO_MMIO_MSI_CMD_UNMASK	0x5
+#define VIRTIO_MMIO_MSI_CMD_MAP_CONFIG	0x6
+#define VIRTIO_MMIO_MSI_CMD_MAP_QUEUE	0x7
+
+/* MSI NO_VECTOR */
+#define VIRTIO_MMIO_MSI_NO_VECTOR	0xffffffff
+
+/* MSI state enabled state mask */
+#define VIRTIO_MMIO_MSI_ENABLED_MASK	(1 << 31)
+/* MSI state MSI sharing mask */
+#define VIRTIO_MMIO_MSI_SHARING_MASK	(1 << 30)
 
 /*
  * Interrupt flags (re: interrupt status & acknowledge registers)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 5/5] x86: virtio-mmio: support virtio-mmio with MSI for x86
  2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
                   ` (3 preceding siblings ...)
  2020-02-10  9:05 ` [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support Zha Bin
@ 2020-02-10  9:05 ` Zha Bin
  2020-02-11 11:14   ` Michael S. Tsirkin
  2020-02-10 11:44 ` [PATCH v2 0/5] virtio mmio specification enhancement Michael S. Tsirkin
  5 siblings, 1 reply; 40+ messages in thread
From: Zha Bin @ 2020-02-10  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mst, jasowang, slp, virtio-dev, qemu-devel, gerry, zhabin,
	jing2.liu, chao.p.peng

From: Liu Jiang <gerry@linux.alibaba.com>

virtio-mmio supports a generic MSI irq domain for all archs. This
patch adds the x86 architecture support.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 159bd0c..2fcd602 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,11 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+/*
+ * x86 PCI-MSI/HPET/DMAR related method.
+ * Also can be used as arch specific method for virtio-mmio MSI.
+ */
+void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	__irq_msi_compose_msg(irqd_cfg(data), msg);
 }
@@ -166,6 +170,11 @@ static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 	return ret;
 }
 
+struct irq_domain *arch_msi_root_irq_domain(void)
+{
+	return x86_vector_domain;
+}
+
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] virtio mmio specification enhancement
  2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
                   ` (4 preceding siblings ...)
  2020-02-10  9:05 ` [PATCH v2 5/5] x86: virtio-mmio: support virtio-mmio with MSI for x86 Zha Bin
@ 2020-02-10 11:44 ` Michael S. Tsirkin
  2020-02-11 16:05   ` Chao Peng
  5 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-10 11:44 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:16PM +0800, Zha Bin wrote:
> We have compared the number of files and the lines of code between
> virtio-mmio and virio-pci.
> 
> 				Virtio-PCI	    Virtio-MMIO	
> 	number of files(Linux)	    161			1
> 	lines of code(Linux)	    78237		538



Something's very wrong here. virtio PCI is 161 files?
Are you counting the whole PCI subsystem?
Sure enough:

$ find drivers/pci -name '*c' |wc -l
150

That's not reasonable, this includes a bunch of drivers that
never run on a typical hypervisor.

MMIO is also not as small as you are trying to show:

$ cloc drivers/virtio/virtio_mmio.c include/uapi/linux/virtio_mmio.h
       2 text files.
       2 unique files.                              
       0 files ignored.

github.com/AlDanial/cloc v 1.82  T=0.01 s (230.7 files/s, 106126.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                1            144            100            535
C/C++ Header                     1             39             66             36
-------------------------------------------------------------------------------
SUM:                             2            183            166            571
-------------------------------------------------------------------------------


I don't doubt MMIO is smaller than PCI. Of course that's because it has
no features to speak of - just this patch already doubles it's size. If
we keep doing that because we want the features then they will reach
the same size in about 4 iterations.


-- 
MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-10  9:05 ` [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support Zha Bin
@ 2020-02-11  3:17   ` Jason Wang
  2020-02-11  3:35     ` [virtio-dev] " Liu, Jing2
  2020-02-11 11:11   ` Michael S. Tsirkin
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-11  3:17 UTC (permalink / raw)
  To: Zha Bin, linux-kernel
  Cc: virtio-dev, slp, mst, jing2.liu, qemu-devel, chao.p.peng, gerry


On 2020/2/10 下午5:05, Zha Bin wrote:
> From: Liu Jiang<gerry@linux.alibaba.com>
>
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
>
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
>
> We proposed to add MSI support for virtio over MMIO via new feature
> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>
> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> uses msi_sharing[1] to indicate the event and vector mapping.
> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
> Bit 1 is 1: device uses sharing mode and dynamic mapping.


I believe dynamic mapping should cover the case of fixed vector?

Thanks



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11  3:17   ` Jason Wang
@ 2020-02-11  3:35     ` Liu, Jing2
  2020-02-11  4:02       ` Jason Wang
  2020-02-11 11:21       ` Michael S. Tsirkin
  0 siblings, 2 replies; 40+ messages in thread
From: Liu, Jing2 @ 2020-02-11  3:35 UTC (permalink / raw)
  To: Jason Wang, Zha Bin, linux-kernel
  Cc: virtio-dev, slp, mst, qemu-devel, chao.p.peng, gerry


On 2/11/2020 11:17 AM, Jason Wang wrote:
>
> On 2020/2/10 下午5:05, Zha Bin wrote:
>> From: Liu Jiang<gerry@linux.alibaba.com>
>>
>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>> virtio over mmio devices as a lightweight machine model for modern
>> cloud. The standard virtio over MMIO transport layer only supports one
>> legacy interrupt, which is much heavier than virtio over PCI transport
>> layer using MSI. Legacy interrupt has long work path and causes specific
>> VMExits in following cases, which would considerably slow down the
>> performance:
>>
>> 1) read interrupt status register
>> 2) update interrupt status register
>> 3) write IOAPIC EOI register
>>
>> We proposed to add MSI support for virtio over MMIO via new feature
>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>
>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>> uses msi_sharing[1] to indicate the event and vector mapping.
>> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>
>
> I believe dynamic mapping should cover the case of fixed vector?
>
Actually this bit *aims* for msi sharing or msi non-sharing.

It means, when msi sharing bit is 1, device doesn't want vector per queue

(it wants msi vector sharing as name) and doesn't want a high interrupt 
rate.

So driver turns to !per_vq_vectors and has to do dynamical mapping.

So they are opposite not superset.

Thanks!

Jing


> Thanks
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11  3:35     ` [virtio-dev] " Liu, Jing2
@ 2020-02-11  4:02       ` Jason Wang
       [not found]         ` <5522f205-207b-b012-6631-3cc77dde3bfe@linux.intel.com>
  2020-02-11 11:21       ` Michael S. Tsirkin
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-11  4:02 UTC (permalink / raw)
  To: Liu, Jing2, Zha Bin, linux-kernel
  Cc: virtio-dev, slp, mst, qemu-devel, chao.p.peng, gerry


On 2020/2/11 上午11:35, Liu, Jing2 wrote:
>
> On 2/11/2020 11:17 AM, Jason Wang wrote:
>>
>> On 2020/2/10 下午5:05, Zha Bin wrote:
>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>
>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>> virtio over mmio devices as a lightweight machine model for modern
>>> cloud. The standard virtio over MMIO transport layer only supports one
>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>> layer using MSI. Legacy interrupt has long work path and causes 
>>> specific
>>> VMExits in following cases, which would considerably slow down the
>>> performance:
>>>
>>> 1) read interrupt status register
>>> 2) update interrupt status register
>>> 3) write IOAPIC EOI register
>>>
>>> We proposed to add MSI support for virtio over MMIO via new feature
>>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>>
>>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>>> uses msi_sharing[1] to indicate the event and vector mapping.
>>> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
>>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>>
>>
>> I believe dynamic mapping should cover the case of fixed vector?
>>
> Actually this bit *aims* for msi sharing or msi non-sharing.
>
> It means, when msi sharing bit is 1, device doesn't want vector per queue
>
> (it wants msi vector sharing as name) and doesn't want a high 
> interrupt rate.
>
> So driver turns to !per_vq_vectors and has to do dynamical mapping.
>
> So they are opposite not superset.
>
> Thanks!
>
> Jing


I think you need add more comments on the command.

E.g if I want to map vector 0 to queue 1, how do I need to do?

write(1, queue_sel);
write(0, vector_sel);

?

Thanks


>
>
>> Thanks
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
       [not found]         ` <5522f205-207b-b012-6631-3cc77dde3bfe@linux.intel.com>
@ 2020-02-11  7:40           ` Jason Wang
  2020-02-11 11:58             ` Michael S. Tsirkin
       [not found]             ` <4c19292f-9d25-a859-3dde-6dd5a03fdf0b@linux.intel.com>
  0 siblings, 2 replies; 40+ messages in thread
From: Jason Wang @ 2020-02-11  7:40 UTC (permalink / raw)
  To: Liu, Jing2, Zha Bin, linux-kernel
  Cc: virtio-dev, slp, mst, qemu-devel, chao.p.peng, gerry


On 2020/2/11 下午2:02, Liu, Jing2 wrote:
>
>
> On 2/11/2020 12:02 PM, Jason Wang wrote:
>>
>> On 2020/2/11 上午11:35, Liu, Jing2 wrote:
>>>
>>> On 2/11/2020 11:17 AM, Jason Wang wrote:
>>>>
>>>> On 2020/2/10 下午5:05, Zha Bin wrote:
>>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>>
>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of 
>>>>> using
>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>> cloud. The standard virtio over MMIO transport layer only supports 
>>>>> one
>>>>> legacy interrupt, which is much heavier than virtio over PCI 
>>>>> transport
>>>>> layer using MSI. Legacy interrupt has long work path and causes 
>>>>> specific
>>>>> VMExits in following cases, which would considerably slow down the
>>>>> performance:
>>>>>
>>>>> 1) read interrupt status register
>>>>> 2) update interrupt status register
>>>>> 3) write IOAPIC EOI register
>>>>>
>>>>> We proposed to add MSI support for virtio over MMIO via new feature
>>>>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>>>>
>>>>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>>>>> uses msi_sharing[1] to indicate the event and vector mapping.
>>>>> Bit 1 is 0: device uses non-sharing and fixed vector per event 
>>>>> mapping.
>>>>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>>>>
>>>>
>>>> I believe dynamic mapping should cover the case of fixed vector?
>>>>
>>> Actually this bit *aims* for msi sharing or msi non-sharing.
>>>
>>> It means, when msi sharing bit is 1, device doesn't want vector per 
>>> queue
>>>
>>> (it wants msi vector sharing as name) and doesn't want a high 
>>> interrupt rate.
>>>
>>> So driver turns to !per_vq_vectors and has to do dynamical mapping.
>>>
>>> So they are opposite not superset.
>>>
>>> Thanks!
>>>
>>> Jing
>>
>>
>> I think you need add more comments on the command.
>>
>> E.g if I want to map vector 0 to queue 1, how do I need to do?
>>
>> write(1, queue_sel);
>> write(0, vector_sel);
>
> That's true. Besides, two commands are used for msi sharing mode,
>
> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
>
> "To set up the event and vector mapping for MSI sharing mode, driver 
> SHOULD write a valid MsiVecSel followed by 
> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command 
> to map the configuration change/selected queue events respectively.  " 
> (See spec patch 5/5)
>
> So if driver detects the msi sharing mode, when it does setup vq, 
> writes the queue_sel (this already exists in setup vq), vector sel and 
> then MAP_QUEUE command to do the queue event mapping.
>

So actually the per vq msix could be done through this. I don't get why 
you need to introduce MSI_SHARING_MASK which is the charge of driver 
instead of device. The interrupt rate should have no direct relationship 
with whether it has been shared or not.

Btw, you introduce mask/unmask without pending, how to deal with the 
lost interrupt during the masking then?


> For msi non-sharing mode, no special action is needed because we make 
> the rule of per_vq_vector and fixed relationship.
>
> Correct me if this is not that clear for spec/code comments.
>

The ABI is not as straightforward as PCI did. Why not just reuse the PCI 
layout?

E.g having

queue_sel
queue_msix_vector
msix_config

for configuring map between msi vector and queues/config

Then

vector_sel
address
data
pending
mask
unmask

for configuring msi table?

Thanks


> Thanks!
>
> Jing
>
>
>>
>> ?
>>
>> Thanks
>>
>>
>>>
>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-10  9:05 ` [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue Zha Bin
@ 2020-02-11 10:50   ` Michael S. Tsirkin
  2020-02-11 11:33   ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 10:50 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> The standard virtio-mmio devices use notification register to signal
> backend. This will cause vmexits and slow down the performance when we
> passthrough the virtio-mmio devices to guest virtual machines.
> We proposed to update virtio over MMIO spec to add the per-queue
> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> configure notify location for each queue.
> 
> [1] https://lkml.org/lkml/2020/1/21/31
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>

So this is for pass-through for nested virt?
OK but let's split this out please, and benchmark separately.


> ---
>  drivers/virtio/virtio_mmio.c       | 37 +++++++++++++++++++++++++++++++++++--
>  include/uapi/linux/virtio_config.h |  8 +++++++-
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 97d5725..1733ab97 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -90,6 +90,9 @@ struct virtio_mmio_device {
>  	/* a list of queues so we can dispatch IRQs */
>  	spinlock_t lock;
>  	struct list_head virtqueues;
> +
> +	unsigned short notify_base;
> +	unsigned short notify_multiplier;
>  };
>  
>  struct virtio_mmio_vq_info {
> @@ -98,6 +101,9 @@ struct virtio_mmio_vq_info {
>  
>  	/* the list node for the virtqueues list */
>  	struct list_head node;
> +
> +	/* Notify Address*/
> +	unsigned int notify_addr;
>  };
>  
>  
> @@ -119,13 +125,23 @@ static u64 vm_get_features(struct virtio_device *vdev)
>  	return features;
>  }
>  
> +static void vm_transport_features(struct virtio_device *vdev, u64 features)
> +{
> +	if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
> +		__virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
> +}
> +
>  static int vm_finalize_features(struct virtio_device *vdev)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +	u64 features = vdev->features;
>  
>  	/* Give virtio_ring a chance to accept features. */
>  	vring_transport_features(vdev);
>  
> +	/* Give virtio_mmio a chance to accept features. */
> +	vm_transport_features(vdev, features);
> +
>  	/* Make sure there is are no mixed devices */
>  	if (vm_dev->version == 2 &&
>  			!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> @@ -272,10 +288,13 @@ static void vm_reset(struct virtio_device *vdev)
>  static bool vm_notify(struct virtqueue *vq)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> +	struct virtio_mmio_vq_info *info = vq->priv;
>  
> -	/* We write the queue's selector into the notification register to
> +	/* We write the queue's selector into the Notify Address to
>  	 * signal the other end */
> -	writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +	if (info)
> +		writel(vq->index, vm_dev->base + info->notify_addr);
> +
>  	return true;
>  }
>  
> @@ -434,6 +453,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	vq->priv = info;
>  	info->vq = vq;
>  
> +	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION))
> +		info->notify_addr = vm_dev->notify_base +
> +				vm_dev->notify_multiplier * vq->index;
> +	else
> +		info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
> +
>  	spin_lock_irqsave(&vm_dev->lock, flags);
>  	list_add(&info->node, &vm_dev->virtqueues);
>  	spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -471,6 +496,14 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  		return irq;
>  	}
>  
> +	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
> +		unsigned int notify = readl(vm_dev->base +
> +				VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> +		vm_dev->notify_base = notify & 0xffff;
> +		vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
> +	}
> +
>  	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
>  			dev_name(&vdev->dev), vm_dev);
>  	if (err)
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index ff8e7dc..5d93c01 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -52,7 +52,7 @@
>   * rest are per-device feature bits.
>   */
>  #define VIRTIO_TRANSPORT_F_START	28
> -#define VIRTIO_TRANSPORT_F_END		38
> +#define VIRTIO_TRANSPORT_F_END		40
>  
>  #ifndef VIRTIO_CONFIG_NO_LEGACY
>  /* Do we get callbacks when the ring is completely used, even if we've
> @@ -88,4 +88,10 @@
>   * Does the device support Single Root I/O Virtualization?
>   */
>  #define VIRTIO_F_SR_IOV			37
> +
> +/*
> + * This feature indicates the enhanced notification support on MMIO transport
> + * layer.
> + */
> +#define VIRTIO_F_MMIO_NOTIFICATION	39
>  #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] virtio mmio specification enhancement
  2020-02-11 16:05   ` Chao Peng
@ 2020-02-11 10:57     ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 10:57 UTC (permalink / raw)
  To: Chao Peng
  Cc: Zha Bin, linux-kernel, jasowang, slp, virtio-dev, qemu-devel,
	gerry, jing2.liu

On Tue, Feb 11, 2020 at 04:05:41PM +0000, Chao Peng wrote:
> On Mon, Feb 10, 2020 at 06:44:50AM -0500, Michael S. Tsirkin wrote:
> > On Mon, Feb 10, 2020 at 05:05:16PM +0800, Zha Bin wrote:
> > > We have compared the number of files and the lines of code between
> > > virtio-mmio and virio-pci.
> > > 
> > > 				Virtio-PCI	    Virtio-MMIO	
> > > 	number of files(Linux)	    161			1
> > > 	lines of code(Linux)	    78237		538
> > 
> > 
> > 
> > Something's very wrong here. virtio PCI is 161 files?
> > Are you counting the whole PCI subsystem?
> 
> Right, that is just a rough statistics.

Please try not to make them look so wrong then.
E.g. you don't include drivers/base/platform-msi.c for
mmio do you? Your patch brings a bunch of code in there.

> Surely enough, some drivers will
> never get enabled in a typcial config.
> 
> > Sure enough:
> > 
> > $ find drivers/pci -name '*c' |wc -l
> > 150
> 
> and plus:
> $ find arch/x86/pci/ -name '*c' |wc -l
> 22

But what's the point? This is code that is maintained by PCI core
people anyway.

> > 
> > That's not reasonable, this includes a bunch of drivers that
> > never run on a typical hypervisor.
> > 
> > MMIO is also not as small as you are trying to show:
> > 
> > $ cloc drivers/virtio/virtio_mmio.c include/uapi/linux/virtio_mmio.h
> >        2 text files.
> >        2 unique files.                              
> >        0 files ignored.
> > 
> > github.com/AlDanial/cloc v 1.82  T=0.01 s (230.7 files/s, 106126.5 lines/s)
> > -------------------------------------------------------------------------------
> > Language                     files          blank        comment           code
> > -------------------------------------------------------------------------------
> > C                                1            144            100            535
> > C/C++ Header                     1             39             66             36
> > -------------------------------------------------------------------------------
> > SUM:                             2            183            166            571
> > -------------------------------------------------------------------------------
> > 
> > 
> > I don't doubt MMIO is smaller than PCI. Of course that's because it has
> > no features to speak of - just this patch already doubles it's size. If
> > we keep doing that because we want the features then they will reach
> > the same size in about 4 iterations.
> 
> Since current virtio-mmio size is small enough, so adding any notable
> feature would easily double it.

But really unlike PCI this is just PV stuff that is not reused by
anyone. We end up maintaining all this by ourselves.

> I have no objection that it may one day
> reach the same level of PCI, but in this patch some are actually
> generic changes and for MSI specific code we provide the option to
> confige away.
> 
> Thanks,
> Chao

The option will make it fall down at runtime but
it does not actually seem to remove all of the overhead.



> > 
> > 
> > -- 
> > MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-10  9:05 ` [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support Zha Bin
  2020-02-11  3:17   ` Jason Wang
@ 2020-02-11 11:11   ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:11 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:20PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
> 
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
> 
> We proposed to add MSI support for virtio over MMIO via new feature
> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> 
> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> uses msi_sharing[1] to indicate the event and vector mapping.
> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
> Bit 1 is 1: device uses sharing mode and dynamic mapping.
> 
> [1] https://lkml.org/lkml/2020/1/21/31
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  drivers/virtio/virtio_mmio.c        | 299 ++++++++++++++++++++++++++++++++++--
>  drivers/virtio/virtio_mmio_common.h |   8 +
>  drivers/virtio/virtio_mmio_msi.h    |  82 ++++++++++
>  include/uapi/linux/virtio_config.h  |   7 +-
>  include/uapi/linux/virtio_mmio.h    |  31 ++++
>  5 files changed, 411 insertions(+), 16 deletions(-)

So that's > 100 exatra lines in headers that you for some reason
don't count when comparing code size :).


I think an effort can be made to reduce the complexity here.
Otherwise you will end up with a clone of PCI
sooner than you think. In fact, disabling all legacy:

$ wc -l drivers/virtio/virtio_pci_modern.c
734 drivers/virtio/virtio_pci_modern.c
$ wc -l drivers/virtio/virtio_pci_common.c
635 drivers/virtio/virtio_pci_common.c

So you have almost matched that with your patch.


> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 41e1c93..b24ce21 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -67,8 +67,7 @@
>  #include <uapi/linux/virtio_mmio.h>
>  #include <linux/virtio_ring.h>
>  #include "virtio_mmio_common.h"
> -
> -
> +#include "virtio_mmio_msi.h"
>  
>  /* The alignment to use between consumer and producer parts of vring.
>   * Currently hardcoded to the page size. */
> @@ -85,9 +84,13 @@ struct virtio_mmio_vq_info {
>  
>  	/* Notify Address*/
>  	unsigned int notify_addr;
> -};
>  
> +	/* MSI vector (or none) */
> +	unsigned int msi_vector;
> +};
>  
> +static void vm_free_msi_irqs(struct virtio_device *vdev);
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);
>  
>  /* Configuration interface */
>  
> @@ -110,6 +113,9 @@ static void vm_transport_features(struct virtio_device *vdev, u64 features)
>  {
>  	if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
>  		__virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
> +
> +	if (features & BIT_ULL(VIRTIO_F_MMIO_MSI))
> +		__virtio_set_bit(vdev, VIRTIO_F_MMIO_MSI);
>  }
>  
>  static int vm_finalize_features(struct virtio_device *vdev)
> @@ -307,7 +313,33 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
>  	return ret;
>  }
>  
> +static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
> +{
> +	struct virtio_mmio_device *vm_dev = opaque;
> +	struct virtio_mmio_vq_info *info;
> +	irqreturn_t ret = IRQ_NONE;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&vm_dev->lock, flags);
> +	list_for_each_entry(info, &vm_dev->virtqueues, node) {
> +		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
> +			ret = IRQ_HANDLED;
> +	}
> +	spin_unlock_irqrestore(&vm_dev->lock, flags);
> +
> +	return ret;
> +}
> +
> +
> +/* Handle a configuration change */
> +static irqreturn_t vm_config_changed(int irq, void *opaque)
> +{
> +	struct virtio_mmio_device *vm_dev = opaque;
> +
> +	virtio_config_changed(&vm_dev->vdev);
>  
> +	return IRQ_HANDLED;
> +}
>  
>  static void vm_del_vq(struct virtqueue *vq)
>  {
> @@ -316,6 +348,15 @@ static void vm_del_vq(struct virtqueue *vq)
>  	unsigned long flags;
>  	unsigned int index = vq->index;
>  
> +	if (vm_dev->msi_enabled && !vm_dev->msi_share) {
> +		if (info->msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR) {
> +			int irq = mmio_msi_irq_vector(&vq->vdev->dev,
> +					info->msi_vector);
> +
> +			free_irq(irq, vq);
> +		}
> +	}
> +
>  	spin_lock_irqsave(&vm_dev->lock, flags);
>  	list_del(&info->node);
>  	spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -334,20 +375,56 @@ static void vm_del_vq(struct virtqueue *vq)
>  	kfree(info);
>  }
>  
> -static void vm_del_vqs(struct virtio_device *vdev)
> +static void vm_free_irqs(struct virtio_device *vdev)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> +	if (vm_dev->msi_enabled)
> +		vm_free_msi_irqs(vdev);
> +	else
> +		free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> +}
> +
> +static void vm_del_vqs(struct virtio_device *vdev)
> +{
>  	struct virtqueue *vq, *n;
>  
>  	list_for_each_entry_safe(vq, n, &vdev->vqs, list)
>  		vm_del_vq(vq);
>  
> -	free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> +	vm_free_irqs(vdev);
> +}
> +
> +static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
> +					int enable)
> +{
> +	u32 state;
> +
> +	state = readl(vm_dev->base + VIRTIO_MMIO_MSI_STATE);
> +	if (enable && (state & VIRTIO_MMIO_MSI_ENABLED_MASK))
> +		return;
> +
> +	writel(VIRTIO_MMIO_MSI_CMD_ENABLE,
> +		vm_dev->base + VIRTIO_MMIO_MSI_COMMAND);
> +}
> +
> +static void mmio_msi_config_vector(struct virtio_mmio_device *vm_dev, u32 vec)
> +{
> +	writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
> +	writel(VIRTIO_MMIO_MSI_CMD_MAP_CONFIG, vm_dev->base +
> +			VIRTIO_MMIO_MSI_COMMAND);
> +}
> +
> +static void mmio_msi_queue_vector(struct virtio_mmio_device *vm_dev, u32 vec)
> +{
> +	writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
> +	writel(VIRTIO_MMIO_MSI_CMD_MAP_QUEUE, vm_dev->base +
> +			VIRTIO_MMIO_MSI_COMMAND);
>  }
>  
>  static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  				  void (*callback)(struct virtqueue *vq),
> -				  const char *name, bool ctx)
> +				  const char *name, bool ctx, u32 msi_vector)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>  	struct virtio_mmio_vq_info *info;
> @@ -440,6 +517,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	else
>  		info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
>  
> +	info->msi_vector = msi_vector;
> +	/* Set queue event and vector mapping for MSI share mode. */
> +	if (vm_dev->msi_share && msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR)
> +		mmio_msi_queue_vector(vm_dev, msi_vector);
> +
>  	spin_lock_irqsave(&vm_dev->lock, flags);
>  	list_add(&info->node, &vm_dev->virtqueues);
>  	spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -461,12 +543,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	return ERR_PTR(err);
>  }
>  
> -static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> -		       struct virtqueue *vqs[],
> -		       vq_callback_t *callbacks[],
> -		       const char * const names[],
> -		       const bool *ctx,
> -		       struct irq_affinity *desc)
> +static int vm_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
> +			struct virtqueue *vqs[],
> +			vq_callback_t *callbacks[],
> +			const char * const names[],
> +			const bool *ctx)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>  	int irq = platform_get_irq(vm_dev->pdev, 0);
> @@ -487,8 +568,6 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  
>  	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
>  			dev_name(&vdev->dev), vm_dev);
> -	if (err)
> -		return err;
>  
>  	for (i = 0; i < nvqs; ++i) {
>  		if (!names[i]) {
> @@ -497,14 +576,202 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  		}
>  
>  		vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
> -				     ctx ? ctx[i] : false);
> +				     ctx ? ctx[i] : false,
> +				     VIRTIO_MMIO_MSI_NO_VECTOR);
>  		if (IS_ERR(vqs[i])) {
>  			vm_del_vqs(vdev);
>  			return PTR_ERR(vqs[i]);
>  		}
>  	}
>  
> +	return err;
> +}
> +
> +static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
> +			struct virtqueue *vqs[], vq_callback_t *callbacks[],
> +			const char * const names[], bool per_vq_vectors,
> +			const bool *ctx, struct irq_affinity *desc)
> +{
> +	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +	int i, err, allocated_vectors, nvectors;
> +	u32 msi_vec;
> +	u32 max_vec_num = readl(vm_dev->base + VIRTIO_MMIO_MSI_VEC_NUM);
> +
> +	/* For MSI non-sharing, the max vector number MUST greater than nvqs.
> +	 * Otherwise, go back to legacy interrupt.
> +	 */
> +	if (per_vq_vectors && max_vec_num < (nvqs + 1))
> +		return -EINVAL;
> +
> +	if (per_vq_vectors) {
> +		nvectors = 1;
> +		for (i = 0; i < nvqs; ++i)
> +			if (callbacks[i])
> +				++nvectors;
> +	} else {
> +		nvectors = 2;
> +	}
> +
> +	vm_dev->msi_share = !per_vq_vectors;
> +
> +	/* Allocate nvqs irqs for queues and one irq for configuration */
> +	err = vm_request_msi_vectors(vdev, nvectors);
> +	if (err != 0)
> +		return err;
> +
> +	allocated_vectors = vm_dev->msi_used_vectors;
> +	for (i = 0; i < nvqs; i++) {
> +		if (!names[i]) {
> +			vqs[i] = NULL;
> +			continue;
> +		}
> +		if (!callbacks[i])
> +			msi_vec = VIRTIO_MMIO_MSI_NO_VECTOR;
> +		else if (per_vq_vectors)
> +			msi_vec = allocated_vectors++;
> +		else
> +			msi_vec = 1;
> +		vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
> +				ctx ? ctx[i] : false, msi_vec);
> +		if (IS_ERR(vqs[i])) {
> +			err = PTR_ERR(vqs[i]);
> +			goto error_find;
> +		}
> +
> +		if (!per_vq_vectors ||
> +				msi_vec == VIRTIO_MMIO_MSI_NO_VECTOR)
> +			continue;
> +
> +		/* allocate per-vq irq if available and necessary */
> +		snprintf(vm_dev->vm_vq_names[msi_vec],
> +			sizeof(*vm_dev->vm_vq_names),
> +			"%s-%s",
> +			dev_name(&vm_dev->vdev.dev), names[i]);
> +		err = request_irq(mmio_msi_irq_vector(&vqs[i]->vdev->dev,
> +					msi_vec),
> +				vring_interrupt, 0,
> +				vm_dev->vm_vq_names[msi_vec], vqs[i]);
> +
> +		if (err)
> +			goto error_find;
> +	}
> +
>  	return 0;
> +
> +error_find:
> +	vm_del_vqs(vdev);
> +	return err;
> +}
> +
> +static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
> +		       struct virtqueue *vqs[],
> +		       vq_callback_t *callbacks[],
> +		       const char * const names[],
> +		       const bool *ctx,
> +		       struct irq_affinity *desc)
> +{
> +	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +	int err;
> +
> +	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
> +		unsigned int notify = readl(vm_dev->base +
> +				VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> +		vm_dev->notify_base = notify & 0xffff;
> +		vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
> +	}
> +
> +	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_MSI)) {
> +		bool dyn_mapping = !!(readl(vm_dev->base +
> +					VIRTIO_MMIO_MSI_STATE) &
> +				VIRTIO_MMIO_MSI_SHARING_MASK);
> +		if (!dyn_mapping)
> +			err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
> +				names, true, ctx, desc);
> +		else
> +			err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
> +				names, false, ctx, desc);
> +		if (!err)
> +			return 0;
> +	}
> +
> +	return vm_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
> +}
> +
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
> +{
> +	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +	unsigned int v;
> +	int irq, err;
> +
> +	if (vm_dev->msi_enabled)
> +		return -EINVAL;
> +
> +	vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
> +					GFP_KERNEL);
> +	if (!vm_dev->vm_vq_names)
> +		return -ENOMEM;
> +
> +	mmio_get_msi_domain(vdev);
> +	err = mmio_msi_domain_alloc_irqs(&vdev->dev, nirqs);
> +	if (err) {
> +		kfree(vm_dev->vm_vq_names);
> +		vm_dev->vm_vq_names = NULL;
> +		return err;
> +	}
> +
> +	mmio_msi_set_enable(vm_dev, 1);
> +	vm_dev->msi_enabled = true;
> +
> +	v = vm_dev->msi_used_vectors;
> +	/* The first MSI vector is used for configuration change event. */
> +	snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
> +			"%s-config", dev_name(&vdev->dev));
> +	irq = mmio_msi_irq_vector(&vdev->dev, v);
> +	err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[v],
> +			vm_dev);
> +	if (err)
> +		goto error_request_irq;
> +
> +	/* Set the configuration event mapping. */
> +	if (vm_dev->msi_share)
> +		mmio_msi_config_vector(vm_dev, v);
> +
> +	++vm_dev->msi_used_vectors;
> +
> +	if (vm_dev->msi_share) {
> +		v = vm_dev->msi_used_vectors;
> +		snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
> +			 "%s-virtqueues", dev_name(&vm_dev->vdev.dev));
> +		err = request_irq(mmio_msi_irq_vector(&vdev->dev, v),
> +				  vm_vring_interrupt, 0, vm_dev->vm_vq_names[v],
> +				  vm_dev);
> +		if (err)
> +			goto error_request_irq;
> +		++vm_dev->msi_used_vectors;
> +	}
> +
> +	return 0;
> +
> +error_request_irq:
> +	vm_free_msi_irqs(vdev);
> +
> +	return err;
> +}
> +
> +static void vm_free_msi_irqs(struct virtio_device *vdev)
> +{
> +	int i;
> +	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> +	mmio_msi_set_enable(vm_dev, 0);
> +	for (i = 0; i < vm_dev->msi_used_vectors; i++)
> +		free_irq(mmio_msi_irq_vector(&vdev->dev, i), vm_dev);
> +	mmio_msi_domain_free_irqs(&vdev->dev);
> +	kfree(vm_dev->vm_vq_names);
> +	vm_dev->vm_vq_names = NULL;
> +	vm_dev->msi_enabled = false;
> +	vm_dev->msi_used_vectors = 0;
>  }
>  
>  static const char *vm_bus_name(struct virtio_device *vdev)
> @@ -609,6 +876,8 @@ static int virtio_mmio_probe(struct platform_device *pdev)
>  
>  	platform_set_drvdata(pdev, vm_dev);
>  
> +	mmio_msi_create_irq_domain();
> +
>  	rc = register_virtio_device(&vm_dev->vdev);
>  	if (rc)
>  		put_device(&vm_dev->vdev.dev);
> diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
> index 90cb304..ccf6320 100644
> --- a/drivers/virtio/virtio_mmio_common.h
> +++ b/drivers/virtio/virtio_mmio_common.h
> @@ -26,6 +26,14 @@ struct virtio_mmio_device {
>  
>  	unsigned short notify_base;
>  	unsigned short notify_multiplier;
> +
> +	/* Name strings for interrupts. This size should be enough. */
> +	char (*vm_vq_names)[256];
> +
> +	/* used vectors */
> +	unsigned int msi_used_vectors;
> +	bool msi_share;
> +	bool msi_enabled;
>  };
>  
>  #endif
> diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
> index 27cb2af..10038cb 100644
> --- a/drivers/virtio/virtio_mmio_msi.h
> +++ b/drivers/virtio/virtio_mmio_msi.h
> @@ -8,6 +8,7 @@
>  #include <linux/irq.h>
>  #include <linux/irqdomain.h>
>  #include <linux/platform_device.h>
> +#include "virtio_mmio_common.h"
>  
>  static irq_hw_number_t mmio_msi_hwirq;
>  static struct irq_domain *mmio_msi_domain;
> @@ -21,12 +22,41 @@ void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
>  {
>  }
>  
> +static void __iomem *vm_dev_base(struct msi_desc *desc)
> +{
> +	if (desc) {
> +		struct device *dev = desc->dev;
> +		struct virtio_device *vdev = dev_to_virtio(dev);
> +		struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> +		return vm_dev->base;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void mmio_msi_set_mask_bit(struct irq_data *data, u32 flag)
> +{
> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
> +	void __iomem *base = vm_dev_base(desc);
> +	unsigned int offset = data->irq - desc->irq;
> +
> +	if (base) {
> +		u32 op = flag ? VIRTIO_MMIO_MSI_CMD_MASK :
> +			VIRTIO_MMIO_MSI_CMD_UNMASK;
> +		writel(offset, base + VIRTIO_MMIO_MSI_VEC_SEL);
> +		writel(op, base + VIRTIO_MMIO_MSI_COMMAND);
> +	}
> +}
> +
>  static void mmio_msi_mask_irq(struct irq_data *data)
>  {
> +	mmio_msi_set_mask_bit(data, 1);
>  }
>  
>  static void mmio_msi_unmask_irq(struct irq_data *data)
>  {
> +	mmio_msi_set_mask_bit(data, 0);
>  }
>  
>  static struct irq_chip mmio_msi_controller = {
> @@ -86,8 +116,60 @@ static inline void mmio_msi_create_irq_domain(void)
>  		irq_domain_free_fwnode(fn);
>  	}
>  }
> +
> +static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> +{
> +	void __iomem *base = vm_dev_base(desc);
> +
> +	if (base) {
> +		writel(desc->platform.msi_index, base +
> +				VIRTIO_MMIO_MSI_VEC_SEL);
> +		writel(msg->address_lo, base + VIRTIO_MMIO_MSI_ADDRESS_LOW);
> +		writel(msg->address_hi, base + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
> +		writel(msg->data, base + VIRTIO_MMIO_MSI_DATA);
> +		writel(VIRTIO_MMIO_MSI_CMD_CONFIGURE,
> +			base + VIRTIO_MMIO_MSI_COMMAND);
> +	}
> +}
> +
> +static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
> +				unsigned int nvec)
> +{
> +	return platform_msi_domain_alloc_irqs(dev, nvec,
> +			mmio_write_msi_msg);
> +}
> +
> +static inline void mmio_msi_domain_free_irqs(struct device *dev)
> +{
> +	return platform_msi_domain_free_irqs(dev);
> +}
> +
> +static inline void mmio_get_msi_domain(struct virtio_device *vdev)
> +{
> +	if (!vdev->dev.msi_domain)
> +		vdev->dev.msi_domain = mmio_msi_domain;
> +}
> +
> +static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> +	struct msi_desc *entry = first_msi_entry(dev);
> +
> +	return entry->irq + nr;
> +}
> +
>  #else
>  static inline void mmio_msi_create_irq_domain(void) {}
> +static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> +	return -EINVAL;
> +}
> +static inline void mmio_get_msi_domain(struct virtio_device *vdev) {}
> +static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
> +				unsigned int nvec)
> +{
> +	return -EINVAL;
> +}
> +static inline void mmio_msi_domain_free_irqs(struct device *dev) {}
>  #endif
>  
>  #endif
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index 5d93c01..5eb3900 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -52,7 +52,7 @@
>   * rest are per-device feature bits.
>   */
>  #define VIRTIO_TRANSPORT_F_START	28
> -#define VIRTIO_TRANSPORT_F_END		40
> +#define VIRTIO_TRANSPORT_F_END		41
>  
>  #ifndef VIRTIO_CONFIG_NO_LEGACY
>  /* Do we get callbacks when the ring is completely used, even if we've
> @@ -94,4 +94,9 @@
>   * layer.
>   */
>  #define VIRTIO_F_MMIO_NOTIFICATION	39
> +
> +/*
> + * This feature indicates the MSI support on MMIO layer.
> + */
> +#define VIRTIO_F_MMIO_MSI		40
>  #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
> index c4b0968..777cb0e 100644
> --- a/include/uapi/linux/virtio_mmio.h
> +++ b/include/uapi/linux/virtio_mmio.h
> @@ -122,6 +122,21 @@
>  #define VIRTIO_MMIO_QUEUE_USED_LOW	0x0a0
>  #define VIRTIO_MMIO_QUEUE_USED_HIGH	0x0a4
>  
> +/* MSI max vector number that device supports - Read Only */
> +#define VIRTIO_MMIO_MSI_VEC_NUM		0x0c0
> +/* MSI state register  - Read Only */
> +#define VIRTIO_MMIO_MSI_STATE		0x0c4
> +/* MSI command register - Write Only */
> +#define VIRTIO_MMIO_MSI_COMMAND		0x0c8
> +/* MSI vector selector  - Write Only */
> +#define VIRTIO_MMIO_MSI_VEC_SEL		0x0d0
> +/* MSI low 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_LOW	0x0d4
> +/* MSI high 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_HIGH	0x0d8
> +/* MSI 32 bit data */
> +#define VIRTIO_MMIO_MSI_DATA		0x0dc
> +
>  /* Configuration atomicity value */
>  #define VIRTIO_MMIO_CONFIG_GENERATION	0x0fc
>  
> @@ -130,6 +145,22 @@
>  #define VIRTIO_MMIO_CONFIG		0x100
>  
>  
> +/* MSI commands */
> +#define VIRTIO_MMIO_MSI_CMD_ENABLE	0x1
> +#define VIRTIO_MMIO_MSI_CMD_DISABLE	0x2
> +#define VIRTIO_MMIO_MSI_CMD_CONFIGURE	0x3
> +#define VIRTIO_MMIO_MSI_CMD_MASK	0x4
> +#define VIRTIO_MMIO_MSI_CMD_UNMASK	0x5
> +#define VIRTIO_MMIO_MSI_CMD_MAP_CONFIG	0x6
> +#define VIRTIO_MMIO_MSI_CMD_MAP_QUEUE	0x7
> +
> +/* MSI NO_VECTOR */
> +#define VIRTIO_MMIO_MSI_NO_VECTOR	0xffffffff
> +
> +/* MSI state enabled state mask */
> +#define VIRTIO_MMIO_MSI_ENABLED_MASK	(1 << 31)
> +/* MSI state MSI sharing mask */
> +#define VIRTIO_MMIO_MSI_SHARING_MASK	(1 << 30)
>  

You should be able to drop MAP commands,
replace vector number with vq number.

I also don't really see a burning need for separate
enable/disable/configure/mask/unmask commands.
Just two registers mask and enable should be enough?
Will remove need for NO_VECTOR hack too.

what is sharing that hypervisor needs to worry about it?
just make all interrupts shared, or all unshared.
does it matter for performance?

>  /*
>   * Interrupt flags (re: interrupt status & acknowledge registers)
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 5/5] x86: virtio-mmio: support virtio-mmio with MSI for x86
  2020-02-10  9:05 ` [PATCH v2 5/5] x86: virtio-mmio: support virtio-mmio with MSI for x86 Zha Bin
@ 2020-02-11 11:14   ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:14 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:21PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> virtio-mmio supports a generic MSI irq domain for all archs. This
> patch adds the x86 architecture support.
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  arch/x86/kernel/apic/msi.c | 11 ++++++++++-

Patches like this need to CC x86 maintainers.


>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index 159bd0c..2fcd602 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -45,7 +45,11 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
>  		MSI_DATA_VECTOR(cfg->vector);
>  }
>  
> -static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +/*
> + * x86 PCI-MSI/HPET/DMAR related method.
> + * Also can be used as arch specific method for virtio-mmio MSI.
> + */
> +void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
>  {
>  	__irq_msi_compose_msg(irqd_cfg(data), msg);
>  }
> @@ -166,6 +170,11 @@ static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
>  	return ret;
>  }
>  
> +struct irq_domain *arch_msi_root_irq_domain(void)
> +{
> +	return x86_vector_domain;
> +}
> +
>  /*
>   * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
>   * which implement the MSI or MSI-X Capability Structure.


So there's a new non-static functions with no callers here,
and a previously static irq_msi_compose_msg is not longer
static. No callers so how does this help anyone?
And how does this achieve the goal of enabling virtio mmio?


> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain
  2020-02-10  9:05 ` [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain Zha Bin
@ 2020-02-11 11:16   ` Michael S. Tsirkin
  2020-02-12  7:40   ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:16 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:19PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> Create a generic irq domain for all architectures which
> supports virtio-mmio. The device offering VIRTIO_F_MMIO_MSI
> feature bit can use this irq domain.
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  drivers/base/platform-msi.c      |  4 +-
>  drivers/virtio/Kconfig           |  9 ++++
>  drivers/virtio/virtio_mmio_msi.h | 93 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/msi.h              |  1 +
>  4 files changed, 105 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/virtio/virtio_mmio_msi.h


This patch needs to copy maintainers for drivers/base/platform-msi.c and
include/linux/msi.h

> diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
> index 8da314b..45752f1 100644
> --- a/drivers/base/platform-msi.c
> +++ b/drivers/base/platform-msi.c
> @@ -31,12 +31,11 @@ struct platform_msi_priv_data {
>  /* The devid allocator */
>  static DEFINE_IDA(platform_msi_devid_ida);
>  
> -#ifdef GENERIC_MSI_DOMAIN_OPS
>  /*
>   * Convert an msi_desc to a globaly unique identifier (per-device
>   * devid + msi_desc position in the msi_list).
>   */
> -static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
>  {
>  	u32 devid;
>  
> @@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
>  	return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
>  }
>  
> +#ifdef GENERIC_MSI_DOMAIN_OPS
>  static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
>  {
>  	arg->desc = desc;
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 078615c..551a9f7 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -84,6 +84,15 @@ config VIRTIO_MMIO
>  
>   	 If unsure, say N.
>  
> +config VIRTIO_MMIO_MSI
> +	bool "Memory-mapped virtio device MSI"
> +	depends on VIRTIO_MMIO && GENERIC_MSI_IRQ_DOMAIN && GENERIC_MSI_IRQ
> +	help
> +	 This allows device drivers to support msi interrupt handling for
> +	 virtio-mmio devices. It can improve performance greatly.
> +
> +	 If unsure, say N.
> +
>  config VIRTIO_MMIO_CMDLINE_DEVICES
>  	bool "Memory mapped virtio devices parameter parsing"
>  	depends on VIRTIO_MMIO
> diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
> new file mode 100644
> index 0000000..27cb2af
> --- /dev/null
> +++ b/drivers/virtio/virtio_mmio_msi.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +
> +#ifdef CONFIG_VIRTIO_MMIO_MSI
> +
> +#include <linux/msi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +
> +static irq_hw_number_t mmio_msi_hwirq;
> +static struct irq_domain *mmio_msi_domain;
> +
> +struct irq_domain *__weak arch_msi_root_irq_domain(void)
> +{
> +	return NULL;
> +}
> +
> +void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +}
> +
> +static void mmio_msi_mask_irq(struct irq_data *data)
> +{
> +}
> +
> +static void mmio_msi_unmask_irq(struct irq_data *data)
> +{
> +}
> +
> +static struct irq_chip mmio_msi_controller = {
> +	.name			= "VIRTIO-MMIO-MSI",
> +	.irq_mask		= mmio_msi_mask_irq,
> +	.irq_unmask		= mmio_msi_unmask_irq,
> +	.irq_ack		= irq_chip_ack_parent,
> +	.irq_retrigger		= irq_chip_retrigger_hierarchy,
> +	.irq_compose_msi_msg	= irq_msi_compose_msg,
> +	.flags			= IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static int mmio_msi_prepare(struct irq_domain *domain, struct device *dev,
> +				int nvec, msi_alloc_info_t *arg)
> +{
> +	memset(arg, 0, sizeof(*arg));
> +	return 0;
> +}
> +
> +static void mmio_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> +{
> +	mmio_msi_hwirq = platform_msi_calc_hwirq(desc);
> +}


This call isn't exported to modules. How will it work when virtio is
modular?

> +
> +static irq_hw_number_t mmio_msi_get_hwirq(struct msi_domain_info *info,
> +					      msi_alloc_info_t *arg)
> +{
> +	return mmio_msi_hwirq;
> +}
> +
> +static struct msi_domain_ops mmio_msi_domain_ops = {
> +	.msi_prepare	= mmio_msi_prepare,
> +	.set_desc	= mmio_msi_set_desc,
> +	.get_hwirq	= mmio_msi_get_hwirq,
> +};
> +
> +static struct msi_domain_info mmio_msi_domain_info = {
> +	.flags          = MSI_FLAG_USE_DEF_DOM_OPS |
> +			  MSI_FLAG_USE_DEF_CHIP_OPS |
> +			  MSI_FLAG_ACTIVATE_EARLY,
> +	.ops            = &mmio_msi_domain_ops,
> +	.chip           = &mmio_msi_controller,
> +	.handler        = handle_edge_irq,
> +	.handler_name   = "edge",
> +};
> +
> +static inline void mmio_msi_create_irq_domain(void)
> +{
> +	struct fwnode_handle *fn;
> +	struct irq_domain *parent = arch_msi_root_irq_domain();
> +
> +	fn = irq_domain_alloc_named_fwnode("VIRTIO-MMIO-MSI");
> +	if (fn && parent) {
> +		mmio_msi_domain =
> +			platform_msi_create_irq_domain(fn,
> +				&mmio_msi_domain_info, parent);
> +		irq_domain_free_fwnode(fn);
> +	}
> +}
> +#else
> +static inline void mmio_msi_create_irq_domain(void) {}
> +#endif
> +
> +#endif
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 8ad679e..ee5f566 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>  void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
>  			      unsigned int nvec);
>  void *platform_msi_get_host_data(struct irq_domain *domain);
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
>  #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
>  
>  #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/5] virtio-mmio: refactor common functionality
  2020-02-10  9:05 ` [PATCH v2 2/5] virtio-mmio: refactor common functionality Zha Bin
@ 2020-02-11 11:19   ` Michael S. Tsirkin
  2020-02-12  2:58     ` [virtio-dev] " Liu, Jing2
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:19 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:18PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> Common functionality is refactored into virtio_mmio_common.h
> in order to MSI support in later patch set.
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>

What does this proliferation of header files achieve?
common with what?

> ---
>  drivers/virtio/virtio_mmio.c        | 21 +--------------------
>  drivers/virtio/virtio_mmio_common.h | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+), 20 deletions(-)
>  create mode 100644 drivers/virtio/virtio_mmio_common.h
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 1733ab97..41e1c93 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -61,13 +61,12 @@
>  #include <linux/io.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
> -#include <linux/platform_device.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> -#include <linux/virtio.h>
>  #include <linux/virtio_config.h>
>  #include <uapi/linux/virtio_mmio.h>
>  #include <linux/virtio_ring.h>
> +#include "virtio_mmio_common.h"
>  
>  
>  
> @@ -77,24 +76,6 @@
>  
>  
>  
> -#define to_virtio_mmio_device(_plat_dev) \
> -	container_of(_plat_dev, struct virtio_mmio_device, vdev)
> -
> -struct virtio_mmio_device {
> -	struct virtio_device vdev;
> -	struct platform_device *pdev;
> -
> -	void __iomem *base;
> -	unsigned long version;
> -
> -	/* a list of queues so we can dispatch IRQs */
> -	spinlock_t lock;
> -	struct list_head virtqueues;
> -
> -	unsigned short notify_base;
> -	unsigned short notify_multiplier;
> -};
> -
>  struct virtio_mmio_vq_info {
>  	/* the actual virtqueue */
>  	struct virtqueue *vq;
> diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
> new file mode 100644
> index 0000000..90cb304
> --- /dev/null
> +++ b/drivers/virtio/virtio_mmio_common.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
> +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
> +/*
> + * Virtio MMIO driver - common functionality for all device versions
> + *
> + * This module allows virtio devices to be used over a memory-mapped device.
> + */
> +
> +#include <linux/platform_device.h>
> +#include <linux/virtio.h>
> +
> +#define to_virtio_mmio_device(_plat_dev) \
> +	container_of(_plat_dev, struct virtio_mmio_device, vdev)
> +
> +struct virtio_mmio_device {
> +	struct virtio_device vdev;
> +	struct platform_device *pdev;
> +
> +	void __iomem *base;
> +	unsigned long version;
> +
> +	/* a list of queues so we can dispatch IRQs */
> +	spinlock_t lock;
> +	struct list_head virtqueues;
> +
> +	unsigned short notify_base;
> +	unsigned short notify_multiplier;
> +};
> +
> +#endif
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11  3:35     ` [virtio-dev] " Liu, Jing2
  2020-02-11  4:02       ` Jason Wang
@ 2020-02-11 11:21       ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:21 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: Jason Wang, Zha Bin, linux-kernel, virtio-dev, slp, qemu-devel,
	chao.p.peng, gerry

On Tue, Feb 11, 2020 at 11:35:43AM +0800, Liu, Jing2 wrote:
> 
> On 2/11/2020 11:17 AM, Jason Wang wrote:
> > 
> > On 2020/2/10 下午5:05, Zha Bin wrote:
> > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > 
> > > Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> > > virtio over mmio devices as a lightweight machine model for modern
> > > cloud. The standard virtio over MMIO transport layer only supports one
> > > legacy interrupt, which is much heavier than virtio over PCI transport
> > > layer using MSI. Legacy interrupt has long work path and causes specific
> > > VMExits in following cases, which would considerably slow down the
> > > performance:
> > > 
> > > 1) read interrupt status register
> > > 2) update interrupt status register
> > > 3) write IOAPIC EOI register
> > > 
> > > We proposed to add MSI support for virtio over MMIO via new feature
> > > bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> > > 
> > > With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> > > uses msi_sharing[1] to indicate the event and vector mapping.
> > > Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
> > > Bit 1 is 1: device uses sharing mode and dynamic mapping.
> > 
> > 
> > I believe dynamic mapping should cover the case of fixed vector?
> > 
> Actually this bit *aims* for msi sharing or msi non-sharing.
> 
> It means, when msi sharing bit is 1, device doesn't want vector per queue
> 
> (it wants msi vector sharing as name) and doesn't want a high interrupt
> rate.
> 
> So driver turns to !per_vq_vectors and has to do dynamical mapping.
> 
> So they are opposite not superset.
> 
> Thanks!
> 
> Jing

what's the point of all this flexibility? the cover letter
complains about the code size of pci, then you go back
and add a ton of options with no justification.



> 
> > Thanks
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-10  9:05 ` [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue Zha Bin
  2020-02-11 10:50   ` Michael S. Tsirkin
@ 2020-02-11 11:33   ` Michael S. Tsirkin
  2020-02-12  3:39     ` Jason Wang
  1 sibling, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:33 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> The standard virtio-mmio devices use notification register to signal
> backend. This will cause vmexits and slow down the performance when we
> passthrough the virtio-mmio devices to guest virtual machines.
> We proposed to update virtio over MMIO spec to add the per-queue
> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> configure notify location for each queue.
> 
> [1] https://lkml.org/lkml/2020/1/21/31
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>


Hmm. Any way to make this static so we don't need
base and multiplier?

> ---
>  drivers/virtio/virtio_mmio.c       | 37 +++++++++++++++++++++++++++++++++++--
>  include/uapi/linux/virtio_config.h |  8 +++++++-
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 97d5725..1733ab97 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -90,6 +90,9 @@ struct virtio_mmio_device {
>  	/* a list of queues so we can dispatch IRQs */
>  	spinlock_t lock;
>  	struct list_head virtqueues;
> +
> +	unsigned short notify_base;
> +	unsigned short notify_multiplier;
>  };
>  
>  struct virtio_mmio_vq_info {
> @@ -98,6 +101,9 @@ struct virtio_mmio_vq_info {
>  
>  	/* the list node for the virtqueues list */
>  	struct list_head node;
> +
> +	/* Notify Address*/
> +	unsigned int notify_addr;
>  };
>  
>  
> @@ -119,13 +125,23 @@ static u64 vm_get_features(struct virtio_device *vdev)
>  	return features;
>  }
>  
> +static void vm_transport_features(struct virtio_device *vdev, u64 features)
> +{
> +	if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
> +		__virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
> +}
> +
>  static int vm_finalize_features(struct virtio_device *vdev)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +	u64 features = vdev->features;
>  
>  	/* Give virtio_ring a chance to accept features. */
>  	vring_transport_features(vdev);
>  
> +	/* Give virtio_mmio a chance to accept features. */
> +	vm_transport_features(vdev, features);
> +
>  	/* Make sure there is are no mixed devices */
>  	if (vm_dev->version == 2 &&
>  			!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> @@ -272,10 +288,13 @@ static void vm_reset(struct virtio_device *vdev)
>  static bool vm_notify(struct virtqueue *vq)
>  {
>  	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> +	struct virtio_mmio_vq_info *info = vq->priv;
>  
> -	/* We write the queue's selector into the notification register to
> +	/* We write the queue's selector into the Notify Address to
>  	 * signal the other end */
> -	writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +	if (info)
> +		writel(vq->index, vm_dev->base + info->notify_addr);
> +
>  	return true;
>  }
>  
> @@ -434,6 +453,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	vq->priv = info;
>  	info->vq = vq;
>  
> +	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION))
> +		info->notify_addr = vm_dev->notify_base +
> +				vm_dev->notify_multiplier * vq->index;
> +	else
> +		info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
> +
>  	spin_lock_irqsave(&vm_dev->lock, flags);
>  	list_add(&info->node, &vm_dev->virtqueues);
>  	spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -471,6 +496,14 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>  		return irq;
>  	}
>  
> +	if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
> +		unsigned int notify = readl(vm_dev->base +
> +				VIRTIO_MMIO_QUEUE_NOTIFY);


that register is documented as:

/* Queue notifier - Write Only */
#define VIRTIO_MMIO_QUEUE_NOTIFY        0x050

so at least you need to update the doc.

> +
> +		vm_dev->notify_base = notify & 0xffff;
> +		vm_dev->notify_multiplier = (notify >> 16) & 0xffff;

are 16 bit base/limit always enough?
In fact won't we be short on 16 bit address space
in a rather short order if queues use up a page
of space at a time?


> +	}
> +
>  	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
>  			dev_name(&vdev->dev), vm_dev);
>  	if (err)
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index ff8e7dc..5d93c01 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -52,7 +52,7 @@
>   * rest are per-device feature bits.
>   */
>  #define VIRTIO_TRANSPORT_F_START	28
> -#define VIRTIO_TRANSPORT_F_END		38
> +#define VIRTIO_TRANSPORT_F_END		40
>  
>  #ifndef VIRTIO_CONFIG_NO_LEGACY
>  /* Do we get callbacks when the ring is completely used, even if we've
> @@ -88,4 +88,10 @@
>   * Does the device support Single Root I/O Virtualization?
>   */
>  #define VIRTIO_F_SR_IOV			37
> +
> +/*
> + * This feature indicates the enhanced notification support on MMIO transport
> + * layer.

Let's replace this with an actual description of the enhancement please
otherwise it will not make sense in a couple of months.

e.g. "Per queue notification address"?


> + */
> +#define VIRTIO_F_MMIO_NOTIFICATION	39
>  #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11  7:40           ` Jason Wang
@ 2020-02-11 11:58             ` Michael S. Tsirkin
  2020-02-11 12:04               ` Jason Wang
       [not found]             ` <4c19292f-9d25-a859-3dde-6dd5a03fdf0b@linux.intel.com>
  1 sibling, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 11:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Liu, Jing2, Zha Bin, linux-kernel, virtio-dev, slp, qemu-devel,
	chao.p.peng, gerry

On Tue, Feb 11, 2020 at 03:40:23PM +0800, Jason Wang wrote:
> 
> On 2020/2/11 下午2:02, Liu, Jing2 wrote:
> > 
> > 
> > On 2/11/2020 12:02 PM, Jason Wang wrote:
> > > 
> > > On 2020/2/11 上午11:35, Liu, Jing2 wrote:
> > > > 
> > > > On 2/11/2020 11:17 AM, Jason Wang wrote:
> > > > > 
> > > > > On 2020/2/10 下午5:05, Zha Bin wrote:
> > > > > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > > > > 
> > > > > > Userspace VMMs (e.g. Qemu microvm, Firecracker) take
> > > > > > advantage of using
> > > > > > virtio over mmio devices as a lightweight machine model for modern
> > > > > > cloud. The standard virtio over MMIO transport layer
> > > > > > only supports one
> > > > > > legacy interrupt, which is much heavier than virtio over
> > > > > > PCI transport
> > > > > > layer using MSI. Legacy interrupt has long work path and
> > > > > > causes specific
> > > > > > VMExits in following cases, which would considerably slow down the
> > > > > > performance:
> > > > > > 
> > > > > > 1) read interrupt status register
> > > > > > 2) update interrupt status register
> > > > > > 3) write IOAPIC EOI register
> > > > > > 
> > > > > > We proposed to add MSI support for virtio over MMIO via new feature
> > > > > > bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> > > > > > 
> > > > > > With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> > > > > > uses msi_sharing[1] to indicate the event and vector mapping.
> > > > > > Bit 1 is 0: device uses non-sharing and fixed vector per
> > > > > > event mapping.
> > > > > > Bit 1 is 1: device uses sharing mode and dynamic mapping.
> > > > > 
> > > > > 
> > > > > I believe dynamic mapping should cover the case of fixed vector?
> > > > > 
> > > > Actually this bit *aims* for msi sharing or msi non-sharing.
> > > > 
> > > > It means, when msi sharing bit is 1, device doesn't want vector
> > > > per queue
> > > > 
> > > > (it wants msi vector sharing as name) and doesn't want a high
> > > > interrupt rate.
> > > > 
> > > > So driver turns to !per_vq_vectors and has to do dynamical mapping.
> > > > 
> > > > So they are opposite not superset.
> > > > 
> > > > Thanks!
> > > > 
> > > > Jing
> > > 
> > > 
> > > I think you need add more comments on the command.
> > > 
> > > E.g if I want to map vector 0 to queue 1, how do I need to do?
> > > 
> > > write(1, queue_sel);
> > > write(0, vector_sel);
> > 
> > That's true. Besides, two commands are used for msi sharing mode,
> > 
> > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
> > 
> > "To set up the event and vector mapping for MSI sharing mode, driver
> > SHOULD write a valid MsiVecSel followed by
> > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
> > map the configuration change/selected queue events respectively.  " (See
> > spec patch 5/5)
> > 
> > So if driver detects the msi sharing mode, when it does setup vq, writes
> > the queue_sel (this already exists in setup vq), vector sel and then
> > MAP_QUEUE command to do the queue event mapping.
> > 
> 
> So actually the per vq msix could be done through this. I don't get why you
> need to introduce MSI_SHARING_MASK which is the charge of driver instead of
> device. The interrupt rate should have no direct relationship with whether
> it has been shared or not.
> 
> Btw, you introduce mask/unmask without pending, how to deal with the lost
> interrupt during the masking then?

pending can be an internal device register. as long as device
does not lose interrupts while masked, all's well.

There's value is being able to say "this queue sends no
interrupts do not bother checking used notification area".
so we need way to say that. So I guess an enable interrupts
register might have some value...
But besides that, it's enough to have mask/unmask/address/data
per vq.


> 
> > For msi non-sharing mode, no special action is needed because we make
> > the rule of per_vq_vector and fixed relationship.
> > 
> > Correct me if this is not that clear for spec/code comments.
> > 
> 
> The ABI is not as straightforward as PCI did. Why not just reuse the PCI
> layout?
> 
> E.g having
> 
> queue_sel
> queue_msix_vector
> msix_config
> 
> for configuring map between msi vector and queues/config
> 
> Then
> 
> vector_sel
> address
> data
> pending
> mask
> unmask
> 
> for configuring msi table?
> 
> Thanks
> 
> 
> > Thanks!
> > 
> > Jing
> > 
> > 
> > > 
> > > ?
> > > 
> > > Thanks
> > > 
> > > 
> > > > 
> > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > 
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > 
> > > > 
> > > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11 11:58             ` Michael S. Tsirkin
@ 2020-02-11 12:04               ` Jason Wang
  2020-02-11 12:08                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-11 12:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Zha Bin, slp, Liu, Jing2, linux-kernel, qemu-devel,
	chao.p.peng, gerry


On 2020/2/11 下午7:58, Michael S. Tsirkin wrote:
> On Tue, Feb 11, 2020 at 03:40:23PM +0800, Jason Wang wrote:
>> On 2020/2/11 下午2:02, Liu, Jing2 wrote:
>>> On 2/11/2020 12:02 PM, Jason Wang wrote:
>>>> On 2020/2/11 上午11:35, Liu, Jing2 wrote:
>>>>> On 2/11/2020 11:17 AM, Jason Wang wrote:
>>>>>> On 2020/2/10 下午5:05, Zha Bin wrote:
>>>>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>>>>
>>>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take
>>>>>>> advantage of using
>>>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>>>> cloud. The standard virtio over MMIO transport layer
>>>>>>> only supports one
>>>>>>> legacy interrupt, which is much heavier than virtio over
>>>>>>> PCI transport
>>>>>>> layer using MSI. Legacy interrupt has long work path and
>>>>>>> causes specific
>>>>>>> VMExits in following cases, which would considerably slow down the
>>>>>>> performance:
>>>>>>>
>>>>>>> 1) read interrupt status register
>>>>>>> 2) update interrupt status register
>>>>>>> 3) write IOAPIC EOI register
>>>>>>>
>>>>>>> We proposed to add MSI support for virtio over MMIO via new feature
>>>>>>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>>>>>>
>>>>>>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>>>>>>> uses msi_sharing[1] to indicate the event and vector mapping.
>>>>>>> Bit 1 is 0: device uses non-sharing and fixed vector per
>>>>>>> event mapping.
>>>>>>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>>>>>> I believe dynamic mapping should cover the case of fixed vector?
>>>>>>
>>>>> Actually this bit*aims*  for msi sharing or msi non-sharing.
>>>>>
>>>>> It means, when msi sharing bit is 1, device doesn't want vector
>>>>> per queue
>>>>>
>>>>> (it wants msi vector sharing as name) and doesn't want a high
>>>>> interrupt rate.
>>>>>
>>>>> So driver turns to !per_vq_vectors and has to do dynamical mapping.
>>>>>
>>>>> So they are opposite not superset.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Jing
>>>> I think you need add more comments on the command.
>>>>
>>>> E.g if I want to map vector 0 to queue 1, how do I need to do?
>>>>
>>>> write(1, queue_sel);
>>>> write(0, vector_sel);
>>> That's true. Besides, two commands are used for msi sharing mode,
>>>
>>> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
>>>
>>> "To set up the event and vector mapping for MSI sharing mode, driver
>>> SHOULD write a valid MsiVecSel followed by
>>> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
>>> map the configuration change/selected queue events respectively.  " (See
>>> spec patch 5/5)
>>>
>>> So if driver detects the msi sharing mode, when it does setup vq, writes
>>> the queue_sel (this already exists in setup vq), vector sel and then
>>> MAP_QUEUE command to do the queue event mapping.
>>>
>> So actually the per vq msix could be done through this. I don't get why you
>> need to introduce MSI_SHARING_MASK which is the charge of driver instead of
>> device. The interrupt rate should have no direct relationship with whether
>> it has been shared or not.
>>
>> Btw, you introduce mask/unmask without pending, how to deal with the lost
>> interrupt during the masking then?
> pending can be an internal device register. as long as device
> does not lose interrupts while masked, all's well.


You meant raise the interrupt during unmask automatically?


>
> There's value is being able to say "this queue sends no
> interrupts do not bother checking used notification area".
> so we need way to say that. So I guess an enable interrupts
> register might have some value...
> But besides that, it's enough to have mask/unmask/address/data
> per vq.


Just to check, do you mean "per vector" here?

Thanks


>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11 12:04               ` Jason Wang
@ 2020-02-11 12:08                 ` Michael S. Tsirkin
  2020-02-11 12:18                   ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 12:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, Zha Bin, slp, Liu, Jing2, linux-kernel, qemu-devel,
	chao.p.peng, gerry

On Tue, Feb 11, 2020 at 08:04:24PM +0800, Jason Wang wrote:
> 
> On 2020/2/11 下午7:58, Michael S. Tsirkin wrote:
> > On Tue, Feb 11, 2020 at 03:40:23PM +0800, Jason Wang wrote:
> > > On 2020/2/11 下午2:02, Liu, Jing2 wrote:
> > > > On 2/11/2020 12:02 PM, Jason Wang wrote:
> > > > > On 2020/2/11 上午11:35, Liu, Jing2 wrote:
> > > > > > On 2/11/2020 11:17 AM, Jason Wang wrote:
> > > > > > > On 2020/2/10 下午5:05, Zha Bin wrote:
> > > > > > > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > > > > > > 
> > > > > > > > Userspace VMMs (e.g. Qemu microvm, Firecracker) take
> > > > > > > > advantage of using
> > > > > > > > virtio over mmio devices as a lightweight machine model for modern
> > > > > > > > cloud. The standard virtio over MMIO transport layer
> > > > > > > > only supports one
> > > > > > > > legacy interrupt, which is much heavier than virtio over
> > > > > > > > PCI transport
> > > > > > > > layer using MSI. Legacy interrupt has long work path and
> > > > > > > > causes specific
> > > > > > > > VMExits in following cases, which would considerably slow down the
> > > > > > > > performance:
> > > > > > > > 
> > > > > > > > 1) read interrupt status register
> > > > > > > > 2) update interrupt status register
> > > > > > > > 3) write IOAPIC EOI register
> > > > > > > > 
> > > > > > > > We proposed to add MSI support for virtio over MMIO via new feature
> > > > > > > > bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> > > > > > > > 
> > > > > > > > With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> > > > > > > > uses msi_sharing[1] to indicate the event and vector mapping.
> > > > > > > > Bit 1 is 0: device uses non-sharing and fixed vector per
> > > > > > > > event mapping.
> > > > > > > > Bit 1 is 1: device uses sharing mode and dynamic mapping.
> > > > > > > I believe dynamic mapping should cover the case of fixed vector?
> > > > > > > 
> > > > > > Actually this bit*aims*  for msi sharing or msi non-sharing.
> > > > > > 
> > > > > > It means, when msi sharing bit is 1, device doesn't want vector
> > > > > > per queue
> > > > > > 
> > > > > > (it wants msi vector sharing as name) and doesn't want a high
> > > > > > interrupt rate.
> > > > > > 
> > > > > > So driver turns to !per_vq_vectors and has to do dynamical mapping.
> > > > > > 
> > > > > > So they are opposite not superset.
> > > > > > 
> > > > > > Thanks!
> > > > > > 
> > > > > > Jing
> > > > > I think you need add more comments on the command.
> > > > > 
> > > > > E.g if I want to map vector 0 to queue 1, how do I need to do?
> > > > > 
> > > > > write(1, queue_sel);
> > > > > write(0, vector_sel);
> > > > That's true. Besides, two commands are used for msi sharing mode,
> > > > 
> > > > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
> > > > 
> > > > "To set up the event and vector mapping for MSI sharing mode, driver
> > > > SHOULD write a valid MsiVecSel followed by
> > > > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
> > > > map the configuration change/selected queue events respectively.  " (See
> > > > spec patch 5/5)
> > > > 
> > > > So if driver detects the msi sharing mode, when it does setup vq, writes
> > > > the queue_sel (this already exists in setup vq), vector sel and then
> > > > MAP_QUEUE command to do the queue event mapping.
> > > > 
> > > So actually the per vq msix could be done through this. I don't get why you
> > > need to introduce MSI_SHARING_MASK which is the charge of driver instead of
> > > device. The interrupt rate should have no direct relationship with whether
> > > it has been shared or not.
> > > 
> > > Btw, you introduce mask/unmask without pending, how to deal with the lost
> > > interrupt during the masking then?
> > pending can be an internal device register. as long as device
> > does not lose interrupts while masked, all's well.
> 
> 
> You meant raise the interrupt during unmask automatically?
> 


yes - that's also what pci does.

the guest visible pending bit is partially implemented in qemu
as per pci spec but it's unused.

> > 
> > There's value is being able to say "this queue sends no
> > interrupts do not bother checking used notification area".
> > so we need way to say that. So I guess an enable interrupts
> > register might have some value...
> > But besides that, it's enough to have mask/unmask/address/data
> > per vq.
> 
> 
> Just to check, do you mean "per vector" here?
> 
> Thanks
> 

No, per VQ. An indirection VQ -> vector -> address/data isn't
necessary for PCI either, but that ship has sailed.

-- 
MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11 12:08                 ` Michael S. Tsirkin
@ 2020-02-11 12:18                   ` Jason Wang
  2020-02-11 14:00                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-11 12:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Zha Bin, slp, Liu, Jing2, linux-kernel, qemu-devel,
	chao.p.peng, gerry


On 2020/2/11 下午8:08, Michael S. Tsirkin wrote:
> On Tue, Feb 11, 2020 at 08:04:24PM +0800, Jason Wang wrote:
>> On 2020/2/11 下午7:58, Michael S. Tsirkin wrote:
>>> On Tue, Feb 11, 2020 at 03:40:23PM +0800, Jason Wang wrote:
>>>> On 2020/2/11 下午2:02, Liu, Jing2 wrote:
>>>>> On 2/11/2020 12:02 PM, Jason Wang wrote:
>>>>>> On 2020/2/11 上午11:35, Liu, Jing2 wrote:
>>>>>>> On 2/11/2020 11:17 AM, Jason Wang wrote:
>>>>>>>> On 2020/2/10 下午5:05, Zha Bin wrote:
>>>>>>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>>>>>>
>>>>>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take
>>>>>>>>> advantage of using
>>>>>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>>>>>> cloud. The standard virtio over MMIO transport layer
>>>>>>>>> only supports one
>>>>>>>>> legacy interrupt, which is much heavier than virtio over
>>>>>>>>> PCI transport
>>>>>>>>> layer using MSI. Legacy interrupt has long work path and
>>>>>>>>> causes specific
>>>>>>>>> VMExits in following cases, which would considerably slow down the
>>>>>>>>> performance:
>>>>>>>>>
>>>>>>>>> 1) read interrupt status register
>>>>>>>>> 2) update interrupt status register
>>>>>>>>> 3) write IOAPIC EOI register
>>>>>>>>>
>>>>>>>>> We proposed to add MSI support for virtio over MMIO via new feature
>>>>>>>>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>>>>>>>>
>>>>>>>>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>>>>>>>>> uses msi_sharing[1] to indicate the event and vector mapping.
>>>>>>>>> Bit 1 is 0: device uses non-sharing and fixed vector per
>>>>>>>>> event mapping.
>>>>>>>>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>>>>>>>> I believe dynamic mapping should cover the case of fixed vector?
>>>>>>>>
>>>>>>> Actually this bit*aims*  for msi sharing or msi non-sharing.
>>>>>>>
>>>>>>> It means, when msi sharing bit is 1, device doesn't want vector
>>>>>>> per queue
>>>>>>>
>>>>>>> (it wants msi vector sharing as name) and doesn't want a high
>>>>>>> interrupt rate.
>>>>>>>
>>>>>>> So driver turns to !per_vq_vectors and has to do dynamical mapping.
>>>>>>>
>>>>>>> So they are opposite not superset.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Jing
>>>>>> I think you need add more comments on the command.
>>>>>>
>>>>>> E.g if I want to map vector 0 to queue 1, how do I need to do?
>>>>>>
>>>>>> write(1, queue_sel);
>>>>>> write(0, vector_sel);
>>>>> That's true. Besides, two commands are used for msi sharing mode,
>>>>>
>>>>> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
>>>>>
>>>>> "To set up the event and vector mapping for MSI sharing mode, driver
>>>>> SHOULD write a valid MsiVecSel followed by
>>>>> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
>>>>> map the configuration change/selected queue events respectively.  " (See
>>>>> spec patch 5/5)
>>>>>
>>>>> So if driver detects the msi sharing mode, when it does setup vq, writes
>>>>> the queue_sel (this already exists in setup vq), vector sel and then
>>>>> MAP_QUEUE command to do the queue event mapping.
>>>>>
>>>> So actually the per vq msix could be done through this. I don't get why you
>>>> need to introduce MSI_SHARING_MASK which is the charge of driver instead of
>>>> device. The interrupt rate should have no direct relationship with whether
>>>> it has been shared or not.
>>>>
>>>> Btw, you introduce mask/unmask without pending, how to deal with the lost
>>>> interrupt during the masking then?
>>> pending can be an internal device register. as long as device
>>> does not lose interrupts while masked, all's well.
>>
>> You meant raise the interrupt during unmask automatically?
>>
>
> yes - that's also what pci does.
>
> the guest visible pending bit is partially implemented in qemu
> as per pci spec but it's unused.


Ok.


>
>>> There's value is being able to say "this queue sends no
>>> interrupts do not bother checking used notification area".
>>> so we need way to say that. So I guess an enable interrupts
>>> register might have some value...
>>> But besides that, it's enough to have mask/unmask/address/data
>>> per vq.
>>
>> Just to check, do you mean "per vector" here?
>>
>> Thanks
>>
> No, per VQ. An indirection VQ -> vector -> address/data isn't
> necessary for PCI either, but that ship has sailed.


Yes, it can work but it may bring extra effort when you want to mask a 
vector which is was shared by a lot of queues.

Thanks

>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11 12:18                   ` Jason Wang
@ 2020-02-11 14:00                     ` Michael S. Tsirkin
  2020-02-12  9:03                       ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 14:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, Zha Bin, slp, Liu, Jing2, linux-kernel, qemu-devel,
	chao.p.peng, gerry

On Tue, Feb 11, 2020 at 08:18:54PM +0800, Jason Wang wrote:
> 
> On 2020/2/11 下午8:08, Michael S. Tsirkin wrote:
> > On Tue, Feb 11, 2020 at 08:04:24PM +0800, Jason Wang wrote:
> > > On 2020/2/11 下午7:58, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 11, 2020 at 03:40:23PM +0800, Jason Wang wrote:
> > > > > On 2020/2/11 下午2:02, Liu, Jing2 wrote:
> > > > > > On 2/11/2020 12:02 PM, Jason Wang wrote:
> > > > > > > On 2020/2/11 上午11:35, Liu, Jing2 wrote:
> > > > > > > > On 2/11/2020 11:17 AM, Jason Wang wrote:
> > > > > > > > > On 2020/2/10 下午5:05, Zha Bin wrote:
> > > > > > > > > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > > > > > > > > 
> > > > > > > > > > Userspace VMMs (e.g. Qemu microvm, Firecracker) take
> > > > > > > > > > advantage of using
> > > > > > > > > > virtio over mmio devices as a lightweight machine model for modern
> > > > > > > > > > cloud. The standard virtio over MMIO transport layer
> > > > > > > > > > only supports one
> > > > > > > > > > legacy interrupt, which is much heavier than virtio over
> > > > > > > > > > PCI transport
> > > > > > > > > > layer using MSI. Legacy interrupt has long work path and
> > > > > > > > > > causes specific
> > > > > > > > > > VMExits in following cases, which would considerably slow down the
> > > > > > > > > > performance:
> > > > > > > > > > 
> > > > > > > > > > 1) read interrupt status register
> > > > > > > > > > 2) update interrupt status register
> > > > > > > > > > 3) write IOAPIC EOI register
> > > > > > > > > > 
> > > > > > > > > > We proposed to add MSI support for virtio over MMIO via new feature
> > > > > > > > > > bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> > > > > > > > > > 
> > > > > > > > > > With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> > > > > > > > > > uses msi_sharing[1] to indicate the event and vector mapping.
> > > > > > > > > > Bit 1 is 0: device uses non-sharing and fixed vector per
> > > > > > > > > > event mapping.
> > > > > > > > > > Bit 1 is 1: device uses sharing mode and dynamic mapping.
> > > > > > > > > I believe dynamic mapping should cover the case of fixed vector?
> > > > > > > > > 
> > > > > > > > Actually this bit*aims*  for msi sharing or msi non-sharing.
> > > > > > > > 
> > > > > > > > It means, when msi sharing bit is 1, device doesn't want vector
> > > > > > > > per queue
> > > > > > > > 
> > > > > > > > (it wants msi vector sharing as name) and doesn't want a high
> > > > > > > > interrupt rate.
> > > > > > > > 
> > > > > > > > So driver turns to !per_vq_vectors and has to do dynamical mapping.
> > > > > > > > 
> > > > > > > > So they are opposite not superset.
> > > > > > > > 
> > > > > > > > Thanks!
> > > > > > > > 
> > > > > > > > Jing
> > > > > > > I think you need add more comments on the command.
> > > > > > > 
> > > > > > > E.g if I want to map vector 0 to queue 1, how do I need to do?
> > > > > > > 
> > > > > > > write(1, queue_sel);
> > > > > > > write(0, vector_sel);
> > > > > > That's true. Besides, two commands are used for msi sharing mode,
> > > > > > 
> > > > > > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
> > > > > > 
> > > > > > "To set up the event and vector mapping for MSI sharing mode, driver
> > > > > > SHOULD write a valid MsiVecSel followed by
> > > > > > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
> > > > > > map the configuration change/selected queue events respectively.  " (See
> > > > > > spec patch 5/5)
> > > > > > 
> > > > > > So if driver detects the msi sharing mode, when it does setup vq, writes
> > > > > > the queue_sel (this already exists in setup vq), vector sel and then
> > > > > > MAP_QUEUE command to do the queue event mapping.
> > > > > > 
> > > > > So actually the per vq msix could be done through this. I don't get why you
> > > > > need to introduce MSI_SHARING_MASK which is the charge of driver instead of
> > > > > device. The interrupt rate should have no direct relationship with whether
> > > > > it has been shared or not.
> > > > > 
> > > > > Btw, you introduce mask/unmask without pending, how to deal with the lost
> > > > > interrupt during the masking then?
> > > > pending can be an internal device register. as long as device
> > > > does not lose interrupts while masked, all's well.
> > > 
> > > You meant raise the interrupt during unmask automatically?
> > > 
> > 
> > yes - that's also what pci does.
> > 
> > the guest visible pending bit is partially implemented in qemu
> > as per pci spec but it's unused.
> 
> 
> Ok.
> 
> 
> > 
> > > > There's value is being able to say "this queue sends no
> > > > interrupts do not bother checking used notification area".
> > > > so we need way to say that. So I guess an enable interrupts
> > > > register might have some value...
> > > > But besides that, it's enough to have mask/unmask/address/data
> > > > per vq.
> > > 
> > > Just to check, do you mean "per vector" here?
> > > 
> > > Thanks
> > > 
> > No, per VQ. An indirection VQ -> vector -> address/data isn't
> > necessary for PCI either, but that ship has sailed.
> 
> 
> Yes, it can work but it may bring extra effort when you want to mask a
> vector which is was shared by a lot of queues.
> 
> Thanks
> 

masking should be per vq too.

-- 
MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] virtio mmio specification enhancement
  2020-02-10 11:44 ` [PATCH v2 0/5] virtio mmio specification enhancement Michael S. Tsirkin
@ 2020-02-11 16:05   ` Chao Peng
  2020-02-11 10:57     ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Chao Peng @ 2020-02-11 16:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zha Bin, linux-kernel, jasowang, slp, virtio-dev, qemu-devel,
	gerry, jing2.liu

On Mon, Feb 10, 2020 at 06:44:50AM -0500, Michael S. Tsirkin wrote:
> On Mon, Feb 10, 2020 at 05:05:16PM +0800, Zha Bin wrote:
> > We have compared the number of files and the lines of code between
> > virtio-mmio and virio-pci.
> > 
> > 				Virtio-PCI	    Virtio-MMIO	
> > 	number of files(Linux)	    161			1
> > 	lines of code(Linux)	    78237		538
> 
> 
> 
> Something's very wrong here. virtio PCI is 161 files?
> Are you counting the whole PCI subsystem?

Right, that is just a rough statistics. Surely enough, some drivers will
never get enabled in a typcial config.

> Sure enough:
> 
> $ find drivers/pci -name '*c' |wc -l
> 150

and plus:
$ find arch/x86/pci/ -name '*c' |wc -l
22

> 
> That's not reasonable, this includes a bunch of drivers that
> never run on a typical hypervisor.
> 
> MMIO is also not as small as you are trying to show:
> 
> $ cloc drivers/virtio/virtio_mmio.c include/uapi/linux/virtio_mmio.h
>        2 text files.
>        2 unique files.                              
>        0 files ignored.
> 
> github.com/AlDanial/cloc v 1.82  T=0.01 s (230.7 files/s, 106126.5 lines/s)
> -------------------------------------------------------------------------------
> Language                     files          blank        comment           code
> -------------------------------------------------------------------------------
> C                                1            144            100            535
> C/C++ Header                     1             39             66             36
> -------------------------------------------------------------------------------
> SUM:                             2            183            166            571
> -------------------------------------------------------------------------------
> 
> 
> I don't doubt MMIO is smaller than PCI. Of course that's because it has
> no features to speak of - just this patch already doubles it's size. If
> we keep doing that because we want the features then they will reach
> the same size in about 4 iterations.

Since current virtio-mmio size is small enough, so adding any notable
feature would easily double it. I have no objection that it may one day
reach the same level of PCI, but in this patch some are actually
generic changes and for MSI specific code we provide the option to
confige away.

Thanks,
Chao

> 
> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 2/5] virtio-mmio: refactor common functionality
  2020-02-11 11:19   ` Michael S. Tsirkin
@ 2020-02-12  2:58     ` Liu, Jing2
  2020-02-12  7:29       ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Liu, Jing2 @ 2020-02-12  2:58 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry, chao.p.peng


On 2/11/2020 7:19 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 10, 2020 at 05:05:18PM +0800, Zha Bin wrote:
>> From: Liu Jiang <gerry@linux.alibaba.com>
>>
>> Common functionality is refactored into virtio_mmio_common.h
>> in order to MSI support in later patch set.
>>
>> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
>> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
>> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
>> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
>> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
>> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> What does this proliferation of header files achieve?
> common with what?

We're considering that the virtio mmio structure is useful for virtio 
mmio msi file so refactor out.

e.g. to get the base of virtio_mmio_device from struct msi_desc *desc.

Jing


>> ---
>>   drivers/virtio/virtio_mmio.c        | 21 +--------------------
>>   drivers/virtio/virtio_mmio_common.h | 31 +++++++++++++++++++++++++++++++
>>   2 files changed, 32 insertions(+), 20 deletions(-)
>>   create mode 100644 drivers/virtio/virtio_mmio_common.h
>>
>> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
>> index 1733ab97..41e1c93 100644
>> --- a/drivers/virtio/virtio_mmio.c
>> +++ b/drivers/virtio/virtio_mmio.c
>> @@ -61,13 +61,12 @@
>>   #include <linux/io.h>
>>   #include <linux/list.h>
>>   #include <linux/module.h>
>> -#include <linux/platform_device.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>> -#include <linux/virtio.h>
>>   #include <linux/virtio_config.h>
>>   #include <uapi/linux/virtio_mmio.h>
>>   #include <linux/virtio_ring.h>
>> +#include "virtio_mmio_common.h"
>>   
>>   
>>   
>> @@ -77,24 +76,6 @@
>>   
>>   
>>   
>> -#define to_virtio_mmio_device(_plat_dev) \
>> -	container_of(_plat_dev, struct virtio_mmio_device, vdev)
>> -
>> -struct virtio_mmio_device {
>> -	struct virtio_device vdev;
>> -	struct platform_device *pdev;
>> -
>> -	void __iomem *base;
>> -	unsigned long version;
>> -
>> -	/* a list of queues so we can dispatch IRQs */
>> -	spinlock_t lock;
>> -	struct list_head virtqueues;
>> -
>> -	unsigned short notify_base;
>> -	unsigned short notify_multiplier;
>> -};
>> -
>>   struct virtio_mmio_vq_info {
>>   	/* the actual virtqueue */
>>   	struct virtqueue *vq;
>> diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
>> new file mode 100644
>> index 0000000..90cb304
>> --- /dev/null
>> +++ b/drivers/virtio/virtio_mmio_common.h
>> @@ -0,0 +1,31 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
>> +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
>> +/*
>> + * Virtio MMIO driver - common functionality for all device versions
>> + *
>> + * This module allows virtio devices to be used over a memory-mapped device.
>> + */
>> +
>> +#include <linux/platform_device.h>
>> +#include <linux/virtio.h>
>> +
>> +#define to_virtio_mmio_device(_plat_dev) \
>> +	container_of(_plat_dev, struct virtio_mmio_device, vdev)
>> +
>> +struct virtio_mmio_device {
>> +	struct virtio_device vdev;
>> +	struct platform_device *pdev;
>> +
>> +	void __iomem *base;
>> +	unsigned long version;
>> +
>> +	/* a list of queues so we can dispatch IRQs */
>> +	spinlock_t lock;
>> +	struct list_head virtqueues;
>> +
>> +	unsigned short notify_base;
>> +	unsigned short notify_multiplier;
>> +};
>> +
>> +#endif
>> -- 
>> 1.8.3.1
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-11 11:33   ` Michael S. Tsirkin
@ 2020-02-12  3:39     ` Jason Wang
  2020-02-12  8:18       ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-12  3:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zha Bin
  Cc: virtio-dev, slp, jing2.liu, linux-kernel, qemu-devel, chao.p.peng, gerry


On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>> From: Liu Jiang<gerry@linux.alibaba.com>
>>
>> The standard virtio-mmio devices use notification register to signal
>> backend. This will cause vmexits and slow down the performance when we
>> passthrough the virtio-mmio devices to guest virtual machines.
>> We proposed to update virtio over MMIO spec to add the per-queue
>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>> configure notify location for each queue.
>>
>> [1]https://lkml.org/lkml/2020/1/21/31
>>
>> Signed-off-by: Liu Jiang<gerry@linux.alibaba.com>
>> Co-developed-by: Zha Bin<zhabin@linux.alibaba.com>
>> Signed-off-by: Zha Bin<zhabin@linux.alibaba.com>
>> Co-developed-by: Jing Liu<jing2.liu@linux.intel.com>
>> Signed-off-by: Jing Liu<jing2.liu@linux.intel.com>
>> Co-developed-by: Chao Peng<chao.p.peng@linux.intel.com>
>> Signed-off-by: Chao Peng<chao.p.peng@linux.intel.com>
> Hmm. Any way to make this static so we don't need
> base and multiplier?


E.g page per vq?

Thanks


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 2/5] virtio-mmio: refactor common functionality
  2020-02-12  2:58     ` [virtio-dev] " Liu, Jing2
@ 2020-02-12  7:29       ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  7:29 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: Zha Bin, linux-kernel, jasowang, slp, virtio-dev, qemu-devel,
	gerry, chao.p.peng

On Wed, Feb 12, 2020 at 10:58:13AM +0800, Liu, Jing2 wrote:
> 
> On 2/11/2020 7:19 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 10, 2020 at 05:05:18PM +0800, Zha Bin wrote:
> > > From: Liu Jiang <gerry@linux.alibaba.com>
> > > 
> > > Common functionality is refactored into virtio_mmio_common.h
> > > in order to MSI support in later patch set.
> > > 
> > > Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> > > Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> > > Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> > > Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> > > Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> > > Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > What does this proliferation of header files achieve?
> > common with what?
> 
> We're considering that the virtio mmio structure is useful for virtio mmio
> msi file so refactor out.

I would not put msi in a separate file either.

> e.g. to get the base of virtio_mmio_device from struct msi_desc *desc.
> 
> Jing
> 
> 
> > > ---
> > >   drivers/virtio/virtio_mmio.c        | 21 +--------------------
> > >   drivers/virtio/virtio_mmio_common.h | 31 +++++++++++++++++++++++++++++++
> > >   2 files changed, 32 insertions(+), 20 deletions(-)
> > >   create mode 100644 drivers/virtio/virtio_mmio_common.h
> > > 
> > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > index 1733ab97..41e1c93 100644
> > > --- a/drivers/virtio/virtio_mmio.c
> > > +++ b/drivers/virtio/virtio_mmio.c
> > > @@ -61,13 +61,12 @@
> > >   #include <linux/io.h>
> > >   #include <linux/list.h>
> > >   #include <linux/module.h>
> > > -#include <linux/platform_device.h>
> > >   #include <linux/slab.h>
> > >   #include <linux/spinlock.h>
> > > -#include <linux/virtio.h>
> > >   #include <linux/virtio_config.h>
> > >   #include <uapi/linux/virtio_mmio.h>
> > >   #include <linux/virtio_ring.h>
> > > +#include "virtio_mmio_common.h"
> > > @@ -77,24 +76,6 @@
> > > -#define to_virtio_mmio_device(_plat_dev) \
> > > -	container_of(_plat_dev, struct virtio_mmio_device, vdev)
> > > -
> > > -struct virtio_mmio_device {
> > > -	struct virtio_device vdev;
> > > -	struct platform_device *pdev;
> > > -
> > > -	void __iomem *base;
> > > -	unsigned long version;
> > > -
> > > -	/* a list of queues so we can dispatch IRQs */
> > > -	spinlock_t lock;
> > > -	struct list_head virtqueues;
> > > -
> > > -	unsigned short notify_base;
> > > -	unsigned short notify_multiplier;
> > > -};
> > > -
> > >   struct virtio_mmio_vq_info {
> > >   	/* the actual virtqueue */
> > >   	struct virtqueue *vq;
> > > diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
> > > new file mode 100644
> > > index 0000000..90cb304
> > > --- /dev/null
> > > +++ b/drivers/virtio/virtio_mmio_common.h
> > > @@ -0,0 +1,31 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > > +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
> > > +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
> > > +/*
> > > + * Virtio MMIO driver - common functionality for all device versions
> > > + *
> > > + * This module allows virtio devices to be used over a memory-mapped device.
> > > + */
> > > +
> > > +#include <linux/platform_device.h>
> > > +#include <linux/virtio.h>
> > > +
> > > +#define to_virtio_mmio_device(_plat_dev) \
> > > +	container_of(_plat_dev, struct virtio_mmio_device, vdev)
> > > +
> > > +struct virtio_mmio_device {
> > > +	struct virtio_device vdev;
> > > +	struct platform_device *pdev;
> > > +
> > > +	void __iomem *base;
> > > +	unsigned long version;
> > > +
> > > +	/* a list of queues so we can dispatch IRQs */
> > > +	spinlock_t lock;
> > > +	struct list_head virtqueues;
> > > +
> > > +	unsigned short notify_base;
> > > +	unsigned short notify_multiplier;
> > > +};
> > > +
> > > +#endif
> > > -- 
> > > 1.8.3.1
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
       [not found]             ` <4c19292f-9d25-a859-3dde-6dd5a03fdf0b@linux.intel.com>
@ 2020-02-12  7:33               ` Michael S. Tsirkin
  2020-02-12  9:06               ` Jason Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  7:33 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: Jason Wang, Zha Bin, linux-kernel, virtio-dev, slp, qemu-devel,
	chao.p.peng, gerry

On Wed, Feb 12, 2020 at 11:54:53AM +0800, Liu, Jing2 wrote:
> 
> On 2/11/2020 3:40 PM, Jason Wang wrote:
> 
> 
>     On 2020/2/11 下午2:02, Liu, Jing2 wrote:
> 
> 
> 
>         On 2/11/2020 12:02 PM, Jason Wang wrote:
> 
> 
>             On 2020/2/11 上午11:35, Liu, Jing2 wrote:
> 
> 
>                 On 2/11/2020 11:17 AM, Jason Wang wrote:
> 
> 
>                     On 2020/2/10 下午5:05, Zha Bin wrote:
> 
>                         From: Liu Jiang<gerry@linux.alibaba.com>
> 
>                         Userspace VMMs (e.g. Qemu microvm, Firecracker) take
>                         advantage of using
>                         virtio over mmio devices as a lightweight machine model
>                         for modern
>                         cloud. The standard virtio over MMIO transport layer
>                         only supports one
>                         legacy interrupt, which is much heavier than virtio
>                         over PCI transport
>                         layer using MSI. Legacy interrupt has long work path
>                         and causes specific
>                         VMExits in following cases, which would considerably
>                         slow down the
>                         performance:
> 
>                         1) read interrupt status register
>                         2) update interrupt status register
>                         3) write IOAPIC EOI register
> 
>                         We proposed to add MSI support for virtio over MMIO via
>                         new feature
>                         bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt
>                         performance.
> 
>                         With the VIRTIO_F_MMIO_MSI feature bit supported, the
>                         virtio-mmio MSI
>                         uses msi_sharing[1] to indicate the event and vector
>                         mapping.
>                         Bit 1 is 0: device uses non-sharing and fixed vector
>                         per event mapping.
>                         Bit 1 is 1: device uses sharing mode and dynamic
>                         mapping.
> 
> 
> 
>                     I believe dynamic mapping should cover the case of fixed
>                     vector?
> 
> 
>                 Actually this bit *aims* for msi sharing or msi non-sharing.
> 
>                 It means, when msi sharing bit is 1, device doesn't want vector
>                 per queue
> 
>                 (it wants msi vector sharing as name) and doesn't want a high
>                 interrupt rate.
> 
>                 So driver turns to !per_vq_vectors and has to do dynamical
>                 mapping.
> 
>                 So they are opposite not superset.
> 
>                 Thanks!
> 
>                 Jing
> 
> 
> 
>             I think you need add more comments on the command.
> 
>             E.g if I want to map vector 0 to queue 1, how do I need to do?
> 
>             write(1, queue_sel);
>             write(0, vector_sel);
> 
> 
>         That's true. Besides, two commands are used for msi sharing mode,
> 
>         VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
> 
>         "To set up the event and vector mapping for MSI sharing mode, driver
>         SHOULD write a valid MsiVecSel followed by
>         VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
>         map the configuration change/selected queue events respectively.  "
>         (See spec patch 5/5)
> 
>         So if driver detects the msi sharing mode, when it does setup vq,
>         writes the queue_sel (this already exists in setup vq), vector sel and
>         then MAP_QUEUE command to do the queue event mapping.
> 
> 
> 
>     So actually the per vq msix could be done through this.
> 
> Right, per vq msix can also be mapped by the 2 commands if we want. 
> 
> The current design benefits for those devices requesting per vq msi that driver
> 
> doesn't have to map during setup each queue,
> 
> since we define the relationship by default.

Point being to save some exits for configuration? How much does it
save? IMHO we need to find a way to drastically simplify this interface,
to cut down the new LOC to at most 100-200, proportionally to the
performance gain it gives.


> 
>     I don't get why you need to introduce MSI_SHARING_MASK which is the charge
>     of driver instead of device.
> 
> MSI_SHARING_MASK is just for identifying the msi_sharing bit in readl(MsiState)
> (0x0c4). The device tells whether it wants msi_sharing.
> 
> MsiState register: R
> 
> le32 {
>     msi_enabled : 1;
>     msi_sharing: 1;
>     reserved : 30;
> };
> 
>     The interrupt rate should have no direct relationship with whether it has
>     been shared or not.
> 
> 
> 
>     Btw, you introduce mask/unmask without pending, how to deal with the lost
>     interrupt during the masking then?
> 
> 
> 
>         For msi non-sharing mode, no special action is needed because we make
>         the rule of per_vq_vector and fixed relationship.
> 
>         Correct me if this is not that clear for spec/code comments.
> 
> 
> 
>     The ABI is not as straightforward as PCI did. Why not just reuse the PCI
>     layout?
> 
>     E.g having
> 
>     queue_sel
>     queue_msix_vector
>     msix_config
> 
>     for configuring map between msi vector and queues/config
> 
> Thanks for the advice. :)
> 
> Actually when looking into pci, the queue_msix_vector/msix_config is the msi
> vector index, which is the same as the mmio register MsiVecSel (0x0d0).
> 
> So we don't introduce two extra registers for mapping even in sharing mode.
> 
> What do you think?
> 
> 
> 
>     Then
> 
>     vector_sel
>     address
>     data
>     pending
>     mask
>     unmask
> 
>     for configuring msi table?
> 
> PCI-like msix table is not introduced to device and instead simply use commands
> to tell the mask/configure/enable.
> 
> Thanks!
> 
> Jing
> 
> 
>     Thanks
> 
> 
> 
>         Thanks!
> 
>         Jing
> 
> 
> 
> 
>             ?
> 
>             Thanks
> 
> 
> 
> 
> 
> 
>                     Thanks
> 
> 
> 
>                     ---------------------------------------------------------------------
> 
>                     To unsubscribe, e-mail:
>                     virtio-dev-unsubscribe@lists.oasis-open.org
>                     For additional commands, e-mail:
>                     virtio-dev-help@lists.oasis-open.org
> 
> 
> 
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain
  2020-02-10  9:05 ` [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain Zha Bin
  2020-02-11 11:16   ` Michael S. Tsirkin
@ 2020-02-12  7:40   ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  7:40 UTC (permalink / raw)
  To: Zha Bin
  Cc: linux-kernel, jasowang, slp, virtio-dev, qemu-devel, gerry,
	jing2.liu, chao.p.peng

On Mon, Feb 10, 2020 at 05:05:19PM +0800, Zha Bin wrote:
> From: Liu Jiang <gerry@linux.alibaba.com>
> 
> Create a generic irq domain for all architectures which
> supports virtio-mmio. The device offering VIRTIO_F_MMIO_MSI
> feature bit can use this irq domain.
> 
> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
> Co-developed-by: Zha Bin <zhabin@linux.alibaba.com>
> Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
> Co-developed-by: Jing Liu <jing2.liu@linux.intel.com>
> Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  drivers/base/platform-msi.c      |  4 +-
>  drivers/virtio/Kconfig           |  9 ++++
>  drivers/virtio/virtio_mmio_msi.h | 93 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/msi.h              |  1 +
>  4 files changed, 105 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/virtio/virtio_mmio_msi.h
> 
> diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
> index 8da314b..45752f1 100644
> --- a/drivers/base/platform-msi.c
> +++ b/drivers/base/platform-msi.c
> @@ -31,12 +31,11 @@ struct platform_msi_priv_data {
>  /* The devid allocator */
>  static DEFINE_IDA(platform_msi_devid_ida);
>  
> -#ifdef GENERIC_MSI_DOMAIN_OPS
>  /*
>   * Convert an msi_desc to a globaly unique identifier (per-device
>   * devid + msi_desc position in the msi_list).
>   */
> -static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
>  {
>  	u32 devid;
>  
> @@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
>  	return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
>  }
>  
> +#ifdef GENERIC_MSI_DOMAIN_OPS
>  static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
>  {
>  	arg->desc = desc;
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 078615c..551a9f7 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -84,6 +84,15 @@ config VIRTIO_MMIO
>  
>   	 If unsure, say N.
>  
> +config VIRTIO_MMIO_MSI
> +	bool "Memory-mapped virtio device MSI"
> +	depends on VIRTIO_MMIO && GENERIC_MSI_IRQ_DOMAIN && GENERIC_MSI_IRQ
> +	help
> +	 This allows device drivers to support msi interrupt handling for
> +	 virtio-mmio devices. It can improve performance greatly.
> +
> +	 If unsure, say N.
> +
>  config VIRTIO_MMIO_CMDLINE_DEVICES
>  	bool "Memory mapped virtio devices parameter parsing"
>  	depends on VIRTIO_MMIO
> diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
> new file mode 100644
> index 0000000..27cb2af
> --- /dev/null
> +++ b/drivers/virtio/virtio_mmio_msi.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +
> +#ifdef CONFIG_VIRTIO_MMIO_MSI
> +
> +#include <linux/msi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +
> +static irq_hw_number_t mmio_msi_hwirq;
> +static struct irq_domain *mmio_msi_domain;

So each instance of including this header will create its own
copy of the variables. That won't do the right thing.


> +
> +struct irq_domain *__weak arch_msi_root_irq_domain(void)
> +{
> +	return NULL;
> +}
> +
> +void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +}
> +
> +static void mmio_msi_mask_irq(struct irq_data *data)
> +{
> +}
> +
> +static void mmio_msi_unmask_irq(struct irq_data *data)
> +{
> +}
> +
> +static struct irq_chip mmio_msi_controller = {
> +	.name			= "VIRTIO-MMIO-MSI",
> +	.irq_mask		= mmio_msi_mask_irq,
> +	.irq_unmask		= mmio_msi_unmask_irq,
> +	.irq_ack		= irq_chip_ack_parent,
> +	.irq_retrigger		= irq_chip_retrigger_hierarchy,
> +	.irq_compose_msi_msg	= irq_msi_compose_msg,
> +	.flags			= IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static int mmio_msi_prepare(struct irq_domain *domain, struct device *dev,
> +				int nvec, msi_alloc_info_t *arg)
> +{
> +	memset(arg, 0, sizeof(*arg));
> +	return 0;
> +}
> +
> +static void mmio_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> +{
> +	mmio_msi_hwirq = platform_msi_calc_hwirq(desc);
> +}
> +
> +static irq_hw_number_t mmio_msi_get_hwirq(struct msi_domain_info *info,
> +					      msi_alloc_info_t *arg)
> +{
> +	return mmio_msi_hwirq;
> +}
> +
> +static struct msi_domain_ops mmio_msi_domain_ops = {
> +	.msi_prepare	= mmio_msi_prepare,
> +	.set_desc	= mmio_msi_set_desc,
> +	.get_hwirq	= mmio_msi_get_hwirq,
> +};
> +
> +static struct msi_domain_info mmio_msi_domain_info = {
> +	.flags          = MSI_FLAG_USE_DEF_DOM_OPS |
> +			  MSI_FLAG_USE_DEF_CHIP_OPS |
> +			  MSI_FLAG_ACTIVATE_EARLY,
> +	.ops            = &mmio_msi_domain_ops,
> +	.chip           = &mmio_msi_controller,
> +	.handler        = handle_edge_irq,
> +	.handler_name   = "edge",
> +};
> +
> +static inline void mmio_msi_create_irq_domain(void)
> +{
> +	struct fwnode_handle *fn;
> +	struct irq_domain *parent = arch_msi_root_irq_domain();
> +
> +	fn = irq_domain_alloc_named_fwnode("VIRTIO-MMIO-MSI");
> +	if (fn && parent) {
> +		mmio_msi_domain =
> +			platform_msi_create_irq_domain(fn,
> +				&mmio_msi_domain_info, parent);
> +		irq_domain_free_fwnode(fn);
> +	}
> +}

It does not look like anyone cleans up this domain.
So how does this work with a modular virtio mmio?
what happens when you unload the module?


> +#else
> +static inline void mmio_msi_create_irq_domain(void) {}
> +#endif
> +
> +#endif
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 8ad679e..ee5f566 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>  void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
>  			      unsigned int nvec);
>  void *platform_msi_get_host_data(struct irq_domain *domain);
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
>  #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
>  
>  #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-12  3:39     ` Jason Wang
@ 2020-02-12  8:18       ` Michael S. Tsirkin
  2020-02-12  8:53         ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  8:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zha Bin, virtio-dev, slp, jing2.liu, linux-kernel, qemu-devel,
	chao.p.peng, gerry

On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
> 
> On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
> > On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > 
> > > The standard virtio-mmio devices use notification register to signal
> > > backend. This will cause vmexits and slow down the performance when we
> > > passthrough the virtio-mmio devices to guest virtual machines.
> > > We proposed to update virtio over MMIO spec to add the per-queue
> > > notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> > > configure notify location for each queue.
> > > 
> > > [1]https://lkml.org/lkml/2020/1/21/31
> > > 
> > > Signed-off-by: Liu Jiang<gerry@linux.alibaba.com>
> > > Co-developed-by: Zha Bin<zhabin@linux.alibaba.com>
> > > Signed-off-by: Zha Bin<zhabin@linux.alibaba.com>
> > > Co-developed-by: Jing Liu<jing2.liu@linux.intel.com>
> > > Signed-off-by: Jing Liu<jing2.liu@linux.intel.com>
> > > Co-developed-by: Chao Peng<chao.p.peng@linux.intel.com>
> > > Signed-off-by: Chao Peng<chao.p.peng@linux.intel.com>
> > Hmm. Any way to make this static so we don't need
> > base and multiplier?
> 
> 
> E.g page per vq?
> 
> Thanks

Problem is, is page size well defined enough?
Are there cases where guest and host page sizes differ?
I suspect there might be.

But I also think this whole patch is unproven. Is someone actually
working on QEMU code to support pass-trough of virtio-pci
as virtio-mmio for nested guests? What's the performance
gain like?

-- 
MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-12  8:18       ` Michael S. Tsirkin
@ 2020-02-12  8:53         ` Jason Wang
  2020-02-12  9:33           ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-12  8:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zha Bin, virtio-dev, slp, jing2.liu, linux-kernel, qemu-devel,
	chao.p.peng, gerry


On 2020/2/12 下午4:18, Michael S. Tsirkin wrote:
> On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>> On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
>>> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>
>>>> The standard virtio-mmio devices use notification register to signal
>>>> backend. This will cause vmexits and slow down the performance when we
>>>> passthrough the virtio-mmio devices to guest virtual machines.
>>>> We proposed to update virtio over MMIO spec to add the per-queue
>>>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>>>> configure notify location for each queue.
>>>>
>>>> [1]https://lkml.org/lkml/2020/1/21/31
>>>>
>>>> Signed-off-by: Liu Jiang<gerry@linux.alibaba.com>
>>>> Co-developed-by: Zha Bin<zhabin@linux.alibaba.com>
>>>> Signed-off-by: Zha Bin<zhabin@linux.alibaba.com>
>>>> Co-developed-by: Jing Liu<jing2.liu@linux.intel.com>
>>>> Signed-off-by: Jing Liu<jing2.liu@linux.intel.com>
>>>> Co-developed-by: Chao Peng<chao.p.peng@linux.intel.com>
>>>> Signed-off-by: Chao Peng<chao.p.peng@linux.intel.com>
>>> Hmm. Any way to make this static so we don't need
>>> base and multiplier?
>> E.g page per vq?
>>
>> Thanks
> Problem is, is page size well defined enough?
> Are there cases where guest and host page sizes differ?
> I suspect there might be.


Right, so it looks better to keep base and multiplier, e.g for vDPA.


>
> But I also think this whole patch is unproven. Is someone actually
> working on QEMU code to support pass-trough of virtio-pci
> as virtio-mmio for nested guests? What's the performance
> gain like?


I don't know.

Thanks


>
> -- MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-11 14:00                     ` Michael S. Tsirkin
@ 2020-02-12  9:03                       ` Jason Wang
  2020-02-12  9:15                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-12  9:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Zha Bin, slp, Liu, Jing2, linux-kernel, qemu-devel,
	chao.p.peng, gerry


On 2020/2/11 下午10:00, Michael S. Tsirkin wrote:
>> Yes, it can work but it may bring extra effort when you want to mask a
>> vector which is was shared by a lot of queues.
>>
>> Thanks
>>
> masking should be per vq too.
>
> -- 


Yes, but if the vector is shard by e.g 16 queues, then all those 
virtqueues needs to be masked which is expensive.

Thanks



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
       [not found]             ` <4c19292f-9d25-a859-3dde-6dd5a03fdf0b@linux.intel.com>
  2020-02-12  7:33               ` Michael S. Tsirkin
@ 2020-02-12  9:06               ` Jason Wang
  2020-02-12  9:16                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-12  9:06 UTC (permalink / raw)
  To: Liu, Jing2, Zha Bin, linux-kernel
  Cc: virtio-dev, slp, mst, qemu-devel, chao.p.peng, gerry


On 2020/2/12 上午11:54, Liu, Jing2 wrote:
>
>
> On 2/11/2020 3:40 PM, Jason Wang wrote:
>>
>> On 2020/2/11 下午2:02, Liu, Jing2 wrote:
>>>
>>>
>>> On 2/11/2020 12:02 PM, Jason Wang wrote:
>>>>
>>>> On 2020/2/11 上午11:35, Liu, Jing2 wrote:
>>>>>
>>>>> On 2/11/2020 11:17 AM, Jason Wang wrote:
>>>>>>
>>>>>> On 2020/2/10 下午5:05, Zha Bin wrote:
>>>>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>>>>
>>>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage 
>>>>>>> of using
>>>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>>>> cloud. The standard virtio over MMIO transport layer only 
>>>>>>> supports one
>>>>>>> legacy interrupt, which is much heavier than virtio over PCI 
>>>>>>> transport
>>>>>>> layer using MSI. Legacy interrupt has long work path and causes 
>>>>>>> specific
>>>>>>> VMExits in following cases, which would considerably slow down the
>>>>>>> performance:
>>>>>>>
>>>>>>> 1) read interrupt status register
>>>>>>> 2) update interrupt status register
>>>>>>> 3) write IOAPIC EOI register
>>>>>>>
>>>>>>> We proposed to add MSI support for virtio over MMIO via new feature
>>>>>>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>>>>>>
>>>>>>> With the VIRTIO_F_MMIO_MSI feature bit supported, the 
>>>>>>> virtio-mmio MSI
>>>>>>> uses msi_sharing[1] to indicate the event and vector mapping.
>>>>>>> Bit 1 is 0: device uses non-sharing and fixed vector per event 
>>>>>>> mapping.
>>>>>>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>>>>>>
>>>>>>
>>>>>> I believe dynamic mapping should cover the case of fixed vector?
>>>>>>
>>>>> Actually this bit *aims* for msi sharing or msi non-sharing.
>>>>>
>>>>> It means, when msi sharing bit is 1, device doesn't want vector 
>>>>> per queue
>>>>>
>>>>> (it wants msi vector sharing as name) and doesn't want a high 
>>>>> interrupt rate.
>>>>>
>>>>> So driver turns to !per_vq_vectors and has to do dynamical mapping.
>>>>>
>>>>> So they are opposite not superset.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Jing
>>>>
>>>>
>>>> I think you need add more comments on the command.
>>>>
>>>> E.g if I want to map vector 0 to queue 1, how do I need to do?
>>>>
>>>> write(1, queue_sel);
>>>> write(0, vector_sel);
>>>
>>> That's true. Besides, two commands are used for msi sharing mode,
>>>
>>> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
>>>
>>> "To set up the event and vector mapping for MSI sharing mode, driver 
>>> SHOULD write a valid MsiVecSel followed by 
>>> VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command 
>>> to map the configuration change/selected queue events respectively.  
>>> " (See spec patch 5/5)
>>>
>>> So if driver detects the msi sharing mode, when it does setup vq, 
>>> writes the queue_sel (this already exists in setup vq), vector sel 
>>> and then MAP_QUEUE command to do the queue event mapping.
>>>
>>
>> So actually the per vq msix could be done through this. 
>
> Right, per vq msix can also be mapped by the 2 commands if we want.
>
> The current design benefits for those devices requesting per vq msi 
> that driver
>
> doesn't have to map during setup each queue,
>
> since we define the relationship by default.
>

Well since you've defined the dynamic mapping, having some "default" 
mapping won't help to reduce the complexity but increase it.


>
>> I don't get why you need to introduce MSI_SHARING_MASK which is the 
>> charge of driver instead of device. 
>
> MSI_SHARING_MASK is just for identifying the msi_sharing bit in 
> readl(MsiState) (0x0c4). The device tells whether it wants msi_sharing.
>
> MsiState register: R
>
> le32 {
>     msi_enabled : 1;
>     msi_sharing: 1;
>     reserved : 30;
> };
>

The question is why device want such information.


>
>> The interrupt rate should have no direct relationship with whether it 
>> has been shared or not.
>
>>
>> Btw, you introduce mask/unmask without pending, how to deal with the 
>> lost interrupt during the masking then?
>>
>>
>>> For msi non-sharing mode, no special action is needed because we 
>>> make the rule of per_vq_vector and fixed relationship.
>>>
>>> Correct me if this is not that clear for spec/code comments.
>>>
>>
>> The ABI is not as straightforward as PCI did. Why not just reuse the 
>> PCI layout?
>>
>> E.g having
>>
>> queue_sel
>> queue_msix_vector
>> msix_config
>>
>> for configuring map between msi vector and queues/config
>
> Thanks for the advice. :)
>
> Actually when looking into pci, the queue_msix_vector/msix_config is 
> the msi vector index, which is the same as the mmio register MsiVecSel 
> (0x0d0).
>
> So we don't introduce two extra registers for mapping even in sharing 
> mode.
>
> What do you think?
>

I'm not sure I get the point, but I still prefer the separate vector_sel 
from queue_msix_vector.

Btw, Michael propose per vq registers which could also work.

Thanks


>
>>
>> Then
>>
>> vector_sel
>> address
>> data
>> pending
>> mask
>> unmask
>>
>> for configuring msi table?
>
> PCI-like msix table is not introduced to device and instead simply use 
> commands to tell the mask/configure/enable.
>
> Thanks!
>
> Jing
>
>>
>> Thanks
>>
>>
>>> Thanks!
>>>
>>> Jing
>>>
>>>
>>>>
>>>> ?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------- 
>>>>>>
>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>> For additional commands, e-mail: 
>>>>>> virtio-dev-help@lists.oasis-open.org
>>>>>>
>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-12  9:03                       ` Jason Wang
@ 2020-02-12  9:15                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  9:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, Zha Bin, slp, Liu, Jing2, linux-kernel, qemu-devel,
	chao.p.peng, gerry

On Wed, Feb 12, 2020 at 05:03:00PM +0800, Jason Wang wrote:
> 
> On 2020/2/11 下午10:00, Michael S. Tsirkin wrote:
> > > Yes, it can work but it may bring extra effort when you want to mask a
> > > vector which is was shared by a lot of queues.
> > > 
> > > Thanks
> > > 
> > masking should be per vq too.
> > 
> > -- 
> 
> 
> Yes, but if the vector is shard by e.g 16 queues, then all those virtqueues
> needs to be masked which is expensive.
> 
> Thanks

I think that's ok - masking is rare. in fact if vectors can be
changed atomically I'm no longer sure they are needed,
maybe a destructive "disable" which can lose interrupts
is enough.

-- 
MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-12  9:06               ` Jason Wang
@ 2020-02-12  9:16                 ` Michael S. Tsirkin
  2020-02-13  3:40                   ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  9:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Liu, Jing2, Zha Bin, linux-kernel, virtio-dev, slp, qemu-devel,
	chao.p.peng, gerry

On Wed, Feb 12, 2020 at 05:06:52PM +0800, Jason Wang wrote:
> 
> On 2020/2/12 上午11:54, Liu, Jing2 wrote:
> > 
> > 
> > On 2/11/2020 3:40 PM, Jason Wang wrote:
> > > 
> > > On 2020/2/11 下午2:02, Liu, Jing2 wrote:
> > > > 
> > > > 
> > > > On 2/11/2020 12:02 PM, Jason Wang wrote:
> > > > > 
> > > > > On 2020/2/11 上午11:35, Liu, Jing2 wrote:
> > > > > > 
> > > > > > On 2/11/2020 11:17 AM, Jason Wang wrote:
> > > > > > > 
> > > > > > > On 2020/2/10 下午5:05, Zha Bin wrote:
> > > > > > > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > > > > > > 
> > > > > > > > Userspace VMMs (e.g. Qemu microvm, Firecracker)
> > > > > > > > take advantage of using
> > > > > > > > virtio over mmio devices as a lightweight machine model for modern
> > > > > > > > cloud. The standard virtio over MMIO transport
> > > > > > > > layer only supports one
> > > > > > > > legacy interrupt, which is much heavier than
> > > > > > > > virtio over PCI transport
> > > > > > > > layer using MSI. Legacy interrupt has long work
> > > > > > > > path and causes specific
> > > > > > > > VMExits in following cases, which would considerably slow down the
> > > > > > > > performance:
> > > > > > > > 
> > > > > > > > 1) read interrupt status register
> > > > > > > > 2) update interrupt status register
> > > > > > > > 3) write IOAPIC EOI register
> > > > > > > > 
> > > > > > > > We proposed to add MSI support for virtio over MMIO via new feature
> > > > > > > > bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> > > > > > > > 
> > > > > > > > With the VIRTIO_F_MMIO_MSI feature bit
> > > > > > > > supported, the virtio-mmio MSI
> > > > > > > > uses msi_sharing[1] to indicate the event and vector mapping.
> > > > > > > > Bit 1 is 0: device uses non-sharing and fixed
> > > > > > > > vector per event mapping.
> > > > > > > > Bit 1 is 1: device uses sharing mode and dynamic mapping.
> > > > > > > 
> > > > > > > 
> > > > > > > I believe dynamic mapping should cover the case of fixed vector?
> > > > > > > 
> > > > > > Actually this bit *aims* for msi sharing or msi non-sharing.
> > > > > > 
> > > > > > It means, when msi sharing bit is 1, device doesn't want
> > > > > > vector per queue
> > > > > > 
> > > > > > (it wants msi vector sharing as name) and doesn't want a
> > > > > > high interrupt rate.
> > > > > > 
> > > > > > So driver turns to !per_vq_vectors and has to do dynamical mapping.
> > > > > > 
> > > > > > So they are opposite not superset.
> > > > > > 
> > > > > > Thanks!
> > > > > > 
> > > > > > Jing
> > > > > 
> > > > > 
> > > > > I think you need add more comments on the command.
> > > > > 
> > > > > E.g if I want to map vector 0 to queue 1, how do I need to do?
> > > > > 
> > > > > write(1, queue_sel);
> > > > > write(0, vector_sel);
> > > > 
> > > > That's true. Besides, two commands are used for msi sharing mode,
> > > > 
> > > > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.
> > > > 
> > > > "To set up the event and vector mapping for MSI sharing mode,
> > > > driver SHOULD write a valid MsiVecSel followed by
> > > > VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE
> > > > command to map the configuration change/selected queue events
> > > > respectively.  " (See spec patch 5/5)
> > > > 
> > > > So if driver detects the msi sharing mode, when it does setup
> > > > vq, writes the queue_sel (this already exists in setup vq),
> > > > vector sel and then MAP_QUEUE command to do the queue event
> > > > mapping.
> > > > 
> > > 
> > > So actually the per vq msix could be done through this.
> > 
> > Right, per vq msix can also be mapped by the 2 commands if we want.
> > 
> > The current design benefits for those devices requesting per vq msi that
> > driver
> > 
> > doesn't have to map during setup each queue,
> > 
> > since we define the relationship by default.
> > 
> 
> Well since you've defined the dynamic mapping, having some "default" mapping
> won't help to reduce the complexity but increase it.
> 
> 
> > 
> > > I don't get why you need to introduce MSI_SHARING_MASK which is the
> > > charge of driver instead of device.
> > 
> > MSI_SHARING_MASK is just for identifying the msi_sharing bit in
> > readl(MsiState) (0x0c4). The device tells whether it wants msi_sharing.
> > 
> > MsiState register: R
> > 
> > le32 {
> >     msi_enabled : 1;
> >     msi_sharing: 1;
> >     reserved : 30;
> > };
> > 
> 
> The question is why device want such information.
> 
> 
> > 
> > > The interrupt rate should have no direct relationship with whether
> > > it has been shared or not.
> > 
> > > 
> > > Btw, you introduce mask/unmask without pending, how to deal with the
> > > lost interrupt during the masking then?
> > > 
> > > 
> > > > For msi non-sharing mode, no special action is needed because we
> > > > make the rule of per_vq_vector and fixed relationship.
> > > > 
> > > > Correct me if this is not that clear for spec/code comments.
> > > > 
> > > 
> > > The ABI is not as straightforward as PCI did. Why not just reuse the
> > > PCI layout?
> > > 
> > > E.g having
> > > 
> > > queue_sel
> > > queue_msix_vector
> > > msix_config
> > > 
> > > for configuring map between msi vector and queues/config
> > 
> > Thanks for the advice. :)
> > 
> > Actually when looking into pci, the queue_msix_vector/msix_config is the
> > msi vector index, which is the same as the mmio register MsiVecSel
> > (0x0d0).
> > 
> > So we don't introduce two extra registers for mapping even in sharing
> > mode.
> > 
> > What do you think?
> > 
> 
> I'm not sure I get the point, but I still prefer the separate vector_sel
> from queue_msix_vector.
> 
> Btw, Michael propose per vq registers which could also work.
> 
> Thanks
> 

Right and I'd even ask a question: do we need shared MSI at all?
Is it somehow better than legacy interrupt? And why?
Performance numbers please.

> > 
> > > 
> > > Then
> > > 
> > > vector_sel
> > > address
> > > data
> > > pending
> > > mask
> > > unmask
> > > 
> > > for configuring msi table?
> > 
> > PCI-like msix table is not introduced to device and instead simply use
> > commands to tell the mask/configure/enable.
> > 
> > Thanks!
> > 
> > Jing
> > 
> > > 
> > > Thanks
> > > 
> > > 
> > > > Thanks!
> > > > 
> > > > Jing
> > > > 
> > > > 
> > > > > 
> > > > > ?
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Thanks
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > ---------------------------------------------------------------------
> > > > > > > 
> > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > For additional commands, e-mail:
> > > > > > > virtio-dev-help@lists.oasis-open.org
> > > > > > > 
> > > > > > 
> > > > > 
> > > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-12  8:53         ` Jason Wang
@ 2020-02-12  9:33           ` Jason Wang
  2020-02-12  9:55             ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2020-02-12  9:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zha Bin, virtio-dev, slp, jing2.liu, linux-kernel, qemu-devel,
	chao.p.peng, gerry


On 2020/2/12 下午4:53, Jason Wang wrote:
>
> On 2020/2/12 下午4:18, Michael S. Tsirkin wrote:
>> On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>>> On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>>
>>>>> The standard virtio-mmio devices use notification register to signal
>>>>> backend. This will cause vmexits and slow down the performance 
>>>>> when we
>>>>> passthrough the virtio-mmio devices to guest virtual machines.
>>>>> We proposed to update virtio over MMIO spec to add the per-queue
>>>>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>>>>> configure notify location for each queue.
>>>>>
>>>>> [1]https://lkml.org/lkml/2020/1/21/31
>>>>>
>>>>> Signed-off-by: Liu Jiang<gerry@linux.alibaba.com>
>>>>> Co-developed-by: Zha Bin<zhabin@linux.alibaba.com>
>>>>> Signed-off-by: Zha Bin<zhabin@linux.alibaba.com>
>>>>> Co-developed-by: Jing Liu<jing2.liu@linux.intel.com>
>>>>> Signed-off-by: Jing Liu<jing2.liu@linux.intel.com>
>>>>> Co-developed-by: Chao Peng<chao.p.peng@linux.intel.com>
>>>>> Signed-off-by: Chao Peng<chao.p.peng@linux.intel.com>
>>>> Hmm. Any way to make this static so we don't need
>>>> base and multiplier?
>>> E.g page per vq?
>>>
>>> Thanks
>> Problem is, is page size well defined enough?
>> Are there cases where guest and host page sizes differ?
>> I suspect there might be.
>
>
> Right, so it looks better to keep base and multiplier, e.g for vDPA.
>
>
>>
>> But I also think this whole patch is unproven. Is someone actually
>> working on QEMU code to support pass-trough of virtio-pci
>> as virtio-mmio for nested guests? What's the performance
>> gain like?
>
>
> I don't know.
>
> Thanks


Btw, I think there's no need for a nested environment to test. Current 
eventfd hook to MSIX should still work for MMIO.

Thanks



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-12  9:33           ` Jason Wang
@ 2020-02-12  9:55             ` Michael S. Tsirkin
  2020-02-13  3:38               ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2020-02-12  9:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zha Bin, virtio-dev, slp, jing2.liu, linux-kernel, qemu-devel,
	chao.p.peng, gerry

On Wed, Feb 12, 2020 at 05:33:06PM +0800, Jason Wang wrote:
> 
> On 2020/2/12 下午4:53, Jason Wang wrote:
> > 
> > On 2020/2/12 下午4:18, Michael S. Tsirkin wrote:
> > > On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
> > > > On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
> > > > > On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> > > > > > From: Liu Jiang<gerry@linux.alibaba.com>
> > > > > > 
> > > > > > The standard virtio-mmio devices use notification register to signal
> > > > > > backend. This will cause vmexits and slow down the
> > > > > > performance when we
> > > > > > passthrough the virtio-mmio devices to guest virtual machines.
> > > > > > We proposed to update virtio over MMIO spec to add the per-queue
> > > > > > notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> > > > > > configure notify location for each queue.
> > > > > > 
> > > > > > [1]https://lkml.org/lkml/2020/1/21/31
> > > > > > 
> > > > > > Signed-off-by: Liu Jiang<gerry@linux.alibaba.com>
> > > > > > Co-developed-by: Zha Bin<zhabin@linux.alibaba.com>
> > > > > > Signed-off-by: Zha Bin<zhabin@linux.alibaba.com>
> > > > > > Co-developed-by: Jing Liu<jing2.liu@linux.intel.com>
> > > > > > Signed-off-by: Jing Liu<jing2.liu@linux.intel.com>
> > > > > > Co-developed-by: Chao Peng<chao.p.peng@linux.intel.com>
> > > > > > Signed-off-by: Chao Peng<chao.p.peng@linux.intel.com>
> > > > > Hmm. Any way to make this static so we don't need
> > > > > base and multiplier?
> > > > E.g page per vq?
> > > > 
> > > > Thanks
> > > Problem is, is page size well defined enough?
> > > Are there cases where guest and host page sizes differ?
> > > I suspect there might be.
> > 
> > 
> > Right, so it looks better to keep base and multiplier, e.g for vDPA.
> > 
> > 
> > > 
> > > But I also think this whole patch is unproven. Is someone actually
> > > working on QEMU code to support pass-trough of virtio-pci
> > > as virtio-mmio for nested guests? What's the performance
> > > gain like?
> > 
> > 
> > I don't know.
> > 
> > Thanks
> 
> 
> Btw, I think there's no need for a nested environment to test. Current
> eventfd hook to MSIX should still work for MMIO.
> 
> Thanks


Oh yes it's the wildcard thingy but how much extra performance does one get
from it with MMIO? A couple % might not be worth the trouble for MMIO.

-- 
MST


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue
  2020-02-12  9:55             ` Michael S. Tsirkin
@ 2020-02-13  3:38               ` Jason Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Jason Wang @ 2020-02-13  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Zha Bin, slp, jing2.liu, linux-kernel, qemu-devel,
	chao.p.peng, gerry


On 2020/2/12 下午5:55, Michael S. Tsirkin wrote:
> On Wed, Feb 12, 2020 at 05:33:06PM +0800, Jason Wang wrote:
>> On 2020/2/12 下午4:53, Jason Wang wrote:
>>> On 2020/2/12 下午4:18, Michael S. Tsirkin wrote:
>>>> On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>>>>> On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
>>>>>> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>>>>>>> From: Liu Jiang<gerry@linux.alibaba.com>
>>>>>>>
>>>>>>> The standard virtio-mmio devices use notification register to signal
>>>>>>> backend. This will cause vmexits and slow down the
>>>>>>> performance when we
>>>>>>> passthrough the virtio-mmio devices to guest virtual machines.
>>>>>>> We proposed to update virtio over MMIO spec to add the per-queue
>>>>>>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>>>>>>> configure notify location for each queue.
>>>>>>>
>>>>>>> [1]https://lkml.org/lkml/2020/1/21/31
>>>>>>>
>>>>>>> Signed-off-by: Liu Jiang<gerry@linux.alibaba.com>
>>>>>>> Co-developed-by: Zha Bin<zhabin@linux.alibaba.com>
>>>>>>> Signed-off-by: Zha Bin<zhabin@linux.alibaba.com>
>>>>>>> Co-developed-by: Jing Liu<jing2.liu@linux.intel.com>
>>>>>>> Signed-off-by: Jing Liu<jing2.liu@linux.intel.com>
>>>>>>> Co-developed-by: Chao Peng<chao.p.peng@linux.intel.com>
>>>>>>> Signed-off-by: Chao Peng<chao.p.peng@linux.intel.com>
>>>>>> Hmm. Any way to make this static so we don't need
>>>>>> base and multiplier?
>>>>> E.g page per vq?
>>>>>
>>>>> Thanks
>>>> Problem is, is page size well defined enough?
>>>> Are there cases where guest and host page sizes differ?
>>>> I suspect there might be.
>>>
>>> Right, so it looks better to keep base and multiplier, e.g for vDPA.
>>>
>>>
>>>> But I also think this whole patch is unproven. Is someone actually
>>>> working on QEMU code to support pass-trough of virtio-pci
>>>> as virtio-mmio for nested guests? What's the performance
>>>> gain like?
>>>
>>> I don't know.
>>>
>>> Thanks
>>
>> Btw, I think there's no need for a nested environment to test. Current
>> eventfd hook to MSIX should still work for MMIO.
>>
>> Thanks
>
> Oh yes it's the wildcard thingy but how much extra performance does one get
> from it with MMIO? A couple % might not be worth the trouble for MMIO.


The cover letter have some numbers but I'm not sure whether or not it 
was measured by vhost or other which needs some clarification.

Thanks


>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support
  2020-02-12  9:16                 ` Michael S. Tsirkin
@ 2020-02-13  3:40                   ` Jason Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Jason Wang @ 2020-02-13  3:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Liu, Jing2, Zha Bin, linux-kernel, virtio-dev, slp, qemu-devel,
	chao.p.peng, gerry


On 2020/2/12 下午5:16, Michael S. Tsirkin wrote:
>>> Thanks for the advice.:)
>>>
>>> Actually when looking into pci, the queue_msix_vector/msix_config is the
>>> msi vector index, which is the same as the mmio register MsiVecSel
>>> (0x0d0).
>>>
>>> So we don't introduce two extra registers for mapping even in sharing
>>> mode.
>>>
>>> What do you think?
>>>
>> I'm not sure I get the point, but I still prefer the separate vector_sel
>> from queue_msix_vector.
>>
>> Btw, Michael propose per vq registers which could also work.
>>
>> Thanks
>>
> Right and I'd even ask a question: do we need shared MSI at all?


I guess it is still needed at least for the current virtio code. 
Technically we may have thousands queues.

Thanks


> Is it somehow better than legacy interrupt? And why?
> Performance numbers please.
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2020-02-13  3:40 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10  9:05 [PATCH v2 0/5] virtio mmio specification enhancement Zha Bin
2020-02-10  9:05 ` [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue Zha Bin
2020-02-11 10:50   ` Michael S. Tsirkin
2020-02-11 11:33   ` Michael S. Tsirkin
2020-02-12  3:39     ` Jason Wang
2020-02-12  8:18       ` Michael S. Tsirkin
2020-02-12  8:53         ` Jason Wang
2020-02-12  9:33           ` Jason Wang
2020-02-12  9:55             ` Michael S. Tsirkin
2020-02-13  3:38               ` Jason Wang
2020-02-10  9:05 ` [PATCH v2 2/5] virtio-mmio: refactor common functionality Zha Bin
2020-02-11 11:19   ` Michael S. Tsirkin
2020-02-12  2:58     ` [virtio-dev] " Liu, Jing2
2020-02-12  7:29       ` Michael S. Tsirkin
2020-02-10  9:05 ` [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain Zha Bin
2020-02-11 11:16   ` Michael S. Tsirkin
2020-02-12  7:40   ` Michael S. Tsirkin
2020-02-10  9:05 ` [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support Zha Bin
2020-02-11  3:17   ` Jason Wang
2020-02-11  3:35     ` [virtio-dev] " Liu, Jing2
2020-02-11  4:02       ` Jason Wang
     [not found]         ` <5522f205-207b-b012-6631-3cc77dde3bfe@linux.intel.com>
2020-02-11  7:40           ` Jason Wang
2020-02-11 11:58             ` Michael S. Tsirkin
2020-02-11 12:04               ` Jason Wang
2020-02-11 12:08                 ` Michael S. Tsirkin
2020-02-11 12:18                   ` Jason Wang
2020-02-11 14:00                     ` Michael S. Tsirkin
2020-02-12  9:03                       ` Jason Wang
2020-02-12  9:15                         ` Michael S. Tsirkin
     [not found]             ` <4c19292f-9d25-a859-3dde-6dd5a03fdf0b@linux.intel.com>
2020-02-12  7:33               ` Michael S. Tsirkin
2020-02-12  9:06               ` Jason Wang
2020-02-12  9:16                 ` Michael S. Tsirkin
2020-02-13  3:40                   ` Jason Wang
2020-02-11 11:21       ` Michael S. Tsirkin
2020-02-11 11:11   ` Michael S. Tsirkin
2020-02-10  9:05 ` [PATCH v2 5/5] x86: virtio-mmio: support virtio-mmio with MSI for x86 Zha Bin
2020-02-11 11:14   ` Michael S. Tsirkin
2020-02-10 11:44 ` [PATCH v2 0/5] virtio mmio specification enhancement Michael S. Tsirkin
2020-02-11 16:05   ` Chao Peng
2020-02-11 10:57     ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).