KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V4 0/6] IRQ offloading for vDPA
@ 2020-07-28  4:23 Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 1/6] vhost: introduce vhost_vring_call Zhu Lingshan
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:23 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

This series intends to implement IRQ offloading for
vhost_vdpa.

By the feat of irq forwarding facilities like posted
interrupt on X86, irq bypass can  help deliver
interrupts to vCPU directly.

vDPA devices have dedicated hardware backends like VFIO
pass-throughed devices. So it would be possible to setup
irq offloading(irq bypass) for vDPA devices and gain
performance improvements.

In my testing, with this feature, we can save 0.1ms
in a ping between two VFs on average.
changes from V3:
(1)removed vDPA irq allocate/free helpers in vDPA core.
(2)add a new function get_vq_irq() in struct vdpa_config_ops,
upper layer driver can use this function to: A. query the
irq numbner of a vq. B. detect whether a vq is enabled.
(3)implement get_vq_irq() in ifcvf driver.
(4)in vhost_vdpa, set_status() will setup irq offloading when
setting DRIVER_OK, and unsetup when receive !DRIVER_OK.
(5)minor improvements.

changes from V2:
(1)rename struct vhost_call_ctx to vhost_vring_call
(2)add kvm_arch_end_assignment() in del_producer()
code path
(3)rename vDPA helpers to vdpa_devm_request_irq()
and vdpa_devm_free_irq(). Add better comments
for them.
(4)better comments for setup_vq_irq() and
unsetup_vq_irq()
(5)In vDPA VHOST_SET_VRING_CALL, will call vhost_vdpa_update_vq_irq()
without checking producer.irq, move this check into
vhost_vdpa_update_vq_irq(), so that get advantage of the spinlock.
(6)Add a function vhost_vdpa_clean_irq(), this function will unregister
the producer of vqs when vhost_vdpa_release(). This is safe
for control vq.
(7) minor improvements

changes from V1:
(1)dropped vfio changes.
(3)removed KVM_HVAE_IRQ_BYPASS checks
(4)locking fixes
(5)simplified vhost_vdpa_updat


Zhu Lingshan (6):
  vhost: introduce vhost_vring_call
  kvm: detect assigned device via irqbypass manager
  vDPA: add get_vq_irq() in vdpa_config_ops
  vhost_vdpa: implement IRQ offloading in vhost_vdpa
  ifcvf: implement vdpa_config_ops.get_vq_irq()
  irqbypass: do not start cons/prod when failed connect

 arch/x86/kvm/x86.c              | 12 ++++-
 drivers/vdpa/ifcvf/ifcvf_main.c | 18 ++++++-
 drivers/vhost/Kconfig           |  1 +
 drivers/vhost/vdpa.c            | 83 +++++++++++++++++++++++++++++++--
 drivers/vhost/vhost.c           | 22 ++++++---
 drivers/vhost/vhost.h           |  9 +++-
 include/linux/vdpa.h            |  6 +++
 virt/lib/irqbypass.c            | 16 ++++---
 8 files changed, 147 insertions(+), 20 deletions(-)

-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 1/6] vhost: introduce vhost_vring_call
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
@ 2020-07-28  4:24 ` Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 2/6] kvm: detect assigned device via irqbypass manager Zhu Lingshan
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:24 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

This commit introduces struct vhost_vring_call which replaced
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vdpa.c  |  4 ++--
 drivers/vhost/vhost.c | 22 ++++++++++++++++------
 drivers/vhost/vhost.h |  9 ++++++++-
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..df3cf386b0cd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -96,7 +96,7 @@ static void handle_vq_kick(struct vhost_work *work)
 static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
 {
 	struct vhost_virtqueue *vq = private;
-	struct eventfd_ctx *call_ctx = vq->call_ctx;
+	struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
 
 	if (call_ctx)
 		eventfd_signal(call_ctx, 1);
@@ -382,7 +382,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 		break;
 
 	case VHOST_SET_VRING_CALL:
-		if (vq->call_ctx) {
+		if (vq->call_ctx.ctx) {
 			cb.callback = vhost_vdpa_virtqueue_cb;
 			cb.private = vq;
 		} else {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d7b8df3edffc..9f1a845a9302 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -298,6 +298,13 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 		__vhost_vq_meta_reset(d->vqs[i]);
 }
 
+static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)
+{
+	call_ctx->ctx = NULL;
+	memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
+	spin_lock_init(&call_ctx->ctx_lock);
+}
+
 static void vhost_vq_reset(struct vhost_dev *dev,
 			   struct vhost_virtqueue *vq)
 {
@@ -319,13 +326,13 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->log_base = NULL;
 	vq->error_ctx = NULL;
 	vq->kick = NULL;
-	vq->call_ctx = NULL;
 	vq->log_ctx = NULL;
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	vhost_vring_call_reset(&vq->call_ctx);
 	__vhost_vq_meta_reset(vq);
 }
 
@@ -685,8 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
 		if (dev->vqs[i]->kick)
 			fput(dev->vqs[i]->kick);
-		if (dev->vqs[i]->call_ctx)
-			eventfd_ctx_put(dev->vqs[i]->call_ctx);
+		if (dev->vqs[i]->call_ctx.ctx)
+			eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
 		vhost_vq_reset(dev, dev->vqs[i]);
 	}
 	vhost_dev_free_iovecs(dev);
@@ -1629,7 +1636,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 			r = PTR_ERR(ctx);
 			break;
 		}
-		swap(ctx, vq->call_ctx);
+
+		spin_lock(&vq->call_ctx.ctx_lock);
+		swap(ctx, vq->call_ctx.ctx);
+		spin_unlock(&vq->call_ctx.ctx_lock);
 		break;
 	case VHOST_SET_VRING_ERR:
 		if (copy_from_user(&f, argp, sizeof f)) {
@@ -2440,8 +2450,8 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	/* Signal the Guest tell them we used something up. */
-	if (vq->call_ctx && vhost_notify(dev, vq))
-		eventfd_signal(vq->call_ctx, 1);
+	if (vq->call_ctx.ctx && vhost_notify(dev, vq))
+		eventfd_signal(vq->call_ctx.ctx, 1);
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index c8e96a095d3b..38eb1aa3b68d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,6 +13,7 @@
 #include <linux/virtio_ring.h>
 #include <linux/atomic.h>
 #include <linux/vhost_iotlb.h>
+#include <linux/irqbypass.h>
 
 struct vhost_work;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -60,6 +61,12 @@ enum vhost_uaddr_type {
 	VHOST_NUM_ADDRS = 3,
 };
 
+struct vhost_vring_call {
+	struct eventfd_ctx *ctx;
+	struct irq_bypass_producer producer;
+	spinlock_t ctx_lock;
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -72,7 +79,7 @@ struct vhost_virtqueue {
 	vring_used_t __user *used;
 	const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
-	struct eventfd_ctx *call_ctx;
+	struct vhost_vring_call call_ctx;
 	struct eventfd_ctx *error_ctx;
 	struct eventfd_ctx *log_ctx;
 
-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 2/6] kvm: detect assigned device via irqbypass manager
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 1/6] vhost: introduce vhost_vring_call Zhu Lingshan
@ 2020-07-28  4:24 ` Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops Zhu Lingshan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:24 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

vDPA devices has dedicated backed hardware like
passthrough-ed devices. Then it is possible to setup irq
offloading to vCPU for vDPA devices. Thus this patch tries to
manipulated assigned device counters by
kvm_arch_start/end_assignment() in irqbypass manager, so that
assigned devices could be detected in update_pi_irte()

We will increase/decrease the assigned device counter in kvm/x86.
Both vDPA and VFIO would go through this code path.

Only X86 uses these counters and kvm_arch_start/end_assignment(),
so this code path only affect x86 for now.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 arch/x86/kvm/x86.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..76a2e7fd18c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10630,11 +10630,17 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(cons, struct kvm_kernel_irqfd, consumer);
+	int ret;
 
 	irqfd->producer = prod;
+	kvm_arch_start_assignment(irqfd->kvm);
+	ret = kvm_x86_ops.update_pi_irte(irqfd->kvm,
+					 prod->irq, irqfd->gsi, 1);
+
+	if (ret)
+		kvm_arch_end_assignment(irqfd->kvm);
 
-	return kvm_x86_ops.update_pi_irte(irqfd->kvm,
-					   prod->irq, irqfd->gsi, 1);
+	return ret;
 }
 
 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
@@ -10657,6 +10663,8 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
 	if (ret)
 		printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
 		       " fails: %d\n", irqfd->consumer.token, ret);
+
+	kvm_arch_end_assignment(irqfd->kvm);
 }
 
 int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 1/6] vhost: introduce vhost_vring_call Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 2/6] kvm: detect assigned device via irqbypass manager Zhu Lingshan
@ 2020-07-28  4:24 ` Zhu Lingshan
  2020-07-28  7:42   ` Jason Wang
  2020-07-28  4:24 ` [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa Zhu Lingshan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:24 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

This commit adds a new function get_vq_irq() in struct
vdpa_config_ops, which will return the irq number of a
virtqueue.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/vdpa.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..cebc79173aaa 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -87,6 +87,11 @@ struct vdpa_device {
  *				@vdev: vdpa device
  *				@idx: virtqueue index
  *				Returns the notifcation area
+ * @get_vq_irq:			Get the irq number of a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns u32: irq number of a virtqueue,
+ *				-EINVAL if no irq assigned.
  * @get_vq_align:		Get the virtqueue align requirement
  *				for the device
  *				@vdev: vdpa device
@@ -178,6 +183,7 @@ struct vdpa_config_ops {
 	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
 	struct vdpa_notification_area
 	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
+	u32 (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
 
 	/* Device ops */
 	u32 (*get_vq_align)(struct vdpa_device *vdev);
-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
                   ` (2 preceding siblings ...)
  2020-07-28  4:24 ` [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops Zhu Lingshan
@ 2020-07-28  4:24 ` Zhu Lingshan
  2020-07-28  7:53   ` Jason Wang
  2020-07-28  9:04   ` Eli Cohen
  2020-07-28  4:24 ` [PATCH V4 5/6] ifcvf: implement vdpa_config_ops.get_vq_irq() Zhu Lingshan
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:24 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/Kconfig |  1 +
 drivers/vhost/vdpa.c  | 79 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
 	tristate "Vhost driver for vDPA-based backend"
 	depends on EVENTFD
 	select VHOST
+	select IRQ_BYPASS_MANAGER
 	depends on VDPA
 	help
 	  This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..1dccced321f8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
 	return IRQ_HANDLED;
 }
 
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+	struct vhost_virtqueue *vq = &v->vqs[qid];
+	const struct vdpa_config_ops *ops = v->vdpa->config;
+	struct vdpa_device *vdpa = v->vdpa;
+	int ret, irq;
+
+	spin_lock(&vq->call_ctx.ctx_lock);
+	irq = ops->get_vq_irq(vdpa, qid);
+	if (!vq->call_ctx.ctx || irq == -EINVAL) {
+		spin_unlock(&vq->call_ctx.ctx_lock);
+		return;
+	}
+
+	vq->call_ctx.producer.token = vq->call_ctx.ctx;
+	vq->call_ctx.producer.irq = irq;
+	ret = irq_bypass_register_producer(&vq->call_ctx.producer);
+	spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+	struct vhost_virtqueue *vq = &v->vqs[qid];
+
+	spin_lock(&vq->call_ctx.ctx_lock);
+	irq_bypass_unregister_producer(&vq->call_ctx.producer);
+	spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+	spin_lock(&vq->call_ctx.ctx_lock);
+	/*
+	 * if it has a non-zero irq, means there is a
+	 * previsouly registered irq_bypass_producer,
+	 * we should update it when ctx (its token)
+	 * changes.
+	 */
+	if (!vq->call_ctx.producer.irq) {
+		spin_unlock(&vq->call_ctx.ctx_lock);
+		return;
+	}
+
+	irq_bypass_unregister_producer(&vq->call_ctx.producer);
+	vq->call_ctx.producer.token = vq->call_ctx.ctx;
+	irq_bypass_register_producer(&vq->call_ctx.producer);
+	spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
 static void vhost_vdpa_reset(struct vhost_vdpa *v)
 {
 	struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
 {
 	struct vdpa_device *vdpa = v->vdpa;
 	const struct vdpa_config_ops *ops = vdpa->config;
-	u8 status;
+	u8 status, status_old;
+	int i, nvqs;
 
 	if (copy_from_user(&status, statusp, sizeof(status)))
 		return -EFAULT;
 
+	status_old = ops->get_status(vdpa);
+	nvqs = v->nvqs;
+
 	/*
 	 * Userspace shouldn't remove status bits unless reset the
 	 * status to 0.
@@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
 	if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
 		return -EINVAL;
 
+	/* vq irq is not expected to be changed once DRIVER_OK is set */
+	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK))
+		for (i = 0; i < nvqs; i++)
+			vhost_vdpa_setup_vq_irq(v, i);
+
+	if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) && !(status & VIRTIO_CONFIG_S_DRIVER_OK))
+		for (i = 0; i < nvqs; i++)
+			vhost_vdpa_unsetup_vq_irq(v, i);
+
 	ops->set_status(vdpa, status);
 
 	return 0;
@@ -332,6 +394,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
 
 	return 0;
 }
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 				   void __user *argp)
 {
@@ -390,6 +453,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 			cb.private = NULL;
 		}
 		ops->set_vq_cb(vdpa, idx, &cb);
+		vhost_vdpa_update_vq_irq(vq);
 		break;
 
 	case VHOST_SET_VRING_NUM:
@@ -765,6 +829,18 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 	return r;
 }
 
+static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
+{
+	struct vhost_virtqueue *vq;
+	int i;
+
+	for (i = 0; i < v->nvqs; i++) {
+		vq = &v->vqs[i];
+		if (vq->call_ctx.producer.irq)
+			irq_bypass_unregister_producer(&vq->call_ctx.producer);
+	}
+}
+
 static int vhost_vdpa_release(struct inode *inode, struct file *filep)
 {
 	struct vhost_vdpa *v = filep->private_data;
@@ -777,6 +853,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
 	vhost_vdpa_iotlb_free(v);
 	vhost_vdpa_free_domain(v);
 	vhost_vdpa_config_put(v);
+	vhost_vdpa_clean_irq(v);
 	vhost_dev_cleanup(&v->vdev);
 	kfree(v->vdev.vqs);
 	mutex_unlock(&d->mutex);
-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 5/6] ifcvf: implement vdpa_config_ops.get_vq_irq()
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
                   ` (3 preceding siblings ...)
  2020-07-28  4:24 ` [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa Zhu Lingshan
@ 2020-07-28  4:24 ` Zhu Lingshan
  2020-07-28  4:24 ` [PATCH V4 6/6] irqbypass: do not start cons/prod when failed connect Zhu Lingshan
  2020-07-31  3:18 ` [PATCH V4 0/6] IRQ offloading for vDPA Jason Wang
  6 siblings, 0 replies; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:24 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

This commit implemented vdpa_config_ops.get_vq_irq() in ifcvf,
and initialized vq irq to -EINVAL. So that ifcvf can report
irq number of a vq, or -EINVAL if the vq is not assigned an
irq number.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vdpa/ifcvf/ifcvf_main.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..8db530280abe 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -50,8 +50,10 @@ static void ifcvf_free_irq(struct ifcvf_adapter *adapter, int queues)
 	int i;
 
 
-	for (i = 0; i < queues; i++)
+	for (i = 0; i < queues; i++) {
 		devm_free_irq(&pdev->dev, vf->vring[i].irq, &vf->vring[i]);
+		vf->vring[i].irq = -EINVAL;
+	}
 
 	ifcvf_free_irq_vectors(pdev);
 }
@@ -352,6 +354,14 @@ static void ifcvf_vdpa_set_config_cb(struct vdpa_device *vdpa_dev,
 	vf->config_cb.private = cb->private;
 }
 
+static u32  ifcvf_vdpa_get_vq_irq(struct vdpa_device *vdpa_dev,
+				  u16 qid)
+{
+	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+	return vf->vring[qid].irq;
+}
+
 /*
  * IFCVF currently does't have on-chip IOMMU, so not
  * implemented set_map()/dma_map()/dma_unmap()
@@ -369,6 +379,7 @@ static const struct vdpa_config_ops ifc_vdpa_ops = {
 	.get_vq_ready	= ifcvf_vdpa_get_vq_ready,
 	.set_vq_num	= ifcvf_vdpa_set_vq_num,
 	.set_vq_address	= ifcvf_vdpa_set_vq_address,
+	.get_vq_irq	= ifcvf_vdpa_get_vq_irq,
 	.kick_vq	= ifcvf_vdpa_kick_vq,
 	.get_generation	= ifcvf_vdpa_get_generation,
 	.get_device_id	= ifcvf_vdpa_get_device_id,
@@ -384,7 +395,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct device *dev = &pdev->dev;
 	struct ifcvf_adapter *adapter;
 	struct ifcvf_hw *vf;
-	int ret;
+	int ret, i;
 
 	ret = pcim_enable_device(pdev);
 	if (ret) {
@@ -441,6 +452,9 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err;
 	}
 
+	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++)
+		vf->vring[i].irq = -EINVAL;
+
 	ret = vdpa_register_device(&adapter->vdpa);
 	if (ret) {
 		IFCVF_ERR(pdev, "Failed to register ifcvf to vdpa bus");
-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 6/6] irqbypass: do not start cons/prod when failed connect
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
                   ` (4 preceding siblings ...)
  2020-07-28  4:24 ` [PATCH V4 5/6] ifcvf: implement vdpa_config_ops.get_vq_irq() Zhu Lingshan
@ 2020-07-28  4:24 ` Zhu Lingshan
  2020-07-31  3:18 ` [PATCH V4 0/6] IRQ offloading for vDPA Jason Wang
  6 siblings, 0 replies; 18+ messages in thread
From: Zhu Lingshan @ 2020-07-28  4:24 UTC (permalink / raw)
  To: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav, Zhu Lingshan

If failed to connect, there is no need to start consumer nor
producer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 virt/lib/irqbypass.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index 28fda42e471b..c9bb3957f58a 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,17 +40,21 @@ static int __connect(struct irq_bypass_producer *prod,
 	if (prod->add_consumer)
 		ret = prod->add_consumer(prod, cons);
 
-	if (!ret) {
-		ret = cons->add_producer(cons, prod);
-		if (ret && prod->del_consumer)
-			prod->del_consumer(prod, cons);
-	}
+	if (ret)
+		goto err_add_consumer;
+
+	ret = cons->add_producer(cons, prod);
+	if (ret)
+		goto err_add_producer;
 
 	if (cons->start)
 		cons->start(cons);
 	if (prod->start)
 		prod->start(prod);
-
+err_add_producer:
+	if (prod->del_consumer)
+		prod->del_consumer(prod, cons);
+err_add_consumer:
 	return ret;
 }
 
-- 
2.18.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops
  2020-07-28  4:24 ` [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops Zhu Lingshan
@ 2020-07-28  7:42   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-28  7:42 UTC (permalink / raw)
  To: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav


On 2020/7/28 下午12:24, Zhu Lingshan wrote:
> This commit adds a new function get_vq_irq() in struct
> vdpa_config_ops, which will return the irq number of a
> virtqueue.
>
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> ---
>   include/linux/vdpa.h | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 239db794357c..cebc79173aaa 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -87,6 +87,11 @@ struct vdpa_device {
>    *				@vdev: vdpa device
>    *				@idx: virtqueue index
>    *				Returns the notifcation area
> + * @get_vq_irq:			Get the irq number of a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				Returns u32: irq number of a virtqueue,
> + *				-EINVAL if no irq assigned.


I think we can not get -EINVAL since the function will return a u32.

Thanks


>    * @get_vq_align:		Get the virtqueue align requirement
>    *				for the device
>    *				@vdev: vdpa device
> @@ -178,6 +183,7 @@ struct vdpa_config_ops {
>   	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
>   	struct vdpa_notification_area
>   	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
> +	u32 (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>   
>   	/* Device ops */
>   	u32 (*get_vq_align)(struct vdpa_device *vdev);


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-28  4:24 ` [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa Zhu Lingshan
@ 2020-07-28  7:53   ` Jason Wang
       [not found]     ` <f3e375da-3aa8-7a0c-237c-25943667a535@intel.com>
  2020-07-28  9:04   ` Eli Cohen
  1 sibling, 1 reply; 18+ messages in thread
From: Jason Wang @ 2020-07-28  7:53 UTC (permalink / raw)
  To: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav


On 2020/7/28 下午12:24, Zhu Lingshan wrote:
> This patch introduce a set of functions for setup/unsetup
> and update irq offloading respectively by register/unregister
> and re-register the irq_bypass_producer.
>
> With these functions, this commit can setup/unsetup
> irq offloading through setting DRIVER_OK/!DRIVER_OK, and
> update irq offloading through SET_VRING_CALL.
>
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> ---
>   drivers/vhost/Kconfig |  1 +
>   drivers/vhost/vdpa.c  | 79 ++++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 79 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index d3688c6afb87..587fbae06182 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -65,6 +65,7 @@ config VHOST_VDPA
>   	tristate "Vhost driver for vDPA-based backend"
>   	depends on EVENTFD
>   	select VHOST
> +	select IRQ_BYPASS_MANAGER
>   	depends on VDPA
>   	help
>   	  This kernel module can be loaded in host kernel to accelerate
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index df3cf386b0cd..1dccced321f8 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
>   	return IRQ_HANDLED;
>   }
>   
> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> +{
> +	struct vhost_virtqueue *vq = &v->vqs[qid];
> +	const struct vdpa_config_ops *ops = v->vdpa->config;
> +	struct vdpa_device *vdpa = v->vdpa;
> +	int ret, irq;
> +
> +	spin_lock(&vq->call_ctx.ctx_lock);
> +	irq = ops->get_vq_irq(vdpa, qid);
> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {


It's better to check returned irq as "irq < 0" to be more robust. 
Forcing a specific errno value is not good.


> +		spin_unlock(&vq->call_ctx.ctx_lock);
> +		return;
> +	}
> +
> +	vq->call_ctx.producer.token = vq->call_ctx.ctx;
> +	vq->call_ctx.producer.irq = irq;
> +	ret = irq_bypass_register_producer(&vq->call_ctx.producer);
> +	spin_unlock(&vq->call_ctx.ctx_lock);
> +}
> +
> +static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, int qid)
> +{
> +	struct vhost_virtqueue *vq = &v->vqs[qid];
> +
> +	spin_lock(&vq->call_ctx.ctx_lock);
> +	irq_bypass_unregister_producer(&vq->call_ctx.producer);
> +	spin_unlock(&vq->call_ctx.ctx_lock);
> +}
> +
> +static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
> +{
> +	spin_lock(&vq->call_ctx.ctx_lock);
> +	/*
> +	 * if it has a non-zero irq, means there is a
> +	 * previsouly registered irq_bypass_producer,
> +	 * we should update it when ctx (its token)
> +	 * changes.
> +	 */
> +	if (!vq->call_ctx.producer.irq) {
> +		spin_unlock(&vq->call_ctx.ctx_lock);
> +		return;
> +	}
> +
> +	irq_bypass_unregister_producer(&vq->call_ctx.producer);
> +	vq->call_ctx.producer.token = vq->call_ctx.ctx;
> +	irq_bypass_register_producer(&vq->call_ctx.producer);
> +	spin_unlock(&vq->call_ctx.ctx_lock);
> +}
> +
>   static void vhost_vdpa_reset(struct vhost_vdpa *v)
>   {
>   	struct vdpa_device *vdpa = v->vdpa;
> @@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
>   {
>   	struct vdpa_device *vdpa = v->vdpa;
>   	const struct vdpa_config_ops *ops = vdpa->config;
> -	u8 status;
> +	u8 status, status_old;
> +	int i, nvqs;
>   
>   	if (copy_from_user(&status, statusp, sizeof(status)))
>   		return -EFAULT;
>   
> +	status_old = ops->get_status(vdpa);
> +	nvqs = v->nvqs;
> +
>   	/*
>   	 * Userspace shouldn't remove status bits unless reset the
>   	 * status to 0.
> @@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
>   	if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
>   		return -EINVAL;
>   
> +	/* vq irq is not expected to be changed once DRIVER_OK is set */


So this basically limit the usage of get_vq_irq() in the context 
vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If 
this is true, there's even no need to introduce any new config ops but 
just let set_status() to return the irqs used for the device. Or if we 
want this to be more generic, we need vpda's own irq manager (which 
should be similar to irq bypass manager). That is:

- bus driver can register itself as consumer
- vDPA device driver can register itself as producer
- matching via queue index
- deal with registering/unregistering of consumer/producer

So there's no need to care when or where the vDPA device driver to 
allocate the irq, and we don't need to care at which context the vDPA 
bus driver can use the irq. Either side may get notified when the other 
side is gone (though we only care about the gone of producer probably).

Any thought on this?

Thanks


> +	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK))
> +		for (i = 0; i < nvqs; i++)
> +			vhost_vdpa_setup_vq_irq(v, i);
> +
> +	if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) && !(status & VIRTIO_CONFIG_S_DRIVER_OK))
> +		for (i = 0; i < nvqs; i++)
> +			vhost_vdpa_unsetup_vq_irq(v, i);
> +
>   	ops->set_status(vdpa, status);
>   
>   	return 0;
> @@ -332,6 +394,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
>   
>   	return 0;
>   }
> +
>   static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>   				   void __user *argp)
>   {
> @@ -390,6 +453,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>   			cb.private = NULL;
>   		}
>   		ops->set_vq_cb(vdpa, idx, &cb);
> +		vhost_vdpa_update_vq_irq(vq);
>   		break;
>   
>   	case VHOST_SET_VRING_NUM:
> @@ -765,6 +829,18 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>   	return r;
>   }
>   
> +static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
> +{
> +	struct vhost_virtqueue *vq;
> +	int i;
> +
> +	for (i = 0; i < v->nvqs; i++) {
> +		vq = &v->vqs[i];
> +		if (vq->call_ctx.producer.irq)
> +			irq_bypass_unregister_producer(&vq->call_ctx.producer);
> +	}
> +}
> +
>   static int vhost_vdpa_release(struct inode *inode, struct file *filep)
>   {
>   	struct vhost_vdpa *v = filep->private_data;
> @@ -777,6 +853,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
>   	vhost_vdpa_iotlb_free(v);
>   	vhost_vdpa_free_domain(v);
>   	vhost_vdpa_config_put(v);
> +	vhost_vdpa_clean_irq(v);
>   	vhost_dev_cleanup(&v->vdev);
>   	kfree(v->vdev.vqs);
>   	mutex_unlock(&d->mutex);


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-28  4:24 ` [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa Zhu Lingshan
  2020-07-28  7:53   ` Jason Wang
@ 2020-07-28  9:04   ` Eli Cohen
  2020-07-29  9:21     ` Jason Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Eli Cohen @ 2020-07-28  9:04 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, alex.williamson, mst, pbonzini, sean.j.christopherson,
	wanpengli, virtualization, netdev, kvm, shahafs, parav

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>  
> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> +{
> +	struct vhost_virtqueue *vq = &v->vqs[qid];
> +	const struct vdpa_config_ops *ops = v->vdpa->config;
> +	struct vdpa_device *vdpa = v->vdpa;
> +	int ret, irq;
> +
> +	spin_lock(&vq->call_ctx.ctx_lock);
> +	irq = ops->get_vq_irq(vdpa, qid);
> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {
> +		spin_unlock(&vq->call_ctx.ctx_lock);
> +		return;
> +	}
> +

If I understand correctly, this will cause these IRQs to be forwarded
directly to the VCPU, e.g. will be handled by the guest/qemu.
Does this mean that the host will not handle this interrupt? How does it
work in case on level triggered interrupts?

In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.

Can you explain how this should be done?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
       [not found]     ` <f3e375da-3aa8-7a0c-237c-25943667a535@intel.com>
@ 2020-07-28 10:29       ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-28 10:29 UTC (permalink / raw)
  To: Zhu, Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav


On 2020/7/28 下午5:18, Zhu, Lingshan wrote:
>>>
>>>        * status to 0.
>>> @@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct 
>>> vhost_vdpa *v, u8 __user *statusp)
>>>       if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
>>>           return -EINVAL;
>>>   +    /* vq irq is not expected to be changed once DRIVER_OK is set */
>>
>>
>> So this basically limit the usage of get_vq_irq() in the context 
>> vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If 
>> this is true, there's even no need to introduce any new config ops 
>> but just let set_status() to return the irqs used for the device. Or 
>> if we want this to be more generic, we need vpda's own irq manager 
>> (which should be similar to irq bypass manager). That is:
> I think there is no need for a driver to free / re-request its irqs after DRIVER_OK though it can do so. If a driver changed its irq of a vq after DRIVER_OK, the vq is still operational but will lose irq offloading that is reasonable.
> If we want set_status() return irqs, we need to record the irqs somewhere in vdpa_device,


Why, we can simply pass an array to the driver I think?

void (*set_status)(struct vdpa_device *vdev, u8 status, int *irqs);


> as we discussed in a previous thread, this may need initialize and cleanup works, so a new ops
> with proper comments (don't say they could not change irq, but highlight if irq changes, irq offloading will not work till next DRIVER_OK) could be more elegant.
> However if we really need to change irq after DRIVER_OK, I think maybe we still need vDPA vq irq allocate / free helpers, then the helpers can not be used in probe() as we discussed before, it is a step back to V3 series.


Still, it's not about whether driver may change irq after DRIVER_OK but 
implication of the assumption. If one bus ops must be called in another 
ops, it's better to just implement them in one ops.


>>
>> - bus driver can register itself as consumer
>> - vDPA device driver can register itself as producer
>> - matching via queue index
> IMHO, is it too heavy for this feature,


Do you mean LOCs? We can:

1) refactor irq bypass manager
2) invent it our own (a much simplified version compared to bypass manager)
3) enforcing them via vDPA bus

Each of the above should be not a lot of coding. I think method 3 is 
partially done in your previous series but in an implicit manner:

- bus driver that has alloc_irq/free_irq implemented could be implicitly 
treated as consumer registering
- every vDPA device driver could be treated as producer
- vdpa_devm_alloc_irq() could be treated as producer registering
- alloc_irq/free_irq is the add_producer/del_procuer

We probably just lack some synchronization with driver probe/remove.


> and how can they match if two individual adapters both have vq idx = 1.


The matching is per vDPA device.

Thanks


> Thanks!
>> - deal with registering/unregistering of consumer/producer
>>
>> So there's no need to care when or where the vDPA device driver to 
>> allocate the irq, and we don't need to care at which context the vDPA 
>> bus driver can use the irq. Either side may get notified when the 
>> other side is gone (though we only care about the gone of producer 
>> probably).
>>
>> Any thought on this?
>>
>> Thanks
>>
>>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-28  9:04   ` Eli Cohen
@ 2020-07-29  9:21     ` Jason Wang
  2020-07-29  9:55       ` Eli Cohen
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2020-07-29  9:21 UTC (permalink / raw)
  To: Eli Cohen, Zhu Lingshan
  Cc: alex.williamson, mst, pbonzini, sean.j.christopherson, wanpengli,
	virtualization, netdev, kvm, shahafs, parav


On 2020/7/28 下午5:04, Eli Cohen wrote:
> On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>>   
>> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
>> +{
>> +	struct vhost_virtqueue *vq = &v->vqs[qid];
>> +	const struct vdpa_config_ops *ops = v->vdpa->config;
>> +	struct vdpa_device *vdpa = v->vdpa;
>> +	int ret, irq;
>> +
>> +	spin_lock(&vq->call_ctx.ctx_lock);
>> +	irq = ops->get_vq_irq(vdpa, qid);
>> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {
>> +		spin_unlock(&vq->call_ctx.ctx_lock);
>> +		return;
>> +	}
>> +
> If I understand correctly, this will cause these IRQs to be forwarded
> directly to the VCPU, e.g. will be handled by the guest/qemu.


Yes, if it can bypassed, the interrupt will be delivered to vCPU directly.


> Does this mean that the host will not handle this interrupt? How does it
> work in case on level triggered interrupts?


There's no guarantee that the KVM arch code can make sure the irq bypass 
work for any type of irq. So if they the irq will still need to be 
handled by host first. This means we should keep the host interrupt 
handler as a slowpath (fallback).


>
> In the case of ConnectX, I need to execute some code to acknowledge the
> interrupt.


This turns out to be hard for irq bypassing to work. Is it because the 
irq is shared or what kind of ack you need to do?

Thanks


>
> Can you explain how this should be done?
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-29  9:21     ` Jason Wang
@ 2020-07-29  9:55       ` Eli Cohen
  2020-07-29 10:19         ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Cohen @ 2020-07-29  9:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli, virtualization, netdev, kvm,
	shahafs, parav

On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
> 
> On 2020/7/28 下午5:04, Eli Cohen wrote:
> >On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
> >>+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> >>+{
> >>+	struct vhost_virtqueue *vq = &v->vqs[qid];
> >>+	const struct vdpa_config_ops *ops = v->vdpa->config;
> >>+	struct vdpa_device *vdpa = v->vdpa;
> >>+	int ret, irq;
> >>+
> >>+	spin_lock(&vq->call_ctx.ctx_lock);
> >>+	irq = ops->get_vq_irq(vdpa, qid);
> >>+	if (!vq->call_ctx.ctx || irq == -EINVAL) {
> >>+		spin_unlock(&vq->call_ctx.ctx_lock);
> >>+		return;
> >>+	}
> >>+
> >If I understand correctly, this will cause these IRQs to be forwarded
> >directly to the VCPU, e.g. will be handled by the guest/qemu.
> 
> 
> Yes, if it can bypassed, the interrupt will be delivered to vCPU directly.
> 

So, usually the network driver knows how to handle interrups for its
devices. I assume the virtio_net driver at the guest has some default
processing but what if the underlying hardware device (such as the case
of vdpa) needs to take some actions? Is there an option to do bounce the
interrupt back to the vendor specific driver in the host so it can take
these actions?

> 
> >Does this mean that the host will not handle this interrupt? How does it
> >work in case on level triggered interrupts?
> 
> 
> There's no guarantee that the KVM arch code can make sure the irq
> bypass work for any type of irq. So if they the irq will still need
> to be handled by host first. This means we should keep the host
> interrupt handler as a slowpath (fallback).
> 
> >
> >In the case of ConnectX, I need to execute some code to acknowledge the
> >interrupt.
> 
> 
> This turns out to be hard for irq bypassing to work. Is it because
> the irq is shared or what kind of ack you need to do?

I have an EQ which is a queue for events comming from the hardware. This
EQ can created so it reports only completion events but I still need to
execute code that roughly tells the device that I saw these event
records and then arm it again so it can report more interrupts (e.g if
more packets are received or sent). This is device specific code.

> 
> Thanks
> 
> 
> >
> >Can you explain how this should be done?
> >
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-29  9:55       ` Eli Cohen
@ 2020-07-29 10:19         ` Jason Wang
  2020-07-29 11:13           ` Eli Cohen
  2020-07-29 14:15           ` Eli Cohen
  0 siblings, 2 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-29 10:19 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli, virtualization, netdev, kvm,
	shahafs, parav


On 2020/7/29 下午5:55, Eli Cohen wrote:
> On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
>> On 2020/7/28 下午5:04, Eli Cohen wrote:
>>> On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>>>> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
>>>> +{
>>>> +	struct vhost_virtqueue *vq = &v->vqs[qid];
>>>> +	const struct vdpa_config_ops *ops = v->vdpa->config;
>>>> +	struct vdpa_device *vdpa = v->vdpa;
>>>> +	int ret, irq;
>>>> +
>>>> +	spin_lock(&vq->call_ctx.ctx_lock);
>>>> +	irq = ops->get_vq_irq(vdpa, qid);
>>>> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {
>>>> +		spin_unlock(&vq->call_ctx.ctx_lock);
>>>> +		return;
>>>> +	}
>>>> +
>>> If I understand correctly, this will cause these IRQs to be forwarded
>>> directly to the VCPU, e.g. will be handled by the guest/qemu.
>>
>> Yes, if it can bypassed, the interrupt will be delivered to vCPU directly.
>>
> So, usually the network driver knows how to handle interrups for its
> devices. I assume the virtio_net driver at the guest has some default
> processing but what if the underlying hardware device (such as the case
> of vdpa) needs to take some actions?


Virtio splits the bus operations out of device operations. So did the 
driver.

The virtio-net driver depends on a transport driver to talk to the real 
device. Usually PCI is used as the transport for the device. In this 
case virtio-pci driver is in charge of dealing with irq 
allocation/free/configuration and it needs to co-operate with platform 
specific irqchip (virtualized by KVM) to finish the work like irq 
acknowledge etc.  E.g for x86, the irq offloading can only work when 
there's a hardware support of virtual irqchip (APICv) then all stuffs 
could be done without vmexits.

So no vendor specific part since the device and transport are all standard.


>   Is there an option to do bounce the
> interrupt back to the vendor specific driver in the host so it can take
> these actions?


Currently not, but even if we can do this, I'm afraid we will lose the 
performance advantage of irq bypassing.


>
>>> Does this mean that the host will not handle this interrupt? How does it
>>> work in case on level triggered interrupts?
>>
>> There's no guarantee that the KVM arch code can make sure the irq
>> bypass work for any type of irq. So if they the irq will still need
>> to be handled by host first. This means we should keep the host
>> interrupt handler as a slowpath (fallback).
>>
>>> In the case of ConnectX, I need to execute some code to acknowledge the
>>> interrupt.
>>
>> This turns out to be hard for irq bypassing to work. Is it because
>> the irq is shared or what kind of ack you need to do?
> I have an EQ which is a queue for events comming from the hardware. This
> EQ can created so it reports only completion events but I still need to
> execute code that roughly tells the device that I saw these event
> records and then arm it again so it can report more interrupts (e.g if
> more packets are received or sent). This is device specific code.


Any chance that the hardware can use MSI (which is not the case here)?

Thanks


>> Thanks
>>
>>
>>> Can you explain how this should be done?
>>>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-29 10:19         ` Jason Wang
@ 2020-07-29 11:13           ` Eli Cohen
  2020-07-29 14:15           ` Eli Cohen
  1 sibling, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-29 11:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli, virtualization, netdev, kvm,
	shahafs, parav

On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote:
I am checking internally if we can work in a mode not requiring to
acknowledge the interrupt. I will update.

Thanks for the explanations.

> 
> On 2020/7/29 下午5:55, Eli Cohen wrote:
> >On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
> >>On 2020/7/28 下午5:04, Eli Cohen wrote:
> >>>On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
> >>>>+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> >>>>+{
> >>>>+	struct vhost_virtqueue *vq = &v->vqs[qid];
> >>>>+	const struct vdpa_config_ops *ops = v->vdpa->config;
> >>>>+	struct vdpa_device *vdpa = v->vdpa;
> >>>>+	int ret, irq;
> >>>>+
> >>>>+	spin_lock(&vq->call_ctx.ctx_lock);
> >>>>+	irq = ops->get_vq_irq(vdpa, qid);
> >>>>+	if (!vq->call_ctx.ctx || irq == -EINVAL) {
> >>>>+		spin_unlock(&vq->call_ctx.ctx_lock);
> >>>>+		return;
> >>>>+	}
> >>>>+
> >>>If I understand correctly, this will cause these IRQs to be forwarded
> >>>directly to the VCPU, e.g. will be handled by the guest/qemu.
> >>
> >>Yes, if it can bypassed, the interrupt will be delivered to vCPU directly.
> >>
> >So, usually the network driver knows how to handle interrups for its
> >devices. I assume the virtio_net driver at the guest has some default
> >processing but what if the underlying hardware device (such as the case
> >of vdpa) needs to take some actions?
> 
> 
> Virtio splits the bus operations out of device operations. So did
> the driver.
> 
> The virtio-net driver depends on a transport driver to talk to the
> real device. Usually PCI is used as the transport for the device. In
> this case virtio-pci driver is in charge of dealing with irq
> allocation/free/configuration and it needs to co-operate with
> platform specific irqchip (virtualized by KVM) to finish the work
> like irq acknowledge etc.  E.g for x86, the irq offloading can only
> work when there's a hardware support of virtual irqchip (APICv) then
> all stuffs could be done without vmexits.
> 
> So no vendor specific part since the device and transport are all standard.
> 
> 
> >  Is there an option to do bounce the
> >interrupt back to the vendor specific driver in the host so it can take
> >these actions?
> 
> 
> Currently not, but even if we can do this, I'm afraid we will lose
> the performance advantage of irq bypassing.
> 
> 
> >
> >>>Does this mean that the host will not handle this interrupt? How does it
> >>>work in case on level triggered interrupts?
> >>
> >>There's no guarantee that the KVM arch code can make sure the irq
> >>bypass work for any type of irq. So if they the irq will still need
> >>to be handled by host first. This means we should keep the host
> >>interrupt handler as a slowpath (fallback).
> >>
> >>>In the case of ConnectX, I need to execute some code to acknowledge the
> >>>interrupt.
> >>
> >>This turns out to be hard for irq bypassing to work. Is it because
> >>the irq is shared or what kind of ack you need to do?
> >I have an EQ which is a queue for events comming from the hardware. This
> >EQ can created so it reports only completion events but I still need to
> >execute code that roughly tells the device that I saw these event
> >records and then arm it again so it can report more interrupts (e.g if
> >more packets are received or sent). This is device specific code.
> 
> 
> Any chance that the hardware can use MSI (which is not the case here)?
> 
> Thanks
> 
> 
> >>Thanks
> >>
> >>
> >>>Can you explain how this should be done?
> >>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-29 10:19         ` Jason Wang
  2020-07-29 11:13           ` Eli Cohen
@ 2020-07-29 14:15           ` Eli Cohen
  2020-07-31  3:11             ` Jason Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Eli Cohen @ 2020-07-29 14:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli, virtualization, netdev, kvm,
	shahafs, parav

OK, we have a mode of operation that does not require driver
intervention to manipulate the event queues so I think we're ok with
this design.

On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote:
> 
> On 2020/7/29 下午5:55, Eli Cohen wrote:
> >On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
> >>On 2020/7/28 下午5:04, Eli Cohen wrote:
> >>>On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
> >>>>+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> >>>>+{
> >>>>+	struct vhost_virtqueue *vq = &v->vqs[qid];
> >>>>+	const struct vdpa_config_ops *ops = v->vdpa->config;
> >>>>+	struct vdpa_device *vdpa = v->vdpa;
> >>>>+	int ret, irq;
> >>>>+
> >>>>+	spin_lock(&vq->call_ctx.ctx_lock);
> >>>>+	irq = ops->get_vq_irq(vdpa, qid);
> >>>>+	if (!vq->call_ctx.ctx || irq == -EINVAL) {
> >>>>+		spin_unlock(&vq->call_ctx.ctx_lock);
> >>>>+		return;
> >>>>+	}
> >>>>+
> >>>If I understand correctly, this will cause these IRQs to be forwarded
> >>>directly to the VCPU, e.g. will be handled by the guest/qemu.
> >>
> >>Yes, if it can bypassed, the interrupt will be delivered to vCPU directly.
> >>
> >So, usually the network driver knows how to handle interrups for its
> >devices. I assume the virtio_net driver at the guest has some default
> >processing but what if the underlying hardware device (such as the case
> >of vdpa) needs to take some actions?
> 
> 
> Virtio splits the bus operations out of device operations. So did
> the driver.
> 
> The virtio-net driver depends on a transport driver to talk to the
> real device. Usually PCI is used as the transport for the device. In
> this case virtio-pci driver is in charge of dealing with irq
> allocation/free/configuration and it needs to co-operate with
> platform specific irqchip (virtualized by KVM) to finish the work
> like irq acknowledge etc.  E.g for x86, the irq offloading can only
> work when there's a hardware support of virtual irqchip (APICv) then
> all stuffs could be done without vmexits.
> 
> So no vendor specific part since the device and transport are all standard.
> 
> 
> >  Is there an option to do bounce the
> >interrupt back to the vendor specific driver in the host so it can take
> >these actions?
> 
> 
> Currently not, but even if we can do this, I'm afraid we will lose
> the performance advantage of irq bypassing.
> 
> 
> >
> >>>Does this mean that the host will not handle this interrupt? How does it
> >>>work in case on level triggered interrupts?
> >>
> >>There's no guarantee that the KVM arch code can make sure the irq
> >>bypass work for any type of irq. So if they the irq will still need
> >>to be handled by host first. This means we should keep the host
> >>interrupt handler as a slowpath (fallback).
> >>
> >>>In the case of ConnectX, I need to execute some code to acknowledge the
> >>>interrupt.
> >>
> >>This turns out to be hard for irq bypassing to work. Is it because
> >>the irq is shared or what kind of ack you need to do?
> >I have an EQ which is a queue for events comming from the hardware. This
> >EQ can created so it reports only completion events but I still need to
> >execute code that roughly tells the device that I saw these event
> >records and then arm it again so it can report more interrupts (e.g if
> >more packets are received or sent). This is device specific code.
> 
> 
> Any chance that the hardware can use MSI (which is not the case here)?
> 
> Thanks
> 
> 
> >>Thanks
> >>
> >>
> >>>Can you explain how this should be done?
> >>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
  2020-07-29 14:15           ` Eli Cohen
@ 2020-07-31  3:11             ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-31  3:11 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli, virtualization, netdev, kvm,
	shahafs, parav


On 2020/7/29 下午10:15, Eli Cohen wrote:
> OK, we have a mode of operation that does not require driver
> intervention to manipulate the event queues so I think we're ok with
> this design.


Good to know this.

Thanks



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 0/6] IRQ offloading for vDPA
  2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
                   ` (5 preceding siblings ...)
  2020-07-28  4:24 ` [PATCH V4 6/6] irqbypass: do not start cons/prod when failed connect Zhu Lingshan
@ 2020-07-31  3:18 ` Jason Wang
  6 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-31  3:18 UTC (permalink / raw)
  To: Zhu Lingshan, alex.williamson, mst, pbonzini,
	sean.j.christopherson, wanpengli
  Cc: virtualization, netdev, kvm, eli, shahafs, parav


On 2020/7/28 下午12:23, Zhu Lingshan wrote:
> This series intends to implement IRQ offloading for
> vhost_vdpa.
>
> By the feat of irq forwarding facilities like posted
> interrupt on X86, irq bypass can  help deliver
> interrupts to vCPU directly.
>
> vDPA devices have dedicated hardware backends like VFIO
> pass-throughed devices. So it would be possible to setup
> irq offloading(irq bypass) for vDPA devices and gain
> performance improvements.
>
> In my testing, with this feature, we can save 0.1ms
> in a ping between two VFs on average.
> changes from V3:
> (1)removed vDPA irq allocate/free helpers in vDPA core.
> (2)add a new function get_vq_irq() in struct vdpa_config_ops,
> upper layer driver can use this function to: A. query the
> irq numbner of a vq. B. detect whether a vq is enabled.
> (3)implement get_vq_irq() in ifcvf driver.
> (4)in vhost_vdpa, set_status() will setup irq offloading when
> setting DRIVER_OK, and unsetup when receive !DRIVER_OK.
> (5)minor improvements.


Ok, I think you can go ahead to post a V5. It's not bad to start with 
get_vq_irq() and we can do any changes afterwards if it can work for 
some cases.

Thanks


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, back to index

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28  4:23 [PATCH V4 0/6] IRQ offloading for vDPA Zhu Lingshan
2020-07-28  4:24 ` [PATCH V4 1/6] vhost: introduce vhost_vring_call Zhu Lingshan
2020-07-28  4:24 ` [PATCH V4 2/6] kvm: detect assigned device via irqbypass manager Zhu Lingshan
2020-07-28  4:24 ` [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops Zhu Lingshan
2020-07-28  7:42   ` Jason Wang
2020-07-28  4:24 ` [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa Zhu Lingshan
2020-07-28  7:53   ` Jason Wang
     [not found]     ` <f3e375da-3aa8-7a0c-237c-25943667a535@intel.com>
2020-07-28 10:29       ` Jason Wang
2020-07-28  9:04   ` Eli Cohen
2020-07-29  9:21     ` Jason Wang
2020-07-29  9:55       ` Eli Cohen
2020-07-29 10:19         ` Jason Wang
2020-07-29 11:13           ` Eli Cohen
2020-07-29 14:15           ` Eli Cohen
2020-07-31  3:11             ` Jason Wang
2020-07-28  4:24 ` [PATCH V4 5/6] ifcvf: implement vdpa_config_ops.get_vq_irq() Zhu Lingshan
2020-07-28  4:24 ` [PATCH V4 6/6] irqbypass: do not start cons/prod when failed connect Zhu Lingshan
2020-07-31  3:18 ` [PATCH V4 0/6] IRQ offloading for vDPA Jason Wang

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git