All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts
@ 2023-03-28 21:53 Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 1/8] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable Reinette Chatre
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

Changes since RFC V1:
- RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/
- Improved changelogs.
- Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to
  allocated context. (Alex)
- Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain
  invalid error path behavior. (Alex and Kevin)
- Add pointer to interrupt context as function parameter to
  vfio_irq_ctx_free(). (Alex)
- Ensure variables are initialized. (Dan Carpenter)
- Only support dynamic allocation if device supports it. (Alex)

Qemu allocates interrupts incrementally at the time the guest unmasks an
interrupt, for example each time a Linux guest runs request_irq().

Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1].
This prompted Qemu to, when allocating a new interrupt, first release all
previously allocated interrupts (including disable of MSI-X) followed
by re-allocation of all interrupts that includes the new interrupt.
Please see [2] for a detailed discussion about this issue.

Releasing and re-allocating interrupts may be acceptable if all
interrupts are unmasked during device initialization. If unmasking of
interrupts occur during runtime this may result in lost interrupts.
For example, consider an accelerator device with multiple work queues,
each work queue having a dedicated interrupt. A work queue can be
enabled at any time with its associated interrupt unmasked while other
work queues are already active. Having all interrupts released and MSI-X
disabled to enable the new work queue will impact active work queues.

This series builds on the recent interrupt sub-system core changes
that added support for dynamic MSI-X allocation after initial MSI-X
enabling.

Add support for dynamic MSI-X allocation to vfio-pci. A flag
indicating lack of support for dynamic allocation already exist:
VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With
support for dynamic MSI-X the flag is cleared for MSI-X when supported,
enabling Qemu to modify its behavior.

Any feedback is appreciated

Reinette

[1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
[2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t

Reinette Chatre (8):
  vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
  vfio/pci: Remove negative check on unsigned vector
  vfio/pci: Prepare for dynamic interrupt context storage
  vfio/pci: Use xarray for interrupt context storage
  vfio/pci: Remove interrupt context counter
  vfio/pci: Move to single error path
  vfio/pci: Support dynamic MSI-x
  vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X

 drivers/vfio/pci/vfio_pci_core.c  |   5 +-
 drivers/vfio/pci/vfio_pci_intrs.c | 347 +++++++++++++++++++++---------
 include/linux/vfio_pci_core.h     |   3 +-
 include/uapi/linux/vfio.h         |   3 +
 4 files changed, 257 insertions(+), 101 deletions(-)


base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
-- 
2.34.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH V2 1/8] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector Reinette Chatre
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

vfio_msi_disable() releases all previously allocated state
associated with each interrupt before disabling MSI/MSI-X.

vfio_msi_disable() iterates twice over the interrupt state:
first directly with a for loop to do virqfd cleanup, followed
by another for loop within vfio_msi_set_block() that releases
the interrupt and its state using vfio_msi_set_vector_signal().

Simplify interrupt cleanup by iterating over allocated interrupts
once.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index bffb0741518b..6a9c6a143cc3 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -426,10 +426,9 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 	for (i = 0; i < vdev->num_ctx; i++) {
 		vfio_virqfd_disable(&vdev->ctx[i].unmask);
 		vfio_virqfd_disable(&vdev->ctx[i].mask);
+		vfio_msi_set_vector_signal(vdev, i, -1, msix);
 	}
 
-	vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
-
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	pci_free_irq_vectors(pdev);
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 1/8] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-30 20:26   ` Alex Williamson
  2023-03-28 21:53 ` [PATCH V2 3/8] vfio/pci: Prepare for dynamic interrupt context storage Reinette Chatre
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

User space provides the vector as an unsigned int that is checked
early for validity (vfio_set_irqs_validate_and_prepare()).

A later negative check of the provided vector is not necessary.

Remove the negative check and ensure the type used
for the vector is consistent as an unsigned int.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6a9c6a143cc3..3f64ccdce69f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -317,14 +317,14 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 }
 
 static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
-				      int vector, int fd, bool msix)
+				      unsigned int vector, int fd, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct eventfd_ctx *trigger;
 	int irq, ret;
 	u16 cmd;
 
-	if (vector < 0 || vector >= vdev->num_ctx)
+	if (vector >= vdev->num_ctx)
 		return -EINVAL;
 
 	irq = pci_irq_vector(pdev, vector);
@@ -399,7 +399,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
 			      unsigned count, int32_t *fds, bool msix)
 {
-	int i, j, ret = 0;
+	int i, ret = 0;
+	unsigned int j;
 
 	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
 		return -EINVAL;
@@ -420,7 +421,7 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
 static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	int i;
+	unsigned int i;
 	u16 cmd;
 
 	for (i = 0; i < vdev->num_ctx; i++) {
@@ -542,7 +543,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
-	int i;
+	unsigned int i;
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 
 	if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 3/8] vfio/pci: Prepare for dynamic interrupt context storage
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 1/8] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 4/8] vfio/pci: Use xarray for " Reinette Chatre
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

Interrupt context storage is statically allocated at the time
interrupts are allocated. Following allocation, the interrupt
context is managed by directly accessing the elements of the
array using the vector as index.

It is possible to allocate additional MSI-X vectors after
MSI-X has been enabled. Dynamic storage of interrupt context
is needed to support adding new MSI-X vectors after initial
allocation.

Replace direct access of array elements with pointers to the
array elements. Doing so reduces impact of moving to a new data
structure.  Move interactions with the array to helpers to
mostly contain changes needed to transition to a dynamic
data structure.

No functional change intended.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V1:
- Improve accuracy of changelog.

 drivers/vfio/pci/vfio_pci_intrs.c | 206 ++++++++++++++++++++----------
 1 file changed, 140 insertions(+), 66 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3f64ccdce69f..ece371ebea00 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -48,6 +48,31 @@ static bool is_irq_none(struct vfio_pci_core_device *vdev)
 		 vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
 }
 
+static
+struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
+					  unsigned long index)
+{
+	if (index >= vdev->num_ctx)
+		return NULL;
+	return &vdev->ctx[index];
+}
+
+static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
+{
+	kfree(vdev->ctx);
+}
+
+static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
+				  unsigned long num)
+{
+	vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
+			    GFP_KERNEL_ACCOUNT);
+	if (!vdev->ctx)
+		return -ENOMEM;
+
+	return 0;
+}
+
 /*
  * INTx
  */
@@ -55,17 +80,28 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 {
 	struct vfio_pci_core_device *vdev = opaque;
 
-	if (likely(is_intx(vdev) && !vdev->virq_disabled))
-		eventfd_signal(vdev->ctx[0].trigger, 1);
+	if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
+		struct vfio_pci_irq_ctx *ctx;
+
+		ctx = vfio_irq_ctx_get(vdev, 0);
+		if (!ctx)
+			return;
+		eventfd_signal(ctx->trigger, 1);
+	}
 }
 
 /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
 bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_irq_ctx *ctx;
 	unsigned long flags;
 	bool masked_changed = false;
 
+	ctx = vfio_irq_ctx_get(vdev, 0);
+	if (!ctx)
+		return masked_changed;
+
 	spin_lock_irqsave(&vdev->irqlock, flags);
 
 	/*
@@ -77,7 +113,7 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 	if (unlikely(!is_intx(vdev))) {
 		if (vdev->pci_2_3)
 			pci_intx(pdev, 0);
-	} else if (!vdev->ctx[0].masked) {
+	} else if (!ctx->masked) {
 		/*
 		 * Can't use check_and_mask here because we always want to
 		 * mask, not just when something is pending.
@@ -87,7 +123,7 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 		else
 			disable_irq_nosync(pdev->irq);
 
-		vdev->ctx[0].masked = true;
+		ctx->masked = true;
 		masked_changed = true;
 	}
 
@@ -105,9 +141,14 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 {
 	struct vfio_pci_core_device *vdev = opaque;
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_irq_ctx *ctx;
 	unsigned long flags;
 	int ret = 0;
 
+	ctx = vfio_irq_ctx_get(vdev, 0);
+	if (!ctx)
+		return ret;
+
 	spin_lock_irqsave(&vdev->irqlock, flags);
 
 	/*
@@ -117,7 +158,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 	if (unlikely(!is_intx(vdev))) {
 		if (vdev->pci_2_3)
 			pci_intx(pdev, 1);
-	} else if (vdev->ctx[0].masked && !vdev->virq_disabled) {
+	} else if (ctx->masked && !vdev->virq_disabled) {
 		/*
 		 * A pending interrupt here would immediately trigger,
 		 * but we can avoid that overhead by just re-sending
@@ -129,7 +170,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 		} else
 			enable_irq(pdev->irq);
 
-		vdev->ctx[0].masked = (ret > 0);
+		ctx->masked = (ret > 0);
 	}
 
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
@@ -146,18 +187,23 @@ void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
 static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 {
 	struct vfio_pci_core_device *vdev = dev_id;
+	struct vfio_pci_irq_ctx *ctx;
 	unsigned long flags;
 	int ret = IRQ_NONE;
 
+	ctx = vfio_irq_ctx_get(vdev, 0);
+	if (!ctx)
+		return ret;
+
 	spin_lock_irqsave(&vdev->irqlock, flags);
 
 	if (!vdev->pci_2_3) {
 		disable_irq_nosync(vdev->pdev->irq);
-		vdev->ctx[0].masked = true;
+		ctx->masked = true;
 		ret = IRQ_HANDLED;
-	} else if (!vdev->ctx[0].masked &&  /* may be shared */
+	} else if (!ctx->masked &&  /* may be shared */
 		   pci_check_and_mask_intx(vdev->pdev)) {
-		vdev->ctx[0].masked = true;
+		ctx->masked = true;
 		ret = IRQ_HANDLED;
 	}
 
@@ -171,15 +217,24 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 
 static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 {
+	struct vfio_pci_irq_ctx *ctx;
+	int ret;
+
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL_ACCOUNT);
-	if (!vdev->ctx)
-		return -ENOMEM;
+	ret = vfio_irq_ctx_alloc_num(vdev, 1);
+	if (ret)
+		return ret;
+
+	ctx = vfio_irq_ctx_get(vdev, 0);
+	if (!ctx) {
+		vfio_irq_ctx_free_all(vdev);
+		return -EINVAL;
+	}
 
 	vdev->num_ctx = 1;
 
@@ -189,9 +244,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	 * here, non-PCI-2.3 devices will have to wait until the
 	 * interrupt is enabled.
 	 */
-	vdev->ctx[0].masked = vdev->virq_disabled;
+	ctx->masked = vdev->virq_disabled;
 	if (vdev->pci_2_3)
-		pci_intx(vdev->pdev, !vdev->ctx[0].masked);
+		pci_intx(vdev->pdev, !ctx->masked);
 
 	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
@@ -202,41 +257,46 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned long irqflags = IRQF_SHARED;
+	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
 	unsigned long flags;
 	int ret;
 
-	if (vdev->ctx[0].trigger) {
+	ctx = vfio_irq_ctx_get(vdev, 0);
+	if (!ctx)
+		return -EINVAL;
+
+	if (ctx->trigger) {
 		free_irq(pdev->irq, vdev);
-		kfree(vdev->ctx[0].name);
-		eventfd_ctx_put(vdev->ctx[0].trigger);
-		vdev->ctx[0].trigger = NULL;
+		kfree(ctx->name);
+		eventfd_ctx_put(ctx->trigger);
+		ctx->trigger = NULL;
 	}
 
 	if (fd < 0) /* Disable only */
 		return 0;
 
-	vdev->ctx[0].name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
-				      pci_name(pdev));
-	if (!vdev->ctx[0].name)
+	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
+			      pci_name(pdev));
+	if (!ctx->name)
 		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(vdev->ctx[0].name);
+		kfree(ctx->name);
 		return PTR_ERR(trigger);
 	}
 
-	vdev->ctx[0].trigger = trigger;
+	ctx->trigger = trigger;
 
 	if (!vdev->pci_2_3)
 		irqflags = 0;
 
 	ret = request_irq(pdev->irq, vfio_intx_handler,
-			  irqflags, vdev->ctx[0].name, vdev);
+			  irqflags, ctx->name, vdev);
 	if (ret) {
-		vdev->ctx[0].trigger = NULL;
-		kfree(vdev->ctx[0].name);
+		ctx->trigger = NULL;
+		kfree(ctx->name);
 		eventfd_ctx_put(trigger);
 		return ret;
 	}
@@ -246,7 +306,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 	 * disable_irq won't.
 	 */
 	spin_lock_irqsave(&vdev->irqlock, flags);
-	if (!vdev->pci_2_3 && vdev->ctx[0].masked)
+	if (!vdev->pci_2_3 && ctx->masked)
 		disable_irq_nosync(pdev->irq);
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
 
@@ -255,12 +315,17 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 
 static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 {
-	vfio_virqfd_disable(&vdev->ctx[0].unmask);
-	vfio_virqfd_disable(&vdev->ctx[0].mask);
+	struct vfio_pci_irq_ctx *ctx;
+
+	ctx = vfio_irq_ctx_get(vdev, 0);
+	if (ctx) {
+		vfio_virqfd_disable(&ctx->unmask);
+		vfio_virqfd_disable(&ctx->mask);
+	}
 	vfio_intx_set_signal(vdev, -1);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
-	kfree(vdev->ctx);
+	vfio_irq_ctx_free_all(vdev);
 }
 
 /*
@@ -284,10 +349,9 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
-	vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx),
-			    GFP_KERNEL_ACCOUNT);
-	if (!vdev->ctx)
-		return -ENOMEM;
+	ret = vfio_irq_ctx_alloc_num(vdev, nvec);
+	if (ret)
+		return ret;
 
 	/* return the number of supported vectors if we can't get all: */
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -296,7 +360,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
-		kfree(vdev->ctx);
+		vfio_irq_ctx_free_all(vdev);
 		return ret;
 	}
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
@@ -320,6 +384,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 				      unsigned int vector, int fd, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
 	int irq, ret;
 	u16 cmd;
@@ -327,33 +392,33 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	if (vector >= vdev->num_ctx)
 		return -EINVAL;
 
+	ctx = vfio_irq_ctx_get(vdev, vector);
+	if (!ctx)
+		return -EINVAL;
 	irq = pci_irq_vector(pdev, vector);
 
-	if (vdev->ctx[vector].trigger) {
-		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
+	if (ctx->trigger) {
+		irq_bypass_unregister_producer(&ctx->producer);
 
 		cmd = vfio_pci_memory_lock_and_enable(vdev);
-		free_irq(irq, vdev->ctx[vector].trigger);
+		free_irq(irq, ctx->trigger);
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
-
-		kfree(vdev->ctx[vector].name);
-		eventfd_ctx_put(vdev->ctx[vector].trigger);
-		vdev->ctx[vector].trigger = NULL;
+		kfree(ctx->name);
+		eventfd_ctx_put(ctx->trigger);
+		ctx->trigger = NULL;
 	}
 
 	if (fd < 0)
 		return 0;
 
-	vdev->ctx[vector].name = kasprintf(GFP_KERNEL_ACCOUNT,
-					   "vfio-msi%s[%d](%s)",
-					   msix ? "x" : "", vector,
-					   pci_name(pdev));
-	if (!vdev->ctx[vector].name)
+	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
+			      msix ? "x" : "", vector, pci_name(pdev));
+	if (!ctx->name)
 		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(vdev->ctx[vector].name);
+		kfree(ctx->name);
 		return PTR_ERR(trigger);
 	}
 
@@ -372,26 +437,25 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 		pci_write_msi_msg(irq, &msg);
 	}
 
-	ret = request_irq(irq, vfio_msihandler, 0,
-			  vdev->ctx[vector].name, trigger);
+	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 	if (ret) {
-		kfree(vdev->ctx[vector].name);
+		kfree(ctx->name);
 		eventfd_ctx_put(trigger);
 		return ret;
 	}
 
-	vdev->ctx[vector].producer.token = trigger;
-	vdev->ctx[vector].producer.irq = irq;
-	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+	ctx->producer.token = trigger;
+	ctx->producer.irq = irq;
+	ret = irq_bypass_register_producer(&ctx->producer);
 	if (unlikely(ret)) {
 		dev_info(&pdev->dev,
 		"irq bypass producer (token %p) registration fails: %d\n",
-		vdev->ctx[vector].producer.token, ret);
+		ctx->producer.token, ret);
 
-		vdev->ctx[vector].producer.token = NULL;
+		ctx->producer.token = NULL;
 	}
-	vdev->ctx[vector].trigger = trigger;
+	ctx->trigger = trigger;
 
 	return 0;
 }
@@ -421,13 +485,17 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
 static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
 	u16 cmd;
 
 	for (i = 0; i < vdev->num_ctx; i++) {
-		vfio_virqfd_disable(&vdev->ctx[i].unmask);
-		vfio_virqfd_disable(&vdev->ctx[i].mask);
-		vfio_msi_set_vector_signal(vdev, i, -1, msix);
+		ctx = vfio_irq_ctx_get(vdev, i);
+		if (ctx) {
+			vfio_virqfd_disable(&ctx->unmask);
+			vfio_virqfd_disable(&ctx->mask);
+			vfio_msi_set_vector_signal(vdev, i, -1, msix);
+		}
 	}
 
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -443,7 +511,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
-	kfree(vdev->ctx);
+	vfio_irq_ctx_free_all(vdev);
 }
 
 /*
@@ -463,14 +531,18 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
 		if (unmask)
 			vfio_pci_intx_unmask(vdev);
 	} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+		struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
 		int32_t fd = *(int32_t *)data;
+
+		if (!ctx)
+			return -EINVAL;
 		if (fd >= 0)
 			return vfio_virqfd_enable((void *) vdev,
 						  vfio_pci_intx_unmask_handler,
 						  vfio_send_intx_eventfd, NULL,
-						  &vdev->ctx[0].unmask, fd);
+						  &ctx->unmask, fd);
 
-		vfio_virqfd_disable(&vdev->ctx[0].unmask);
+		vfio_virqfd_disable(&ctx->unmask);
 	}
 
 	return 0;
@@ -543,6 +615,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
+	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 
@@ -577,14 +650,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
-		if (!vdev->ctx[i].trigger)
+		ctx = vfio_irq_ctx_get(vdev, i);
+		if (!ctx || !ctx->trigger)
 			continue;
 		if (flags & VFIO_IRQ_SET_DATA_NONE) {
-			eventfd_signal(vdev->ctx[i].trigger, 1);
+			eventfd_signal(ctx->trigger, 1);
 		} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
 			uint8_t *bools = data;
 			if (bools[i - start])
-				eventfd_signal(vdev->ctx[i].trigger, 1);
+				eventfd_signal(ctx->trigger, 1);
 		}
 	}
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
                   ` (2 preceding siblings ...)
  2023-03-28 21:53 ` [PATCH V2 3/8] vfio/pci: Prepare for dynamic interrupt context storage Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-04-07  7:21   ` Liu, Jing2
  2023-03-28 21:53 ` [PATCH V2 5/8] vfio/pci: Remove interrupt context counter Reinette Chatre
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

Interrupt context is statically allocated at the time interrupts
are allocated. Following allocation, the context is managed by
directly accessing the elements of the array using the vector
as index. The storage is released when interrupts are disabled.

It is possible to dynamically allocate a single MSI-X index
after MSI-X has been enabled. A dynamic storage for interrupt context
is needed to support this. Replace the interrupt context array with an
xarray (similar to what the core uses as store for MSI descriptors)
that can support the dynamic expansion while maintaining the
custom that uses the vector as index.

Use the new data storage to allocate all elements once and free all
elements together. Dynamic usage follows.

Create helpers with understanding that it is only possible
to (after initial MSI-X enabling) allocate a single MSI-X index at
a time. The only time multiple MSI-X are allocated is during initial
MSI-X enabling where failure results in no allocations.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V1:
- Let vfio_irq_ctx_alloc_single() return pointer to allocated
  context. (Alex)
- Transition INTx allocation to simplified vfio_irq_ctx_alloc_single().
- Improve accuracy of changelog.

 drivers/vfio/pci/vfio_pci_core.c  |  1 +
 drivers/vfio/pci/vfio_pci_intrs.c | 77 ++++++++++++++++++++-----------
 include/linux/vfio_pci_core.h     |  2 +-
 3 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..ae0e161c7fc9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2102,6 +2102,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->vma_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
+	xa_init(&vdev->ctx);
 
 	return 0;
 }
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index ece371ebea00..00119641aa19 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -52,25 +52,60 @@ static
 struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
 					  unsigned long index)
 {
-	if (index >= vdev->num_ctx)
-		return NULL;
-	return &vdev->ctx[index];
+	return xa_load(&vdev->ctx, index);
 }
 
 static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
 {
-	kfree(vdev->ctx);
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long index;
+
+	xa_for_each(&vdev->ctx, index, ctx) {
+		xa_erase(&vdev->ctx, index);
+		kfree(ctx);
+	}
+}
+
+static struct vfio_pci_irq_ctx *
+vfio_irq_ctx_alloc_single(struct vfio_pci_core_device *vdev,
+			  unsigned long index)
+{
+	struct vfio_pci_irq_ctx *ctx;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+	if (!ctx)
+		return NULL;
+
+	ret = xa_insert(&vdev->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+	if (ret) {
+		kfree(ctx);
+		return NULL;
+	}
+
+	return ctx;
 }
 
+/* Only called during initial interrupt enabling. Never after.  */
 static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
 				  unsigned long num)
 {
-	vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
-			    GFP_KERNEL_ACCOUNT);
-	if (!vdev->ctx)
-		return -ENOMEM;
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long index;
+
+	WARN_ON(!xa_empty(&vdev->ctx));
+
+	for (index = 0; index < num; index++) {
+		ctx = vfio_irq_ctx_alloc_single(vdev, index);
+		if (!ctx)
+			goto err;
+	}
 
 	return 0;
+
+err:
+	vfio_irq_ctx_free_all(vdev);
+	return -ENOMEM;
 }
 
 /*
@@ -218,7 +253,6 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 {
 	struct vfio_pci_irq_ctx *ctx;
-	int ret;
 
 	if (!is_irq_none(vdev))
 		return -EINVAL;
@@ -226,15 +260,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	ret = vfio_irq_ctx_alloc_num(vdev, 1);
-	if (ret)
-		return ret;
-
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (!ctx) {
-		vfio_irq_ctx_free_all(vdev);
-		return -EINVAL;
-	}
+	ctx = vfio_irq_ctx_alloc_single(vdev, 0);
+	if (!ctx)
+		return -ENOMEM;
 
 	vdev->num_ctx = 1;
 
@@ -486,16 +514,13 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_irq_ctx *ctx;
-	unsigned int i;
+	unsigned long i;
 	u16 cmd;
 
-	for (i = 0; i < vdev->num_ctx; i++) {
-		ctx = vfio_irq_ctx_get(vdev, i);
-		if (ctx) {
-			vfio_virqfd_disable(&ctx->unmask);
-			vfio_virqfd_disable(&ctx->mask);
-			vfio_msi_set_vector_signal(vdev, i, -1, msix);
-		}
+	xa_for_each(&vdev->ctx, i, ctx) {
+		vfio_virqfd_disable(&ctx->unmask);
+		vfio_virqfd_disable(&ctx->mask);
+		vfio_msi_set_vector_signal(vdev, i, -1, msix);
 	}
 
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 367fd79226a3..61d7873a3973 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -59,7 +59,7 @@ struct vfio_pci_core_device {
 	struct perm_bits	*msi_perm;
 	spinlock_t		irqlock;
 	struct mutex		igate;
-	struct vfio_pci_irq_ctx	*ctx;
+	struct xarray		ctx;
 	int			num_ctx;
 	int			irq_type;
 	int			num_regions;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 5/8] vfio/pci: Remove interrupt context counter
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
                   ` (3 preceding siblings ...)
  2023-03-28 21:53 ` [PATCH V2 4/8] vfio/pci: Use xarray for " Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 6/8] vfio/pci: Move to single error path Reinette Chatre
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

struct vfio_pci_core_device::num_ctx counts how many interrupt
contexts have been allocated. When all interrupt contexts are
allocated simultaneously num_ctx provides the upper bound of all
vectors that can be used as indices into the interrupt context
array.

With the upcoming support for dynamic MSI-X the number of
interrupt contexts does not necessarily span the range of allocated
interrupts. Consequently, num_ctx is no longer a trusted upper bound
for valid indices.

Stop using num_ctx to determine if a provided vector is valid, use
the existence of interrupt context directly.

This changes behavior on the error path when user space provides
an invalid vector range. Behavior changes from early exit without
any modifications to possible modifications to valid vectors within
the invalid range. This is acceptable considering that an invalid
range is not a valid scenario, see link to discussion.

The checks that ensure that user space provides a range of vectors
that is valid for the device are untouched.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/lkml/20230316155646.07ae266f.alex.williamson@redhat.com/
---
Changes since RFC V1:
- Remove vfio_irq_ctx_range_allocated(). (Alex and Kevin).

 drivers/vfio/pci/vfio_pci_intrs.c | 13 +------------
 include/linux/vfio_pci_core.h     |  1 -
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 00119641aa19..b86dce0f384f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -264,8 +264,6 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	if (!ctx)
 		return -ENOMEM;
 
-	vdev->num_ctx = 1;
-
 	/*
 	 * If the virtual interrupt is masked, restore it.  Devices
 	 * supporting DisINTx can be masked at the hardware level
@@ -352,7 +350,6 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 	}
 	vfio_intx_set_signal(vdev, -1);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	vdev->num_ctx = 0;
 	vfio_irq_ctx_free_all(vdev);
 }
 
@@ -393,7 +390,6 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	}
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
-	vdev->num_ctx = nvec;
 	vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
 				VFIO_PCI_MSI_IRQ_INDEX;
 
@@ -417,9 +413,6 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	int irq, ret;
 	u16 cmd;
 
-	if (vector >= vdev->num_ctx)
-		return -EINVAL;
-
 	ctx = vfio_irq_ctx_get(vdev, vector);
 	if (!ctx)
 		return -EINVAL;
@@ -494,9 +487,6 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
 	int i, ret = 0;
 	unsigned int j;
 
-	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
-		return -EINVAL;
-
 	for (i = 0, j = start; i < count && !ret; i++, j++) {
 		int fd = fds ? fds[i] : -1;
 		ret = vfio_msi_set_vector_signal(vdev, j, fd, msix);
@@ -535,7 +525,6 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 		pci_intx(pdev, 0);
 
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	vdev->num_ctx = 0;
 	vfio_irq_ctx_free_all(vdev);
 }
 
@@ -671,7 +660,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 		return ret;
 	}
 
-	if (!irq_is(vdev, index) || start + count > vdev->num_ctx)
+	if (!irq_is(vdev, index))
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 61d7873a3973..148fd1ae6c1c 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -60,7 +60,6 @@ struct vfio_pci_core_device {
 	spinlock_t		irqlock;
 	struct mutex		igate;
 	struct xarray		ctx;
-	int			num_ctx;
 	int			irq_type;
 	int			num_regions;
 	struct vfio_pci_region	*region;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 6/8] vfio/pci: Move to single error path
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
                   ` (4 preceding siblings ...)
  2023-03-28 21:53 ` [PATCH V2 5/8] vfio/pci: Remove interrupt context counter Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
  2023-03-28 21:53 ` [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
  7 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

Enabling and disabling of an interrupt involves several steps
that can fail. Cleanup after failure is done when the error
is encountered, resulting in some repetitive code.

Support for dynamic MSI-X will introduce more steps during
interrupt enabling and disabling.

Transition to centralized exit path in preparation for dynamic
MSI-X to eliminate duplicate error handling code. Ensure
no remaining state refers to freed memory.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V1:
- Improve changelog.

 drivers/vfio/pci/vfio_pci_intrs.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index b86dce0f384f..b3a258e58625 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -439,8 +439,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(ctx->name);
-		return PTR_ERR(trigger);
+		ret = PTR_ERR(trigger);
+		goto out_free_name;
 	}
 
 	/*
@@ -460,11 +460,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 
 	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
-	if (ret) {
-		kfree(ctx->name);
-		eventfd_ctx_put(trigger);
-		return ret;
-	}
+	if (ret)
+		goto out_put_eventfd_ctx;
 
 	ctx->producer.token = trigger;
 	ctx->producer.irq = irq;
@@ -479,6 +476,13 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	ctx->trigger = trigger;
 
 	return 0;
+
+out_put_eventfd_ctx:
+	eventfd_ctx_put(trigger);
+out_free_name:
+	kfree(ctx->name);
+	ctx->name = NULL;
+	return ret;
 }
 
 static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
                   ` (5 preceding siblings ...)
  2023-03-28 21:53 ` [PATCH V2 6/8] vfio/pci: Move to single error path Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-29  2:48   ` kernel test robot
                     ` (3 more replies)
  2023-03-28 21:53 ` [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
  7 siblings, 4 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq()
enables an individual MSI-X index to be allocated and freed after
MSI-X enabling.

Support dynamic MSI-X if supported by the device. Keep the association
between allocated interrupt and vfio interrupt context. Allocate new
context together with the new interrupt if no interrupt context exist
for an MSI-X interrupt. Similarly, release an interrupt with its
context.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V1:
- Add pointer to interrupt context as function parameter to
  vfio_irq_ctx_free(). (Alex)
- Initialize new_ctx to false. (Dan Carpenter)
- Only support dynamic allocation if device supports it. (Alex)

 drivers/vfio/pci/vfio_pci_intrs.c | 93 +++++++++++++++++++++++++------
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index b3a258e58625..755b752ca17e 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -55,6 +55,13 @@ struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
 	return xa_load(&vdev->ctx, index);
 }
 
+static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
+			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
+{
+	xa_erase(&vdev->ctx, index);
+	kfree(ctx);
+}
+
 static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
 {
 	struct vfio_pci_irq_ctx *ctx;
@@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_irq_ctx *ctx;
+	struct msi_map msix_map = {};
+	bool allow_dyn_alloc = false;
 	struct eventfd_ctx *trigger;
+	bool new_ctx = false;
 	int irq, ret;
 	u16 cmd;
 
+	/* Only MSI-X allows dynamic allocation. */
+	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
+		allow_dyn_alloc = true;
+
 	ctx = vfio_irq_ctx_get(vdev, vector);
-	if (!ctx)
+	if (!ctx && !allow_dyn_alloc)
 		return -EINVAL;
+
 	irq = pci_irq_vector(pdev, vector);
+	/* Context and interrupt are always allocated together. */
+	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
 
-	if (ctx->trigger) {
+	if (ctx && ctx->trigger) {
 		irq_bypass_unregister_producer(&ctx->producer);
 
 		cmd = vfio_pci_memory_lock_and_enable(vdev);
 		free_irq(irq, ctx->trigger);
+		if (allow_dyn_alloc) {
+			msix_map.index = vector;
+			msix_map.virq = irq;
+			pci_msix_free_irq(pdev, msix_map);
+			irq = -EINVAL;
+		}
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
 		kfree(ctx->name);
 		eventfd_ctx_put(ctx->trigger);
 		ctx->trigger = NULL;
+		if (allow_dyn_alloc) {
+			vfio_irq_ctx_free(vdev, ctx, vector);
+			ctx = NULL;
+		}
 	}
 
 	if (fd < 0)
 		return 0;
 
+	if (!ctx) {
+		ctx = vfio_irq_ctx_alloc_single(vdev, vector);
+		if (!ctx)
+			return -ENOMEM;
+		new_ctx = true;
+	}
+
 	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
 			      msix ? "x" : "", vector, pci_name(pdev));
-	if (!ctx->name)
-		return -ENOMEM;
+	if (!ctx->name) {
+		ret = -ENOMEM;
+		goto out_free_ctx;
+	}
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
@@ -443,25 +479,38 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 		goto out_free_name;
 	}
 
-	/*
-	 * The MSIx vector table resides in device memory which may be cleared
-	 * via backdoor resets. We don't allow direct access to the vector
-	 * table so even if a userspace driver attempts to save/restore around
-	 * such a reset it would be unsuccessful. To avoid this, restore the
-	 * cached value of the message prior to enabling.
-	 */
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	if (msix) {
-		struct msi_msg msg;
-
-		get_cached_msi_msg(irq, &msg);
-		pci_write_msi_msg(irq, &msg);
+		if (irq == -EINVAL) {
+			msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);
+			if (msix_map.index < 0) {
+				vfio_pci_memory_unlock_and_restore(vdev, cmd);
+				ret = msix_map.index;
+				goto out_put_eventfd_ctx;
+			}
+			irq = msix_map.virq;
+		} else {
+			/*
+			 * The MSIx vector table resides in device memory which
+			 * may be cleared via backdoor resets. We don't allow
+			 * direct access to the vector table so even if a
+			 * userspace driver attempts to save/restore around
+			 * such a reset it would be unsuccessful. To avoid
+			 * this, restore the cached value of the message prior
+			 * to enabling.
+			 */
+			struct msi_msg msg;
+
+			get_cached_msi_msg(irq, &msg);
+			pci_write_msi_msg(irq, &msg);
+		}
 	}
 
 	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
-	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 	if (ret)
-		goto out_put_eventfd_ctx;
+		goto out_free_irq_locked;
+
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
 	ctx->producer.token = trigger;
 	ctx->producer.irq = irq;
@@ -477,11 +526,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 
 	return 0;
 
+out_free_irq_locked:
+	if (allow_dyn_alloc && new_ctx) {
+		msix_map.index = vector;
+		msix_map.virq = irq;
+		pci_msix_free_irq(pdev, msix_map);
+	}
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 out_put_eventfd_ctx:
 	eventfd_ctx_put(trigger);
 out_free_name:
 	kfree(ctx->name);
 	ctx->name = NULL;
+out_free_ctx:
+	if (allow_dyn_alloc && new_ctx)
+		vfio_irq_ctx_free(vdev, ctx, vector);
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
  2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
                   ` (6 preceding siblings ...)
  2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
@ 2023-03-28 21:53 ` Reinette Chatre
  2023-03-29  3:29   ` kernel test robot
  2023-03-29  3:29   ` kernel test robot
  7 siblings, 2 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-28 21:53 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson
  Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
	tom.zanussi, reinette.chatre, linux-kernel

Dynamic MSI-X is supported. Clear VFIO_IRQ_INFO_NORESIZE
to provide guidance to user space.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V1:
- Only advertise VFIO_IRQ_INFO_NORESIZE for MSI-X devices that
  can actually support dynamic allocation. (Alex)

 drivers/vfio/pci/vfio_pci_core.c | 4 +++-
 include/uapi/linux/vfio.h        | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ae0e161c7fc9..c679d7b59f9b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1111,7 +1111,9 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
 	if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
 		info.flags |=
 			(VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED);
-	else
+	else if ((info.index != VFIO_PCI_MSIX_IRQ_INDEX) ||
+		 (info.index == VFIO_PCI_MSIX_IRQ_INDEX &&
+		  !pci_msix_can_alloc_dyn(vdev->pdev)))
 		info.flags |= VFIO_IRQ_INFO_NORESIZE;
 
 	return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..1a36134cae5c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -511,6 +511,9 @@ struct vfio_region_info_cap_nvlink2_lnkspd {
  * then add and unmask vectors, it's up to userspace to make the decision
  * whether to allocate the maximum supported number of vectors or tear
  * down setup and incrementally increase the vectors as each is enabled.
+ * Absence of the NORESIZE flag indicates that vectors can be enabled
+ * and disabled dynamically without impacting other vectors within the
+ * index.
  */
 struct vfio_irq_info {
 	__u32	argsz;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
@ 2023-03-29  2:48   ` kernel test robot
  2023-03-29 14:42     ` Reinette Chatre
  2023-03-29  2:58   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: kernel test robot @ 2023-03-29  2:48 UTC (permalink / raw)
  To: Reinette Chatre, jgg, yishaih, shameerali.kolothum.thodi,
	kevin.tian, alex.williamson
  Cc: oe-kbuild-all, tglx, darwi, kvm, dave.jiang, jing2.liu,
	ashok.raj, fenghua.yu, tom.zanussi, reinette.chatre,
	linux-kernel

Hi Reinette,

I love your patch! Yet something to improve:

[auto build test ERROR on 197b6b60ae7bc51dd0814953c562833143b292aa]

url:    https://github.com/intel-lab-lkp/linux/commits/Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
base:   197b6b60ae7bc51dd0814953c562833143b292aa
patch link:    https://lore.kernel.org/r/419f3ba2f732154d8ae079b3deb02d0fdbe3e258.1680038771.git.reinette.chatre%40intel.com
patch subject: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
config: i386-randconfig-a001-20230327 (https://download.01.org/0day-ci/archive/20230329/202303291000.PWFqGCxH-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/420198d6ba9227a0ef81a2192ca35019fa6439cf
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
        git checkout 420198d6ba9227a0ef81a2192ca35019fa6439cf
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/vfio/pci/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303291000.PWFqGCxH-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/vfio/pci/vfio_pci_intrs.c: In function 'vfio_msi_set_vector_signal':
>> drivers/vfio/pci/vfio_pci_intrs.c:427:21: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration]
     427 |         if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
         |                     ^~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/pci_msix_can_alloc_dyn +427 drivers/vfio/pci/vfio_pci_intrs.c

   413	
   414	static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
   415					      unsigned int vector, int fd, bool msix)
   416	{
   417		struct pci_dev *pdev = vdev->pdev;
   418		struct vfio_pci_irq_ctx *ctx;
   419		struct msi_map msix_map = {};
   420		bool allow_dyn_alloc = false;
   421		struct eventfd_ctx *trigger;
   422		bool new_ctx = false;
   423		int irq, ret;
   424		u16 cmd;
   425	
   426		/* Only MSI-X allows dynamic allocation. */
 > 427		if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
   428			allow_dyn_alloc = true;
   429	
   430		ctx = vfio_irq_ctx_get(vdev, vector);
   431		if (!ctx && !allow_dyn_alloc)
   432			return -EINVAL;
   433	
   434		irq = pci_irq_vector(pdev, vector);
   435		/* Context and interrupt are always allocated together. */
   436		WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
   437	
   438		if (ctx && ctx->trigger) {
   439			irq_bypass_unregister_producer(&ctx->producer);
   440	
   441			cmd = vfio_pci_memory_lock_and_enable(vdev);
   442			free_irq(irq, ctx->trigger);
   443			if (allow_dyn_alloc) {
   444				msix_map.index = vector;
   445				msix_map.virq = irq;
   446				pci_msix_free_irq(pdev, msix_map);
   447				irq = -EINVAL;
   448			}
   449			vfio_pci_memory_unlock_and_restore(vdev, cmd);
   450			kfree(ctx->name);
   451			eventfd_ctx_put(ctx->trigger);
   452			ctx->trigger = NULL;
   453			if (allow_dyn_alloc) {
   454				vfio_irq_ctx_free(vdev, ctx, vector);
   455				ctx = NULL;
   456			}
   457		}
   458	
   459		if (fd < 0)
   460			return 0;
   461	
   462		if (!ctx) {
   463			ctx = vfio_irq_ctx_alloc_single(vdev, vector);
   464			if (!ctx)
   465				return -ENOMEM;
   466			new_ctx = true;
   467		}
   468	
   469		ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
   470				      msix ? "x" : "", vector, pci_name(pdev));
   471		if (!ctx->name) {
   472			ret = -ENOMEM;
   473			goto out_free_ctx;
   474		}
   475	
   476		trigger = eventfd_ctx_fdget(fd);
   477		if (IS_ERR(trigger)) {
   478			ret = PTR_ERR(trigger);
   479			goto out_free_name;
   480		}
   481	
   482		cmd = vfio_pci_memory_lock_and_enable(vdev);
   483		if (msix) {
   484			if (irq == -EINVAL) {
   485				msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);
   486				if (msix_map.index < 0) {
   487					vfio_pci_memory_unlock_and_restore(vdev, cmd);
   488					ret = msix_map.index;
   489					goto out_put_eventfd_ctx;
   490				}
   491				irq = msix_map.virq;
   492			} else {
   493				/*
   494				 * The MSIx vector table resides in device memory which
   495				 * may be cleared via backdoor resets. We don't allow
   496				 * direct access to the vector table so even if a
   497				 * userspace driver attempts to save/restore around
   498				 * such a reset it would be unsuccessful. To avoid
   499				 * this, restore the cached value of the message prior
   500				 * to enabling.
   501				 */
   502				struct msi_msg msg;
   503	
   504				get_cached_msi_msg(irq, &msg);
   505				pci_write_msi_msg(irq, &msg);
   506			}
   507		}
   508	
   509		ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
   510		if (ret)
   511			goto out_free_irq_locked;
   512	
   513		vfio_pci_memory_unlock_and_restore(vdev, cmd);
   514	
   515		ctx->producer.token = trigger;
   516		ctx->producer.irq = irq;
   517		ret = irq_bypass_register_producer(&ctx->producer);
   518		if (unlikely(ret)) {
   519			dev_info(&pdev->dev,
   520			"irq bypass producer (token %p) registration fails: %d\n",
   521			ctx->producer.token, ret);
   522	
   523			ctx->producer.token = NULL;
   524		}
   525		ctx->trigger = trigger;
   526	
   527		return 0;
   528	
   529	out_free_irq_locked:
   530		if (allow_dyn_alloc && new_ctx) {
   531			msix_map.index = vector;
   532			msix_map.virq = irq;
   533			pci_msix_free_irq(pdev, msix_map);
   534		}
   535		vfio_pci_memory_unlock_and_restore(vdev, cmd);
   536	out_put_eventfd_ctx:
   537		eventfd_ctx_put(trigger);
   538	out_free_name:
   539		kfree(ctx->name);
   540		ctx->name = NULL;
   541	out_free_ctx:
   542		if (allow_dyn_alloc && new_ctx)
   543			vfio_irq_ctx_free(vdev, ctx, vector);
   544		return ret;
   545	}
   546	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
  2023-03-29  2:48   ` kernel test robot
@ 2023-03-29  2:58   ` kernel test robot
  2023-03-30 22:40   ` Alex Williamson
  2023-03-31 10:02   ` Liu, Jing2
  3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2023-03-29  2:58 UTC (permalink / raw)
  To: Reinette Chatre, jgg, yishaih, shameerali.kolothum.thodi,
	kevin.tian, alex.williamson
  Cc: llvm, oe-kbuild-all, tglx, darwi, kvm, dave.jiang, jing2.liu,
	ashok.raj, fenghua.yu, tom.zanussi, reinette.chatre,
	linux-kernel

Hi Reinette,

I love your patch! Yet something to improve:

[auto build test ERROR on 197b6b60ae7bc51dd0814953c562833143b292aa]

url:    https://github.com/intel-lab-lkp/linux/commits/Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
base:   197b6b60ae7bc51dd0814953c562833143b292aa
patch link:    https://lore.kernel.org/r/419f3ba2f732154d8ae079b3deb02d0fdbe3e258.1680038771.git.reinette.chatre%40intel.com
patch subject: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
config: i386-randconfig-a014-20230327 (https://download.01.org/0day-ci/archive/20230329/202303291023.2Kc7CcsV-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/420198d6ba9227a0ef81a2192ca35019fa6439cf
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
        git checkout 420198d6ba9227a0ef81a2192ca35019fa6439cf
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/vfio/pci/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303291023.2Kc7CcsV-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/vfio/pci/vfio_pci_intrs.c:427:14: error: implicit declaration of function 'pci_msix_can_alloc_dyn' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
           if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
                       ^
   1 error generated.


vim +/pci_msix_can_alloc_dyn +427 drivers/vfio/pci/vfio_pci_intrs.c

   413	
   414	static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
   415					      unsigned int vector, int fd, bool msix)
   416	{
   417		struct pci_dev *pdev = vdev->pdev;
   418		struct vfio_pci_irq_ctx *ctx;
   419		struct msi_map msix_map = {};
   420		bool allow_dyn_alloc = false;
   421		struct eventfd_ctx *trigger;
   422		bool new_ctx = false;
   423		int irq, ret;
   424		u16 cmd;
   425	
   426		/* Only MSI-X allows dynamic allocation. */
 > 427		if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
   428			allow_dyn_alloc = true;
   429	
   430		ctx = vfio_irq_ctx_get(vdev, vector);
   431		if (!ctx && !allow_dyn_alloc)
   432			return -EINVAL;
   433	
   434		irq = pci_irq_vector(pdev, vector);
   435		/* Context and interrupt are always allocated together. */
   436		WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
   437	
   438		if (ctx && ctx->trigger) {
   439			irq_bypass_unregister_producer(&ctx->producer);
   440	
   441			cmd = vfio_pci_memory_lock_and_enable(vdev);
   442			free_irq(irq, ctx->trigger);
   443			if (allow_dyn_alloc) {
   444				msix_map.index = vector;
   445				msix_map.virq = irq;
   446				pci_msix_free_irq(pdev, msix_map);
   447				irq = -EINVAL;
   448			}
   449			vfio_pci_memory_unlock_and_restore(vdev, cmd);
   450			kfree(ctx->name);
   451			eventfd_ctx_put(ctx->trigger);
   452			ctx->trigger = NULL;
   453			if (allow_dyn_alloc) {
   454				vfio_irq_ctx_free(vdev, ctx, vector);
   455				ctx = NULL;
   456			}
   457		}
   458	
   459		if (fd < 0)
   460			return 0;
   461	
   462		if (!ctx) {
   463			ctx = vfio_irq_ctx_alloc_single(vdev, vector);
   464			if (!ctx)
   465				return -ENOMEM;
   466			new_ctx = true;
   467		}
   468	
   469		ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
   470				      msix ? "x" : "", vector, pci_name(pdev));
   471		if (!ctx->name) {
   472			ret = -ENOMEM;
   473			goto out_free_ctx;
   474		}
   475	
   476		trigger = eventfd_ctx_fdget(fd);
   477		if (IS_ERR(trigger)) {
   478			ret = PTR_ERR(trigger);
   479			goto out_free_name;
   480		}
   481	
   482		cmd = vfio_pci_memory_lock_and_enable(vdev);
   483		if (msix) {
   484			if (irq == -EINVAL) {
   485				msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);
   486				if (msix_map.index < 0) {
   487					vfio_pci_memory_unlock_and_restore(vdev, cmd);
   488					ret = msix_map.index;
   489					goto out_put_eventfd_ctx;
   490				}
   491				irq = msix_map.virq;
   492			} else {
   493				/*
   494				 * The MSIx vector table resides in device memory which
   495				 * may be cleared via backdoor resets. We don't allow
   496				 * direct access to the vector table so even if a
   497				 * userspace driver attempts to save/restore around
   498				 * such a reset it would be unsuccessful. To avoid
   499				 * this, restore the cached value of the message prior
   500				 * to enabling.
   501				 */
   502				struct msi_msg msg;
   503	
   504				get_cached_msi_msg(irq, &msg);
   505				pci_write_msi_msg(irq, &msg);
   506			}
   507		}
   508	
   509		ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
   510		if (ret)
   511			goto out_free_irq_locked;
   512	
   513		vfio_pci_memory_unlock_and_restore(vdev, cmd);
   514	
   515		ctx->producer.token = trigger;
   516		ctx->producer.irq = irq;
   517		ret = irq_bypass_register_producer(&ctx->producer);
   518		if (unlikely(ret)) {
   519			dev_info(&pdev->dev,
   520			"irq bypass producer (token %p) registration fails: %d\n",
   521			ctx->producer.token, ret);
   522	
   523			ctx->producer.token = NULL;
   524		}
   525		ctx->trigger = trigger;
   526	
   527		return 0;
   528	
   529	out_free_irq_locked:
   530		if (allow_dyn_alloc && new_ctx) {
   531			msix_map.index = vector;
   532			msix_map.virq = irq;
   533			pci_msix_free_irq(pdev, msix_map);
   534		}
   535		vfio_pci_memory_unlock_and_restore(vdev, cmd);
   536	out_put_eventfd_ctx:
   537		eventfd_ctx_put(trigger);
   538	out_free_name:
   539		kfree(ctx->name);
   540		ctx->name = NULL;
   541	out_free_ctx:
   542		if (allow_dyn_alloc && new_ctx)
   543			vfio_irq_ctx_free(vdev, ctx, vector);
   544		return ret;
   545	}
   546	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
  2023-03-28 21:53 ` [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
@ 2023-03-29  3:29   ` kernel test robot
  2023-03-29  3:29   ` kernel test robot
  1 sibling, 0 replies; 39+ messages in thread
From: kernel test robot @ 2023-03-29  3:29 UTC (permalink / raw)
  To: Reinette Chatre, jgg, yishaih, shameerali.kolothum.thodi,
	kevin.tian, alex.williamson
  Cc: oe-kbuild-all, tglx, darwi, kvm, dave.jiang, jing2.liu,
	ashok.raj, fenghua.yu, tom.zanussi, reinette.chatre,
	linux-kernel

Hi Reinette,

I love your patch! Yet something to improve:

[auto build test ERROR on 197b6b60ae7bc51dd0814953c562833143b292aa]

url:    https://github.com/intel-lab-lkp/linux/commits/Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
base:   197b6b60ae7bc51dd0814953c562833143b292aa
patch link:    https://lore.kernel.org/r/81a6066c0f0d6dfa06f41c016abfb7152064e33e.1680038771.git.reinette.chatre%40intel.com
patch subject: [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
config: i386-randconfig-a001-20230327 (https://download.01.org/0day-ci/archive/20230329/202303291114.8inU0hbN-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/39bc54993b029037b12b4a7e947d6cd500065c6b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
        git checkout 39bc54993b029037b12b4a7e947d6cd500065c6b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/vfio/pci/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303291114.8inU0hbN-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/vfio/pci/vfio_pci_core.c: In function 'vfio_pci_ioctl_get_irq_info':
>> drivers/vfio/pci/vfio_pci_core.c:1116:20: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration]
    1116 |                   !pci_msix_can_alloc_dyn(vdev->pdev)))
         |                    ^~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/pci_msix_can_alloc_dyn +1116 drivers/vfio/pci/vfio_pci_core.c

  1082	
  1083	static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
  1084					       struct vfio_irq_info __user *arg)
  1085	{
  1086		unsigned long minsz = offsetofend(struct vfio_irq_info, count);
  1087		struct vfio_irq_info info;
  1088	
  1089		if (copy_from_user(&info, arg, minsz))
  1090			return -EFAULT;
  1091	
  1092		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
  1093			return -EINVAL;
  1094	
  1095		switch (info.index) {
  1096		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
  1097		case VFIO_PCI_REQ_IRQ_INDEX:
  1098			break;
  1099		case VFIO_PCI_ERR_IRQ_INDEX:
  1100			if (pci_is_pcie(vdev->pdev))
  1101				break;
  1102			fallthrough;
  1103		default:
  1104			return -EINVAL;
  1105		}
  1106	
  1107		info.flags = VFIO_IRQ_INFO_EVENTFD;
  1108	
  1109		info.count = vfio_pci_get_irq_count(vdev, info.index);
  1110	
  1111		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
  1112			info.flags |=
  1113				(VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED);
  1114		else if ((info.index != VFIO_PCI_MSIX_IRQ_INDEX) ||
  1115			 (info.index == VFIO_PCI_MSIX_IRQ_INDEX &&
> 1116			  !pci_msix_can_alloc_dyn(vdev->pdev)))
  1117			info.flags |= VFIO_IRQ_INFO_NORESIZE;
  1118	
  1119		return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
  1120	}
  1121	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
  2023-03-28 21:53 ` [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
  2023-03-29  3:29   ` kernel test robot
@ 2023-03-29  3:29   ` kernel test robot
  1 sibling, 0 replies; 39+ messages in thread
From: kernel test robot @ 2023-03-29  3:29 UTC (permalink / raw)
  To: Reinette Chatre, jgg, yishaih, shameerali.kolothum.thodi,
	kevin.tian, alex.williamson
  Cc: llvm, oe-kbuild-all, tglx, darwi, kvm, dave.jiang, jing2.liu,
	ashok.raj, fenghua.yu, tom.zanussi, reinette.chatre,
	linux-kernel

Hi Reinette,

I love your patch! Yet something to improve:

[auto build test ERROR on 197b6b60ae7bc51dd0814953c562833143b292aa]

url:    https://github.com/intel-lab-lkp/linux/commits/Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
base:   197b6b60ae7bc51dd0814953c562833143b292aa
patch link:    https://lore.kernel.org/r/81a6066c0f0d6dfa06f41c016abfb7152064e33e.1680038771.git.reinette.chatre%40intel.com
patch subject: [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
config: i386-randconfig-a014-20230327 (https://download.01.org/0day-ci/archive/20230329/202303291158.UlMhtYCO-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/39bc54993b029037b12b4a7e947d6cd500065c6b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Reinette-Chatre/vfio-pci-Consolidate-irq-cleanup-on-MSI-MSI-X-disable/20230329-055735
        git checkout 39bc54993b029037b12b4a7e947d6cd500065c6b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/vfio/pci/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303291158.UlMhtYCO-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/vfio/pci/vfio_pci_core.c:1116:6: error: implicit declaration of function 'pci_msix_can_alloc_dyn' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                     !pci_msix_can_alloc_dyn(vdev->pdev)))
                      ^
   1 error generated.


vim +/pci_msix_can_alloc_dyn +1116 drivers/vfio/pci/vfio_pci_core.c

  1082	
  1083	static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
  1084					       struct vfio_irq_info __user *arg)
  1085	{
  1086		unsigned long minsz = offsetofend(struct vfio_irq_info, count);
  1087		struct vfio_irq_info info;
  1088	
  1089		if (copy_from_user(&info, arg, minsz))
  1090			return -EFAULT;
  1091	
  1092		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
  1093			return -EINVAL;
  1094	
  1095		switch (info.index) {
  1096		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
  1097		case VFIO_PCI_REQ_IRQ_INDEX:
  1098			break;
  1099		case VFIO_PCI_ERR_IRQ_INDEX:
  1100			if (pci_is_pcie(vdev->pdev))
  1101				break;
  1102			fallthrough;
  1103		default:
  1104			return -EINVAL;
  1105		}
  1106	
  1107		info.flags = VFIO_IRQ_INFO_EVENTFD;
  1108	
  1109		info.count = vfio_pci_get_irq_count(vdev, info.index);
  1110	
  1111		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
  1112			info.flags |=
  1113				(VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED);
  1114		else if ((info.index != VFIO_PCI_MSIX_IRQ_INDEX) ||
  1115			 (info.index == VFIO_PCI_MSIX_IRQ_INDEX &&
> 1116			  !pci_msix_can_alloc_dyn(vdev->pdev)))
  1117			info.flags |= VFIO_IRQ_INFO_NORESIZE;
  1118	
  1119		return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
  1120	}
  1121	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-29  2:48   ` kernel test robot
@ 2023-03-29 14:42     ` Reinette Chatre
  2023-03-29 22:10       ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-03-29 14:42 UTC (permalink / raw)
  To: kernel test robot, jgg, yishaih, shameerali.kolothum.thodi,
	kevin.tian, alex.williamson
  Cc: oe-kbuild-all, tglx, darwi, kvm, dave.jiang, jing2.liu,
	ashok.raj, fenghua.yu, tom.zanussi, linux-kernel



Thank you very much for the report.

On 3/28/2023 7:48 PM, kernel test robot wrote:
> config: i386-randconfig-a001-20230327 (https://download.01.org/0day-ci/archive/20230329/202303291000.PWFqGCxH-lkp@intel.com/config)

This config (and all the other configs in the reports about
implicit declaration of function 'pci_msix_can_alloc_dyn')
has:
# CONFIG_PCI_MSI is not set

Resulting in:
 
>    drivers/vfio/pci/vfio_pci_intrs.c: In function 'vfio_msi_set_vector_signal':
>>> drivers/vfio/pci/vfio_pci_intrs.c:427:21: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration]
>      427 |         if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
>          |                     ^~~~~~~~~~~~~~~~~~~~~~
>    cc1: some warnings being treated as errors
> 


because include/linux/pci.h is missing a stub for
pci_msix_can_alloc_dyn() when CONFIG_PCI_MSI is not set.
I'll send a separate fix for that.

Reinette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-29 14:42     ` Reinette Chatre
@ 2023-03-29 22:10       ` Reinette Chatre
  0 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-29 22:10 UTC (permalink / raw)
  To: kernel test robot, jgg, yishaih, shameerali.kolothum.thodi,
	kevin.tian, alex.williamson
  Cc: oe-kbuild-all, tglx, darwi, kvm, dave.jiang, jing2.liu,
	ashok.raj, fenghua.yu, tom.zanussi, linux-kernel



On 3/29/2023 7:42 AM, Reinette Chatre wrote:
> I'll send a separate fix for that.

https://lore.kernel.org/lkml/310ecc4815dae4174031062f525245f0755c70e2.1680119924.git.reinette.chatre@intel.com/

Reinette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector
  2023-03-28 21:53 ` [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector Reinette Chatre
@ 2023-03-30 20:26   ` Alex Williamson
  2023-03-30 22:32     ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-03-30 20:26 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Tue, 28 Mar 2023 14:53:29 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> User space provides the vector as an unsigned int that is checked
> early for validity (vfio_set_irqs_validate_and_prepare()).
> 
> A later negative check of the provided vector is not necessary.
> 
> Remove the negative check and ensure the type used
> for the vector is consistent as an unsigned int.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 6a9c6a143cc3..3f64ccdce69f 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -317,14 +317,14 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
>  }
>  
>  static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> -				      int vector, int fd, bool msix)
> +				      unsigned int vector, int fd, bool msix)
>  {
>  	struct pci_dev *pdev = vdev->pdev;
>  	struct eventfd_ctx *trigger;
>  	int irq, ret;
>  	u16 cmd;
>  
> -	if (vector < 0 || vector >= vdev->num_ctx)
> +	if (vector >= vdev->num_ctx)
>  		return -EINVAL;
>  
>  	irq = pci_irq_vector(pdev, vector);
> @@ -399,7 +399,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>  static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
>  			      unsigned count, int32_t *fds, bool msix)
>  {
> -	int i, j, ret = 0;
> +	int i, ret = 0;
> +	unsigned int j;
>  
>  	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
>  		return -EINVAL;

Unfortunately this turns the unwind portion of the function into an
infinite loop in the common case when @start is zero:

                for (--j; j >= (int)start; j--)
                        vfio_msi_set_vector_signal(vdev, j, -1, msix);

Thanks,
Alex


> @@ -420,7 +421,7 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
>  static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
>  {
>  	struct pci_dev *pdev = vdev->pdev;
> -	int i;
> +	unsigned int i;
>  	u16 cmd;
>  
>  	for (i = 0; i < vdev->num_ctx; i++) {
> @@ -542,7 +543,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
>  				    unsigned index, unsigned start,
>  				    unsigned count, uint32_t flags, void *data)
>  {
> -	int i;
> +	unsigned int i;
>  	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
>  
>  	if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector
  2023-03-30 20:26   ` Alex Williamson
@ 2023-03-30 22:32     ` Reinette Chatre
  2023-03-30 22:54       ` Alex Williamson
  0 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-03-30 22:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 3/30/2023 1:26 PM, Alex Williamson wrote:
> On Tue, 28 Mar 2023 14:53:29 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
...

>> @@ -399,7 +399,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>  static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
>>  			      unsigned count, int32_t *fds, bool msix)
>>  {
>> -	int i, j, ret = 0;
>> +	int i, ret = 0;
>> +	unsigned int j;
>>  
>>  	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
>>  		return -EINVAL;
> 
> Unfortunately this turns the unwind portion of the function into an
> infinite loop in the common case when @start is zero:
> 
>                 for (--j; j >= (int)start; j--)
>                         vfio_msi_set_vector_signal(vdev, j, -1, msix);
> 
> 

Thank you very much for catching this. It is not clear to me how you
would prefer to resolve this. Would you prefer that the vector parameter
in vfio_msi_set_vector_signal() continue to be an int and this patch be
dropped and the "if (vector < 0)" check remains (option A)? Or, alternatively,
I see two other possible solutions where the vector parameter in
vfio_msi_set_vector_signal() becomes an unsigned int and the above snippet
could be one of:

option B:
vfio_msi_set_block()
{
	int i, j, ret = 0;

	...
		for (--j; j >= (int)start; j--)
			vfio_msi_set_vector_signal(vdev, (unsigned int)j, -1, msix);
}

option C:
vfio_msi_set_block()
{
	int i, ret = 0;
	unsigned int j;

	...
		for (--j; j >= start && j < start + count; j--)
			vfio_msi_set_vector_signal(vdev, j, -1, msix);
}

What would you prefer?

Reinette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
  2023-03-29  2:48   ` kernel test robot
  2023-03-29  2:58   ` kernel test robot
@ 2023-03-30 22:40   ` Alex Williamson
  2023-03-30 22:42     ` Alex Williamson
  2023-03-31 10:02   ` Liu, Jing2
  3 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-03-30 22:40 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Tue, 28 Mar 2023 14:53:34 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq()
> enables an individual MSI-X index to be allocated and freed after
> MSI-X enabling.
> 
> Support dynamic MSI-X if supported by the device. Keep the association
> between allocated interrupt and vfio interrupt context. Allocate new
> context together with the new interrupt if no interrupt context exist
> for an MSI-X interrupt. Similarly, release an interrupt with its
> context.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since RFC V1:
> - Add pointer to interrupt context as function parameter to
>   vfio_irq_ctx_free(). (Alex)
> - Initialize new_ctx to false. (Dan Carpenter)
> - Only support dynamic allocation if device supports it. (Alex)
> 
>  drivers/vfio/pci/vfio_pci_intrs.c | 93 +++++++++++++++++++++++++------
>  1 file changed, 76 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index b3a258e58625..755b752ca17e 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -55,6 +55,13 @@ struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
>  	return xa_load(&vdev->ctx, index);
>  }
>  
> +static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
> +			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
> +{
> +	xa_erase(&vdev->ctx, index);
> +	kfree(ctx);
> +}
> +
>  static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
>  {
>  	struct vfio_pci_irq_ctx *ctx;
> @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>  {
>  	struct pci_dev *pdev = vdev->pdev;
>  	struct vfio_pci_irq_ctx *ctx;
> +	struct msi_map msix_map = {};
> +	bool allow_dyn_alloc = false;
>  	struct eventfd_ctx *trigger;
> +	bool new_ctx = false;
>  	int irq, ret;
>  	u16 cmd;
>  
> +	/* Only MSI-X allows dynamic allocation. */
> +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
> +		allow_dyn_alloc = true;

Should vfio-pci-core probe this and store it in a field on
vfio_pci_core_device so that we can simply use something like
vdev->has_dyn_msix throughout?

> +
>  	ctx = vfio_irq_ctx_get(vdev, vector);
> -	if (!ctx)
> +	if (!ctx && !allow_dyn_alloc)
>  		return -EINVAL;
> +
>  	irq = pci_irq_vector(pdev, vector);
> +	/* Context and interrupt are always allocated together. */
> +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
>  
> -	if (ctx->trigger) {
> +	if (ctx && ctx->trigger) {
>  		irq_bypass_unregister_producer(&ctx->producer);
>  
>  		cmd = vfio_pci_memory_lock_and_enable(vdev);
>  		free_irq(irq, ctx->trigger);
> +		if (allow_dyn_alloc) {

It almost seems easier to define msix_map in each scope that it's used:

			struct msi_map map = { .index = vector,
					       .virq = irq };

> +			msix_map.index = vector;
> +			msix_map.virq = irq;
> +			pci_msix_free_irq(pdev, msix_map);
> +			irq = -EINVAL;
> +		}
>  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  		kfree(ctx->name);
>  		eventfd_ctx_put(ctx->trigger);
>  		ctx->trigger = NULL;
> +		if (allow_dyn_alloc) {
> +			vfio_irq_ctx_free(vdev, ctx, vector);
> +			ctx = NULL;
> +		}
>  	}
>  
>  	if (fd < 0)
>  		return 0;
>  
> +	if (!ctx) {
> +		ctx = vfio_irq_ctx_alloc_single(vdev, vector);
> +		if (!ctx)
> +			return -ENOMEM;
> +		new_ctx = true;
> +	}
> +
>  	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
>  			      msix ? "x" : "", vector, pci_name(pdev));
> -	if (!ctx->name)
> -		return -ENOMEM;
> +	if (!ctx->name) {
> +		ret = -ENOMEM;
> +		goto out_free_ctx;
> +	}
>  
>  	trigger = eventfd_ctx_fdget(fd);
>  	if (IS_ERR(trigger)) {
> @@ -443,25 +479,38 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>  		goto out_free_name;
>  	}
>  
> -	/*
> -	 * The MSIx vector table resides in device memory which may be cleared
> -	 * via backdoor resets. We don't allow direct access to the vector
> -	 * table so even if a userspace driver attempts to save/restore around
> -	 * such a reset it would be unsuccessful. To avoid this, restore the
> -	 * cached value of the message prior to enabling.
> -	 */
>  	cmd = vfio_pci_memory_lock_and_enable(vdev);
>  	if (msix) {
> -		struct msi_msg msg;
> -
> -		get_cached_msi_msg(irq, &msg);
> -		pci_write_msi_msg(irq, &msg);
> +		if (irq == -EINVAL) {
> +			msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);

			struct msi_map map = pci_msix_alloc_irq_at(pdev,
								vector, NULL);
> +			if (msix_map.index < 0) {
> +				vfio_pci_memory_unlock_and_restore(vdev, cmd);
> +				ret = msix_map.index;
> +				goto out_put_eventfd_ctx;
> +			}
> +			irq = msix_map.virq;
> +		} else {
> +			/*
> +			 * The MSIx vector table resides in device memory which
> +			 * may be cleared via backdoor resets. We don't allow
> +			 * direct access to the vector table so even if a
> +			 * userspace driver attempts to save/restore around
> +			 * such a reset it would be unsuccessful. To avoid
> +			 * this, restore the cached value of the message prior
> +			 * to enabling.
> +			 */

You've only just copied this comment down to here, but I think it's a
bit stale.  Maybe we should update it to something that helps explain
this split better, maybe:

			/*
			 * If the vector was previously allocated, refresh the
			 * on-device message data before enabling in case it had
			 * been cleared or corrupted since writing.
			 */

IIRC, that was the purpose of writing it back to the device and the
blocking of direct access is no longer accurate anyway.

> +			struct msi_msg msg;
> +
> +			get_cached_msi_msg(irq, &msg);
> +			pci_write_msi_msg(irq, &msg);
> +		}
>  	}
>  
>  	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
> -	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  	if (ret)
> -		goto out_put_eventfd_ctx;
> +		goto out_free_irq_locked;
> +
> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  
>  	ctx->producer.token = trigger;
>  	ctx->producer.irq = irq;
> @@ -477,11 +526,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>  
>  	return 0;
>  
> +out_free_irq_locked:
> +	if (allow_dyn_alloc && new_ctx) {

		struct msi_map map = { .index = vector,
				       .virq = irq };

> +		msix_map.index = vector;
> +		msix_map.virq = irq;
> +		pci_msix_free_irq(pdev, msix_map);
> +	}
> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  out_put_eventfd_ctx:
>  	eventfd_ctx_put(trigger);
>  out_free_name:
>  	kfree(ctx->name);
>  	ctx->name = NULL;
> +out_free_ctx:
> +	if (allow_dyn_alloc && new_ctx)
> +		vfio_irq_ctx_free(vdev, ctx, vector);
>  	return ret;
>  }
>  

Do we really need the new_ctx test in the above cases?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-30 22:40   ` Alex Williamson
@ 2023-03-30 22:42     ` Alex Williamson
  2023-03-31 17:49       ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-03-30 22:42 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Thu, 30 Mar 2023 16:40:50 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 28 Mar 2023 14:53:34 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
> 
> > Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq()
> > enables an individual MSI-X index to be allocated and freed after
> > MSI-X enabling.
> > 
> > Support dynamic MSI-X if supported by the device. Keep the association
> > between allocated interrupt and vfio interrupt context. Allocate new
> > context together with the new interrupt if no interrupt context exist
> > for an MSI-X interrupt. Similarly, release an interrupt with its
> > context.
> > 
> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > ---
> > Changes since RFC V1:
> > - Add pointer to interrupt context as function parameter to
> >   vfio_irq_ctx_free(). (Alex)
> > - Initialize new_ctx to false. (Dan Carpenter)
> > - Only support dynamic allocation if device supports it. (Alex)
> > 
> >  drivers/vfio/pci/vfio_pci_intrs.c | 93 +++++++++++++++++++++++++------
> >  1 file changed, 76 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index b3a258e58625..755b752ca17e 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -55,6 +55,13 @@ struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
> >  	return xa_load(&vdev->ctx, index);
> >  }
> >  
> > +static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
> > +			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
> > +{
> > +	xa_erase(&vdev->ctx, index);
> > +	kfree(ctx);
> > +}

Also, the function below should use this rather than open coding the
same now.  Thanks,

Alex

> > +
> >  static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
> >  {
> >  	struct vfio_pci_irq_ctx *ctx;
> > @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >  {
> >  	struct pci_dev *pdev = vdev->pdev;
> >  	struct vfio_pci_irq_ctx *ctx;
> > +	struct msi_map msix_map = {};
> > +	bool allow_dyn_alloc = false;
> >  	struct eventfd_ctx *trigger;
> > +	bool new_ctx = false;
> >  	int irq, ret;
> >  	u16 cmd;
> >  
> > +	/* Only MSI-X allows dynamic allocation. */
> > +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
> > +		allow_dyn_alloc = true;  
> 
> Should vfio-pci-core probe this and store it in a field on
> vfio_pci_core_device so that we can simply use something like
> vdev->has_dyn_msix throughout?
> 
> > +
> >  	ctx = vfio_irq_ctx_get(vdev, vector);
> > -	if (!ctx)
> > +	if (!ctx && !allow_dyn_alloc)
> >  		return -EINVAL;
> > +
> >  	irq = pci_irq_vector(pdev, vector);
> > +	/* Context and interrupt are always allocated together. */
> > +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
> >  
> > -	if (ctx->trigger) {
> > +	if (ctx && ctx->trigger) {
> >  		irq_bypass_unregister_producer(&ctx->producer);
> >  
> >  		cmd = vfio_pci_memory_lock_and_enable(vdev);
> >  		free_irq(irq, ctx->trigger);
> > +		if (allow_dyn_alloc) {  
> 
> It almost seems easier to define msix_map in each scope that it's used:
> 
> 			struct msi_map map = { .index = vector,
> 					       .virq = irq };
> 
> > +			msix_map.index = vector;
> > +			msix_map.virq = irq;
> > +			pci_msix_free_irq(pdev, msix_map);
> > +			irq = -EINVAL;
> > +		}
> >  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >  		kfree(ctx->name);
> >  		eventfd_ctx_put(ctx->trigger);
> >  		ctx->trigger = NULL;
> > +		if (allow_dyn_alloc) {
> > +			vfio_irq_ctx_free(vdev, ctx, vector);
> > +			ctx = NULL;
> > +		}
> >  	}
> >  
> >  	if (fd < 0)
> >  		return 0;
> >  
> > +	if (!ctx) {
> > +		ctx = vfio_irq_ctx_alloc_single(vdev, vector);
> > +		if (!ctx)
> > +			return -ENOMEM;
> > +		new_ctx = true;
> > +	}
> > +
> >  	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
> >  			      msix ? "x" : "", vector, pci_name(pdev));
> > -	if (!ctx->name)
> > -		return -ENOMEM;
> > +	if (!ctx->name) {
> > +		ret = -ENOMEM;
> > +		goto out_free_ctx;
> > +	}
> >  
> >  	trigger = eventfd_ctx_fdget(fd);
> >  	if (IS_ERR(trigger)) {
> > @@ -443,25 +479,38 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >  		goto out_free_name;
> >  	}
> >  
> > -	/*
> > -	 * The MSIx vector table resides in device memory which may be cleared
> > -	 * via backdoor resets. We don't allow direct access to the vector
> > -	 * table so even if a userspace driver attempts to save/restore around
> > -	 * such a reset it would be unsuccessful. To avoid this, restore the
> > -	 * cached value of the message prior to enabling.
> > -	 */
> >  	cmd = vfio_pci_memory_lock_and_enable(vdev);
> >  	if (msix) {
> > -		struct msi_msg msg;
> > -
> > -		get_cached_msi_msg(irq, &msg);
> > -		pci_write_msi_msg(irq, &msg);
> > +		if (irq == -EINVAL) {
> > +			msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);  
> 
> 			struct msi_map map = pci_msix_alloc_irq_at(pdev,
> 								vector, NULL);
> > +			if (msix_map.index < 0) {
> > +				vfio_pci_memory_unlock_and_restore(vdev, cmd);
> > +				ret = msix_map.index;
> > +				goto out_put_eventfd_ctx;
> > +			}
> > +			irq = msix_map.virq;
> > +		} else {
> > +			/*
> > +			 * The MSIx vector table resides in device memory which
> > +			 * may be cleared via backdoor resets. We don't allow
> > +			 * direct access to the vector table so even if a
> > +			 * userspace driver attempts to save/restore around
> > +			 * such a reset it would be unsuccessful. To avoid
> > +			 * this, restore the cached value of the message prior
> > +			 * to enabling.
> > +			 */  
> 
> You've only just copied this comment down to here, but I think it's a
> bit stale.  Maybe we should update it to something that helps explain
> this split better, maybe:
> 
> 			/*
> 			 * If the vector was previously allocated, refresh the
> 			 * on-device message data before enabling in case it had
> 			 * been cleared or corrupted since writing.
> 			 */
> 
> IIRC, that was the purpose of writing it back to the device and the
> blocking of direct access is no longer accurate anyway.
> 
> > +			struct msi_msg msg;
> > +
> > +			get_cached_msi_msg(irq, &msg);
> > +			pci_write_msi_msg(irq, &msg);
> > +		}
> >  	}
> >  
> >  	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
> > -	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >  	if (ret)
> > -		goto out_put_eventfd_ctx;
> > +		goto out_free_irq_locked;
> > +
> > +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >  
> >  	ctx->producer.token = trigger;
> >  	ctx->producer.irq = irq;
> > @@ -477,11 +526,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >  
> >  	return 0;
> >  
> > +out_free_irq_locked:
> > +	if (allow_dyn_alloc && new_ctx) {  
> 
> 		struct msi_map map = { .index = vector,
> 				       .virq = irq };
> 
> > +		msix_map.index = vector;
> > +		msix_map.virq = irq;
> > +		pci_msix_free_irq(pdev, msix_map);
> > +	}
> > +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >  out_put_eventfd_ctx:
> >  	eventfd_ctx_put(trigger);
> >  out_free_name:
> >  	kfree(ctx->name);
> >  	ctx->name = NULL;
> > +out_free_ctx:
> > +	if (allow_dyn_alloc && new_ctx)
> > +		vfio_irq_ctx_free(vdev, ctx, vector);
> >  	return ret;
> >  }
> >    
> 
> Do we really need the new_ctx test in the above cases?  Thanks,
> 
> Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector
  2023-03-30 22:32     ` Reinette Chatre
@ 2023-03-30 22:54       ` Alex Williamson
  2023-03-30 23:54         ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-03-30 22:54 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Thu, 30 Mar 2023 15:32:20 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Alex,
> 
> On 3/30/2023 1:26 PM, Alex Williamson wrote:
> > On Tue, 28 Mar 2023 14:53:29 -0700
> > Reinette Chatre <reinette.chatre@intel.com> wrote:  
> ...
> 
> >> @@ -399,7 +399,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >>  static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
> >>  			      unsigned count, int32_t *fds, bool msix)
> >>  {
> >> -	int i, j, ret = 0;
> >> +	int i, ret = 0;
> >> +	unsigned int j;
> >>  
> >>  	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
> >>  		return -EINVAL;  
> > 
> > Unfortunately this turns the unwind portion of the function into an
> > infinite loop in the common case when @start is zero:
> > 
> >                 for (--j; j >= (int)start; j--)
> >                         vfio_msi_set_vector_signal(vdev, j, -1, msix);
> > 
> >   
> 
> Thank you very much for catching this. It is not clear to me how you
> would prefer to resolve this. Would you prefer that the vector parameter
> in vfio_msi_set_vector_signal() continue to be an int and this patch be
> dropped and the "if (vector < 0)" check remains (option A)? Or, alternatively,
> I see two other possible solutions where the vector parameter in
> vfio_msi_set_vector_signal() becomes an unsigned int and the above snippet
> could be one of:
> 
> option B:
> vfio_msi_set_block()
> {
> 	int i, j, ret = 0;
> 
> 	...
> 		for (--j; j >= (int)start; j--)
> 			vfio_msi_set_vector_signal(vdev, (unsigned int)j, -1, msix);
> }
> 
> option C:
> vfio_msi_set_block()
> {
> 	int i, ret = 0;
> 	unsigned int j;
> 
> 	...
> 		for (--j; j >= start && j < start + count; j--)
> 			vfio_msi_set_vector_signal(vdev, j, -1, msix);
> }
> 
> What would you prefer?


Hmm, C is fine, it avoids casting.  I think we could also do:

	unsigned int i, j;
	int ret = 0;

	...

		for (i = start; i < j; i++)
			vfio_msi_set_vector_signal(vdev, i, -1, msix);

Thanks,
Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector
  2023-03-30 22:54       ` Alex Williamson
@ 2023-03-30 23:54         ` Reinette Chatre
  0 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-03-30 23:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 3/30/2023 3:54 PM, Alex Williamson wrote:
> On Thu, 30 Mar 2023 15:32:20 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
>> On 3/30/2023 1:26 PM, Alex Williamson wrote:
>>> On Tue, 28 Mar 2023 14:53:29 -0700
>>> Reinette Chatre <reinette.chatre@intel.com> wrote:  
>> ...
>>
>>>> @@ -399,7 +399,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>>  static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
>>>>  			      unsigned count, int32_t *fds, bool msix)
>>>>  {
>>>> -	int i, j, ret = 0;
>>>> +	int i, ret = 0;
>>>> +	unsigned int j;
>>>>  
>>>>  	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
>>>>  		return -EINVAL;  
>>>
>>> Unfortunately this turns the unwind portion of the function into an
>>> infinite loop in the common case when @start is zero:
>>>
>>>                 for (--j; j >= (int)start; j--)
>>>                         vfio_msi_set_vector_signal(vdev, j, -1, msix);
>>>
>>>   
>>
>> Thank you very much for catching this. It is not clear to me how you
>> would prefer to resolve this. Would you prefer that the vector parameter
>> in vfio_msi_set_vector_signal() continue to be an int and this patch be
>> dropped and the "if (vector < 0)" check remains (option A)? Or, alternatively,
>> I see two other possible solutions where the vector parameter in
>> vfio_msi_set_vector_signal() becomes an unsigned int and the above snippet
>> could be one of:
>>
>> option B:
>> vfio_msi_set_block()
>> {
>> 	int i, j, ret = 0;
>>
>> 	...
>> 		for (--j; j >= (int)start; j--)
>> 			vfio_msi_set_vector_signal(vdev, (unsigned int)j, -1, msix);
>> }
>>
>> option C:
>> vfio_msi_set_block()
>> {
>> 	int i, ret = 0;
>> 	unsigned int j;
>>
>> 	...
>> 		for (--j; j >= start && j < start + count; j--)
>> 			vfio_msi_set_vector_signal(vdev, j, -1, msix);
>> }
>>
>> What would you prefer?
> 
> 
> Hmm, C is fine, it avoids casting.  I think we could also do:
> 
> 	unsigned int i, j;
> 	int ret = 0;
> 
> 	...
> 
> 		for (i = start; i < j; i++)
> 			vfio_msi_set_vector_signal(vdev, i, -1, msix);
> 

Much better. Will do. Thank you very much.

Reinette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
                     ` (2 preceding siblings ...)
  2023-03-30 22:40   ` Alex Williamson
@ 2023-03-31 10:02   ` Liu, Jing2
  2023-03-31 13:51     ` Alex Williamson
  3 siblings, 1 reply; 39+ messages in thread
From: Liu, Jing2 @ 2023-03-31 10:02 UTC (permalink / raw)
  To: Chatre, Reinette, jgg, yishaih, shameerali.kolothum.thodi, Tian,
	Kevin, alex.williamson
  Cc: tglx, darwi, kvm, Jiang, Dave, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel

Hi Reinette,

> @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_core_device *vdev,  {
>  	struct pci_dev *pdev = vdev->pdev;
>  	struct vfio_pci_irq_ctx *ctx;
> +	struct msi_map msix_map = {};
> +	bool allow_dyn_alloc = false;
>  	struct eventfd_ctx *trigger;
> +	bool new_ctx = false;
>  	int irq, ret;
>  	u16 cmd;
> 
> +	/* Only MSI-X allows dynamic allocation. */
> +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
> +		allow_dyn_alloc = true;
> +
>  	ctx = vfio_irq_ctx_get(vdev, vector);
> -	if (!ctx)
> +	if (!ctx && !allow_dyn_alloc)
>  		return -EINVAL;
> +
>  	irq = pci_irq_vector(pdev, vector);
> +	/* Context and interrupt are always allocated together. */
> +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
> 
> -	if (ctx->trigger) {
> +	if (ctx && ctx->trigger) {
>  		irq_bypass_unregister_producer(&ctx->producer);
> 
>  		cmd = vfio_pci_memory_lock_and_enable(vdev);
>  		free_irq(irq, ctx->trigger);
> +		if (allow_dyn_alloc) {
> +			msix_map.index = vector;
> +			msix_map.virq = irq;
> +			pci_msix_free_irq(pdev, msix_map);
> +			irq = -EINVAL;
> +		}
>  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  		kfree(ctx->name);
>  		eventfd_ctx_put(ctx->trigger);
>  		ctx->trigger = NULL;
> +		if (allow_dyn_alloc) {
> +			vfio_irq_ctx_free(vdev, ctx, vector);
> +			ctx = NULL;
> +		}
>  	}
> 
>  	if (fd < 0)
>  		return 0;
> 

While looking at this piece of code, one thought comes to me: 
It might be possible that userspace comes twice with the same valid fd for a specific
vector, this vfio code will free the resource in if(ctx && ctx->trigger) for the second
time, and then re-alloc again for the same fd given by userspace. 

Would that help if we firstly check e.g. ctx->trigger with the given valid fd, to see if 
the trigger is really changed or not?

Thanks,
Jing


> +	if (!ctx) {
> +		ctx = vfio_irq_ctx_alloc_single(vdev, vector);
> +		if (!ctx)
> +			return -ENOMEM;
> +		new_ctx = true;
> +	}
> +
>  	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
>  			      msix ? "x" : "", vector, pci_name(pdev));
> -	if (!ctx->name)
> -		return -ENOMEM;
> +	if (!ctx->name) {
> +		ret = -ENOMEM;
> +		goto out_free_ctx;
> +	}
> 
>  	trigger = eventfd_ctx_fdget(fd);
>  	if (IS_ERR(trigger)) {
> @@ -443,25 +479,38 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_core_device *vdev,
>  		goto out_free_name;
>  	}
> 
> -	/*
> -	 * The MSIx vector table resides in device memory which may be cleared
> -	 * via backdoor resets. We don't allow direct access to the vector
> -	 * table so even if a userspace driver attempts to save/restore around
> -	 * such a reset it would be unsuccessful. To avoid this, restore the
> -	 * cached value of the message prior to enabling.
> -	 */
>  	cmd = vfio_pci_memory_lock_and_enable(vdev);
>  	if (msix) {
> -		struct msi_msg msg;
> -
> -		get_cached_msi_msg(irq, &msg);
> -		pci_write_msi_msg(irq, &msg);
> +		if (irq == -EINVAL) {
> +			msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);
> +			if (msix_map.index < 0) {
> +				vfio_pci_memory_unlock_and_restore(vdev,
> cmd);
> +				ret = msix_map.index;
> +				goto out_put_eventfd_ctx;
> +			}
> +			irq = msix_map.virq;
> +		} else {
> +			/*
> +			 * The MSIx vector table resides in device memory
> which
> +			 * may be cleared via backdoor resets. We don't allow
> +			 * direct access to the vector table so even if a
> +			 * userspace driver attempts to save/restore around
> +			 * such a reset it would be unsuccessful. To avoid
> +			 * this, restore the cached value of the message prior
> +			 * to enabling.
> +			 */
> +			struct msi_msg msg;
> +
> +			get_cached_msi_msg(irq, &msg);
> +			pci_write_msi_msg(irq, &msg);
> +		}
>  	}
> 
>  	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
> -	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  	if (ret)
> -		goto out_put_eventfd_ctx;
> +		goto out_free_irq_locked;
> +
> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> 
>  	ctx->producer.token = trigger;
>  	ctx->producer.irq = irq;
> @@ -477,11 +526,21 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_core_device *vdev,
> 
>  	return 0;
> 
> +out_free_irq_locked:
> +	if (allow_dyn_alloc && new_ctx) {
> +		msix_map.index = vector;
> +		msix_map.virq = irq;
> +		pci_msix_free_irq(pdev, msix_map);
> +	}
> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>  out_put_eventfd_ctx:
>  	eventfd_ctx_put(trigger);
>  out_free_name:
>  	kfree(ctx->name);
>  	ctx->name = NULL;
> +out_free_ctx:
> +	if (allow_dyn_alloc && new_ctx)
> +		vfio_irq_ctx_free(vdev, ctx, vector);
>  	return ret;
>  }
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-31 10:02   ` Liu, Jing2
@ 2023-03-31 13:51     ` Alex Williamson
  2023-04-04  3:19       ` Liu, Jing2
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-03-31 13:51 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: Chatre, Reinette, jgg, yishaih, shameerali.kolothum.thodi, Tian,
	Kevin, tglx, darwi, kvm, Jiang, Dave, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel

On Fri, 31 Mar 2023 10:02:32 +0000
"Liu, Jing2" <jing2.liu@intel.com> wrote:

> Hi Reinette,
> 
> > @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_core_device *vdev,  {
> >  	struct pci_dev *pdev = vdev->pdev;
> >  	struct vfio_pci_irq_ctx *ctx;
> > +	struct msi_map msix_map = {};
> > +	bool allow_dyn_alloc = false;
> >  	struct eventfd_ctx *trigger;
> > +	bool new_ctx = false;
> >  	int irq, ret;
> >  	u16 cmd;
> > 
> > +	/* Only MSI-X allows dynamic allocation. */
> > +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
> > +		allow_dyn_alloc = true;
> > +
> >  	ctx = vfio_irq_ctx_get(vdev, vector);
> > -	if (!ctx)
> > +	if (!ctx && !allow_dyn_alloc)
> >  		return -EINVAL;
> > +
> >  	irq = pci_irq_vector(pdev, vector);
> > +	/* Context and interrupt are always allocated together. */
> > +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
> > 
> > -	if (ctx->trigger) {
> > +	if (ctx && ctx->trigger) {
> >  		irq_bypass_unregister_producer(&ctx->producer);
> > 
> >  		cmd = vfio_pci_memory_lock_and_enable(vdev);
> >  		free_irq(irq, ctx->trigger);
> > +		if (allow_dyn_alloc) {
> > +			msix_map.index = vector;
> > +			msix_map.virq = irq;
> > +			pci_msix_free_irq(pdev, msix_map);
> > +			irq = -EINVAL;
> > +		}
> >  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >  		kfree(ctx->name);
> >  		eventfd_ctx_put(ctx->trigger);
> >  		ctx->trigger = NULL;
> > +		if (allow_dyn_alloc) {
> > +			vfio_irq_ctx_free(vdev, ctx, vector);
> > +			ctx = NULL;
> > +		}
> >  	}
> > 
> >  	if (fd < 0)
> >  		return 0;
> >   
> 
> While looking at this piece of code, one thought comes to me: 
> It might be possible that userspace comes twice with the same valid fd for a specific
> vector, this vfio code will free the resource in if(ctx && ctx->trigger) for the second
> time, and then re-alloc again for the same fd given by userspace. 
> 
> Would that help if we firstly check e.g. ctx->trigger with the given valid fd, to see if 
> the trigger is really changed or not?

It's rather a made-up situation, if userspace wants to avoid bouncing
the vector when the eventfd hasn't changed they can simply test this
before calling the ioctl.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-30 22:42     ` Alex Williamson
@ 2023-03-31 17:49       ` Reinette Chatre
  2023-03-31 22:24         ` Alex Williamson
  0 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-03-31 17:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 3/30/2023 3:42 PM, Alex Williamson wrote:
> On Thu, 30 Mar 2023 16:40:50 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
>> On Tue, 28 Mar 2023 14:53:34 -0700
>> Reinette Chatre <reinette.chatre@intel.com> wrote:
>>

...

>>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>>> index b3a258e58625..755b752ca17e 100644
>>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>>> @@ -55,6 +55,13 @@ struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
>>>  	return xa_load(&vdev->ctx, index);
>>>  }
>>>  
>>> +static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
>>> +			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
>>> +{
>>> +	xa_erase(&vdev->ctx, index);
>>> +	kfree(ctx);
>>> +}
> 
> Also, the function below should use this rather than open coding the
> same now.  Thanks,

It should, yes. Thank you. Will do.


>>>  static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
>>>  {
>>>  	struct vfio_pci_irq_ctx *ctx;
>>> @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>  {
>>>  	struct pci_dev *pdev = vdev->pdev;
>>>  	struct vfio_pci_irq_ctx *ctx;
>>> +	struct msi_map msix_map = {};
>>> +	bool allow_dyn_alloc = false;
>>>  	struct eventfd_ctx *trigger;
>>> +	bool new_ctx = false;
>>>  	int irq, ret;
>>>  	u16 cmd;
>>>  
>>> +	/* Only MSI-X allows dynamic allocation. */
>>> +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
>>> +		allow_dyn_alloc = true;  
>>
>> Should vfio-pci-core probe this and store it in a field on
>> vfio_pci_core_device so that we can simply use something like
>> vdev->has_dyn_msix throughout?

It is not obvious to me if you mean this with vfio-pci-core probe,
but it looks like a change to vfio_pci_core_enable() may be
appropriate with a snippet like below:

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a743b98ba29a..a474ce80a555 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -533,6 +533,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	} else
 		vdev->msix_bar = 0xFF;
 
+	vdev->has_dyn_msix = pci_msix_can_alloc_dyn(pdev);
+
 	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
 		vdev->has_vga = true;
 
Please do note that I placed it outside of the earlier "if (msix_pos)" since
pci_msix_can_alloc_dyn() has its own "if (!dev->msix_cap)". If you prefer
to keep all the vdev->*msix* together I can move it into the if statement.

With vdev->has_dyn_msix available "allow_dyn_alloc" can be dropped as you
stated.

>>
>>> +
>>>  	ctx = vfio_irq_ctx_get(vdev, vector);
>>> -	if (!ctx)
>>> +	if (!ctx && !allow_dyn_alloc)
>>>  		return -EINVAL;
>>> +
>>>  	irq = pci_irq_vector(pdev, vector);
>>> +	/* Context and interrupt are always allocated together. */
>>> +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
>>>  
>>> -	if (ctx->trigger) {
>>> +	if (ctx && ctx->trigger) {
>>>  		irq_bypass_unregister_producer(&ctx->producer);
>>>  
>>>  		cmd = vfio_pci_memory_lock_and_enable(vdev);
>>>  		free_irq(irq, ctx->trigger);
>>> +		if (allow_dyn_alloc) {  
>>
>> It almost seems easier to define msix_map in each scope that it's used:
>>
>> 			struct msi_map map = { .index = vector,
>> 					       .virq = irq };
>>

Sure. Will do.

>>> +			msix_map.index = vector;
>>> +			msix_map.virq = irq;
>>> +			pci_msix_free_irq(pdev, msix_map);
>>> +			irq = -EINVAL;
>>> +		}
>>>  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>  		kfree(ctx->name);
>>>  		eventfd_ctx_put(ctx->trigger);
>>>  		ctx->trigger = NULL;
>>> +		if (allow_dyn_alloc) {
>>> +			vfio_irq_ctx_free(vdev, ctx, vector);
>>> +			ctx = NULL;
>>> +		}
>>>  	}
>>>  
>>>  	if (fd < 0)
>>>  		return 0;
>>>  
>>> +	if (!ctx) {
>>> +		ctx = vfio_irq_ctx_alloc_single(vdev, vector);
>>> +		if (!ctx)
>>> +			return -ENOMEM;
>>> +		new_ctx = true;
>>> +	}
>>> +
>>>  	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
>>>  			      msix ? "x" : "", vector, pci_name(pdev));
>>> -	if (!ctx->name)
>>> -		return -ENOMEM;
>>> +	if (!ctx->name) {
>>> +		ret = -ENOMEM;
>>> +		goto out_free_ctx;
>>> +	}
>>>  
>>>  	trigger = eventfd_ctx_fdget(fd);
>>>  	if (IS_ERR(trigger)) {
>>> @@ -443,25 +479,38 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>  		goto out_free_name;
>>>  	}
>>>  
>>> -	/*
>>> -	 * The MSIx vector table resides in device memory which may be cleared
>>> -	 * via backdoor resets. We don't allow direct access to the vector
>>> -	 * table so even if a userspace driver attempts to save/restore around
>>> -	 * such a reset it would be unsuccessful. To avoid this, restore the
>>> -	 * cached value of the message prior to enabling.
>>> -	 */
>>>  	cmd = vfio_pci_memory_lock_and_enable(vdev);
>>>  	if (msix) {
>>> -		struct msi_msg msg;
>>> -
>>> -		get_cached_msi_msg(irq, &msg);
>>> -		pci_write_msi_msg(irq, &msg);
>>> +		if (irq == -EINVAL) {
>>> +			msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);  
>>
>> 			struct msi_map map = pci_msix_alloc_irq_at(pdev,
>> 								vector, NULL);

Will do.

>>> +			if (msix_map.index < 0) {
>>> +				vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>> +				ret = msix_map.index;
>>> +				goto out_put_eventfd_ctx;
>>> +			}
>>> +			irq = msix_map.virq;
>>> +		} else {
>>> +			/*
>>> +			 * The MSIx vector table resides in device memory which
>>> +			 * may be cleared via backdoor resets. We don't allow
>>> +			 * direct access to the vector table so even if a
>>> +			 * userspace driver attempts to save/restore around
>>> +			 * such a reset it would be unsuccessful. To avoid
>>> +			 * this, restore the cached value of the message prior
>>> +			 * to enabling.
>>> +			 */  
>>
>> You've only just copied this comment down to here, but I think it's a
>> bit stale.  Maybe we should update it to something that helps explain
>> this split better, maybe:
>>
>> 			/*
>> 			 * If the vector was previously allocated, refresh the
>> 			 * on-device message data before enabling in case it had
>> 			 * been cleared or corrupted since writing.
>> 			 */
>>
>> IIRC, that was the purpose of writing it back to the device and the
>> blocking of direct access is no longer accurate anyway.

Thank you. Will do. To keep this patch focused I plan to separate
this change into a new prep patch that will be placed earlier in
this series.

>>
>>> +			struct msi_msg msg;
>>> +
>>> +			get_cached_msi_msg(irq, &msg);
>>> +			pci_write_msi_msg(irq, &msg);
>>> +		}
>>>  	}
>>>  
>>>  	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
>>> -	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>  	if (ret)
>>> -		goto out_put_eventfd_ctx;
>>> +		goto out_free_irq_locked;
>>> +
>>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>  
>>>  	ctx->producer.token = trigger;
>>>  	ctx->producer.irq = irq;
>>> @@ -477,11 +526,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>  
>>>  	return 0;
>>>  
>>> +out_free_irq_locked:
>>> +	if (allow_dyn_alloc && new_ctx) {  
>>
>> 		struct msi_map map = { .index = vector,
>> 				       .virq = irq };
>>

Will do.

>>> +		msix_map.index = vector;
>>> +		msix_map.virq = irq;
>>> +		pci_msix_free_irq(pdev, msix_map);
>>> +	}
>>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>  out_put_eventfd_ctx:
>>>  	eventfd_ctx_put(trigger);
>>>  out_free_name:
>>>  	kfree(ctx->name);
>>>  	ctx->name = NULL;
>>> +out_free_ctx:
>>> +	if (allow_dyn_alloc && new_ctx)
>>> +		vfio_irq_ctx_free(vdev, ctx, vector);
>>>  	return ret;
>>>  }
>>>    
>>
>> Do we really need the new_ctx test in the above cases?  Thanks,

new_ctx is not required for correctness but instead is used to keep
the code symmetric. 
Specifically, if the user enables MSI-X without providing triggers and
then later assign triggers then an error path without new_ctx would unwind
more than done in this function, it would free the context that
was allocated within vfio_msi_enable(). 

Reinette

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-31 17:49       ` Reinette Chatre
@ 2023-03-31 22:24         ` Alex Williamson
  2023-04-03 17:31           ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-03-31 22:24 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Fri, 31 Mar 2023 10:49:16 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Alex,
> 
> On 3/30/2023 3:42 PM, Alex Williamson wrote:
> > On Thu, 30 Mar 2023 16:40:50 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> >> On Tue, 28 Mar 2023 14:53:34 -0700
> >> Reinette Chatre <reinette.chatre@intel.com> wrote:
> >>  
> 
> ...
> 
> >>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> >>> index b3a258e58625..755b752ca17e 100644
> >>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> >>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> >>> @@ -55,6 +55,13 @@ struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
> >>>  	return xa_load(&vdev->ctx, index);
> >>>  }
> >>>  
> >>> +static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
> >>> +			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
> >>> +{
> >>> +	xa_erase(&vdev->ctx, index);
> >>> +	kfree(ctx);
> >>> +}  
> > 
> > Also, the function below should use this rather than open coding the
> > same now.  Thanks,  
> 
> It should, yes. Thank you. Will do.
> 
> 
> >>>  static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
> >>>  {
> >>>  	struct vfio_pci_irq_ctx *ctx;
> >>> @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >>>  {
> >>>  	struct pci_dev *pdev = vdev->pdev;
> >>>  	struct vfio_pci_irq_ctx *ctx;
> >>> +	struct msi_map msix_map = {};
> >>> +	bool allow_dyn_alloc = false;
> >>>  	struct eventfd_ctx *trigger;
> >>> +	bool new_ctx = false;
> >>>  	int irq, ret;
> >>>  	u16 cmd;
> >>>  
> >>> +	/* Only MSI-X allows dynamic allocation. */
> >>> +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
> >>> +		allow_dyn_alloc = true;    
> >>
> >> Should vfio-pci-core probe this and store it in a field on
> >> vfio_pci_core_device so that we can simply use something like
> >> vdev->has_dyn_msix throughout?  
> 
> It is not obvious to me if you mean this with vfio-pci-core probe,
> but it looks like a change to vfio_pci_core_enable() may be
> appropriate with a snippet like below:
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a743b98ba29a..a474ce80a555 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -533,6 +533,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>  	} else
>  		vdev->msix_bar = 0xFF;
>  
> +	vdev->has_dyn_msix = pci_msix_can_alloc_dyn(pdev);
> +
>  	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
>  		vdev->has_vga = true;
>  
> Please do note that I placed it outside of the earlier "if (msix_pos)" since
> pci_msix_can_alloc_dyn() has its own "if (!dev->msix_cap)". If you prefer
> to keep all the vdev->*msix* together I can move it into the if statement.

Sure, just for common grouping I'd probably put it within the existing
msix_cap branch.
 
> With vdev->has_dyn_msix available "allow_dyn_alloc" can be dropped as you
> stated.
> 
> >>  
> >>> +
> >>>  	ctx = vfio_irq_ctx_get(vdev, vector);
> >>> -	if (!ctx)
> >>> +	if (!ctx && !allow_dyn_alloc)
> >>>  		return -EINVAL;
> >>> +
> >>>  	irq = pci_irq_vector(pdev, vector);
> >>> +	/* Context and interrupt are always allocated together. */
> >>> +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
> >>>  
> >>> -	if (ctx->trigger) {
> >>> +	if (ctx && ctx->trigger) {
> >>>  		irq_bypass_unregister_producer(&ctx->producer);
> >>>  
> >>>  		cmd = vfio_pci_memory_lock_and_enable(vdev);
> >>>  		free_irq(irq, ctx->trigger);
> >>> +		if (allow_dyn_alloc) {    
> >>
> >> It almost seems easier to define msix_map in each scope that it's used:
> >>
> >> 			struct msi_map map = { .index = vector,
> >> 					       .virq = irq };
> >>  
> 
> Sure. Will do.
> 
> >>> +			msix_map.index = vector;
> >>> +			msix_map.virq = irq;
> >>> +			pci_msix_free_irq(pdev, msix_map);
> >>> +			irq = -EINVAL;
> >>> +		}
> >>>  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>  		kfree(ctx->name);
> >>>  		eventfd_ctx_put(ctx->trigger);
> >>>  		ctx->trigger = NULL;
> >>> +		if (allow_dyn_alloc) {
> >>> +			vfio_irq_ctx_free(vdev, ctx, vector);
> >>> +			ctx = NULL;
> >>> +		}
> >>>  	}
> >>>  
> >>>  	if (fd < 0)
> >>>  		return 0;
> >>>  
> >>> +	if (!ctx) {
> >>> +		ctx = vfio_irq_ctx_alloc_single(vdev, vector);
> >>> +		if (!ctx)
> >>> +			return -ENOMEM;
> >>> +		new_ctx = true;
> >>> +	}
> >>> +
> >>>  	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
> >>>  			      msix ? "x" : "", vector, pci_name(pdev));
> >>> -	if (!ctx->name)
> >>> -		return -ENOMEM;
> >>> +	if (!ctx->name) {
> >>> +		ret = -ENOMEM;
> >>> +		goto out_free_ctx;
> >>> +	}
> >>>  
> >>>  	trigger = eventfd_ctx_fdget(fd);
> >>>  	if (IS_ERR(trigger)) {
> >>> @@ -443,25 +479,38 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >>>  		goto out_free_name;
> >>>  	}
> >>>  
> >>> -	/*
> >>> -	 * The MSIx vector table resides in device memory which may be cleared
> >>> -	 * via backdoor resets. We don't allow direct access to the vector
> >>> -	 * table so even if a userspace driver attempts to save/restore around
> >>> -	 * such a reset it would be unsuccessful. To avoid this, restore the
> >>> -	 * cached value of the message prior to enabling.
> >>> -	 */
> >>>  	cmd = vfio_pci_memory_lock_and_enable(vdev);
> >>>  	if (msix) {
> >>> -		struct msi_msg msg;
> >>> -
> >>> -		get_cached_msi_msg(irq, &msg);
> >>> -		pci_write_msi_msg(irq, &msg);
> >>> +		if (irq == -EINVAL) {
> >>> +			msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);    
> >>
> >> 			struct msi_map map = pci_msix_alloc_irq_at(pdev,
> >> 								vector, NULL);  
> 
> Will do.
> 
> >>> +			if (msix_map.index < 0) {
> >>> +				vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>> +				ret = msix_map.index;
> >>> +				goto out_put_eventfd_ctx;
> >>> +			}
> >>> +			irq = msix_map.virq;
> >>> +		} else {
> >>> +			/*
> >>> +			 * The MSIx vector table resides in device memory which
> >>> +			 * may be cleared via backdoor resets. We don't allow
> >>> +			 * direct access to the vector table so even if a
> >>> +			 * userspace driver attempts to save/restore around
> >>> +			 * such a reset it would be unsuccessful. To avoid
> >>> +			 * this, restore the cached value of the message prior
> >>> +			 * to enabling.
> >>> +			 */    
> >>
> >> You've only just copied this comment down to here, but I think it's a
> >> bit stale.  Maybe we should update it to something that helps explain
> >> this split better, maybe:
> >>
> >> 			/*
> >> 			 * If the vector was previously allocated, refresh the
> >> 			 * on-device message data before enabling in case it had
> >> 			 * been cleared or corrupted since writing.
> >> 			 */
> >>
> >> IIRC, that was the purpose of writing it back to the device and the
> >> blocking of direct access is no longer accurate anyway.  
> 
> Thank you. Will do. To keep this patch focused I plan to separate
> this change into a new prep patch that will be placed earlier in
> this series.

Ok.

> >>  
> >>> +			struct msi_msg msg;
> >>> +
> >>> +			get_cached_msi_msg(irq, &msg);
> >>> +			pci_write_msi_msg(irq, &msg);
> >>> +		}
> >>>  	}
> >>>  
> >>>  	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
> >>> -	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>  	if (ret)
> >>> -		goto out_put_eventfd_ctx;
> >>> +		goto out_free_irq_locked;
> >>> +
> >>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>  
> >>>  	ctx->producer.token = trigger;
> >>>  	ctx->producer.irq = irq;
> >>> @@ -477,11 +526,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >>>  
> >>>  	return 0;
> >>>  
> >>> +out_free_irq_locked:
> >>> +	if (allow_dyn_alloc && new_ctx) {    
> >>
> >> 		struct msi_map map = { .index = vector,
> >> 				       .virq = irq };
> >>  
> 
> Will do.
> 
> >>> +		msix_map.index = vector;
> >>> +		msix_map.virq = irq;
> >>> +		pci_msix_free_irq(pdev, msix_map);
> >>> +	}
> >>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>  out_put_eventfd_ctx:
> >>>  	eventfd_ctx_put(trigger);
> >>>  out_free_name:
> >>>  	kfree(ctx->name);
> >>>  	ctx->name = NULL;
> >>> +out_free_ctx:
> >>> +	if (allow_dyn_alloc && new_ctx)
> >>> +		vfio_irq_ctx_free(vdev, ctx, vector);
> >>>  	return ret;
> >>>  }
> >>>      
> >>
> >> Do we really need the new_ctx test in the above cases?  Thanks,  
> 
> new_ctx is not required for correctness but instead is used to keep
> the code symmetric. 
> Specifically, if the user enables MSI-X without providing triggers and
> then later assign triggers then an error path without new_ctx would unwind
> more than done in this function, it would free the context that
> was allocated within vfio_msi_enable(). 

Seems like we already have that asymmetry, if a trigger is unset we'll
free the ctx allocated by vfio_msi_enable().  Tracking which are
allocated where is unnecessarily complex, how about a policy that
devices supporting vdev->has_dyn_msix only ever have active contexts
allocated?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-31 22:24         ` Alex Williamson
@ 2023-04-03 17:31           ` Reinette Chatre
  2023-04-03 20:22             ` Alex Williamson
  0 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-04-03 17:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 3/31/2023 3:24 PM, Alex Williamson wrote:
> On Fri, 31 Mar 2023 10:49:16 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
>> On 3/30/2023 3:42 PM, Alex Williamson wrote:
>>> On Thu, 30 Mar 2023 16:40:50 -0600
>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>   
>>>> On Tue, 28 Mar 2023 14:53:34 -0700
>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
>>>>  

...

>>>>> +		msix_map.index = vector;
>>>>> +		msix_map.virq = irq;
>>>>> +		pci_msix_free_irq(pdev, msix_map);
>>>>> +	}
>>>>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>>>  out_put_eventfd_ctx:
>>>>>  	eventfd_ctx_put(trigger);
>>>>>  out_free_name:
>>>>>  	kfree(ctx->name);
>>>>>  	ctx->name = NULL;
>>>>> +out_free_ctx:
>>>>> +	if (allow_dyn_alloc && new_ctx)
>>>>> +		vfio_irq_ctx_free(vdev, ctx, vector);
>>>>>  	return ret;
>>>>>  }
>>>>>      
>>>>
>>>> Do we really need the new_ctx test in the above cases?  Thanks,  
>>
>> new_ctx is not required for correctness but instead is used to keep
>> the code symmetric. 
>> Specifically, if the user enables MSI-X without providing triggers and
>> then later assign triggers then an error path without new_ctx would unwind
>> more than done in this function, it would free the context that
>> was allocated within vfio_msi_enable(). 
> 
> Seems like we already have that asymmetry, if a trigger is unset we'll
> free the ctx allocated by vfio_msi_enable().  Tracking which are

Apologies, but could you please elaborate on where the asymmetry is? I am
not able to see a flow in this solution where the ctx allocated by
vfio_msi_enable() is freed if the trigger is unset.

> allocated where is unnecessarily complex, how about a policy that

I do not see this as tracking where allocations are made. Instead I
see it as containing/compartmentalizing state changes with the goal of
making the code easier to understand and maintain. Specifically, new_ctx
is used so that if vfio_msi_set_vector_signal() fails, the state 
before and after vfio_msi_set_vector_signal() will be the same.

I do agree that it makes vfio_msi_set_vector_signal() more complex
and I can remove new_ctx if you find that this is unnecessary after
considering the motivations behind its use. 

> devices supporting vdev->has_dyn_msix only ever have active contexts
> allocated?  Thanks,

What do you see as an "active context"? A policy that is currently enforced
is that an allocated context always has an allocated interrupt associated
with it. I do not see how this could be expanded to also require an
enabled interrupt because interrupt enabling requires a trigger that
may not be available.

Reinette




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-03 17:31           ` Reinette Chatre
@ 2023-04-03 20:22             ` Alex Williamson
  2023-04-03 22:50               ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-04-03 20:22 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Mon, 3 Apr 2023 10:31:23 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Alex,
> 
> On 3/31/2023 3:24 PM, Alex Williamson wrote:
> > On Fri, 31 Mar 2023 10:49:16 -0700
> > Reinette Chatre <reinette.chatre@intel.com> wrote:  
> >> On 3/30/2023 3:42 PM, Alex Williamson wrote:  
> >>> On Thu, 30 Mar 2023 16:40:50 -0600
> >>> Alex Williamson <alex.williamson@redhat.com> wrote:
> >>>     
> >>>> On Tue, 28 Mar 2023 14:53:34 -0700
> >>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
> >>>>    
> 
> ...
> 
> >>>>> +		msix_map.index = vector;
> >>>>> +		msix_map.virq = irq;
> >>>>> +		pci_msix_free_irq(pdev, msix_map);
> >>>>> +	}
> >>>>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>>>  out_put_eventfd_ctx:
> >>>>>  	eventfd_ctx_put(trigger);
> >>>>>  out_free_name:
> >>>>>  	kfree(ctx->name);
> >>>>>  	ctx->name = NULL;
> >>>>> +out_free_ctx:
> >>>>> +	if (allow_dyn_alloc && new_ctx)
> >>>>> +		vfio_irq_ctx_free(vdev, ctx, vector);
> >>>>>  	return ret;
> >>>>>  }
> >>>>>        
> >>>>
> >>>> Do we really need the new_ctx test in the above cases?  Thanks,    
> >>
> >> new_ctx is not required for correctness but instead is used to keep
> >> the code symmetric. 
> >> Specifically, if the user enables MSI-X without providing triggers and
> >> then later assign triggers then an error path without new_ctx would unwind
> >> more than done in this function, it would free the context that
> >> was allocated within vfio_msi_enable().   
> > 
> > Seems like we already have that asymmetry, if a trigger is unset we'll
> > free the ctx allocated by vfio_msi_enable().  Tracking which are  
> 
> Apologies, but could you please elaborate on where the asymmetry is? I am
> not able to see a flow in this solution where the ctx allocated by
> vfio_msi_enable() is freed if the trigger is unset.

The user first calls SET_IRQS to enable MSI-X with some number of
vectors with (potentially) an eventfd for each vector.  The user later
calls SET_IRQS passing a -1 eventfd for one or more of the vectors with
an eventfd initialized in the prior step.  Given that we find the ctx,
the ctx has a trigger, and assuming dynamic allocation is supported, the
ctx is freed and vfio_msi_set_vector_signal() returns w/o allocating a
new ctx.  We've de-allocated both the irq and context initialized from
vfio_msi_enable().

> > allocated where is unnecessarily complex, how about a policy that  
> 
> I do not see this as tracking where allocations are made. Instead I
> see it as containing/compartmentalizing state changes with the goal of
> making the code easier to understand and maintain. Specifically, new_ctx
> is used so that if vfio_msi_set_vector_signal() fails, the state 
> before and after vfio_msi_set_vector_signal() will be the same.

That's not really possible given how we teardown the existing ctx
before configuring the new one and unwind to disable contexts in
vfio_msi_set_block()

> I do agree that it makes vfio_msi_set_vector_signal() more complex
> and I can remove new_ctx if you find that this is unnecessary after
> considering the motivations behind its use. 

If the goal is to allow the user to swap one eventfd for another, where
the result will always be the new eventfd on success or the old eventfd
on error, I don't see that this code does that, or that we've ever
attempted to make such a guarantee.  If the ioctl errors, I think the
eventfds are generally deconfigured.   We certainly have the unwind code
that we discussed earlier that deconfigures all the vectors previously
touched in the loop (which seems to be another path where we could
de-allocate from the set of initial ctxs).
 
> > devices supporting vdev->has_dyn_msix only ever have active contexts
> > allocated?  Thanks,  
> 
> What do you see as an "active context"? A policy that is currently enforced
> is that an allocated context always has an allocated interrupt associated
> with it. I do not see how this could be expanded to also require an
> enabled interrupt because interrupt enabling requires a trigger that
> may not be available.

A context is essentially meant to track a trigger, ie. an eventfd
provided by the user.  In the static case all the irqs are necessarily
pre-allocated, therefore we had no reason to consider a dynamic array
for the contexts.  However, a given context is really only "active" if
it has a trigger, otherwise it's just a placeholder.  When the
placeholder is filled by an eventfd, the pre-allocated irq is enabled.

This proposal seems to be a hybrid approach, pre-allocating some
initial set of irqs and contexts and expecting the differentiation to
occur only when new vectors are added, though we have some disagreement
about this per above.  Unfortunately I don't see an API to enable MSI-X
without some vectors, so some pre-allocation of irqs seems to be
required regardless.

But if non-active contexts were only placeholders in the pre-dynamic
world and we now manage them via a dynamic array, why is there any
pre-allocation of contexts without knowing the nature of the eventfd to
fill it?  We could have more commonality between cases if contexts are
always dynamically allocated, which might simplify differentiation of
the has_dyn_msix cases largely to wrappers allocating and freeing irqs.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-03 20:22             ` Alex Williamson
@ 2023-04-03 22:50               ` Reinette Chatre
  2023-04-04  3:18                 ` Alex Williamson
  0 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-04-03 22:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 4/3/2023 1:22 PM, Alex Williamson wrote:
> On Mon, 3 Apr 2023 10:31:23 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
> 
>> Hi Alex,
>>
>> On 3/31/2023 3:24 PM, Alex Williamson wrote:
>>> On Fri, 31 Mar 2023 10:49:16 -0700
>>> Reinette Chatre <reinette.chatre@intel.com> wrote:  
>>>> On 3/30/2023 3:42 PM, Alex Williamson wrote:  
>>>>> On Thu, 30 Mar 2023 16:40:50 -0600
>>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>>>     
>>>>>> On Tue, 28 Mar 2023 14:53:34 -0700
>>>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
>>>>>>    
>>
>> ...
>>
>>>>>>> +		msix_map.index = vector;
>>>>>>> +		msix_map.virq = irq;
>>>>>>> +		pci_msix_free_irq(pdev, msix_map);
>>>>>>> +	}
>>>>>>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>>>>>  out_put_eventfd_ctx:
>>>>>>>  	eventfd_ctx_put(trigger);
>>>>>>>  out_free_name:
>>>>>>>  	kfree(ctx->name);
>>>>>>>  	ctx->name = NULL;
>>>>>>> +out_free_ctx:
>>>>>>> +	if (allow_dyn_alloc && new_ctx)
>>>>>>> +		vfio_irq_ctx_free(vdev, ctx, vector);
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>>        
>>>>>>
>>>>>> Do we really need the new_ctx test in the above cases?  Thanks,    
>>>>
>>>> new_ctx is not required for correctness but instead is used to keep
>>>> the code symmetric. 
>>>> Specifically, if the user enables MSI-X without providing triggers and
>>>> then later assign triggers then an error path without new_ctx would unwind
>>>> more than done in this function, it would free the context that
>>>> was allocated within vfio_msi_enable().   
>>>
>>> Seems like we already have that asymmetry, if a trigger is unset we'll
>>> free the ctx allocated by vfio_msi_enable().  Tracking which are  
>>
>> Apologies, but could you please elaborate on where the asymmetry is? I am
>> not able to see a flow in this solution where the ctx allocated by
>> vfio_msi_enable() is freed if the trigger is unset.
> 
> The user first calls SET_IRQS to enable MSI-X with some number of
> vectors with (potentially) an eventfd for each vector.  The user later
> calls SET_IRQS passing a -1 eventfd for one or more of the vectors with
> an eventfd initialized in the prior step.  Given that we find the ctx,
> the ctx has a trigger, and assuming dynamic allocation is supported, the
> ctx is freed and vfio_msi_set_vector_signal() returns w/o allocating a
> new ctx.  We've de-allocated both the irq and context initialized from
> vfio_msi_enable().

This is correct. The comment I responded to was in regards to an unset
trigger. The flow you describe is when a trigger is set. Not that
it changes your point though, which is that vfio_msi_set_vector_signal()
frees memory allocated by vfio_msi_enable(). This is clear to me. This
is intended behavior. My concern is/was with the error path where a function
failing may not be expected to change state, you address that concern below.

>>> allocated where is unnecessarily complex, how about a policy that  
>>
>> I do not see this as tracking where allocations are made. Instead I
>> see it as containing/compartmentalizing state changes with the goal of
>> making the code easier to understand and maintain. Specifically, new_ctx
>> is used so that if vfio_msi_set_vector_signal() fails, the state 
>> before and after vfio_msi_set_vector_signal() will be the same.
> 
> That's not really possible given how we teardown the existing ctx
> before configuring the new one and unwind to disable contexts in
> vfio_msi_set_block()

Very unlikely indeed. I agree.

>> I do agree that it makes vfio_msi_set_vector_signal() more complex
>> and I can remove new_ctx if you find that this is unnecessary after
>> considering the motivations behind its use. 
> 
> If the goal is to allow the user to swap one eventfd for another, where
> the result will always be the new eventfd on success or the old eventfd
> on error, I don't see that this code does that, or that we've ever
> attempted to make such a guarantee.  If the ioctl errors, I think the
> eventfds are generally deconfigured.   We certainly have the unwind code
> that we discussed earlier that deconfigures all the vectors previously
> touched in the loop (which seems to be another path where we could
> de-allocate from the set of initial ctxs).

Thank you for your patience in hearing and addressing my concerns. I plan
to remove new_ctx in the next version.

>>> devices supporting vdev->has_dyn_msix only ever have active contexts
>>> allocated?  Thanks,  
>>
>> What do you see as an "active context"? A policy that is currently enforced
>> is that an allocated context always has an allocated interrupt associated
>> with it. I do not see how this could be expanded to also require an
>> enabled interrupt because interrupt enabling requires a trigger that
>> may not be available.
> 
> A context is essentially meant to track a trigger, ie. an eventfd
> provided by the user.  In the static case all the irqs are necessarily
> pre-allocated, therefore we had no reason to consider a dynamic array
> for the contexts.  However, a given context is really only "active" if
> it has a trigger, otherwise it's just a placeholder.  When the
> placeholder is filled by an eventfd, the pre-allocated irq is enabled.

I see.

> 
> This proposal seems to be a hybrid approach, pre-allocating some
> initial set of irqs and contexts and expecting the differentiation to
> occur only when new vectors are added, though we have some disagreement
> about this per above.  Unfortunately I don't see an API to enable MSI-X
> without some vectors, so some pre-allocation of irqs seems to be
> required regardless.

Right. pci_alloc_irq_vectors() or equivalent continues to be needed to
enable MSI-X. Even so, it does seem possible (within vfio_msi_enable())
to just allocate one vector using pci_alloc_irq_vectors()
and then immediately free it using pci_msix_free_irq(). What do you think?
If I understand correctly this can be done without allocating any context
and leave MSI-X enabled without any interrupts allocated. This could be a
way to accomplish the "active context" policy for dynamic allocation.
This is not a policy that can be applied broadly to interrupt contexts though
because MSI and non-dynamic MSI-X could still have contexts with allocated
interrupts without eventfd.

> But if non-active contexts were only placeholders in the pre-dynamic
> world and we now manage them via a dynamic array, why is there any
> pre-allocation of contexts without knowing the nature of the eventfd to
> fill it?  We could have more commonality between cases if contexts are
> always dynamically allocated, which might simplify differentiation of
> the has_dyn_msix cases largely to wrappers allocating and freeing irqs.
> Thanks,

Thank you very much for your guidance. I will digest this some more and 
see how wrappers could be used. In the mean time while trying to think how
to unify this code I do think there is an issue in this patch in that
the get_cached_msi_msg()/pci_write_msi_msg()
should not be in an else branch.

Specifically, I think it needs to be:
	if (msix) {
		if (irq == -EINVAL) {
			/* dynamically allocate interrupt */
		}
		get_cached_msi_msg(irq, &msg);
		pci_write_msi_msg(irq, &msg);
	}


Thank you very much

Reinette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-03 22:50               ` Reinette Chatre
@ 2023-04-04  3:18                 ` Alex Williamson
  2023-04-04  3:51                   ` Tian, Kevin
  2023-04-04 16:54                   ` Reinette Chatre
  0 siblings, 2 replies; 39+ messages in thread
From: Alex Williamson @ 2023-04-04  3:18 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Mon, 3 Apr 2023 15:50:54 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Alex,
> 
> On 4/3/2023 1:22 PM, Alex Williamson wrote:
> > On Mon, 3 Apr 2023 10:31:23 -0700
> > Reinette Chatre <reinette.chatre@intel.com> wrote:
> >   
> >> Hi Alex,
> >>
> >> On 3/31/2023 3:24 PM, Alex Williamson wrote:  
> >>> On Fri, 31 Mar 2023 10:49:16 -0700
> >>> Reinette Chatre <reinette.chatre@intel.com> wrote:    
> >>>> On 3/30/2023 3:42 PM, Alex Williamson wrote:    
> >>>>> On Thu, 30 Mar 2023 16:40:50 -0600
> >>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
> >>>>>       
> >>>>>> On Tue, 28 Mar 2023 14:53:34 -0700
> >>>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
> >>>>>>      
> >>
> >> ...
> >>  
> >>>>>>> +		msix_map.index = vector;
> >>>>>>> +		msix_map.virq = irq;
> >>>>>>> +		pci_msix_free_irq(pdev, msix_map);
> >>>>>>> +	}
> >>>>>>> +	vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>>>>>  out_put_eventfd_ctx:
> >>>>>>>  	eventfd_ctx_put(trigger);
> >>>>>>>  out_free_name:
> >>>>>>>  	kfree(ctx->name);
> >>>>>>>  	ctx->name = NULL;
> >>>>>>> +out_free_ctx:
> >>>>>>> +	if (allow_dyn_alloc && new_ctx)
> >>>>>>> +		vfio_irq_ctx_free(vdev, ctx, vector);
> >>>>>>>  	return ret;
> >>>>>>>  }
> >>>>>>>          
> >>>>>>
> >>>>>> Do we really need the new_ctx test in the above cases?  Thanks,      
> >>>>
> >>>> new_ctx is not required for correctness but instead is used to keep
> >>>> the code symmetric. 
> >>>> Specifically, if the user enables MSI-X without providing triggers and
> >>>> then later assign triggers then an error path without new_ctx would unwind
> >>>> more than done in this function, it would free the context that
> >>>> was allocated within vfio_msi_enable().     
> >>>
> >>> Seems like we already have that asymmetry, if a trigger is unset we'll
> >>> free the ctx allocated by vfio_msi_enable().  Tracking which are    
> >>
> >> Apologies, but could you please elaborate on where the asymmetry is? I am
> >> not able to see a flow in this solution where the ctx allocated by
> >> vfio_msi_enable() is freed if the trigger is unset.  
> > 
> > The user first calls SET_IRQS to enable MSI-X with some number of
> > vectors with (potentially) an eventfd for each vector.  The user later
> > calls SET_IRQS passing a -1 eventfd for one or more of the vectors with
> > an eventfd initialized in the prior step.  Given that we find the ctx,
> > the ctx has a trigger, and assuming dynamic allocation is supported, the
> > ctx is freed and vfio_msi_set_vector_signal() returns w/o allocating a
> > new ctx.  We've de-allocated both the irq and context initialized from
> > vfio_msi_enable().  
> 
> This is correct. The comment I responded to was in regards to an unset
> trigger. The flow you describe is when a trigger is set. Not that
> it changes your point though, which is that vfio_msi_set_vector_signal()
> frees memory allocated by vfio_msi_enable(). This is clear to me. This
> is intended behavior. My concern is/was with the error path where a function
> failing may not be expected to change state, you address that concern below.
> 
> >>> allocated where is unnecessarily complex, how about a policy that    
> >>
> >> I do not see this as tracking where allocations are made. Instead I
> >> see it as containing/compartmentalizing state changes with the goal of
> >> making the code easier to understand and maintain. Specifically, new_ctx
> >> is used so that if vfio_msi_set_vector_signal() fails, the state 
> >> before and after vfio_msi_set_vector_signal() will be the same.  
> > 
> > That's not really possible given how we teardown the existing ctx
> > before configuring the new one and unwind to disable contexts in
> > vfio_msi_set_block()  
> 
> Very unlikely indeed. I agree.
> 
> >> I do agree that it makes vfio_msi_set_vector_signal() more complex
> >> and I can remove new_ctx if you find that this is unnecessary after
> >> considering the motivations behind its use.   
> > 
> > If the goal is to allow the user to swap one eventfd for another, where
> > the result will always be the new eventfd on success or the old eventfd
> > on error, I don't see that this code does that, or that we've ever
> > attempted to make such a guarantee.  If the ioctl errors, I think the
> > eventfds are generally deconfigured.   We certainly have the unwind code
> > that we discussed earlier that deconfigures all the vectors previously
> > touched in the loop (which seems to be another path where we could
> > de-allocate from the set of initial ctxs).  
> 
> Thank you for your patience in hearing and addressing my concerns. I plan
> to remove new_ctx in the next version.
> 
> >>> devices supporting vdev->has_dyn_msix only ever have active contexts
> >>> allocated?  Thanks,    
> >>
> >> What do you see as an "active context"? A policy that is currently enforced
> >> is that an allocated context always has an allocated interrupt associated
> >> with it. I do not see how this could be expanded to also require an
> >> enabled interrupt because interrupt enabling requires a trigger that
> >> may not be available.  
> > 
> > A context is essentially meant to track a trigger, ie. an eventfd
> > provided by the user.  In the static case all the irqs are necessarily
> > pre-allocated, therefore we had no reason to consider a dynamic array
> > for the contexts.  However, a given context is really only "active" if
> > it has a trigger, otherwise it's just a placeholder.  When the
> > placeholder is filled by an eventfd, the pre-allocated irq is enabled.  
> 
> I see.
> 
> > 
> > This proposal seems to be a hybrid approach, pre-allocating some
> > initial set of irqs and contexts and expecting the differentiation to
> > occur only when new vectors are added, though we have some disagreement
> > about this per above.  Unfortunately I don't see an API to enable MSI-X
> > without some vectors, so some pre-allocation of irqs seems to be
> > required regardless.  
> 
> Right. pci_alloc_irq_vectors() or equivalent continues to be needed to
> enable MSI-X. Even so, it does seem possible (within vfio_msi_enable())
> to just allocate one vector using pci_alloc_irq_vectors()
> and then immediately free it using pci_msix_free_irq(). What do you think?

QEMU does something similar but I think it can really only be described
as a hack.  In this case I think we can work with them being allocated
since that's essentially the static path.

> If I understand correctly this can be done without allocating any context
> and leave MSI-X enabled without any interrupts allocated. This could be a
> way to accomplish the "active context" policy for dynamic allocation.
> This is not a policy that can be applied broadly to interrupt contexts though
> because MSI and non-dynamic MSI-X could still have contexts with allocated
> interrupts without eventfd.

I think we could come up with wrappers that handle all cases, for
example:

int vfio_pci_alloc_irq(struct vfio_pci_core_device *vdev,
		       unsigned int vector, int irq_type)
{
	struct pci_dev *pdev = vdev->pdev;
	struct msi_map map;
	int irq;

	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX)
		return pdev->irq ?: -EINVAL;

	irq = pci_irq_vector(pdev, vector);
	if (irq > 0 || irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
	    !vdev->has_dyn_msix)
		return irq;

	map = pci_msix_alloc_irq_at(pdev, vector, NULL);

	return map.index;
}

void vfio_pci_free_irq(struct vfio_pci_core_device *vdev,
		       unsigned in vector, int irq_type)
{
	struct msi_map map;
	int irq;

	if (irq_type != VFIO_PCI_INTX_MSIX_INDEX ||
	    !vdev->has_dyn_msix)
		return;

	irq = pci_irq_vector(pdev, vector);
	map = { .index = vector, .virq = irq };

	if (WARN_ON(irq < 0))
		return;

	pci_msix_free_irq(pdev, msix_map);
}

At that point, maybe we'd check whether it makes sense to embed the irq
alloc/free within the ctx alloc/free.
 
> > But if non-active contexts were only placeholders in the pre-dynamic
> > world and we now manage them via a dynamic array, why is there any
> > pre-allocation of contexts without knowing the nature of the eventfd to
> > fill it?  We could have more commonality between cases if contexts are
> > always dynamically allocated, which might simplify differentiation of
> > the has_dyn_msix cases largely to wrappers allocating and freeing irqs.
> > Thanks,  
> 
> Thank you very much for your guidance. I will digest this some more and 
> see how wrappers could be used. In the mean time while trying to think how
> to unify this code I do think there is an issue in this patch in that
> the get_cached_msi_msg()/pci_write_msi_msg()
> should not be in an else branch.
> 
> Specifically, I think it needs to be:
> 	if (msix) {
> 		if (irq == -EINVAL) {
> 			/* dynamically allocate interrupt */
> 		}
> 		get_cached_msi_msg(irq, &msg);
> 		pci_write_msi_msg(irq, &msg);
> 	}

Yes, that's looked wrong to me all along, I think that resolves it.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-03-31 13:51     ` Alex Williamson
@ 2023-04-04  3:19       ` Liu, Jing2
  0 siblings, 0 replies; 39+ messages in thread
From: Liu, Jing2 @ 2023-04-04  3:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Chatre, Reinette, jgg, yishaih, shameerali.kolothum.thodi, Tian,
	Kevin, tglx, darwi, kvm, Jiang, Dave, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel



> From: Alex Williamson <alex.williamson@redhat.com>
> On Fri, 31 Mar 2023 10:02:32 +0000
> "Liu, Jing2" <jing2.liu@intel.com> wrote:
> 
> > Hi Reinette,
> >
> > > @@ -409,33 +416,62 @@ static int vfio_msi_set_vector_signal(struct
> > > vfio_pci_core_device *vdev,  {
> > >  	struct pci_dev *pdev = vdev->pdev;
> > >  	struct vfio_pci_irq_ctx *ctx;
> > > +	struct msi_map msix_map = {};
> > > +	bool allow_dyn_alloc = false;
> > >  	struct eventfd_ctx *trigger;
> > > +	bool new_ctx = false;
> > >  	int irq, ret;
> > >  	u16 cmd;
> > >
> > > +	/* Only MSI-X allows dynamic allocation. */
> > > +	if (msix && pci_msix_can_alloc_dyn(vdev->pdev))
> > > +		allow_dyn_alloc = true;
> > > +
> > >  	ctx = vfio_irq_ctx_get(vdev, vector);
> > > -	if (!ctx)
> > > +	if (!ctx && !allow_dyn_alloc)
> > >  		return -EINVAL;
> > > +
> > >  	irq = pci_irq_vector(pdev, vector);
> > > +	/* Context and interrupt are always allocated together. */
> > > +	WARN_ON((ctx && irq == -EINVAL) || (!ctx && irq != -EINVAL));
> > >
> > > -	if (ctx->trigger) {
> > > +	if (ctx && ctx->trigger) {
> > >  		irq_bypass_unregister_producer(&ctx->producer);
> > >
> > >  		cmd = vfio_pci_memory_lock_and_enable(vdev);
> > >  		free_irq(irq, ctx->trigger);
> > > +		if (allow_dyn_alloc) {
> > > +			msix_map.index = vector;
> > > +			msix_map.virq = irq;
> > > +			pci_msix_free_irq(pdev, msix_map);
> > > +			irq = -EINVAL;
> > > +		}
> > >  		vfio_pci_memory_unlock_and_restore(vdev, cmd);
> > >  		kfree(ctx->name);
> > >  		eventfd_ctx_put(ctx->trigger);
> > >  		ctx->trigger = NULL;
> > > +		if (allow_dyn_alloc) {
> > > +			vfio_irq_ctx_free(vdev, ctx, vector);
> > > +			ctx = NULL;
> > > +		}
> > >  	}
> > >
> > >  	if (fd < 0)
> > >  		return 0;
> > >
> >
> > While looking at this piece of code, one thought comes to me:
> > It might be possible that userspace comes twice with the same valid fd
> > for a specific vector, this vfio code will free the resource in if(ctx
> > && ctx->trigger) for the second time, and then re-alloc again for the same fd
> given by userspace.
> >
> > Would that help if we firstly check e.g. ctx->trigger with the given
> > valid fd, to see if the trigger is really changed or not?
> 
> It's rather a made-up situation, if userspace wants to avoid bouncing the vector
> when the eventfd hasn't changed they can simply test this before calling the ioctl.
> Thanks,
> 
> Alex

Thank you very much for your answer. 

The reason is I see Qemu has such behaviors sending the same valid fd for a 
specific vector twice, when it wants to disable the MSIx (switching fd from kvm bypass
to non-bypass).

So the right consideration is, userspace should be responsible for avoiding doing the
same thing several times, if it doesn't want that.

Thanks,
Jing


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04  3:18                 ` Alex Williamson
@ 2023-04-04  3:51                   ` Tian, Kevin
  2023-04-04 17:29                     ` Reinette Chatre
  2023-04-04 16:54                   ` Reinette Chatre
  1 sibling, 1 reply; 39+ messages in thread
From: Tian, Kevin @ 2023-04-04  3:51 UTC (permalink / raw)
  To: Alex Williamson, Chatre, Reinette
  Cc: jgg, yishaih, shameerali.kolothum.thodi, tglx, darwi, kvm, Jiang,
	Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua, tom.zanussi,
	linux-kernel

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, April 4, 2023 11:19 AM
> >
> > Thank you very much for your guidance. I will digest this some more and
> > see how wrappers could be used. In the mean time while trying to think
> how
> > to unify this code I do think there is an issue in this patch in that
> > the get_cached_msi_msg()/pci_write_msi_msg()
> > should not be in an else branch.
> >
> > Specifically, I think it needs to be:
> > 	if (msix) {
> > 		if (irq == -EINVAL) {
> > 			/* dynamically allocate interrupt */
> > 		}
> > 		get_cached_msi_msg(irq, &msg);
> > 		pci_write_msi_msg(irq, &msg);
> > 	}
> 
> Yes, that's looked wrong to me all along, I think that resolves it.
> Thanks,
> 

Do you mind elaborating why this change is required? I thought
pci_msix_alloc_irq_at() will compose a new msi message to write
hence no need to get cached value again in that case...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04  3:18                 ` Alex Williamson
  2023-04-04  3:51                   ` Tian, Kevin
@ 2023-04-04 16:54                   ` Reinette Chatre
  2023-04-04 18:24                     ` Alex Williamson
  1 sibling, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-04-04 16:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 4/3/2023 8:18 PM, Alex Williamson wrote:
> On Mon, 3 Apr 2023 15:50:54 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
>> On 4/3/2023 1:22 PM, Alex Williamson wrote:
>>> On Mon, 3 Apr 2023 10:31:23 -0700
>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
>>>> On 3/31/2023 3:24 PM, Alex Williamson wrote:  
>>>>> On Fri, 31 Mar 2023 10:49:16 -0700
>>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:    
>>>>>> On 3/30/2023 3:42 PM, Alex Williamson wrote:    
>>>>>>> On Thu, 30 Mar 2023 16:40:50 -0600
>>>>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>>>>>       
>>>>>>>> On Tue, 28 Mar 2023 14:53:34 -0700
>>>>>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
>>>>>>>>      


...

>>> If the goal is to allow the user to swap one eventfd for another, where
>>> the result will always be the new eventfd on success or the old eventfd
>>> on error, I don't see that this code does that, or that we've ever
>>> attempted to make such a guarantee.  If the ioctl errors, I think the
>>> eventfds are generally deconfigured.   We certainly have the unwind code
>>> that we discussed earlier that deconfigures all the vectors previously
>>> touched in the loop (which seems to be another path where we could
>>> de-allocate from the set of initial ctxs).  
>>
>> Thank you for your patience in hearing and addressing my concerns. I plan
>> to remove new_ctx in the next version.
>>
>>>>> devices supporting vdev->has_dyn_msix only ever have active contexts
>>>>> allocated?  Thanks,    
>>>>
>>>> What do you see as an "active context"? A policy that is currently enforced
>>>> is that an allocated context always has an allocated interrupt associated
>>>> with it. I do not see how this could be expanded to also require an
>>>> enabled interrupt because interrupt enabling requires a trigger that
>>>> may not be available.  
>>>
>>> A context is essentially meant to track a trigger, ie. an eventfd
>>> provided by the user.  In the static case all the irqs are necessarily
>>> pre-allocated, therefore we had no reason to consider a dynamic array
>>> for the contexts.  However, a given context is really only "active" if
>>> it has a trigger, otherwise it's just a placeholder.  When the
>>> placeholder is filled by an eventfd, the pre-allocated irq is enabled.  
>>
>> I see.
>>
>>>
>>> This proposal seems to be a hybrid approach, pre-allocating some
>>> initial set of irqs and contexts and expecting the differentiation to
>>> occur only when new vectors are added, though we have some disagreement
>>> about this per above.  Unfortunately I don't see an API to enable MSI-X
>>> without some vectors, so some pre-allocation of irqs seems to be
>>> required regardless.  
>>
>> Right. pci_alloc_irq_vectors() or equivalent continues to be needed to
>> enable MSI-X. Even so, it does seem possible (within vfio_msi_enable())
>> to just allocate one vector using pci_alloc_irq_vectors()
>> and then immediately free it using pci_msix_free_irq(). What do you think?
> 
> QEMU does something similar but I think it can really only be described
> as a hack.  In this case I think we can work with them being allocated
> since that's essentially the static path.

ok. In this case I understand the hybrid approach to be required. Without
something (a hack) like this I am not able to see how an "active context"
policy can be enforced though. Interrupts allocated during MSI-X enabling may
not have eventfd associated and thus cannot adhere to an "active context" policy. I
understand from  earlier comments that we do not want to track where contexts
are allocated so I can only see a way to enforce a policy that a context has
an allocated interrupt, but not an enabled interrupt.

>> If I understand correctly this can be done without allocating any context
>> and leave MSI-X enabled without any interrupts allocated. This could be a
>> way to accomplish the "active context" policy for dynamic allocation.
>> This is not a policy that can be applied broadly to interrupt contexts though
>> because MSI and non-dynamic MSI-X could still have contexts with allocated
>> interrupts without eventfd.
> 
> I think we could come up with wrappers that handle all cases, for
> example:
> 
> int vfio_pci_alloc_irq(struct vfio_pci_core_device *vdev,
> 		       unsigned int vector, int irq_type)
> {
> 	struct pci_dev *pdev = vdev->pdev;
> 	struct msi_map map;
> 	int irq;
> 
> 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> 		return pdev->irq ?: -EINVAL;
> 
> 	irq = pci_irq_vector(pdev, vector);
> 	if (irq > 0 || irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
> 	    !vdev->has_dyn_msix)
> 		return irq;
> 
> 	map = pci_msix_alloc_irq_at(pdev, vector, NULL);
> 
> 	return map.index;
> }
> 
> void vfio_pci_free_irq(struct vfio_pci_core_device *vdev,
> 		       unsigned in vector, int irq_type)
> {
> 	struct msi_map map;
> 	int irq;
> 
> 	if (irq_type != VFIO_PCI_INTX_MSIX_INDEX ||
> 	    !vdev->has_dyn_msix)
> 		return;
> 
> 	irq = pci_irq_vector(pdev, vector);
> 	map = { .index = vector, .virq = irq };
> 
> 	if (WARN_ON(irq < 0))
> 		return;
> 
> 	pci_msix_free_irq(pdev, msix_map);
> }

Thank you very much for taking the time to write this out. I am not able to
see where vfio_pci_alloc_irq()/vfio_pci_free_irq() would be called for
an INTx interrupt. Is the INTx handling there for robustness or am I
missing how it should be used for INTx interrupts?

> At that point, maybe we'd check whether it makes sense to embed the irq
> alloc/free within the ctx alloc/free.

I think doing so would be the right thing to do since it helps
to enforce the policy that interrupts and contexts are allocated together.
I think this can be done when switching around the initialization within 
vfio_msi_set_vector_signal(). I need to look into this more.

>>> But if non-active contexts were only placeholders in the pre-dynamic
>>> world and we now manage them via a dynamic array, why is there any
>>> pre-allocation of contexts without knowing the nature of the eventfd to
>>> fill it?  We could have more commonality between cases if contexts are
>>> always dynamically allocated, which might simplify differentiation of
>>> the has_dyn_msix cases largely to wrappers allocating and freeing irqs.
>>> Thanks,  
>>
>> Thank you very much for your guidance. I will digest this some more and 
>> see how wrappers could be used. In the mean time while trying to think how
>> to unify this code I do think there is an issue in this patch in that
>> the get_cached_msi_msg()/pci_write_msi_msg()
>> should not be in an else branch.
>>
>> Specifically, I think it needs to be:
>> 	if (msix) {
>> 		if (irq == -EINVAL) {
>> 			/* dynamically allocate interrupt */
>> 		}
>> 		get_cached_msi_msg(irq, &msg);
>> 		pci_write_msi_msg(irq, &msg);
>> 	}
> 
> Yes, that's looked wrong to me all along, I think that resolves it.

Thank you very much.

Reinette


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04  3:51                   ` Tian, Kevin
@ 2023-04-04 17:29                     ` Reinette Chatre
  2023-04-04 18:43                       ` Alex Williamson
  0 siblings, 1 reply; 39+ messages in thread
From: Reinette Chatre @ 2023-04-04 17:29 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, tglx, darwi, kvm, Jiang,
	Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua, tom.zanussi,
	linux-kernel

Hi Kevin,

On 4/3/2023 8:51 PM, Tian, Kevin wrote:
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Tuesday, April 4, 2023 11:19 AM
>>>
>>> Thank you very much for your guidance. I will digest this some more and
>>> see how wrappers could be used. In the mean time while trying to think
>> how
>>> to unify this code I do think there is an issue in this patch in that
>>> the get_cached_msi_msg()/pci_write_msi_msg()
>>> should not be in an else branch.
>>>
>>> Specifically, I think it needs to be:
>>> 	if (msix) {
>>> 		if (irq == -EINVAL) {
>>> 			/* dynamically allocate interrupt */
>>> 		}
>>> 		get_cached_msi_msg(irq, &msg);
>>> 		pci_write_msi_msg(irq, &msg);
>>> 	}
>>
>> Yes, that's looked wrong to me all along, I think that resolves it.
>> Thanks,
>>
> 
> Do you mind elaborating why this change is required? I thought
> pci_msix_alloc_irq_at() will compose a new msi message to write
> hence no need to get cached value again in that case...

With this change an interrupt allocated via pci_msix_alloc_irq_at()
is treated the same as an interrupt allocated via pci_alloc_irq_vectors().

get_cached_msi_msg()/pci_write_msi_msg() is currently called for
every allocated interrupt and this snippet intends to maintain
this behavior.

One flow I considered that made me think this is fixing a bug is
as follows:
Scenario A (current behavior):
- host/user enables vectors 0, 1, 2 ,3 ,4
  - kernel allocates all interrupts via pci_alloc_irq_vectors()
  - get_cached_msi_msg()/pci_write_msi_msg() is called for each interrupt

Scenario B (this series):
- host/user enables vector 0
  - kernel allocates interrupt 0 via pci_alloc_irq_vectors()
  - get_cached_msi_msg()/pci_write_msi_msg() is called for interrupt 0
- host/user enables vector 1
  - kernel allocates interrupt 1 via pci_msix_alloc_irq_at()
  - get_cached_msi_msg()/pci_write_msi_msg() is NOT called for interrupt 1
    /* This seems a bug since host may expect same outcome as in scenario A */

I am not familiar with how the MSI messages are composed though and I surely
could have gotten this wrong. I would like to learn more after you considered
the motivation for this change.

Reinette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04 16:54                   ` Reinette Chatre
@ 2023-04-04 18:24                     ` Alex Williamson
  2023-04-06 20:13                       ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-04-04 18:24 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

On Tue, 4 Apr 2023 09:54:46 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Alex,
> 
> On 4/3/2023 8:18 PM, Alex Williamson wrote:
> > On Mon, 3 Apr 2023 15:50:54 -0700
> > Reinette Chatre <reinette.chatre@intel.com> wrote:  
> >> On 4/3/2023 1:22 PM, Alex Williamson wrote:  
> >>> On Mon, 3 Apr 2023 10:31:23 -0700
> >>> Reinette Chatre <reinette.chatre@intel.com> wrote:  
> >>>> On 3/31/2023 3:24 PM, Alex Williamson wrote:    
> >>>>> On Fri, 31 Mar 2023 10:49:16 -0700
> >>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:      
> >>>>>> On 3/30/2023 3:42 PM, Alex Williamson wrote:      
> >>>>>>> On Thu, 30 Mar 2023 16:40:50 -0600
> >>>>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
> >>>>>>>         
> >>>>>>>> On Tue, 28 Mar 2023 14:53:34 -0700
> >>>>>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
> >>>>>>>>        
> 
> 
> ...
> 
> >>> If the goal is to allow the user to swap one eventfd for another, where
> >>> the result will always be the new eventfd on success or the old eventfd
> >>> on error, I don't see that this code does that, or that we've ever
> >>> attempted to make such a guarantee.  If the ioctl errors, I think the
> >>> eventfds are generally deconfigured.   We certainly have the unwind code
> >>> that we discussed earlier that deconfigures all the vectors previously
> >>> touched in the loop (which seems to be another path where we could
> >>> de-allocate from the set of initial ctxs).    
> >>
> >> Thank you for your patience in hearing and addressing my concerns. I plan
> >> to remove new_ctx in the next version.
> >>  
> >>>>> devices supporting vdev->has_dyn_msix only ever have active contexts
> >>>>> allocated?  Thanks,      
> >>>>
> >>>> What do you see as an "active context"? A policy that is currently enforced
> >>>> is that an allocated context always has an allocated interrupt associated
> >>>> with it. I do not see how this could be expanded to also require an
> >>>> enabled interrupt because interrupt enabling requires a trigger that
> >>>> may not be available.    
> >>>
> >>> A context is essentially meant to track a trigger, ie. an eventfd
> >>> provided by the user.  In the static case all the irqs are necessarily
> >>> pre-allocated, therefore we had no reason to consider a dynamic array
> >>> for the contexts.  However, a given context is really only "active" if
> >>> it has a trigger, otherwise it's just a placeholder.  When the
> >>> placeholder is filled by an eventfd, the pre-allocated irq is enabled.    
> >>
> >> I see.
> >>  
> >>>
> >>> This proposal seems to be a hybrid approach, pre-allocating some
> >>> initial set of irqs and contexts and expecting the differentiation to
> >>> occur only when new vectors are added, though we have some disagreement
> >>> about this per above.  Unfortunately I don't see an API to enable MSI-X
> >>> without some vectors, so some pre-allocation of irqs seems to be
> >>> required regardless.    
> >>
> >> Right. pci_alloc_irq_vectors() or equivalent continues to be needed to
> >> enable MSI-X. Even so, it does seem possible (within vfio_msi_enable())
> >> to just allocate one vector using pci_alloc_irq_vectors()
> >> and then immediately free it using pci_msix_free_irq(). What do you think?  
> > 
> > QEMU does something similar but I think it can really only be described
> > as a hack.  In this case I think we can work with them being allocated
> > since that's essentially the static path.  
> 
> ok. In this case I understand the hybrid approach to be required. Without
> something (a hack) like this I am not able to see how an "active context"
> policy can be enforced though. Interrupts allocated during MSI-X enabling may
> not have eventfd associated and thus cannot adhere to an "active context" policy. I
> understand from  earlier comments that we do not want to track where contexts
> are allocated so I can only see a way to enforce a policy that a context has
> an allocated interrupt, but not an enabled interrupt.

We're talking about the contexts that we now allocate in the xarray to
store the eventfd linkage, right?  We need to pre-allocate some irqs
both to satisfy the API and to support non-dynamic MSI-X devices, but
we don't need to pre-allocate contexts.  The logic that I propose below
supports lookup of the pre-allocated irqs for all cases and falls back
to allocating a new irq only for cases that support it.  irqs and
contexts aren't exactly 1:1 for the dynamic case due to the artifacts
of the API, but the model supports only allocating contexts as they're
used, or "active".
 
> >> If I understand correctly this can be done without allocating any context
> >> and leave MSI-X enabled without any interrupts allocated. This could be a
> >> way to accomplish the "active context" policy for dynamic allocation.
> >> This is not a policy that can be applied broadly to interrupt contexts though
> >> because MSI and non-dynamic MSI-X could still have contexts with allocated
> >> interrupts without eventfd.  
> > 
> > I think we could come up with wrappers that handle all cases, for
> > example:
> > 
> > int vfio_pci_alloc_irq(struct vfio_pci_core_device *vdev,
> > 		       unsigned int vector, int irq_type)
> > {
> > 	struct pci_dev *pdev = vdev->pdev;
> > 	struct msi_map map;
> > 	int irq;
> > 
> > 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > 		return pdev->irq ?: -EINVAL;
> > 
> > 	irq = pci_irq_vector(pdev, vector);
> > 	if (irq > 0 || irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
> > 	    !vdev->has_dyn_msix)
> > 		return irq;
> > 
> > 	map = pci_msix_alloc_irq_at(pdev, vector, NULL);
> > 
> > 	return map.index;
> > }
> > 
> > void vfio_pci_free_irq(struct vfio_pci_core_device *vdev,
> > 		       unsigned in vector, int irq_type)
> > {
> > 	struct msi_map map;
> > 	int irq;
> > 
> > 	if (irq_type != VFIO_PCI_INTX_MSIX_INDEX ||
> > 	    !vdev->has_dyn_msix)
> > 		return;
> > 
> > 	irq = pci_irq_vector(pdev, vector);
> > 	map = { .index = vector, .virq = irq };
> > 
> > 	if (WARN_ON(irq < 0))
> > 		return;
> > 
> > 	pci_msix_free_irq(pdev, msix_map);
> > }  
> 
> Thank you very much for taking the time to write this out. I am not able to
> see where vfio_pci_alloc_irq()/vfio_pci_free_irq() would be called for
> an INTx interrupt. Is the INTx handling there for robustness or am I
> missing how it should be used for INTx interrupts?

Mostly just trying to illustrate that all interrupt types could be
supported, if it doesn't make sense for INTx, drop it.

> > At that point, maybe we'd check whether it makes sense to embed the irq
> > alloc/free within the ctx alloc/free.  
> 
> I think doing so would be the right thing to do since it helps
> to enforce the policy that interrupts and contexts are allocated together.
> I think this can be done when switching around the initialization within 
> vfio_msi_set_vector_signal(). I need to look into this more.

Interrupts and contexts allocated together would be ideal, but I think
given the API it's a reasonable and simple compromise given the
non-dynamic support to draw from the initial allocation where we can.
Actually, there could be a latency and reliability advantage to hang on
to the irq when an eventfd is unset, maybe we should only free irqs on
MSI-X teardown and otherwise use the allocated irqs as a cache.  Maybe
worth thinking about.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04 17:29                     ` Reinette Chatre
@ 2023-04-04 18:43                       ` Alex Williamson
  2023-04-04 20:46                         ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Williamson @ 2023-04-04 18:43 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Tian, Kevin, jgg, yishaih, shameerali.kolothum.thodi, tglx,
	darwi, kvm, Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel

On Tue, 4 Apr 2023 10:29:14 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Kevin,
> 
> On 4/3/2023 8:51 PM, Tian, Kevin wrote:
> >> From: Alex Williamson <alex.williamson@redhat.com>
> >> Sent: Tuesday, April 4, 2023 11:19 AM  
> >>>
> >>> Thank you very much for your guidance. I will digest this some more and
> >>> see how wrappers could be used. In the mean time while trying to think  
> >> how  
> >>> to unify this code I do think there is an issue in this patch in that
> >>> the get_cached_msi_msg()/pci_write_msi_msg()
> >>> should not be in an else branch.
> >>>
> >>> Specifically, I think it needs to be:
> >>> 	if (msix) {
> >>> 		if (irq == -EINVAL) {
> >>> 			/* dynamically allocate interrupt */
> >>> 		}
> >>> 		get_cached_msi_msg(irq, &msg);
> >>> 		pci_write_msi_msg(irq, &msg);
> >>> 	}  
> >>
> >> Yes, that's looked wrong to me all along, I think that resolves it.
> >> Thanks,
> >>  
> > 
> > Do you mind elaborating why this change is required? I thought
> > pci_msix_alloc_irq_at() will compose a new msi message to write
> > hence no need to get cached value again in that case...  
> 
> With this change an interrupt allocated via pci_msix_alloc_irq_at()
> is treated the same as an interrupt allocated via pci_alloc_irq_vectors().
> 
> get_cached_msi_msg()/pci_write_msi_msg() is currently called for
> every allocated interrupt and this snippet intends to maintain
> this behavior.
> 
> One flow I considered that made me think this is fixing a bug is
> as follows:
> Scenario A (current behavior):
> - host/user enables vectors 0, 1, 2 ,3 ,4
>   - kernel allocates all interrupts via pci_alloc_irq_vectors()
>   - get_cached_msi_msg()/pci_write_msi_msg() is called for each interrupt

In this scenario, I think the intention is that there's non-zero
time since pci_alloc_irq_vectors() such that a device reset or other
manipulation of the vector table may have occurred, therefore we're
potentially restoring the programming of the vector table with this
get/write.

> Scenario B (this series):
> - host/user enables vector 0
>   - kernel allocates interrupt 0 via pci_alloc_irq_vectors()
>   - get_cached_msi_msg()/pci_write_msi_msg() is called for interrupt 0
> - host/user enables vector 1
>   - kernel allocates interrupt 1 via pci_msix_alloc_irq_at()
>   - get_cached_msi_msg()/pci_write_msi_msg() is NOT called for interrupt 1
>     /* This seems a bug since host may expect same outcome as in scenario A */
> 
> I am not familiar with how the MSI messages are composed though and I surely
> could have gotten this wrong. I would like to learn more after you considered
> the motivation for this change.

I think Kevin has a point, if it's correct that we do this get/write in
order to account for manipulation of the device since we wrote into the
vector table via either pci_alloc_irq_vectors() or
pci_msix_alloc_irq_at(), then it really only makes sense to do that
restore if we haven't allocated the irq and written the vector table
immediately prior.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04 18:43                       ` Alex Williamson
@ 2023-04-04 20:46                         ` Reinette Chatre
  0 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-04-04 20:46 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, jgg, yishaih, shameerali.kolothum.thodi, tglx,
	darwi, kvm, Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel

Hi Alex,

On 4/4/2023 11:43 AM, Alex Williamson wrote:
> 
> I think Kevin has a point, if it's correct that we do this get/write in
> order to account for manipulation of the device since we wrote into the
> vector table via either pci_alloc_irq_vectors() or
> pci_msix_alloc_irq_at(), then it really only makes sense to do that
> restore if we haven't allocated the irq and written the vector table
> immediately prior.  Thanks,
> 

I see. Even so, could it be acceptable to call get_cached_msi_msg()/pci_write_msi_msg()
unconditionally (for MSI-X) or should the new [1] vfio_pci_alloc_irq()
indicate if a new IRQ was allocated to determine if 
get_cached_msi_msg()/pci_write_msi_msg() should be run?

Reinette


[1] https://lore.kernel.org/lkml/20230403211841.0e206b67.alex.williamson@redhat.com/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
  2023-04-04 18:24                     ` Alex Williamson
@ 2023-04-06 20:13                       ` Reinette Chatre
  0 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-04-06 20:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
	kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	linux-kernel

Hi Alex,

On 4/4/2023 11:24 AM, Alex Williamson wrote:

> We're talking about the contexts that we now allocate in the xarray to
> store the eventfd linkage, right?  We need to pre-allocate some irqs
> both to satisfy the API and to support non-dynamic MSI-X devices, but
> we don't need to pre-allocate contexts.  The logic that I propose below
> supports lookup of the pre-allocated irqs for all cases and falls back
> to allocating a new irq only for cases that support it.  irqs and
> contexts aren't exactly 1:1 for the dynamic case due to the artifacts
> of the API, but the model supports only allocating contexts as they're
> used, or "active".

Now I understand. Thank you very much for your patience.

...

> Interrupts and contexts allocated together would be ideal, but I think
> given the API it's a reasonable and simple compromise given the
> non-dynamic support to draw from the initial allocation where we can.
> Actually, there could be a latency and reliability advantage to hang on
> to the irq when an eventfd is unset, maybe we should only free irqs on
> MSI-X teardown and otherwise use the allocated irqs as a cache.  Maybe
> worth thinking about.  Thanks,

I implemented this change and I think it looks good. Enabling of dynamic
MSI-X ended up consisting out of vfio_pci_alloc_irq() you suggested and
one more line that uses it. This is because I also made the change to
defer freeing irqs to MSI-X teardown and doing so is surely more efficient
in the scenario that Jing pointed out.

I did not transition the INTx code to "active" contexts - meaning that
the interrupt context continues to be allocated at the time INTx is
enabled. From what I understand the additional support for mask/unmask
requires a context but it does not need to be active.

Reinette




^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
  2023-03-28 21:53 ` [PATCH V2 4/8] vfio/pci: Use xarray for " Reinette Chatre
@ 2023-04-07  7:21   ` Liu, Jing2
  2023-04-07 16:44     ` Reinette Chatre
  0 siblings, 1 reply; 39+ messages in thread
From: Liu, Jing2 @ 2023-04-07  7:21 UTC (permalink / raw)
  To: Chatre, Reinette, jgg, yishaih, shameerali.kolothum.thodi, Tian,
	Kevin, alex.williamson
  Cc: tglx, darwi, kvm, Jiang, Dave, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel, Liu, Jing2

Hi Reinette,

> From: Chatre, Reinette <reinette.chatre@intel.com>
> Subject: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
> 
> Interrupt context is statically allocated at the time interrupts are allocated.
> Following allocation, the context is managed by directly accessing the
elements of the array using the vector as index. The storage is release
when interrupts are disabled.
> 

Looking at the dynamic allocation change for the time after MSI-x is
enabled in vfio_msi_set_vector_signal(), do we need to consider changing 
the allocation of context/interrupt in vfio_msi_enable() for MSI-x to the 
same way, which only allocates for vectors with a valid signal fd when 
dynamic MSI-x is supported?

Because in dynamic logic, I think when enabling MSI-x, not all vectors from 
zero to nvec should be allocated, and they would be done until they are really
used with setting the singal fd.

Not sure if this comment being replied here is the good place since
vfio_msi_enable() seems no change in this series. 😊 


Thanks,
Jing


> 
> Use the new data storage to allocate all elements once and free all elements
> together. Dynamic usage follows.
> 
> Create helpers with understanding that it is only possible to (after initial MSI-X
> enabling) allocate a single MSI-X index at a time. The only time multiple MSI-X
> are allocated is during initial MSI-X enabling where failure results in no
> allocations.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since RFC V1:
> - Let vfio_irq_ctx_alloc_single() return pointer to allocated
>   context. (Alex)
> - Transition INTx allocation to simplified vfio_irq_ctx_alloc_single().
> - Improve accuracy of changelog.
> 
>  drivers/vfio/pci/vfio_pci_core.c  |  1 +  drivers/vfio/pci/vfio_pci_intrs.c | 77
> ++++++++++++++++++++-----------
>  include/linux/vfio_pci_core.h     |  2 +-
>  3 files changed, 53 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a5ab416cf476..ae0e161c7fc9 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2102,6 +2102,7 @@ int vfio_pci_core_init_dev(struct vfio_device
> *core_vdev)
>  	INIT_LIST_HEAD(&vdev->vma_list);
>  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
>  	init_rwsem(&vdev->memory_lock);
> +	xa_init(&vdev->ctx);
> 
>  	return 0;
>  }
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index ece371ebea00..00119641aa19 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -52,25 +52,60 @@ static
>  struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
>  					  unsigned long index)
>  {
> -	if (index >= vdev->num_ctx)
> -		return NULL;
> -	return &vdev->ctx[index];
> +	return xa_load(&vdev->ctx, index);
>  }
> 
>  static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)  {
> -	kfree(vdev->ctx);
> +	struct vfio_pci_irq_ctx *ctx;
> +	unsigned long index;
> +
> +	xa_for_each(&vdev->ctx, index, ctx) {
> +		xa_erase(&vdev->ctx, index);
> +		kfree(ctx);
> +	}
> +}
> +
> +static struct vfio_pci_irq_ctx *
> +vfio_irq_ctx_alloc_single(struct vfio_pci_core_device *vdev,
> +			  unsigned long index)
> +{
> +	struct vfio_pci_irq_ctx *ctx;
> +	int ret;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
> +	if (!ctx)
> +		return NULL;
> +
> +	ret = xa_insert(&vdev->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
> +	if (ret) {
> +		kfree(ctx);
> +		return NULL;
> +	}
> +
> +	return ctx;
>  }
> 
> +/* Only called during initial interrupt enabling. Never after.  */
>  static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
>  				  unsigned long num)
>  {
> -	vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
> -			    GFP_KERNEL_ACCOUNT);
> -	if (!vdev->ctx)
> -		return -ENOMEM;
> +	struct vfio_pci_irq_ctx *ctx;
> +	unsigned long index;
> +
> +	WARN_ON(!xa_empty(&vdev->ctx));
> +
> +	for (index = 0; index < num; index++) {
> +		ctx = vfio_irq_ctx_alloc_single(vdev, index);
> +		if (!ctx)
> +			goto err;
> +	}
> 
>  	return 0;
> +
> +err:
> +	vfio_irq_ctx_free_all(vdev);
> +	return -ENOMEM;
>  }
> 
>  /*
> @@ -218,7 +253,6 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
> static int vfio_intx_enable(struct vfio_pci_core_device *vdev)  {
>  	struct vfio_pci_irq_ctx *ctx;
> -	int ret;
> 
>  	if (!is_irq_none(vdev))
>  		return -EINVAL;
> @@ -226,15 +260,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device
> *vdev)
>  	if (!vdev->pdev->irq)
>  		return -ENODEV;
> 
> -	ret = vfio_irq_ctx_alloc_num(vdev, 1);
> -	if (ret)
> -		return ret;
> -
> -	ctx = vfio_irq_ctx_get(vdev, 0);
> -	if (!ctx) {
> -		vfio_irq_ctx_free_all(vdev);
> -		return -EINVAL;
> -	}
> +	ctx = vfio_irq_ctx_alloc_single(vdev, 0);
> +	if (!ctx)
> +		return -ENOMEM;
> 
>  	vdev->num_ctx = 1;
> 
> @@ -486,16 +514,13 @@ static void vfio_msi_disable(struct
> vfio_pci_core_device *vdev, bool msix)  {
>  	struct pci_dev *pdev = vdev->pdev;
>  	struct vfio_pci_irq_ctx *ctx;
> -	unsigned int i;
> +	unsigned long i;
>  	u16 cmd;
> 
> -	for (i = 0; i < vdev->num_ctx; i++) {
> -		ctx = vfio_irq_ctx_get(vdev, i);
> -		if (ctx) {
> -			vfio_virqfd_disable(&ctx->unmask);
> -			vfio_virqfd_disable(&ctx->mask);
> -			vfio_msi_set_vector_signal(vdev, i, -1, msix);
> -		}
> +	xa_for_each(&vdev->ctx, i, ctx) {
> +		vfio_virqfd_disable(&ctx->unmask);
> +		vfio_virqfd_disable(&ctx->mask);
> +		vfio_msi_set_vector_signal(vdev, i, -1, msix);
>  	}
> 
>  	cmd = vfio_pci_memory_lock_and_enable(vdev);
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index
> 367fd79226a3..61d7873a3973 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -59,7 +59,7 @@ struct vfio_pci_core_device {
>  	struct perm_bits	*msi_perm;
>  	spinlock_t		irqlock;
>  	struct mutex		igate;
> -	struct vfio_pci_irq_ctx	*ctx;
> +	struct xarray		ctx;
>  	int			num_ctx;
>  	int			irq_type;
>  	int			num_regions;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
  2023-04-07  7:21   ` Liu, Jing2
@ 2023-04-07 16:44     ` Reinette Chatre
  0 siblings, 0 replies; 39+ messages in thread
From: Reinette Chatre @ 2023-04-07 16:44 UTC (permalink / raw)
  To: Liu, Jing2, jgg, yishaih, shameerali.kolothum.thodi, Tian, Kevin,
	alex.williamson
  Cc: tglx, darwi, kvm, Jiang, Dave, Raj, Ashok, Yu, Fenghua,
	tom.zanussi, linux-kernel

Hi Jing,

On 4/7/2023 12:21 AM, Liu, Jing2 wrote:
> Hi Reinette,
> 
>> From: Chatre, Reinette <reinette.chatre@intel.com>
>> Subject: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
>>
>> Interrupt context is statically allocated at the time interrupts are allocated.
>> Following allocation, the context is managed by directly accessing the
> elements of the array using the vector as index. The storage is release
> when interrupts are disabled.
>>
> 
> Looking at the dynamic allocation change for the time after MSI-x is
> enabled in vfio_msi_set_vector_signal(), do we need to consider changing 
> the allocation of context/interrupt in vfio_msi_enable() for MSI-x to the 
> same way, which only allocates for vectors with a valid signal fd when 
> dynamic MSI-x is supported?
> 
> Because in dynamic logic, I think when enabling MSI-x, not all vectors from 
> zero to nvec should be allocated, and they would be done until they are really
> used with setting the singal fd.
> 
> Not sure if this comment being replied here is the good place since
> vfio_msi_enable() seems no change in this series. 😊 

This is a good question and from what I understand it could be done either way.

vfio_msi_enable()->pci_alloc_irq_vectors() would always be required because
pci_alloc_irq_vectors() enables MSI-X in addition to allocating the vectors.

I expect it to be efficient to allocate a range using pci_alloc_irq_vectors()
at the same time as MSI-X enabling in anticipation of their subsequent activation
after needing to only take the vfio and MSI locks once.

As you indicate, it is also possible to only allocate one vector during MSI-X
enabling using pci_alloc_irq_vectors(), leaving the other allocations to
vfio_msi_set_vector_signal(). Not a major issue but this would require some
additional special cases within vfio_msi_enable() while the current solution
allocating all vectors using pci_alloc_irq_vectors() works for all types.

Considering which would be more efficient would depend on the use cases. The
current solution may be considered less efficient if the user enables MSI-X
by providing a range of vectors without triggers(*). From what I can tell
this is not a possible use case when using Qemu. Qemu enables MSI-X by
calling VFIO_DEVICE_SET_IRQS for vector 0 with a trigger. Making a change
to only allocate vector 0 using pci_alloc_irq_vectors() and the rest using
vfio_msi_set_vector_signal() would thus have no impact on Qemu. Both
implementations behave the same for Qemu.

Considering the efficiency of allocating multiple vectors together that
works for all interrupts (MSI, non dynamic MSI-X, and dynamic MSI-X) without
any impact to Qemu I do lean towards keeping the current implementation.

Reinette

(*) Whether it is less efficient may possibly be debated considering that 
it may be desirable to use allocated interrupts as a cache:
https://lore.kernel.org/lkml/20230404122444.59e36a99.alex.williamson@redhat.com/

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2023-04-07 16:44 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-28 21:53 [PATCH V2 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 1/8] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 2/8] vfio/pci: Remove negative check on unsigned vector Reinette Chatre
2023-03-30 20:26   ` Alex Williamson
2023-03-30 22:32     ` Reinette Chatre
2023-03-30 22:54       ` Alex Williamson
2023-03-30 23:54         ` Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 3/8] vfio/pci: Prepare for dynamic interrupt context storage Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 4/8] vfio/pci: Use xarray for " Reinette Chatre
2023-04-07  7:21   ` Liu, Jing2
2023-04-07 16:44     ` Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 5/8] vfio/pci: Remove interrupt context counter Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 6/8] vfio/pci: Move to single error path Reinette Chatre
2023-03-28 21:53 ` [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x Reinette Chatre
2023-03-29  2:48   ` kernel test robot
2023-03-29 14:42     ` Reinette Chatre
2023-03-29 22:10       ` Reinette Chatre
2023-03-29  2:58   ` kernel test robot
2023-03-30 22:40   ` Alex Williamson
2023-03-30 22:42     ` Alex Williamson
2023-03-31 17:49       ` Reinette Chatre
2023-03-31 22:24         ` Alex Williamson
2023-04-03 17:31           ` Reinette Chatre
2023-04-03 20:22             ` Alex Williamson
2023-04-03 22:50               ` Reinette Chatre
2023-04-04  3:18                 ` Alex Williamson
2023-04-04  3:51                   ` Tian, Kevin
2023-04-04 17:29                     ` Reinette Chatre
2023-04-04 18:43                       ` Alex Williamson
2023-04-04 20:46                         ` Reinette Chatre
2023-04-04 16:54                   ` Reinette Chatre
2023-04-04 18:24                     ` Alex Williamson
2023-04-06 20:13                       ` Reinette Chatre
2023-03-31 10:02   ` Liu, Jing2
2023-03-31 13:51     ` Alex Williamson
2023-04-04  3:19       ` Liu, Jing2
2023-03-28 21:53 ` [PATCH V2 8/8] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
2023-03-29  3:29   ` kernel test robot
2023-03-29  3:29   ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.