linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets
@ 2019-01-25  9:53 Ming Lei
  2019-01-25  9:53 ` [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks Ming Lei
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Ming Lei

Hi,

The current support for allocating interrupt sets requires that the same
'max_vecs' and 'min_vecs' be passed to pci_alloc_irq_vectors_affinity(), so the
driver has to retry the allocation again and again until it succeeds.

This series introduces a .setup_affinity callback, which can be used to
re-calculate the interrupt sets and build the affinity for each set after the
irq vectors have been allocated.

Turns out both the genirq/affinity and NVMe code get simplified a lot.
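
A minimal sketch of the intended driver-side usage (illustrative only; the
'foo' and 'nvecs' names are made up, the new fields are introduced in patch 2):

	struct irq_affinity affd = {
		.pre_vectors	= 1,
		.setup_affinity	= foo_setup_affinity,	/* driver callback */
		.priv		= foo,			/* driver context */
	};

	ret = pci_alloc_irq_vectors_affinity(pdev, 1, nvecs,
			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);

The callback then re-calculates the interrupt sets from the number of vectors
which were actually allocated, so no retry loop is needed in the driver.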

Please review and comment!

Ming Lei (5):
  genirq/affinity: move allocation of 'node_to_cpumask' to
    irq_build_affinity_masks
  genirq/affinity: allow driver to setup managed IRQ's affinity
  genirq/affinity: introduce irq_build_affinity()
  nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
  genirq/affinity: remove support for allocating interrupt sets

 drivers/nvme/host/pci.c   |  97 +++++++++++++++++++++--------------------
 drivers/pci/msi.c         |  14 ------
 include/linux/interrupt.h |  42 ++++++++++++------
 kernel/irq/affinity.c     | 108 ++++++++++++++++++++++++----------------------
 4 files changed, 133 insertions(+), 128 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks
  2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
@ 2019-01-25  9:53 ` Ming Lei
  2019-02-07 22:02   ` Bjorn Helgaas
  2019-02-10 19:01   ` [tip:irq/core] genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks() tip-bot for Ming Lei
  2019-01-25  9:53 ` [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity Ming Lei
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Ming Lei

'node_to_cpumask' is just one temparay variable for irq_build_affinity_masks(),
so move it into irq_build_affinity_masks().

No functioanl change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 kernel/irq/affinity.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 45b68b4ea48b..118b66d64a53 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -175,18 +175,22 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
  */
 static int irq_build_affinity_masks(const struct irq_affinity *affd,
 				    int startvec, int numvecs, int firstvec,
-				    cpumask_var_t *node_to_cpumask,
 				    struct irq_affinity_desc *masks)
 {
 	int curvec = startvec, nr_present, nr_others;
 	int ret = -ENOMEM;
 	cpumask_var_t nmsk, npresmsk;
+	cpumask_var_t *node_to_cpumask;
 
 	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
 		return ret;
 
 	if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
-		goto fail;
+		goto fail_nmsk;
+
+	node_to_cpumask = alloc_node_to_cpumask();
+	if (!node_to_cpumask)
+		goto fail_npresmsk;
 
 	ret = 0;
 	/* Stabilize the cpumasks */
@@ -217,9 +221,12 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
 	if (nr_present < numvecs)
 		WARN_ON(nr_present + nr_others < numvecs);
 
+	free_node_to_cpumask(node_to_cpumask);
+
+ fail_npresmsk:
 	free_cpumask_var(npresmsk);
 
- fail:
+ fail_nmsk:
 	free_cpumask_var(nmsk);
 	return ret;
 }
@@ -236,7 +243,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 {
 	int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
 	int curvec, usedvecs;
-	cpumask_var_t *node_to_cpumask;
 	struct irq_affinity_desc *masks = NULL;
 	int i, nr_sets;
 
@@ -247,13 +253,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	if (nvecs == affd->pre_vectors + affd->post_vectors)
 		return NULL;
 
-	node_to_cpumask = alloc_node_to_cpumask();
-	if (!node_to_cpumask)
-		return NULL;
-
 	masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
 	if (!masks)
-		goto outnodemsk;
+		return NULL;
 
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
@@ -271,11 +273,10 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		int ret;
 
 		ret = irq_build_affinity_masks(affd, curvec, this_vecs,
-						curvec, node_to_cpumask, masks);
+						curvec, masks);
 		if (ret) {
 			kfree(masks);
-			masks = NULL;
-			goto outnodemsk;
+			return NULL;
 		}
 		curvec += this_vecs;
 		usedvecs += this_vecs;
@@ -293,8 +294,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	for (i = affd->pre_vectors; i < nvecs - affd->post_vectors; i++)
 		masks[i].is_managed = 1;
 
-outnodemsk:
-	free_node_to_cpumask(node_to_cpumask);
 	return masks;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
  2019-01-25  9:53 ` [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks Ming Lei
@ 2019-01-25  9:53 ` Ming Lei
  2019-02-07 22:21   ` Bjorn Helgaas
  2019-02-10 16:30   ` Thomas Gleixner
  2019-01-25  9:53 ` [PATCH 3/5] genirq/affinity: introduce irq_build_affinity() Ming Lei
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Ming Lei

This patch introduces callback of .setup_affinity into 'struct
irq_affinity', so that:

1) allow drivers to customize the affinity for managed IRQ, for
example, now NVMe has special requirement for read queues & poll
queues

2) 6da4b3ab9a6e9 ("genirq/affinity: Add support for allocating interrupt sets")
makes pci_alloc_irq_vectors_affinity() a bit difficult to use for
allocating interrupt sets: 'max_vecs' is required to same with 'min_vecs'.

With this patch, driver can implement their own .setup_affinity to
customize the affinity, then the above thing can be solved easily.
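
For illustration only (not part of this patch; 'foo' is a made-up driver and
irq_build_affinity() is only added in the next patch), a callback which handles
pre_vectors/post_vectors and sets 'is_managed' could look roughly like:

	static int foo_setup_affinity(const struct irq_affinity *affd,
				      struct irq_affinity_desc *masks,
				      unsigned int nvecs)
	{
		int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
		int i, ret;

		/* pre/post vectors keep a non-managed, catch-all affinity */
		for (i = 0; i < affd->pre_vectors; i++)
			cpumask_copy(&masks[i].mask, cpu_possible_mask);
		for (i = nvecs - affd->post_vectors; i < nvecs; i++)
			cpumask_copy(&masks[i].mask, cpu_possible_mask);

		/* spread the remaining vectors and mark them as managed */
		ret = irq_build_affinity(affd, affd->pre_vectors, affvecs,
					 affd->pre_vectors, masks, nvecs);
		if (ret)
			return ret;

		for (i = affd->pre_vectors; i < nvecs - affd->post_vectors; i++)
			masks[i].is_managed = 1;
		return 0;
	}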

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/interrupt.h | 26 +++++++++++++++++---------
 kernel/irq/affinity.c     |  6 ++++++
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c672f34235e7..f6cea778cf50 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -242,30 +242,38 @@ struct irq_affinity_notify {
 };
 
 /**
+ * struct irq_affinity_desc - Interrupt affinity descriptor
+ * @mask:	cpumask to hold the affinity assignment
+ */
+struct irq_affinity_desc {
+	struct cpumask	mask;
+	unsigned int	is_managed : 1;
+};
+
+/**
  * struct irq_affinity - Description for automatic irq affinity assignements
  * @pre_vectors:	Don't apply affinity to @pre_vectors at beginning of
  *			the MSI(-X) vector space
  * @post_vectors:	Don't apply affinity to @post_vectors at end of
  *			the MSI(-X) vector space
+ * @setup_affinity:	Use driver's method to setup irq vectors affinity,
+ * 			and driver has to handle pre_vectors & post_vectors
+ * 			correctly, set 'is_managed' flag correct too
+ * @priv:		Private data of @setup_affinity
  * @nr_sets:		Length of passed in *sets array
  * @sets:		Number of affinitized sets
  */
 struct irq_affinity {
 	int	pre_vectors;
 	int	post_vectors;
+	int	(*setup_affinity)(const struct irq_affinity *,
+				  struct irq_affinity_desc *,
+				  unsigned int);
+	void    *priv;
 	int	nr_sets;
 	int	*sets;
 };
 
-/**
- * struct irq_affinity_desc - Interrupt affinity descriptor
- * @mask:	cpumask to hold the affinity assignment
- */
-struct irq_affinity_desc {
-	struct cpumask	mask;
-	unsigned int	is_managed : 1;
-};
-
 #if defined(CONFIG_SMP)
 
 extern cpumask_var_t irq_default_affinity;
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 118b66d64a53..7b77cbdf739c 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -257,6 +257,12 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	if (!masks)
 		return NULL;
 
+	if (affd->setup_affinity) {
+		if (affd->setup_affinity(affd, masks, nvecs))
+			return NULL;
+		return masks;
+	}
+
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
 		cpumask_copy(&masks[curvec].mask, irq_default_affinity);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] genirq/affinity: introduce irq_build_affinity()
  2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
  2019-01-25  9:53 ` [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks Ming Lei
  2019-01-25  9:53 ` [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity Ming Lei
@ 2019-01-25  9:53 ` Ming Lei
  2019-01-25  9:53 ` [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback Ming Lei
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Ming Lei

Drivers may use this API to build a customized irq affinity. One example is
NVMe, which needs to build multiple irq sets, with all CPUs spread across each
set.
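
Purely as an illustration (the 'def_vecs' and 'read_vecs' names are made up),
spreading two interrupt sets back to back could look like:

	/* spread the first set, starting right after the pre_vectors */
	ret = irq_build_affinity(affd, affd->pre_vectors, def_vecs,
				 affd->pre_vectors, masks, nmasks);
	if (!ret)
		/* spread the second set across all CPUs again */
		ret = irq_build_affinity(affd, affd->pre_vectors + def_vecs,
					 read_vecs,
					 affd->pre_vectors + def_vecs,
					 masks, nmasks);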

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/interrupt.h | 12 ++++++++++++
 kernel/irq/affinity.c     | 27 +++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index f6cea778cf50..b820b07f3b55 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -323,6 +323,10 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
 struct irq_affinity_desc *
 irq_create_affinity_masks(int nvec, const struct irq_affinity *affd);
 
+int irq_build_affinity(const struct irq_affinity *affd, int startvec,
+                      int numvecs, int firstvec,
+                      struct irq_affinity_desc *masks, int nmasks);
+
 int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity *affd);
 
 #else /* CONFIG_SMP */
@@ -368,6 +372,14 @@ irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity *aff
 	return maxvec;
 }
 
+static inline int
+irq_build_affinity(const struct irq_affinity *affd, int startvec,
+		   int numvecs, int firstvec,
+		   struct irq_affinity_desc *masks, int nmasks)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SMP */
 
 /*
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 7b77cbdf739c..524fdcda9f85 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -232,6 +232,33 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
 }
 
 /**
+ * irq_build_affinity - build affinity masks for multiqueue spreading
+ * @affd:	Description of the affinity requirements
+ * @startvec:	The start vector for building affinity masks
+ * @numvecs:	The number of vectors needed for building affinity masks
+ * @firstvec:	The IRQ vector to jump back to for continuing the spread
+ * 		after the last vector (@startvec + @numvecs - 1) is built.
+ * @masks:	The mask array for storing the affinity masks
+ * @nmasks:	The total number of @masks
+ *
+ * Both @startvec and @firstvec are relative to the 1st irq vector
+ * allocated to the device.
+ *
+ * Returns 0 if the affinity masks are built successfully.
+ */
+int irq_build_affinity(const struct irq_affinity *affd, int startvec,
+		       int numvecs, int firstvec,
+		       struct irq_affinity_desc *masks, int nmasks)
+{
+	if (startvec >= nmasks || firstvec >= nmasks || numvecs > nmasks)
+		return -EINVAL;
+
+	return irq_build_affinity_masks(affd, startvec, numvecs, firstvec,
+					masks);
+}
+EXPORT_SYMBOL_GPL(irq_build_affinity);
+
+/**
  * irq_create_affinity_masks - Create affinity masks for multiqueue spreading
  * @nvecs:	The total number of vectors
  * @affd:	Description of the affinity requirements
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
  2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
                   ` (2 preceding siblings ...)
  2019-01-25  9:53 ` [PATCH 3/5] genirq/affinity: introduce irq_build_affinity() Ming Lei
@ 2019-01-25  9:53 ` Ming Lei
  2019-02-10 16:39   ` Thomas Gleixner
  2019-02-10 18:49   ` Thomas Gleixner
  2019-01-25  9:53 ` [PATCH 5/5] genirq/affinity: remove support for allocating interrupt sets Ming Lei
  2019-01-25  9:56 ` [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support " Ming Lei
  5 siblings, 2 replies; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Ming Lei

Use the .setup_affinity() callback to re-calculate the number of queues,
and build the irq affinity with the help of irq_build_affinity().

Then nvme_setup_irqs() gets simplified a lot.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 97 ++++++++++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9bc585415d9b..24496de0a29b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2078,17 +2078,58 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int irq_queues)
 	}
 }
 
+static int nvme_setup_affinity(const struct irq_affinity *affd,
+			       struct irq_affinity_desc *masks,
+			       unsigned int nmasks)
+{
+	struct nvme_dev *dev = affd->priv;
+	int affvecs = nmasks - affd->pre_vectors - affd->post_vectors;
+	int curvec, usedvecs;
+	int i;
+
+	nvme_calc_io_queues(dev, nmasks);
+
+	/* Fill out vectors at the beginning that don't need affinity */
+	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
+		cpumask_copy(&masks[curvec].mask, cpu_possible_mask);
+
+	for (i = 0, usedvecs = 0; i < HCTX_TYPE_POLL; i++) {
+		int this_vecs = dev->io_queues[i];
+		int ret;
+
+		if (!this_vecs)
+			break;
+
+		ret = irq_build_affinity(affd, curvec, this_vecs, curvec,
+					 masks, nmasks);
+		if (ret)
+			return ret;
+
+		curvec += this_vecs;
+		usedvecs += this_vecs;
+	}
+
+	/* Fill out vectors at the end that don't need affinity */
+	curvec = affd->pre_vectors + min(usedvecs, affvecs);
+	for (; curvec < nmasks; curvec++)
+		cpumask_copy(&masks[curvec].mask, cpu_possible_mask);
+
+	/* Mark the managed interrupts */
+	for (i = affd->pre_vectors; i < nmasks - affd->post_vectors; i++)
+		masks[i].is_managed = 1;
+
+	return 0;
+}
+
 static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
-	int irq_sets[2];
 	struct irq_affinity affd = {
 		.pre_vectors = 1,
-		.nr_sets = ARRAY_SIZE(irq_sets),
-		.sets = irq_sets,
+		.setup_affinity = nvme_setup_affinity,
+		.priv = dev,
 	};
-	int result = 0;
-	unsigned int irq_queues, this_p_queues;
+	int result, irq_queues, this_p_queues;
 
 	/*
 	 * Poll queues don't need interrupts, but we need at least one IO
@@ -2103,50 +2144,8 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 	}
 	dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
 
-	/*
-	 * For irq sets, we have to ask for minvec == maxvec. This passes
-	 * any reduction back to us, so we can adjust our queue counts and
-	 * IRQ vector needs.
-	 */
-	do {
-		nvme_calc_io_queues(dev, irq_queues);
-		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
-		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];
-		if (!irq_sets[1])
-			affd.nr_sets = 1;
-
-		/*
-		 * If we got a failure and we're down to asking for just
-		 * 1 + 1 queues, just ask for a single vector. We'll share
-		 * that between the single IO queue and the admin queue.
-		 * Otherwise, we assign one independent vector to admin queue.
-		 */
-		if (irq_queues > 1)
-			irq_queues = irq_sets[0] + irq_sets[1] + 1;
-
-		result = pci_alloc_irq_vectors_affinity(pdev, irq_queues,
-				irq_queues,
-				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
-
-		/*
-		 * Need to reduce our vec counts. If we get ENOSPC, the
-		 * platform should support mulitple vecs, we just need
-		 * to decrease our ask. If we get EINVAL, the platform
-		 * likely does not. Back down to ask for just one vector.
-		 */
-		if (result == -ENOSPC) {
-			irq_queues--;
-			if (!irq_queues)
-				return result;
-			continue;
-		} else if (result == -EINVAL) {
-			irq_queues = 1;
-			continue;
-		} else if (result <= 0)
-			return -EIO;
-		break;
-	} while (1);
-
+	result = pci_alloc_irq_vectors_affinity(pdev, 1, irq_queues,
+			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
 	return result;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] genirq/affinity: remove support for allocating interrupt sets
  2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
                   ` (3 preceding siblings ...)
  2019-01-25  9:53 ` [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback Ming Lei
@ 2019-01-25  9:53 ` Ming Lei
  2019-02-07 22:22   ` Bjorn Helgaas
  2019-01-25  9:56 ` [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support " Ming Lei
  5 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Ming Lei

Now allocating interrupt sets can be done via .setup_affinity()
easily, so remove the support for allocating interrupt sets.

With this change, we don't need the limit of 'minvec == maxvec'
any more in pci_alloc_irq_vectors_affinity().

Meanwhile, irq_create_affinity_masks() gets simplified a lot.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/pci/msi.c         | 14 -------------
 include/linux/interrupt.h |  4 ----
 kernel/irq/affinity.c     | 52 +++++++++++------------------------------------
 3 files changed, 12 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 4c0b47867258..331483de1294 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1035,13 +1035,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msi_enabled))
 		return -EINVAL;
 
@@ -1093,13 +1086,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * supported vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msix_enabled))
 		return -EINVAL;
 
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index b820b07f3b55..a035e165f405 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -260,8 +260,6 @@ struct irq_affinity_desc {
  * 			and driver has to handle pre_vectors & post_vectors
  * 			correctly, set 'is_managed' flag correct too
  * @priv:		Private data of @setup_affinity
- * @nr_sets:		Length of passed in *sets array
- * @sets:		Number of affinitized sets
  */
 struct irq_affinity {
 	int	pre_vectors;
@@ -270,8 +268,6 @@ struct irq_affinity {
 				  struct irq_affinity_desc *,
 				  unsigned int);
 	void    *priv;
-	int	nr_sets;
-	int	*sets;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 524fdcda9f85..e8fea65325d9 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -269,9 +269,9 @@ struct irq_affinity_desc *
 irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 {
 	int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
-	int curvec, usedvecs;
+	int curvec;
 	struct irq_affinity_desc *masks = NULL;
-	int i, nr_sets;
+	int i;
 
 	/*
 	 * If there aren't any vectors left after applying the pre/post
@@ -293,34 +293,14 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
 		cpumask_copy(&masks[curvec].mask, irq_default_affinity);
-	/*
-	 * Spread on present CPUs starting from affd->pre_vectors. If we
-	 * have multiple sets, build each sets affinity mask separately.
-	 */
-	nr_sets = affd->nr_sets;
-	if (!nr_sets)
-		nr_sets = 1;
-
-	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
-		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
-		int ret;
-
-		ret = irq_build_affinity_masks(affd, curvec, this_vecs,
-						curvec, masks);
-		if (ret) {
-			kfree(masks);
-			return NULL;
-		}
-		curvec += this_vecs;
-		usedvecs += this_vecs;
+
+	if (irq_build_affinity_masks(affd, curvec, affvecs, curvec, masks)) {
+		kfree(masks);
+		return NULL;
 	}
 
 	/* Fill out vectors at the end that don't need affinity */
-	if (usedvecs >= affvecs)
-		curvec = affd->pre_vectors + affvecs;
-	else
-		curvec = affd->pre_vectors + usedvecs;
-	for (; curvec < nvecs; curvec++)
+	for (curvec = affd->pre_vectors + affvecs; curvec < nvecs; curvec++)
 		cpumask_copy(&masks[curvec].mask, irq_default_affinity);
 
 	/* Mark the managed interrupts */
@@ -340,21 +320,13 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 {
 	int resv = affd->pre_vectors + affd->post_vectors;
 	int vecs = maxvec - resv;
-	int set_vecs;
+	int ret;
 
 	if (resv > minvec)
 		return 0;
 
-	if (affd->nr_sets) {
-		int i;
-
-		for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
-			set_vecs += affd->sets[i];
-	} else {
-		get_online_cpus();
-		set_vecs = cpumask_weight(cpu_possible_mask);
-		put_online_cpus();
-	}
-
-	return resv + min(set_vecs, vecs);
+	get_online_cpus();
+	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
+	put_online_cpus();
+	return ret;
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets
  2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
                   ` (4 preceding siblings ...)
  2019-01-25  9:53 ` [PATCH 5/5] genirq/affinity: remove support for allocating interrupt sets Ming Lei
@ 2019-01-25  9:56 ` Ming Lei
  5 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-01-25  9:56 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner
  Cc: Jens Axboe, linux-block, Sagi Grimberg, linux-nvme, linux-kernel,
	linux-pci, Keith Busch

On Fri, Jan 25, 2019 at 05:53:42PM +0800, Ming Lei wrote:
> Hi,
> 
> The current support for allocating interrupt sets requires that same 'max_vec'
> and 'min_vec' is passed to pci_alloc_irq_vectors_affinity(), then driver has to
> try to allocate again and again until it succeeds.
> 
> This patch introduces .setup_affinity callback, and we can use it to
> re-caculate interrupt sets and build affinity for each set after
> irq vectors are allocated.
> 
> Turns out both genirq/affinity and nvme code get simplified a lot.
> 
> Please review and comment!
> 
> Ming Lei (5):
>   genirq/affinity: move allocation of 'node_to_cpumask' to
>     irq_build_affinity_masks
>   genirq/affinity: allow driver to setup managed IRQ's affinity
>   genirq/affinity: introduce irq_build_affinity()
>   nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
>   genirq/affinity: remove support for allocating interrupt sets
> 
>  drivers/nvme/host/pci.c   |  97 +++++++++++++++++++++--------------------
>  drivers/pci/msi.c         |  14 ------
>  include/linux/interrupt.h |  42 ++++++++++++------
>  kernel/irq/affinity.c     | 108 ++++++++++++++++++++++++----------------------
>  4 files changed, 133 insertions(+), 128 deletions(-)

oops, forgot to CC Keith.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks
  2019-01-25  9:53 ` [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks Ming Lei
@ 2019-02-07 22:02   ` Bjorn Helgaas
  2019-02-10 19:01   ` [tip:irq/core] genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks() tip-bot for Ming Lei
  1 sibling, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2019-02-07 22:02 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Thomas Gleixner, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Fri, Jan 25, 2019 at 05:53:43PM +0800, Ming Lei wrote:
> 'node_to_cpumask' is just one temparay variable for irq_build_affinity_masks(),
> so move it into irq_build_affinity_masks().
> 
> No functioanl change.

s/temparay/temporary/
s/functioanl/functional/

> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Nice patch, this is much cleaner.

Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  kernel/irq/affinity.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 45b68b4ea48b..118b66d64a53 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -175,18 +175,22 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
>   */
>  static int irq_build_affinity_masks(const struct irq_affinity *affd,
>  				    int startvec, int numvecs, int firstvec,
> -				    cpumask_var_t *node_to_cpumask,
>  				    struct irq_affinity_desc *masks)
>  {
>  	int curvec = startvec, nr_present, nr_others;
>  	int ret = -ENOMEM;
>  	cpumask_var_t nmsk, npresmsk;
> +	cpumask_var_t *node_to_cpumask;
>  
>  	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
>  		return ret;
>  
>  	if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
> -		goto fail;
> +		goto fail_nmsk;
> +
> +	node_to_cpumask = alloc_node_to_cpumask();
> +	if (!node_to_cpumask)
> +		goto fail_npresmsk;
>  
>  	ret = 0;
>  	/* Stabilize the cpumasks */
> @@ -217,9 +221,12 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
>  	if (nr_present < numvecs)
>  		WARN_ON(nr_present + nr_others < numvecs);
>  
> +	free_node_to_cpumask(node_to_cpumask);
> +
> + fail_npresmsk:
>  	free_cpumask_var(npresmsk);
>  
> - fail:
> + fail_nmsk:
>  	free_cpumask_var(nmsk);
>  	return ret;
>  }
> @@ -236,7 +243,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  {
>  	int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
>  	int curvec, usedvecs;
> -	cpumask_var_t *node_to_cpumask;
>  	struct irq_affinity_desc *masks = NULL;
>  	int i, nr_sets;
>  
> @@ -247,13 +253,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (nvecs == affd->pre_vectors + affd->post_vectors)
>  		return NULL;
>  
> -	node_to_cpumask = alloc_node_to_cpumask();
> -	if (!node_to_cpumask)
> -		return NULL;
> -
>  	masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
>  	if (!masks)
> -		goto outnodemsk;
> +		return NULL;
>  
>  	/* Fill out vectors at the beginning that don't need affinity */
>  	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
> @@ -271,11 +273,10 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  		int ret;
>  
>  		ret = irq_build_affinity_masks(affd, curvec, this_vecs,
> -						curvec, node_to_cpumask, masks);
> +						curvec, masks);
>  		if (ret) {
>  			kfree(masks);
> -			masks = NULL;
> -			goto outnodemsk;
> +			return NULL;
>  		}
>  		curvec += this_vecs;
>  		usedvecs += this_vecs;
> @@ -293,8 +294,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	for (i = affd->pre_vectors; i < nvecs - affd->post_vectors; i++)
>  		masks[i].is_managed = 1;
>  
> -outnodemsk:
> -	free_node_to_cpumask(node_to_cpumask);
>  	return masks;
>  }
>  
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-01-25  9:53 ` [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity Ming Lei
@ 2019-02-07 22:21   ` Bjorn Helgaas
  2019-02-10  9:22     ` Ming Lei
  2019-02-10 16:30   ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2019-02-07 22:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Thomas Gleixner, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Fri, Jan 25, 2019 at 05:53:44PM +0800, Ming Lei wrote:
> This patch introduces callback of .setup_affinity into 'struct
> irq_affinity', so that:
> 
> 1) allow drivers to customize the affinity for managed IRQ, for
> example, now NVMe has special requirement for read queues & poll
> queues
> 
> 2) 6da4b3ab9a6e9 ("genirq/affinity: Add support for allocating interrupt sets")
> makes pci_alloc_irq_vectors_affinity() a bit difficult to use for
> allocating interrupt sets: 'max_vecs' is required to same with 'min_vecs'.

s/is required to same with/is required to be equal to/

> With this patch, driver can implement their own .setup_affinity to
> customize the affinity, then the above thing can be solved easily.

s/driver/drivers/

> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  include/linux/interrupt.h | 26 +++++++++++++++++---------
>  kernel/irq/affinity.c     |  6 ++++++
>  2 files changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index c672f34235e7..f6cea778cf50 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -242,30 +242,38 @@ struct irq_affinity_notify {
>  };
>  
>  /**
> + * struct irq_affinity_desc - Interrupt affinity descriptor
> + * @mask:	cpumask to hold the affinity assignment
> + */
> +struct irq_affinity_desc {
> +	struct cpumask	mask;
> +	unsigned int	is_managed : 1;
> +};

I was going to comment that "managed" already has a common usage
related to the devm_*() functions, but I don't think that's what you
mean here.  But then I noticed that you're only *moving* this code,
so you couldn't change it anyway.

But I still wonder what "managed" means here.

> +
> +/**
>   * struct irq_affinity - Description for automatic irq affinity assignements
>   * @pre_vectors:	Don't apply affinity to @pre_vectors at beginning of
>   *			the MSI(-X) vector space
>   * @post_vectors:	Don't apply affinity to @post_vectors at end of
>   *			the MSI(-X) vector space
> + * @setup_affinity:	Use driver's method to setup irq vectors affinity,
> + * 			and driver has to handle pre_vectors & post_vectors
> + * 			correctly, set 'is_managed' flag correct too

s/irq vectors/irq vector/
s/correct/correctly/

In general I don't think "correctly" is very useful in changelogs
and comments.  Usually it just means "how *I* think it should be
done", but it doesn't tell anybody else exactly *how* it should be
done.

What does it mean for a driver to handle pre_vectors & post_vectors
"correctly"?  The driver's .setup_affinity() method receives an array
of struct irq_affinity; maybe it means that method should set the
cpumask for each element as it desires.  For @pre_vectors and
@post_vectors, I suppose that means their cpumask would be
irq_default_affinity?

But I guess the .setup_affinity() method means the driver would have
complete flexibility for each vector, and it could use
irq_default_affinity for arbitrary vectors, not just the first few
(@pre_vectors) and the last few (@post_vectors)?

What's the rule for how a driver sets "is_managed"?

> + * @priv:		Private data of @setup_affinity
>   * @nr_sets:		Length of passed in *sets array
>   * @sets:		Number of affinitized sets
>   */
>  struct irq_affinity {
>  	int	pre_vectors;
>  	int	post_vectors;
> +	int	(*setup_affinity)(const struct irq_affinity *,
> +				  struct irq_affinity_desc *,
> +				  unsigned int);
> +	void    *priv;
>  	int	nr_sets;
>  	int	*sets;
>  };
>  
> -/**
> - * struct irq_affinity_desc - Interrupt affinity descriptor
> - * @mask:	cpumask to hold the affinity assignment
> - */
> -struct irq_affinity_desc {
> -	struct cpumask	mask;
> -	unsigned int	is_managed : 1;
> -};
> -
>  #if defined(CONFIG_SMP)
>  
>  extern cpumask_var_t irq_default_affinity;
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 118b66d64a53..7b77cbdf739c 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -257,6 +257,12 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (!masks)
>  		return NULL;
>  
> +	if (affd->setup_affinity) {
> +		if (affd->setup_affinity(affd, masks, nvecs))
> +			return NULL;
> +		return masks;
> +	}

>  	/* Fill out vectors at the beginning that don't need affinity */
>  	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
>  		cpumask_copy(&masks[curvec].mask, irq_default_affinity);
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] genirq/affinity: remove support for allocating interrupt sets
  2019-01-25  9:53 ` [PATCH 5/5] genirq/affinity: remove support for allocating interrupt sets Ming Lei
@ 2019-02-07 22:22   ` Bjorn Helgaas
  0 siblings, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2019-02-07 22:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Thomas Gleixner, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Fri, Jan 25, 2019 at 05:53:47PM +0800, Ming Lei wrote:
> Now allocating interrupt sets can be done via .setup_affinity()
> easily, so remove the support for allocating interrupt sets.
> 
> With this change, we don't need the limit of 'minvec == maxvec'
> any more in pci_alloc_irq_vectors_affinity().
> 
> Meantime irq_create_affinity_masks() gets simplified a lot.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# pci/msi.c parts

> ---
>  drivers/pci/msi.c         | 14 -------------
>  include/linux/interrupt.h |  4 ----
>  kernel/irq/affinity.c     | 52 +++++++++++------------------------------------
>  3 files changed, 12 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 4c0b47867258..331483de1294 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1035,13 +1035,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  	if (maxvec < minvec)
>  		return -ERANGE;
>  
> -	/*
> -	 * If the caller is passing in sets, we can't support a range of
> -	 * vectors. The caller needs to handle that.
> -	 */
> -	if (affd && affd->nr_sets && minvec != maxvec)
> -		return -EINVAL;
> -
>  	if (WARN_ON_ONCE(dev->msi_enabled))
>  		return -EINVAL;
>  
> @@ -1093,13 +1086,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
>  	if (maxvec < minvec)
>  		return -ERANGE;
>  
> -	/*
> -	 * If the caller is passing in sets, we can't support a range of
> -	 * supported vectors. The caller needs to handle that.
> -	 */
> -	if (affd && affd->nr_sets && minvec != maxvec)
> -		return -EINVAL;
> -
>  	if (WARN_ON_ONCE(dev->msix_enabled))
>  		return -EINVAL;
>  
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index b820b07f3b55..a035e165f405 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -260,8 +260,6 @@ struct irq_affinity_desc {
>   * 			and driver has to handle pre_vectors & post_vectors
>   * 			correctly, set 'is_managed' flag correct too
>   * @priv:		Private data of @setup_affinity
> - * @nr_sets:		Length of passed in *sets array
> - * @sets:		Number of affinitized sets
>   */
>  struct irq_affinity {
>  	int	pre_vectors;
> @@ -270,8 +268,6 @@ struct irq_affinity {
>  				  struct irq_affinity_desc *,
>  				  unsigned int);
>  	void    *priv;
> -	int	nr_sets;
> -	int	*sets;
>  };
>  
>  #if defined(CONFIG_SMP)
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 524fdcda9f85..e8fea65325d9 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -269,9 +269,9 @@ struct irq_affinity_desc *
>  irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  {
>  	int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
> -	int curvec, usedvecs;
> +	int curvec;
>  	struct irq_affinity_desc *masks = NULL;
> -	int i, nr_sets;
> +	int i;
>  
>  	/*
>  	 * If there aren't any vectors left after applying the pre/post
> @@ -293,34 +293,14 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	/* Fill out vectors at the beginning that don't need affinity */
>  	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
>  		cpumask_copy(&masks[curvec].mask, irq_default_affinity);
> -	/*
> -	 * Spread on present CPUs starting from affd->pre_vectors. If we
> -	 * have multiple sets, build each sets affinity mask separately.
> -	 */
> -	nr_sets = affd->nr_sets;
> -	if (!nr_sets)
> -		nr_sets = 1;
> -
> -	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
> -		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
> -		int ret;
> -
> -		ret = irq_build_affinity_masks(affd, curvec, this_vecs,
> -						curvec, masks);
> -		if (ret) {
> -			kfree(masks);
> -			return NULL;
> -		}
> -		curvec += this_vecs;
> -		usedvecs += this_vecs;
> +
> +	if (irq_build_affinity_masks(affd, curvec, affvecs, curvec, masks)) {
> +		kfree(masks);
> +		return NULL;
>  	}
>  
>  	/* Fill out vectors at the end that don't need affinity */
> -	if (usedvecs >= affvecs)
> -		curvec = affd->pre_vectors + affvecs;
> -	else
> -		curvec = affd->pre_vectors + usedvecs;
> -	for (; curvec < nvecs; curvec++)
> +	for (curvec = affd->pre_vectors + affvecs; curvec < nvecs; curvec++)
>  		cpumask_copy(&masks[curvec].mask, irq_default_affinity);
>  
>  	/* Mark the managed interrupts */
> @@ -340,21 +320,13 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  {
>  	int resv = affd->pre_vectors + affd->post_vectors;
>  	int vecs = maxvec - resv;
> -	int set_vecs;
> +	int ret;
>  
>  	if (resv > minvec)
>  		return 0;
>  
> -	if (affd->nr_sets) {
> -		int i;
> -
> -		for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
> -			set_vecs += affd->sets[i];
> -	} else {
> -		get_online_cpus();
> -		set_vecs = cpumask_weight(cpu_possible_mask);
> -		put_online_cpus();
> -	}
> -
> -	return resv + min(set_vecs, vecs);
> +	get_online_cpus();
> +	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
> +	put_online_cpus();
> +	return ret;
>  }
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-02-07 22:21   ` Bjorn Helgaas
@ 2019-02-10  9:22     ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-02-10  9:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, Thomas Gleixner, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Thu, Feb 07, 2019 at 04:21:30PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 25, 2019 at 05:53:44PM +0800, Ming Lei wrote:
> > This patch introduces callback of .setup_affinity into 'struct
> > irq_affinity', so that:
> > 
> > 1) allow drivers to customize the affinity for managed IRQ, for
> > example, now NVMe has special requirement for read queues & poll
> > queues
> > 
> > 2) 6da4b3ab9a6e9 ("genirq/affinity: Add support for allocating interrupt sets")
> > makes pci_alloc_irq_vectors_affinity() a bit difficult to use for
> > allocating interrupt sets: 'max_vecs' is required to same with 'min_vecs'.
> 
> s/is required to same with/is required to be equal to/
> 
> > With this patch, driver can implement their own .setup_affinity to
> > customize the affinity, then the above thing can be solved easily.
> 
> s/driver/drivers/
> 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  include/linux/interrupt.h | 26 +++++++++++++++++---------
> >  kernel/irq/affinity.c     |  6 ++++++
> >  2 files changed, 23 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index c672f34235e7..f6cea778cf50 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -242,30 +242,38 @@ struct irq_affinity_notify {
> >  };
> >  
> >  /**
> > + * struct irq_affinity_desc - Interrupt affinity descriptor
> > + * @mask:	cpumask to hold the affinity assignment
> > + */
> > +struct irq_affinity_desc {
> > +	struct cpumask	mask;
> > +	unsigned int	is_managed : 1;
> > +};
> 
> I was going to comment that "managed" already has a common usage
> related to the devm_*() functions, but I don't think that's what you
> mean here.  But then I noticed that you're only *moving* this code,
> so you couldn't change it anyway.
> 
> But I still wonder what "managed" means here.

A 'managed' irq's affinity is managed by the kernel, and user space can't
change it any more via /proc/irq/...

> 
> > +
> > +/**
> >   * struct irq_affinity - Description for automatic irq affinity assignements
> >   * @pre_vectors:	Don't apply affinity to @pre_vectors at beginning of
> >   *			the MSI(-X) vector space
> >   * @post_vectors:	Don't apply affinity to @post_vectors at end of
> >   *			the MSI(-X) vector space
> > + * @setup_affinity:	Use driver's method to setup irq vectors affinity,
> > + * 			and driver has to handle pre_vectors & post_vectors
> > + * 			correctly, set 'is_managed' flag correct too
> 
> s/irq vectors/irq vector/
> s/correct/correctly/
> 
> In general I don't think "correctly" is very useful in changelogs
> and comments.  Usually it just means "how *I* think it should be
> done", but it doesn't tell anybody else exactly *how* it should be
> done.

That is a good question.

So far there are at least two rules related to correctness:

1) the 'is_managed' flag needs to be set

2) for all managed irq vectors in one interrupt set of one device,
all possible CPUs should be covered by the assigned affinities, and
one CPU can't be assigned to two irq vectors. The interrupt set is
actually unique to NVMe right now: all READ queues' interrupts belong
to one set, and the other queues belong to another set. For any other
device, we can assume there is only one set per device. (A rough
sketch of checking rule 2) is shown below.)
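
Purely illustrative (not from this series): assuming one interrupt set
occupies masks[first] ... masks[first + nr - 1], rule 2) could be checked
roughly like this:

	cpumask_var_t seen;
	int i;

	if (!zalloc_cpumask_var(&seen, GFP_KERNEL))
		return -ENOMEM;

	for (i = first; i < first + nr; i++) {
		/* no CPU may be assigned to two vectors of the same set */
		WARN_ON(cpumask_intersects(seen, &masks[i].mask));
		cpumask_or(seen, seen, &masks[i].mask);
	}

	/* every possible CPU has to be covered by the set */
	WARN_ON(!cpumask_equal(seen, cpu_possible_mask));
	free_cpumask_var(seen);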

> 
> What does it mean for a driver to handle pre_vectors & post_vectors
> "correctly"?  The driver's .setup_affinity() method receives an array
> of struct irq_affinity; maybe it means that method should set the
> cpumask for each element as it desires.  For @pre_vectors and
> @post_vectors, I suppose that means their cpumask would be
> irq_default_affinity?

So far the default affinity for pre_vectors & post_vectors is
irq_default_affinity, and this patch changes it to cpu_possible_mask;
this change shouldn't be a big deal from the driver's point of view.

> 
> But I guess the .setup_affinity() method means the driver would have
> complete flexibility for each vector, and it could use

Yeah, together with simplification in both driver and genirq/affinity
code.

> irq_default_affinity for arbitrary vectors, not just the first few
> (@pre_vectors) and the last few (@post_vectors)?
> 
> What's the rule for how a driver sets "is_managed"?

Please see above; I plan to enhance the rules in the genirq/affinity code
if this approach is accepted.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-01-25  9:53 ` [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity Ming Lei
  2019-02-07 22:21   ` Bjorn Helgaas
@ 2019-02-10 16:30   ` Thomas Gleixner
  2019-02-11  3:54     ` Ming Lei
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-02-10 16:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Bjorn Helgaas, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

Ming,

On Fri, 25 Jan 2019, Ming Lei wrote:

> This patch introduces callback of .setup_affinity into 'struct
> irq_affinity', so that:

Please see Documentation/process/submitting-patches.rst. Search for 'This
patch' ....

> 
> 1) allow drivers to customize the affinity for managed IRQ, for
> example, now NVMe has special requirement for read queues & poll
> queues

That's nothing new and already handled today.

> 2) 6da4b3ab9a6e9 ("genirq/affinity: Add support for allocating interrupt sets")
> makes pci_alloc_irq_vectors_affinity() a bit difficult to use for
> allocating interrupt sets: 'max_vecs' is required to same with 'min_vecs'.

So it's a bit difficult, but you fail to explain why it's not sufficient.

> With this patch, driver can implement their own .setup_affinity to
> customize the affinity, then the above thing can be solved easily.

Well, I don't really understand what is solved easily and you are merely
describing the fact that the new callback allows drivers to customize
something. What's the rationale? If it's just the 'bit difficult' part,
then what is the reason for not making the core functionality easier to use
instead of moving stuff into driver space again?

NVME is not special and all this achieves is that all driver writers will
claim that their device is special and needs its own affinity setter
routine. The whole point of having the generic code is to exactly avoid
that. If it has shortcomings, then they need to be addressed, but not
worked around with random driver callbacks.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
  2019-01-25  9:53 ` [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback Ming Lei
@ 2019-02-10 16:39   ` Thomas Gleixner
  2019-02-11  3:58     ` Ming Lei
  2019-02-10 18:49   ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-02-10 16:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Bjorn Helgaas, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Fri, 25 Jan 2019, Ming Lei wrote:

> Use the callback of .setup_affinity() to re-caculate number
> of queues, and build irqs affinity with help of irq_build_affinity().
> 
> Then nvme_setup_irqs() gets simplified a lot.

I'm pretty sure you can achieve the same by reworking the core code without
that callback.
 
> +	/* Fill out vectors at the beginning that don't need affinity */
> +	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
> +		cpumask_copy(&masks[curvec].mask, cpu_possible_mask);

cpu_possible_mask is wrong.  Why are you deliberately trying to make this
special? There is absolutely no reason to do so.

These interrupts are not managed and therefore the initial affinity has to
be irq_default_affinity. Setting them to cpu_possible_mask can and probably
will evade a well thought out default affinity mask, which was set to
isolate a set of cores from general purpose interrupts.

This is exactly the thing which happens with driver special stuff and which
needs to be avoided. There is nothing special about this NVME setup and
yes, I can see why the current core code is a bit tedious to work with, but
that does not justify that extra driver magic by any means.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
  2019-01-25  9:53 ` [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback Ming Lei
  2019-02-10 16:39   ` Thomas Gleixner
@ 2019-02-10 18:49   ` Thomas Gleixner
  2019-02-11  4:09     ` Ming Lei
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-02-10 18:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Bjorn Helgaas, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Fri, 25 Jan 2019, Ming Lei wrote:
> +static int nvme_setup_affinity(const struct irq_affinity *affd,
> +			       struct irq_affinity_desc *masks,
> +			       unsigned int nmasks)
> +{
> +	struct nvme_dev *dev = affd->priv;
> +	int affvecs = nmasks - affd->pre_vectors - affd->post_vectors;
> +	int curvec, usedvecs;
> +	int i;
> +
> +	nvme_calc_io_queues(dev, nmasks);

So this is the only NVME specific information. Everything else can be done
in generic code. So what you really want is:

	struct affd {
		...
+		void (*calc_sets)(struct affd *affd, unsigned int nvecs);
		...
	}

And sets want to be actually inside of the affinity descriptor structure:

        unsigned int num_sets;
	unsigned int set_vectors[MAX_SETS];

We surely can define a sensible maximum of sets for now. If that ever turns
out to be insufficient, then struct affd might become too large for the
stack, but for now, using e.g. 8, there is no need to worry about that.

So then the logic in the generic code becomes exactly the same as what you
added to nvme_setup_affinity():

	if (affd->calc_sets) {
		affd->calc_sets(affd, nvecs);
	} else if (!affd->num_sets) {
		affd->num_sets = 1;
		affd->set_vectors[0] = affvecs;
	}

	for (i = 0; i < affd->num_sets; i++) {
	    ....
	}

See?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:irq/core] genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks()
  2019-01-25  9:53 ` [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks Ming Lei
  2019-02-07 22:02   ` Bjorn Helgaas
@ 2019-02-10 19:01   ` tip-bot for Ming Lei
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot for Ming Lei @ 2019-02-10 19:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ming.lei, mingo, tglx, linux-kernel, bhelgaas, hch, sagi, hpa, axboe

Commit-ID:  347253c42d7c673aa2a659d756bc7ff893459247
Gitweb:     https://git.kernel.org/tip/347253c42d7c673aa2a659d756bc7ff893459247
Author:     Ming Lei <ming.lei@redhat.com>
AuthorDate: Fri, 25 Jan 2019 17:53:43 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 10 Feb 2019 19:53:55 +0100

genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks()

'node_to_cpumask' is just one temparay variable for irq_build_affinity_masks(),
so move it into irq_build_affinity_masks().

No functioanl change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Link: https://lkml.kernel.org/r/20190125095347.17950-2-ming.lei@redhat.com

---
 kernel/irq/affinity.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 45b68b4ea48b..118b66d64a53 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -175,18 +175,22 @@ out:
  */
 static int irq_build_affinity_masks(const struct irq_affinity *affd,
 				    int startvec, int numvecs, int firstvec,
-				    cpumask_var_t *node_to_cpumask,
 				    struct irq_affinity_desc *masks)
 {
 	int curvec = startvec, nr_present, nr_others;
 	int ret = -ENOMEM;
 	cpumask_var_t nmsk, npresmsk;
+	cpumask_var_t *node_to_cpumask;
 
 	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
 		return ret;
 
 	if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
-		goto fail;
+		goto fail_nmsk;
+
+	node_to_cpumask = alloc_node_to_cpumask();
+	if (!node_to_cpumask)
+		goto fail_npresmsk;
 
 	ret = 0;
 	/* Stabilize the cpumasks */
@@ -217,9 +221,12 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
 	if (nr_present < numvecs)
 		WARN_ON(nr_present + nr_others < numvecs);
 
+	free_node_to_cpumask(node_to_cpumask);
+
+ fail_npresmsk:
 	free_cpumask_var(npresmsk);
 
- fail:
+ fail_nmsk:
 	free_cpumask_var(nmsk);
 	return ret;
 }
@@ -236,7 +243,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 {
 	int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
 	int curvec, usedvecs;
-	cpumask_var_t *node_to_cpumask;
 	struct irq_affinity_desc *masks = NULL;
 	int i, nr_sets;
 
@@ -247,13 +253,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	if (nvecs == affd->pre_vectors + affd->post_vectors)
 		return NULL;
 
-	node_to_cpumask = alloc_node_to_cpumask();
-	if (!node_to_cpumask)
-		return NULL;
-
 	masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
 	if (!masks)
-		goto outnodemsk;
+		return NULL;
 
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
@@ -271,11 +273,10 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		int ret;
 
 		ret = irq_build_affinity_masks(affd, curvec, this_vecs,
-						curvec, node_to_cpumask, masks);
+						curvec, masks);
 		if (ret) {
 			kfree(masks);
-			masks = NULL;
-			goto outnodemsk;
+			return NULL;
 		}
 		curvec += this_vecs;
 		usedvecs += this_vecs;
@@ -293,8 +294,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	for (i = affd->pre_vectors; i < nvecs - affd->post_vectors; i++)
 		masks[i].is_managed = 1;
 
-outnodemsk:
-	free_node_to_cpumask(node_to_cpumask);
 	return masks;
 }
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-02-10 16:30   ` Thomas Gleixner
@ 2019-02-11  3:54     ` Ming Lei
  2019-02-11 14:39       ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2019-02-11  3:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, Bjorn Helgaas, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

Hello Thomas,

On Sun, Feb 10, 2019 at 05:30:41PM +0100, Thomas Gleixner wrote:
> Ming,
> 
> On Fri, 25 Jan 2019, Ming Lei wrote:
> 
> > This patch introduces callback of .setup_affinity into 'struct
> > irq_affinity', so that:
> 
> Please see Documentation/process/submitting-patches.rst. Search for 'This
> patch' ....

Sorry about that; I am not a native English speaker and it is a bit
difficult for me to understand the subtle difference.

> 
> > 
> > 1) allow drivers to customize the affinity for managed IRQ, for
> > example, now NVMe has special requirement for read queues & poll
> > queues
> 
> That's nothing new and already handled today.
> 
> > 2) 6da4b3ab9a6e9 ("genirq/affinity: Add support for allocating interrupt sets")
> > makes pci_alloc_irq_vectors_affinity() a bit difficult to use for
> > allocating interrupt sets: 'max_vecs' is required to same with 'min_vecs'.
> 
> So it's a bit difficult, but you fail to explain why it's not sufficient.

The introduced limitation is that 'max_vecs' has to be equal to 'min_vecs'
for pci_alloc_irq_vectors_affinity() wrt. NVMe's use case since commit
6da4b3ab9a6e9, so NVMe has to deal with irq vector allocation failure in
the awkward way of retrying.

And the topic has been discussed in the following links:

https://marc.info/?l=linux-pci&m=154655595615575&w=2
https://marc.info/?l=linux-pci&m=154646930922174&w=2

Bjorn and Keith thought this usage/interface is a bit awkward because the passed
'min_vecs' should have avoided driver's retrying.

For NVMe, when pci_alloc_irq_vectors_affinity() runs out of irq vectors, the
requested number has to be decreased and the call retried until it succeeds,
and the allocated irq vectors then have to be re-distributed among the irq
sets. That re-distribution needs the driver's knowledge, which is why the
callback is introduced.
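
To make the awkwardness concrete, the current pattern looks roughly like the
sketch below; this is an approximation of nvme_setup_irqs() rather than the
verbatim code, and the retry/shrink logic is condensed:

	int irq_sets[2];
	struct irq_affinity affd = {
		.pre_vectors	= 1,		/* admin queue */
		.nr_sets	= 2,
		.sets		= irq_sets,	/* per-set vector counts */
	};
	int result;

	for (;;) {
		/* re-distribute the queues between the two irq sets */
		nvme_calc_io_queues(dev, nr_io_queues);
		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];

		/* with interrupt sets, min_vecs must equal max_vecs */
		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
				nr_io_queues,
				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
		if (result > 0)
			break;
		/* allocation failed: shrink the request and try again */
		if (--nr_io_queues == 0)
			return result;
	}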

> 
> > With this patch, driver can implement their own .setup_affinity to
> > customize the affinity, then the above thing can be solved easily.
> 
> Well, I don't really understand what is solved easily and you are merely
> describing the fact that the new callback allows drivers to customize
> something. What's the rationale? If it's just the 'bit difficult' part,
> then what is the reason for not making the core functionality easier to use
> instead of moving stuff into driver space again?

Another solution mentioned in the previous discussion is to split building and
setting up affinities from allocating irq vectors, but one big problem is that
allocating an 'irq_desc' needs the affinity mask for figuring out its 'node',
see alloc_descs().
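
For context, the dependency is roughly this fragment of alloc_descs(),
simplified and with the managed-flag handling omitted:

		/* the descriptor's home node is derived from the affinity mask */
		if (affinity) {
			mask = &affinity->mask;
			node = cpu_to_node(cpumask_first(mask));
			affinity++;
		}
		desc = alloc_desc(start + i, node, flags, mask, owner);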

> 
> NVME is not special and all this achieves is that all driver writers will

I mean that NVMe is the only user of irq sets.

> claim that their device is special and needs its own affinity setter
> routine. The whole point of having the generic code is to exactly avoid
> that. If it has shortcomings, then they need to be addressed, but not
> worked around with random driver callbacks.

Understood.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
  2019-02-10 16:39   ` Thomas Gleixner
@ 2019-02-11  3:58     ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-02-11  3:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, Bjorn Helgaas, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Sun, Feb 10, 2019 at 05:39:20PM +0100, Thomas Gleixner wrote:
> On Fri, 25 Jan 2019, Ming Lei wrote:
> 
> > Use the callback of .setup_affinity() to re-calculate the number
> > of queues, and build irq affinity with the help of irq_build_affinity().
> > 
> > Then nvme_setup_irqs() gets simplified a lot.
> 
> I'm pretty sure you can achieve the same by reworking the core code without
> that callback.

Could you share the idea a bit? As I mentioned, the re-distribution
needs the driver's knowledge.

>  
> > +	/* Fill out vectors at the beginning that don't need affinity */
> > +	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
> > +		cpumask_copy(&masks[curvec].mask, cpu_possible_mask);
> 
> cpu_possible_mask is wrong.  Why are you deliberately trying to make this
> special? There is absolutely no reason to do so.

It is just to avoid exporting 'irq_default_affinity'.

> 
> These interrupts are not managed and therefore the initial affinity has to
> be irq_default_affinity. Setting them to cpu_possible_mask can and probably
> will evade a well thought out default affinity mask, which was set to
> isolate a set of cores from general purpose interrupts.
> 
> This is exactly the thing which happens with driver special stuff and which
> needs to be avoided. There is nothing special about this NVME setup and
> yes, I can see why the current core code is a bit tedious to work with, but
> that does not justify that extra driver magic by any means.

OK, thanks for your explanation.
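
For clarity, the two variants discussed above are (a minimal sketch; using
'irq_default_affinity' in the second line assumes the copy stays in the irq
core, since the symbol is not exported to drivers):

	/* variant in the patch: spread non-managed vectors over all CPUs */
	cpumask_copy(&masks[curvec].mask, cpu_possible_mask);

	/* variant Thomas asks for: keep the system default irq affinity */
	cpumask_copy(&masks[curvec].mask, irq_default_affinity);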

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback
  2019-02-10 18:49   ` Thomas Gleixner
@ 2019-02-11  4:09     ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-02-11  4:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, Bjorn Helgaas, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Sun, Feb 10, 2019 at 07:49:12PM +0100, Thomas Gleixner wrote:
> On Fri, 25 Jan 2019, Ming Lei wrote:
> > +static int nvme_setup_affinity(const struct irq_affinity *affd,
> > +			       struct irq_affinity_desc *masks,
> > +			       unsigned int nmasks)
> > +{
> > +	struct nvme_dev *dev = affd->priv;
> > +	int affvecs = nmasks - affd->pre_vectors - affd->post_vectors;
> > +	int curvec, usedvecs;
> > +	int i;
> > +
> > +	nvme_calc_io_queues(dev, nmasks);
> 
> So this is the only NVME specific information. Everything else can be done
> in generic code. So what you really want is:
> 
> 	struct affd {
> 		...
> +		calc_sets(struct affd *, unsigned int nvecs);
> 		...
> 	}
> 
> And sets want to be actually inside of the affinity descriptor structure:
> 
>         unsigned int num_sets;
> 	unsigned int set_vectors[MAX_SETS];
> 
> We surely can define a sensible maximum of sets for now. If that ever turns
> out to be insufficient, then struct affd might become to large for the
> stack, but for now, using e.g. 8, there is no need to do so.
> 
> So then the logic in the generic code becomes exactly the same as what you
> added to nvme_setup_affinity():
> 
> 	if (affd->calc_sets) {
> 		affd->calc_sets(affd, nvecs);
> 	} else if (!affd->num_sets) {
> 		affd->num_sets = 1;
> 		affd->set_vectors[0] = affvecs;
> 	}
> 
> 	for (i = 0; i < affd->num_sets; i++) {
> 	    ....
> 	}
> 
> See?

OK, will do it this way in V2; then we can keep drivers from abusing
the callback.
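
Spelled out, the interface sketched above would look roughly like this; the
field and constant names follow the mail and are not necessarily the final
form:

#define IRQ_AFFINITY_MAX_SETS	8

struct irq_affinity {
	int		pre_vectors;
	int		post_vectors;
	unsigned int	num_sets;
	unsigned int	set_vectors[IRQ_AFFINITY_MAX_SETS];
	void		(*calc_sets)(struct irq_affinity *affd, unsigned int nvecs);
	void		*priv;
};

	/* in irq_create_affinity_masks(), before the spreading loop */
	if (affd->calc_sets) {
		affd->calc_sets(affd, affvecs);
	} else if (!affd->num_sets) {
		affd->num_sets = 1;
		affd->set_vectors[0] = affvecs;
	}

An NVMe-side calc_sets() would then only call nvme_calc_io_queues() and fill
set_vectors[], with no affinity spreading left in the driver.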

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-02-11  3:54     ` Ming Lei
@ 2019-02-11 14:39       ` Bjorn Helgaas
  2019-02-11 22:38         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2019-02-11 14:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Thomas Gleixner, Christoph Hellwig, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

On Mon, Feb 11, 2019 at 11:54:00AM +0800, Ming Lei wrote:
> On Sun, Feb 10, 2019 at 05:30:41PM +0100, Thomas Gleixner wrote:
> > On Fri, 25 Jan 2019, Ming Lei wrote:
> > 
> > > This patch introduces callback of .setup_affinity into 'struct
> > > irq_affinity', so that:
> > 
> > Please see Documentation/process/submitting-patches.rst. Search for 'This
> > patch' ....
> 
> Sorry about that; I am not a native English speaker, and it is a bit
> difficult for me to understand the subtle difference.

I think Thomas is saying that instead of "This patch introduces
callback ...", you could say "Introduce callback of ...".

The changelog is *part* of the patch, so the context is obvious and
there's no need to include the words "This patch".

I make the same changes to patches I receive.  In fact, I would go
even further and say "Add callback .setup_affinity() ..." because "add"
means the same as "introduce" but is shorter and simpler.

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-02-11 14:39       ` Bjorn Helgaas
@ 2019-02-11 22:38         ` Thomas Gleixner
  2019-02-12 11:17           ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-02-11 22:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

Ming,

On Mon, 11 Feb 2019, Bjorn Helgaas wrote:

> On Mon, Feb 11, 2019 at 11:54:00AM +0800, Ming Lei wrote:
> > On Sun, Feb 10, 2019 at 05:30:41PM +0100, Thomas Gleixner wrote:
> > > On Fri, 25 Jan 2019, Ming Lei wrote:
> > > 
> > > > This patch introduces callback of .setup_affinity into 'struct
> > > > irq_affinity', so that:
> > > 
> > > Please see Documentation/process/submitting-patches.rst. Search for 'This
> > > patch' ....
> > 
> > Sorry about that; I am not a native English speaker, and it is a bit
> > difficult for me to understand the subtle difference.

Sorry I was a bit terse.

> I think Thomas is saying that instead of "This patch introduces
> callback ...", you could say "Introduce callback of ...".
> 
> The changelog is *part* of the patch, so the context is obvious and
> there's no need to include the words "This patch".
> 
> I make the same changes to patches I receive.  In fact, I would go
> even further and say "Add callback .setup_affinity() ..." because "add"
> means the same as "introduce" but is shorter and simpler.

Thanks for the explanation, Bjorn!

There is another point here. It's not only the 'This patch introduces ...'
part. It's also good practice to structure the changelog so it provides
context and reasoning first and then tells what the change is, e.g.:

   The current handling of multiple interrupt sets in the core interrupt
   affinity logic, requires the driver to do .......  This is necessary
   because ....

   This handling should be in the core code, but the core implementation
   has no way to recompute the interrupt sets for a given number of
   vectors.

   Add an optional callback to struct affd, which lets the driver recompute
   the interrupt set before the interrupt affinity spreading takes place.

The first paragraph provides context, i.e. the status quo. The second
paragraph provides the reasoning about what is missing, and the last one
tells what's done to solve it.

That's pretty much the same for all changelogs, even if you fix a bug. Just
in the bug case, the second paragraph describes the details of the bug and
the possible consequences.

You really need to look at the changelog as a standalone source of
information. That's important when you look at a commit as an outsider, or
even when you look at your own patch six months down the road, after you
have already paged out all the details.

Hope that clarifies it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity
  2019-02-11 22:38         ` Thomas Gleixner
@ 2019-02-12 11:17           ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-02-12 11:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Christoph Hellwig, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, linux-kernel, linux-pci

Hi Thomas,

On Mon, Feb 11, 2019 at 11:38:07PM +0100, Thomas Gleixner wrote:
> Ming,
> 
> On Mon, 11 Feb 2019, Bjorn Helgaas wrote:
> 
> > On Mon, Feb 11, 2019 at 11:54:00AM +0800, Ming Lei wrote:
> > > On Sun, Feb 10, 2019 at 05:30:41PM +0100, Thomas Gleixner wrote:
> > > > On Fri, 25 Jan 2019, Ming Lei wrote:
> > > > 
> > > > > This patch introduces callback of .setup_affinity into 'struct
> > > > > irq_affinity', so that:
> > > > 
> > > > Please see Documentation/process/submitting-patches.rst. Search for 'This
> > > > patch' ....
> > > 
> > > Sorry about that; I am not a native English speaker, and it is a bit
> > > difficult for me to understand the subtle difference.
> 
> Sorry I was a bit terse.
> 
> > I think Thomas is saying that instead of "This patch introduces
> > callback ...", you could say "Introduce callback of ...".
> > 
> > The changelog is *part* of the patch, so the context is obvious and
> > there's no need to include the words "This patch".
> > 
> > I make the same changes to patches I receive.  In fact, I would go
> > even further and say "Add callback .setup_affinity() ..." because "add"
> > means the same as "introduce" but is shorter and simpler.
> 
> Thanks for the explanation, Bjorn!
> 
> There is another point here. It's not only the 'This patch introduces ...'
> part. It's also good practice to structure the changelog so it provides
> context and reasoning first and then tells what the change is, e.g.:
> 
>    The current handling of multiple interrupt sets in the core interrupt
>    affinity logic, requires the driver to do .......  This is necessary
>    because ....
> 
>    This handling should be in the core code, but the core implementation
>    has no way to recompute the interrupt sets for a given number of
>    vectors.
> 
>    Add an optional callback to struct affd, which lets the driver recompute
>    the interrupt set before the interrupt affinity spreading takes place.
> 
> The first paragraph provides context, i.e. the status quo. The second
> paragraph provides the reasoning about what is missing, and the last one
> tells what's done to solve it.
> 
> That's pretty much the same for all changelogs, even if you fix a bug. Just
> in the bug case, the second paragraph describes the details of the bug and
> the possible consequences.
> 
> You really need to look at the changelog as a standalone source of
> information. That's important when you look at a commit as an outsider, or
> even when you look at your own patch six months down the road, after you
> have already paged out all the details.
> 
> Hope that clarifies it.

Your description about how to write changelog is really helpful and useful
for me, thanks!

Maybe you can add it to Documentation/SubmittingPatches, so that lots
of people can benefit from the guide.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-02-12 11:17 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-25  9:53 [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support allocating interrupt sets Ming Lei
2019-01-25  9:53 ` [PATCH 1/5] genirq/affinity: move allocation of 'node_to_cpumask' to irq_build_affinity_masks Ming Lei
2019-02-07 22:02   ` Bjorn Helgaas
2019-02-10 19:01   ` [tip:irq/core] genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks() tip-bot for Ming Lei
2019-01-25  9:53 ` [PATCH 2/5] genirq/affinity: allow driver to setup managed IRQ's affinity Ming Lei
2019-02-07 22:21   ` Bjorn Helgaas
2019-02-10  9:22     ` Ming Lei
2019-02-10 16:30   ` Thomas Gleixner
2019-02-11  3:54     ` Ming Lei
2019-02-11 14:39       ` Bjorn Helgaas
2019-02-11 22:38         ` Thomas Gleixner
2019-02-12 11:17           ` Ming Lei
2019-01-25  9:53 ` [PATCH 3/5] genirq/affinity: introduce irq_build_affinity() Ming Lei
2019-01-25  9:53 ` [PATCH 4/5] nvme-pci: simplify nvme_setup_irqs() via .setup_affinity callback Ming Lei
2019-02-10 16:39   ` Thomas Gleixner
2019-02-11  3:58     ` Ming Lei
2019-02-10 18:49   ` Thomas Gleixner
2019-02-11  4:09     ` Ming Lei
2019-01-25  9:53 ` [PATCH 5/5] genirq/affinity: remove support for allocating interrupt sets Ming Lei
2019-02-07 22:22   ` Bjorn Helgaas
2019-01-25  9:56 ` [PATCH 0/5] genirq/affinity: introduce .setup_affinity to support " Ming Lei
